Free Tool Arena


How to build an XML sitemap

sitemap.xml schema, lastmod and priority realities, sitemap index files, when to split, submitting to Google/Bing, and keeping it accurate.

Updated April 2026 · 6 min read

An XML sitemap is a machine-readable index of the URLs you want search engines to crawl, wrapped in a spec the major engines have agreed on since 2005 (sitemaps.org). It will not magically boost rankings, but on large or deep sites it meaningfully improves discovery — especially for pages that sit more than three clicks from the homepage or have few inbound links. The format itself is simple, but the size limits, freshness signals, and submission workflow all have sharp edges. This guide covers the required XML structure, the <lastmod>, <changefreq>, and <priority> elements (and which ones Google still reads in 2026), the 50 MB / 50,000 URL limits, sitemap indexes for bigger sites, gzipping, and how to submit and monitor in Google Search Console and Bing Webmaster Tools.


What a sitemap does and does not do

A sitemap tells search engines “these are URLs that exist, here is when they changed.” It helps crawlers find pages faster and prioritize freshly updated ones.

It does not force indexing. Google still decides whether each URL is worth including. It does not replace internal linking — a page reachable only via the sitemap is a weak signal of importance.

The minimum valid sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-22</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2026-04-18</lastmod>
  </url>
</urlset>

The XML declaration and the xmlns are required. Each <url> needs a <loc>; everything else is optional. URLs must be absolute, use one canonical domain, and be URL-encoded (non-ASCII characters as percent-encoded UTF-8).
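The same document can be generated rather than hand-written; a minimal sketch using Python's standard-library xml.etree.ElementTree (the URLs and dates are the illustrative values from the example above):

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a minimal sitemap from (loc, lastmod) pairs."""
    # Register "" so urlset serializes with a default xmlns, not a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # xml_declaration=True emits the required <?xml ...?> header.
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

xml = build_sitemap([
    ("https://example.com/", "2026-04-22"),
    ("https://example.com/pricing", "2026-04-18"),
])
```

Using a real XML serializer, rather than string concatenation, gets escaping of `&` and other special characters in URLs right for free.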

lastmod — the one optional element Google actually uses

John Mueller and Gary Illyes have confirmed repeatedly that Google uses <lastmod> as a recrawl signal, but only when it is consistent with the actual content change date. Systems that set lastmod to the current time on every regeneration get their sitemap’s lastmod ignored entirely.

Use W3C Datetime format: 2026-04-22 or the full 2026-04-22T14:30:00+00:00. Date-only is fine. Only update the value when the page’s meaningful content changes — not on every template tweak.
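Both accepted forms fall out of Python's datetime module directly:

```python
from datetime import date, datetime, timezone

# Date-only form, fine for most pages.
day = date(2026, 4, 22).isoformat()  # "2026-04-22"

# Full W3C Datetime with an explicit UTC offset.
dt = datetime(2026, 4, 22, 14, 30, tzinfo=timezone.utc)
full = dt.isoformat()  # "2026-04-22T14:30:00+00:00"
```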

changefreq and priority — ignored by Google

Google has publicly stated (2020, reconfirmed 2024) that it ignores <changefreq> and <priority>. Bing still uses them as weak hints. Keep them out of new sitemaps unless you specifically need them for Bing or internal tooling — they add noise and file size.

If included: changefreq takes values always, hourly, daily, weekly, monthly, yearly, never. priority is a number from 0.0 to 1.0, default 0.5.

Size limits

A single sitemap may contain up to 50,000 URLs and be no larger than 50 MB uncompressed. Gzip-compressed sitemaps are allowed (file extension .xml.gz); the 50 MB limit still applies to the uncompressed size.

Hit either limit and your sitemap is rejected in full, not truncated. Split into multiple sitemaps well before the limit — 40,000 URLs and 40 MB are sensible thresholds.
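Splitting by URL count is a one-liner once the list is in hand; a sketch that chunks at 40,000 (the byte size still needs a separate check against the 50 MB cap):

```python
def chunk_urls(urls, max_urls=40_000):
    """Split a URL list into sitemap-sized chunks, well below the 50,000 cap."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]

# 90,000 URLs -> three files of 40,000 + 40,000 + 10,000.
chunks = chunk_urls([f"https://example.com/p/{n}" for n in range(90_000)])
```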

Sitemap index for larger sites

Sites over 50,000 URLs need a sitemap index: a sitemap of sitemaps.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml.gz</loc>
    <lastmod>2026-04-22</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml.gz</loc>
    <lastmod>2026-04-21</lastmod>
  </sitemap>
</sitemapindex>

An index itself is capped at 50,000 child sitemaps. Index files cannot nest — only one level deep.

Partition by content type (posts, products, categories) or by date (one sitemap per month/year). Partitioning by date makes the per-sitemap lastmod meaningful — old months rarely change, so Google quickly learns which files to skip.
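An index file is built the same way as a regular sitemap, just with different element names; a sketch with ElementTree, using hypothetical per-month child sitemap filenames:

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_index(children):
    """Build a sitemap index from (loc, lastmod) pairs for child sitemaps."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in children:
        sm = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sm, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(sm, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode", xml_declaration=True)

index = build_index([
    ("https://example.com/sitemap-2026-03.xml.gz", "2026-03-31"),
    ("https://example.com/sitemap-2026-04.xml.gz", "2026-04-22"),
])
```

With date partitioning, the per-child lastmod values here are what lets crawlers skip the unchanged months.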

What to include and exclude

Include only canonical, indexable URLs that return 200 and that you want Google to evaluate. Exclude:

URLs that 301 or 302 — list the destination instead.

URLs that return 404 or 410.

URLs marked <meta name="robots" content="noindex">.

Non-canonical versions — if ?utm= variants exist, list the clean URL.

Pages blocked by robots.txt.

Search Console will log a warning for each noindex or non-canonical URL in your sitemap — keeping it clean helps the report stay useful.
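The exclusion rules reduce to a simple filter once you have crawl or CMS data per page; a sketch over hypothetical records (the url/status/noindex/canonical keys are assumptions about your own export, not a real API):

```python
def sitemap_eligible(record):
    """Keep only canonical, indexable URLs that return 200."""
    return (
        record["status"] == 200          # drops 301/302/404/410
        and not record["noindex"]        # drops meta-robots noindex pages
        and record["canonical"] == record["url"]  # drops non-canonical variants
    )

pages = [
    {"url": "https://example.com/a", "status": 200, "noindex": False,
     "canonical": "https://example.com/a"},
    {"url": "https://example.com/old", "status": 301, "noindex": False,
     "canonical": "https://example.com/a"},
    {"url": "https://example.com/draft", "status": 200, "noindex": True,
     "canonical": "https://example.com/draft"},
]
keep = [p["url"] for p in pages if sitemap_eligible(p)]
# -> ["https://example.com/a"]
```

Running a filter like this on every regeneration is what keeps the Search Console report free of noindex and redirect warnings.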

Submitting and discovering

Three ways to tell search engines about your sitemap:

robots.txt: add Sitemap: https://example.com/sitemap.xml. Picked up automatically by every major engine.

Google Search Console: Sitemaps report in the left nav. Paste the URL, submit, and GSC will show the last read time, URL count, discovered/indexed split, and any parse errors.

Bing Webmaster Tools: similar submission flow. Also honors the robots.txt declaration.

The legacy ping endpoint (google.com/ping?sitemap=) was deprecated in June 2023 and now returns 404. Do not use it.

Image, video, and news extensions

The base sitemap spec handles URLs only. Three XML extensions add media metadata:

Image sitemap extension (xmlns:image): lists images per URL. Useful for image-heavy sites — e-commerce, photography, recipes — and helps discovery in Google Images.

Video sitemap extension: duration, thumbnail, content URL. Required for surfacing in Google Video results when the player is complex.

News sitemap: a separate file with at most 1,000 URLs, listing only articles published in the last two days, used for inclusion in Google News.

Common mistakes

Including noindex or redirected URLs. Google flags these as conflicts and can start ignoring your sitemap signals. Scrub the list on every regeneration.

Auto-updating lastmod on every build. Makes the field useless. Only bump lastmod when the actual content changes — tie it to post update timestamps, not deploy times.

Mixing protocols or domains. A sitemap at https://www.example.com/sitemap.xml may only list URLs on that exact host. Moving between http/https or www/non-www requires the sitemap to live on the same variant.

Gzipping without the .gz extension. Google detects compressed sitemaps by their extension. A file named sitemap.xml that actually contains gzipped bytes confuses parsers; name compressed sitemaps .xml.gz.
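To get this right, write the compressed bytes under the .xml.gz name; a sketch with Python's gzip module (the filename and XML are illustrative):

```python
import gzip

sitemap_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
)

# The .xml.gz name is what tells crawlers the file is compressed.
with gzip.open("sitemap.xml.gz", "wt", encoding="utf-8") as f:
    f.write(sitemap_xml)

# Round-trip check: reading it back yields the original XML.
with gzip.open("sitemap.xml.gz", "rt", encoding="utf-8") as f:
    assert f.read() == sitemap_xml
```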

Listing redirect chains. Sitemap URLs that redirect through two or more hops often get dropped before indexing. Keep everything direct-200.

Forgetting to update on content changes. A sitemap regenerated nightly from a database stays fresh. One hand-maintained in a text file stops matching reality within weeks.

Run the numbers

Build a valid URL list and <lastmod>-tagged sitemap with the sitemap URL generator. Pair with the robots.txt generator so the Sitemap: line is in place, and the URL cleaner to make sure the URLs you are listing are the canonical versions without tracking cruft.

