robots.txt
Definition
robots.txt is a small text file served at /robots.txt that instructs search-engine crawlers which parts of a site they can and can't crawl. It's a suggestion, not a lock — well-behaved bots honor it.
What it means
The robots.txt convention dates back to 1994 as the Robots Exclusion Protocol and was formalized as RFC 9309 in 2022. It supports a few directives: User-agent (which bot the rules apply to), Allow and Disallow (path patterns), and Sitemap (the URL of a sitemap file). Example:

User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml

Note that robots.txt is public: anyone can read it. It should NEVER be used to 'hide' sensitive paths. Use authentication for security; use robots.txt for crawl efficiency and de-duplication.
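To see how these directives are interpreted, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules string and the bot name "MyBot" are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /admin/, allow everything else.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(user_agent, url) answers: may this bot crawl this URL?
print(rp.can_fetch("MyBot", "https://example.com/admin/panel"))  # False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))    # True
```

The same parser can also load a live file via set_url() and read(), which is how a polite crawler would check permissions before fetching a page.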
Why it matters
A misconfigured robots.txt can accidentally block crawling of your entire site ('Disallow: /' is a classic launch-day disaster). A properly configured one helps search engines spend their crawl budget on pages that matter, blocking staging areas, filtered views, and parameter-heavy URLs. Most small sites don't need much in robots.txt: just allow everything and point to a sitemap.
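That launch-day disaster is cheap to guard against. A minimal sketch of a sanity check, again using Python's standard-library urllib.robotparser (the rules string here deliberately reproduces the broken configuration):

```python
from urllib.robotparser import RobotFileParser

# The classic mistake: a staging-era robots.txt shipped to production.
robots = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

# If Googlebot can't fetch the homepage, nothing on the site is crawlable.
if not rp.can_fetch("Googlebot", "https://example.com/"):
    print("WARNING: robots.txt blocks the entire site!")
```

A check like this fits naturally into a deployment pipeline, run against the robots.txt that is about to go live.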
Example
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /_next/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
Frequently asked questions
Does Disallow remove pages from Google's index?
Not directly: it prevents crawling, but a page that is already indexed can stay in the index. To remove a specific page, use a <meta name="robots" content="noindex"> tag, and make sure that page is not blocked in robots.txt, since Google must be able to crawl the page to see the tag.
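A crawler detects that tag by parsing the page's HTML. A minimal sketch using Python's standard-library html.parser, with a made-up HTML snippet for illustration:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Scans HTML for a <meta name="robots"> tag containing 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "noindex" in a.get("content", "").lower():
                self.noindex = True

html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
p = RobotsMetaParser()
p.feed(html)
print(p.noindex)  # True
```

The same signal can also be sent in an X-Robots-Tag HTTP response header, which is useful for non-HTML files like PDFs.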
What's the difference between robots.txt and noindex?
robots.txt controls crawling (can the bot visit this URL?); noindex controls indexing (should this URL appear in search results?). They solve different problems.
Do all bots respect robots.txt?
Major search engines do (Googlebot, Bingbot, etc.). Scrapers and malicious bots often ignore it.
Related terms
- XML sitemap: An XML sitemap is an XML file that lists every URL you want search engines to crawl and (usually) index. It's not a ranking factor, but it helps Google discover and re-crawl your pages faster.
- Canonical URL: A canonical URL is the one 'official' URL for a piece of content, declared to search engines via a <link rel="canonical"> tag. It tells Google 'if you find this page at multiple URLs, treat this one as the main version.'