How to Set Up robots.txt for SEO
Google indexing is a well-known and important concern for SEO professionals and digital marketers. That is why they stay dedicated to getting their target pages crawled and indexed by Google, spending time, money, and resources to do it well.
They do on-page and off-page optimization, image optimization, link building, social bookmarking, and more to improve their rankings. But if they neglect the technical side, which is small in volume but large in effect, it can seriously hurt both SEO and Google rankings.
The Local SEO Expert Guide (LSEG) is here to help you set up your robots.txt. In this post, I will discuss the robots.txt file and its effect on SEO.
What is robots.txt?
The robots.txt is a text file used to give instructions to search engine bots (also known as crawlers, robots, or spiders) on how to crawl and index a website's pages.
This file is kept in the root directory of the website so that crawlers can find the instructions reliably. It is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions on how search engines should treat links (such as "follow" or "nofollow").
Even when everything looks right from an SEO standpoint, check your robots.txt with Google's robots.txt testing tool, as a correct file will speed up the whole indexing process.
Basic Format of robots.txt
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
At a minimum, a robots.txt file must contain the two lines above. The file can also contain multiple user agents and directives such as allows, disallows, crawl-delays, and so on.
Each set of user-agent directives appears as a separate directive set, separated by a line break.
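For illustration, here is a hypothetical file with two directive sets (the paths and the second user-agent are placeholders, not recommendations):

```text
# Block all crawlers from the staging area
User-agent: *
Disallow: /staging/

# Give Bingbot an additional restriction
User-agent: Bingbot
Disallow: /search/
```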
To access a robots.txt file, simply type /robots.txt just after the domain, e.g. www.example.com/robots.txt.
Technical Syntax of robots.txt
According to Moz, the robots.txt file uses the following technical terms:
- User-agent: The specific web crawler to which you’re giving crawl instructions (usually a search engine). A list of most user agents can be found here.
- Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one “Disallow:” line is allowed for each URL.
- Allow (Only applicable for Googlebot): The command to tell Googlebot it can access a page or subfolder even though its parent page or subfolder may be disallowed.
- Crawl-delay: How many seconds a crawler should wait before loading and crawling page content. Note that Googlebot does not acknowledge this command, but crawl rate can be set in Google Search Console.
- Sitemap: Used to call out the location of any XML sitemap(s) associated with this URL. Note this command is only supported by Google, Ask, Bing, and Yahoo.
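Putting these directives together, a hypothetical robots.txt might look like this (the domain and paths are placeholders, not recommendations):

```text
User-agent: Googlebot
Disallow: /archive/
Allow: /archive/featured/

User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
```

Here Googlebot is kept out of /archive/ except for one subfolder, Bingbot is asked to wait 10 seconds between requests, and the sitemap location is declared for all crawlers.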
Steps to Setup the robots.txt
The process of setting up robots.txt is described by Search Engine Journal as follows:
- Place your robots.txt file in the top-level directory of your website code to simplify crawling and indexing.
- Structure your robots.txt properly, like this: User-agent → Disallow → Allow → Host → Sitemap. This way, search engine spiders access categories and web pages in the appropriate order.
- Make sure that every URL you want to “Allow:” or “Disallow:” is placed on an individual line. If several URLs appear on one single line, crawlers will have a problem accessing them.
- Use lowercase to name your robots.txt. Having “robots.txt” is always better than “Robots.TXT.” Also, file names are case sensitive.
- Don’t separate query parameters with spacing. For instance, a line query like this “/cars/ /audi/” would cause mistakes in the robots.txt file.
- Don’t use any special characters except * and $. Other characters aren’t recognized.
- Create separate robots.txt files for different subdomains. For example, “hubspot.com” and “blog.hubspot.com” have individual files with directory- and page-specific directives.
- Use # to leave comments in your robots.txt file. Crawlers ignore lines that begin with the # character.
- Don’t rely on robots.txt for security purposes. Use passwords and other security mechanisms to protect your site from hacking, scraping, and data fraud.
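Following the ordering above (User-agent → Disallow → Allow → Host → Sitemap), a structured file might look like the sketch below; the domain and paths are placeholders, and note that the Host directive is honored mainly by Yandex:

```text
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Allow: /tmp/public/

Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
```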
Benefits of robots.txt
A robots.txt file can control crawler access to specific areas of your site. Be careful with it, though: if the file accidentally disallows Googlebot from crawling your whole site, recovering from that dangerous scenario takes work.
Some common use cases include:
- Keeping an entire section of your website private, such as the admin or accounts area
- Preventing duplicate content from appearing in SERPs
- Keeping internal search results pages from showing up on a public SERP
- Specifying the location of sitemaps
- Preventing search engines from indexing certain files on your website (images, PDFs, HTML, PHP, etc.)
- Defining a crawl delay to balance the server's load
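Several of these use cases can be sketched in a single hypothetical file (the paths, query pattern, and domain are illustrative only):

```text
User-agent: *
# Keep the admin section private
Disallow: /admin/
# Keep internal search results out of public SERPs
Disallow: /search?
# Block crawling of PDF files ($ anchors the end of the URL)
Disallow: /*.pdf$
# Ask supporting crawlers to wait 10 seconds between requests
Crawl-delay: 10

# Point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap.xml
```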
If there are no areas of your site where you need to control user-agent access, you may not need a robots.txt file at all.
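Besides Google's online tester, you can also sanity-check a set of rules locally before deploying them. Here is a minimal sketch using Python's standard-library urllib.robotparser; the domain, paths, and rules are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration
robots_txt = """User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A generic crawler may not fetch /admin/, but other paths are allowed
print(parser.can_fetch("*", "https://www.example.com/admin/"))      # False
print(parser.can_fetch("*", "https://www.example.com/blog/post-1")) # True
print(parser.crawl_delay("*"))                                      # 10
```

This lets you confirm, before a file goes live, that you have not accidentally blocked pages you want crawled.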