You might be surprised to hear that one small text file, known as robots.txt, could be the downfall of your website. If you get the file wrong, you may end up telling search engine robots not to crawl your site, meaning your web pages won’t appear in the search results. It is therefore important to understand the purpose of a robots.txt file in SEO and to learn how to check that you’re using it correctly.
A robots.txt file instructs web crawlers about pages that the website owner does not wish to be ‘crawled’. For example, if you do not want your images listed by Google and other search engines, you can block them using your robots.txt file.
You can check whether your website has a robots.txt file by adding /robots.txt immediately after your domain name in the address bar. The URL you enter should look like this: https://www.deepit.com/robots.txt
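The same check can be scripted. The sketch below only builds the robots.txt address from any page address on a site; the function name robots_url and the example.com addresses are illustrative assumptions, not part of any tool mentioned in this article.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    # The leading slash makes urljoin drop the page's own path and query,
    # so the result always points at the root of the domain.
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("https://www.example.com/blog/post.html?ref=home"))
```

Opening the printed address in a browser (or fetching it with any HTTP client) shows whether the file exists.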
HOW DOES IT WORK?
Before a search engine crawler crawls your website, it looks at your robots.txt file for instructions on which parts of the site it is allowed to crawl (visit) and index (save) for the search engine results.
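To make this concrete, here is a minimal sketch of the check a well-behaved crawler performs before visiting a URL, using Python's standard urllib.robotparser module. The rules, the bot name ExampleBot and the example.com URLs are assumptions for illustration. Note that this parser only does simple prefix matching and applies rules in file order, so real search engines may interpret a more complex file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/blog/post.html",
        "https://www.example.com/checkout/cart",
    ):
        print(url, "->", allowed(ROBOTS_TXT, "ExampleBot", url))
```

A crawler that respects the file would visit the blog page and skip the checkout page.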
Robots.txt files are useful:
- When you want search engines to ignore any duplicate pages on your website
- When you don’t want search engines to index your internal search results pages
- When you don’t want search engines to index certain areas of your website, or the whole website
- When you don’t want search engines to index certain files on your website (images, PDFs, etc.)
- When you want to tell search engines where your sitemap is located for better search engine results.
WHAT TO INCLUDE IN YOUR ROBOTS.TXT FILE
Please note that robots.txt is not meant to deal with security issues for your website, so we recommend that the location of any admin or private pages on your site is not included in the robots.txt file. If you want to securely prevent robots from accessing any private content on your site, you need to secure the area where it is stored. Remember, robots.txt is designed to act as a guide for web crawlers, and not all of them will follow your instructions. Let’s look at different examples of how you may want to use the robots.txt file:
- Allow everything and submit the sitemap – This is the best option for most websites. It allows all search engines to fully crawl the site and index all of its content, and it also shows the search engines where the XML sitemap is located so they can find new pages quickly:
    User-agent: *
    Allow: /
    # Sitemap Reference
    Sitemap: http://www.example.com/sitemap.xml
- Allow everything apart from certain sub-directories – In some cases, you will have a section of your website that you don’t want to appear in the search engine results. This might be a checkout area, image files, an irrelevant part of a forum or an adult section of a website, for example, as shown below. Any URL that begins with a disallowed path will be excluded by the search engines:
    User-agent: *
    Allow: /
    # Disallowed Sub-Directories
    Disallow: /checkout/
    Disallow: /website-images/
    Disallow: /forum/off-topic/
- Allow everything apart from certain files – Sometimes you want to show media or documents on your website but don’t want them to appear in image, document or media search results. Files you may wish to block could be animated GIFs, PDF instruction manuals or development PHP files, for example, as shown below:
    User-agent: *
    Allow: /
    # Disallowed File Types
    Disallow: /*.gif$
    Disallow: /*.pdf$
    Disallow: /*.PDF$
    Disallow: /*.php$
- Allow everything apart from certain webpages – Some webpages on your site may not be suitable to show in search engine results, and you can block individual pages using the robots.txt file as well. Webpages that you may wish to block could be your terms and conditions page, a page which you need to remove quickly for legal reasons, or a page with sensitive information that you don’t want to be searchable (remember that people can still read your robots.txt file, and the pages will still be seen by crawler bots that ignore it):
    User-agent: *
    Allow: /
    # Disallowed Web Pages
    Disallow: /terms.html
- Allow everything apart from certain patterns of URLs – You may have an awkward pattern of URLs that you want to disallow, ones which may be neatly grouped into a certain sub-directory. Examples of URL patterns you may wish to block might be internal search result pages, leftover test pages from development or the 2nd, 3rd, 4th, etc. pages of an eCommerce category page:
    User-agent: *
    Allow: /
    # Disallowed URL Patterns
    Disallow: /*search=
    Disallow: /*_test.php$
    Disallow: /*?page=*
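To see how the star (*) and dollar ($) patterns in these examples behave, the following sketch converts a Disallow pattern into a regular expression and tests a few paths against it. This is an illustrative approximation of wildcard matching, not the exact algorithm of any search engine. The helper names pattern_to_regex and is_blocked and the sample paths are assumptions.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    # A trailing "$" pins the match to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" stands for any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if path matches any Disallow pattern from its start."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


if __name__ == "__main__":
    patterns = ["/*search=", "/*_test.php$", "/*?page="]
    for path in ("/shop?search=shoes", "/old_test.php", "/old_test.php5", "/shoes?page=2", "/shoes"):
        print(path, "->", "blocked" if is_blocked(path, patterns) else "allowed")
```

The path /old_test.php5 stays allowed because the dollar sign requires the URL to end in _test.php, which is why the dollar sign matters when blocking file extensions.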
PUTTING IT ALL TOGETHER
Clearly, you may wish to use a combination of these methods to block off different areas of your website. The key things to remember are:
- If you disallow a sub-directory then ANY file, sub-directory or webpage within that URL pattern will be disallowed
- The star symbol (*) substitutes for any character or number of characters
- The dollar symbol ($) signifies the end of the URL. Without it, a rule meant to block a file extension may block a huge number of URLs by accident
- URLs are matched case-sensitively, so you may have to include both upper-case and lower-case versions to capture all of them
- It can take search engines several days to a few weeks to notice a disallowed URL and remove it from their index
- The “User-agent” setting allows you to block certain crawler bots or treat them differently if needed. To do this, replace the catch-all star symbol (*) with the name of a specific crawler’s user agent, taken from a published list of user agent bots
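As a sketch of how per-crawler rules play out, the example below defines one group for an image crawler and a catch-all group for everyone else, then checks both with Python's standard urllib.robotparser. The rules, the bot name SomeOtherBot and the URLs are assumptions for illustration. The file deliberately uses only Disallow lines, because this parser applies rules in file order and a leading "Allow: /" would win over the lines that follow it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for an image crawler, one for all others.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /website-images/

User-agent: *
Disallow: /checkout/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    image_url = "https://www.example.com/website-images/logo.gif"
    checkout_url = "https://www.example.com/checkout/cart"
    for agent in ("Googlebot-Image", "SomeOtherBot"):
        for url in (image_url, checkout_url):
            print(agent, url, "->", parser.can_fetch(agent, url))
```

A crawler that has its own group follows only that group, so in this sketch the image crawler is kept out of the images folder but is not covered by the catch-all checkout rule.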
WHAT NOT TO INCLUDE IN YOUR ROBOTS.TXT FILE
Occasionally, a website has a robots.txt file which includes the following command:
User-agent: *
Disallow: /
This code tells all bots to ignore the whole domain, meaning none of that website’s pages or files would be listed at all by the search engines! This example highlights the importance of properly implementing a robots.txt file, so be sure to check yours to make sure you’re not unknowingly limiting your chances of being indexed by search engines.
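A quick way to catch this mistake is to ask whether a generic crawler may fetch the home page. The sketch below does that with Python's standard urllib.robotparser; the function name blocks_whole_site and the example.com address are assumptions, and the check is only a rough guard based on simple prefix rules.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the rules stop user_agent from fetching the home page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Rules are prefix matches, so a rule that blocks "/" blocks every path.
    return not parser.can_fetch(user_agent, "https://www.example.com/")


if __name__ == "__main__":
    print(blocks_whole_site("User-agent: *\nDisallow: /\n"))
    print(blocks_whole_site("User-agent: *\nDisallow: /checkout/\n"))
```

The first file is flagged because it shuts out every crawler, while the second only hides the checkout area.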
WHAT HAPPENS IF YOU HAVE NO ROBOTS.TXT FILE?
Without a robots.txt file, search engines will have a free run to crawl and index anything they find on the website. This is fine for most websites, but it’s good practice to at least point out where your XML sitemap is, so search engines can find new content quickly instead of slowly crawling through all the pages on your website and coming across it days later.
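If you want to confirm that a robots.txt file really advertises a sitemap, Python's standard urllib.robotparser can list the Sitemap lines it finds (the site_maps method needs Python 3.8 or later). The file content and the helper name sitemap_urls below are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that allows everything and names a sitemap.
ROBOTS_TXT = """\
User-agent: *
Allow: /
# Sitemap Reference
Sitemap: http://www.example.com/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the sitemap addresses declared in robots_txt (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(ROBOTS_TXT))
```

An empty list means crawlers get no sitemap hint from the file and have to discover pages by following links.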
TESTING YOUR ROBOTS.TXT FILE
You can test your robots.txt file to ensure it works as you expect it to – we’d recommend you do this with your robots.txt file even if you think it’s all correct.
To test your robots.txt file, the site it applies to needs to be registered with Google Webmaster Tools (now called Google Search Console). You then simply select the site from the list, and Google will return notes highlighting any errors. Test your robots.txt file using the Google Robots.txt Tester
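Alongside Google's tool, you can keep a small local check that lists the URLs you expect to be crawlable or blocked and reports any that disagree with the file. This is a rough sketch with assumed rules, URLs and a hypothetical helper name, failed_expectations. It relies on Python's standard urllib.robotparser, which does not understand the star and dollar wildcards, so it is only suitable for plain path rules.

```python
from urllib.robotparser import RobotFileParser


def failed_expectations(robots_txt: str, expectations: dict[str, bool],
                        user_agent: str = "*") -> list[str]:
    """Return the URLs whose crawl permission differs from what is expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, should_allow in expectations.items()
        if parser.can_fetch(user_agent, url) != should_allow
    ]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /checkout/\nDisallow: /terms.html\n"
    expectations = {
        "https://www.example.com/": True,
        "https://www.example.com/checkout/basket": False,
        "https://www.example.com/terms.html": False,
    }
    problems = failed_expectations(robots_txt, expectations)
    print("All expectations met" if not problems else f"Check these URLs: {problems}")
```

Running a check like this whenever the file changes helps catch an accidental block before search engines see it.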