How to Set Up Robots.txt for BigCommerce Stores?
Search engines like Google and Bing need access to your web pages to find your store. However, there are certain pages that you don’t want to appear in search results, such as registration pages, search result pages, and cart and checkout pages. The robots.txt file tells search engine crawlers (robots) to stay away from these pages.
BigCommerce can automatically back up and adjust your robots.txt file when you enable sitewide HTTPS for improved storefront security. When you connect to your store through WebDAV, you’ll find these backup files in the root folder. The files do not need to be modified.
This article explains what the BigCommerce robots.txt file is and how to set it up, so that you can understand and optimize your store.
Table of contents
- What is a BigCommerce robots.txt file?
- How to create a BigCommerce robots.txt file?
- How to edit a BigCommerce robots.txt file?
- The disallowed files when using BigCommerce robots.txt
- BigCommerce robots.txt FAQ
What is a BigCommerce robots.txt file?
The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard that websites use to communicate with web crawlers and other web robots. The standard specifies how to tell a web robot which areas of the website should not be processed or scanned. Search engines often use robots to categorize websites. Not all robots follow the standard; email harvesters, spambots, malware, and robots that scan for security vulnerabilities may even start with the areas of the website they have been told to stay out of. The standard can be used in combination with Sitemaps, a robot inclusion standard for websites.
A BigCommerce robots.txt file tells search engine crawlers which pages or files they can and cannot request from your site. It is mainly intended to keep crawlers from overloading the site with requests; it is not a tool for keeping a web page out of Google. To keep a web page out of Google’s index, use noindex directives or password-protect it.
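As a rough illustration of how a compliant crawler reads these rules, the sketch below uses Python’s standard urllib.robotparser module. The rules, the store.example.com domain, and the is_allowed helper are made up for this example and are not taken from a real store.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not a real store's file.
RULES = """\
User-agent: *
Disallow: /cart.php
Disallow: /checkout.php
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may request `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# The cart page is disallowed, an ordinary product page is not.
print(is_allowed(RULES, "Googlebot", "https://store.example.com/cart.php"))
print(is_allowed(RULES, "Googlebot", "https://store.example.com/blue-widget/"))
```

Note that this only shows what a well-behaved crawler would decide; it says nothing about whether the page ends up indexed.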
What is robots.txt used for?
BigCommerce robots.txt is used mainly to manage crawler traffic to your site and, depending on the file type, to keep a file off Google:
For web pages (HTML, PDF, or other non-media formats that Google can read), robots.txt can be used to manage crawling traffic if you think your server will be overwhelmed by requests from Google’s crawler, or to avoid crawling unimportant or similar pages on your site.
Robots.txt should not be used to hide your web pages from Google search results. If other pages point to your page with descriptive text, your page could still be indexed without Google ever visiting it. To keep your page out of search results, use another method, such as password protection or a noindex directive.
If a robots.txt file blocks a web page, its URL can still appear in search results, but the result will have no description. Image, video, and PDF files, as well as other non-HTML files, are excluded. If you see such a search result for your page and want to fix it, remove the robots.txt entry that is blocking the page. If you want to hide the page from search completely, use another method.
For image, video, and audio files, robots.txt can be used to manage crawl traffic and to keep the files from appearing in Google search results. (Keep in mind that this does not prevent other pages or users from linking to your image, video, or audio file.)
For resource files such as unimportant images, scripts, or style files, you may use robots.txt to block them if you believe that pages loaded without these resources will not be significantly affected by the loss. However, if the absence of these resources makes the page harder for Google’s crawler to understand, you should not block them; otherwise, Google will not do a good job of analyzing pages that depend on them.
Standard search engine BigCommerce robots.txt file
Stores with sitewide HTTPS
If your store uses sitewide HTTPS, you will have only one robots.txt file. The default file can be found in Recommendations for Search Engines.
Stores without sitewide HTTPS
If your store does not use sitewide HTTPS, you will find two robots.txt files: one for HTTP pages and one for HTTPS pages.
Here is the default HTTP robots file. You can use it whenever you need to return to the original file.
User-agent: AdsBot-Google
Disallow: /account.php
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /finishorder.php
Disallow: /login.php
Disallow: /orderstatus.php
Disallow: /postreview.php
Disallow: /productimage.php
Disallow: /productupdates.php
Disallow: /remote.php
Disallow: /search.php
Disallow: /viewfile.php
Disallow: /wishlist.php
Disallow: /admin/
Disallow: /__socialshop/*

User-agent: *
Disallow: /account.php
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /finishorder.php
Disallow: /login.php
Disallow: /orderstatus.php
Disallow: /postreview.php
Disallow: /productimage.php
Disallow: /productupdates.php
Disallow: /remote.php
Disallow: /search.php
Disallow: /viewfile.php
Disallow: /wishlist.php
Disallow: /admin/
Disallow: /__socialshop/*
The default HTTPS robots file is shown below. You can always go back to it if necessary. Stores that have enabled sitewide HTTPS should use the one mentioned in Sitewide HTTPS.
User-agent: AdsBot-Google
Disallow: /account.php
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /finishorder.php
Disallow: /login.php
Disallow: /orderstatus.php
Disallow: /postreview.php
Disallow: /productimage.php
Disallow: /productupdates.php
Disallow: /remote.php
Disallow: /search.php
Disallow: /viewfile.php
Disallow: /wishlist.php
Disallow: /admin/
Disallow: /__socialshop/*

User-agent: *
Disallow: /*

User-agent: google-xrawler
Allow: /feeds/*
If you want to go back to the original robots.txt files, you can use these defaults. They apply only if you do not use sitewide HTTPS.
The limitations of the BigCommerce robots.txt file
Before you create or edit robots.txt, you should be aware of its limitations. At times, you may want to consider other mechanisms to make sure that your URLs cannot be found on the web.
Not all search engines support robots.txt directives.
The instructions in robots.txt files cannot enforce crawler behavior on your site; it is up to the crawler to obey them. While Googlebot and other reputable web crawlers follow the instructions in a robots.txt file, other crawlers may not. As a result, if you want to keep information hidden from web crawlers, you should use other blocking techniques, such as password-protecting private files on your server.
Different crawlers interpret syntax differently.
While reputable web crawlers adhere to the directives in a robots.txt file, each crawler may interpret the directives differently. Since some crawlers may not understand certain instructions, you should know the correct syntax for addressing different crawlers.
A page disallowed in robots.txt can still be indexed if it is linked to from other sites.
Although Google will not crawl or index content that is blocked by robots.txt, it can still find and index a disallowed URL if that URL is linked from other places on the web. As a consequence, the URL address and possibly other publicly available information, such as anchor text in links to the page, can still appear in Google search results. To properly keep your URL out of Google Search results, password-protect the files on your server, use the noindex meta tag or response header, or remove the page entirely.
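To check whether a page actually carries the noindex meta tag mentioned above, something like the following sketch could be used. It relies only on Python’s standard html.parser module; the class and function names are illustrative, and it inspects the robots meta tag only, not the X-Robots-Tag response header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # A tag may hold several comma-separated directives.
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

Remember that a crawler can only see this tag if it is allowed to fetch the page, so the page must not also be disallowed in robots.txt.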
How to create a BigCommerce robots.txt file?
A robots.txt file can be used to define which folders and files on your web server may not be crawled by a Robots Exclusion Protocol (REP)-compliant search engine crawler (also known as a robot or bot). It is important to note that a page that is not crawled can still be indexed.
Step 1: Determine which folders and files on your web server you want to keep crawlers away from.
- Examine your web server for published content that you do not want search engines to visit.
- Create a list of the accessible files and folders on your web server that you want to restrict access to. For example, you might want bots to skip site folders such as /cgi-bin, /scripts, and /tmp (or their equivalents, if they exist in your server architecture).
Step 2: Determine whether you need special instructions for a specific search engine bot in addition to a default set of crawling directives.
Examine the referrer logs on your web server to see whether there are any bots crawling your site that you want to block beyond the general directives that apply to all bots.
Create the robots.txt file in a text editor and add REP directives to block bots from visiting content. Save the text file in ASCII or UTF-8 encoding.
- In the robots.txt file, bots are referred to as user-agents. Start the first section of directives, which applies to all bots, by adding the following line at the beginning of the file: User-agent: *
- Create a list of Disallow directives for the content you want blocked. For example, given the directories mentioned earlier, the set of directives would look like this:
- Disallow: /cgi-bin/
- Disallow: /scripts/
- Disallow: /tmp/
- If you want to add customized directives for specific bots that are not appropriate for all bots, such as crawl-delay:, place them in a separate section after the first, generic section, changing the User-agent reference to a specific bot. See the Robots Database for a list of applicable bot names.
Step 3: Optional: Add a reference to your sitemap file (if you have one)
- You can point bots to a Sitemap file that lists the most important pages on your site by referencing it on its own line at the end of the file.
- A Sitemap file is usually stored in a site’s root directory. This is an example of a Sitemap directive line, with a placeholder domain: Sitemap: https://www.example.com/sitemap.xml
Step 4: Validate your robots.txt file for errors.
Step 5: Upload the robots.txt file to the site’s root directory.
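The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the build_robots_txt and validate helpers, the example paths, and the www.example.com sitemap URL are hypothetical, and the check uses Python’s standard urllib.robotparser rather than any search engine’s own tester.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed, sitemap_url=None):
    """Build a generic section for all bots plus an optional Sitemap line."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def validate(robots_txt, must_block, must_allow, user_agent="*"):
    """Return a list of problems; an empty list means the checks passed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = [f"not blocked: {p}" for p in must_block
                if parser.can_fetch(user_agent, p)]
    problems += [f"blocked by mistake: {p}" for p in must_allow
                 if not parser.can_fetch(user_agent, p)]
    return problems


# Hypothetical folders and sitemap location.
text = build_robots_txt(
    ["/cgi-bin/", "/scripts/", "/tmp/"],
    "https://www.example.com/sitemap.xml",
)
print(text)
print(validate(text, must_block=["/tmp/cache.html"], must_allow=["/products/"]))
```

Listing a few URLs that must stay crawlable alongside the ones that must be blocked helps catch a rule that is broader than intended before the file is uploaded.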
How to edit a BigCommerce robots.txt file?
You can control which pages are crawled by editing your robots.txt file. However, unless you are familiar with robots.txt files and understand the potential SEO impact, we strongly advise against doing so.
If you want search engine robots to stop crawling a certain page or subdirectory, add “Disallow:” followed by the URL path. For example:
- Disallow: /shipping-return-policy.html
- Disallow: /content/
After you save your changes, it can take several days or weeks for search engines to re-crawl your site and properly reindex it. You may resubmit your sitemap or remove a URL directly in Google Search Console. See Using a Sitemap for more detail on submitting a sitemap.
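If you prepare the edited file outside the control panel, a small helper can add the new rule to the right section. The sketch below is a hypothetical helper, not a BigCommerce feature, and it assumes each section starts with a single User-agent line.

```python
def add_disallow(robots_txt: str, path: str, user_agent: str = "*") -> str:
    """Insert `Disallow: path` right after the matching User-agent line.

    Assumes every section has exactly one User-agent line.
    """
    out, added = [], False
    for line in robots_txt.splitlines():
        out.append(line)
        key, _, value = line.partition(":")
        if (not added and key.strip().lower() == "user-agent"
                and value.strip() == user_agent):
            out.append(f"Disallow: {path}")
            added = True
    if not added:
        # No section for this user-agent yet, so append a new one.
        out += ["", f"User-agent: {user_agent}", f"Disallow: {path}"]
    return "\n".join(out) + "\n"


original = "User-agent: *\nDisallow: /cart.php\n"
updated = add_disallow(original, "/content/")
print(updated)
```

Keeping the original text untouched and returning a new string makes it easy to compare the two versions before saving.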
The disallowed files when using BigCommerce robots.txt
User-agent: AdsBot-Google or * - If the value is *, ALL bots/spiders must abide by the disallow rules that follow. If the value is AdsBot-Google, the disallow rules that follow apply only to AdsBot-Google. AdsBot-Google is a bot used by Google to crawl landing pages associated with ads, typically paid search, through the Google Ads network. AdsBot-Google is also used for display ads distributed by DoubleClick, Google Advertising, and AdSense.
Disallow: /account.php - This line prevents bots from crawling the store account pages. These pages are usually reached when a shopper registers with the store to complete a purchase or receive order status updates. This has nothing to do with the store owner’s BigCommerce account.
Disallow: /cart.php - This stops the cart page from being crawled. It would be strange to see the cart page in search results, since its content depends on the products a shopper has chosen. Furthermore, landing on a cart page of products picked by someone else would be a poor user experience for new visitors.
Disallow: /checkout.php - This stops the checkout page from being crawled. Like the cart page, this page is based on user input and would be useless as a search result. Furthermore, the checkout page can include confidential information such as name, email, address, and credit card details. BigCommerce protects the personal details of shoppers on every store and maintains PCI compliance by keeping this page out of search engines.
Disallow: /finishorder.php - Finishorder.php usually includes a large amount of personal information. BigCommerce safeguards user privacy and maintains PCI compliance by blocking search engines from crawling this page.
Disallow: /login.php - This stops the store’s customer login page from being crawled. The page is blocked because it contains very little content and offers nothing to potential customers of the store.
Disallow: /orderstatus.php - Before seeing the order status page’s content, a customer must first log in. This page is disallowed because search engines do not have store accounts and cannot enter data into text fields.
Disallow: /postreview.php - As with orderstatus.php, a customer must log in before publishing a product review. This page is disallowed because search engines do not have store accounts and cannot enter data into text fields.
Disallow: /productimage.php - Productimage.php is used on product pages to generate a jQuery lightbox window, which normally opens when a user clicks on a product image. Since the pop-up window is not a separate page with its own URL and duplicates text from the product page, it is disallowed to avoid duplicate content, missing title tag and description warnings in Search Console (Webmaster Tools), and thin content penalties.
Disallow: /productupdates.php - No longer in use.
Disallow: /remote.php - This is used for store AJAX calls and does not generate a human-readable page.
Disallow: /search.php - This page handles requests from a store’s search box. Google has previously stated that it does not want search results pages in its index. Moving from one search results page to another instead of straight to the answer is a poor user experience.
Disallow: /viewfile.php - This is used to attach files to orders. It is common with digital purchases such as downloadable files and PDFs. Since the item being sold is a digital good, indexing it would make it accessible to people who did not buy the file.
Disallow: /wishlist.php - Wishlist.php is user-dependent and is of little to no benefit to searchers. Furthermore, depending on how many items a person adds to a wishlist, the pages might be considered thin and/or duplicate content. This page is blocked to avoid a poor user experience and to prevent thin/duplicate content issues.
Disallow: /admin/ - For security purposes, the store admin login path is disallowed. Making the login page harder to locate deters opportunistic attacks. Furthermore, this page would be useless to a searcher.
Disallow: /*?_bc_fsnf=1* - This prevents bots from following faceted search links and degrading performance.
Disallow: /*&_bc_fsnf=1* - This prevents bots from following faceted search links and degrading performance.
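The faceted search rules differ from the others because they match a query parameter anywhere in the URL rather than a fixed path. Assuming they are written with wildcards as /*?_bc_fsnf=1* and /*&_bc_fsnf=1*, the sketch below shows roughly how a wildcard-aware crawler such as Googlebot would match them. The rule_matches function is illustrative; Python’s built-in urllib.robotparser does not interpret wildcards, which is why the pattern is converted to a regular expression here.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Match a robots.txt path pattern against a URL path plus query string.

    `*` matches any run of characters, a trailing `$` anchors the end,
    and otherwise the pattern only needs to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url_path) is not None


# Hypothetical category URLs with and without the faceted search parameter.
print(rule_matches("/*?_bc_fsnf=1*", "/shoes/?_bc_fsnf=1&brand=Acme"))
print(rule_matches("/*&_bc_fsnf=1*", "/shoes/?brand=Acme&_bc_fsnf=1"))
print(rule_matches("/*?_bc_fsnf=1*", "/shoes/"))
```

Two rules are needed because the parameter can appear first in the query string (after ?) or later (after &).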
BigCommerce robots.txt FAQ
I use the same robots.txt file for all of my websites. Can I use a full URL rather than a relative path?
No. With the exception of Sitemap:, the directives in the robots.txt file are only valid for relative paths.
Is it okay to put the robots.txt file in a subdirectory?
No. The file must be placed in the website’s root directory.
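Because the file always lives at the root, its address can be derived from any page URL on the same host. A minimal sketch, assuming a hypothetical store.example.com storefront and an illustrative robots_txt_url helper:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://store.example.com/shoes/red/?page=2"))
```

A file at any other path, such as /shoes/robots.txt, is simply not consulted by crawlers.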
I’d like to restrict access to a private folder. Can I keep others from reading my robots.txt file?
No. Other users will be able to read the robots.txt file. If folders or file names should not be made public, they should not be listed in the robots.txt file. Serving different robots.txt files depending on the user-agent or other attributes is not recommended.
Is it necessary to provide an allow directive in order to allow crawling?
No, you do not need to provide an allow directive. Allow directives are used to override disallow directives in the same robots.txt file.
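How an allow directive overrides a disallow directive can be sketched as follows. The example follows the precedence described in Google’s documentation and in RFC 9309, where the most specific (longest) matching rule wins and allow wins a tie; other crawlers may resolve conflicts differently. The rules, paths, and the is_allowed function are hypothetical, and only plain prefixes without wildcards are handled.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs such as
    ("disallow", "/content/"). The longest matching prefix wins,
    allow wins a tie, and a path with no matching rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


# Hypothetical rules: block /content/ but keep the press section crawlable.
rules = [("disallow", "/content/"), ("allow", "/content/press/")]
print(is_allowed(rules, "/content/press/launch.html"))
print(is_allowed(rules, "/content/internal.html"))
print(is_allowed(rules, "/about/"))
```

The last call shows why no allow directive is needed for ordinary pages: anything not matched by a rule is crawlable by default.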
What if I make an error in my robots.txt file or use an unsupported directive?
Web crawlers are generally very flexible and are not usually thrown off by minor mistakes in the robots.txt file. The worst that can happen is that incorrect or unsupported directives are ignored. Keep in mind, though, that Google cannot read minds when interpreting a robots.txt file; it can only interpret the file it fetched. That said, if you are aware of problems in your robots.txt file, they are usually easy to fix.
What software can I use to make a robots.txt file?
You can use anything that creates a valid text file. Notepad, TextEdit, vi, and emacs are popular programs for creating robots.txt files. Once you’ve finished making the file, check it with the robots.txt tester.
Will a page vanish from search results if I use a robots.txt disallow directive to prevent Google from crawling it?
Blocking Google from crawling a page is likely to result in the page being removed from Google’s index.
However, a robots.txt Disallow does not guarantee that a page will not appear in search results: Google may still decide that it is relevant based on external information such as incoming links. If you want to explicitly keep a page from being indexed, use the noindex robots meta tag or the X-Robots-Tag HTTP header. In that case, you should not disallow the page in robots.txt, because the page must be crawled for the tag to be seen and obeyed.
How long would it take for updates to my robots.txt file to have an effect on my search results?
First, the cached copy of the robots.txt file must be refreshed (Google generally caches the contents for up to one day). Even after the change has been picked up, crawling and indexing is a complicated process that can take quite some time for individual URLs, so it is impossible to give an exact timeline. Also, bear in mind that even if your robots.txt file blocks access to a URL, that URL can still appear in search results although Google can’t crawl it. Submit a removal request via Google Search Console if you want to expedite the removal of the pages you’ve blocked from Google.
How do I temporarily halt all website crawling?
You can temporarily stop all crawling by returning an HTTP status code of 503 for all URLs, including the robots.txt file. The robots.txt file will be retried periodically until it can be accessed again. Changing your robots.txt file to disallow crawling is not recommended for this purpose.
My server is not case-sensitive. How do I completely prevent crawling of some folders?
Directives in the robots.txt file are case-sensitive. In this situation, it is advisable to use canonicalization methods to ensure that only one version of the URL is indexed. This lets you have fewer lines in your robots.txt file, making it easier to manage. If this is not feasible, list the most common variations of the folder name, or shorten it as much as possible, using just the first few characters instead of the full name. Instead of listing every upper and lower-case permutation of /MyPrivateFolder, you could list the permutations of “/MyP” (if you are certain that no other, crawlable URLs exist with those first characters). Alternatively, if crawling is not a problem, it may be preferable to use a robots meta tag or X-Robots-Tag HTTP header instead.
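The effect of case sensitivity can be seen with a quick check. The sketch below uses Python’s standard urllib.robotparser as a stand-in for a compliant crawler; the rules and the blocked helper are illustrative, and the shortened rule set lists only two of the possible “/MyP” permutations.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str, user_agent: str = "*") -> bool:
    """Return True if `path` is disallowed for `user_agent` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(user_agent, path)


# One rule with the exact folder name.
exact = "User-agent: *\nDisallow: /MyPrivateFolder/\n"
# Shortened prefixes; only two case permutations are listed here.
short = "User-agent: *\nDisallow: /MyP\nDisallow: /myp\n"

print(blocked(exact, "/MyPrivateFolder/a.html"))   # exact case: blocked
print(blocked(exact, "/myprivatefolder/a.html"))   # other case: not blocked
print(blocked(short, "/myprivatefolder/a.html"))   # covered by /myp
print(blocked(short, "/MYPRIVATEFOLDER/a.html"))   # /MYP was not listed
```

The last line illustrates the trade-off: shorter prefixes reduce the number of permutations, but every permutation you care about still has to be listed.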
For all URLs, including the robots.txt file, I return 403 Forbidden. Why is the website still crawled?
The 403 Forbidden HTTP status code (like all other 4xx HTTP status codes) is interpreted as meaning that no robots.txt file exists. This means that crawlers will generally assume they may crawl any of the website’s URLs. To prevent crawling of the website, the robots.txt file must be returned with a 200 OK HTTP status code and contain an appropriate disallow rule.
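The status code behavior described in the last two answers can be summarized in a small lookup. This is a simplified sketch of the behavior as stated in this FAQ, not a description of any crawler’s actual implementation; the function name and the returned labels are made up for illustration.

```python
def crawl_policy(robots_status: int) -> str:
    """Map the HTTP status of the robots.txt request to crawler behavior."""
    if robots_status == 200:
        # File served normally: the crawler follows its rules.
        return "follow-rules"
    if robots_status == 503:
        # Crawling pauses and robots.txt is retried later.
        return "pause-and-retry"
    if 400 <= robots_status < 500:
        # Treated as if no robots.txt exists: everything may be crawled.
        return "crawl-everything"
    # Other status codes are not covered by this FAQ.
    return "unspecified"


for status in (200, 403, 404, 503):
    print(status, crawl_policy(status))
```

This makes the pitfall explicit: a 403 on robots.txt opens the site to crawling rather than closing it.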
To sum up, BigCommerce robots.txt is a text file that contains instructions for web crawlers. It determines which parts of a website crawlers are permitted to scan. However, the file does not name those parts explicitly; instead, it lists the areas that must not be scanned. Using this simple text file, you can conveniently exclude whole domains, full folders, one or more subdirectories, or individual files from search engine crawling. The file does not, however, protect against unauthorized access.
Robots.txt is located in the domain’s root directory. As a result, it is the first document that crawlers open when they visit your site. The file does more than regulate crawling, though. You can also include a reference to your sitemap, which gives search engine crawlers an overview of all of your domain’s current URLs.