You probably already know what a Shopify robots.txt file is. In case you don't, here is the definition.
Definition of Robots.txt file
According to Google: “A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.” The same applies to other web crawlers, not just Google.
And why do I need it?
Most stores have pages that they don't want search engines to crawl or index, such as duplicated collection pages. A robots.txt file, combined with noindex where needed, lets you steer crawlers away from them.
Note: There are two ways to edit your Shopify robots.txt file. This article focuses on using Liquid to add or remove directives (explained in the syntax section that follows) in Shopify's robots.txt.liquid template, so that the file keeps updating automatically.
Robots.txt syntax
A basic Robots.txt would look like this:
User-agent: *
Disallow: /private
Disallow: /thank-you
Disallow: /free-stuff
Allow: /
Let's dive in and learn about its syntax.
User-agent in robots.txt
Each search engine should identify itself with a user agent. For example, Google's robots identify as Googlebot, Yahoo's as Slurp, and Bing's as Bingbot.
The user-agent record defines the start of a group of directives. All directives in between the first user-agent and the next user-agent record are treated as directives for the first user-agent.
Directives can apply to specific user-agents, but they can also be applicable to all user-agents. In that case, a wildcard is used: User-agent: *.
See the list of the most popular user agents here.
Wildcards
Two wildcards are available: “*” matches any sequence of characters, and “$” at the end of a pattern marks the end of the URL.
For example:
- Block access to every URL that contains a question mark “?”
User-agent: *
Disallow: /*?
- The $ character is used for “end of URL” matches. This example blocks Googlebot from crawling URLs that end with “.php”
User-agent: Googlebot
Disallow: /*.php$
Disallow directive in Shopify robots.txt
You can tell search engines not to access certain files, pages or sections of your website.
Disallow: The directive that tells a user agent not to crawl a particular URL path. Each “Disallow:” line holds a single path, so use one line per path.
Example: You don’t want any of the bots to crawl your checkout pages
User-agent: *
Disallow: /checkout/
Allow directive in Shopify robots.txt
Allow: The directive that tells a crawler it can access a page or subfolder even though its parent page or subfolder is disallowed.
The Allow directive is used to counteract a Disallow directive. It is supported by Google and Bing.
Important: when using Allow and Disallow directives together, be careful with wildcards, since they can lead to conflicting directives.
Example:
User-agent: *
Allow: /media/terms-and-conditions.pdf
Disallow: /media/
In this example, no crawler may access the /media/ directory, except for the file /media/terms-and-conditions.pdf.
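On Shopify, another way to open up a path is to drop one of the default Disallow rules instead of counteracting it with Allow. The following is a minimal sketch for the robots.txt.liquid template, assuming the store's default rules include Disallow: /policies/ (the path is only an illustration; check your own robots.txt first). It relies on the robots, group, and rule Liquid objects that Shopify exposes to this template.

```liquid
{%- comment -%}
  Sketch: print every default rule except one.
  '/policies/' is an assumed example path, not a recommendation.
{%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {%- unless rule.directive == 'Disallow' and rule.value == '/policies/' -%}
      {{ rule }}
    {%- endunless -%}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
```

Because the loop still iterates over robots.default_groups, any rules Shopify adds or changes later continue to flow through; only the one skipped rule is affected.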
Example of conflicting directives
User-agent: *
Allow: /directory
Disallow: *.html
It is unclear whether search engines may access the URL http://www.domain.com/directory.html, because it matches both rules. Google resolves such conflicts by applying the most specific (longest) matching rule and, when rules are equally specific, the least restrictive one, which in this case means it may access http://www.domain.com/directory.html.
Crawl-delay in Shopify robots.txt
The Crawl-delay directive is an unofficial directive used to prevent overloading servers with too many requests.
The Crawl-delay directive indicates how many seconds a crawler should wait before loading and crawling page content. Note that Googlebot ignores this directive; Google has historically offered a crawl rate setting in Google Search Console instead.
Note: The Crawl-delay directive should be placed right after the Disallow or Allow directives.
Example:
User-agent: BingBot
Disallow: /private/
Crawl-delay: 10
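In a Shopify store, a group like this can be appended to the robots.txt.liquid template after the default groups. The following is a minimal sketch, assuming you want to keep all of Shopify's default rules and reuse the BingBot values from the example above (they are illustrative, not a recommendation).

```liquid
{%- comment -%}
  Sketch: keep Shopify's default groups unchanged,
  then append one hand-written group as plain text.
{%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

User-agent: BingBot
Disallow: /private/
Crawl-delay: 10
```

The custom group sits outside the for loop, so it is printed once, after all default groups.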
Use Liquid to edit Shopify robots.txt
There are two ways to edit your Shopify robots.txt file:
- You can replace the content of the robots.txt.liquid template with your own plain-text rules. Shopify then no longer keeps the file updated for you, so the rules can become outdated.
- You can use Liquid to add or remove directives from the robots.txt.liquid template. This method preserves Shopify’s ability to keep the file updated automatically in the future, and is recommended.
In this article, we focus on the latter method.
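As a rough illustration of the Liquid method, the sketch below outputs Shopify's default groups and adds one extra Disallow rule to the group that applies to all crawlers. It uses the robots.default_groups, group.user_agent, group.rules, and group.sitemap objects available in robots.txt.liquid; the /*?q=* pattern is an assumed example, so substitute the path you actually want to block.

```liquid
{%- comment -%}
  Sketch: default rules plus one custom rule for "User-agent: *".
  'Disallow: /*?q=*' is a placeholder pattern for illustration.
{%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /*?q=*' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
```

To try it, add a robots.txt.liquid template in the theme code editor, save, and then open /robots.txt on your store to check the result.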
How to Hide Shopify Search Template from Google
To hide the search template from Google:
- From your Shopify admin, go to Online Store > Themes.
- Find the theme you want to edit, and then click Actions > Edit code.
- Click the theme.liquid layout file.
- Paste the following code in the <head> section:
{% if template contains 'search' %}
<meta name="robots" content="noindex">
{% endif %}
- Click Save.
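Note that contains is a substring check, so it would also match any other template whose name or suffix happens to include the word search. If that is a concern, a stricter variant is sketched below; it assumes the search page uses a template named search and compares against template.name exactly.

```liquid
{%- comment -%}
  Sketch: noindex only when the current template's name is exactly "search".
{%- endcomment -%}
{% if template.name == 'search' %}
  <meta name="robots" content="noindex">
{% endif %}
```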
How to Hide Multiple Shopify Pages from Google
To hide multiple Shopify pages from Google:
- From your Shopify admin, go to Online Store > Themes.
- Find the theme you want to edit, and then click Actions > Edit code.
- Click the theme.liquid layout file.
- Paste the following code in the <head> section:
{% if handle contains 'page-handle-you-want-to-exclude' %}
<meta name="robots" content="noindex">
{% elsif handle contains 'your-page-handle-2' %}
<meta name="robots" content="noindex">
{% elsif handle contains 'your-page-handle-3' %}
<meta name="robots" content="noindex">
{% endif %}
You can add as many pages as you want to the code block by adding more {% elsif %} branches; remember to end it with {% endif %}.
A handle is what most people refer to as a slug; Shopify simply calls it a handle. For example:
The handle of a product page could be: add-mega-menu-to-shopify-store-t50
- Click Save.
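When the list of pages grows, the same check can be written more compactly. The sketch below is one possible alternative, not the only way to do it: it keeps the handles in a comma-separated string, splits it into an array, and tests whether the array contains the current handle. The variable name noindex_handles and the three handles are hypothetical placeholders. Unlike the substring check above, this only matches a handle exactly.

```liquid
{%- comment -%}
  Sketch: exact-match noindex for a list of handles.
  noindex_handles and its values are placeholders; replace them with your own.
{%- endcomment -%}
{% assign noindex_handles = 'your-page-handle-1,your-page-handle-2,your-page-handle-3' | split: ',' %}
{% if noindex_handles contains handle %}
  <meta name="robots" content="noindex">
{% endif %}
```

Adding another page then only requires appending its handle to the string, without spaces around the commas.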
How to Hide Dynamic Shopify Pages from Google
To hide dynamic pages that contain certain keywords in their handles from Google:
- From your Shopify admin, go to Online Store > Themes.
- Find the theme you want to edit, and then click Actions > Edit code.
- Click the theme.liquid layout file.
- Paste the following code in the <head> section:
{% if page.handle contains 'your-text' or collection.handle contains 'your-text' or product.handle contains 'your-text' %}
<meta name="robots" content="noindex">
{% endif %}
- Click Save.
How to Hide Shopify Pages from Web Crawlers
To hide tag-filtered pages, such as a collection or blog filtered by one or more tags, from web crawlers:
- From your Shopify admin, go to Online Store > Themes.
- Find the theme you want to edit, and then click Actions > Edit code.
- Click the theme.liquid layout file.
- Paste the following code in the <head> section:
{% if current_tags %}
<meta name="robots" content="noindex">
{% endif %}
- Click Save.
Conclusion
All Shopify stores start with the same robots.txt, which Shopify says works for most sites. You can now edit the file through the robots.txt.liquid theme template. We hope this article gives you enough information to customize your Shopify robots.txt yourself.
If you need any help with making changes to your Shopify Robots.txt we are happy to lend a hand. HappyPoints is a team of certified Shopify experts who have delivered 1000+ tasks worldwide with development and marketing solutions.
We're offering a free SEO project so that you can experience our expertise and professionalism firsthand. We want you to feel confident in us before you commit to a long Shopify SEO journey with us. Slots are limited, so you might want to sign up early. Feel free to learn more about the free SEO project here.