Search Engine Tags, Attributes, Commands and Suggestions: Robots.txt and rel=“canonical”

Part 1

When I prompted some friends for their questions to help me write this blog, one of my favorite responses was, “First I would have to understand what the heck you’re writing about to begin with!” Duly noted. And I’ve noticed in my work with companies large and small that there’s a big knowledge gap when it comes to the following:

  • Robots.txt disallow
  • rel="canonical"
  • 301 redirect
  • “nofollow” & “noindex”

Some top questions I’ve heard about these tags and attributes are:

  • Which are best for SEO?
  • What’s the difference between a 301 and a 302 redirect?
  • Why is my page still in search when I removed it from my site?
  • Where on my page do I put the tag?
  • What are these things?

They’re actually really important signals that tell a search engine what to do with your site’s content, and if you haven’t used any of these on your website, I’m fairly confident you have some issues, namely duplicate content and 404 errors. Let’s break these down Classroom 101 style. In this post, I’ll cover robots.txt files and rel="canonical" tags.

  1. Robots.txt Disallow

Pronounced “robots dot text,” a robots.txt file tells a search engine which parts of your site it should not crawl.

Why wouldn’t you want something to be crawled? One reason might be that you have content on your site that just isn’t useful to users, and you don’t want it to be found. Note that you should not use a robots.txt file to hide “bad” or sensitive content from search engines. The file is public, and anyone can read it by visiting /robots.txt on your domain. Keep in mind, too, that Disallow only blocks crawling; a blocked URL can still show up in search results if other pages link to it. I set up a robots.txt file for our website because our CMS was creating pages in an /uploads folder that were being crawled and indexed and cluttering our web presence. I excluded this folder from being crawled using a robots.txt file. Ta-da! Problem solved.

To create a robots.txt file, look to Google Webmaster Tools (now called Google Search Console) if you have an account for your website. (You should!) If you use WordPress, you can install the Yoast SEO plugin and create a file easily. Here are instructions for setting up a robots.txt file.

It will look something like this:

User-agent: *
Disallow: /oldfile/
Disallow: /dumbfile/thisparticularfile.htm
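If you want to double-check what a file like this actually blocks before you publish it, here is a minimal sketch using Python's built-in urllib.robotparser module. The example.com URLs and the helper names (build_parser, is_crawlable) are my own placeholders for illustration, and real search engines may interpret edge cases a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The same rules shown above, as one string (no blank lines inside the group).
ROBOTS_TXT = """\
User-agent: *
Disallow: /oldfile/
Disallow: /dumbfile/thisparticularfile.htm
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the rules allow this user agent to crawl the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs on a placeholder domain.
for url in [
    "https://example.com/",
    "https://example.com/oldfile/page.html",
    "https://example.com/dumbfile/thisparticularfile.htm",
    "https://example.com/dumbfile/other.htm",
]:
    status = "allowed" if is_crawlable(parser, url) else "blocked"
    print(f"{status}: {url}")
```

Notice that the second Disallow line blocks only that one file, so other files in /dumbfile/ stay crawlable.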

  2. rel="canonical"

Pronounced “rel canonical,” and sometimes referred to as just “a canonical,” this tag tells a search engine which version of a URL you want it to index and show in search results. If you don’t tell a search engine which version to use, it will choose one on its own, and search engines should never be left to their own devices.

Let’s look at Home Depot. Here’s a URL for a bathroom sink faucet I found by navigating through their site:

Here’s the same faucet with a different URL (reachable from a Google search result page):

Or let’s try this. A popular Greenville restaurant is Pomegranate on Main. Because they have not specified which homepage URL is preferred, they have at least two live versions:

Other homepages might have even more versions:

Why is that a problem? Because as people link to your homepage, they might link to different URLs, spreading out value that should be consolidated to one page.

You could do a 301 redirect here (I’ll talk about that next), or you can set up a canonical tag. Basically, as Google explains it:

Add a <link> element with the attribute rel="canonical" to the <head> section of the non-preferred pages:

<link rel="canonical" href="" />
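To confirm that a page is really announcing the canonical URL you intended, you can read the tag back out of the HTML. Below is a rough sketch using Python's built-in html.parser module. The CanonicalFinder class, the find_canonical helper, and the example.com URL are assumptions of mine for illustration, not anything from Google's documentation.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rel_values = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href") or None


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page with a placeholder URL.
good_page = (
    '<html><head>'
    '<link rel="canonical" href="https://example.com/faucet" />'
    '</head><body></body></html>'
)

# Same tag, but with curly quotes pasted in from a word processor.
curly_page = (
    '<html><head>'
    '<link rel=\u201dcanonical\u201d href=\u201dhttps://example.com/faucet\u201d />'
    '</head><body></body></html>'
)

print(find_canonical(good_page))
print(find_canonical(curly_page))
```

The second page is there to show a common copy-and-paste trap: if curly quotes sneak into the tag, this parser no longer recognizes it as a canonical, so stick to plain straight quotes in your HTML.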

Stay tuned for more in Part 2! I’ll be covering 301 redirects and the “nofollow” and “noindex” directives.


Laura Lee – Account Manager
