Use of robots.txt, noindex, nofollow, canonical URL & 301/302 Redirects

  • SEO
  • Cheok Lup

Many people mistakenly believe that page crawling and indexing are a single, intertwined function in Google Search, but they are in fact separate mechanisms. A web page can be crawled without being indexed by Google. Overall, we have seen a lot of ambiguity and confusion around the various SEO implementations among website owners, managers, and even search specialists and SEO practitioners.

Although Google has stated that it does not transfer PageRank or anchor text across nofollowed (rel="nofollow") links, many people mistakenly believe that Google will not crawl these hyperlinks. In fact, Google can still crawl them, and it therefore recommends using robots.txt to block Googlebot from crawling the affected nofollowed links.
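To make the robots.txt point concrete, here is a minimal sketch in Python that checks whether a given crawler is allowed to fetch a URL under a set of robots.txt rules. The rules, the `/private/` path, the `example.com` URLs, and the helper name `may_crawl` are all assumptions made up for illustration. Python's standard `urllib.robotparser` is also only an approximation of how Googlebot itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep Googlebot out of /private/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Page A sits under the disallowed path; the blog post does not.
blocked_page = may_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/private/page-a.html")
open_page = may_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post.html")
print("Page A crawlable:", blocked_page)
print("Blog post crawlable:", open_page)
```

A check like this only tells you whether crawling is permitted. It says nothing about whether a page that was crawled earlier is still in the index.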

Google recommends using HTTP 301 redirects to transfer PageRank from an old page to a new page. Googlebot recognizes and follows this server-level redirect to identify and crawl the new page URL. A page-level redirect, i.e. a meta refresh, is not an effective way to transfer most of the PageRank from the old page to the new page, and it may also disrupt web analytics tracking on the page, so that traffic to the new page is erroneously attributed to direct traffic.
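As an illustration of what a server-level 301 redirect looks like, here is a minimal sketch of a WSGI application in Python that answers requests for an old path with a `301 Moved Permanently` status and a `Location` header. The paths `/old-page` and `/new-page`, the `REDIRECTS` mapping, and the function names are hypothetical. On a real site this is usually configured in the web server or CMS rather than written by hand.

```python
from wsgiref.simple_server import make_server

# Hypothetical mapping of old paths to their new locations.
REDIRECTS = {"/old-page": "/new-page"}


def app(environ, start_response):
    """Send a 301 for known old paths, and a plain 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # The redirect is in the HTTP response itself, before any HTML is sent.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    body = b"OK"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(port: int = 8000) -> None:
    """Run the app locally for manual testing (blocks until interrupted)."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and requesting `/old-page` should return the 301 response. A meta refresh, by contrast, only takes effect after the old page's HTML has been delivered and parsed.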

Below is a table comparing the results of different types of SEO implementations: blocking Googlebot with robots.txt, noindex, nofollow, canonical URL, and 301/302 redirects.

| Action | Is PageRank passed from other pages to Page A? | Can visitors view Page A? | Can Googlebot crawl Page A? | Can Google index Page A? | Can Page A accumulate PageRank? | Can Page A pass PageRank to other pages? |
| --- | --- | --- | --- | --- | --- | --- |
| Block Page A with robots.txt | No | Yes | No | Depends; Google may have already indexed the page before it was blocked with robots.txt | No | No; hyperlinks on Page A are NOT detected and crawled, as Googlebot is unable to crawl Page A in the first place |
| Use rel="nofollow" on hyperlinks to Page A | No | Yes | Yes | Yes | Yes, assuming that there are other "followed" hyperlinks to Page A | Yes |
| Use the noindex meta tag (<meta name="robots" content="noindex" />) on Page A | Yes | Yes | Yes | No | Yes | Yes |
| Use the nofollow meta tag (<meta name="robots" content="nofollow" />) on Page A | Yes | Yes | Yes | Yes | Yes | No; however, Googlebot can still crawl through the hyperlinks on Page A |
| Use the canonical URL of Page B (<link rel="canonical" href="Page B's URL" />) on Page A | No, PageRank is passed to Page B | Yes | Yes | Yes; Google may choose to serve Page B instead of Page A on the Google SERP | No, PageRank is passed to Page B | Since PageRank on Page A is passed to Page B, there is little or no PageRank left on Page A |
| Implement an HTTP 301 redirect from Page A to Page B | No, PageRank is passed to Page B | No, the user lands on and views Page B | No, Googlebot will crawl Page B | No | No | No, Page B passes PageRank to other pages |
| Implement an HTTP 302 redirect from Page A to Page B | No, PageRank is passed to neither Page A nor Page B | No, the user lands on and views Page B | No, Googlebot will crawl Page B | No | No | No |
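To audit which of the page-level signals in the table a page actually carries, a small script can help. The following is a minimal sketch using Python's standard `html.parser` that collects the robots meta directives, the canonical URL, and any links marked rel="nofollow". The class name `PageSignalParser`, the sample HTML, and the `example.com` URLs are assumptions for illustration only. It reports what the markup says, not how Google will treat it.

```python
from html.parser import HTMLParser


class PageSignalParser(HTMLParser):
    """Collect robots meta directives, the canonical URL, and nofollowed links."""

    def __init__(self):
        super().__init__()
        self.robots = set()        # e.g. {"noindex", "nofollow"}
        self.canonical = None      # href of <link rel="canonical">, if present
        self.nofollow_links = []   # hrefs of <a rel="nofollow"> links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )
        elif tag == "link" and "canonical" in rel_tokens:
            self.canonical = attrs.get("href")
        elif tag == "a" and "nofollow" in rel_tokens:
            self.nofollow_links.append(attrs.get("href"))


# Hypothetical Page A markup, made up for this example.
SAMPLE_HTML = """
<html><head>
<meta name="robots" content="noindex, nofollow" />
<link rel="canonical" href="https://example.com/page-b" />
</head><body>
<a href="https://example.com/page-c" rel="nofollow">Page C</a>
<a href="https://example.com/page-d">Page D</a>
</body></html>
"""

signals = PageSignalParser()
signals.feed(SAMPLE_HTML)
print("robots directives:", sorted(signals.robots))
print("canonical URL:", signals.canonical)
print("nofollowed links:", signals.nofollow_links)
```

Redirects and robots.txt rules are not visible in the page markup, so they need to be checked separately at the HTTP and site level.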

Reference:
Learn about robots.txt files
Block search indexing with meta tags
Use rel=”nofollow” for specific links
Use canonical URLs
Change page URLs with 301 redirects
Redirection