Use of robots.txt, noindex, nofollow, canonical URL & 301/302 Redirects

Many people assume that page crawling and indexing are a single, intertwined function in Google Search, but they are in fact separate layers. A web page can be crawled without being indexed by Google. Overall, we have seen a lot of ambiguity and confusion about the various SEO implementations among website owners, managers, and even search specialists and SEO practitioners.

Although Google has stated that it does not transfer PageRank or anchor text across nofollowed (rel="nofollow") links, many people have the misconception that Google will not crawl these hyperlinks either. In fact, Google can still crawl them, and it therefore recommends using robots.txt to block Googlebot from crawling the pages that the nofollowed links point to.
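To check that a robots.txt rule really blocks Googlebot from a nofollowed link target, the rules can be tested locally before they are deployed. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `/go/` path, and the `example.com` URLs are assumptions made up for illustration, and the parser only approximates how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /go/, where the
# nofollowed outbound links are assumed to live.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /go/

User-agent: *
Allow: /
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one under the blocked path, one outside it.
offer_crawlable = is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/go/partner-offer")
about_crawlable = is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/about")
print(offer_crawlable, about_crawlable)
```

A check like this only covers crawling. As the article notes, a page blocked by robots.txt may still be indexed if Google already knew about it.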

Google recommends using HTTP 301 redirects to transfer PageRank from an old page to a new page. This server-level redirect is recognized and followed by Googlebot, which uses it to identify and crawl the new page URL. A page-level redirect, i.e. a meta refresh, is not an effective way to transfer most of the PageRank from the old page to the new page. It may also distort web analytics tracking on the page, because the traffic source for visits to the new page can be wrongly attributed to direct traffic.
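A server-level redirect is sent in the HTTP response itself, as a status code plus a `Location` header, before any HTML reaches the browser or Googlebot. The sketch below shows that idea with Python's standard `http.server`. The `REDIRECTS` mapping, the paths, and the `redirect_response` helper are hypothetical names chosen for illustration; a production site would normally configure redirects in its web server or framework instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of old paths to (status code, new path).
REDIRECTS = {
    "/old-page": (301, "/new-page"),  # permanent move
    "/promo": (302, "/summer-sale"),  # temporary move
}


def redirect_response(path: str):
    """Return (status, headers) for a request path."""
    rule = REDIRECTS.get(path)
    if rule is None:
        return 200, {"Content-Type": "text/plain"}
    status, target = rule
    return status, {"Location": target}


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer GET requests with a redirect or a plain 200 response."""

    def do_GET(self):
        status, headers = redirect_response(self.path)
        body = b"OK" if status == 200 else b""
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with `HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()` from the same module, then requested with a client that does not follow redirects so the status code and `Location` header stay visible.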

The table below compares the results of different types of SEO implementation: blocking Googlebot with robots.txt, noindex, nofollow, canonical URL, and 301/302 redirects.

| Action | Can PageRank be passed from other pages to Page A? | Can visitors view Page A? | Can Googlebot crawl Page A? | Can Google index Page A? | Can Page A accumulate PageRank? | Can Page A pass PageRank to other pages? |
| --- | --- | --- | --- | --- | --- | --- |
| Block Page A with robots.txt | No | Yes | No | Depends; Google may have already indexed the page before it was blocked with robots.txt | No | No; hyperlinks on Page A are not detected or crawled, as Googlebot is unable to crawl Page A in the first place |
| Use rel="nofollow" on hyperlinks to Page A | No | Yes | Yes | Yes | Yes, assuming that there are other "followed" hyperlinks to Page A | Yes |
| Use noindex meta standard (<meta name="robots" content="noindex" />) on Page A | Yes | Yes | Yes | No | Yes | Yes |
| Use nofollow meta standard (<meta name="robots" content="nofollow" />) on Page A | Yes | Yes | Yes | Yes | Yes | No; however, Googlebot can still crawl through the hyperlinks on Page A |
| Use canonical URL of Page B (<link rel="canonical" href="Page B's URL" />) on Page A | No, PageRank is passed to Page B | Yes | Yes | Yes; Google may choose to serve Page B instead of Page A on the Google SERP | No, PageRank is passed to Page B | Since PageRank on Page A is passed to Page B, there is little or no PageRank left on Page A |
| Implement HTTP 301 redirect on Page A to Page B | No, PageRank is passed to Page B | No, user lands on and views Page B | No, Googlebot will crawl Page B | No | No | No, Page B passes PageRank to other pages |
| Implement HTTP 302 redirect on Page A to Page B | No, PageRank is passed to neither Page A nor Page B | No, user lands on and views Page B | No, Googlebot will crawl Page B | No | No | No |
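
To audit which of the page-level directives in the table a given page actually carries, its HTML can be scanned for the robots meta tag and the canonical link. The following is a rough sketch using Python's standard `html.parser`. The `DirectiveParser` class, the `read_directives` helper, and the sample page are hypothetical, and the sketch handles only the `noindex` and `nofollow` values discussed here, not every value a search engine may accept.

```python
from html.parser import HTMLParser


class DirectiveParser(HTMLParser):
    """Collect robots meta directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for token in attrs.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.robots.add(token)
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href") or None


def read_directives(html: str) -> dict:
    """Summarize the noindex, nofollow, and canonical signals found in html."""
    parser = DirectiveParser()
    parser.feed(html)
    parser.close()
    return {
        "indexable": "noindex" not in parser.robots,
        "links_followed": "nofollow" not in parser.robots,
        "canonical": parser.canonical,
    }


# Hypothetical Page A that carries all three signals.
PAGE_A = """<html><head>
<meta name="robots" content="noindex, nofollow" />
<link rel="canonical" href="https://example.com/page-b" />
</head><body></body></html>"""

print(read_directives(PAGE_A))
```

The summary reflects only what the HTML declares. Whether Google honors a canonical hint or has already indexed a page is decided on Google's side.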

Reference:
Learn about robots.txt files
Block search indexing with meta tags
Use rel=”nofollow” for specific links
Use canonical URLs
Change page URLs with 301 redirects
Redirection

Cheok Lup is a data-driven, hands-on digital marketing practitioner with more than 10 years of experience in web development, web analytics, enterprise-level SEO, paid search, and performance marketing strategies and tactics. He has worked with established global firms (GlobalSources.com, Accenture Interactive, SAP), educational institutions such as Singapore Management University (SMU), small and medium businesses (SMBs/SMEs), and start-ups.