Use of robots.txt, noindex, nofollow, canonical URL & 301/302 Redirects
A common misconception is that page crawling and indexing are a single, intertwined function in Google Search. In fact, they are separate layers: a web page can be crawled without being indexed by Google. We have seen a lot of ambiguity and confusion around the various SEO implementations among website owners, managers, and even search specialists and SEO practitioners.
Although Google has stated that it does not transfer PageRank or anchor text across nofollowed (rel="nofollow") links, many people mistakenly believe that Google will not crawl these hyperlinks at all. In fact, Googlebot can still crawl them, which is why Google recommends using robots.txt to block Googlebot from crawling the URLs that the nofollowed links point to.
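To make the robots.txt point concrete, here is a minimal sketch in Python that checks whether a set of robots.txt rules would allow Googlebot to fetch a URL. It uses the standard library's `urllib.robotparser`, which is not Google's own parser and may differ from it in edge cases. The `/go/` path, the `example.com` URLs, and the `can_crawl` helper are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block Googlebot from an assumed /go/ folder
# that holds the destinations of nofollowed links.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /go/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Check one blocked URL and one unblocked URL (both are example values).
blocked = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/go/partner-offer")
allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/about")
```

With these example rules, `blocked` is `False` and `allowed` is `True`. Keep in mind that this only tells you about crawling; as the table further down shows, a page blocked by robots.txt may still be indexed.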
Google recommends using HTTP 301 redirects to transfer PageRank from your old page to your new page. Googlebot recognizes and follows this server-level redirect to identify and crawl the new page URL. A page-level redirect, i.e. a meta refresh, is not an effective way to transfer most of the PageRank from the old page to the new page. It may also screw up web analytics tracking on the page, because the original referrer can be lost and the traffic source for the new page is then erroneously attributed to direct traffic.
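As a rough illustration of what a server-level 301 redirect looks like, the sketch below uses Python's built-in `http.server`. It is not a production setup (most sites would configure this in their web server or CMS), and the `/old-page` and `/new-page` paths, the `REDIRECTS` mapping, and the `resolve` and `serve` helpers are hypothetical names chosen for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of old paths to their new locations.
REDIRECTS = {"/old-page": "/new-page"}


def resolve(path: str):
    """Return (status, location); location is None when no redirect applies."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 200, None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            # The Location header tells browsers and crawlers where the page moved.
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"OK"
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the example server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and requesting `/old-page` should return a 301 status with a `Location: /new-page` header, which is the signal a crawler follows to the new URL. A meta refresh, by contrast, lives in the HTML of the page itself and is only seen after the old page has been fetched and parsed.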
The table below compares the results of different types of SEO implementations: blocking Googlebot with robots.txt, noindex, nofollow, canonical URL, and 301/302 redirects.
| Action | Can PageRank be passed from other pages to Page A? | Can visitors view Page A? | Can Googlebot crawl Page A? | Can Google index Page A? | Can Page A accumulate PageRank? | Can Page A pass PageRank to other pages? |
|---|---|---|---|---|---|---|
| Block Page A with robots.txt | No | Yes | No | Depends; Google may have already indexed the page before it was blocked with robots.txt | No | No, hyperlinks on Page A are NOT detected or crawled, because Googlebot cannot crawl Page A in the first place |
| Use rel="nofollow" on hyperlinks to Page A | No | Yes | Yes | Yes | Yes, assuming that there are other "followed" hyperlinks to Page A | Yes |
| Use noindex meta standard (<meta name="robots" content="noindex" />) on Page A | Yes | Yes | Yes | No | Yes | Yes |
| Use nofollow meta standard (<meta name="robots" content="nofollow" />) on Page A | Yes | Yes | Yes | Yes | Yes | No, however Googlebot can still crawl through the hyperlinks on Page A |
| Use canonical URL of Page B (<link rel="canonical" href="Page B's URL" />) on Page A | No, PageRank is passed to Page B | Yes | Yes | Yes, Google may choose to serve up Page B instead of Page A on the Google SERP | No, PageRank is passed to Page B | Since PageRank on Page A is passed to Page B, there is little or no PageRank left on Page A |
| Implement HTTP 301 Redirect on Page A to Page B | No, PageRank is passed to Page B | No, user lands on and views Page B | No, Googlebot will crawl Page B | No | No | No, Page B passes PageRank to other pages |
| Implement HTTP 302 Redirect on Page A to Page B | No, PageRank is passed to neither Page A nor Page B | No, user lands on and views Page B | No, Googlebot will crawl Page B | No | No | No |
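If you want to check which of the page-level directives from the table are present on a page, a small script can help. The sketch below uses Python's standard `html.parser` to look for a robots meta tag and a canonical link in an HTML string. It only reports what is in the markup and says nothing about how Google will actually treat the page. The `RobotsDirectiveParser` class, the `inspect_page` function, and the sample HTML are assumptions made up for this example.

```python
from html.parser import HTMLParser


class RobotsDirectiveParser(HTMLParser):
    """Collect meta robots directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives.
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None


def inspect_page(html: str) -> dict:
    """Report whether noindex/nofollow are set and which canonical URL is declared."""
    parser = RobotsDirectiveParser()
    parser.feed(html)
    parser.close()
    return {
        "noindex": "noindex" in parser.directives,
        "nofollow": "nofollow" in parser.directives,
        "canonical": parser.canonical,
    }


# Hypothetical Page A that is noindexed and points its canonical URL at Page B.
SAMPLE_HTML = (
    '<html><head>'
    '<meta name="robots" content="noindex, nofollow" />'
    '<link rel="canonical" href="https://example.com/page-b" />'
    '</head><body></body></html>'
)
report = inspect_page(SAMPLE_HTML)
```

For the sample page, `report` shows both `noindex` and `nofollow` as `True` and the canonical URL as `https://example.com/page-b`. Note that Googlebot has to be able to crawl the page to see these tags at all, so they have no effect on a page that is blocked with robots.txt.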
Cheok Lup is a data-driven, hands-on practitioner in digital marketing who has acquired more than 10 years of expertise in web development, web analytics, enterprise-level SEO, paid search, and performance marketing strategies and tactics. He has worked with established global firms (GlobalSources.com, Accenture Interactive, SAP), educational institutions (Singapore Management University, SMU), small and medium businesses (SMBs/SMEs), and start-ups.