Check On-Page for Broken Links with Google Docs

It can be tedious to check for broken links on a web page, especially a web page with many links. Other than using desktop tools, i.e. Xenu, Screaming Frog installed on your workstation, Google Docs is a good alternative tool that can be accessed anywhere to check for broken links on your web pages.

1. Create a new spreadsheet in Google Docs, to extract the URLs and anchor text on the web page.

2. Input the web page URL in cell A1.

3. The ImportXML function is useful to extract the necessary desired data on the web page. Insert the ImportXML function, =ImportXML(A1, “//a”) in cell A3 to extract the anchor text of all hyperlinks on the web page. Insert another ImportXML function, =ImportXML(A1, “//a/@href”) in cell B3 to extract the pointing URLs of all hyperlinks.

Here is an example of the spreadsheet on Google Docs with ImportXML function. Feel free to make a copy of it.

Extract URLs in Google Docs Spreadsheet

4. Copy the generated list of anchor texts and URLs onto another new spreadsheet on Google Docs.

5. Update all the relative path URLs to absolute path.

6. In the new spreadsheet with the list of anchor texts and URLs, create a script in the spreadsheet to check for the server status code for each of the URLs. Click “Tools > Script editor” on the spreadsheet top menu to open a new window to insert or edit the script.

Open Script Editor in Google Docs Spreadsheet

7. Insert the below chunk of code. You have created a script function, named “HTTPResponse”. Save the script with a suitable project name, i.e. “Check Server Status Code”.

function HTTPResponse( uri )
{
var response_code ;
try {
response_code = UrlFetchApp .fetch( uri ) .getResponseCode() .toString() ;
}
catch( error ) {
response_code = error .toString() .match( / returned code (ddd)./ )[1] ;
}
finally {
return response_code ;
}
}

Insert Script in Google Docs Spreadsheet

8. Go back to the spreadsheet with the list of anchor texts and URLs. Insert the HTTPResponse function in a new column, to check for the server status code.

A “200” value suggests that the URL is available working.
A “404” value means that the URL is unavailable, which you have to remove the broken link or update it to a new working URL.

You can learn more about the HTTP server status code.

Feel free to make a copy of the Google Docs spreadsheet with HTTPResponse function.

Check HTTP Server Status Code in Google Docs Spreadsheet

Having a career spanning more than 10 years of expertise in web development, web analytics, enterprise level SEO, paid search, competitive analysis, and performance marketing strategies & tactics, Cheok Lup has worked with established global e-commerce firms - GlobalSources.com, enterprise content - SAP, educational institutions - Singapore Management University (SMU), small-medium businesses (SMBs/SMEs), and start-ups.