Saturday 27 April 2013


Finding the broken links in a webpage using Selenium


You may have heard of tools that find broken links on a web page, such as the Link Checker plug-in for Firefox or Xenu. These tools have to be installed before they can be used to find broken URLs or 404 pages.

We can write a Selenium script that does the same job. The approach has three steps:
  • Find the number of links available on the page.
  • Visit each link in turn and read its URL.
  • Get the response code for each URL with the help of the HttpURLConnection class and its getResponseCode method.
How to find the number of links on the page?

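We can find the number with the selenium.getXpathCount("//a[@href]").intValue() call. Counting only the anchors that have an href keeps the count within the bounds of document.links, which is used in the next step and does not contain anchors without an href.

selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://www.yahoo.com");
selenium.start();
selenium.open("/");
int linkCount = selenium.getXpathCount("//a[@href]").intValue();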

How to track each and every link on the page?

We can use a for loop to visit the links one by one through the JavaScript expression this.browserbot.getUserWindow().document.links[i], which refers to the i-th link element on the page with all of its properties. Appending .href to that expression and passing it to the selenium.getEval() method returns only the URL of the link.

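for (int i = 0; i < linkCount; i++) {
    // JavaScript expression for the i-th link on the page
    currentLink = "this.browserbot.getUserWindow().document.links[" + i + "]";
    // Evaluate it in the browser and keep only the href
    temp = selenium.getEval(currentLink + ".href");
}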
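The hrefs collected by this loop can include mailto: or javascript: links, which cannot be requested over HTTP. A minimal sketch of a filter for them follows, using only the JDK. LinkFilter, isCheckable and checkable are hypothetical names chosen for illustration; they are not part of Selenium.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Selenium): keeps only hrefs that use http or https.
final class LinkFilter {

    // Returns true when the href parses as a URI whose scheme is http or https.
    static boolean isCheckable(String href) {
        if (href == null || href.trim().isEmpty()) {
            return false;
        }
        try {
            String scheme = new URI(href.trim()).getScheme();
            return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        } catch (URISyntaxException e) {
            // Assumption: hrefs that do not parse are skipped, not reported as broken.
            return false;
        }
    }

    // Filters a list of hrefs, preserving their order.
    static List<String> checkable(List<String> hrefs) {
        List<String> kept = new ArrayList<String>();
        for (String href : hrefs) {
            if (isCheckable(href)) {
                kept.add(href);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://www.yahoo.com/",
                "mailto:someone@example.com",
                "javascript:void(0)");
        System.out.println(checkable(hrefs));
    }
}
```

In the loop, temp could be passed through isCheckable before its response code is requested.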

How to find out the response code of the URL?

We can use the HttpURLConnection class and its getResponseCode method to find the response code of a URL.

public static int getResponseCode(String urlString) throws MalformedURLException, IOException {
    URL u = new URL(urlString);
    // Open an HTTP connection to the link and send a GET request
    HttpURLConnection huc = (HttpURLConnection) u.openConnection();
    huc.setRequestMethod("GET");
    huc.connect();
    // For example 200 for a working page, 404 for a missing one
    return huc.getResponseCode();
}
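A possible variant of this method is sketched below. It sends a HEAD request, so the page body is not downloaded, and sets timeouts so that one slow link cannot stall the whole run. StatusProbe and the timeoutMillis parameter are assumptions made for this example, and some servers answer HEAD differently from GET, so the GET version above remains the safer default.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical variant of getResponseCode: HEAD request with timeouts.
final class StatusProbe {

    // Returns the HTTP status code for urlString, waiting at most timeoutMillis
    // for the connection and again for the response.
    static int getResponseCode(String urlString, int timeoutMillis) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(timeoutMillis);
            huc.setReadTimeout(timeoutMillis);
            return huc.getResponseCode();
        } finally {
            // Release the connection whether or not the request succeeded.
            huc.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass a URL as the first argument; 5000 ms is an arbitrary timeout.
        if (args.length > 0) {
            System.out.println(args[0] + " " + getResponseCode(args[0], 5000));
        }
    }
}
```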

The complete Selenium script is below:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;
import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.SeleneseTestBase;

public class BrokenURL extends SeleneseTestBase {
    public int invalidLink;
    String currentLink;
    String temp;
    public DefaultSelenium selenium;

    @BeforeMethod
    public void setUp() throws Exception {
        // Needs a Selenium RC server listening on localhost:4444
        selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://www.yahoo.com");
        selenium.start();
    }

    @Test
    public void testUntitled() throws Exception {
        // Append to broken_links.txt; one PrintStream is reused for every line
        PrintStream out = new PrintStream(new FileOutputStream("broken_links.txt", true));
        try {
            invalidLink = 0;
            selenium.open("/");
            // Count only anchors with an href so the index stays inside document.links
            int linkCount = selenium.getXpathCount("//a[@href]").intValue();

            out.println("URL : " + selenium.getLocation());
            out.println("--------------------------------------------");
            for (int i = 0; i < linkCount; i++) {
                currentLink = "this.browserbot.getUserWindow().document.links[" + i + "]";
                temp = selenium.getEval(currentLink + ".href");
                // Skip mailto:, javascript: and similar links that are not HTTP URLs
                if (!temp.startsWith("http")) {
                    continue;
                }
                int statusCode = getResponseCode(temp);
                if (statusCode == 404) {
                    out.println(temp + " " + statusCode);
                    invalidLink++;
                }
            }
            out.println("Total broken Links = " + invalidLink);
            out.println(" ");
        } finally {
            out.close();
        }
        // Echo the last link that was read to the console
        System.out.println(currentLink);
        System.out.println(temp);
    }

    public static int getResponseCode(String urlString) throws MalformedURLException, IOException {
        URL u = new URL(urlString);
        HttpURLConnection huc = (HttpURLConnection) u.openConnection();
        huc.setRequestMethod("GET");
        huc.connect();
        return huc.getResponseCode();
    }

    // Without @AfterMethod TestNG never calls tearDown and the browser stays open
    @AfterMethod
    public void tearDown() {
        selenium.close();
        selenium.stop();
    }
}

The above script identifies the broken links (if any) on yahoo.com and stores the 404 URLs in a text file called broken_links.txt. If you want to check N pages for broken links, you can pass the page URLs as parameters through a TestNG DataProvider or read them from an Excel sheet using the JXL package.
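The script records only 404 responses. If other client or server errors should also count as broken, the status check could be widened along these lines. This is a sketch under assumptions: LinkStatus, isBroken and countBroken are hypothetical names, and treating every 4xx and 5xx code (plus -1, which getResponseCode returns when no valid status can be read) as broken is a choice for this example, not part of the script above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decides which response codes count as a broken link.
final class LinkStatus {

    // Assumption: any 4xx or 5xx code is broken, and so is -1 (no valid HTTP status).
    static boolean isBroken(int statusCode) {
        return statusCode == -1 || (statusCode >= 400 && statusCode <= 599);
    }

    // Counts the broken entries in a map of URL to response code.
    static int countBroken(Map<String, Integer> codesByUrl) {
        int broken = 0;
        for (int code : codesByUrl.values()) {
            if (isBroken(code)) {
                broken++;
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Made-up URLs and codes, only to show the call.
        Map<String, Integer> codes = new LinkedHashMap<String, Integer>();
        codes.put("http://example.com/", 200);
        codes.put("http://example.com/missing", 404);
        codes.put("http://example.com/error", 500);
        System.out.println("Total broken Links = " + countBroken(codes));
    }
}
```

In the script, the statusCode==404 test would then become a call to isBroken(statusCode).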
