Wednesday, 29 October 2014

Automating CAPTCHA using Selenium WebDriver

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart".

A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot.

Captchas are generally not breakable, but some third-party captchas can be broken. One example is the "jQuery Real Person" captcha.

It is possible to bypass the captcha of the jQuery Real Person plugin and so perform a brute force attack.

Each captcha image has an associated parameter that is used to check the characters entered by the user. However, there is no proper check that the characters entered are the characters shown in the image.

Therefore we can choose one pair of parameter and characters and reuse it in every request to the web server.

The parameter that identifies the captcha image is held in the "value" attribute of a hidden input field.
   
Example: the captcha image shown is JYYBME. Using "Inspect Element" in Google Chrome or Firebug in Firefox, we search for this line in the page source:
  
<input type="hidden" class="realperson-hash" name="defaultRealHash" value="-1158072107">
  
In this case we already know a valid pair of parameter and characters that we can use to perform a brute force attack bypassing the captcha restriction.

JYYBME ----> -1158072107

We can generate as many valid pairs as we want but only one is necessary to perform the brute force attack.
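
A pair does not have to be read off the page. The two pairs used in this post (JYYBME with -1158072107 and QNXCUL with -897204064) are both reproduced by a djb2-style string hash evaluated in 32-bit signed arithmetic, so the sketch below assumes that this is how the hidden value is derived. That is an observation from these two pairs only, not something confirmed from the plugin's source, and the class and method names are made up for illustration.

```java
// Sketch: compute the hidden hash for a chosen captcha text.
// Assumption: a djb2-style hash (seed 5381, hash * 33 + character code)
// in 32-bit signed arithmetic, which matches the two pairs in this post.
final class RealPersonHash {

    static int hash(String text) {
        int hash = 5381;
        for (int i = 0; i < text.length(); i++) {
            // (hash << 5) + hash equals hash * 33; int overflow wraps around
            hash = ((hash << 5) + hash) + text.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"JYYBME", "QNXCUL"}) {
            System.out.println(text + " ----> " + hash(text));
        }
    }
}
```

Running it prints JYYBME ----> -1158072107 and QNXCUL ----> -897204064, the same pairs that appear in this post.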


It does not matter that the captcha image does not show the characters we type, because the check is done through the value parameter. We just need to submit one valid pair of parameter and characters.

The example below shows how to break the captcha of the jQuery Real Person plugin.

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Sample {

    static WebDriver driver;

    public static void main(String[] args) {
        try {
            driver = new FirefoxDriver();
            // Load the jQuery Real Person captcha demonstration page
            driver.get("http://keith-wood.name/realPerson.html");
            // Crude fixed wait for the page and the plugin to finish loading
            Thread.sleep(2000);
            JavascriptExecutor js = (JavascriptExecutor) driver;
            // Overwrite the hidden hash field with the hash of a known pair
            js.executeScript("document.getElementsByName('defaultRealHash')[0].setAttribute('value', '-897204064')");
            // Type the characters that belong to that hash
            driver.findElement(By.name("defaultReal")).sendKeys("QNXCUL");
            // Submit the form
            driver.findElement(By.xpath(".//*[@id='default']/form/p[2]/input")).submit();
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }

}

Below are some workarounds for handling captchas in testing scenarios:
  • A captcha is built to prevent automation, but if it is blocking your testing in the QA environment there is a way around it. The application generates a captcha code and displays it as an image, and this generated code may be stored somewhere in the database. Ask your developers where the captcha code is stored, read the code from there and enter it on the front end.

  • You can ask your development team to set a default password/captcha that you can use in automation to check that the flow works. Be aware that this does not test whether the captcha itself works; it only checks that the flow or scenario that includes the captcha, before and after it, works correctly.
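
The second workaround could look roughly like this on the application side. It is only a sketch of the idea of a fixed captcha for the QA environment: the class CaptchaCheck, its constructor argument and the code "TEST42" are hypothetical and do not come from any real captcha library.

```java
// Sketch of a test-only captcha override (all names are hypothetical).
final class CaptchaCheck {

    // Fixed code accepted in the QA environment; null disables the override
    private final String fixedTestCode;

    CaptchaCheck(String fixedTestCode) {
        this.fixedTestCode = fixedTestCode;
    }

    // expected: the code rendered in the image; typed: what the user entered
    boolean isValid(String expected, String typed) {
        if (typed == null) {
            return false;
        }
        if (fixedTestCode != null && fixedTestCode.equals(typed)) {
            return true; // QA override, must stay disabled in production
        }
        return typed.equals(expected);
    }

    public static void main(String[] args) {
        CaptchaCheck qa = new CaptchaCheck("TEST42");
        CaptchaCheck production = new CaptchaCheck(null);
        System.out.println(qa.isValid("JYYBME", "TEST42"));         // true
        System.out.println(production.isValid("JYYBME", "TEST42")); // false
    }
}
```

With something like this in place, the Selenium test simply types the agreed fixed code into the captcha field, and the rest of the flow can be automated as usual.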


Tuesday, 28 October 2014

Extracting text from PDF files using Selenium + PDFBox

In many production environments, PDF files need to be checked before they go to print or are sent to a customer, in order to avoid legal issues and costly reprints. These PDF files cannot be read by Selenium itself, so here we use PDFBox, a third-party jar that reads data from PDF files. The example below shows how to read a PDF file that is opened in the browser. To run it, add the jar file below to the classpath in Eclipse, along with Selenium WebDriver:
pdfbox-app-1.8.3.jar


import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.concurrent.TimeUnit;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.util.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Sample {

    static WebDriver driver;

    public static void main(String[] args) throws IOException {
        try {
            // A proxy has to be set when working behind a firewall;
            // replace the placeholder host and port with real values
            System.setProperty("http.proxyHost", "proxyname.com");
            System.setProperty("http.proxyPort", "portnumber");
            System.setProperty("https.proxyHost", "proxyname.com");
            System.setProperty("https.proxyPort", "portnumber");
            driver = new FirefoxDriver();
            // The address opened here must point to a PDF document;
            // replace it with the URL of the PDF you want to read
            driver.get("http://keith-wood.name/realPerson.html");
            driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
            URL url = new URL(driver.getCurrentUrl());
            BufferedInputStream fileToParse = new BufferedInputStream(url.openStream());

            // parse() -- parses the stream and populates the COSDocument object
            // COSDocument -- the in-memory representation of the PDF document
            PDFParser parser = new PDFParser(fileToParse);
            parser.parse();

            // getPDDocument() -- returns the parsed PD document; call close() on it when done to release resources
            // PDFTextStripper -- extracts all of the text from a PDF document and ignores the formatting
            String output = new PDFTextStripper().getText(parser.getPDDocument());
            System.out.println(output);
            parser.getPDDocument().close();
            fileToParse.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }

}
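
Once the text has been extracted, the actual check before print is plain string comparison. The sketch below is one possible way to do it, assuming the check is "these phrases must appear somewhere in the document". The class PdfTextCheck, its methods and the sample text are made up for illustration and are not part of PDFBox or Selenium.

```java
// Sketch: check extracted PDF text for required phrases (names are hypothetical).
import java.util.ArrayList;
import java.util.List;

final class PdfTextCheck {

    // Collapse runs of whitespace so line breaks in the PDF do not break matches
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the phrases that do NOT occur in the extracted text
    static List<String> missingPhrases(String extractedText, String... phrases) {
        String haystack = normalize(extractedText);
        List<String> missing = new ArrayList<String>();
        for (String phrase : phrases) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for the "output" string produced by PDFTextStripper
        String output = "Invoice No: 1001\nTotal amount\n   due: 250.00";
        System.out.println(missingPhrases(output, "Total amount due", "Terms and conditions"));
    }
}
```

For the sample text this prints [Terms and conditions], because the first phrase is found even though it is split across lines in the extracted text. In a real test the string passed in would be the output of PDFTextStripper.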