org.xenbase.scraper
Class Scraper_DevDyn

java.lang.Object
  extended by org.xenbase.scraper.BasicScraper
      extended by org.xenbase.scraper.Scraper_DevDyn

public class Scraper_DevDyn
extends BasicScraper


Constructor Summary
Scraper_DevDyn()
           
 
Method Summary
 java.lang.String getRedirURL(java.lang.String url)
          Follows the series of HTTP 301 redirects that starts at a PubMed URL, then searches the resulting publisher page for the URL of the full article.
 ScrapedData scrape(java.lang.String url)
          Takes the URL produced by getRedirURL and returns the images and captions of that article.
 
Methods inherited from class org.xenbase.scraper.BasicScraper
getData
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Scraper_DevDyn

public Scraper_DevDyn()
Method Detail

getRedirURL

public java.lang.String getRedirURL(java.lang.String url)
                             throws java.lang.Exception,
                                    java.lang.Error
Description copied from class: BasicScraper
The input URLs come from PubMed, so this method follows a series of HTTP 301 redirects, then searches the resulting page for the URL of the full article. Because each journal publisher's website is different, each publisher needs its own implementation of this method.

Specified by:
getRedirURL in class BasicScraper
Parameters:
url - URL to full article from PubMed
Returns:
String - Containing actual URL of full journal article
Throws:
java.lang.Exception
java.lang.Error
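
The redirect-then-search idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual Scraper_DevDyn code: the class name RedirectSketch, its method names, and the "Full Text" link pattern are all hypothetical, and a real publisher page will likely need a different pattern.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedirectSketch {

    // Assumption: the landing page links to the article with an anchor whose
    // visible text contains "Full Text". Real publisher markup will differ.
    private static final Pattern FULL_TEXT_LINK = Pattern.compile(
            "<a\\s[^>]*href=\"([^\"]+)\"[^>]*>[^<]*Full Text[^<]*</a>",
            Pattern.CASE_INSENSITIVE);

    /** Follows up to maxHops 3xx redirects by hand and returns the final URL. */
    static String followRedirects(String url, int maxHops) throws IOException {
        String current = url;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each hop ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return current; // not a redirect: this is the landing page
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects starting from " + url);
    }

    /** Resolves a possibly relative link against the page it came from. */
    static String resolve(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    /** Returns the absolute full-text URL found in html, or null if none matches. */
    static String findFullTextLink(String html, String pageUrl) {
        Matcher m = FULL_TEXT_LINK.matcher(html);
        return m.find() ? resolve(pageUrl, m.group(1)) : null;
    }
}
```

A getRedirURL implementation along these lines would call followRedirects on the PubMed URL, download the landing page, and pass its HTML to findFullTextLink. Only the link pattern would need to change from one publisher to the next.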

scrape

public ScrapedData scrape(java.lang.String url)
                   throws java.lang.Exception,
                          java.lang.Error
Description copied from class: BasicScraper
Takes the URL produced by getRedirURL and returns the images and captions of that article. This is the core of the scraper. Because each journal's pages are laid out differently, each journal needs its own string parsing.

Specified by:
scrape in class BasicScraper
Returns:
ScrapedData - The Object containing all the images and captions
Throws:
java.lang.Exception
java.lang.Error
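
The per-journal string parsing that scrape performs might look roughly like the sketch below. It rests on assumptions: the figure markup shown in the comment is invented for illustration, and Figure is a hypothetical stand-in for one image/caption pair, since the fields of the real ScrapedData class are not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FigureScrapeSketch {

    /** Hypothetical image/caption pair; not the real ScrapedData class. */
    static final class Figure {
        final String imageUrl;
        final String caption;

        Figure(String imageUrl, String caption) {
            this.imageUrl = imageUrl;
            this.caption = caption;
        }
    }

    // Assumed markup (invented for this example):
    // <div class="figure"><img src="..."><p class="caption">...</p></div>
    private static final Pattern FIGURE = Pattern.compile(
            "<div class=\"figure\">\\s*<img[^>]*\\ssrc=\"([^\"]+)\"[^>]*>\\s*"
            + "<p class=\"caption\">(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Extracts every image URL and its caption text from the article HTML. */
    static List<Figure> extractFigures(String html) {
        List<Figure> figures = new ArrayList<>();
        Matcher m = FIGURE.matcher(html);
        while (m.find()) {
            // Drop inline tags, then collapse runs of whitespace in the caption.
            String caption = m.group(2)
                    .replaceAll("<[^>]+>", "")
                    .replaceAll("\\s+", " ")
                    .trim();
            figures.add(new Figure(m.group(1), caption));
        }
        return figures;
    }
}
```

Regular expressions keep the sketch short, but they break easily when a publisher changes its page layout. An HTML parser would be a sturdier choice for the same job.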