org.apache.nutch.parse.js
Class JSParseFilter
- java.lang.Object
 - org.apache.nutch.parse.js.JSParseFilter
 
- All Implemented Interfaces:
 - org.apache.hadoop.conf.Configurable, HtmlParseFilter, Parser, Pluggable
 
public class JSParseFilter extends Object implements HtmlParseFilter, Parser
This class is a heuristic link extractor for JavaScript files and code snippets. The general idea of two-pass regex matching comes from Heritrix. Parts of the code come from OutlinkExtractor.java.
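The two-pass idea can be illustrated with a minimal, self-contained sketch. This is not the JSParseFilter source: the class name `TwoPassLinkSketch`, both patterns, and the list of file extensions are assumptions chosen for illustration. The first pass collects quoted string literals from the script; the second pass keeps only the literals that look like URLs and resolves them against a base URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Nutch.
class TwoPassLinkSketch {
    // Pass 1 (assumed pattern): single- or double-quoted literals on one line.
    // Escaped quotes inside a literal are deliberately not handled.
    private static final Pattern STRING_LITERAL =
        Pattern.compile("\"([^\"\\r\\n]*)\"|'([^'\\r\\n]*)'");

    // Pass 2 (assumed pattern): absolute http(s) URLs, or paths ending in a
    // few illustrative extensions.
    private static final Pattern URL_LIKE = Pattern.compile(
        "^(?:https?://[^\\s\"']+|[\\w./-]+\\.(?:html?|js|php|jsp))$",
        Pattern.CASE_INSENSITIVE);

    static List<String> extract(String script, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(script);
        while (m.find()) {
            // Exactly one of the two groups is set, depending on the quote style.
            String literal = m.group(1) != null ? m.group(1) : m.group(2);
            if (URL_LIKE.matcher(literal).matches()) {
                try {
                    links.add(base.resolve(literal).toString());
                } catch (IllegalArgumentException e) {
                    // Literal looked like a URL but is not a valid URI; skip it.
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String js = "var a = 'http://example.com/x'; "
            + "window.open(\"docs/page.html\"); var s = 'hello';";
        // prints [http://example.com/x, http://example.org/app/docs/page.html]
        System.out.println(extract(js, "http://example.org/app/index.html"));
    }
}
```

Because the approach is heuristic, it can miss URLs that are assembled at run time by string concatenation, and it can report literals that merely resemble paths.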
Field Summary
 Fields   Modifier and Type   Field and Description
 - static org.slf4j.Logger LOG
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
 X_POINT_ID   
-    
Fields inherited from interface org.apache.nutch.parse.Parser
 X_POINT_ID      
Constructor Summary
 Constructors   Constructor and Description
 - JSParseFilter()
Method Summary
 Methods   Modifier and Type   Method and Description
 - ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
   Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 - org.apache.hadoop.conf.Configuration getConf()
 - ParseResult getParse(Content c)
   This method parses the given content and returns a map of <key, parse> pairs.
 - static void main(String[] args)
 - void setConf(org.apache.hadoop.conf.Configuration conf)
-    
Methods inherited from class java.lang.Object
 clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait     
Field Detail
-  
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
-  
JSParseFilter
public JSParseFilter()
Method Detail
-  
filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Description copied from interface: HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
  - Specified by: 
  - <code>filter</code> in interface <code>HtmlParseFilter</code>        
-  
getParse
public ParseResult getParse(Content c)
Description copied from interface: Parser
 This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.
Note: Meta-redirects should be followed only when they come from the original URL. That is:
 Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta-redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
  - Specified by: 
  - <code>getParse</code> in interface <code>Parser</code> 
  - Parameters:
  - <code>c</code> - Content to be parsed 
  - Returns:
  - a map containing <key, parse> pairs
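The meta-redirect rule in the note above can be sketched as a lookup keyed by the URL being fetched. The sketch deliberately uses stand-in types rather than the Nutch API: `RedirectRuleSketch`, its `Status` enum, and `shouldFollowRedirect` are hypothetical names, and a plain `Map<String, Status>` stands in for the <key, parse> map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; the types below are stand-ins, not Nutch classes.
class RedirectRuleSketch {
    // Stand-in for the status carried by each parse in the map.
    enum Status { SUCCESS, REDIRECT }

    // Follow a redirect only if the entry stored under the fetched URL itself
    // reports one; redirects recorded under any other key are ignored.
    static boolean shouldFollowRedirect(Map<String, Status> parses, String fetchedUrl) {
        return parses.get(fetchedUrl) == Status.REDIRECT;
    }

    public static void main(String[] args) {
        Map<String, Status> parses = new HashMap<>();
        parses.put("foo.bar.com/redirect.html", Status.REDIRECT);
        parses.put("foo.bar.com/embedded.html", Status.SUCCESS);

        // Entry for the original URL indicates a redirect: follow it.
        System.out.println(shouldFollowRedirect(parses, "foo.bar.com/redirect.html"));
        // Entry for this URL is an ordinary parse: nothing to follow.
        System.out.println(shouldFollowRedirect(parses, "foo.bar.com/embedded.html"));
    }
}
```

A missing key is treated the same as a non-redirect entry, so a redirect recorded only under some other URL is never followed.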
-  
main
public static void main(String[] args) throws Exception
  - Throws: 
  - <code>Exception</code>       
-  
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: 
  - <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>        
-  
getConf
public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: 
  - <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>       
