org.apache.nutch.urlfilter.regex
Class RegexURLFilter
- java.lang.Object
  - org.apache.nutch.urlfilter.api.RegexURLFilterBase
    - org.apache.nutch.urlfilter.regex.RegexURLFilter
 
public class RegexURLFilter extends RegexURLFilterBase
Filters URLs based on a file of regular expressions using the Java Regex implementation.
Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static String | URLFILTER_REGEX_FILE |
| static String | URLFILTER_REGEX_RULES |

Fields inherited from interface org.apache.nutch.net.URLFilter
 X_POINT_ID      
Constructor Summary

| Constructor and Description |
| --- |
| RegexURLFilter() |
| RegexURLFilter(String filename) |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| protected RegexRule | createRule(boolean sign, String regex): Creates a new RegexRule. |
| protected Reader | getRulesReader(org.apache.hadoop.conf.Configuration conf): Rules specified as a config property will override rules specified as a config file. |
| static void | main(String[] args) |

Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
 filter, getConf, main, setConf   
Methods inherited from class java.lang.Object
 clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait     
Field Detail
-  
URLFILTER_REGEX_FILE
public static final String URLFILTER_REGEX_FILE
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.regex.RegexURLFilter.URLFILTER_REGEX_FILE)       
-  
URLFILTER_REGEX_RULES
public static final String URLFILTER_REGEX_RULES
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.regex.RegexURLFilter.URLFILTER_REGEX_RULES)       
Constructor Detail
-  
RegexURLFilter
public RegexURLFilter()
-  
RegexURLFilter
public RegexURLFilter(String filename) throws IOException, PatternSyntaxException
  - Throws: 
  - <code>IOException</code> 
  - <code>PatternSyntaxException</code>       
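
Since the class relies on the Java Regex implementation, a malformed expression would surface as a `PatternSyntaxException` when it is compiled. The following is a minimal sketch of that behaviour using only `java.util.regex`; the class `RuleSyntaxCheck` and the method `describeProblem` are hypothetical names introduced for illustration and are not part of Nutch.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of Nutch: checks one regular expression
// the same way java.util.regex would when a rule is compiled.
class RuleSyntaxCheck {

    // Returns null when the regex compiles, otherwise a short description
    // of the syntax problem reported by the regex engine.
    static String describeProblem(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }
}
```

Checking each expression this way before handing a rules file to the filter makes it easier to see which line is at fault.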
Method Detail
-  
getRulesReader
protected Reader getRulesReader(org.apache.hadoop.conf.Configuration conf) throws IOException
Rules specified as a config property will override rules specified as a config file.
  - Specified by: 
  - <code>getRulesReader</code> in class <code>RegexURLFilterBase</code> 
  - Parameters:
  - <code>conf</code> - is the current configuration. 
  - Returns:
  - a <code>Reader</code> over the rules to use. 
  - Throws: 
  - <code>IOException</code>       
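
The precedence described above (inline rules win over a rules file) can be sketched as follows. This is an illustration under stated assumptions, not the Nutch implementation: a plain `Map` stands in for the Hadoop `Configuration`, the two key strings are assumed values (the real ones are listed under Constant Field Values), and I/O failures are wrapped in `UncheckedIOException` only to keep the sketch short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical sketch, not part of Nutch.
class RulesReaderSketch {

    // Assumed key names, used here for illustration only.
    static final String RULES_KEY = "urlfilter.regex.rules";
    static final String FILE_KEY = "urlfilter.regex.file";

    // Returns a Reader over the rules text. Inline rules given as a
    // property take precedence; the file is consulted only as a fallback.
    static Reader rulesReader(Map<String, String> conf) {
        String inlineRules = conf.get(RULES_KEY);
        if (inlineRules != null) {
            return new StringReader(inlineRules);
        }
        String fileName = conf.get(FILE_KEY);
        if (fileName == null) {
            throw new IllegalStateException("no rules property and no rules file configured");
        }
        try {
            return Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With both keys set, the file is never opened, so a stale or missing rules file does not matter once inline rules are supplied.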
-  
createRule
protected RegexRule createRule(boolean sign, String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
  - Specified by: 
  - <code>createRule</code> in class <code>RegexURLFilterBase</code> 
  - Parameters:
  - <code>sign</code> - of the regular expression. A <code>true</code> value means that any URL matching this rule must be included, whereas a <code>false</code> value means that any URL matching this rule must be excluded.
  - <code>regex</code> - is the regular expression associated with this rule.       
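
To make the meaning of <code>sign</code> concrete, here is a minimal, self-contained sketch of signed regex rules. It is not the Nutch code. It assumes a rules format in which each line starts with `+` (include) or `-` (exclude) followed by the expression, that the first matching rule decides, that a match anywhere in the URL counts, and that a URL matching no rule is rejected. The names `SignedRuleSketch`, `Rule`, `parse`, and `filter` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Nutch.
class SignedRuleSketch {

    // One rule: sign == true includes matching URLs, false excludes them.
    static final class Rule {
        final boolean sign;
        final Pattern pattern;

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        boolean matches(String url) {
            // find() accepts a match anywhere in the URL (assumption).
            return pattern.matcher(url).find();
        }
    }

    // Parses rules text; blank lines and lines starting with '#' are skipped.
    static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char first = line.charAt(0);
            if (first != '+' && first != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(first == '+', line.substring(1)));
        }
        return rules;
    }

    // Returns the URL if the first matching rule includes it, otherwise null.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null;
            }
        }
        return null; // no rule matched (assumed to mean rejection)
    }
}
```

Because the first match decides in this sketch, exclusion rules for specific suffixes are placed before a broad inclusion rule.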
-  
main
public static void main(String[] args) throws IOException
  - Throws: 
  - <code>IOException</code>      
   
