

org.apache.nutch.urlfilter.regex

Class RegexURLFilter


public class RegexURLFilter
extends RegexURLFilterBase

Filters URLs based on a file of regular expressions, using Java's built-in regular expression implementation (<code>java.util.regex</code>).
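The description above leaves the rule semantics implicit. The following is a minimal, JDK-only sketch of a regex URL filter, built on assumptions modeled on Nutch's usual rules-file layout rather than taken from this page. Each rule line starts with <code>+</code> (include) or <code>-</code> (exclude) followed by a Java regex, and <code>#</code> lines are comments. The first matching rule decides, a URL that matches no rule is rejected, and rejection is signalled by returning <code>null</code>. The class <code>SketchRegexFilter</code> is hypothetical and is not the actual implementation of <code>RegexURLFilter</code>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a '+'/'-' regex URL filter; not Nutch's source code.
class SketchRegexFilter {
    // One parsed rule: whether matching URLs are included, plus the compiled regex.
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    SketchRegexFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and '#' comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with '+' or '-': " + line);
            }
            // Pattern.compile throws PatternSyntaxException for a malformed regex.
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // Returns the URL if accepted, or null if rejected; the first matching rule decides.
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null;
            }
        }
        // Assumption: a URL that matches no rule is rejected.
        return null;
    }
}
```

Rule order matters under first-match semantics. With the rules <code>-\.(gif|jpg)$</code> followed by <code>+^https?://example\.org/</code>, the URL <code>https://example.org/logo.gif</code> is rejected even though it is on the included host, because the exclusion is checked first.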

Field Summary

| Modifier and Type | Field and Description |
|---|---|
| <code>static String</code> | <code>URLFILTER_REGEX_FILE</code> |
| <code>static String</code> | <code>URLFILTER_REGEX_RULES</code> |


Fields inherited from interface org.apache.nutch.net.URLFilter

X_POINT_ID

Constructor Summary

| Constructor and Description |
|---|
| <code>RegexURLFilter()</code> |
| <code>RegexURLFilter(String filename)</code> |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| <code>protected RegexRule</code> | <code>createRule(boolean sign, String regex)</code><br>Creates a new <code>RegexRule</code>. |
| <code>protected Reader</code> | <code>getRulesReader(org.apache.hadoop.conf.Configuration conf)</code><br>Rules specified inline in a configuration property override rules specified in a rules file. |
| <code>static void</code> | <code>main(String[] args)</code> |


Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase

filter, getConf, main, setConf


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


URLFILTER_REGEX_FILE

public static final String URLFILTER_REGEX_FILE
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.regex.RegexURLFilter.URLFILTER_REGEX_FILE)       

URLFILTER_REGEX_RULES

public static final String URLFILTER_REGEX_RULES
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.regex.RegexURLFilter.URLFILTER_REGEX_RULES)       

Constructor Detail


RegexURLFilter

public RegexURLFilter()

RegexURLFilter

public RegexURLFilter(String filename)
               throws IOException,
                      PatternSyntaxException
  - Throws: 
  - <code>IOException</code> 
  - <code>PatternSyntaxException</code>       
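The two declared exceptions correspond to two distinct failure points when rules are loaded from a named file: reading the file and compiling each regex. The sketch below illustrates where each can originate, assuming a local file of <code>+</code>/<code>-</code> prefixed rules. The class <code>SketchRuleFileLoader</code> is hypothetical, and the real constructor may locate and parse the file differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical loader; illustrates where each declared exception can originate.
final class SketchRuleFileLoader {
    private SketchRuleFileLoader() {}

    // Reads '+'/'-' prefixed rules from a file and compiles the regex of each one.
    static List<Pattern> load(Path file) throws IOException, PatternSyntaxException {
        List<Pattern> patterns = new ArrayList<>();
        // IOException: the file is missing or unreadable.
        for (String raw : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // PatternSyntaxException: the regex after the sign character is malformed.
            patterns.add(Pattern.compile(line.substring(1)));
        }
        return patterns;
    }
}
```

Since <code>PatternSyntaxException</code> is unchecked, declaring it mainly documents that a single bad rule line can make construction fail.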

Method Detail


getRulesReader

protected Reader getRulesReader(org.apache.hadoop.conf.Configuration conf)
                         throws IOException

Rules specified inline in a configuration property override rules specified in a rules file.

  - Specified by: 
  - <code>getRulesReader</code> in class <code>RegexURLFilterBase</code> 
  - Parameters:
  - <code>conf</code> - the current configuration.
  - Returns:
  - a <code>Reader</code> over the rules to use.
  - Throws: 
  - <code>IOException</code>       
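The precedence described above can be sketched with the JDK alone, using a <code>Map</code> as a stand-in for <code>org.apache.hadoop.conf.Configuration</code>. The key strings <code>urlfilter.regex.rules</code> and <code>urlfilter.regex.file</code> are assumptions (the actual values of <code>URLFILTER_REGEX_RULES</code> and <code>URLFILTER_REGEX_FILE</code> are listed on the Constant Field Values page), and this sketch reads the file from the local file system, whereas the real method may resolve the name differently, for example as a classpath resource.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical sketch of "inline rules override the rules file"; not Nutch's source code.
final class SketchRulesSource {
    // Assumed key strings; the real values are on the Constant Field Values page.
    static final String RULES_KEY = "urlfilter.regex.rules";
    static final String FILE_KEY = "urlfilter.regex.file";

    private SketchRulesSource() {}

    // A Map stands in for org.apache.hadoop.conf.Configuration.
    static Reader rulesReader(Map<String, String> conf) throws IOException {
        String inline = conf.get(RULES_KEY);
        if (inline != null) {
            // Inline rules win whenever the property is set.
            return new StringReader(inline);
        }
        String file = conf.get(FILE_KEY);
        if (file == null) {
            throw new IOException("No regex URL filter rules configured");
        }
        // Fall back to the rules file, read here from the local file system.
        return Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8);
    }
}
```

Returning a <code>Reader</code> rather than a file name lets both sources feed the same rule parser.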

createRule

protected RegexRule createRule(boolean sign,
                   String regex)

Description copied from class: RegexURLFilterBase

Creates a new RegexRule.

  - Specified by: 
  - <code>createRule</code> in class <code>RegexURLFilterBase</code> 
  - Parameters:
  - <code>sign</code> - the sign of the rule. A <code>true</code> value means that any URL matching this rule must be included, whereas a <code>false</code> value means that any URL matching this rule must be excluded.
  - <code>regex</code> - the regular expression associated with this rule.
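The two parameters of <code>createRule</code> map naturally onto a small rule object: a boolean sign and a compiled pattern. The sketch below is a hypothetical stand-in (<code>SketchRule</code> is not the real <code>RegexRule</code> type). It assumes matching uses <code>Matcher.find()</code>, so an unanchored regex matches anywhere in the URL; whether the actual rule uses <code>find()</code> or <code>matches()</code> is not stated on this page.

```java
import java.util.regex.Pattern;

// Hypothetical rule type mirroring the createRule(sign, regex) parameters.
final class SketchRule {
    private final boolean sign;   // true = include matching URLs, false = exclude them
    private final Pattern pattern;

    private SketchRule(boolean sign, String regex) {
        this.sign = sign;
        this.pattern = Pattern.compile(regex);
    }

    // Whether a URL matched by this rule is accepted.
    boolean accept() {
        return sign;
    }

    // find() matches anywhere in the URL unless the regex is anchored with ^ or $.
    boolean match(String url) {
        return pattern.matcher(url).find();
    }

    // Stand-in for a createRule factory method.
    static SketchRule createRule(boolean sign, String regex) {
        return new SketchRule(sign, regex);
    }
}
```

For example, <code>createRule(false, "\\?")</code> builds an exclusion rule that matches any URL containing a query string.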

main

public static void main(String[] args)
                 throws IOException
  - Throws: 
  - <code>IOException</code>      
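This page documents only the signature of <code>main</code>. When trying out a rule set from the command line, a harness of the following shape can be useful: it reads one URL per line and prints it prefixed with <code>+</code> if accepted or <code>-</code> if rejected. This is a hypothetical sketch (<code>SketchFilterHarness</code> and its output format are assumptions), not a description of what <code>RegexURLFilter.main</code> does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.function.UnaryOperator;

// Hypothetical harness; not the documented behavior of RegexURLFilter.main.
final class SketchFilterHarness {
    private SketchFilterHarness() {}

    // For each input URL, prints "+url" if the filter accepts it, otherwise "-url".
    static void run(UnaryOperator<String> filter, Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            String result = filter.apply(line);
            // A null result means the URL was rejected.
            writer.println(result != null ? "+" + result : "-" + line);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Example filter for illustration: accept only https URLs.
        UnaryOperator<String> httpsOnly = url -> url.startsWith("https://") ? url : null;
        run(httpsOnly, new InputStreamReader(System.in), new OutputStreamWriter(System.out));
    }
}
```

Taking the filter as a <code>UnaryOperator</code> keeps the harness independent of any particular rule source.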


Copyright © 2014 The Apache Software Foundation