org.apache.nutch.urlfilter.automaton

Class AutomatonURLFilter


public class AutomatonURLFilter
extends RegexURLFilterBase

RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.

Field Summary

| Modifier and Type | Field and Description |
|---|---|
| <code>static String</code> | <code>URLFILTER_AUTOMATON_FILE</code> |
| <code>static String</code> | <code>URLFILTER_AUTOMATON_RULES</code> |

Fields inherited from interface org.apache.nutch.net.URLFilter

X_POINT_ID

Constructor Summary

| Constructor and Description |
|---|
| <code>AutomatonURLFilter()</code> |
| <code>AutomatonURLFilter(String filename)</code> |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| <code>protected RegexRule</code> | <code>createRule(boolean sign, String regex)</code><br>Creates a new RegexRule. |
| <code>protected Reader</code> | <code>getRulesReader(org.apache.hadoop.conf.Configuration conf)</code><br>Rules specified as a config property will override rules specified as a config file. |
| <code>static void</code> | <code>main(String[] args)</code> |

Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase

filter, getConf, main, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

URLFILTER_AUTOMATON_FILE

public static final String URLFILTER_AUTOMATON_FILE
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.automaton.AutomatonURLFilter.URLFILTER_AUTOMATON_FILE)       

URLFILTER_AUTOMATON_RULES

public static final String URLFILTER_AUTOMATON_RULES
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.automaton.AutomatonURLFilter.URLFILTER_AUTOMATON_RULES)       

Constructor Detail

AutomatonURLFilter

public AutomatonURLFilter()

AutomatonURLFilter

public AutomatonURLFilter(String filename)
                   throws IOException,
                          PatternSyntaxException
  - Throws: 
  - <code>IOException</code> 
  - <code>PatternSyntaxException</code>       

Method Detail

getRulesReader

protected Reader getRulesReader(org.apache.hadoop.conf.Configuration conf)
                         throws IOException

Rules specified as a config property will override rules specified as a config file.

  - Specified by: 
  - <code>getRulesReader</code> in class <code>RegexURLFilterBase</code> 
  - Parameters:
  - <code>conf</code> - is the current configuration. 
  - Returns:
  - a <code>Reader</code> over the resource containing the rules to use. 
  - Throws: 
  - <code>IOException</code>       
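
The precedence described above (a rules property wins over a rules file) can be sketched as follows. This is only an illustration under stated assumptions, not the Nutch implementation: `java.util.Properties` stands in for the Hadoop `Configuration`, the two key strings are assumed values for `URLFILTER_AUTOMATON_RULES` and `URLFILTER_AUTOMATON_FILE` (the real values are listed under Constant Field Values), the file is read from the file system rather than resolved as a configuration resource, and `RulesReaderSketch`, `readAll` and `resolve` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

class RulesReaderSketch {
    // Assumed key names; check Constant Field Values for the real ones.
    static final String RULES_KEY = "urlfilter.automaton.rules";
    static final String FILE_KEY = "urlfilter.automaton.file";

    // Inline rules from the property take precedence over the rules file.
    static Reader getRulesReader(Properties conf) throws IOException {
        String inlineRules = conf.getProperty(RULES_KEY);
        if (inlineRules != null) {
            return new StringReader(inlineRules);
        }
        String fileName = conf.getProperty(FILE_KEY);
        if (fileName == null) {
            throw new IOException("neither inline rules nor a rules file is configured");
        }
        // Stand-in for loading the file as a configuration resource.
        return Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8);
    }

    // Drains a reader into a string and closes it.
    static String readAll(Reader reader) throws IOException {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    // Convenience wrapper that returns the rule text the filter would see.
    static String resolve(Properties conf) {
        try {
            return readAll(getRulesReader(conf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(FILE_KEY, "rules-file-that-is-never-opened.txt");
        conf.setProperty(RULES_KEY, "+^https?://example\\.com/.*\n-.*");
        // The file is never touched because the inline property is set.
        System.out.println(resolve(conf));
    }
}
```

In this sketch the file named by the file key is opened only when the rules property is absent, so a stale or missing rules file does not matter once inline rules are configured.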

createRule

protected RegexRule createRule(boolean sign,
                   String regex)

Description copied from class: RegexURLFilterBase

Creates a new RegexRule.

  - Specified by: 
  - <code>createRule</code> in class <code>RegexURLFilterBase</code> 
  - Parameters:
  - <code>sign</code> - of the regular expression. A <code>true</code> value means that any URL matching this rule must be included, whereas a <code>false</code> value means that any URL matching this rule must be excluded.
  - <code>regex</code> - is the regular expression associated with this rule.       
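
To make the meaning of <code>sign</code> concrete, here is a minimal sketch of signed rules applied to URLs. It rests on several assumptions: `java.util.regex.Pattern` is swapped in for dk.brics.automaton so the example has no external dependencies, the whole URL must match the expression, and the first matching rule decides the outcome, with URLs that match no rule being rejected. `SignedRuleSketch`, `Rule` and this `filter` helper are hypothetical names, not the Nutch classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SignedRuleSketch {
    // A rule pairs a sign (true = include, false = exclude) with a pattern.
    static final class Rule {
        final boolean sign;
        final Pattern pattern;

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        // Whole-string match (an assumption about the matching semantics).
        boolean match(String url) {
            return pattern.matcher(url).matches();
        }
    }

    static Rule createRule(boolean sign, String regex) {
        return new Rule(sign, regex);
    }

    // Assumption: the first matching rule wins; no match means rejection.
    // Returns the URL when it is accepted and null when it is rejected.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.match(url)) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(createRule(false, ".*\\.(gif|jpg|png)"));       // exclude images
        rules.add(createRule(true, "https?://example\\.com/.*")); // include one host

        System.out.println(filter(rules, "http://example.com/index.html"));
        System.out.println(filter(rules, "http://example.com/logo.png"));
        System.out.println(filter(rules, "http://other.org/"));
    }
}
```

Under these assumptions only the first URL is accepted: the image URL hits the exclusion rule before the inclusion rule is consulted, and the third URL matches nothing.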

main

public static void main(String[] args)
                 throws IOException
  - Throws: 
  - <code>IOException</code>      

Copyright © 2014 The Apache Software Foundation