org.apache.nutch.urlfilter.regex
Class RegexURLFilter
- java.lang.Object
- org.apache.nutch.urlfilter.api.RegexURLFilterBase
- org.apache.nutch.urlfilter.regex.RegexURLFilter
public class RegexURLFilter extends RegexURLFilterBase
Filters URLs based on a file of regular expressions, using the Java regex implementation.
Field Summary
Fields (modifier and type, then field):
- static String URLFILTER_REGEX_FILE
- static String URLFILTER_REGEX_RULES
-
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
Constructor Summary
Constructors:
- RegexURLFilter()
- RegexURLFilter(String filename)
Method Summary
Methods (modifier and type, then method and description):
- protected RegexRule createRule(boolean sign, String regex): Creates a new RegexRule.
- protected Reader getRulesReader(org.apache.hadoop.conf.Configuration conf): Rules specified as a config property will override rules specified as a config file.
- static void main(String[] args)
-
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
-
URLFILTER_REGEX_FILE
public static final String URLFILTER_REGEX_FILE
- See Also:
- [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.regex.RegexURLFilter.URLFILTER_REGEX_FILE)
-
URLFILTER_REGEX_RULES
public static final String URLFILTER_REGEX_RULES
- See Also:
- [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.urlfilter.regex.RegexURLFilter.URLFILTER_REGEX_RULES)
Constructor Detail
-
RegexURLFilter
public RegexURLFilter()
-
RegexURLFilter
public RegexURLFilter(String filename) throws IOException, PatternSyntaxException
- Throws:
- <code>IOException</code>
- <code>PatternSyntaxException</code>
Method Detail
-
getRulesReader
protected Reader getRulesReader(org.apache.hadoop.conf.Configuration conf) throws IOException
Rules specified as a config property will override rules specified as a config file.
- Specified by:
- <code>getRulesReader</code> in class <code>RegexURLFilterBase</code>
- Parameters:
- <code>conf</code> - is the current configuration.
- Returns:
- a <code>Reader</code> over the rules to use.
- Throws:
- <code>IOException</code>
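
The precedence described above (rules given as a config property win over rules given as a config file) can be sketched as follows. This is a minimal illustration, not the Nutch implementation: `java.util.Properties` stands in for the Hadoop `Configuration`, the class `RulesReaderSketch` and its helper `rulesText` are hypothetical, the two property keys are assumed values (this page does not list the constants' values), and reading the file from a plain filesystem path is also an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical stand-in for illustration; not part of Nutch.
class RulesReaderSketch {
  // Assumed property keys; the real constant values are not shown on this page.
  static final String RULES_KEY = "urlfilter.regex.rules";
  static final String FILE_KEY = "urlfilter.regex.file";

  // Returns a reader over inline rules if the rules property is set,
  // otherwise a reader over the configured rules file.
  static Reader rulesReader(Properties conf) throws IOException {
    String inlineRules = conf.getProperty(RULES_KEY);
    if (inlineRules != null) {
      // The property overrides the file: the file is never opened.
      return new StringReader(inlineRules);
    }
    String file = conf.getProperty(FILE_KEY);
    if (file == null) {
      throw new IOException("neither " + RULES_KEY + " nor " + FILE_KEY + " is set");
    }
    return Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8);
  }

  // Convenience helper: reads the selected rules fully into a string.
  static String rulesText(Properties conf) {
    try (Reader reader = rulesReader(conf)) {
      StringBuilder text = new StringBuilder();
      char[] buffer = new char[1024];
      int n;
      while ((n = reader.read(buffer)) != -1) {
        text.append(buffer, 0, n);
      }
      return text.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With both keys set, `rulesText` returns the inline rules and leaves the file untouched, so a missing or unreadable file does not matter in that case.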
-
createRule
protected RegexRule createRule(boolean sign, String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
- <code>createRule</code> in class <code>RegexURLFilterBase</code>
- Parameters:
- <code>sign</code> - the sign of the regular expression. A <code>true</code> value means that any URL matching this rule must be included, whereas a <code>false</code> value means that any URL matching this rule must be excluded.
- <code>regex</code> - the regular expression associated with this rule.
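
To make the <code>sign</code> semantics concrete, here is a small self-contained sketch built on `java.util.regex`. It is not the Nutch `RegexRule` class: the name `SignedRuleSketch`, the `+`/`-` line prefix format, the "first matching rule decides" order, and the rejection of URLs that match no rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rule type for illustration; not the Nutch RegexRule.
class SignedRuleSketch {
  final boolean sign;     // true = include matching URLs, false = exclude them
  final Pattern pattern;  // compiled regular expression for this rule

  SignedRuleSketch(boolean sign, String regex) {
    this.sign = sign;
    // Throws PatternSyntaxException if the expression is invalid.
    this.pattern = Pattern.compile(regex);
  }

  boolean matches(String url) {
    return pattern.matcher(url).find();
  }

  // Parses one rule per line; assumes '+' or '-' as the first character
  // and skips blank lines and '#' comments.
  static List<SignedRuleSketch> parse(String rulesText) {
    List<SignedRuleSketch> rules = new ArrayList<>();
    for (String raw : rulesText.split("\n")) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      char first = line.charAt(0);
      if (first != '+' && first != '-') {
        throw new IllegalArgumentException("rule must start with + or -: " + line);
      }
      rules.add(new SignedRuleSketch(first == '+', line.substring(1)));
    }
    return rules;
  }

  // Returns the URL if the first matching rule includes it, otherwise null.
  static String filter(List<SignedRuleSketch> rules, String url) {
    for (SignedRuleSketch rule : rules) {
      if (rule.matches(url)) {
        return rule.sign ? url : null;
      }
    }
    return null; // assumption: a URL that matches no rule is rejected
  }
}
```

Under these assumptions rule order matters: an exclusion for image suffixes placed before a broad inclusion drops the images even though the later rule would also match them.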
-
main
public static void main(String[] args) throws IOException
- Throws:
- <code>IOException</code>