org.apache.nutch.crawl
Class LinkDbFilter
- java.lang.Object
- org.apache.nutch.crawl.LinkDbFilter
- All Implemented Interfaces:
- Closeable, AutoCloseable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper
public class LinkDbFilter extends Object implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,Inlinks,org.apache.hadoop.io.Text,Inlinks>
This class separates the URL normalization and filtering steps from the rest of the LinkDb manipulation code.
- Author:
- Andrzej Bialecki
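The following is a minimal, dependency-free sketch of the kind of step the class description refers to: normalize and filter a target URL and the source URL of each inlink, and emit nothing if the target is rejected or no inlink survives. It is not the Nutch implementation. `LinkFilterSketch`, `normalize`, `accept`, `clean` and `filterEntry` are hypothetical names, plain strings stand in for `Text` and `Inlinks`, and the fragment-stripping normalizer and http(s)-only filter are assumptions chosen only for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LinkFilterSketch {

    // Hypothetical normalizer: trims whitespace and drops any #fragment.
    public static String normalize(String url) {
        String u = url.trim();
        int hash = u.indexOf('#');
        return hash >= 0 ? u.substring(0, hash) : u;
    }

    // Hypothetical filter: keeps only http and https URLs.
    public static boolean accept(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    // Applies the enabled steps to one URL; an empty result means "rejected".
    public static Optional<String> clean(String url, boolean normalizing, boolean filtering) {
        String u = normalizing ? normalize(url) : url;
        if (filtering && !accept(u)) {
            return Optional.empty();
        }
        return Optional.of(u);
    }

    // Map-style step: cleans the target URL and each inlink source, and
    // emits nothing when the target is rejected or no inlink survives.
    public static Optional<Map.Entry<String, List<String>>> filterEntry(
            String target, List<String> inlinks, boolean normalizing, boolean filtering) {
        Optional<String> cleanTarget = clean(target, normalizing, filtering);
        if (!cleanTarget.isPresent()) {
            return Optional.empty();
        }
        List<String> kept = new ArrayList<>();
        for (String from : inlinks) {
            clean(from, normalizing, filtering).ifPresent(kept::add);
        }
        if (kept.isEmpty()) {
            return Optional.empty();
        }
        Map.Entry<String, List<String>> entry =
                new AbstractMap.SimpleEntry<>(cleanTarget.get(), kept);
        return Optional.of(entry);
    }

    public static void main(String[] args) {
        List<String> inlinks =
                Arrays.asList("http://a.example/page#top", "ftp://b.example/file");
        // The ftp inlink is rejected; the fragments are stripped from the rest.
        System.out.println(filterEntry("https://target.example/#intro", inlinks, true, true));
    }
}
```

Keeping the two switches as separate booleans mirrors the separate `URL_FILTERING` and `URL_NORMALIZING` fields, so either step can be turned off without touching the other.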
Field Summary
Fields

| Modifier and Type | Field and Description |
| --- | --- |
| static org.slf4j.Logger | LOG |
| static String | URL_FILTERING |
| static String | URL_NORMALIZING |
| static String | URL_NORMALIZING_SCOPE |
Constructor Summary
Constructors

| Constructor and Description |
| --- |
| LinkDbFilter() |
Method Summary
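Methods

| Modifier and Type | Method and Description |
| --- | --- |
| void | close() |
| void | configure(org.apache.hadoop.mapred.JobConf job) |
| void | map(org.apache.hadoop.io.Text key, Inlinks value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output, org.apache.hadoop.mapred.Reporter reporter) |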
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
-
URL_FILTERING
public static final String URL_FILTERING
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDbFilter.URL_FILTERING)
-
URL_NORMALIZING
public static final String URL_NORMALIZING
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDbFilter.URL_NORMALIZING)
-
URL_NORMALIZING_SCOPE
public static final String URL_NORMALIZING_SCOPE
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDbFilter.URL_NORMALIZING_SCOPE)
-
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
-
LinkDbFilter
public LinkDbFilter()
Method Detail
-
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
- <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>
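This page does not document what `configure` reads from the job. As a hedged illustration only, the sketch below assumes the three `String` constants are configuration keys: two boolean switches and one scope name. `java.util.Properties` stands in for `JobConf`, the class name `FilterConfigSketch` is hypothetical, and the key strings and defaults are placeholders, not the real constant values (those are listed under Constant Field Values).

```java
import java.util.Properties;

public class FilterConfigSketch {

    // Placeholder keys for illustration only; NOT the real constant values.
    public static final String URL_FILTERING = "example.url.filtering";
    public static final String URL_NORMALIZING = "example.url.normalizing";
    public static final String URL_NORMALIZING_SCOPE = "example.url.normalizing.scope";

    public final boolean filtering;
    public final boolean normalizing;
    public final String scope;

    // Reads the switches once, the way a configure-style hook might.
    // The defaults ("false", "default") are assumptions of this sketch.
    public FilterConfigSketch(Properties conf) {
        filtering = Boolean.parseBoolean(conf.getProperty(URL_FILTERING, "false"));
        normalizing = Boolean.parseBoolean(conf.getProperty(URL_NORMALIZING, "false"));
        scope = conf.getProperty(URL_NORMALIZING_SCOPE, "default");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(URL_FILTERING, "true");
        FilterConfigSketch settings = new FilterConfigSketch(conf);
        System.out.println(settings.filtering + " " + settings.normalizing + " " + settings.scope);
    }
}
```

Reading the settings once up front keeps the per-record `map` call free of configuration lookups.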
-
close
public void close()
- Specified by:
- <code>close</code> in interface <code>Closeable</code>
- Specified by:
- <code>close</code> in interface <code>AutoCloseable</code>
-
map
public void map(org.apache.hadoop.io.Text key, Inlinks value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output, org.apache.hadoop.mapred.Reporter reporter) throws IOException
- Specified by:
- <code>map</code> in interface <code>org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,Inlinks,org.apache.hadoop.io.Text,Inlinks></code>
- Throws:
- <code>IOException</code>