
org.apache.nutch.crawl

Class LinkDbFilter

    • All Implemented Interfaces:
    • Closeable, AutoCloseable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper

public class LinkDbFilter
extends Object
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,Inlinks,org.apache.hadoop.io.Text,Inlinks>

This class separates the URL normalization and filtering steps from the rest of the LinkDb manipulation code.

  • Author:
  • Andrzej Bialecki
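The separation described above can be illustrated with a small model of a "normalize, then filter" step for one URL. This is a sketch, not Nutch code: `UrlCleaner` is a hypothetical class, normalizers and filters are modeled as `UnaryOperator<String>`, and a filter is assumed to reject a URL by returning `null` (the convention Nutch `URLFilter` implementations follow). Passing `null` for either stage disables it.

```java
import java.util.function.UnaryOperator;

/**
 * Hypothetical model of a "normalize, then filter" step for a single URL.
 * Assumptions: a filter rejects a URL by returning null, and an exception
 * from either stage also causes the URL to be dropped.
 */
final class UrlCleaner {
  private final UnaryOperator<String> normalizer; // null disables normalization
  private final UnaryOperator<String> filter;     // null disables filtering

  UrlCleaner(UnaryOperator<String> normalizer, UnaryOperator<String> filter) {
    this.normalizer = normalizer;
    this.filter = filter;
  }

  /** Returns the cleaned URL, or null if the URL should be dropped. */
  String clean(String url) {
    try {
      if (normalizer != null) {
        url = normalizer.apply(url); // normalization runs first
      }
      if (url != null && filter != null) {
        url = filter.apply(url); // filtering sees the normalized form
      }
      return url;
    } catch (RuntimeException e) {
      return null; // treat a failing stage as "skip this URL"
    }
  }
}
```

Running normalization before filtering means filter rules are matched against the normalized form of a URL, so a single rule covers every spelling that normalizes to the same string.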

Field Summary

Fields

| Modifier and Type | Field and Description |
|---|---|
| `static org.slf4j.Logger` | `LOG` |
| `static String` | `URL_FILTERING` |
| `static String` | `URL_NORMALIZING` |
| `static String` | `URL_NORMALIZING_SCOPE` |

Constructor Summary

Constructors

| Constructor and Description |
|---|
| `LinkDbFilter()` |

Method Summary

Methods

| Modifier and Type | Method and Description |
|---|---|
| `void` | `close()` |
| `void` | `configure(org.apache.hadoop.mapred.JobConf job)` |
| `void` | `map(org.apache.hadoop.io.Text key, Inlinks value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output, org.apache.hadoop.mapred.Reporter reporter)` |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

URL_FILTERING

public static final String URL_FILTERING
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDbFilter.URL_FILTERING)       

URL_NORMALIZING

public static final String URL_NORMALIZING
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDbFilter.URL_NORMALIZING)       

URL_NORMALIZING_SCOPE

public static final String URL_NORMALIZING_SCOPE
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDbFilter.URL_NORMALIZING_SCOPE)       

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

LinkDbFilter

public LinkDbFilter()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
  - Specified by: 
  - `configure` in interface `org.apache.hadoop.mapred.JobConfigurable`
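`configure` receives the job configuration before any `map` call. One plausible reading of the three `String` constants on this page is that they name configuration keys: two on/off switches and a normalizer scope. The sketch below models that reading, with `java.util.Properties` standing in for `JobConf`. The class `LinkDbFilterSettings`, the key strings, the `false` default for both switches, and the default scope `"linkdb"` are all assumptions; check the Constant Field Values page and the Nutch source before relying on them.

```java
import java.util.Properties;

/**
 * Hypothetical model of how configure(JobConf) might read the LinkDbFilter
 * switches. Properties stands in for JobConf; key strings are ASSUMED.
 */
final class LinkDbFilterSettings {
  static final String URL_FILTERING = "linkdb.url.filters";                  // assumed value
  static final String URL_NORMALIZING = "linkdb.url.normalizer";             // assumed value
  static final String URL_NORMALIZING_SCOPE = "linkdb.url.normalizer.scope"; // assumed value
  static final String DEFAULT_SCOPE = "linkdb";                              // assumed default

  final boolean filter;
  final boolean normalize;
  final String scope; // only read when normalization is enabled, else null

  LinkDbFilterSettings(Properties conf) {
    // Both switches are assumed to default to off when the key is absent.
    this.filter = Boolean.parseBoolean(conf.getProperty(URL_FILTERING, "false"));
    this.normalize = Boolean.parseBoolean(conf.getProperty(URL_NORMALIZING, "false"));
    this.scope = normalize ? conf.getProperty(URL_NORMALIZING_SCOPE, DEFAULT_SCOPE) : null;
  }
}
```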

close

public void close()
  - Specified by: 
  - `close` in interface `Closeable`
  - Specified by:
  - `close` in interface `AutoCloseable`
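In Hadoop's old `mapred` API, a task configures its mapper once, calls `map` once per input record, and then calls `close`. Because `LinkDbFilter` implements `Closeable`, code that drives a mapper by hand can use try-with-resources to guarantee the `close` call. The sketch below shows that ordering with a hypothetical `RecordingMapper` that only logs the calls it receives; it does not use the Hadoop or Nutch classes.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mapper; it only records the lifecycle calls it receives. */
final class RecordingMapper implements Closeable {
  final List<String> calls = new ArrayList<>();

  void configure() { calls.add("configure"); }

  void map(String key) { calls.add("map:" + key); }

  @Override
  public void close() { calls.add("close"); } // narrows Closeable.close(): no IOException
}

final class LifecycleDemo {
  /** Drives the mapper in task order: configure once, map each record, then close. */
  static List<String> run(List<String> keys) {
    RecordingMapper mapper = new RecordingMapper();
    // try-with-resources guarantees close() runs even if a map call throws
    try (RecordingMapper m = mapper) {
      m.configure();
      for (String key : keys) {
        m.map(key);
      }
    }
    return mapper.calls;
  }
}
```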

map

public void map(org.apache.hadoop.io.Text key,
       Inlinks value,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
  - Specified by: 
  - `map` in interface `org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,Inlinks,org.apache.hadoop.io.Text,Inlinks>`
  - Throws:
  - `IOException`
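This page documents only the signature of `map`. Based on the class description, and not confirmed here, the sketch below models what such a mapper plausibly does: clean the target URL (the key), drop the record if it is rejected, clean each inlink's source URL while keeping its anchor text, and emit nothing if no inlinks survive. `SimpleInlink` and `InlinkCleaner` are hypothetical stand-ins for `Inlink` and `Inlinks`, the single `clean` function stands for "normalize, then filter" (returning `null` to reject), and where the real method would call `output.collect(...)` the sketch returns the pair instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for Nutch's Inlink: a source URL plus its anchor text. */
final class SimpleInlink {
  final String fromUrl;
  final String anchor;

  SimpleInlink(String fromUrl, String anchor) {
    this.fromUrl = fromUrl;
    this.anchor = anchor;
  }
}

/** Hypothetical model of a LinkDb filtering map step. */
final class InlinkCleaner {
  /**
   * Cleans the target URL and every inlink source URL with the same function
   * (null means "reject"). Returns the record to emit, or null to emit nothing.
   */
  static Map.Entry<String, List<SimpleInlink>> map(
      String toUrl, List<SimpleInlink> inlinks, UnaryOperator<String> clean) {
    String key = clean.apply(toUrl);
    if (key == null) {
      return null; // target rejected: drop the whole record
    }
    List<SimpleInlink> kept = new ArrayList<>();
    for (SimpleInlink in : inlinks) {
      String from = clean.apply(in.fromUrl);
      if (from != null) {
        kept.add(new SimpleInlink(from, in.anchor)); // keep the original anchor text
      }
    }
    if (kept.isEmpty()) {
      return null; // no surviving inlinks: emit nothing
    }
    return new AbstractMap.SimpleImmutableEntry<>(key, kept);
  }
}
```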

Copyright © 2014 The Apache Software Foundation