org.apache.nutch.indexer

Class IndexerMapReduce

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.indexer.IndexerMapReduce

All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper, org.apache.hadoop.mapred.Reducer

public class IndexerMapReduce
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,NutchWritable>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,NutchWritable,org.apache.hadoop.io.Text,NutchIndexAction>

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static String | INDEXER_DELETE |
| static String | INDEXER_DELETE_ROBOTS_NOINDEX |
| static String | INDEXER_PARAMS |
| static String | INDEXER_SKIP_NOTMODIFIED |
| static org.slf4j.Logger | LOG |
| static String | URL_FILTERING |
| static String | URL_NORMALIZING |

Constructor Summary

| Constructor and Description |
| --- |
| IndexerMapReduce() |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| void | close() |
| void | configure(org.apache.hadoop.mapred.JobConf job) |
| static void | initMRJob(org.apache.hadoop.fs.Path crawlDb, org.apache.hadoop.fs.Path linkDb, Collection<org.apache.hadoop.fs.Path> segments, org.apache.hadoop.mapred.JobConf job) |
| void | map(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Writable value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,NutchWritable> output, org.apache.hadoop.mapred.Reporter reporter) |
| void | reduce(org.apache.hadoop.io.Text key, Iterator<NutchWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,NutchIndexAction> output, org.apache.hadoop.mapred.Reporter reporter) |

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

INDEXER_PARAMS

public static final String INDEXER_PARAMS
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.indexer.IndexerMapReduce.INDEXER_PARAMS)

INDEXER_DELETE

public static final String INDEXER_DELETE
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.indexer.IndexerMapReduce.INDEXER_DELETE)

INDEXER_DELETE_ROBOTS_NOINDEX

public static final String INDEXER_DELETE_ROBOTS_NOINDEX
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.indexer.IndexerMapReduce.INDEXER_DELETE_ROBOTS_NOINDEX)

INDEXER_SKIP_NOTMODIFIED

public static final String INDEXER_SKIP_NOTMODIFIED
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.indexer.IndexerMapReduce.INDEXER_SKIP_NOTMODIFIED)

URL_FILTERING

public static final String URL_FILTERING
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.indexer.IndexerMapReduce.URL_FILTERING)

URL_NORMALIZING

public static final String URL_NORMALIZING
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.indexer.IndexerMapReduce.URL_NORMALIZING)

Constructor Detail

IndexerMapReduce

public IndexerMapReduce()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
  - Specified by: <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>

map

public void map(org.apache.hadoop.io.Text key,
       org.apache.hadoop.io.Writable value,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,NutchWritable> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
  - Specified by: <code>map</code> in interface <code>org.apache.hadoop.mapred.Mapper&lt;org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,NutchWritable&gt;</code>
  - Throws: <code>IOException</code>

reduce

public void reduce(org.apache.hadoop.io.Text key,
          Iterator<NutchWritable> values,
          org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,NutchIndexAction> output,
          org.apache.hadoop.mapred.Reporter reporter)
            throws IOException
  - Specified by: <code>reduce</code> in interface <code>org.apache.hadoop.mapred.Reducer&lt;org.apache.hadoop.io.Text,NutchWritable,org.apache.hadoop.io.Text,NutchIndexAction&gt;</code>
  - Throws: <code>IOException</code>
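
The <code>map</code> and <code>reduce</code> signatures imply a two-stage type flow: <code>map</code> emits (Text, NutchWritable) pairs, and <code>reduce</code> receives all values that share a key and emits (Text, NutchIndexAction) pairs. The following is only a library-free sketch of that shape. Every name in it (<code>IndexFlowSketch</code>, <code>Collector</code>, <code>run</code>, the <code>wrapped(...)</code> and <code>action[...]</code> strings) is a hypothetical stand-in, plain Strings replace the Hadoop and Nutch types, and the bodies do not reproduce the actual logic of IndexerMapReduce, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Library-free sketch; every type and method here is a hypothetical stand-in. */
class IndexFlowSketch {

    /** Stand-in for org.apache.hadoop.mapred.OutputCollector<K, V>. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Map stage: one (url, record) pair in, one (url, wrapped record) pair out. */
    static void map(String url, String record, Collector<String, String> output) {
        // Assumption: stands in for wrapping a Writable in a NutchWritable.
        output.collect(url, "wrapped(" + record + ")");
    }

    /** Reduce stage: all wrapped values for one url in, one action out. */
    static void reduce(String url, Iterator<String> values, Collector<String, String> output) {
        StringBuilder merged = new StringBuilder();
        while (values.hasNext()) {
            if (merged.length() > 0) {
                merged.append('+');
            }
            merged.append(values.next());
        }
        // Assumption: stands in for emitting a NutchIndexAction for this url.
        output.collect(url, "action[" + merged + "]");
    }

    /** Runs map, groups map output by key (the framework's job), then runs reduce. */
    static Map<String, String> run(List<String[]> input) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : input) {
            map(kv[0], kv[1], (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        Map<String, String> actions = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue().iterator(), actions::put);
        }
        return actions;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"http://example.org/", "crawl-status"});
        input.add(new String[] {"http://example.org/", "parsed-text"});
        input.add(new String[] {"http://example.org/about", "crawl-status"});
        System.out.println(run(input));
    }
}
```

In a real job the grouping step is performed by the MapReduce framework between the two stages; the sketch inlines it in <code>run</code> only so that it can execute without Hadoop on the classpath.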
close

public void close()
           throws IOException
  - Specified by: <code>close</code> in interface <code>Closeable</code>
  - Specified by: <code>close</code> in interface <code>AutoCloseable</code>
  - Throws: <code>IOException</code>

initMRJob

public static void initMRJob(org.apache.hadoop.fs.Path crawlDb,
             org.apache.hadoop.fs.Path linkDb,
             Collection<org.apache.hadoop.fs.Path> segments,
             org.apache.hadoop.mapred.JobConf job)

Copyright © 2014 The Apache Software Foundation