
org.apache.nutch.crawl

Class Generator.CrawlDbUpdater

  • java.lang.Object
    • org.apache.hadoop.mapred.MapReduceBase
      • org.apache.nutch.crawl.Generator.CrawlDbUpdater

All Implemented Interfaces:
    Closeable, AutoCloseable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>

Enclosing class:
    Generator

public static class Generator.CrawlDbUpdater
extends org.apache.hadoop.mapred.MapReduceBase
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum>

Updates the CrawlDb so that the next generate run does not include the same URLs again.
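The idea in the description above can be illustrated with a small, dependency-free sketch. It is not the Nutch implementation: `CrawlDbUpdaterSketch`, `Datum`, its `generateTime` field, and the `reduce(List, long)` signature are hypothetical stand-ins chosen for illustration, whereas the class documented here works on `Text` keys and `CrawlDatum` values through Hadoop's `OutputCollector`. The sketch assumes that records produced by the current generate run carry a non-zero generate-time mark, and that the reduce step copies that mark onto the CrawlDb entry for the same URL so a later run can recognise and skip it.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these types are Nutch or Hadoop API. */
public class CrawlDbUpdaterSketch {

    /** Hypothetical stand-in for CrawlDatum: a status plus a generate-time mark (0 = unmarked). */
    public static final class Datum {
        public final String status;
        public final long generateTime;

        public Datum(String status, long generateTime) {
            this.status = status;
            this.generateTime = generateTime;
        }
    }

    /**
     * Merges all records seen for one URL into a single output record.
     * Assumption: a record selected by the current run has generateTime == currentRun.
     */
    public static Datum reduce(List<Datum> values, long currentRun) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no records for this URL");
        }
        Datum orig = null;
        boolean selectedThisRun = false;
        for (Datum d : values) {
            if (d.generateTime != 0L && d.generateTime == currentRun) {
                selectedThisRun = true;   // this URL was picked by the current run
                if (orig == null) {
                    orig = d;             // fall back to it if no other record exists
                }
            } else {
                orig = d;                 // existing entry, or a mark left by another run
            }
        }
        // Copy the mark onto the surviving record so the next run can skip this URL.
        return selectedThisRun ? new Datum(orig.status, currentRun) : orig;
    }

    public static void main(String[] args) {
        long currentRun = 42L;

        Datum merged = reduce(Arrays.asList(
                new Datum("db_unfetched", 0L),
                new Datum("db_unfetched", currentRun)), currentRun);
        System.out.println(merged.status + " generateTime=" + merged.generateTime);
        // prints "db_unfetched generateTime=42"

        Datum untouched = reduce(Arrays.asList(new Datum("db_fetched", 0L)), currentRun);
        System.out.println(untouched.status + " generateTime=" + untouched.generateTime);
        // prints "db_fetched generateTime=0"
    }
}
```

In this sketch the map step is omitted because it would simply pass each record through unchanged; all of the merging happens per URL in `reduce`.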

Constructor Summary

Constructors

| Constructor and Description |
| --- |
| Generator.CrawlDbUpdater() |

Method Summary

Methods

| Modifier and Type | Method and Description |
| --- | --- |
| void | configure(org.apache.hadoop.mapred.JobConf job) |
| void | map(org.apache.hadoop.io.Text key, CrawlDatum value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output, org.apache.hadoop.mapred.Reporter reporter) |
| void | reduce(org.apache.hadoop.io.Text key, Iterator<CrawlDatum> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output, org.apache.hadoop.mapred.Reporter reporter) |


Methods inherited from class org.apache.hadoop.mapred.MapReduceBase

close


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait


Methods inherited from interface java.io.Closeable

close

Constructor Detail


Generator.CrawlDbUpdater

public Generator.CrawlDbUpdater()

Method Detail


configure

public void configure(org.apache.hadoop.mapred.JobConf job)
  - Specified by: <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>
  - Overrides: <code>configure</code> in class <code>org.apache.hadoop.mapred.MapReduceBase</code>

map

public void map(org.apache.hadoop.io.Text key,
       CrawlDatum value,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
  - Specified by: <code>map</code> in interface <code>org.apache.hadoop.mapred.Mapper&lt;org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum&gt;</code>
  - Throws: <code>IOException</code>

reduce

public void reduce(org.apache.hadoop.io.Text key,
          Iterator<CrawlDatum> values,
          org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output,
          org.apache.hadoop.mapred.Reporter reporter)
            throws IOException
  - Specified by: <code>reduce</code> in interface <code>org.apache.hadoop.mapred.Reducer&lt;org.apache.hadoop.io.Text,CrawlDatum,org.apache.hadoop.io.Text,CrawlDatum&gt;</code>
  - Throws: <code>IOException</code>

Copyright © 2014 The Apache Software Foundation