
org.apache.nutch.indexer

Class CleaningJob

All Implemented Interfaces:
    org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class CleaningJob
extends Object
implements org.apache.hadoop.util.Tool

This class scans the CrawlDB for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests for those documents to the indexers.
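
The selection rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Nutch or Hadoop APIs: the `Status` enum, the `CleaningSketch` class, and the in-memory map standing in for the CrawlDB are all hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleaningSketch {
    /** Hypothetical stand-in for the CrawlDB status of a URL. */
    enum Status { DB_FETCHED, DB_GONE, DB_DUPLICATE }

    /** Returns the URLs whose status marks them for deletion from the index. */
    static List<String> selectForDeletion(Map<String, Status> crawlDb) {
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, Status> entry : crawlDb.entrySet()) {
            Status status = entry.getValue();
            // Only gone or duplicate entries trigger a delete request.
            if (status == Status.DB_GONE || status == Status.DB_DUPLICATE) {
                toDelete.add(entry.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is stable.
        Map<String, Status> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://example.com/a", Status.DB_FETCHED);
        crawlDb.put("http://example.com/b", Status.DB_GONE);
        crawlDb.put("http://example.com/c", Status.DB_DUPLICATE);
        // prints [http://example.com/b, http://example.com/c]
        System.out.println(selectForDeletion(crawlDb));
    }
}
```

In the real job this filtering presumably runs as a MapReduce step over the CrawlDB (see the nested classes `CleaningJob.DBFilter` and `CleaningJob.DeleterReducer`); the sketch only shows the status test itself.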

Nested Class Summary

Nested Classes

| Modifier and Type | Class and Description |
| --- | --- |
| static class | CleaningJob.DBFilter |
| static class | CleaningJob.DeleterReducer |

Field Summary

Fields

| Modifier and Type | Field and Description |
| --- | --- |
| static org.slf4j.Logger | LOG |

Constructor Summary

Constructors

| Constructor and Description |
| --- |
| CleaningJob() |

Method Summary

Methods

| Modifier and Type | Method and Description |
| --- | --- |
| void | delete(String crawldb, boolean noCommit) |
| org.apache.hadoop.conf.Configuration | getConf() |
| static void | main(String[] args) |
| int | run(String[] args) |
| void | setConf(org.apache.hadoop.conf.Configuration conf) |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

CleaningJob

public CleaningJob()

Method Detail

getConf

public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

delete

public void delete(String crawldb,
          boolean noCommit)
            throws IOException
  - Throws: <code>IOException</code>

run

public int run(String[] args)
        throws IOException
  - Specified by: <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code>
  - Throws: <code>IOException</code>
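
The relationship between `run(String[] args)` and `delete(String crawldb, boolean noCommit)` is not spelled out on this page. The following self-contained sketch shows one plausible mapping. It assumes the command line is a CrawlDB path followed by an optional `-noCommit` flag; that layout, the usage message, and the `CleaningArgsSketch` and `Options` names are assumptions for illustration, not the documented behavior.

```java
public class CleaningArgsSketch {
    /** Hypothetical holder for the two values that delete(...) takes. */
    static final class Options {
        final String crawldb;
        final boolean noCommit;

        Options(String crawldb, boolean noCommit) {
            this.crawldb = crawldb;
            this.noCommit = noCommit;
        }
    }

    /** Returns parsed options, or null when the CrawlDB path is missing. */
    static Options parse(String[] args) {
        if (args.length < 1) {
            return null;
        }
        // Assumed flag name; only checked in the second position.
        boolean noCommit = args.length > 1 && "-noCommit".equals(args[1]);
        return new Options(args[0], noCommit);
    }

    /** Follows the usual Tool convention: 0 on success, non-zero on a usage error. */
    static int run(String[] args) {
        Options options = parse(args);
        if (options == null) {
            System.err.println("Usage: <crawldb> [-noCommit]");
            return 1;
        }
        // A real implementation would call delete(options.crawldb, options.noCommit) here.
        System.out.println("crawldb=" + options.crawldb + " noCommit=" + options.noCommit);
        return 0;
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

Returning an exit code from `run` instead of calling `System.exit` inside it keeps the method usable from other code, which matches how `org.apache.hadoop.util.Tool` implementations are typically driven.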
main

public static void main(String[] args)
                 throws Exception
  - Throws: <code>Exception</code>

Copyright © 2014 The Apache Software Foundation