
Class CleaningJob

    • All Implemented Interfaces: org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class CleaningJob
extends Object
implements org.apache.hadoop.util.Tool

This class scans the CrawlDB for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to the indexers for those documents.
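
The selection rule described above can be illustrated with a small, self-contained sketch. It is not the Nutch implementation: the `CleaningSketch` class, its `Status` enum, and the in-memory map standing in for the CrawlDB are all hypothetical, and only the two status names DB_GONE and DB_DUPLICATE come from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the selection step; not Nutch's actual classes.
class CleaningSketch {

    // Stand-ins for CrawlDB status values. Only DB_GONE and DB_DUPLICATE
    // are named on this page; DB_FETCHED is here as a "keep" example.
    enum Status { DB_FETCHED, DB_GONE, DB_DUPLICATE }

    // Returns the URLs whose status marks them for deletion from the index,
    // in the iteration order of the given map.
    static List<String> urlsToDelete(Map<String, Status> crawlDb) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Status> entry : crawlDb.entrySet()) {
            Status status = entry.getValue();
            if (status == Status.DB_GONE || status == Status.DB_DUPLICATE) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // An in-memory map plays the role of the CrawlDB (assumption for illustration).
        Map<String, Status> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://example.com/a", Status.DB_FETCHED);
        crawlDb.put("http://example.com/b", Status.DB_GONE);
        crawlDb.put("http://example.com/c", Status.DB_DUPLICATE);
        // prints [http://example.com/b, http://example.com/c]
        System.out.println(urlsToDelete(crawlDb));
    }
}
```

In the real job this filtering would run as part of a Hadoop job over the CrawlDB rather than over an in-memory map.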

Nested Class Summary

Nested Classes
Modifier and Type    Class and Description
static class         CleaningJob.DBFilter
static class         CleaningJob.DeleterReducer

Field Summary

Fields
Modifier and Type          Field and Description
static org.slf4j.Logger    LOG

Constructor Summary

Constructors
Constructor and Description
CleaningJob()

Method Summary

Methods
Modifier and Type                       Method and Description
void                                    delete(String crawldb, boolean noCommit)
org.apache.hadoop.conf.Configuration    getConf()
static void                             main(String[] args)
int                                     run(String[] args)
void                                    setConf(org.apache.hadoop.conf.Configuration conf)


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail



public static final org.slf4j.Logger LOG

Constructor Detail



public CleaningJob()

Method Detail



public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>


public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>


public void delete(String crawldb,
          boolean noCommit)
            throws IOException
  - Throws: <code>IOException</code>
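
This page does not describe the `noCommit` parameter. One plausible reading, sketched below purely as an assumption, is that delete requests are always sent and the flag only suppresses the final commit on the indexer. The `NoCommitSketch` and `RecordingIndexer` names are hypothetical, and so is the choice to skip the commit when nothing was deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how a noCommit flag could gate the final commit.
class NoCommitSketch {

    // Hypothetical stand-in for an indexer client; it records calls
    // instead of contacting a real indexing backend.
    static class RecordingIndexer {
        final List<String> calls = new ArrayList<>();

        void delete(String url) {
            calls.add("delete " + url);
        }

        void commit() {
            calls.add("commit");
        }
    }

    // Sends one delete per URL, then commits unless noCommit is set
    // or there was nothing to delete (both behaviors are assumptions).
    static void deleteAll(List<String> urls, boolean noCommit, RecordingIndexer indexer) {
        for (String url : urls) {
            indexer.delete(url);
        }
        if (!urls.isEmpty() && !noCommit) {
            indexer.commit();
        }
    }

    public static void main(String[] args) {
        RecordingIndexer indexer = new RecordingIndexer();
        deleteAll(Arrays.asList("http://example.com/b"), false, indexer);
        // prints [delete http://example.com/b, commit]
        System.out.println(indexer.calls);
    }
}
```

Under this reading, passing `noCommit = true` would leave the deletions pending until some later commit makes them visible.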


public int run(String[] args)
        throws IOException
  - Specified by: <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code>
  - Throws: <code>IOException</code>
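
The arguments accepted by `run` are not documented on this page. The sketch below assumes a command line of the form `<crawldb> [-noCommit]`, chosen to match the parameters of `delete(String crawldb, boolean noCommit)`, and shows how such arguments might be mapped to an exit code. The `RunArgsSketch` and `Parsed` names are hypothetical, and the code has no Hadoop dependency.

```java
// Hypothetical sketch of argument handling for a Tool-style run(String[] args).
class RunArgsSketch {

    // Parsed form of the assumed command line: <crawldb> [-noCommit]
    static class Parsed {
        final String crawldb;
        final boolean noCommit;

        Parsed(String crawldb, boolean noCommit) {
            this.crawldb = crawldb;
            this.noCommit = noCommit;
        }
    }

    // Returns null when the required crawldb argument is missing.
    static Parsed parse(String[] args) {
        if (args.length < 1) {
            return null;
        }
        boolean noCommit = args.length > 1 && "-noCommit".equals(args[1]);
        return new Parsed(args[0], noCommit);
    }

    // Returns an exit code: 0 when the arguments are usable, 1 on bad usage.
    static int run(String[] args) {
        Parsed parsed = parse(args);
        if (parsed == null) {
            System.err.println("Usage: <crawldb> [-noCommit]");
            return 1;
        }
        // A real implementation would call delete(parsed.crawldb, parsed.noCommit) here.
        return 0;
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

Returning an `int` rather than throwing on bad usage lets a caller such as `main` pass the value straight to `System.exit`.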


public static void main(String[] args)
                 throws Exception
  - Throws: <code>Exception</code>

Copyright © 2014 The Apache Software Foundation