org.apache.nutch.indexer
Class CleaningJob
java.lang.Object
    org.apache.nutch.indexer.CleaningJob
All Implemented Interfaces:
    org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class CleaningJob extends Object implements org.apache.hadoop.util.Tool
The class scans CrawlDB looking for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to indexers for those documents.
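The selection rule described above can be illustrated with a minimal, stand-alone sketch. It does not use any Nutch or Hadoop classes: the `CleaningSketch` class, its `Status` enum, and the `urlsToDelete` method are hypothetical names introduced only for illustration, and a plain in-memory map stands in for the CrawlDB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone illustration; none of these types are Nutch classes. */
public class CleaningSketch {

    /** Hypothetical stand-in for the CrawlDB status values. */
    enum Status { DB_UNFETCHED, DB_FETCHED, DB_GONE, DB_DUPLICATE }

    /** Returns the URLs whose status marks them for deletion from the index. */
    static List<String> urlsToDelete(Map<String, Status> crawlDb) {
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, Status> entry : crawlDb.entrySet()) {
            Status status = entry.getValue();
            // Only gone and duplicate entries trigger a delete request.
            if (status == Status.DB_GONE || status == Status.DB_DUPLICATE) {
                toDelete.add(entry.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is stable.
        Map<String, Status> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://example.com/", Status.DB_FETCHED);
        crawlDb.put("http://example.com/missing", Status.DB_GONE);
        crawlDb.put("http://example.com/copy", Status.DB_DUPLICATE);
        crawlDb.put("http://example.com/new", Status.DB_UNFETCHED);

        for (String url : urlsToDelete(crawlDb)) {
            System.out.println("delete " + url);
        }
    }
}
```

In the real job this filtering presumably runs as a Hadoop job over the CrawlDB (see the nested `DBFilter` and `DeleterReducer` classes), so the sketch only conveys which entries are selected, not how the work is distributed.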
Nested Class Summary
Modifier and Type    Class
static class         CleaningJob.DBFilter
static class         CleaningJob.DeleterReducer
Field Summary
Modifier and Type          Field
static org.slf4j.Logger    LOG
Constructor Summary
CleaningJob()
Method Summary
Modifier and Type                       Method
void                                    delete(String crawldb, boolean noCommit)
org.apache.hadoop.conf.Configuration    getConf()
static void                             main(String[] args)
int                                     run(String[] args)
void                                    setConf(org.apache.hadoop.conf.Configuration conf)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
CleaningJob
public CleaningJob()
Method Detail
getConf
public org.apache.hadoop.conf.Configuration getConf()
Specified by:
    getConf in interface org.apache.hadoop.conf.Configurable
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
    setConf in interface org.apache.hadoop.conf.Configurable
delete
public void delete(String crawldb, boolean noCommit) throws IOException
Throws:
    IOException
run
public int run(String[] args) throws IOException
Specified by:
    run in interface org.apache.hadoop.util.Tool
Throws:
    IOException
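How `run(String[] args)` maps its arguments onto `delete(String crawldb, boolean noCommit)` is not documented here. The following stand-alone sketch shows one plausible mapping under stated assumptions: the first argument is taken as the CrawlDB path, and an optional `-noCommit` flag (a name assumed from the `noCommit` parameter, not confirmed by this page) sets the boolean. `CleaningArgsSketch`, `Args`, and `parse` are hypothetical names, and no Nutch or Hadoop classes are used.

```java
/** Stand-alone illustration of argument handling; not the Nutch source. */
public class CleaningArgsSketch {

    /** Parsed arguments, mirroring the parameters of delete(String, boolean). */
    static final class Args {
        final String crawldb;
        final boolean noCommit;

        Args(String crawldb, boolean noCommit) {
            this.crawldb = crawldb;
            this.noCommit = noCommit;
        }
    }

    /** Returns the parsed arguments, or null when the crawldb path is missing. */
    static Args parse(String[] args) {
        if (args.length < 1) {
            return null;
        }
        boolean noCommit = false;
        for (int i = 1; i < args.length; i++) {
            // Assumed flag name; other arguments are ignored in this sketch.
            if ("-noCommit".equals(args[i])) {
                noCommit = true;
            }
        }
        return new Args(args[0], noCommit);
    }

    /** Returns 0 on success and 1 on a usage error. */
    static int run(String[] args) {
        Args parsed = parse(args);
        if (parsed == null) {
            System.err.println("Usage: CleaningArgsSketch <crawldb> [-noCommit]");
            return 1;
        }
        // A real implementation would call delete(parsed.crawldb, parsed.noCommit) here.
        System.out.println("would clean " + parsed.crawldb
                + " (noCommit=" + parsed.noCommit + ")");
        return 0;
    }

    public static void main(String[] args) {
        run(new String[] {"crawl/crawldb", "-noCommit"});
    }
}
```

Returning an `int` status rather than throwing on bad usage follows the convention of `org.apache.hadoop.util.Tool`, where the return value of `run` typically becomes the process exit code.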
main
public static void main(String[] args) throws Exception
Throws:
    Exception