[TOC]

org.apache.nutch.crawl

Class CrawlDb

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.crawl.CrawlDb

All Implemented Interfaces:

org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class CrawlDb
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

This class takes the output of the fetcher and updates the crawldb accordingly.
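Because the class description is terse, here is a deliberately simplified, in-memory model of one update round. Entries already in the CrawlDb take on the fetch results of a segment, and newly discovered links are added as unfetched entries unless additions are disabled. This is a sketch under stated assumptions, not Nutch's implementation: the real update runs as a Hadoop MapReduce job over CrawlDatum records and handles many more states, scores and re-fetch schedules. `ToyCrawlDb`, its `Status` values and its `update` method are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, heavily simplified model of one CrawlDb update round.
// Real Nutch merges CrawlDatum records in a MapReduce job; this only mirrors the idea.
final class ToyCrawlDb {
  // Hypothetical statuses, loosely inspired by the db_* states of a real CrawlDb.
  enum Status { UNFETCHED, FETCHED, GONE }

  private ToyCrawlDb() {}

  /**
   * Returns the new db contents; the input maps are not modified.
   *
   * @param db               existing entries, URL -> status
   * @param fetched          fetch results from one segment, URL -> status
   * @param outlinks         URLs discovered while parsing that segment
   * @param additionsAllowed whether URLs unknown to the db may be added
   */
  static Map<String, Status> update(Map<String, Status> db,
                                    Map<String, Status> fetched,
                                    Set<String> outlinks,
                                    boolean additionsAllowed) {
    Map<String, Status> next = new TreeMap<>(db);
    for (Map.Entry<String, Status> e : fetched.entrySet()) {
      // A fetch result replaces the old status; unknown URLs need additionsAllowed.
      if (additionsAllowed || next.containsKey(e.getKey())) {
        next.put(e.getKey(), e.getValue());
      }
    }
    if (additionsAllowed) {
      for (String url : outlinks) {
        // A discovered link never overwrites an entry the db already has.
        next.putIfAbsent(url, Status.UNFETCHED);
      }
    }
    return next;
  }
}
```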

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static String | CRAWLDB_ADDITIONS_ALLOWED |
| static String | CRAWLDB_PURGE_404 |
| static String | CURRENT_NAME |
| static String | LOCK_NAME |
| static org.slf4j.Logger | LOG |

Constructor Summary

| Constructor and Description |
| --- |
| CrawlDb() |
| CrawlDb(org.apache.hadoop.conf.Configuration conf) |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| static org.apache.hadoop.mapred.JobConf | createJob(org.apache.hadoop.conf.Configuration config, org.apache.hadoop.fs.Path crawlDb) |
| static void | install(org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path crawlDb) |
| static void | main(String[] args) |
| int | run(String[] args) |
| void | update(org.apache.hadoop.fs.Path crawlDb, org.apache.hadoop.fs.Path[] segments, boolean normalize, boolean filter) |
| void | update(org.apache.hadoop.fs.Path crawlDb, org.apache.hadoop.fs.Path[] segments, boolean normalize, boolean filter, boolean additionsAllowed, boolean force) |

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable

getConf, setConf

Field Detail

LOG

public static final org.slf4j.Logger LOG

CRAWLDB_ADDITIONS_ALLOWED

public static final String CRAWLDB_ADDITIONS_ALLOWED
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.CrawlDb.CRAWLDB_ADDITIONS_ALLOWED)

CRAWLDB_PURGE_404

public static final String CRAWLDB_PURGE_404
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.CrawlDb.CRAWLDB_PURGE_404)

CURRENT_NAME

public static final String CURRENT_NAME
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.CrawlDb.CURRENT_NAME)

LOCK_NAME

public static final String LOCK_NAME
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.CrawlDb.LOCK_NAME)
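CRAWLDB_ADDITIONS_ALLOWED and CRAWLDB_PURGE_404 name configuration properties rather than holding data. In the Nutch 1.x sources we have looked at, they are `db.update.additions.allowed` (default `true`: newly discovered URLs may be added during an update) and `db.update.purge.404` (default `false`: when enabled, entries whose status is "gone" are dropped during an update). Treat those strings and defaults as assumptions and confirm them on the Constant Field Values page for this release. The sketch below resolves both flags, using `java.util.Properties` as a stand-in for Hadoop's `Configuration.getBoolean(String, boolean)`. `CrawlDbSettings` is a hypothetical helper.

```java
import java.util.Properties;

// Sketch: resolving the two boolean update settings named by CrawlDb's constants.
// java.util.Properties stands in for org.apache.hadoop.conf.Configuration.
final class CrawlDbSettings {
  // Assumed values of CrawlDb.CRAWLDB_ADDITIONS_ALLOWED and CrawlDb.CRAWLDB_PURGE_404;
  // check the Constant Field Values page of your Nutch release.
  static final String ADDITIONS_ALLOWED = "db.update.additions.allowed";
  static final String PURGE_404 = "db.update.purge.404";

  final boolean additionsAllowed; // may newly discovered URLs be added?
  final boolean purge404;         // drop entries whose status is "gone"?

  private CrawlDbSettings(boolean additionsAllowed, boolean purge404) {
    this.additionsAllowed = additionsAllowed;
    this.purge404 = purge404;
  }

  static CrawlDbSettings from(Properties conf) {
    return new CrawlDbSettings(
        getBoolean(conf, ADDITIONS_ALLOWED, true),
        getBoolean(conf, PURGE_404, false));
  }

  // Like Configuration.getBoolean: missing or unrecognized values yield the default.
  private static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
    String raw = conf.getProperty(key);
    if (raw == null) {
      return defaultValue;
    }
    String value = raw.trim();
    if (value.equalsIgnoreCase("true")) {
      return true;
    }
    if (value.equalsIgnoreCase("false")) {
      return false;
    }
    return defaultValue;
  }
}
```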

Constructor Detail

CrawlDb

public CrawlDb()

CrawlDb

public CrawlDb(org.apache.hadoop.conf.Configuration conf)

Method Detail

update

public void update(org.apache.hadoop.fs.Path crawlDb,
                   org.apache.hadoop.fs.Path[] segments,
                   boolean normalize,
                   boolean filter)
            throws IOException
  - Throws: `IOException`
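The four-argument `update` reads like a convenience overload of the six-argument one. Assuming it behaves as in the Nutch 1.x sources we know of, it takes `additionsAllowed` from the CRAWLDB_ADDITIONS_ALLOWED property (default `true`) and always passes `force = false`, so it never overrides an existing lock. The sketch below models only that delegation. `UpdateOverloads`, `UpdateCall` and the `Map`-based configuration are hypothetical stand-ins for `CrawlDb`, the real call, and Hadoop's `Configuration`.

```java
import java.util.Arrays;
import java.util.Map;

// Sketch of the assumed delegation from the 4-argument update() to the 6-argument one.
// The Map-based config and UpdateCall are hypothetical stand-ins for Hadoop's
// Configuration and for the real, Path-based method call.
final class UpdateOverloads {
  static final String ADDITIONS_ALLOWED = "db.update.additions.allowed"; // assumed key

  /** Captures the arguments that would reach the 6-argument update(). */
  static final class UpdateCall {
    final String crawlDb;
    final String[] segments;
    final boolean normalize, filter, additionsAllowed, force;

    UpdateCall(String crawlDb, String[] segments, boolean normalize,
               boolean filter, boolean additionsAllowed, boolean force) {
      this.crawlDb = crawlDb;
      this.segments = Arrays.copyOf(segments, segments.length);
      this.normalize = normalize;
      this.filter = filter;
      this.additionsAllowed = additionsAllowed;
      this.force = force;
    }
  }

  private final Map<String, String> conf;

  UpdateOverloads(Map<String, String> conf) {
    this.conf = conf;
  }

  // 4-argument form: the additions flag comes from configuration and force is never set.
  UpdateCall update(String crawlDb, String[] segments, boolean normalize, boolean filter) {
    // Boolean.parseBoolean: only "true" (any case) yields true.
    boolean additionsAllowed =
        Boolean.parseBoolean(conf.getOrDefault(ADDITIONS_ALLOWED, "true"));
    return update(crawlDb, segments, normalize, filter, additionsAllowed, false);
  }

  // 6-argument form: here it only records its arguments instead of running a job.
  UpdateCall update(String crawlDb, String[] segments, boolean normalize, boolean filter,
                    boolean additionsAllowed, boolean force) {
    return new UpdateCall(crawlDb, segments, normalize, filter, additionsAllowed, force);
  }
}
```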

update

public void update(org.apache.hadoop.fs.Path crawlDb,
                   org.apache.hadoop.fs.Path[] segments,
                   boolean normalize,
                   boolean filter,
                   boolean additionsAllowed,
                   boolean force)
            throws IOException
  - Throws: `IOException`
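The `force` flag ties in with LOCK_NAME. As far as we can tell from the Nutch 1.x sources, `update` creates a lock file inside the CrawlDb directory before launching its job and fails if one already exists, unless `force` is `true`. The lock is removed by `install`, or by `update` itself when the job fails. The local-filesystem sketch below shows that guard with `java.nio.file`. The lock name `.locked`, the `CrawlDbLock` class and its methods are assumptions, and Nutch performs the equivalent steps on a Hadoop `FileSystem`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem sketch of the lock-file guard assumed to protect a CrawlDb during update().
final class CrawlDbLock {
  static final String LOCK_NAME = ".locked"; // assumed value of CrawlDb.LOCK_NAME

  private CrawlDbLock() {}

  /** Creates the lock, or accepts an existing one when force is true; returns the lock path. */
  static Path acquire(Path crawlDb, boolean force) {
    Path lock = crawlDb.resolve(LOCK_NAME);
    try {
      Files.createDirectories(crawlDb);
      if (Files.isDirectory(lock)) {
        throw new IllegalStateException("lock " + lock + " exists and is a directory");
      }
      try {
        Files.createFile(lock); // atomic: fails if the lock file already exists
      } catch (FileAlreadyExistsException e) {
        if (!force) {
          throw new IllegalStateException("lock " + lock + " already exists", e);
        }
        // force == true: accept the existing lock and carry on (caution advised)
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lock;
  }

  /** Removes the lock; assumed to happen in install() or after a failed job. */
  static void release(Path lock) {
    try {
      Files.deleteIfExists(lock);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because a forced run simply accepts a lock that another process may still hold, `force` is only safe when no other job is writing to the same CrawlDb.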

createJob

public static org.apache.hadoop.mapred.JobConf createJob(org.apache.hadoop.conf.Configuration config,
                                                         org.apache.hadoop.fs.Path crawlDb)
                                                  throws IOException
  - Throws: `IOException`
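`createJob` returns a `JobConf` rather than running anything. Judging from the Nutch 1.x sources, the job it configures reads the existing database from the CURRENT_NAME subdirectory of `crawlDb` when that exists (it does not on the first update) and writes to a new, randomly named subdirectory of `crawlDb`, which `install` later moves into place. The sketch below reproduces only that path layout. `CreateJobPaths`, `JobPlan` and `plan` are hypothetical, and `current` is the assumed value of CURRENT_NAME.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Random;

// Sketch of the directory layout assumed for createJob(): input from <crawlDb>/current
// (if present) and output to a fresh, randomly named subdirectory of <crawlDb>.
// JobPlan and plan() are hypothetical; the real method returns a configured JobConf.
final class CreateJobPaths {
  static final String CURRENT_NAME = "current"; // assumed value of CrawlDb.CURRENT_NAME

  static final class JobPlan {
    final Optional<Path> input; // empty on the very first update of a new CrawlDb
    final Path output;

    JobPlan(Optional<Path> input, Path output) {
      this.input = input;
      this.output = output;
    }
  }

  private CreateJobPaths() {}

  static JobPlan plan(Path crawlDb, Random random) {
    Path current = crawlDb.resolve(CURRENT_NAME);
    Optional<Path> input = Files.isDirectory(current) ? Optional.of(current) : Optional.empty();
    // A random non-negative integer as the name makes clashes with existing subdirectories unlikely.
    Path output = crawlDb.resolve(Integer.toString(random.nextInt(Integer.MAX_VALUE)));
    return new JobPlan(input, output);
  }
}
```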

install

public static void install(org.apache.hadoop.mapred.JobConf job,
                           org.apache.hadoop.fs.Path crawlDb)
                    throws IOException
  - Throws: `IOException`
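`install` appears to promote the output of a finished job into place. Our reading of the Nutch 1.x sources is that it renames the existing `current` directory to `old`, moves the job output to `current`, and removes the lock file. Whether `old` is kept afterwards seems to differ between releases, so the local-filesystem sketch below takes that as a parameter. `InstallSketch`, `keepBackup` and the directory names `current`, `old` and `.locked` are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem sketch of the directory swap assumed to happen in install().
// Nutch performs the corresponding steps with Hadoop FileSystem calls.
final class InstallSketch {
  static final String CURRENT_NAME = "current"; // assumed value of CrawlDb.CURRENT_NAME
  static final String LOCK_NAME = ".locked";    // assumed value of CrawlDb.LOCK_NAME
  static final String OLD_NAME = "old";         // assumed name of the backup directory

  private InstallSketch() {}

  static void install(Path newOutput, Path crawlDb, boolean keepBackup) {
    Path current = crawlDb.resolve(CURRENT_NAME);
    Path old = crawlDb.resolve(OLD_NAME);
    try {
      if (Files.exists(current)) {
        deleteRecursively(old);       // drop the previous backup, if any
        Files.move(current, old);     // the current db becomes the backup
      }
      Files.createDirectories(crawlDb);
      Files.move(newOutput, current); // the job output becomes the new db
      if (!keepBackup) {
        deleteRecursively(old);
      }
      Files.deleteIfExists(crawlDb.resolve(LOCK_NAME)); // release the lock taken by update()
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> all;
    try (Stream<Path> walk = Files.walk(dir)) {
      all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : all) {
      Files.delete(p); // reverse order visits children before their parent directory
    }
  }
}
```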

main

public static void main(String[] args)
                 throws Exception
  - Throws: `Exception`

run

public int run(String[] args)
        throws Exception
  - Specified by: `run` in interface `org.apache.hadoop.util.Tool`
  - Throws: `Exception`
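`main` hands its arguments to `run` through Hadoop's `ToolRunner`, and `run` turns them into a call to the six-argument `update`. The usage string printed by the Nutch 1.x sources we know of is `CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]`, and the sketch below parses that shape. Unlike the real method, which prints the usage and returns -1, it throws `IllegalArgumentException`, and it leaves expanding each `-dir` argument into its segment subdirectories to the caller. `UpdateArgs` and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the command-line handling assumed for CrawlDb.run(String[]):
//   CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]
// UpdateArgs and parse() are hypothetical; the real method builds Hadoop Paths and calls update().
final class UpdateArgs {
  String crawlDb;
  final List<String> segments = new ArrayList<>();       // segments named explicitly
  final List<String> segmentParents = new ArrayList<>(); // -dir arguments; their subdirectories are segments
  boolean normalize;               // the real tool seeds normalize/filter from configuration
  boolean filter;
  boolean force;
  boolean additionsAllowed = true; // -noAdditions turns this off

  static UpdateArgs parse(String[] args) {
    if (args.length < 1) {
      throw new IllegalArgumentException(
          "Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) "
              + "[-force] [-normalize] [-filter] [-noAdditions]");
    }
    UpdateArgs a = new UpdateArgs();
    a.crawlDb = args[0];
    for (int i = 1; i < args.length; i++) {
      switch (args[i]) {
        case "-normalize": a.normalize = true; break;
        case "-filter": a.filter = true; break;
        case "-force": a.force = true; break;
        case "-noAdditions": a.additionsAllowed = false; break;
        case "-dir":
          if (i + 1 >= args.length) {
            throw new IllegalArgumentException("-dir needs a directory argument");
          }
          a.segmentParents.add(args[++i]);
          break;
        default: a.segments.add(args[i]); // anything else is treated as a segment path
      }
    }
    return a;
  }
}
```

Under these assumptions, `bin/nutch updatedb crawl/crawldb -dir crawl/segments -filter` would update `crawl/crawldb` from every segment under `crawl/segments` with URL filtering enabled; the directory names here are only examples.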

Copyright © 2014 The Apache Software Foundation