org.apache.nutch.crawl

Class CrawlDbReader


public class CrawlDbReader
extends Object
implements Closeable

Read utility for the CrawlDB.

  • Author: Andrzej Bialecki

Nested Class Summary

Nested Classes

Modifier and Type    Class and Description
static class         CrawlDbReader.CrawlDatumCsvOutputFormat
static class         CrawlDbReader.CrawlDbDumpMapper
static class         CrawlDbReader.CrawlDbStatCombiner
static class         CrawlDbReader.CrawlDbStatMapper
static class         CrawlDbReader.CrawlDbStatReducer
static class         CrawlDbReader.CrawlDbTopNMapper
static class         CrawlDbReader.CrawlDbTopNReducer

Field Summary

Fields

Modifier and Type          Field and Description
static org.slf4j.Logger    LOG

Constructor Summary

Constructors

Constructor and Description
CrawlDbReader()

Method Summary

Methods

Modifier and Type    Method and Description
void                 close()
CrawlDatum           get(String crawlDb, String url, org.apache.hadoop.conf.Configuration config)
static void          main(String[] args)
void                 processDumpJob(String crawlDb, String output, org.apache.hadoop.conf.Configuration config, String format, String regex, String status, Integer retry)
void                 processStatJob(String crawlDb, org.apache.hadoop.conf.Configuration config, boolean sort)
void                 processTopNJob(String crawlDb, long topN, float min, String output, org.apache.hadoop.conf.Configuration config)
void                 readUrl(String crawlDb, String url, org.apache.hadoop.conf.Configuration config)

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

CrawlDbReader

public CrawlDbReader()

Method Detail

close

public void close()
  - Specified by: <code>close</code> in interface <code>Closeable</code>
  - Specified by: <code>close</code> in interface <code>AutoCloseable</code>
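
Because the class implements Closeable (and therefore AutoCloseable), an instance can be managed with try-with-resources so that close() runs even when a read throws. The following is a minimal sketch of that pattern only. StubReader is a hypothetical stand-in, not the Nutch class, and the recorded event names are illustrative assumptions.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CloseableReaderSketch {

    // Hypothetical stand-in for a reader that implements Closeable.
    // It only records the order in which its methods are called.
    static class StubReader implements Closeable {
        private final List<String> events;

        StubReader(List<String> events) {
            this.events = events;
            events.add("open");
        }

        void read() throws IOException {
            events.add("read");
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    // Opens the stub, reads once, and lets try-with-resources call close().
    static List<String> run() throws IOException {
        List<String> events = new ArrayList<>();
        try (StubReader reader = new StubReader(events)) {
            reader.read();
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

With the real class, the same shape would presumably apply: construct the reader in the try header and call its read methods in the body.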
processStatJob

public void processStatJob(String crawlDb,
                  org.apache.hadoop.conf.Configuration config,
                  boolean sort)
                    throws IOException
  - Throws: <code>IOException</code>

get

public CrawlDatum get(String crawlDb,
             String url,
             org.apache.hadoop.conf.Configuration config)
               throws IOException
  - Throws: <code>IOException</code>

readUrl

public void readUrl(String crawlDb,
           String url,
           org.apache.hadoop.conf.Configuration config)
             throws IOException
  - Throws: <code>IOException</code>

processDumpJob

public void processDumpJob(String crawlDb,
                  String output,
                  org.apache.hadoop.conf.Configuration config,
                  String format,
                  String regex,
                  String status,
                  Integer retry)
                    throws IOException
  - Throws: <code>IOException</code>

processTopNJob

public void processTopNJob(String crawlDb,
                  long topN,
                  float min,
                  String output,
                  org.apache.hadoop.conf.Configuration config)
                    throws IOException
  - Throws: <code>IOException</code>
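
This page does not describe what topN and min mean. Assuming they match the Nutch readdb -topN option (emit the topN highest-scoring URLs and skip records whose score is below min), the selection can be sketched in plain Java as below. This is an in-memory illustration of that assumed behavior, not the MapReduce job itself. TopNSketch, its topN helper, and the sample URLs are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopNSketch {

    // Returns up to topN URLs ordered by descending score,
    // dropping any entry whose score is below min (assumed semantics).
    static List<String> topN(Map<String, Float> scores, long topN, float min) {
        List<Map.Entry<String, Float>> kept = new ArrayList<>();
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            if (e.getValue() >= min) {
                kept.add(e);
            }
        }
        // Highest score first.
        kept.sort(Map.Entry.<String, Float>comparingByValue().reversed());

        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, Float> e : kept) {
            if (urls.size() >= topN) {
                break;
            }
            urls.add(e.getKey());
        }
        return urls;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("http://a.example/", 0.5f);
        scores.put("http://b.example/", 2.0f);
        scores.put("http://c.example/", 1.0f);
        scores.put("http://d.example/", 0.1f);
        System.out.println(topN(scores, 2, 0.2f));
    }
}
```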
main

public static void main(String[] args)
                 throws IOException
  - Throws: <code>IOException</code>

Copyright © 2014 The Apache Software Foundation