
org.apache.nutch.crawl

Class LinkDb

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.crawl.LinkDb

All Implemented Interfaces:
  Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,ParseData,org.apache.hadoop.io.Text,Inlinks>, org.apache.hadoop.util.Tool

public class LinkDb
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,ParseData,org.apache.hadoop.io.Text,Inlinks>

Maintains an inverted link map, listing the incoming links for each URL.
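The idea of an inverted link map can be illustrated with a minimal, plain-Java sketch. It is not the Nutch implementation (which this page only describes as a Hadoop `Mapper` over `Text` keys and `ParseData` values, emitting `Inlinks`); the class name `InvertedLinkMapSketch`, the method `invert`, and the use of in-memory `Map`/`List` collections are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of link inversion; not part of Nutch. */
public class InvertedLinkMapSketch {

    /**
     * Turns a "source URL -> outgoing links" map into a
     * "target URL -> incoming links" map.
     */
    public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        // LinkedHashMap keeps targets in the order they are first seen.
        Map<String, List<String>> inlinks = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> page : outlinks.entrySet()) {
            String fromUrl = page.getKey();
            for (String toUrl : page.getValue()) {
                // Record fromUrl as an incoming link of toUrl.
                inlinks.computeIfAbsent(toUrl, k -> new ArrayList<>()).add(fromUrl);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example/",
                Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/",
                Arrays.asList("http://c.example/"));

        // Prints each target URL followed by the pages that link to it.
        System.out.println(invert(outlinks));
    }
}
```

In this sketch, `http://c.example/` ends up with two incoming links (from `a` and `b`), while `http://a.example/` does not appear at all because nothing links to it.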

Field Summary

Fields

| Modifier and Type | Field and Description |
| --- | --- |
| `static String` | `CURRENT_NAME` |
| `static String` | `IGNORE_INTERNAL_LINKS` |
| `static String` | `LOCK_NAME` |
| `static org.slf4j.Logger` | `LOG` |

Constructor Summary

Constructors

| Constructor and Description |
| --- |
| `LinkDb()` |
| `LinkDb(org.apache.hadoop.conf.Configuration conf)` |

Method Summary

Methods

| Modifier and Type | Method and Description |
| --- | --- |
| `void` | `close()` |
| `void` | `configure(org.apache.hadoop.mapred.JobConf job)` |
| `static void` | `install(org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path linkDb)` |
| `void` | `invert(org.apache.hadoop.fs.Path linkDb, org.apache.hadoop.fs.Path[] segments, boolean normalize, boolean filter, boolean force)` |
| `void` | `invert(org.apache.hadoop.fs.Path linkDb, org.apache.hadoop.fs.Path segmentsDir, boolean normalize, boolean filter, boolean force)` |
| `static void` | `main(String[] args)` |
| `void` | `map(org.apache.hadoop.io.Text key, ParseData parseData, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output, org.apache.hadoop.mapred.Reporter reporter)` |
| `int` | `run(String[] args)` |


Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait


Methods inherited from interface org.apache.hadoop.conf.Configurable

getConf, setConf

Field Detail


LOG

public static final org.slf4j.Logger LOG

IGNORE_INTERNAL_LINKS

public static final String IGNORE_INTERNAL_LINKS
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDb.IGNORE_INTERNAL_LINKS)

CURRENT_NAME

public static final String CURRENT_NAME
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDb.CURRENT_NAME)

LOCK_NAME

public static final String LOCK_NAME
  - See Also: [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.LinkDb.LOCK_NAME)

Constructor Detail


LinkDb

public LinkDb()

LinkDb

public LinkDb(org.apache.hadoop.conf.Configuration conf)

Method Detail


configure

public void configure(org.apache.hadoop.mapred.JobConf job)
  - Specified by: <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>

close

public void close()
  - Specified by: <code>close</code> in interface <code>Closeable</code>
  - Specified by: <code>close</code> in interface <code>AutoCloseable</code>

map

public void map(org.apache.hadoop.io.Text key,
       ParseData parseData,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
  - Specified by: <code>map</code> in interface <code>org.apache.hadoop.mapred.Mapper&lt;org.apache.hadoop.io.Text,ParseData,org.apache.hadoop.io.Text,Inlinks&gt;</code>
  - Throws: <code>IOException</code>

invert

public void invert(org.apache.hadoop.fs.Path linkDb,
          org.apache.hadoop.fs.Path segmentsDir,
          boolean normalize,
          boolean filter,
          boolean force)
            throws IOException
  - Throws: <code>IOException</code>

invert

public void invert(org.apache.hadoop.fs.Path linkDb,
          org.apache.hadoop.fs.Path[] segments,
          boolean normalize,
          boolean filter,
          boolean force)
            throws IOException
  - Throws: <code>IOException</code>

install

public static void install(org.apache.hadoop.mapred.JobConf job,
           org.apache.hadoop.fs.Path linkDb)
                    throws IOException
  - Throws: <code>IOException</code>

main

public static void main(String[] args)
                 throws Exception
  - Throws: <code>Exception</code>

run

public int run(String[] args)
        throws Exception
  - Specified by: <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code>
  - Throws: <code>Exception</code>

Copyright © 2014 The Apache Software Foundation