
org.apache.nutch.scoring.webgraph

Class WebGraph.OutlinkDb

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb

All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper, org.apache.hadoop.mapred.Reducer

Enclosing class:
WebGraph

public static class WebGraph.OutlinkDb
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,NutchWritable>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,NutchWritable,org.apache.hadoop.io.Text,LinkDatum>

The OutlinkDb creates a database of all outlinks. Outlinks to internal URLs, i.e. URLs on the same domain or the same host, can be ignored. The number of outlinks to a given page or domain can also be limited.
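The two rules in the paragraph above can be modeled in plain Java. This is a minimal sketch, not Nutch's implementation: given a source URL and its candidate outlinks, it drops links to the source's own host or domain when the corresponding flag is set, and keeps at most one link per target page and per target domain when limiting is enabled. The class `OutlinkRulesSketch`, its four flags, the "last two host labels" domain heuristic and the definition of a page as scheme, host and path are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the OutlinkDb link rules; not Nutch's implementation.
final class OutlinkRulesSketch {

  private final boolean ignoreHost;   // drop links to the source's own host
  private final boolean ignoreDomain; // drop links to the source's own domain
  private final boolean limitPages;   // at most one link per target page
  private final boolean limitDomains; // at most one link per target domain

  OutlinkRulesSketch(boolean ignoreHost, boolean ignoreDomain,
                     boolean limitPages, boolean limitDomains) {
    this.ignoreHost = ignoreHost;
    this.ignoreDomain = ignoreDomain;
    this.limitPages = limitPages;
    this.limitDomains = limitDomains;
  }

  static String host(String url) {
    String h = URI.create(url).getHost();
    return h == null ? "" : h.toLowerCase(Locale.ROOT);
  }

  // Assumption: the domain is the last two labels of the host. A real
  // implementation needs public-suffix awareness (e.g. for "example.co.uk").
  static String domain(String url) {
    String h = host(url);
    String[] labels = h.split("\\.");
    int n = labels.length;
    return n <= 2 ? h : labels[n - 2] + "." + labels[n - 1];
  }

  // Assumption: a "page" is scheme + host + path, ignoring query and fragment.
  static String page(String url) {
    URI u = URI.create(url);
    String path = u.getPath() == null ? "" : u.getPath();
    return u.getScheme() + "://" + host(url) + path;
  }

  // Returns the outlinks of fromUrl that survive the enabled rules, in input order.
  List<String> filter(String fromUrl, List<String> outlinks) {
    String fromHost = host(fromUrl);
    String fromDomain = domain(fromUrl);
    Set<String> seenPages = new HashSet<>();
    Set<String> seenDomains = new HashSet<>();
    List<String> kept = new ArrayList<>();
    for (String toUrl : outlinks) {
      String toDomain = domain(toUrl);
      String toPage = page(toUrl);
      if (ignoreHost && host(toUrl).equals(fromHost)) continue;     // internal host link
      if (ignoreDomain && toDomain.equals(fromDomain)) continue;    // internal domain link
      if (limitPages && seenPages.contains(toPage)) continue;       // page already linked
      if (limitDomains && seenDomains.contains(toDomain)) continue; // domain already linked
      kept.add(toUrl);
      seenPages.add(toPage);
      seenDomains.add(toDomain);
    }
    return kept;
  }
}
```

For instance, with all four flags on and the source `http://www.example.com/`, a link to `http://blog.example.com/post` is dropped as internal to the domain, and of `http://news.test.org/a?x=1`, `http://news.test.org/a?x=2` and `http://other.test.org/b` only the first survives: the second targets the same page and the third the same domain.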

Field Summary

Fields

| Modifier and Type | Field and Description |
|---|---|
| static String | URL_FILTERING |
| static String | URL_NORMALIZING |

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| WebGraph.OutlinkDb() | Default constructor. |
| WebGraph.OutlinkDb(org.apache.hadoop.conf.Configuration conf) | Configurable constructor. |

Method Summary

Methods

| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | |
| void | configure(org.apache.hadoop.mapred.JobConf conf) | Configures the OutlinkDb job. |
| void | map(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Writable value, org.apache.hadoop.mapred.OutputCollector output, org.apache.hadoop.mapred.Reporter reporter) | Passes through existing LinkDatum objects from an existing OutlinkDb and maps out new LinkDatum objects from the ParseData of new crawls. |
| void | reduce(org.apache.hadoop.io.Text key, Iterator values, org.apache.hadoop.mapred.OutputCollector output, org.apache.hadoop.mapred.Reporter reporter) | |

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

URL_NORMALIZING

public static final String URL_NORMALIZING
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb.URL_NORMALIZING)       

URL_FILTERING

public static final String URL_FILTERING
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb.URL_FILTERING)       

Constructor Detail

WebGraph.OutlinkDb

public WebGraph.OutlinkDb()

Default constructor.

WebGraph.OutlinkDb

public WebGraph.OutlinkDb(org.apache.hadoop.conf.Configuration conf)

Configurable constructor.

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf conf)

Configures the OutlinkDb job. Sets up the handling of internal links and link limiting.

  - Specified by: 
  - `configure` in interface `org.apache.hadoop.mapred.JobConfigurable`
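A hypothetical sketch of the configuration step: reading boolean switches from the job configuration. A plain `Map` stands in for `JobConf`, and the helper is a simplified analogue of Hadoop's `Configuration.getBoolean(name, defaultValue)`. The property names and defaults are assumptions based on Nutch 1.x's `nutch-default.xml`; the authoritative values of `URL_NORMALIZING` and `URL_FILTERING` are on the Constant Field Values pages linked in the Field Detail section.

```java
import java.util.Map;

// Hypothetical stand-in for OutlinkDb.configure(JobConf): reads boolean
// switches from a plain Map instead of a Hadoop JobConf.
final class OutlinkDbConfigSketch {

  // Property names are assumptions; check nutch-default.xml and the
  // Constant Field Values page for the authoritative keys.
  static final String IGNORE_INTERNAL_HOST = "link.ignore.internal.host";
  static final String IGNORE_INTERNAL_DOMAIN = "link.ignore.internal.domain";
  static final String LIMIT_PAGE = "link.ignore.limit.page";
  static final String LIMIT_DOMAIN = "link.ignore.limit.domain";
  static final String URL_NORMALIZING = "webgraph.url.normalizers";
  static final String URL_FILTERING = "webgraph.url.filters";

  boolean ignoreHost;
  boolean ignoreDomain;
  boolean limitPages;
  boolean limitDomains;
  boolean normalize;
  boolean filter;

  // Simplified analogue of Configuration.getBoolean: missing key -> default.
  private static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
    String v = conf.get(key);
    return v == null ? dflt : Boolean.parseBoolean(v.trim());
  }

  void configure(Map<String, String> conf) {
    // Defaults below are assumed, not taken from this page.
    ignoreHost = getBoolean(conf, IGNORE_INTERNAL_HOST, true);
    ignoreDomain = getBoolean(conf, IGNORE_INTERNAL_DOMAIN, true);
    limitPages = getBoolean(conf, LIMIT_PAGE, true);
    limitDomains = getBoolean(conf, LIMIT_DOMAIN, true);
    normalize = getBoolean(conf, URL_NORMALIZING, false);
    filter = getBoolean(conf, URL_FILTERING, false);
  }
}
```

Keeping every switch as a plain boolean field read once in `configure` means the per-record `map` and `reduce` calls only test fields and never touch the configuration again.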

map

public void map(org.apache.hadoop.io.Text key,
       org.apache.hadoop.io.Writable value,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,NutchWritable> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException

Passes through existing LinkDatum objects from an existing OutlinkDb and maps out new LinkDatum objects from the ParseData of new crawls.

  - Specified by: 
  - `map` in interface `org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,NutchWritable>`
  - Throws: 
  - `IOException`
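The following is a simplified, hypothetical model of the `map` contract described above, using plain Java stand-ins instead of Hadoop and Nutch types: `Link` plays the role of `LinkDatum`, `Parsed` the role of `ParseData`, and the returned list the role of the `OutputCollector`. Existing links pass through unchanged; a parsed page yields one link per distinct target URL, stamped with the page's fetch time. The de-duplication rule (keep the first non-null anchor per target) is an assumption based on our reading of the Nutch 1.x source, and URL normalization and filtering are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of OutlinkDb.map; not Nutch's code.
final class OutlinkMapSketch {

  // Stand-in for LinkDatum: target URL, anchor text, timestamp.
  static final class Link {
    final String url;
    final String anchor;
    final long timestamp;
    Link(String url, String anchor, long timestamp) {
      this.url = url;
      this.anchor = anchor;
      this.timestamp = timestamp;
    }
  }

  // Stand-in for ParseData: fetch time plus {toUrl, anchor} outlink pairs.
  static final class Parsed {
    final long fetchTime;
    final String[][] outlinks;
    Parsed(long fetchTime, String[][] outlinks) {
      this.fetchTime = fetchTime;
      this.outlinks = outlinks;
    }
  }

  // Emits Link records for one input value; the returned list plays the
  // role of the OutputCollector.
  static List<Link> map(Object value) {
    List<Link> out = new ArrayList<>();
    if (value instanceof Link) {
      // Existing outlink from a previous OutlinkDb: pass it through.
      out.add((Link) value);
    } else if (value instanceof Parsed) {
      Parsed p = (Parsed) value;
      // One entry per target URL, keeping the first non-null anchor;
      // LinkedHashMap preserves first-seen order of target URLs.
      Map<String, String> byUrl = new LinkedHashMap<>();
      for (String[] o : p.outlinks) {
        String toUrl = o[0];
        String anchor = o[1];
        if (toUrl == null) continue;
        if (!byUrl.containsKey(toUrl) || byUrl.get(toUrl) == null) {
          byUrl.put(toUrl, anchor);
        }
      }
      for (Map.Entry<String, String> e : byUrl.entrySet()) {
        out.add(new Link(e.getKey(), e.getValue(), p.fetchTime));
      }
    }
    return out;
  }
}
```

Stamping every new link with the page's fetch time is what lets a later step tell the outlinks of the latest fetch apart from older ones passed through from a previous OutlinkDb.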

reduce

public void reduce(org.apache.hadoop.io.Text key,
          Iterator<NutchWritable> values,
          org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,LinkDatum> output,
          org.apache.hadoop.mapred.Reporter reporter)
            throws IOException
  - Specified by: 
  - `reduce` in interface `org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,NutchWritable,org.apache.hadoop.io.Text,LinkDatum>`
  - Throws: 
  - `IOException`
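This page does not describe what `reduce` does. Assuming it merges the old and new `LinkDatum` values collected for one source URL and keeps only those carrying the newest timestamp (our reading of the Nutch 1.x source, not something stated here), the core selection step might look like the sketch below, before the internal-link and limiting rules from the class description are applied. `NewestOutlinksSketch` and its `Link` type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "keep only the newest outlinks" step of a reducer.
final class NewestOutlinksSketch {

  // Stand-in for LinkDatum, reduced to the fields this step needs.
  static final class Link {
    final String url;
    final long timestamp;
    Link(String url, long timestamp) {
      this.url = url;
      this.timestamp = timestamp;
    }
  }

  // Returns the links whose timestamp equals the maximum timestamp among all
  // values collected for one source URL, preserving input order.
  static List<Link> newest(Iterable<Link> values) {
    List<Link> all = new ArrayList<>();
    long mostRecent = Long.MIN_VALUE;
    for (Link l : values) {
      all.add(l);
      mostRecent = Math.max(mostRecent, l.timestamp);
    }
    List<Link> kept = new ArrayList<>();
    for (Link l : all) {
      if (l.timestamp == mostRecent) {
        kept.add(l);
      }
    }
    return kept;
  }
}
```

A design note: in a real Hadoop reducer the values `Iterator` may return the same object on every call to `next()`, so code that buffers values, as this sketch does, must copy each one first (for example with `org.apache.hadoop.io.WritableUtils.clone`).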

close

public void close()
  - Specified by: 
  - `close` in interface `Closeable`
  - Specified by: 
  - `close` in interface `AutoCloseable`

Copyright © 2014 The Apache Software Foundation