org.apache.nutch.scoring.webgraph

Class ScoreUpdater

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.scoring.webgraph.ScoreUpdater
  • All Implemented Interfaces:
    • Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper, org.apache.hadoop.mapred.Reducer, org.apache.hadoop.util.Tool

public class ScoreUpdater
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.ObjectWritable>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,org.apache.hadoop.io.ObjectWritable,org.apache.hadoop.io.Text,CrawlDatum>

Copies scores from the WebGraph node database into the crawl database. Any crawl database entry that has no score in the node database has its score set to the clear score.
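
The update rule can be sketched in plain Java, without Hadoop or Nutch on the classpath. This is an illustration under stated assumptions, not the class's implementation: `ScoreMergeSketch`, `mergeScores`, and the `Map<String, Float>` stand-ins for the two databases are hypothetical, the clear score of `0.0f` is an assumed value, and ignoring URLs that appear only in the node database is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: maps of URL -> score stand in for the
// crawl database and the WebGraph node database.
public class ScoreMergeSketch {

    // Returns a new URL -> score map covering every URL in crawlDb.
    // A URL found in nodeDb takes the node score; any other URL takes clearScore.
    public static Map<String, Float> mergeScores(Map<String, Float> crawlDb,
                                                 Map<String, Float> nodeDb,
                                                 float clearScore) {
        Map<String, Float> updated = new LinkedHashMap<>();
        for (String url : crawlDb.keySet()) {
            Float nodeScore = nodeDb.get(url);
            updated.put(url, nodeScore != null ? nodeScore : clearScore);
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Float> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://a.example/", 1.0f);
        crawlDb.put("http://b.example/", 1.0f);

        Map<String, Float> nodeDb = new HashMap<>();
        nodeDb.put("http://a.example/", 2.5f); // only this URL has a node score

        // 0.0f is an assumed clear score chosen for the example.
        System.out.println(mergeScores(crawlDb, nodeDb, 0.0f));
    }
}
```

In the sketch, the entry for `http://a.example/` takes the node score and the entry for `http://b.example/` is reset to the clear score.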

Field Summary

Fields (modifier and type, field, and description):
  - <code>static org.slf4j.Logger LOG</code>

Constructor Summary

Constructors (constructor and description):
  - <code>ScoreUpdater()</code>

Method Summary

Methods (modifier and type, method, and description):
  - <code>void close()</code>
  - <code>void configure(org.apache.hadoop.mapred.JobConf conf)</code>
  - <code>static void main(String[] args)</code>
  - <code>void map(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Writable value, org.apache.hadoop.mapred.OutputCollector output, org.apache.hadoop.mapred.Reporter reporter)</code> - Changes input into ObjectWritables.
  - <code>void reduce(org.apache.hadoop.io.Text key, Iterator values, org.apache.hadoop.mapred.OutputCollector output, org.apache.hadoop.mapred.Reporter reporter)</code> - Creates new CrawlDatum objects with the updated score from the NodeDb or with a cleared score.
  - <code>int run(String[] args)</code> - Runs the ScoreUpdater tool.
  - <code>void update(org.apache.hadoop.fs.Path crawlDb, org.apache.hadoop.fs.Path webGraphDb)</code> - Copies the inlink scores from the web graph node database into the crawl database.

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable

getConf, setConf

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

ScoreUpdater

public ScoreUpdater()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf conf)
  - Specified by: <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>

map

public void map(org.apache.hadoop.io.Text key,
       org.apache.hadoop.io.Writable value,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.ObjectWritable> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException

Wraps each input value in an ObjectWritable.

  - Specified by: <code>map</code> in interface <code>org.apache.hadoop.mapred.Mapper&lt;org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.ObjectWritable&gt;</code>
  - Throws: <code>IOException</code>

reduce

public void reduce(org.apache.hadoop.io.Text key,
          Iterator<org.apache.hadoop.io.ObjectWritable> values,
          org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,CrawlDatum> output,
          org.apache.hadoop.mapred.Reporter reporter)
            throws IOException

Creates new CrawlDatum objects with the updated score from the NodeDb or with a cleared score.

  - Specified by: <code>reduce</code> in interface <code>org.apache.hadoop.mapred.Reducer&lt;org.apache.hadoop.io.Text,org.apache.hadoop.io.ObjectWritable,org.apache.hadoop.io.Text,CrawlDatum&gt;</code>
  - Throws: <code>IOException</code>
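
For a single URL, the reducer sees a mix of crawl entries and node entries and has to tell them apart before it can pick a score. A minimal plain-Java sketch of that dispatch follows. `ReduceSketch`, `Datum`, `NodeEntry`, and `reduceOne` are hypothetical stand-ins for the reducer, CrawlDatum, and a NodeDb record; skipping a key that has no crawl entry is an assumption of the sketch, not behavior stated on this page.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;

// Hypothetical illustration only; no Hadoop or Nutch types are used.
public class ReduceSketch {

    // Stand-in for CrawlDatum: only the score matters here.
    public static final class Datum {
        public float score;
        public Datum(float score) { this.score = score; }
    }

    // Stand-in for a NodeDb record carrying an inlink score.
    public static final class NodeEntry {
        public final float inlinkScore;
        public NodeEntry(float inlinkScore) { this.inlinkScore = inlinkScore; }
    }

    // Processes all values grouped under one URL key.
    // Returns the updated datum, or empty when no crawl entry was present.
    public static Optional<Datum> reduceOne(Iterator<Object> values, float clearScore) {
        Datum datum = null;
        NodeEntry node = null;
        while (values.hasNext()) {
            Object value = values.next();
            if (value instanceof Datum) {
                datum = (Datum) value;
            } else if (value instanceof NodeEntry) {
                node = (NodeEntry) value;
            }
        }
        if (datum == null) {
            return Optional.empty(); // assumption: nothing to update without a crawl entry
        }
        // Take the node's inlink score when available, otherwise the clear score.
        datum.score = (node != null) ? node.inlinkScore : clearScore;
        return Optional.of(datum);
    }

    public static void main(String[] args) {
        Iterator<Object> both =
            Arrays.<Object>asList(new Datum(1.0f), new NodeEntry(3.5f)).iterator();
        System.out.println(reduceOne(both, 0.0f).get().score);

        Iterator<Object> crawlOnly = Arrays.<Object>asList(new Datum(1.0f)).iterator();
        System.out.println(reduceOne(crawlOnly, 0.0f).get().score);
    }
}
```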

close

public void close()
  - Specified by: <code>close</code> in interface <code>Closeable</code>
  - Specified by: <code>close</code> in interface <code>AutoCloseable</code>

update

public void update(org.apache.hadoop.fs.Path crawlDb,
          org.apache.hadoop.fs.Path webGraphDb)
            throws IOException

Copies the inlink scores from the web graph node database into the crawl database.

  - Parameters:
  - <code>crawlDb</code> - The crawl database to update.
  - <code>webGraphDb</code> - The web graph database to use.
  - Throws: <code>IOException</code> - If an error occurs while updating the scores.

main

public static void main(String[] args)
                 throws Exception
  - Throws: <code>Exception</code>

run

public int run(String[] args)
        throws Exception

Runs the ScoreUpdater tool.

  - Specified by: <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code>
  - Throws: <code>Exception</code>

Copyright © 2014 The Apache Software Foundation