
org.apache.nutch.scoring.webgraph

Class LinkDumper

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.scoring.webgraph.LinkDumper

All Implemented Interfaces: org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class LinkDumper
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

The LinkDumper tool creates a database of node-to-inlink information that can be read using the nested Reader class. This allows the inlinks and scoring state of a single URL to be reviewed quickly to determine why that URL ranks the way it does. The tool is intended for use with the LinkRank analysis.

Nested Class Summary

| Modifier and Type | Class | Description |
|---|---|---|
| static class | LinkDumper.Inverter | Inverts outlinks from the WebGraph to inlinks and attaches node information. |
| static class | LinkDumper.LinkNode | Bean class which holds url to node information. |
| static class | LinkDumper.LinkNodes | Writable class which holds an array of LinkNode objects. |
| static class | LinkDumper.Merger | Merges LinkNode objects into a single array value per url. |
| static class | LinkDumper.Reader | Reader class which will print out the url and all of its inlinks to system out. |

Field Summary

| Modifier and Type | Field |
|---|---|
| static String | DUMP_DIR |
| static org.slf4j.Logger | LOG |

Constructor Summary

| Constructor |
|---|
| LinkDumper() |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | dumpLinks(org.apache.hadoop.fs.Path webGraphDb) | Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database. |
| static void | main(String[] args) | |
| int | run(String[] args) | Runs the LinkDumper tool. |

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable

getConf, setConf

Field Detail

LOG

public static final org.slf4j.Logger LOG

DUMP_DIR

public static final String DUMP_DIR
See Also: [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.scoring.webgraph.LinkDumper.DUMP_DIR)

Constructor Detail

LinkDumper

public LinkDumper()

Method Detail

dumpLinks

public void dumpLinks(org.apache.hadoop.fs.Path webGraphDb)
               throws IOException

Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.

  - Throws: 
  - <code>IOException</code>       
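
The invert-then-merge idea behind the two jobs can be illustrated with a small in-memory sketch. This is not the Nutch implementation: the real tool runs Hadoop jobs over a WebGraph and attaches node information to each inlink, which is omitted here. The class name `InlinkInversionSketch`, its methods, and the example URLs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory sketch only; all names here are hypothetical.
class InlinkInversionSketch {

  // Phase 1 (invert): turn each source -> target outlink into a {target, source} pair.
  static List<String[]> invert(Map<String, List<String>> outlinks) {
    List<String[]> pairs = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
      for (String target : e.getValue()) {
        pairs.add(new String[] {target, e.getKey()});
      }
    }
    return pairs;
  }

  // Phase 2 (merge): collect every inlink source for the same target URL into one list.
  static Map<String, List<String>> merge(List<String[]> pairs) {
    Map<String, List<String>> inlinks = new LinkedHashMap<>();
    for (String[] pair : pairs) {
      inlinks.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
    }
    return inlinks;
  }

  public static void main(String[] args) {
    Map<String, List<String>> outlinks = new LinkedHashMap<>();
    outlinks.put("http://a.example/", Arrays.asList("http://b.example/", "http://c.example/"));
    outlinks.put("http://b.example/", Arrays.asList("http://c.example/"));

    // Each key is a URL; each value lists the URLs that link to it.
    System.out.println(merge(invert(outlinks)));
  }
}
```

Keeping the two phases separate mirrors the description above: inversion emits one record per link, and merging groups those records into a single value per URL.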

main

public static void main(String[] args)
                 throws Exception
  - Throws: 
  - <code>Exception</code>       

run

public int run(String[] args)
        throws Exception

Runs the LinkDumper tool. This only creates the database; to read the values, use the nested Reader tool.

  - Specified by: 
  - <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code> 
  - Throws: 
  - <code>Exception</code>      
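
Because creating the database and reading it are separate steps, the read side can be pictured as a per-URL lookup that prints the URL and its inlinks. The sketch below is only an analogy built on a plain map. It does not use the Reader class or the on-disk format, and the `Inlink` type, its single `score` field, and the output layout are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative lookup sketch only; all names and the output format are hypothetical.
class InlinkLookupSketch {

  // Hypothetical stand-in for an inlink entry: the linking URL plus one score value.
  static final class Inlink {
    final String url;
    final double score;

    Inlink(String url, double score) {
      this.url = url;
      this.score = score;
    }
  }

  // Formats the URL followed by one line per inlink; unknown URLs yield only the header line.
  static String describe(Map<String, List<Inlink>> db, String url) {
    StringBuilder sb = new StringBuilder(url).append(":\n");
    for (Inlink in : db.getOrDefault(url, Collections.<Inlink>emptyList())) {
      sb.append("  ").append(in.url).append(" - score ").append(in.score).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, List<Inlink>> db = new HashMap<>();
    db.put("http://c.example/", Arrays.asList(
        new Inlink("http://a.example/", 0.5),
        new Inlink("http://b.example/", 0.25)));

    System.out.print(describe(db, "http://c.example/"));
  }
}
```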

Copyright © 2014 The Apache Software Foundation