
org.apache.nutch.scoring.webgraph

Class NodeDumper

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.scoring.webgraph.NodeDumper

All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class NodeDumper
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

A tool that dumps the top URLs, ranked by number of inlinks, number of outlinks, or score, to a text file. One of the major uses of this tool is to check the top-scoring URLs of a link analysis program such as LinkRank. Dumping by number of inlinks or outlinks requires that the WebGraph program has been run. Dumping by link analysis score requires that a program such as LinkRank has been run, since it is what updates the NodeDb of the WebGraph with scores.
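The ranking described above can be illustrated with a small in-memory sketch. It is not NodeDumper's Hadoop implementation: the `Node` class and the `Metric` enum are hypothetical stand-ins for the per-URL values the tool reads from the WebGraph's NodeDb. The sketch sorts by the chosen value in descending order and keeps the first `topN` entries.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** In-memory sketch of "top N URLs by a chosen value"; not the NodeDumper MapReduce job. */
public class TopNodes {

  /** Hypothetical stand-in for the values NodeDumper can rank by. */
  public enum Metric { INLINKS, OUTLINKS, SCORE }

  /** Hypothetical node: a URL with its inlink/outlink counts and link analysis score. */
  public static final class Node {
    final String url;
    final int inlinks;
    final int outlinks;
    final float score;

    public Node(String url, int inlinks, int outlinks, float score) {
      this.url = url;
      this.inlinks = inlinks;
      this.outlinks = outlinks;
      this.score = score;
    }

    /** Returns the value this node is ranked by for the given metric. */
    double value(Metric metric) {
      switch (metric) {
        case INLINKS: return inlinks;
        case OUTLINKS: return outlinks;
        default: return score;
      }
    }
  }

  /** Returns up to topN URLs, highest value first (ties keep their input order). */
  public static List<String> topUrls(List<Node> nodes, Metric metric, long topN) {
    return nodes.stream()
        .sorted(Comparator.comparingDouble((Node n) -> n.value(metric)).reversed())
        .limit(topN)
        .map(n -> n.url)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Node> nodes = List.of(
        new Node("http://a.example/", 12, 3, 0.8f),
        new Node("http://b.example/", 40, 1, 0.2f),
        new Node("http://c.example/", 5, 9, 1.5f));
    System.out.println(topUrls(nodes, Metric.INLINKS, 2)); // [http://b.example/, http://a.example/]
    System.out.println(topUrls(nodes, Metric.SCORE, 2));   // [http://c.example/, http://a.example/]
  }
}
```

A full sort is the simplest correct choice for a small in-memory list; a job over a large NodeDb would instead distribute the work, which is what the Hadoop-based tool is for.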

Nested Class Summary

Nested Classes

  • static class NodeDumper.Dumper
    Outputs the hosts or domains with an associated value.
  • static class NodeDumper.Sorter
    Outputs the top URLs sorted in descending order.
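NodeDumper.Dumper is described only as outputting hosts or domains with an associated value. The sketch below assumes that value is formed by combining the values of all URLs on the same host, and shows summing and taking the maximum as two illustrative choices. The rule NodeDumper actually applies is selected by its `AggrType` parameter, which this page does not document. `HostAggregation.byHost` is a hypothetical helper, and grouping by domain (which needs public-suffix handling) is not shown.

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

/** In-memory sketch of grouping per-URL values by host; not NodeDumper.Dumper itself. */
public class HostAggregation {

  /**
   * Assumption: the per-host value is obtained by combining per-URL values with
   * the given operator (e.g. sum or max). Returns hosts in sorted order.
   */
  public static Map<String, Double> byHost(Map<String, Double> urlValues,
                                           BinaryOperator<Double> combine) {
    Map<String, Double> result = new TreeMap<>();
    for (Map.Entry<String, Double> e : urlValues.entrySet()) {
      String host = URI.create(e.getKey()).getHost();
      if (host == null) {
        continue; // skip URIs without a host component, e.g. relative URIs
      }
      result.merge(host, e.getValue(), combine);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Double> scores = Map.of(
        "http://a.example/x", 0.5,
        "http://a.example/y", 1.5,
        "http://b.example/", 0.25);
    System.out.println(byHost(scores, Double::sum)); // {a.example=2.0, b.example=0.25}
    System.out.println(byHost(scores, Math::max));   // {a.example=1.5, b.example=0.25}
  }
}
```

Passing the combining operator as a parameter keeps the grouping logic independent of the aggregation rule, so switching between sum and max does not change the traversal.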

Field Summary

Fields

  • static org.slf4j.Logger LOG

Constructor Summary

Constructors

  • NodeDumper()

Method Summary

Methods

  • void dumpNodes(org.apache.hadoop.fs.Path webGraphDb, org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type, long topN, org.apache.hadoop.fs.Path output, boolean asEff, org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType, org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType, boolean asSequenceFile)
    Runs the process to dump the top URLs out to a text file.
  • static void main(String[] args)
  • int run(String[] args)
    Runs the node dumper tool.

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable

getConf, setConf

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

NodeDumper

public NodeDumper()

Method Detail

dumpNodes

public void dumpNodes(org.apache.hadoop.fs.Path webGraphDb,
             org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type,
             long topN,
             org.apache.hadoop.fs.Path output,
             boolean asEff,
             org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType,
             org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType,
             boolean asSequenceFile)
               throws Exception

Runs the process to dump the top URLs out to a text file.

  - Parameters:
  - <code>webGraphDb</code> - The WebGraph from which to pull values.
  - <code>topN</code> - The number of top values to dump.
  - <code>output</code> - The path to which the dumped values are written.
  - Throws: 
  - <code>IOException</code> - If an error occurs while dumping the top values. 
  - <code>Exception</code>       

main

public static void main(String[] args)
                 throws Exception
  - Throws: 
  - <code>Exception</code>       

run

public int run(String[] args)
        throws Exception

Runs the node dumper tool.

  - Specified by: 
  - <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code> 
  - Throws: 
  - <code>Exception</code>      

Copyright © 2014 The Apache Software Foundation