org.apache.nutch.scoring.webgraph
Class NodeDumper
- java.lang.Object
- org.apache.hadoop.conf.Configured
- org.apache.nutch.scoring.webgraph.NodeDumper
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class NodeDumper extends org.apache.hadoop.conf.Configured implements org.apache.hadoop.util.Tool
A tool that dumps the top urls by number of inlinks, number of outlinks, or score to a text file. One of its major uses is checking the top-scoring urls produced by a link analysis program such as LinkRank. Dumping by number of inlinks or outlinks requires that the WebGraph program has been run. Dumping by link analysis score requires that a program such as LinkRank has been run to update the NodeDb of the WebGraph.
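The selection this tool performs can be pictured as a "top N by value, descending" query over per-url values. The following is a minimal standalone sketch of that idea in plain Java, not the Nutch implementation: the class `TopNodesSketch`, the method `topN`, and the in-memory map are hypothetical stand-ins for the NodeDb and the MapReduce job that the real tool uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this class is not part of Nutch.
public class TopNodesSketch {

  /** Returns the keys of the topN entries, highest value first. */
  public static List<String> topN(Map<String, Float> values, long topN) {
    List<Map.Entry<String, Float>> entries = new ArrayList<>(values.entrySet());
    // Sort by value in descending order; List.sort is stable, so ties keep
    // the iteration order of the input map.
    entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());

    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Float> entry : entries) {
      if (result.size() >= topN) {
        break;
      }
      result.add(entry.getKey());
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumed sample data: url -> score (could equally be an inlink count).
    Map<String, Float> scores = new LinkedHashMap<>();
    scores.put("http://a.example/", 0.15f);
    scores.put("http://b.example/", 0.90f);
    scores.put("http://c.example/", 0.42f);

    // Prints http://b.example/ then http://c.example/
    for (String url : topN(scores, 2)) {
      System.out.println(url);
    }
  }
}
```

The same shape applies whether the value is an inlink count, an outlink count, or a link analysis score; only the source of the number changes.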
Nested Class Summary
Nested Classes (Modifier and Type, Class and Description)
- static class NodeDumper.Dumper - Outputs the hosts or domains with an associated value.
- static class NodeDumper.Sorter - Outputs the top urls sorted in descending order.
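To make "hosts or domains with an associated value" concrete, here is a small standalone sketch that groups per-url values by host and combines them. It is an assumption-laden illustration, not the Nutch code: the class `HostAggregateSketch`, the enum `Aggr`, and the choice of sum and max as combining rules are hypothetical, and this page does not spell out how `NodeDumper.Dumper` actually aggregates.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this class is not part of Nutch.
public class HostAggregateSketch {

  /** Assumed combining rules; a stand-in, not the Nutch AggrType. */
  public enum Aggr { SUM, MAX }

  /** Groups url values by host and combines them with the given rule. */
  public static Map<String, Float> byHost(Map<String, Float> urlValues, Aggr aggr) {
    Map<String, Float> perHost = new TreeMap<>();
    for (Map.Entry<String, Float> entry : urlValues.entrySet()) {
      String host = URI.create(entry.getKey()).getHost();
      if (host == null) {
        continue; // skip urls that have no host component
      }
      if (aggr == Aggr.SUM) {
        perHost.merge(host, entry.getValue(), Float::sum);
      } else {
        perHost.merge(host, entry.getValue(), Float::max);
      }
    }
    return perHost;
  }

  public static void main(String[] args) {
    // Assumed sample data: url -> value.
    Map<String, Float> values = new LinkedHashMap<>();
    values.put("http://a.example/x", 1.0f);
    values.put("http://a.example/y", 2.0f);
    values.put("http://b.example/", 4.0f);

    System.out.println(byHost(values, Aggr.SUM));
    System.out.println(byHost(values, Aggr.MAX));
  }
}
```

Grouping by domain instead of host would only change how the key is derived from the url; the combining step stays the same.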
Field Summary
Fields (Modifier and Type, Field and Description)
- static org.slf4j.Logger LOG
Constructor Summary
Constructors (Constructor and Description)
- NodeDumper()
Method Summary
Methods (Modifier and Type, Method and Description)
- void dumpNodes(org.apache.hadoop.fs.Path webGraphDb, org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type, long topN, org.apache.hadoop.fs.Path output, boolean asEff, org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType, org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType, boolean asSequenceFile) - Runs the process to dump the top urls out to a text file.
- static void main(String[] args)
- int run(String[] args) - Runs the node dumper tool.
-
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Detail
-
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
-
NodeDumper
public NodeDumper()
Method Detail
-
dumpNodes
public void dumpNodes(org.apache.hadoop.fs.Path webGraphDb, org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type, long topN, org.apache.hadoop.fs.Path output, boolean asEff, org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType, org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType, boolean asSequenceFile) throws Exception
Runs the process to dump the top urls out to a text file.
- Parameters:
- <code>webGraphDb</code> - The WebGraph from which to pull values.
- <code>topN</code> - The number of top urls to dump.
- <code>output</code> - The path to which the dump is written.
- Throws:
- <code>IOException</code> - If an error occurs while dumping the top values.
- <code>Exception</code>
-
main
public static void main(String[] args) throws Exception
- Throws:
- <code>Exception</code>
-
run
public int run(String[] args) throws Exception
Runs the node dumper tool.
- Specified by:
- <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code>
- Throws:
- <code>Exception</code>