org.apache.nutch.scoring.webgraph
Class LinkDumper
- java.lang.Object
- org.apache.hadoop.conf.Configured
- org.apache.nutch.scoring.webgraph.LinkDumper
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class LinkDumper extends org.apache.hadoop.conf.Configured implements org.apache.hadoop.util.Tool
The LinkDumper tool creates a database of node-to-inlink information that can be read using the nested Reader class. This allows the inlink and scoring state of a single URL to be reviewed quickly to determine why that URL ranks the way it does. The tool is intended to be used with the LinkRank analysis.
Nested Class Summary
Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | LinkDumper.Inverter | Inverts outlinks from the WebGraph to inlinks and attaches node information. |
| static class | LinkDumper.LinkNode | Bean class which holds url to node information. |
| static class | LinkDumper.LinkNodes | Writable class which holds an array of LinkNode objects. |
| static class | LinkDumper.Merger | Merges LinkNode objects into a single array value per url. |
| static class | LinkDumper.Reader | Reader class which will print out the url and all of its inlinks to system out. |
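The invert and merge steps summarized for Inverter and Merger can be pictured with a small in-memory sketch. This is not the Nutch implementation, which runs as Hadoop jobs over the WebGraph; it is a minimal illustration in plain Java, assuming a hypothetical `InlinkSketch` class, a simplified `LinkNode` that carries only a source URL and a single score (the real LinkNode holds full node information), and made-up example URLs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Nutch.
public class InlinkSketch {

    /** Simplified stand-in for LinkDumper.LinkNode: a linking URL and its score (assumption). */
    static final class LinkNode {
        final String url;
        final float score;

        LinkNode(String url, float score) {
            this.url = url;
            this.score = score;
        }

        @Override
        public String toString() {
            return url + " (score " + score + ")";
        }
    }

    /**
     * Inverts a source -> outlinks map into a target -> inlinks map and
     * groups all inlink nodes for the same target under one key.
     */
    static Map<String, List<LinkNode>> invertAndMerge(
            Map<String, List<String>> outlinks, Map<String, Float> scores) {
        Map<String, List<LinkNode>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            // Attach the source node's score to every link it emits.
            float score = scores.getOrDefault(source, 0.0f);
            for (String target : entry.getValue()) {
                // Invert: the target becomes the key, the source becomes an inlink.
                inlinks.computeIfAbsent(target, k -> new ArrayList<>())
                       .add(new LinkNode(source, score));
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example/", List.of("http://b.example/", "http://c.example/"));
        outlinks.put("http://c.example/", List.of("http://b.example/"));
        Map<String, Float> scores = Map.of("http://a.example/", 0.5f, "http://c.example/", 0.25f);

        Map<String, List<LinkNode>> db = invertAndMerge(outlinks, scores);

        // Look up one URL and print its inlinks, similar in spirit to what a reader would do.
        String url = "http://b.example/";
        System.out.println(url);
        for (LinkNode node : db.getOrDefault(url, List.of())) {
            System.out.println("  inlink: " + node);
        }
    }
}
```

Grouping by target URL is what makes a single-URL lookup cheap afterwards: all inlinks for one URL sit under one key, which mirrors the "single array value per url" described for Merger.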
Field Summary
Fields

| Modifier and Type | Field |
| --- | --- |
| static String | DUMP_DIR |
| static org.slf4j.Logger | LOG |
Constructor Summary
Constructors

| Constructor |
| --- |
| LinkDumper() |
Method Summary
Methods

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | dumpLinks(org.apache.hadoop.fs.Path webGraphDb) | Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database. |
| static void | main(String[] args) | |
| int | run(String[] args) | Runs the LinkDumper tool. |
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Detail
LOG
public static final org.slf4j.Logger LOG
DUMP_DIR
public static final String DUMP_DIR
- See Also:
- [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.scoring.webgraph.LinkDumper.DUMP_DIR)
Constructor Detail
LinkDumper
public LinkDumper()
Method Detail
dumpLinks
public void dumpLinks(org.apache.hadoop.fs.Path webGraphDb) throws IOException
Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.
- Throws:
- <code>IOException</code>
main
public static void main(String[] args) throws Exception
- Throws:
- <code>Exception</code>
run
public int run(String[] args) throws Exception
Runs the LinkDumper tool. This only creates the database; to read its values, use the nested Reader tool.
- Specified by:
- <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code>
- Throws:
- <code>Exception</code>