org.apache.nutch.scoring.webgraph
Class LoopReader
- java.lang.Object
- org.apache.hadoop.conf.Configured
- org.apache.nutch.scoring.webgraph.LoopReader
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable
public class LoopReader extends org.apache.hadoop.conf.Configured
The LoopReader tool prints the loopset information for a single url.
Constructor Summary
Constructor and Description
LoopReader()
LoopReader(org.apache.hadoop.conf.Configuration conf)
Method Summary
Modifier and Type, Method and Description
void dumpUrl(org.apache.hadoop.fs.Path webGraphDb, String url)
Prints loopset for a single url.
static void main(String[] args)
Runs the LoopReader tool.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
LoopReader
public LoopReader()
LoopReader
public LoopReader(org.apache.hadoop.conf.Configuration conf)
Method Detail
dumpUrl
public void dumpUrl(org.apache.hadoop.fs.Path webGraphDb, String url) throws IOException
Prints loopset for a single url. The loopset information will show any outlink url that eventually forms a link cycle.
- Parameters:
- <code>webGraphDb</code> - The WebGraph to check for loops
- <code>url</code> - The url to check.
- Throws:
- <code>IOException</code> - If an error occurs while printing loopset information.
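To make "any outlink url that eventually forms a link cycle" concrete, the following is a minimal in-memory sketch. It is not the Nutch implementation, which reads loopset data produced by the loops job from the WebGraph. The class `LoopSetSketch`, its methods `loopSet` and `reaches`, and the example URLs are all hypothetical names introduced for illustration. The sketch assumes a loopset for a url is the set of its outlinks from which that url can be reached again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of org.apache.nutch.
public class LoopSetSketch {

    // Returns the outlinks of `url` from which `url` can be reached again,
    // i.e. the outlinks that eventually form a link cycle back to `url`.
    static Set<String> loopSet(Map<String, List<String>> outlinks, String url) {
        Set<String> result = new TreeSet<>();
        for (String outlink : outlinks.getOrDefault(url, List.of())) {
            if (reaches(outlinks, outlink, url)) {
                result.add(outlink);
            }
        }
        return result;
    }

    // Iterative depth-first search: is `target` reachable from `start`?
    static boolean reaches(Map<String, List<String>> outlinks, String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (current.equals(target)) {
                return true;
            }
            if (!seen.add(current)) {
                continue; // already expanded this url
            }
            for (String next : outlinks.getOrDefault(current, List.of())) {
                stack.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // a -> b -> c -> a is a cycle; a -> d is a dead end.
        Map<String, List<String>> graph = Map.of(
            "http://a.example/", List.of("http://b.example/", "http://d.example/"),
            "http://b.example/", List.of("http://c.example/"),
            "http://c.example/", List.of("http://a.example/"),
            "http://d.example/", List.of());
        // prints [http://b.example/]
        System.out.println(loopSet(graph, "http://a.example/"));
    }
}
```

In this toy graph only the outlink to `http://b.example/` leads back to the starting url, so it is the only member of the sketched loopset; the outlink to `http://d.example/` never returns and is excluded.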
main
public static void main(String[] args) throws Exception
Runs the LoopReader tool. For this tool to work, the loops job must already have been run on the corresponding WebGraph.
- Throws:
- <code>Exception</code>