org.apache.nutch.tools

Class DmozParser


public class DmozParser
extends Object

Utility that converts DMOZ RDF into a flat file of URLs to be injected.

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static org.slf4j.Logger | LOG |

Constructor Summary

| Constructor and Description |
| --- |
| DmozParser() |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| static void | main(String[] argv): Command-line access. |
| void | parseDmozFile(File dmozFile, int subsetDenom, boolean includeAdult, int skew, Pattern topicPattern): Iterate through all the items in this structured DMOZ file. |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

DmozParser

public DmozParser()

Method Detail

parseDmozFile

public void parseDmozFile(File dmozFile,
                 int subsetDenom,
                 boolean includeAdult,
                 int skew,
                 Pattern topicPattern)
                   throws IOException,
                          SAXException,
                          ParserConfigurationException

Iterate through all the items in this structured DMOZ file. Add each URL to the web db.

Throws:

  - <code>IOException</code>
  - <code>SAXException</code>
  - <code>ParserConfigurationException</code>
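
The declared exceptions suggest that the file is read with the JDK's SAX parser. The following is a minimal, self-contained sketch of that idea, not the Nutch implementation: the class `DmozSketch`, the method `extractUrls`, and the sample data are hypothetical, and the element and attribute names (`ExternalPage`, `about`, `topic`) are assumptions about the DMOZ RDF layout. It collects the URL of every page whose topic matches a pattern, which is roughly what the `topicPattern` parameter appears to control.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; it does not use or reproduce org.apache.nutch.tools.DmozParser.
public class DmozSketch {

  /** Returns the URLs of pages whose topic fully matches topicPattern (null matches all). */
  static List<String> extractUrls(String rdf, Pattern topicPattern)
      throws IOException, SAXException, ParserConfigurationException {
    List<String> urls = new ArrayList<>();
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(new InputSource(new StringReader(rdf)), new DefaultHandler() {
      private String currentUrl;      // URL of the page being read, if any
      private StringBuilder topic;    // non-null only while inside <topic>

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("ExternalPage".equals(qName)) {
          currentUrl = atts.getValue("about");   // assumed attribute name
        } else if ("topic".equals(qName) && currentUrl != null) {
          topic = new StringBuilder();
        }
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        if (topic != null) {
          topic.append(ch, start, length);
        }
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        if ("topic".equals(qName) && topic != null) {
          String name = topic.toString().trim();
          if (topicPattern == null || topicPattern.matcher(name).matches()) {
            urls.add(currentUrl);
          }
          topic = null;
        } else if ("ExternalPage".equals(qName)) {
          currentUrl = null;
        }
      }
    });
    return urls;
  }

  public static void main(String[] args) throws Exception {
    // Made-up sample data in an assumed, simplified DMOZ-like shape.
    String rdf =
        "<RDF>"
        + "<ExternalPage about=\"http://example.org/a\"><topic>Top/Arts/Music</topic></ExternalPage>"
        + "<ExternalPage about=\"http://example.org/b\"><topic>Top/Science/Physics</topic></ExternalPage>"
        + "</RDF>";
    for (String url : extractUrls(rdf, Pattern.compile("Top/Arts/.*"))) {
      System.out.println(url);   // one URL per line, as in a flat seed file
    }
  }
}
```

Streaming with SAX rather than building a DOM tree keeps memory use flat, which matters for a dump as large as the DMOZ directory.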
main

public static void main(String[] argv)
                 throws Exception

Command-line access. Users may add URLs via a flat text file or the structured DMOZ file. By default, adult material (as categorized by DMOZ) is ignored.

Throws:

  - <code>Exception</code>
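
The default filtering described above, together with the `subsetDenom` and `skew` parameters of `parseDmozFile`, can be pictured as a per-URL keep/skip decision. The sketch below is an illustration under stated assumptions, not the Nutch code: the class `DmozFilterSketch` and the method `keep` are hypothetical, the `Top/Adult` topic prefix is an assumption about how DMOZ labels adult material, and the "keep one URL in `subsetDenom`, shifted by `skew`" rule is a guess at what those parameter names mean.

```java
// Hypothetical sketch; the filtering rules here are assumptions, not the Nutch behavior.
public class DmozFilterSketch {

  /**
   * Decides whether a URL would be kept.
   * Assumption 1: adult material lives under the "Top/Adult" topic prefix.
   * Assumption 2: subsetDenom selects roughly one URL in subsetDenom by hash,
   *               and skew shifts which URLs fall into that subset.
   */
  static boolean keep(String topic, String url, boolean includeAdult, int subsetDenom, int skew) {
    if (!includeAdult && topic.startsWith("Top/Adult")) {
      return false;
    }
    // floorMod keeps the result non-negative even if the sum is negative.
    return Math.floorMod(url.hashCode() + skew, subsetDenom) == 0;
  }

  public static void main(String[] args) {
    // With subsetDenom = 1 every URL passes the subset test, so only the adult filter matters.
    System.out.println(keep("Top/Adult/Arts", "http://example.org/x", false, 1, 0)); // false
    System.out.println(keep("Top/Arts/Music", "http://example.org/a", false, 1, 0)); // true
    System.out.println(keep("Top/Adult/Arts", "http://example.org/x", true, 1, 0));  // true
  }
}
```

Hashing the URL rather than drawing a random number makes the subset reproducible across runs, and changing `skew` yields a different but equally sized sample.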

Copyright © 2014 The Apache Software Foundation