org.apache.nutch.parse.ext

Class ExtParser

    • All Implemented Interfaces: org.apache.hadoop.conf.Configurable, Parser, Pluggable

public class ExtParser
extends Object
implements Parser

A wrapper that invokes an external command to do the actual parsing.

  • Author: John Xing
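
To illustrate what "invoking an external command" can look like in Java, here is a minimal sketch that pipes bytes to a command's standard input and collects its standard output. It is not taken from the ExtParser source: the class name `ExternalCommandSketch`, the method `runCommand`, and the timeout parameter are all hypothetical, and the usage line assumes a Unix-like system where `cat` is available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of Nutch.
final class ExternalCommandSketch {

    /** Runs a command, writes input to its stdin, and returns its stdout as UTF-8 text. */
    static String runCommand(List<String> command, byte[] input, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard stderr so an unread error stream cannot block the child (Java 9+).
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();

        // Simplification: all input is written before any output is read.
        // That is fine for small inputs; large inputs need a separate reader thread.
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(input);
        }

        byte[] output;
        try (InputStream stdout = process.getInputStream()) {
            output = stdout.readAllBytes(); // blocks until the command closes stdout
        }

        // The timeout only bounds the wait after stdout has been closed.
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Command did not exit in time: " + command);
        }
        if (process.exitValue() != 0) {
            throw new IOException("Command failed with exit code " + process.exitValue());
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

For example, `ExternalCommandSketch.runCommand(List.of("cat"), bytes, 5)` would echo the input back as text. A wrapper parser would then turn that text into its parse result.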

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static org.slf4j.Logger | LOG |

Fields inherited from interface org.apache.nutch.parse.Parser

X_POINT_ID

Constructor Summary

| Constructor and Description |
| --- |
| ExtParser() |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| org.apache.hadoop.conf.Configuration | getConf() |
| ParseResult | getParse(Content content): This method parses the given content and returns a map of <key, parse> pairs. |
| void | setConf(org.apache.hadoop.conf.Configuration conf) |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

ExtParser

public ExtParser()

Method Detail

getParse

public ParseResult getParse(Content content)

Description copied from interface: Parser

This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.

Note: Meta-redirects should be followed only when they come from the original URL. That is:

Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta-redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.

  - Specified by: <code>getParse</code> in interface <code>Parser</code>
  - Parameters: <code>content</code> - Content to be parsed
  - Returns: a map containing <key, parse> pairs
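
The meta-redirect rule in the note above can be sketched as a simple lookup: follow the redirect only when the map holds an entry keyed by the URL being processed, and that entry's status signals a redirect. The types below (`Status`, `ParseEntry`) are hypothetical stand-ins chosen to keep the example self-contained; they are not the Nutch `ParseResult`, `Parse`, or `ParseStatus` classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the redirect rule; not Nutch code.
final class RedirectRuleSketch {

    /** Stand-in for a parse status. */
    enum Status { SUCCESS, REDIRECT }

    /** Stand-in for a parse entry that carries only a status. */
    static final class ParseEntry {
        final Status status;

        ParseEntry(Status status) {
            this.status = status;
        }
    }

    /**
     * Returns true only if the map has an entry for the URL currently being
     * processed and that entry indicates a redirect.
     */
    static boolean shouldFollowRedirect(Map<String, ParseEntry> parses, String currentUrl) {
        ParseEntry entry = parses.get(currentUrl);
        return entry != null && entry.status == Status.REDIRECT;
    }

    /** Builds a small example map with one redirecting page and one ordinary page. */
    static Map<String, ParseEntry> exampleMap() {
        Map<String, ParseEntry> parses = new HashMap<>();
        parses.put("foo.bar.com/redirect.html", new ParseEntry(Status.REDIRECT));
        parses.put("foo.bar.com/other.html", new ParseEntry(Status.SUCCESS));
        return parses;
    }
}
```

With this map, a redirect is followed for `foo.bar.com/redirect.html` but not for `foo.bar.com/other.html` or for a URL that has no entry at all.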

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

getConf

public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
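
As a rough illustration of the `setConf`/`getConf` pairing, the sketch below stores the configuration it is given and reads one setting from it. To stay self-contained it uses hypothetical stand-ins (`SimpleConf`, `SimpleConfigurable`) in place of the Hadoop `Configuration` and `Configurable` types. The property key `sketch.command` is invented for the example and is not a real Nutch setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the setConf/getConf pattern; not Nutch code.
final class ConfigurableSketch {

    /** Stand-in for a configuration object: string keys mapped to string values. */
    static final class SimpleConf {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Stand-in with the same two-method shape as a configurable component. */
    interface SimpleConfigurable {
        void setConf(SimpleConf conf);

        SimpleConf getConf();
    }

    /** A component that keeps its configuration and caches one derived setting. */
    static final class SketchParser implements SimpleConfigurable {
        private SimpleConf conf;
        private String command;

        @Override
        public void setConf(SimpleConf conf) {
            this.conf = conf;
            // "sketch.command" is an invented key; "cat" is an arbitrary default.
            this.command = conf.get("sketch.command", "cat");
        }

        @Override
        public SimpleConf getConf() {
            return conf;
        }

        String command() {
            return command;
        }
    }
}
```

Reading settings once in `setConf` keeps later calls free of repeated lookups, at the cost of not noticing changes made to the configuration afterwards.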

Copyright © 2014 The Apache Software Foundation