org.apache.nutch.parse.tika
Class TikaParser
- java.lang.Object
- org.apache.nutch.parse.tika.TikaParser
public class TikaParser extends Object implements Parser
Wrapper for Tika parsers. Mimics the HTMLParser, but works on the XHTML representation that Tika returns as SAX events.
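To illustrate what consuming an XHTML representation as SAX events can look like, here is a minimal sketch that uses only the JDK. It is not the TikaParser implementation: the JDK SAX parser stands in for Tika as the source of events, and the class XhtmlSaxSketch and its handler LinkAndTextHandler are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlSaxSketch {

    /** Hypothetical handler: collects plain text and anchor targets from SAX events. */
    public static class LinkAndTextHandler extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default JDK parser is not namespace-aware, so qName carries the tag name.
            if ("a".equals(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    links.add(href);
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Feeds well-formed XHTML to the handler; the JDK parser stands in for Tika here. */
    public static LinkAndTextHandler handle(String xhtml) {
        LinkAndTextHandler handler = new LinkAndTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XHTML input", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello <a href=\"http://example.com/\">link</a></p></body></html>";
        LinkAndTextHandler result = handle(xhtml);
        System.out.println(result.text);   // prints "Hello link"
        System.out.println(result.links);  // prints "[http://example.com/]"
    }
}
```

A handler of this shape sees the same kind of callbacks (startElement, characters) regardless of the original document format, which is presumably why an XHTML event stream is a convenient common representation.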
Field Summary
Fields
Modifier and Type          Field and Description
static org.slf4j.Logger    LOG
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
Constructor Summary
Constructors
Constructor and Description
TikaParser()
Method Summary
Methods
Modifier and Type                       Method and Description
org.apache.hadoop.conf.Configuration    getConf()
ParseResult                             getParse(Content content)
                                        This method parses the given content and returns a map of <key, parse> pairs.
void                                    setConf(org.apache.hadoop.conf.Configuration conf)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
TikaParser
public TikaParser()
Method Detail
getParse
public ParseResult getParse(Content content)
Description copied from interface: Parser
This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.
Note: Meta-redirects should be followed only when they come from the original URL. For example:
Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
- Specified by:
- getParse in interface Parser
- Parameters:
- content - Content to be parsed
- Returns:
- a map containing <key, parse> pairs
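The meta-redirect rule described in the note can be sketched as a lookup keyed by the original URL. This is an illustration only and does not use the Nutch API: the class MetaRedirectSketch, the Outcome type, and the method redirectToFollow are hypothetical stand-ins for the <key, parse> map, Parse, and ParseStatus.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetaRedirectSketch {

    /** Hypothetical stand-in for a Parse plus its status; not a Nutch class. */
    public static final class Outcome {
        public final boolean redirect;
        public final String target;

        public Outcome(boolean redirect, String target) {
            this.redirect = redirect;
            this.target = target;
        }
    }

    /**
     * Returns the redirect target only when the entry keyed by the original URL
     * signals a redirect. Redirects recorded under any other key are ignored.
     */
    public static String redirectToFollow(String originalUrl, Map<String, Outcome> results) {
        Outcome outcome = results.get(originalUrl);
        if (outcome != null && outcome.redirect) {
            return outcome.target;
        }
        return null;
    }

    public static void main(String[] args) {
        String original = "foo.bar.com/redirect.html";

        // Case 1: the redirect is recorded under the original URL, so it is followed.
        Map<String, Outcome> fromOriginal = new LinkedHashMap<>();
        fromOriginal.put(original, new Outcome(true, "foo.bar.com/target.html"));
        System.out.println(redirectToFollow(original, fromOriginal)); // prints "foo.bar.com/target.html"

        // Case 2: the redirect is recorded under a different key, so it is not followed.
        Map<String, Outcome> fromOther = new LinkedHashMap<>();
        fromOther.put(original, new Outcome(false, null));
        fromOther.put("foo.bar.com/embedded.html", new Outcome(true, "foo.bar.com/other.html"));
        System.out.println(redirectToFollow(original, fromOther)); // prints "null"
    }
}
```

Keying the check on the original URL means that redirects found in any additional entries of the map never cause the fetcher to leave the page it was asked to process.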
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
- setConf in interface org.apache.hadoop.conf.Configurable
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
- getConf in interface org.apache.hadoop.conf.Configurable