org.apache.nutch.parse.zip

Class ZipParser

All Implemented Interfaces:
    org.apache.hadoop.conf.Configurable, Parser, Pluggable

public class ZipParser
extends Object
implements Parser

ZipParser class, based on the MSPowerPointParser class by Stephan Strittmatter. Nutch parse plugin for zip files (Content-Type: application/zip).
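As a rough illustration of what handling application/zip content involves, the sketch below walks the entries of an in-memory archive using only the JDK's java.util.zip classes. It does not reproduce the plugin's implementation: the class name ZipEntriesSketch and the helpers listEntryNames and sampleZip are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch class; not part of Nutch.
class ZipEntriesSketch {

    /** Returns the names of all non-directory entries in the given zip bytes. */
    static List<String> listEntryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a small two-entry archive in memory so the example needs no files. */
    static byte[] sampleZip() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("docs/b.html"));
            out.write("<p>hi</p>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Prints the entry names in archive order.
        System.out.println(listEntryNames(sampleZip()));
    }
}
```

A real parser would additionally pick a parser for each entry based on its content type; that step is omitted here.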

Field Summary

Fields inherited from interface org.apache.nutch.parse.Parser

X_POINT_ID

Constructor Summary

Constructors

Constructor and Description
ZipParser()
    Creates a new instance of ZipParser.

Method Summary

Methods

Modifier and Type, Method and Description
org.apache.hadoop.conf.Configuration getConf()
ParseResult getParse(Content content)
    This method parses the given content and returns a map of <key, parse> pairs.
void setConf(org.apache.hadoop.conf.Configuration conf)

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

ZipParser

public ZipParser()

Creates a new instance of ZipParser

Method Detail

getParse

public ParseResult getParse(Content content)

Description copied from interface: Parser

This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.

Note: Meta-redirects should be followed only when they are coming from the original URL. That is:

Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this page contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.

  - Specified by: <code>getParse</code> in interface <code>Parser</code>
  - Parameters: <code>content</code> - Content to be parsed
  - Returns: a map containing <key, parse> pairs
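
The meta-redirect rule in the note above can be sketched with plain Java collections. This is a minimal illustration under stated assumptions, not Nutch code: ParseStub is a hypothetical stand-in for a Parse carrying a ParseStatus, and shouldFollowRedirect is a hypothetical helper, so neither name exists in the Nutch API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch class; the types here are stand-ins, not Nutch classes.
class RedirectRuleSketch {

    /** Simplified stand-in for a parse outcome: a redirect flag plus its target. */
    static final class ParseStub {
        final boolean redirect;
        final String target;

        ParseStub(boolean redirect, String target) {
            this.redirect = redirect;
            this.target = target;
        }
    }

    /**
     * Follows a redirect only when the entry keyed by the originally
     * fetched URL itself reports a redirect. Entries under other keys
     * are ignored, even if they report one.
     */
    static boolean shouldFollowRedirect(Map<String, ParseStub> parses, String fetchedUrl) {
        ParseStub parse = parses.get(fetchedUrl);
        return parse != null && parse.redirect;
    }

    public static void main(String[] args) {
        Map<String, ParseStub> parses = new HashMap<>();
        // Entry for the original URL, flagged as a redirect.
        parses.put("foo.bar.com/redirect.html",
                new ParseStub(true, "foo.bar.com/target.html"));
        // Entry under a different key; it must not trigger a redirect
        // when the fetched URL has no redirect entry of its own.
        parses.put("foo.bar.com/sub.html",
                new ParseStub(true, "foo.bar.com/elsewhere.html"));

        System.out.println(shouldFollowRedirect(parses, "foo.bar.com/redirect.html")); // true
        System.out.println(shouldFollowRedirect(parses, "foo.bar.com/other.html"));    // false
    }
}
```

Keying the lookup on the fetched URL is what keeps redirects reported by other entries of the map from being followed.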

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

getConf

public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

Copyright © 2014 The Apache Software Foundation