org.apache.nutch.indexer.feed

Class FeedIndexingFilter


public class FeedIndexingFilter
extends Object
implements IndexingFilter

An IndexingFilter implementation that pulls the relevant extracted Metadata fields out of RSS feeds and into the index.

  • Since:
  • NUTCH-444
  • Author:
  • dogacan, mattmann

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static String | dateFormatStr |

Fields inherited from interface org.apache.nutch.indexer.IndexingFilter

X_POINT_ID

Constructor Summary

| Constructor and Description |
| --- |
| FeedIndexingFilter() |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) | Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index. |
| org.apache.hadoop.conf.Configuration | getConf() | |
| void | setConf(org.apache.hadoop.conf.Configuration conf) | Sets the Configuration object used to configure this IndexingFilter. |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


dateFormatStr

public static final String dateFormatStr
  - See Also:
  - [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.indexer.feed.FeedIndexingFilter.dateFormatStr)       
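
This page does not show the value of `dateFormatStr`; it is listed under Constant Field Values. As a rough illustration of how a date-format constant of this kind is commonly used, the sketch below formats a timestamp with `java.text.SimpleDateFormat`. The pattern `yyyyMMddHHmm`, the class `DateFormatSketch`, and the helper `formatUtc` are assumptions made for illustration and are not taken from this page.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatSketch {
    // Assumed pattern, for illustration only; check Constant Field Values for the real one.
    static final String ASSUMED_DATE_FORMAT = "yyyyMMddHHmm";

    // Formats epoch milliseconds in UTC using the assumed pattern.
    static String formatUtc(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat(ASSUMED_DATE_FORMAT, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(formatUtc(0L)); // prints 197001010000
    }
}
```

Pinning the time zone and locale keeps the output stable across machines, which matters if the formatted value is stored in an index and compared later.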

Constructor Detail


FeedIndexingFilter

public FeedIndexingFilter()

Method Detail


filter

public NutchDocument filter(NutchDocument doc,
                   Parse parse,
                   org.apache.hadoop.io.Text url,
                   CrawlDatum datum,
                   Inlinks inlinks)
                     throws IndexingException

Extracts the relevant fields:

  - FEED_AUTHOR 
  - FEED_TAGS 
  - FEED_PUBLISHED 
  - FEED_UPDATED 
  - FEED 

and sends them to the <code>Indexer</code> for indexing within the Nutch index.

  - Specified by: 
  - <code>filter</code> in interface <code>IndexingFilter</code> 
  - Parameters:
  - <code>doc</code> - document instance for collecting fields
  - <code>parse</code> - parse data instance
  - <code>url</code> - page url
  - <code>datum</code> - crawl datum for the page
  - <code>inlinks</code> - page inlinks 
  - Returns:
  - modified (or a new) document instance, or null (meaning the document should be discarded) 
  - Throws: 
  - <code>IndexingException</code>       
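
The behaviour described for `filter` can be pictured with a small self-contained sketch. It does not use the Nutch classes: plain maps stand in for `NutchDocument` and the parse metadata, and the class `FeedFieldCopySketch`, its `filter` method, and the use of the field names as literal map keys are all hypothetical. It only shows the general idea of copying the listed feed fields into the document and of a caller treating a null return as "discard".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedFieldCopySketch {
    // Field names as listed above; using them as literal keys is an assumption.
    static final List<String> FEED_FIELDS = Arrays.asList(
            "FEED_AUTHOR", "FEED_TAGS", "FEED_PUBLISHED", "FEED_UPDATED", "FEED");

    // Hypothetical stand-in for filter(...): copies any feed fields found in
    // the parse metadata into the document and returns the same document.
    static Map<String, List<String>> filter(Map<String, List<String>> doc,
                                            Map<String, List<String>> parseMeta) {
        for (String field : FEED_FIELDS) {
            List<String> values = parseMeta.get(field);
            if (values != null && !values.isEmpty()) {
                doc.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new LinkedHashMap<>();
        meta.put("FEED_AUTHOR", Arrays.asList("alice"));
        meta.put("FEED_TAGS", Arrays.asList("search", "crawler"));
        meta.put("unrelated", Arrays.asList("ignored"));

        Map<String, List<String>> doc = filter(new LinkedHashMap<>(), meta);
        // A caller honours the documented contract: null means discard the document.
        if (doc != null) {
            System.out.println(doc.keySet());
        }
    }
}
```

Metadata keys outside the listed feed fields are left alone, so the document only gains the fields the filter is responsible for.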

getConf

public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: 
  - <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code> 
  - Returns:
  - the <code>Configuration</code> object used to configure this [<code>IndexingFilter</code>](../../../../../org/apache/nutch/indexer/IndexingFilter.html).       

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)

Sets the Configuration object used to configure this IndexingFilter.

  - Specified by: 
  - <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code> 
  - Parameters:
  - <code>conf</code> - The <code>Configuration</code> object used to configure this [<code>IndexingFilter</code>](../../../../../org/apache/nutch/indexer/IndexingFilter.html).      
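
The `getConf`/`setConf` pair follows the usual configurable-object pattern: the framework hands the object its configuration once, and the object returns that same instance on request. A minimal sketch of the pattern, assuming `java.util.Properties` as a stand-in for `org.apache.hadoop.conf.Configuration` and a hypothetical class `ConfigurableSketch` (this is not the actual Nutch source):

```java
import java.util.Properties;

public class ConfigurableSketch {
    // Stand-in for org.apache.hadoop.conf.Configuration (assumption).
    private Properties conf;

    // Stores the configuration supplied by the caller.
    public void setConf(Properties conf) {
        this.conf = conf;
    }

    // Returns the configuration previously passed to setConf, or null if none was set.
    public Properties getConf() {
        return conf;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("example.key", "example-value"); // hypothetical key

        ConfigurableSketch filter = new ConfigurableSketch();
        filter.setConf(props);
        System.out.println(filter.getConf().getProperty("example.key")); // prints example-value
    }
}
```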


Copyright © 2014 The Apache Software Foundation