org.apache.nutch.indexer.feed
Class FeedIndexingFilter
- java.lang.Object
- org.apache.nutch.indexer.feed.FeedIndexingFilter
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, IndexingFilter, Pluggable
public class FeedIndexingFilter extends Object implements IndexingFilter
An IndexingFilter implementation that pulls the relevant extractedMetadata fields out of RSS feeds and into the index.
- Since:
- NUTCH-444
- Author:
- dogacan, mattmann
Field Summary
Fields

| Modifier and Type | Field and Description |
| --- | --- |
| static String | dateFormatStr |

Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
Constructors

| Constructor and Description |
| --- |
| FeedIndexingFilter() |

Method Summary
Methods

| Modifier and Type | Method and Description |
| --- | --- |
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)<br>Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index. |
| org.apache.hadoop.conf.Configuration | getConf() |
| void | setConf(org.apache.hadoop.conf.Configuration conf)<br>Sets the Configuration object used to configure this IndexingFilter. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
-
dateFormatStr
public static final String dateFormatStr
- See Also:
- [Constant Field Values](../../../../../constant-values.html#org.apache.nutch.indexer.feed.FeedIndexingFilter.dateFormatStr)
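
The value of `dateFormatStr` is not shown on this page; it is listed under Constant Field Values. As a rough illustration of how a date pattern constant like this is typically applied, the sketch below formats a `java.util.Date` with `SimpleDateFormat`. The pattern `yyyyMMddHHmm`, the UTC time zone, and the class name `FeedDateFormatSketch` are assumptions made for the example, not values taken from FeedIndexingFilter.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative sketch only; not part of Nutch.
final class FeedDateFormatSketch {
    // ASSUMPTION: stand-in for FeedIndexingFilter.dateFormatStr.
    // Check the Constant Field Values page for the real pattern.
    static final String ASSUMED_PATTERN = "yyyyMMddHHmm";

    // Formats a date with the assumed pattern.
    static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat(ASSUMED_PATTERN, Locale.ROOT);
        // ASSUMPTION: UTC is chosen here only to make the output deterministic.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // The Unix epoch, formatted in UTC with the assumed pattern.
        System.out.println(format(new Date(0L))); // prints 197001010000
    }
}
```

In real code, prefer referencing `FeedIndexingFilter.dateFormatStr` directly so the pattern stays in sync with the filter.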
Constructor Detail
-
FeedIndexingFilter
public FeedIndexingFilter()
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Extracts the relevant fields:
- FEED_AUTHOR
- FEED_TAGS
- FEED_PUBLISHED
- FEED_UPDATED
- FEED

and sends them to the <code>Indexer</code> for indexing within the Nutch index.
- Specified by:
- <code>filter</code> in interface <code>IndexingFilter</code>
- Parameters:
- <code>doc</code> - document instance for collecting fields
- <code>parse</code> - parse data instance
- <code>url</code> - page URL
- <code>datum</code> - crawl datum for the page
- <code>inlinks</code> - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document should be discarded)
- Throws:
- <code>IndexingException</code>
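
The return contract above (a modified document, or null to discard it) can be sketched without the Nutch classes. The example below is a simplified model, not the Nutch API: `DocFilter`, `FEED_FIELDS`, `runChain`, and the use of plain maps for the document and parse metadata are all hypothetical stand-ins, and the string keys merely reuse the constant names listed above as placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these types are Nutch classes.
final class FilterContractSketch {
    /** Hypothetical stand-in for an indexing filter: returns the document, or null to discard it. */
    interface DocFilter {
        Map<String, List<String>> filter(Map<String, List<String>> doc, Map<String, String> parseMeta);
    }

    // Placeholder keys named after the constants in the description above.
    static final String[] FEED_KEYS = {
        "FEED_AUTHOR", "FEED_TAGS", "FEED_PUBLISHED", "FEED_UPDATED", "FEED"
    };

    /** Copies each feed key present in the parse metadata into the document. */
    static final DocFilter FEED_FIELDS = (doc, meta) -> {
        for (String key : FEED_KEYS) {
            String value = meta.get(key);
            if (value != null) {
                doc.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }
        return doc;
    };

    /** Applies filters in order; a null result means the document is discarded. */
    static Map<String, List<String>> runChain(List<DocFilter> filters,
                                              Map<String, List<String>> doc,
                                              Map<String, String> parseMeta) {
        for (DocFilter f : filters) {
            doc = f.filter(doc, parseMeta);
            if (doc == null) {
                return null; // stop early: a filter asked to drop the document
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("FEED_AUTHOR", "alice");
        Map<String, List<String>> doc =
            runChain(Arrays.asList(FEED_FIELDS), new LinkedHashMap<>(), meta);
        System.out.println(doc);
    }
}
```

A caller of the real `filter` method should likewise check for a null result before using the returned document.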
-
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
- <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
- Returns:
- the <code>Configuration</code> object used to configure this [<code>IndexingFilter</code>](../../../../../org/apache/nutch/indexer/IndexingFilter.html).
-
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
- Specified by:
- <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
- Parameters:
- <code>conf</code> - The <code>Configuration</code> object used to configure this [<code>IndexingFilter</code>](../../../../../org/apache/nutch/indexer/IndexingFilter.html).