org.apache.nutch.indexer.staticfield

Class StaticFieldIndexer


public class StaticFieldIndexer
extends Object
implements IndexingFilter

A simple plugin, called at indexing time, that adds fields with static data. You can specify a list of fieldname:fieldcontent pairs per Nutch job. This is useful when collections cannot be defined by URL patterns, as in the subcollection plugin, but have to be assigned on a per-job basis.
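
To make the fieldname:fieldcontent format concrete, here is a minimal sketch of how such a list could be split into name/content pairs. It is not the plugin's code: `StaticFieldSpec` and `parse` are hypothetical names, and the comma separator between pairs is an assumption, since the description above only says "a list".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: parses "fieldname:fieldcontent" pairs.
final class StaticFieldSpec {

    // Assumption: pairs are separated by commas.
    static Map<String, String> parse(String spec) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return fields;
        }
        for (String pair : spec.split(",")) {
            // Split on the first colon only, so the content may itself contain colons.
            int colon = pair.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or nothing before it
            }
            String name = pair.substring(0, colon).trim();
            String content = pair.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                fields.put(name, content);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        System.out.println(parse("collection:news, source:crawl-2014"));
    }
}
```

Splitting on the first colon only is a deliberate choice in this sketch: it lets a field's content hold a value such as a URL without being cut short.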

Field Summary

Fields inherited from interface org.apache.nutch.indexer.IndexingFilter

X_POINT_ID

Constructor Summary

  - <code>StaticFieldIndexer()</code>

Method Summary

  - <code>NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)</code> - Adds static fields to the document as defined in the configuration.
  - <code>org.apache.hadoop.conf.Configuration getConf()</code> - Get the Configuration object
  - <code>void setConf(org.apache.hadoop.conf.Configuration conf)</code> - Set the Configuration object

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

StaticFieldIndexer

public StaticFieldIndexer()

Method Detail

filter

public NutchDocument filter(NutchDocument doc,
                   Parse parse,
                   org.apache.hadoop.io.Text url,
                   CrawlDatum datum,
                   Inlinks inlinks)
                     throws IndexingException

Adds static fields to the document as defined in the configuration. See the index.static property in nutch-default.xml.

  - Specified by: 
  - <code>filter</code> in interface <code>IndexingFilter</code> 
  - Parameters:
  - <code>doc</code> - The [<code>NutchDocument</code>](../../../../../org/apache/nutch/indexer/NutchDocument.html) object
  - <code>parse</code> - The relevant [<code>Parse</code>](../../../../../org/apache/nutch/parse/Parse.html) object passing through the filter
  - <code>url</code> - URL of the document being indexed
  - <code>datum</code> - The [<code>CrawlDatum</code>](../../../../../org/apache/nutch/crawl/CrawlDatum.html) entry
  - <code>inlinks</code> - The [<code>Inlinks</code>](../../../../../org/apache/nutch/crawl/Inlinks.html) containing anchor text 
  - Returns:
  - filtered NutchDocument 
  - Throws: 
  - <code>IndexingException</code>       
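
The "document in, document out" shape of this method can be sketched with plain collections. The class below is a hypothetical stand-in, not the plugin's implementation: a document is modelled as a map from field name to a list of values instead of a real NutchDocument, and the configured fields are passed in directly rather than read from a Configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; it does not use Nutch classes.
final class StaticFieldFilterSketch {

    private final Map<String, String> staticFields;

    StaticFieldFilterSketch(Map<String, String> staticFields) {
        // Copy defensively so later changes to the caller's map have no effect.
        this.staticFields = new LinkedHashMap<>(staticFields);
    }

    // Adds every configured field to the document and returns the same instance.
    Map<String, List<String>> filter(Map<String, List<String>> doc) {
        for (Map.Entry<String, String> e : staticFields.entrySet()) {
            doc.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        Map<String, String> configured = new LinkedHashMap<>();
        configured.put("collection", "news");

        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.computeIfAbsent("title", k -> new ArrayList<>()).add("Example page");

        System.out.println(new StaticFieldFilterSketch(configured).filter(doc));
    }
}
```

In this sketch an existing field with the same name keeps its values and gains the static one; whether the real plugin behaves the same way is not stated on this page.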
setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)

Set the Configuration object

  - Specified by: 
  - <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>        
getConf

public org.apache.hadoop.conf.Configuration getConf()

Get the Configuration object

  - Specified by: 
  - <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>       

Copyright © 2014 The Apache Software Foundation