org.apache.nutch.indexer.staticfield
Class StaticFieldIndexer
- java.lang.Object
- org.apache.nutch.indexer.staticfield.StaticFieldIndexer
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, IndexingFilter, Pluggable
public class StaticFieldIndexer extends Object implements IndexingFilter
A simple plugin, called at indexing time, that adds fields with static data to each document. You can specify a list of fieldname:fieldcontent pairs per Nutch job. This is useful when collections cannot be defined by URL patterns, as the subcollection plugin does, but have to be assigned on a per-job basis.
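To make the fieldname:fieldcontent list concrete, the sketch below parses such a list into field names and values. It is not the plugin's source: the class `StaticFieldSpec` and its `parse` method are hypothetical, and the separators (comma between pairs, colon between name and content, whitespace between multiple values) are assumptions that should be checked against the index.static description in nutch-default.xml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: parses a static-field list. */
final class StaticFieldSpec {

    /**
     * Parses "name1:v1 v2,name2:v3" into an ordered map of field name to values.
     * The separators (comma, colon, whitespace) are assumptions of this sketch.
     */
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return fields; // nothing configured
        }
        for (String pair : spec.split(",")) {
            int colon = pair.indexOf(':');
            if (colon <= 0) {
                continue; // skip entries without a field name
            }
            String name = pair.substring(0, colon).trim();
            String content = pair.substring(colon + 1).trim();
            if (name.isEmpty() || content.isEmpty()) {
                continue; // skip entries with a blank name or no content
            }
            List<String> values = Arrays.asList(content.split("\\s+"));
            fields.computeIfAbsent(name, k -> new ArrayList<>()).addAll(values);
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {source=[intranet], collection=[docs, manuals]}
        System.out.println(parse("source:intranet,collection:docs manuals"));
    }
}
```

Malformed entries are skipped rather than rejected here; whether the real plugin does the same is not stated on this page.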
Field Summary
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
| Constructor and Description |
| --- |
| <code>StaticFieldIndexer()</code> |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| <code>NutchDocument</code> | <code>filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)</code> | Adds static fields to the document according to the configuration. |
| <code>org.apache.hadoop.conf.Configuration</code> | <code>getConf()</code> | Get the Configuration object |
| <code>void</code> | <code>setConf(org.apache.hadoop.conf.Configuration conf)</code> | Set the Configuration object |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
StaticFieldIndexer
public StaticFieldIndexer()
Method Detail
filter
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Adds static fields to the document according to the configuration. See the index.static property in nutch-default.xml.
- Specified by:
- <code>filter</code> in interface <code>IndexingFilter</code>
- Parameters:
- <code>doc</code> - The [<code>NutchDocument</code>](../../../../../org/apache/nutch/indexer/NutchDocument.html) object
- <code>parse</code> - The relevant [<code>Parse</code>](../../../../../org/apache/nutch/parse/Parse.html) object passing through the filter
- <code>url</code> - URL of the document being indexed
- <code>datum</code> - The [<code>CrawlDatum</code>](../../../../../org/apache/nutch/crawl/CrawlDatum.html) entry
- <code>inlinks</code> - The [<code>Inlinks</code>](../../../../../org/apache/nutch/crawl/Inlinks.html) containing anchor text
- Returns:
- filtered NutchDocument
- Throws:
- <code>IndexingException</code>
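The filter step described above can be pictured with the following sketch, which uses stand-in types so it runs without Nutch on the classpath. `StaticFieldFilterSketch` and its nested `Doc` class are hypothetical; `Doc` only imitates the add-a-value-to-a-field behaviour of a NutchDocument, and the unused <code>parse</code>, <code>url</code>, <code>datum</code> and <code>inlinks</code> arguments are left out. Treat it as an illustration of the idea, not as the plugin's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a static-field indexing filter; not the Nutch source. */
final class StaticFieldFilterSketch {

    /** Hypothetical stand-in for NutchDocument: field name mapped to its values. */
    static final class Doc {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Static fields for this job, assumed to be parsed from configuration already.
    private final Map<String, List<String>> staticFields;

    StaticFieldFilterSketch(Map<String, List<String>> staticFields) {
        this.staticFields = staticFields;
    }

    /** Adds every configured value to the document and returns the same document. */
    Doc filter(Doc doc) {
        for (Map.Entry<String, List<String>> entry : staticFields.entrySet()) {
            for (String value : entry.getValue()) {
                doc.add(entry.getKey(), value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> conf = new LinkedHashMap<>();
        conf.put("collection", Arrays.asList("docs", "manuals"));

        Doc doc = new Doc();
        doc.add("title", "Home");

        // prints {title=[Home], collection=[docs, manuals]}
        System.out.println(new StaticFieldFilterSketch(conf).filter(doc).fields);
    }
}
```

Because the same values are added to every document in the job, existing fields such as the title are left untouched and an empty configuration leaves the document unchanged.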
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration object
- Specified by:
- <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
getConf
public org.apache.hadoop.conf.Configuration getConf()
Get the Configuration object
- Specified by:
- <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
