org.apache.nutch.indexer.staticfield
Class StaticFieldIndexer
- java.lang.Object
- org.apache.nutch.indexer.staticfield.StaticFieldIndexer
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, IndexingFilter, Pluggable
public class StaticFieldIndexer extends Object implements IndexingFilter
A simple plugin, called at indexing time, that adds fields with static data. You can specify a list of fieldname:fieldcontent pairs per Nutch job. This is useful when collections cannot be defined by URL patterns, as in the subcollection plugin, but must be assigned on a per-job basis.
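To illustrate the fieldname:fieldcontent list, the following sketch parses such a string into a map of field names to values. It is standalone Java and not the plugin's own code. The class name `StaticFieldSpecSketch`, the comma between pairs, and the space between multiple values of one field are assumptions made for this illustration; check the `index.static` description in nutch-default.xml for the separators your Nutch version actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch.
class StaticFieldSpecSketch {

  // Assumed separators: "," between pairs, ":" between field name and
  // content, " " between multiple values of one field.
  static Map<String, List<String>> parse(String spec) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return fields;
    }
    for (String pair : spec.split(",")) {
      // Limit 2 so that a ":" inside the content stays part of the content.
      String[] nameAndContent = pair.trim().split(":", 2);
      if (nameAndContent.length < 2 || nameAndContent[0].trim().isEmpty()) {
        continue; // skip entries without a field name or without a ":"
      }
      List<String> values =
          fields.computeIfAbsent(nameAndContent[0].trim(), k -> new ArrayList<>());
      for (String value : nameAndContent[1].trim().split(" ")) {
        if (!value.isEmpty()) {
          values.add(value);
        }
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // Two static fields; the second one carries two values.
    System.out.println(parse("collection:news,source:crawl-a crawl-b"));
  }
}
```

Each crawl job can then carry its own specification string, so every document indexed by that job receives the same field values regardless of its URL.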
Field Summary
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
- <code>StaticFieldIndexer()</code>
Method Summary
- <code>NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)</code> - Adds fields to the document as per configuration setting.
- <code>org.apache.hadoop.conf.Configuration getConf()</code> - Get the Configuration object.
- <code>void setConf(org.apache.hadoop.conf.Configuration conf)</code> - Set the Configuration object.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
-
StaticFieldIndexer
public StaticFieldIndexer()
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Adds fields to the document as per configuration setting. See index.static in nutch-default.xml.
- Specified by:
- <code>filter</code> in interface <code>IndexingFilter</code>
- Parameters:
- <code>doc</code> - The [<code>NutchDocument</code>](../../../../../org/apache/nutch/indexer/NutchDocument.html) object
- <code>parse</code> - The relevant [<code>Parse</code>](../../../../../org/apache/nutch/parse/Parse.html) object passing through the filter
- <code>url</code> - URL of the document being indexed
- <code>datum</code> - The [<code>CrawlDatum</code>](../../../../../org/apache/nutch/crawl/CrawlDatum.html) entry
- <code>inlinks</code> - The [<code>Inlinks</code>](../../../../../org/apache/nutch/crawl/Inlinks.html) containing anchor text
- Returns:
- filtered NutchDocument
- Throws:
- <code>IndexingException</code>
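The effect of the filter step can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the document is modeled as a plain `Map` rather than a `NutchDocument`, the class name `StaticFieldFilterSketch` is hypothetical, and the sketch does not consult the parse, url, datum, or inlinks arguments at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; a real filter receives a NutchDocument, not a Map.
class StaticFieldFilterSketch {
  private final Map<String, List<String>> staticFields;

  StaticFieldFilterSketch(Map<String, List<String>> staticFields) {
    this.staticFields = staticFields;
  }

  // Appends every configured value to the document and returns the same
  // document instance; fields already present are kept.
  Map<String, List<String>> filter(Map<String, List<String>> doc) {
    for (Map.Entry<String, List<String>> entry : staticFields.entrySet()) {
      doc.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
          .addAll(entry.getValue());
    }
    return doc;
  }

  public static void main(String[] args) {
    Map<String, List<String>> configured = new LinkedHashMap<>();
    configured.put("collection", Arrays.asList("news"));

    Map<String, List<String>> doc = new LinkedHashMap<>();
    doc.put("title", new ArrayList<>(Arrays.asList("Hello")));

    System.out.println(new StaticFieldFilterSketch(configured).filter(doc));
  }
}
```

Because the values come from the job configuration rather than from the page, the same fields are added to every document that passes through the filter in that job.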
-
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration object
- Specified by:
- <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
-
getConf
public org.apache.hadoop.conf.Configuration getConf()
Get the Configuration object
- Specified by:
- <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>