org.apache.nutch.indexer.tld

Class TLDIndexingFilter


public class TLDIndexingFilter
extends Object
implements IndexingFilter

Adds the top-level domain (TLD) extensions to the index.

Author: enis.soz.nutch@gmail.com
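
To make "top-level domain extension" concrete, here is a minimal sketch that takes the last dot-separated label of a URL's host using only the JDK. The class `TldSketch` and method `lastLabel` are hypothetical names introduced for illustration; this is not the lookup that `TLDIndexingFilter` performs, and a naive last-label rule does not recognize multi-part suffixes such as `co.uk`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: a naive illustration of a TLD extension.
final class TldSketch {

    /**
     * Returns the last dot-separated label of the URL's host in lower case,
     * or null when the URL cannot be parsed or the host has no usable label.
     */
    static String lastLabel(String url) {
        final String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URL: nothing to extract
        }
        if (host == null || host.isEmpty()) {
            return null;
        }
        int dot = host.lastIndexOf('.');
        if (dot < 0 || dot == host.length() - 1) {
            return null; // single-label host (e.g. "localhost") or trailing dot
        }
        return host.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lastLabel("http://www.example.org/index.html")); // org
        System.out.println(lastLabel("http://news.example.co.uk/"));        // uk
    }
}
```

A production lookup would normally consult a list of known domain suffixes rather than splitting on the last dot, which is why the second call yields `uk` and not `co.uk`.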

Field Summary

Fields

| Modifier and Type | Field and Description |
| --- | --- |
| <code>static org.slf4j.Logger</code> | <code>LOG</code> |


Fields inherited from interface org.apache.nutch.indexer.IndexingFilter

X_POINT_ID

Constructor Summary

Constructors

| Constructor and Description |
| --- |
| <code>TLDIndexingFilter()</code> |

Method Summary

Methods

| Modifier and Type | Method and Description |
| --- | --- |
| <code>NutchDocument</code> | <code>filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text urlText, CrawlDatum datum, Inlinks inlinks)</code>: Adds fields or otherwise modifies the document that will be indexed for a parse. |
| <code>org.apache.hadoop.conf.Configuration</code> | <code>getConf()</code> |
| <code>void</code> | <code>setConf(org.apache.hadoop.conf.Configuration conf)</code> |


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


LOG

public static final org.slf4j.Logger LOG

Constructor Detail


TLDIndexingFilter

public TLDIndexingFilter()

Method Detail


filter

public NutchDocument filter(NutchDocument doc,
                   Parse parse,
                   org.apache.hadoop.io.Text urlText,
                   CrawlDatum datum,
                   Inlinks inlinks)
                     throws IndexingException

Description copied from interface: IndexingFilter

Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.

  - Specified by: <code>filter</code> in interface <code>IndexingFilter</code>
  - Parameters:
    - <code>doc</code> - document instance for collecting fields
    - <code>parse</code> - parse data instance
    - <code>urlText</code> - page url
    - <code>datum</code> - crawl datum for the page
    - <code>inlinks</code> - page inlinks
  - Returns: modified (or a new) document instance, or null (meaning the document should be discarded)
  - Throws: <code>IndexingException</code>
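
The return contract described above (return the document to keep it, return null to discard it) can be sketched with stand-in types. Everything in this block is hypothetical: `SketchDocument` and `SketchFilter` are simplified substitutes for `NutchDocument` and `IndexingFilter` so the example compiles with only the JDK, and the `runAll` loop is an assumption about how a caller might honor a null result, not Nutch's own code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: just a bag of named string fields.
final class SketchDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Hypothetical stand-in for the filter contract, reduced to a document and its URL.
interface SketchFilter {
    SketchDocument filter(SketchDocument doc, String url);
}

final class FilterContractSketch {

    // Keeps the document and adds one field to it.
    static final SketchFilter ADD_URL_LENGTH = (doc, url) -> {
        doc.add("urlLength", Integer.toString(url.length()));
        return doc;
    };

    // Discards the document by returning null when the URL is not HTTP(S).
    static final SketchFilter DROP_NON_HTTP =
            (doc, url) -> url.startsWith("http") ? doc : null;

    // Applies filters in order and stops as soon as one of them discards the document.
    static SketchDocument runAll(SketchDocument doc, String url, SketchFilter... filters) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // discarded: later filters never see it
            }
        }
        return doc;
    }
}
```

Calling `FilterContractSketch.runAll(new SketchDocument(), "ftp://example.org/file", FilterContractSketch.DROP_NON_HTTP, FilterContractSketch.ADD_URL_LENGTH)` returns null, because the first filter rejects the document before the second one runs.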

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

getConf

public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>


Copyright © 2014 The Apache Software Foundation