org.apache.nutch.segment
Interface SegmentMergeFilter
public interface SegmentMergeFilter
Interface used to filter segments during a segment merge. It allows filtering on more sophisticated criteria than just URLs. In particular, it allows filtering based on metadata collected while parsing a page.
Field Summary
Fields
Modifier and Type    Field and Description
static String        X_POINT_ID
                     The name of the extension point.
Method Summary
Methods
Modifier and Type    Method and Description
boolean              filter(org.apache.hadoop.io.Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
                     The filtering method which gets all information being merged for a given key (URL).
Field Detail
X_POINT_ID
static final String X_POINT_ID
The name of the extension point.
Method Detail
filter
boolean filter(org.apache.hadoop.io.Text key,
             CrawlDatum generateData,
             CrawlDatum fetchData,
             CrawlDatum sigData,
             Content content,
             ParseData parseData,
             ParseText parseText,
             Collection<CrawlDatum> linked)
The filtering method which gets all information being merged for a given key (URL).
  - Returns:
  - true if the values for this key (URL) should be merged into the new segment; false otherwise.
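
As an illustration of the contract described above, the following is a minimal, self-contained sketch of a filter that decides on parse metadata rather than on the URL. Everything except the shape of the filter method is an assumption made for the example: the nested Text, CrawlDatum, Content, ParseData and ParseText classes are simplified stand-ins (not the real Hadoop or Nutch classes), the "lang" metadata key and the LanguageMergeFilter name are hypothetical, and the sketch assumes that an argument may be null when the corresponding part is absent for a key. A real plugin would implement org.apache.nutch.segment.SegmentMergeFilter directly and would also need to be registered for the extension point named by X_POINT_ID.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SegmentMergeFilterSketch {

    // Simplified stand-ins so the sketch compiles on its own.
    // These are NOT the real Hadoop/Nutch types.
    static class Text {
        final String value;
        Text(String value) { this.value = value; }
    }
    static class CrawlDatum { }
    static class Content { }
    static class ParseText { }
    static class ParseData {
        // Stand-in for metadata collected while parsing a page.
        final Map<String, String> parseMeta = new HashMap<>();
    }

    // Mirrors the method signature documented above.
    interface SegmentMergeFilter {
        boolean filter(Text key, CrawlDatum generateData, CrawlDatum fetchData,
                       CrawlDatum sigData, Content content, ParseData parseData,
                       ParseText parseText, Collection<CrawlDatum> linked);
    }

    // Hypothetical filter: merge a key only if its parse metadata
    // does not name a language other than the wanted one.
    static class LanguageMergeFilter implements SegmentMergeFilter {
        private final String wanted;

        LanguageMergeFilter(String wanted) { this.wanted = wanted; }

        @Override
        public boolean filter(Text key, CrawlDatum generateData, CrawlDatum fetchData,
                              CrawlDatum sigData, Content content, ParseData parseData,
                              ParseText parseText, Collection<CrawlDatum> linked) {
            if (parseData == null) {
                return true; // assumption: no parse data for this key, so keep it
            }
            String lang = parseData.parseMeta.get("lang"); // hypothetical key
            return lang == null || wanted.equals(lang);
        }
    }

    // Helper for the demo: a null lang simulates a key without parse data.
    static boolean wouldMerge(String lang) {
        ParseData parseData = null;
        if (lang != null) {
            parseData = new ParseData();
            parseData.parseMeta.put("lang", lang);
        }
        SegmentMergeFilter f = new LanguageMergeFilter("en");
        return f.filter(new Text("http://example.com/"), null, null, null,
                        null, parseData, null, Collections.<CrawlDatum>emptyList());
    }

    public static void main(String[] args) {
        System.out.println("en page merged: " + wouldMerge("en"));
        System.out.println("de page merged: " + wouldMerge("de"));
        System.out.println("unparsed page merged: " + wouldMerge(null));
    }
}
```

In this sketch, returning false excludes every value for the key from the merged segment, while returning true lets them through. Keeping keys that lack parse data is a design choice of the example, not something the interface prescribes.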
   
  
                    