
org.apache.nutch.segment

Interface SegmentMergeFilter


public interface SegmentMergeFilter

Interface used to filter segments during a segment merge. It allows filtering on criteria more sophisticated than URLs alone. In particular, it allows filtering based on metadata collected while parsing a page.

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static String | X_POINT_ID: The name of the extension point. |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| boolean | filter(org.apache.hadoop.io.Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked): The filtering method which gets all information being merged for a given key (URL). |

Field Detail


X_POINT_ID

static final String X_POINT_ID

The name of the extension point.

Method Detail


filter

boolean filter(org.apache.hadoop.io.Text key,
             CrawlDatum generateData,
             CrawlDatum fetchData,
             CrawlDatum sigData,
             Content content,
             ParseData parseData,
             ParseText parseText,
             Collection<CrawlDatum> linked)

The filtering method which gets all information being merged for a given key (URL).

  - Returns: true if the values for this key (URL) should be merged into the new segment.
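
As an illustration of the contract described above, here is a minimal sketch of a filter that decides on parse metadata rather than on the URL. It is not taken from Nutch: the nested `Text`, `CrawlDatum`, `Content`, `ParseData`, and `ParseText` classes are hypothetical stand-ins so that the sketch compiles without Nutch or Hadoop on the classpath, the `meta` map and the `robots`/`noindex` convention are assumptions, and so is the idea that parts missing from the merged segments arrive as `null`. In a real plugin the stand-ins would be replaced by the actual Nutch and Hadoop types.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SegmentMergeFilterSketch {

  // Hypothetical stand-ins for the Hadoop/Nutch types named in the signature.
  static class Text {
    final String value;
    Text(String value) { this.value = value; }
  }
  static class CrawlDatum { }
  static class Content { }
  static class ParseText { }
  static class ParseData {
    // Assumption: parse metadata modeled as a plain map for illustration.
    final Map<String, String> meta = new HashMap<>();
  }

  // Local copy of the documented method shape, using the stand-in types.
  interface SegmentMergeFilter {
    boolean filter(Text key, CrawlDatum generateData, CrawlDatum fetchData,
        CrawlDatum sigData, Content content, ParseData parseData,
        ParseText parseText, Collection<CrawlDatum> linked);
  }

  // Hypothetical filter: skip entries whose parse metadata says "noindex".
  static class NoIndexMergeFilter implements SegmentMergeFilter {
    @Override
    public boolean filter(Text key, CrawlDatum generateData,
        CrawlDatum fetchData, CrawlDatum sigData, Content content,
        ParseData parseData, ParseText parseText,
        Collection<CrawlDatum> linked) {
      if (parseData == null) {
        // Assumption: no parse data available for this key, so keep the entry.
        return true;
      }
      // true means "merge into the new segment"; false drops the entry.
      return !"noindex".equals(parseData.meta.get("robots"));
    }
  }

  public static void main(String[] args) {
    SegmentMergeFilter filter = new NoIndexMergeFilter();
    Text key = new Text("http://example.com/");

    ParseData blocked = new ParseData();
    blocked.meta.put("robots", "noindex");

    // Entry flagged as noindex in its parse metadata.
    System.out.println(filter.filter(key, null, null, null, null,
        blocked, null, Collections.emptyList()));
    // Entry with no parse data at all.
    System.out.println(filter.filter(key, null, null, null, null,
        null, null, Collections.emptyList()));
  }
}
```

The return value follows the documented meaning: `true` keeps the values for the key in the merged segment, `false` leaves them out.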

Copyright © 2014 The Apache Software Foundation