org.creativecommons.nutch

Class CCParseFilter


public class CCParseFilter
extends Object
implements HtmlParseFilter

Adds metadata identifying the Creative Commons license used, if any.

Nested Class Summary

| Modifier and Type | Class and Description |
| --- | --- |
| static class | CCParseFilter.Walker: Walks DOM tree, looking for RDF in comments and licenses in anchors. |
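The anchor half of such a walk can be illustrated with a minimal sketch that uses only the JDK's DOM classes. It is not the actual `CCParseFilter.Walker` code: the class name `LicenseWalkSketch`, the helpers `findLicenseUrl` and `parse`, and the `CC_PREFIX` URL prefix are all assumptions made for illustration, and the RDF-in-comments case is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the real CCParseFilter.Walker implementation.
public class LicenseWalkSketch {

    // Assumed prefix for recognizing license links (illustrative only).
    static final String CC_PREFIX = "http://creativecommons.org/licenses/";

    // Depth-first walk: returns the href of the first matching anchor, or null.
    static String findLicenseUrl(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            // getAttribute returns "" when the attribute is absent.
            String href = ((Element) node).getAttribute("href");
            if (href.startsWith(CC_PREFIX)) {
                return href;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String found = findLicenseUrl(children.item(i));
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Parses well-formed XHTML into a DOM document using the JDK parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Text</p>"
                + "<a href=\"http://example.com/\">home</a>"
                + "<a href=\"http://creativecommons.org/licenses/by/4.0/\">CC BY</a>"
                + "</body></html>";
        System.out.println(findLicenseUrl(parse(page)));
    }
}
```

In a real filter the walk would start from the `DocumentFragment` passed to `filter` rather than from a freshly parsed document, and a match would be recorded as parse metadata instead of printed.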

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| static org.slf4j.Logger | LOG |

Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter

X_POINT_ID

Constructor Summary

| Constructor and Description |
| --- |
| CCParseFilter() |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| ParseResult | filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
| org.apache.hadoop.conf.Configuration | getConf() |
| void | setConf(org.apache.hadoop.conf.Configuration conf) |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

CCParseFilter

public CCParseFilter()

Method Detail

filter

public ParseResult filter(Content content,
                 ParseResult parseResult,
                 HTMLMetaTags metaTags,
                 DocumentFragment doc)

Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.

  - Specified by: <code>filter</code> in interface <code>HtmlParseFilter</code>

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)

  - Specified by: <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

getConf

public org.apache.hadoop.conf.Configuration getConf()

  - Specified by: <code>getConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>

Copyright © 2014 The Apache Software Foundation