org.creativecommons.nutch
Class CCParseFilter
- java.lang.Object
- org.creativecommons.nutch.CCParseFilter
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, HtmlParseFilter, Pluggable
public class CCParseFilter extends Object implements HtmlParseFilter
Adds metadata identifying the Creative Commons license used, if any.
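Creative Commons license URLs encode the license terms in their path, for example https://creativecommons.org/licenses/by-nc-sa/4.0/. The sketch below is a hypothetical helper, not part of the CCParseFilter API. It shows one plausible way to turn such a URL into license components using only the JDK. Whether the plugin stores components like these as metadata, and under which keys, is an assumption here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the CCParseFilter API.
class LicenseFeatures {
    // Returns the path components that follow a "licenses" segment, split on "/" and "-".
    // Returns an empty list when there is no "licenses" segment
    // (e.g. .../publicdomain/zero/1.0/) or when the URI has no path (e.g. mailto:).
    static List<String> features(String licenseUrl) {
        List<String> features = new ArrayList<>();
        String path = URI.create(licenseUrl).getPath();
        if (path == null) {
            return features; // opaque URIs such as mailto: have no path
        }
        boolean afterLicenses = false;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            if (!afterLicenses) {
                // Skip everything up to and including the "licenses" segment.
                afterLicenses = segment.equals("licenses");
                continue;
            }
            for (String token : segment.split("-")) {
                if (!token.isEmpty()) {
                    features.add(token);
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(features("https://creativecommons.org/licenses/by-nc-sa/4.0/"));
    }
}
```

For the by-nc-sa 4.0 URL above, the helper yields [by, nc, sa, 4.0]. The version number is kept as a component, so a caller would need to filter it out if only the license terms are wanted.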
Nested Class Summary
Nested Classes
static class CCParseFilter.Walker
Walks the DOM tree, looking for RDF in comments and for licenses in anchors.
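The Walker description above can be illustrated with a minimal depth-first walk over a W3C DOM tree. This is a sketch under stated assumptions, not the plugin's Walker. It uses only the JDK's org.w3c.dom and javax.xml.parsers APIs and parses well-formed XHTML rather than the HTML-derived DOM that Nutch's parser produces. It also treats any comment containing "<rdf:RDF" as RDF, and counts an anchor as a license link if it has rel="license" or an href under creativecommons.org/licenses/. The class name LicenseWalkerSketch and its fields are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of a DOM walk; not the actual CCParseFilter.Walker.
class LicenseWalkerSketch {
    String rdfComment;    // text of the first comment that appears to contain RDF
    String relLicense;    // href of the first <a rel="license">
    String anchorLicense; // href of the first other <a> linking to creativecommons.org/licenses/

    // Parses well-formed XHTML into a DOM (assumption: real pages would need an HTML parser).
    static Node parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    // Visits every node depth-first, recording the first match of each kind.
    void walk(Node node) {
        if (node.getNodeType() == Node.COMMENT_NODE) {
            String data = ((Comment) node).getData();
            if (rdfComment == null && data.contains("<rdf:RDF")) {
                rdfComment = data;
            }
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            if (element.getTagName().equalsIgnoreCase("a")) {
                // getAttribute returns "" when the attribute is absent.
                String href = element.getAttribute("href");
                if (element.getAttribute("rel").equalsIgnoreCase("license")) {
                    if (relLicense == null) {
                        relLicense = href;
                    }
                } else if (anchorLicense == null
                        && href.contains("creativecommons.org/licenses/")) {
                    anchorLicense = href;
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i));
        }
    }

    public static void main(String[] args) {
        LicenseWalkerSketch walker = new LicenseWalkerSketch();
        walker.walk(parse("<html><body>"
                + "<!-- <rdf:RDF></rdf:RDF> -->"
                + "<p><a rel=\"license\" href=\"https://creativecommons.org/licenses/by/4.0/\">CC BY</a></p>"
                + "</body></html>"));
        System.out.println("rdf found: " + (walker.rdfComment != null));
        System.out.println("rel license: " + walker.relLicense);
        System.out.println("anchor license: " + walker.anchorLicense);
    }
}
```

Because walk accepts any org.w3c.dom.Node, the same traversal could in principle be applied to the DocumentFragment that filter receives. Deciding which of the three findings takes precedence is left out of this sketch.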
Field Summary
Fields
static org.slf4j.Logger LOG
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
X_POINT_ID
Constructor Summary
Constructors
CCParseFilter()
Method Summary
Methods
ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
org.apache.hadoop.conf.Configuration getConf()
void setConf(org.apache.hadoop.conf.Configuration conf)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
CCParseFilter
public CCParseFilter()
Method Detail
filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
- Specified by:
- filter in interface HtmlParseFilter
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
- setConf in interface org.apache.hadoop.conf.Configurable
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
- getConf in interface org.apache.hadoop.conf.Configurable