org.apache.nutch.parse.metatags
Class MetaTagsParser
- java.lang.Object
- org.apache.nutch.parse.metatags.MetaTagsParser
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, HtmlParseFilter, Pluggable
public class MetaTagsParser extends Object implements HtmlParseFilter
Parses HTML meta tags (such as keywords and description) and stores them in the parse metadata under the prefix 'metatag.', so that they can be indexed by the index-metadata plugin. Meta tag names are matched case-insensitively.
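The matching rule described above can be sketched in plain Java. This is a minimal illustration, not the plugin's source: the class `MetaTagPrefixSketch`, the `extract` helper, and the use of `Map<String, String>` in place of Nutch's `HTMLMetaTags` and parse metadata are all hypothetical. The sketch lowercases each tag name before comparing it with the requested names and stores every match under the `metatag.` prefix.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MetaTagPrefixSketch {
    // Prefix described in the class documentation.
    static final String PREFIX = "metatag.";

    // Hypothetical helper: copies the requested tags from a page's raw meta tags
    // into a metadata map. `wanted` must hold lowercase names; page tag names are
    // lowercased before the lookup, so the comparison ignores case.
    static Map<String, String> extract(Map<String, String> pageTags, Set<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pageTags.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (wanted.contains(name)) {
                out.put(PREFIX + name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("Keywords", "nutch, crawler");
        page.put("DESCRIPTION", "An open source web crawler");
        page.put("robots", "index,follow");
        Map<String, String> md = extract(page, Set.of("keywords", "description"));
        System.out.println(md);
    }
}
```

With the sample page, `Keywords` and `DESCRIPTION` are stored as `metatag.keywords` and `metatag.description`, while `robots` is skipped because it was not requested.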
Field Summary
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
X_POINT_ID
Constructor Summary
Constructor and Description
- MetaTagsParser()
Method Summary
Modifier and Type / Method and Description
- ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
- org.apache.hadoop.conf.Configuration getConf()
- void setConf(org.apache.hadoop.conf.Configuration conf)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
MetaTagsParser
public MetaTagsParser()
Method Detail
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
- setConf in interface org.apache.hadoop.conf.Configurable
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
- getConf in interface org.apache.hadoop.conf.Configurable
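setConf and getConf follow Hadoop's Configurable pattern: the framework hands the plugin a configuration, and the plugin keeps it and may derive its settings from it. The sketch below assumes the plugin reads a comma-separated list of tag names when setConf is called. The property name `metatags.names`, its default value, and the `SimpleConf` stand-in for `org.apache.hadoop.conf.Configuration` are assumptions for illustration; check `nutch-default.xml` for the actual setting.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ConfigurableSketch {
    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration (string values only).
    static class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) { props.put(key, value); }

        String get(String key, String defaultValue) { return props.getOrDefault(key, defaultValue); }
    }

    // Assumed property name and default; not taken from this page.
    static final String NAMES_KEY = "metatags.names";
    static final String DEFAULT_NAMES = "description,keywords";

    private SimpleConf conf;
    private final Set<String> wanted = new LinkedHashSet<>();

    // Like setConf(Configuration): keep the configuration and derive settings from it once.
    public void setConf(SimpleConf conf) {
        this.conf = conf;
        wanted.clear();
        for (String name : conf.get(NAMES_KEY, DEFAULT_NAMES).split(",")) {
            String normalized = name.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                wanted.add(normalized); // stored lowercase for case-insensitive matching
            }
        }
    }

    // Like getConf(): return the configuration that was passed to setConf.
    public SimpleConf getConf() { return conf; }

    public Set<String> wanted() { return wanted; }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set(NAMES_KEY, "Description, Keywords, author");
        ConfigurableSketch plugin = new ConfigurableSketch();
        plugin.setConf(conf);
        System.out.println(plugin.wanted());
        System.out.println(plugin.getConf() == conf);
    }
}
```

Normalizing the names once in setConf avoids repeating the work for every page passed to filter.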
filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Description copied from interface: HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
- Specified by:
- filter in interface HtmlParseFilter
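The filter contract is to receive the ParseResult for a fetched page, add to or change the parse it holds, and return the result for further processing. The sketch below shows that shape with minimal stand-in types (`SimpleParse`, `SimpleParseResult`, `SimpleHtmlParseFilter`), all hypothetical and much simpler than Nutch's ParseResult, HTMLMetaTags, and DocumentFragment. It assumes the parse for the page is looked up by URL and that a missing entry leaves the result unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FilterContractSketch {
    // Hypothetical stand-in for a single parse: only its metadata is modeled.
    static class SimpleParse {
        final Map<String, String> metadata = new HashMap<>();
    }

    // Hypothetical stand-in for ParseResult: parses keyed by URL.
    static class SimpleParseResult {
        final Map<String, SimpleParse> byUrl = new HashMap<>();

        SimpleParse get(String url) { return byUrl.get(url); }
    }

    // Simplified shape of HtmlParseFilter.filter: take the result, modify it, return it.
    interface SimpleHtmlParseFilter {
        SimpleParseResult filter(String url, SimpleParseResult result, Map<String, String> metaTags);
    }

    // Copies every given meta tag into the page's parse metadata under "metatag.".
    static final SimpleHtmlParseFilter META_TAGS = (url, result, metaTags) -> {
        SimpleParse parse = result.get(url);
        if (parse == null) {
            return result; // no parse for this URL: leave the result unchanged
        }
        for (Map.Entry<String, String> e : metaTags.entrySet()) {
            String key = "metatag." + e.getKey().toLowerCase(Locale.ROOT);
            parse.metadata.put(key, e.getValue());
        }
        return result;
    };

    public static void main(String[] args) {
        String url = "http://example.com/";
        SimpleParseResult result = new SimpleParseResult();
        result.byUrl.put(url, new SimpleParse());
        SimpleParseResult out = META_TAGS.filter(url, result, Map.of("Description", "Example page"));
        System.out.println(out == result);
        System.out.println(out.get(url).metadata);
    }
}
```

Returning the same result instance, rather than building a new one, keeps any other parses and metadata already in it intact.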