org.apache.nutch.parse.metatags

Class MetaTagsParser


public class MetaTagsParser
extends Object
implements HtmlParseFilter

Parses HTML meta tags (for example keywords and description) and stores them in the parse metadata under the prefix 'metatag.', so that the index-metadata plugin can index them. Meta tag names are matched case-insensitively.
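The behaviour described above can be illustrated with a small, self-contained sketch. This is not the Nutch source: the class `MetaTagPrefixSketch`, the method `select`, and the plain `Map`/`Set` inputs are hypothetical stand-ins for Nutch's own types, and the sketch assumes the stored key uses the lowercased tag name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: mimics the documented behaviour, not the Nutch implementation. */
public class MetaTagPrefixSketch {

    // Prefix named in the class description.
    static final String PREFIX = "metatag.";

    /**
     * Keeps the meta tags whose name, compared ignoring case, is in wantedNames
     * (expected to hold lowercase names) and stores them under PREFIX + name.
     * Assumption: the stored key uses the lowercased tag name.
     */
    static Map<String, String> select(Map<String, String> pageMetaTags,
                                      Set<String> wantedNames) {
        Map<String, String> parseMetadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> tag : pageMetaTags.entrySet()) {
            String name = tag.getKey().toLowerCase(Locale.ROOT);
            if (wantedNames.contains(name)) {
                parseMetadata.put(PREFIX + name, tag.getValue());
            }
        }
        return parseMetadata;
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("Keywords", "nutch, crawler");
        page.put("DESCRIPTION", "An example page");
        page.put("viewport", "width=device-width");
        // Only the two requested tags survive, each under the 'metatag.' prefix.
        System.out.println(select(page, Set.of("keywords", "description")));
    }
}
```

A downstream indexing step would then look the values up by their prefixed keys, such as `metatag.description`.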

Field Summary

Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter

X_POINT_ID

Constructor Summary

Constructor and Description:

  - MetaTagsParser()

Method Summary

Modifier and Type, Method and Description:

  - ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
  - org.apache.hadoop.conf.Configuration getConf()
  - void setConf(org.apache.hadoop.conf.Configuration conf)

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

MetaTagsParser

public MetaTagsParser()

Method Detail

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
  - Specified by: setConf in interface org.apache.hadoop.conf.Configurable

getConf

public org.apache.hadoop.conf.Configuration getConf()
  - Specified by: getConf in interface org.apache.hadoop.conf.Configurable

filter

public ParseResult filter(Content content,
                 ParseResult parseResult,
                 HTMLMetaTags metaTags,
                 DocumentFragment doc)

Description copied from interface: HtmlParseFilter

Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.

  - Specified by: filter in interface HtmlParseFilter
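To make "given the DOM tree of a page" concrete, the following sketch walks an `org.w3c.dom.DocumentFragment` and collects the `name`/`content` pairs of its meta elements. It uses only JDK classes and is a generic illustration, not what MetaTagsParser does internally (the filter also receives an HTMLMetaTags argument). The class `DomMetaWalkSketch` and its methods are hypothetical, and the input is assumed to be well-formed XHTML so that the JDK XML parser can read it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Illustrative only: a generic DOM walk, not the Nutch implementation. */
public class DomMetaWalkSketch {

    /** Recursively records name -> content for every meta element that has a name attribute. */
    static void collect(Node node, Map<String, String> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element meta = (Element) node;
            if (meta.hasAttribute("name")) {
                found.put(meta.getAttribute("name"), meta.getAttribute("content"));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, found);
        }
    }

    /** Parses well-formed XHTML and moves its root element into a DocumentFragment. */
    static DocumentFragment parse(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            DocumentFragment fragment = doc.createDocumentFragment();
            fragment.appendChild(doc.getDocumentElement());
            return fragment;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<meta name=\"Keywords\" content=\"nutch, crawler\"/>"
                + "<meta charset=\"utf-8\"/>"
                + "</head><body/></html>";
        Map<String, String> found = new LinkedHashMap<>();
        collect(parse(page), found);
        System.out.println(found);
    }
}
```

The `charset` meta element is skipped because it carries no `name` attribute; attribute names are kept as written, so any case folding would be a separate step.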

Copyright © 2014 The Apache Software Foundation