org.apache.nutch.parse

Class HTMLMetaTags


public class HTMLMetaTags
extends Object

This class holds the information about HTML "meta" tags extracted from a page. Some special tags have convenience methods for easy checking.

Constructor Summary

| Constructor and Description |
|---|
| HTMLMetaTags() |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| URL | getBaseHref() | A convenience method. |
| Metadata | getGeneralTags() | Returns all collected values of the general meta tags. |
| Properties | getHttpEquivTags() | Returns all collected values of the "http-equiv" meta tags. |
| boolean | getNoCache() | A convenience method. |
| boolean | getNoFollow() | A convenience method. |
| boolean | getNoIndex() | A convenience method. |
| boolean | getRefresh() | A convenience method. |
| URL | getRefreshHref() | A convenience method. |
| int | getRefreshTime() | A convenience method. |
| void | reset() | Sets all boolean values to false. |
| void | setBaseHref(URL baseHref) | Sets the baseHref. |
| void | setNoCache() | Sets noCache to true. |
| void | setNoFollow() | Sets noFollow to true. |
| void | setNoIndex() | Sets noIndex to true. |
| void | setRefresh(boolean refresh) | Sets refresh to the supplied value. |
| void | setRefreshHref(URL refreshHref) | Sets the refreshHref. |
| void | setRefreshTime(int refreshTime) | Sets the refreshTime. |
| String | toString() | |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Detail

HTMLMetaTags

public HTMLMetaTags()

Method Detail

reset

public void reset()

Sets all boolean values to false. Clears all other tags.

setNoFollow

public void setNoFollow()

Sets noFollow to true.

setNoIndex

public void setNoIndex()

Sets noIndex to true.

setNoCache

public void setNoCache()

Sets noCache to true.

setRefresh

public void setRefresh(boolean refresh)

Sets refresh to the supplied value.

setBaseHref

public void setBaseHref(URL baseHref)

Sets the baseHref.

setRefreshHref

public void setRefreshHref(URL refreshHref)

Sets the refreshHref.

setRefreshTime

public void setRefreshTime(int refreshTime)

Sets the refreshTime.

getNoIndex

public boolean getNoIndex()

A convenience method. Returns the current value of noIndex.

getNoFollow

public boolean getNoFollow()

A convenience method. Returns the current value of noFollow.

getNoCache

public boolean getNoCache()

A convenience method. Returns the current value of noCache.
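The three robots-related getters are typically consulted together when deciding what to do with a parsed page. The following is a minimal sketch of that pattern, not Nutch code: `RobotsFlags` is a hypothetical stand-in that mirrors only the documented `setNoIndex()`, `setNoFollow()`, `setNoCache()`, `reset()` and matching getters so the example compiles without Nutch on the classpath, and `allowedActions` is an illustrative helper name.

```java
class RobotsFlagsSketch {

    /** Hypothetical stand-in mirroring only the documented robots-related methods. */
    static class RobotsFlags {
        private boolean noIndex;
        private boolean noFollow;
        private boolean noCache;

        void setNoIndex() { noIndex = true; }
        void setNoFollow() { noFollow = true; }
        void setNoCache() { noCache = true; }

        boolean getNoIndex() { return noIndex; }
        boolean getNoFollow() { return noFollow; }
        boolean getNoCache() { return noCache; }

        /** Documented behaviour: every boolean flag goes back to false. */
        void reset() {
            noIndex = false;
            noFollow = false;
            noCache = false;
        }
    }

    /** Lists the actions a crawler may still take for a page with these flags. */
    static String allowedActions(RobotsFlags tags) {
        StringBuilder sb = new StringBuilder();
        if (!tags.getNoIndex()) sb.append("index ");
        if (!tags.getNoFollow()) sb.append("follow ");
        if (!tags.getNoCache()) sb.append("cache ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        RobotsFlags tags = new RobotsFlags();
        tags.setNoIndex();
        tags.setNoFollow();
        // With noIndex and noFollow set, only caching remains allowed.
        System.out.println(allowedActions(tags));

        // reset() clears the flags before the object is reused for another page.
        tags.reset();
        System.out.println(allowedActions(tags));
    }
}
```

Because the setters take no argument and can only switch a flag on, calling `reset()` is the only documented way to clear a flag when one instance is reused.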

getRefresh

public boolean getRefresh()

A convenience method. Returns the current value of refresh.

getBaseHref

public URL getBaseHref()

A convenience method. Returns the baseHref, if set, or null otherwise.

getRefreshHref

public URL getRefreshHref()

A convenience method. Returns the refreshHref, if set, or null otherwise. The value may be invalid if getRefresh() returns false.

getRefreshTime

public int getRefreshTime()

A convenience method. Returns the current value of refreshTime. The value may be invalid if getRefresh() returns false.
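Since the refresh href and time are only meaningful when `getRefresh()` returns true, callers may want to check the flag before reading either value. The sketch below illustrates one way to do that. `RefreshTags` is a hypothetical stand-in exposing only the documented refresh accessors, and `refreshTargetOrNull` and the example URL are illustrative names, not part of Nutch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class RefreshGuardSketch {

    /** Hypothetical stand-in exposing only the documented refresh accessors. */
    static class RefreshTags {
        private boolean refresh;
        private URL refreshHref;
        private int refreshTime;

        void setRefresh(boolean refresh) { this.refresh = refresh; }
        void setRefreshHref(URL refreshHref) { this.refreshHref = refreshHref; }
        void setRefreshTime(int refreshTime) { this.refreshTime = refreshTime; }

        boolean getRefresh() { return refresh; }
        URL getRefreshHref() { return refreshHref; }
        int getRefreshTime() { return refreshTime; }
    }

    /** Returns the refresh target only when the refresh flag is set. */
    static URL refreshTargetOrNull(RefreshTags tags) {
        if (!tags.getRefresh()) {
            // Without the flag, href and time may be unset or left over.
            return null;
        }
        return tags.getRefreshHref();
    }

    public static void main(String[] args) throws MalformedURLException {
        RefreshTags tags = new RefreshTags();
        tags.setRefreshHref(URI.create("http://example.com/next").toURL());
        tags.setRefreshTime(5);

        // The flag has not been set, so the stored href is ignored.
        System.out.println(refreshTargetOrNull(tags));

        tags.setRefresh(true);
        System.out.println(refreshTargetOrNull(tags) + " after " + tags.getRefreshTime() + "s");
    }
}
```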

getGeneralTags

public Metadata getGeneralTags()

Returns all collected values of the general meta tags. Property names are tag names; property values are the corresponding "content" values.

getHttpEquivTags

public Properties getHttpEquivTags()

Returns all collected values of the "http-equiv" meta tags. Property names are tag names; property values are the corresponding "content" values.
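Because the http-equiv values come back as a plain `java.util.Properties`, they can be read with the standard `getProperty` call. The sketch below shows a small lookup helper. The `Properties` object is filled by hand to stand in for the return value of `getHttpEquivTags()`, `httpEquiv` is a hypothetical helper name, and the lower-case fallback assumes names may be stored in lower case, which this page does not confirm.

```java
import java.util.Locale;
import java.util.Properties;

class HttpEquivLookupSketch {

    /**
     * Looks up an http-equiv value by name, trying the name as given and then
     * its lower-cased form (an assumption about how names are stored).
     */
    static String httpEquiv(Properties httpEquivTags, String name, String fallback) {
        String value = httpEquivTags.getProperty(name);
        if (value == null) {
            value = httpEquivTags.getProperty(name.toLowerCase(Locale.ROOT));
        }
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the Properties returned by getHttpEquivTags().
        Properties tags = new Properties();
        tags.setProperty("content-type", "text/html; charset=UTF-8");

        System.out.println(httpEquiv(tags, "Content-Type", "unknown"));
        System.out.println(httpEquiv(tags, "content-language", "unknown"));
    }
}
```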

toString

public String toString()
  - Overrides: <code>toString</code> in class <code>Object</code>

Copyright © 2014 The Apache Software Foundation