
org.apache.nutch.parse

Class ParseData

  • java.lang.Object
    • org.apache.hadoop.io.VersionedWritable
      • org.apache.nutch.parse.ParseData

All Implemented Interfaces: org.apache.hadoop.io.Writable

public final class ParseData
extends org.apache.hadoop.io.VersionedWritable

Data extracted from a page's content.

Field Summary

Fields

| Modifier and Type | Field and Description |
|---|---|
| static String | DIR_NAME |

Constructor Summary

Constructors

| Constructor and Description |
|---|
| ParseData() |
| ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta) |
| ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta) |

Method Summary

Methods

| Modifier and Type | Method and Description |
|---|---|
| boolean | equals(Object o) |
| Metadata | getContentMeta(): The original Metadata retrieved from content. |
| String | getMeta(String name): Get a single metadata value. |
| Outlink[] | getOutlinks(): The outlinks of the page. |
| Metadata | getParseMeta(): Other content properties. |
| ParseStatus | getStatus(): The status of parsing the page. |
| String | getTitle(): The title of the page. |
| byte | getVersion() |
| static void | main(String[] argv) |
| static ParseData | read(DataInput in) |
| void | readFields(DataInput in) |
| void | setOutlinks(Outlink[] outlinks) |
| void | setParseMeta(Metadata parseMeta) |
| String | toString() |
| void | write(DataOutput out) |

Methods inherited from class java.lang.Object

clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

DIR_NAME

public static final String DIR_NAME
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.parse.ParseData.DIR_NAME)       
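
DIR_NAME names the directory, inside a Nutch segment, where ParseData records are stored. The sketch below shows how such a constant is typically combined with a segment path. It assumes the constant's value is "parse_data" (confirm it on the Constant Field Values page), and both the helper class ParseDataPaths and the segment path are hypothetical; only java.nio.file is used, not Nutch or Hadoop.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch: locates ParseData output inside a segment.
class ParseDataPaths {
    // Assumed value of ParseData.DIR_NAME; verify it against Constant Field Values.
    static final String DIR_NAME = "parse_data";

    // Appends the ParseData subdirectory name to a segment directory.
    static Path parseDataDir(Path segment) {
        return segment.resolve(DIR_NAME);
    }

    public static void main(String[] args) {
        // Hypothetical segment directory name.
        Path segment = Paths.get("crawl", "segments", "20140101000000");
        System.out.println(parseDataDir(segment));
    }
}
```

In real code, referencing ParseData.DIR_NAME directly avoids hard-coding the string.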

Constructor Detail

ParseData

public ParseData()

ParseData

public ParseData(ParseStatus status,
         String title,
         Outlink[] outlinks,
         Metadata contentMeta)

ParseData

public ParseData(ParseStatus status,
         String title,
         Outlink[] outlinks,
         Metadata contentMeta,
         Metadata parseMeta)
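
The two data-carrying constructors differ only in the parseMeta argument. A plausible reading is that the four-argument form delegates to the five-argument form with an empty parse-metadata container; the sketch below illustrates that constructor-chaining pattern under that assumption. ParseDataSketch is hypothetical, and to keep it compilable with the JDK alone a string-to-string Map stands in for Metadata, a String for ParseStatus, and a String array for Outlink[].

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParseData's constructors; not the Nutch implementation.
class ParseDataSketch {
    private final String status;                    // stands in for ParseStatus
    private final String title;
    private final String[] outlinks;                // stands in for Outlink[]
    private final Map<String, String> contentMeta;  // stands in for Metadata
    private final Map<String, String> parseMeta;    // stands in for Metadata

    // Four-argument form: assumed to delegate with an empty parse-metadata map.
    ParseDataSketch(String status, String title, String[] outlinks,
                    Map<String, String> contentMeta) {
        this(status, title, outlinks, contentMeta, new HashMap<>());
    }

    // Five-argument form: every field is supplied by the caller.
    ParseDataSketch(String status, String title, String[] outlinks,
                    Map<String, String> contentMeta, Map<String, String> parseMeta) {
        this.status = status;
        this.title = title;
        this.outlinks = outlinks;
        this.contentMeta = contentMeta;
        this.parseMeta = parseMeta;
    }

    String getStatus() { return status; }
    String getTitle() { return title; }
    String[] getOutlinks() { return outlinks; }
    Map<String, String> getContentMeta() { return contentMeta; }
    Map<String, String> getParseMeta() { return parseMeta; }

    public static void main(String[] args) {
        ParseDataSketch data =
                new ParseDataSketch("success", "Example", new String[0], new HashMap<>());
        System.out.println(data.getParseMeta().isEmpty()); // true
    }
}
```

Chaining keeps field assignment in one place, so callers that have no parser-specific metadata still get a non-null, empty container.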

Method Detail

getStatus

public ParseStatus getStatus()

The status of parsing the page.

getTitle

public String getTitle()

The title of the page.

getOutlinks

public Outlink[] getOutlinks()

The outlinks of the page.

getContentMeta

public Metadata getContentMeta()

The original Metadata retrieved from content.

getParseMeta

public Metadata getParseMeta()

Other content properties. This is the place to find format-specific properties. Different parser implementations for different content types will populate this differently.

setParseMeta

public void setParseMeta(Metadata parseMeta)

setOutlinks

public void setOutlinks(Outlink[] outlinks)

getMeta

public String getMeta(String name)

Get a single metadata value. This method first looks for the value in the parse metadata. If no value is found there, it then looks in the content metadata.

  - See Also:
  - [<code>getContentMeta()</code>](../../../../org/apache/nutch/parse/ParseData.html#getContentMeta()), [<code>getParseMeta()</code>](../../../../org/apache/nutch/parse/ParseData.html#getParseMeta())       
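
The lookup order described above amounts to a null-checked fallback from one metadata source to the other. The sketch below reproduces that order with plain Maps standing in for the two Metadata objects; MetaLookupSketch and the metadata keys are hypothetical, and the real method reads from Nutch's Metadata rather than from a Map.

```java
import java.util.Map;

// Hypothetical sketch of getMeta's documented lookup order, not the Nutch source.
class MetaLookupSketch {
    private final Map<String, String> parseMeta;   // stands in for getParseMeta()
    private final Map<String, String> contentMeta; // stands in for getContentMeta()

    MetaLookupSketch(Map<String, String> parseMeta, Map<String, String> contentMeta) {
        this.parseMeta = parseMeta;
        this.contentMeta = contentMeta;
    }

    // Parse metadata wins; content metadata is consulted only when the parse value is missing.
    String getMeta(String name) {
        String value = parseMeta.get(name);
        if (value == null) {
            value = contentMeta.get(name);
        }
        return value;
    }

    public static void main(String[] args) {
        MetaLookupSketch data = new MetaLookupSketch(
                Map.of("lang", "en"),
                Map.of("lang", "de", "Content-Type", "text/html"));
        System.out.println(data.getMeta("lang"));         // en
        System.out.println(data.getMeta("Content-Type")); // text/html
        System.out.println(data.getMeta("missing"));      // null
    }
}
```

Because the parse metadata is checked first, a value a parser derives from the document body overrides a value with the same name that arrived with the fetched content.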

getVersion

public byte getVersion()
  - Specified by: 
  - <code>getVersion</code> in class <code>org.apache.hadoop.io.VersionedWritable</code>        

readFields

public final void readFields(DataInput in)
                      throws IOException
  - Specified by: 
  - <code>readFields</code> in interface <code>org.apache.hadoop.io.Writable</code> 
  - Overrides: 
  - <code>readFields</code> in class <code>org.apache.hadoop.io.VersionedWritable</code> 
  - Throws: 
  - <code>IOException</code>       

write

public final void write(DataOutput out)
                 throws IOException
  - Specified by: 
  - <code>write</code> in interface <code>org.apache.hadoop.io.Writable</code> 
  - Overrides: 
  - <code>write</code> in class <code>org.apache.hadoop.io.VersionedWritable</code> 
  - Throws: 
  - <code>IOException</code>       

read

public static ParseData read(DataInput in)
                      throws IOException
  - Throws: 
  - <code>IOException</code>       
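
ParseData extends Hadoop's VersionedWritable, whose convention is that write emits a version byte (taken from getVersion) before the subclass's fields, and readFields reads that byte back and rejects data whose version does not match. The static read(DataInput) presumably follows the common Hadoop idiom of creating an empty instance and calling readFields on it. The sketch below shows that pattern with JDK streams only. VersionedRecord, its single title field, its version number, and the VersionMismatchException stand-in are all hypothetical; the actual ParseData wire format (status, title, outlinks, and both metadata objects) is not reproduced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for Hadoop's VersionMismatchException.
class VersionMismatchException extends IOException {
    VersionMismatchException(byte expected, byte found) {
        super("expected version " + expected + ", found " + found);
    }
}

// Hypothetical record following the VersionedWritable pattern.
class VersionedRecord {
    static final byte VERSION = 1; // illustrative value, not ParseData's actual version
    private String title = "";

    VersionedRecord() {}
    VersionedRecord(String title) { this.title = title; }

    String getTitle() { return title; }
    byte getVersion() { return VERSION; }

    void write(DataOutput out) throws IOException {
        out.writeByte(getVersion()); // version header first
        out.writeUTF(title);         // then the payload
    }

    void readFields(DataInput in) throws IOException {
        byte found = in.readByte();
        if (found != getVersion()) {
            throw new VersionMismatchException(getVersion(), found);
        }
        title = in.readUTF();
    }

    // Static factory in the style of ParseData.read(DataInput).
    static VersionedRecord read(DataInput in) throws IOException {
        VersionedRecord record = new VersionedRecord();
        record.readFields(in);
        return record;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new VersionedRecord("Example Domain").write(new DataOutputStream(bytes));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(VersionedRecord.read(in).getTitle()); // Example Domain
    }
}
```

Putting the version byte first lets a reader fail fast on data written by an incompatible release instead of misreading the fields that follow.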

equals

public boolean equals(Object o)
  - Overrides: 
  - <code>equals</code> in class <code>Object</code>        
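
Judging from the inherited-methods list, ParseData overrides equals but inherits hashCode from Object, so two equal instances are not guaranteed to share a hash code; this matters if they are used as HashMap keys or HashSet members. The hypothetical value class below sketches the usual pairing: equals compares fields (using Arrays.equals for an array field such as outlinks) and hashCode is derived from the same fields. It illustrates the general technique and is not ParseData's actual implementation.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical value class; field names mirror ParseData for readability only.
final class RecordEqualitySketch {
    private final String title;
    private final String[] outlinks;

    RecordEqualitySketch(String title, String[] outlinks) {
        this.title = title;
        this.outlinks = outlinks;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RecordEqualitySketch)) return false;
        RecordEqualitySketch other = (RecordEqualitySketch) o;
        // Arrays.equals compares elements; == would compare array references.
        return Objects.equals(title, other.title)
                && Arrays.equals(outlinks, other.outlinks);
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal objects share a hash code.
        return 31 * Objects.hashCode(title) + Arrays.hashCode(outlinks);
    }

    public static void main(String[] args) {
        RecordEqualitySketch a = new RecordEqualitySketch("Home", new String[] {"http://example.com/a"});
        RecordEqualitySketch b = new RecordEqualitySketch("Home", new String[] {"http://example.com/a"});
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

With hashCode left as Object's identity hash, a HashSet could keep both a and b as separate entries even though a.equals(b) is true.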

toString

public String toString()
  - Overrides: 
  - <code>toString</code> in class <code>Object</code>        

main

public static void main(String[] argv)
                 throws Exception
  - Throws: 
  - <code>Exception</code>      

Copyright © 2014 The Apache Software Foundation