org.apache.nutch.parse
Class ParseData
- java.lang.Object
- org.apache.hadoop.io.VersionedWritable
- org.apache.nutch.parse.ParseData
- All Implemented Interfaces:
- org.apache.hadoop.io.Writable
public final class ParseData extends org.apache.hadoop.io.VersionedWritable
Data extracted from a page's content.
- See Also:
Parse.getData()
Field Summary
| Modifier and Type | Field and Description |
|---|---|
| static String | DIR_NAME |
Constructor Summary
| Constructor and Description |
|---|
| ParseData() |
| ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta) |
| ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta) |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| boolean | equals(Object o) | |
| Metadata | getContentMeta() | The original Metadata retrieved from content. |
| String | getMeta(String name) | Get a single metadata value. |
| Outlink[] | getOutlinks() | The outlinks of the page. |
| Metadata | getParseMeta() | Other content properties. |
| ParseStatus | getStatus() | The status of parsing the page. |
| String | getTitle() | The title of the page. |
| byte | getVersion() | |
| static void | main(String[] argv) | |
| static ParseData | read(DataInput in) | |
| void | readFields(DataInput in) | |
| void | setOutlinks(Outlink[] outlinks) | |
| void | setParseMeta(Metadata parseMeta) | |
| String | toString() | |
| void | write(DataOutput out) | |
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
-
DIR_NAME
public static final String DIR_NAME
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.parse.ParseData.DIR_NAME)
Constructor Detail
-
ParseData
public ParseData()
-
ParseData
public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta)
-
ParseData
public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta)
Method Detail
-
getStatus
public ParseStatus getStatus()
The status of parsing the page.
-
getTitle
public String getTitle()
The title of the page.
-
getOutlinks
public Outlink[] getOutlinks()
The outlinks of the page.
-
getContentMeta
public Metadata getContentMeta()
The original Metadata retrieved from content.
-
getParseMeta
public Metadata getParseMeta()
Other content properties. This is the place to find format-specific properties. Different parser implementations for different content types will populate this differently.
-
setParseMeta
public void setParseMeta(Metadata parseMeta)
-
setOutlinks
public void setOutlinks(Outlink[] outlinks)
-
getMeta
public String getMeta(String name)
Get a single metadata value. This method first looks for the value in the parse metadata. If no value is found there, it then looks in the content metadata.
- See Also:
- [<code>getContentMeta()</code>](../../../../org/apache/nutch/parse/ParseData.html#getContentMeta()), [<code>getParseMeta()</code>](../../../../org/apache/nutch/parse/ParseData.html#getParseMeta())
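The lookup order described for getMeta can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: the class `MetaLookupSketch` is hypothetical, plain `Map` objects stand in for `Metadata`, and returning `null` when neither map holds the name is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only: plain maps replace Nutch Metadata.
final class MetaLookupSketch {

    // Look in the parse metadata first; fall back to the content metadata.
    // Returns null when neither map contains the name (assumption of this sketch).
    static String getMeta(Map<String, String> parseMeta,
                          Map<String, String> contentMeta,
                          String name) {
        String value = parseMeta.get(name);
        if (value == null) {
            value = contentMeta.get(name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> contentMeta = new HashMap<>();
        contentMeta.put("Content-Type", "text/html");
        contentMeta.put("Content-Language", "en");

        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("Content-Type", "application/xhtml+xml");

        // Present in both maps: the parse metadata value wins.
        System.out.println(getMeta(parseMeta, contentMeta, "Content-Type"));     // application/xhtml+xml
        // Present only in the content metadata: the fallback is used.
        System.out.println(getMeta(parseMeta, contentMeta, "Content-Language")); // en
        // Present in neither map.
        System.out.println(getMeta(parseMeta, contentMeta, "missing"));          // null
    }
}
```

With a real ParseData instance, the equivalent call would be `getMeta(name)`, with `getParseMeta()` and `getContentMeta()` available when the caller needs to know which source supplied the value.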
-
getVersion
public byte getVersion()
- Specified by:
- <code>getVersion</code> in class <code>org.apache.hadoop.io.VersionedWritable</code>
-
readFields
public final void readFields(DataInput in) throws IOException
- Specified by:
- <code>readFields</code> in interface <code>org.apache.hadoop.io.Writable</code>
- Overrides:
- <code>readFields</code> in class <code>org.apache.hadoop.io.VersionedWritable</code>
- Throws:
- <code>IOException</code>
-
write
public final void write(DataOutput out) throws IOException
- Specified by:
- <code>write</code> in interface <code>org.apache.hadoop.io.Writable</code>
- Overrides:
- <code>write</code> in class <code>org.apache.hadoop.io.VersionedWritable</code>
- Throws:
- <code>IOException</code>
-
read
public static ParseData read(DataInput in) throws IOException
- Throws:
- <code>IOException</code>
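The relationship between write, readFields, and the static read helper can be sketched with the standard `java.io` data streams. This is an illustration under stated assumptions, not ParseData's actual wire format: the class `VersionedRecordSketch`, its single `title` field, and the version value `1` are all hypothetical. It mirrors the general VersionedWritable pattern of writing a version byte first and rejecting a mismatched version on read, using a plain `IOException` for the mismatch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record showing a version-prefixed write/readFields/read trio.
final class VersionedRecordSketch {
    static final byte VERSION = 1; // hypothetical version number

    String title = "";

    // Serialize: version byte first, then the fields.
    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);
        out.writeUTF(title);
    }

    // Deserialize into this instance, checking the version byte first.
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version != VERSION) {
            throw new IOException("Unexpected version: " + version);
        }
        title = in.readUTF();
    }

    // Static convenience: create a fresh instance and populate it from the input.
    static VersionedRecordSketch read(DataInput in) throws IOException {
        VersionedRecordSketch record = new VersionedRecordSketch();
        record.readFields(in);
        return record;
    }

    // Write a record to a byte array and read it back.
    static VersionedRecordSketch roundTrip(VersionedRecordSketch record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            record.write(new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedRecordSketch original = new VersionedRecordSketch();
        original.title = "Example page";
        System.out.println(roundTrip(original).title); // Example page
    }
}
```

A static `read` of this shape saves callers from constructing an empty instance before calling `readFields`, which is the usual reason such a helper sits next to the instance method.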
-
equals
public boolean equals(Object o)
- Overrides:
- <code>equals</code> in class <code>Object</code>
-
toString
public String toString()
- Overrides:
- <code>toString</code> in class <code>Object</code>
-
main
public static void main(String[] argv) throws Exception
- Throws:
- <code>Exception</code>