org.apache.nutch.parse
Class ParseResult
- java.lang.Object
- org.apache.nutch.parse.ParseResult
public class ParseResult extends Object implements Iterable<Map.Entry<org.apache.hadoop.io.Text,Parse>>
A utility class that stores the result of a parse. Internally a ParseResult stores <Text, Parse> pairs.
Parsers may return multiple results, which correspond to parts or other associated documents related to the original URL.
There will usually be one parse result that corresponds directly to the original URL, plus zero or more results that correspond to derived URLs (sub-URLs).
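The <Text, Parse> layout described above can be pictured with a small self-contained model in plain Java. This is not Nutch code: `ParseResultModel`, its `String` keys (standing in for `org.apache.hadoop.io.Text`), its `String` values (standing in for `Parse`) and the example URLs are all hypothetical. The model keeps one entry under the original URL and treats every other key as a derived sub-URL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of the documented layout, not Nutch's implementation:
// String keys stand in for org.apache.hadoop.io.Text, String values for Parse.
class ParseResultModel {
  final String originalUrl;
  // LinkedHashMap keeps insertion order so the sketch behaves deterministically.
  final Map<String, String> parses = new LinkedHashMap<>();

  ParseResultModel(String originalUrl) {
    this.originalUrl = originalUrl;
  }

  void put(String url, String parse) {
    parses.put(url, parse);
  }

  // Entries stored under any key other than the original URL are derived results.
  List<String> derivedUrls() {
    return parses.keySet().stream()
        .filter(url -> !url.equals(originalUrl))
        .collect(Collectors.toList());
  }
}
```

For an archive fetched from a hypothetical URL such as `http://example.com/a.zip`, the model would hold one entry for the archive itself and one entry per extracted member.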
Field Summary
Fields
- static org.slf4j.Logger LOG
Constructor Summary
Constructors
- ParseResult(String originalUrl)
  Create a container for parse results.
Method Summary
Methods
- static ParseResult createParseResult(String url, Parse parse)
  Convenience method for obtaining a ParseResult from a single Parse output.
- void filter()
  Remove all results whose status is not successful (as determined by ParseStatus#isSuccess()).
- Parse get(String key)
  Retrieve a single parse output.
- Parse get(org.apache.hadoop.io.Text key)
  Retrieve a single parse output.
- boolean isEmpty()
  Checks whether the result is empty.
- boolean isSuccess()
  A convenience method which returns true only if all parses are successful.
- Iterator<Map.Entry<org.apache.hadoop.io.Text,Parse>> iterator()
  Iterate over all entries in the <Text, Parse> map.
- void put(String key, ParseText text, ParseData data)
  Store a result of parsing.
- void put(org.apache.hadoop.io.Text key, ParseText text, ParseData data)
  Store a result of parsing.
- int size()
  Return the number of parse outputs (both successful and failed).
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
ParseResult
public ParseResult(String originalUrl)
Create a container for parse results.
- Parameters:
- originalUrl - the original url from which all parse results have been obtained.
Method Detail
createParseResult
public static ParseResult createParseResult(String url, Parse parse)
Convenience method for obtaining a ParseResult from a single Parse output.
- Parameters:
- url - canonical url.
- parse - single parse output.
- Returns:
- result containing the single parse output.
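The factory covers the common one-document case: create a container for the URL and store the single parse under it. The sketch below models that idea with hypothetical names (`SingleParseModel`, `of`) and a plain `String` in place of Nutch's `Parse`; it reflects the behavior implied by the description above and is not a copy of Nutch's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParseResult; String values play the role of Parse.
class SingleParseModel {
  final String originalUrl;
  final Map<String, String> parses = new HashMap<>();

  SingleParseModel(String originalUrl) {
    this.originalUrl = originalUrl;
  }

  // Models the idea behind createParseResult(url, parse): one container,
  // one entry, keyed by the canonical URL that also serves as the original URL.
  static SingleParseModel of(String url, String parse) {
    SingleParseModel result = new SingleParseModel(url);
    result.parses.put(url, parse);
    return result;
  }
}
```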
isEmpty
public boolean isEmpty()
Checks whether the result is empty.
- Returns:
- true if the result contains no parse outputs, false otherwise.
size
public int size()
Return the number of parse outputs (both successful and failed).
get
public Parse get(String key)
Retrieve a single parse output.
- Parameters:
- key - sub-url under which the parse output is stored.
- Returns:
- parse output corresponding to this sub-url, or null.
get
public Parse get(org.apache.hadoop.io.Text key)
Retrieve a single parse output.
- Parameters:
- key - sub-url under which the parse output is stored.
- Returns:
- parse output corresponding to this sub-url, or null.
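The two `get` overloads differ only in key type. One natural way to offer both is for the `String` version to wrap its argument and delegate to the key-typed version; the sketch below assumes that design (the Javadoc does not say how Nutch implements it) and uses a hypothetical `UrlKey` class in place of `org.apache.hadoop.io.Text`. A missing key yields `null`, matching the documented return value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for org.apache.hadoop.io.Text: a key compared by value.
final class UrlKey {
  private final String value;

  UrlKey(String value) {
    this.value = value;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof UrlKey && ((UrlKey) o).value.equals(value);
  }

  @Override
  public int hashCode() {
    return Objects.hash(value);
  }
}

class LookupModel {
  private final Map<UrlKey, String> parses = new HashMap<>();

  void put(UrlKey key, String parse) {
    parses.put(key, parse);
  }

  // Key-typed lookup; Map.get returns null when nothing is stored under the key.
  String get(UrlKey key) {
    return parses.get(key);
  }

  // String overload (assumed design): wrap the sub-URL and delegate.
  String get(String key) {
    return get(new UrlKey(key));
  }
}
```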
put
public void put(org.apache.hadoop.io.Text key, ParseText text, ParseData data)
Store a result of parsing.
- Parameters:
- key - URL or sub-url of this parse result
- text - plain text result
- data - corresponding parse metadata of this result
put
public void put(String key, ParseText text, ParseData data)
Store a result of parsing.
- Parameters:
- key - URL or sub-url of this parse result
- text - plain text result
- data - corresponding parse metadata of this result
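Both `put` overloads take the plain text and the parse metadata as separate arguments and store them together as one parse output under the key. The sketch below models that pairing; `TextAndData`, `PutModel`, and the use of plain strings for `ParseText` and `ParseData` are hypothetical simplifications. In this model a second `put` under the same key replaces the first, which follows from `Map.put` and is not a documented guarantee of ParseResult.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pairing of plain text and metadata, standing in for Nutch's Parse.
final class TextAndData {
  final String text;     // plays the role of ParseText
  final String metadata; // plays the role of ParseData

  TextAndData(String text, String metadata) {
    this.text = text;
    this.metadata = metadata;
  }
}

class PutModel {
  final Map<String, TextAndData> parses = new HashMap<>();

  // Combine the two halves into one stored parse output under the (sub-)URL.
  // Map.put replaces any earlier entry stored under the same key.
  void put(String key, String text, String metadata) {
    parses.put(key, new TextAndData(text, metadata));
  }
}
```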
iterator
public Iterator<Map.Entry<org.apache.hadoop.io.Text,Parse>> iterator()
Iterate over all entries in the <Text, Parse> map.
- Specified by:
- iterator in interface Iterable<Map.Entry<org.apache.hadoop.io.Text,Parse>>
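Because the class implements `Iterable`, callers can walk the entries with an enhanced `for` loop. The sketch below shows one straightforward way to satisfy that interface, by returning the backing map's entry-set iterator; `IterableModel` and `joinUrls` are hypothetical, and the model uses a `LinkedHashMap` so its iteration order is predictable (the Javadoc makes no ordering promise).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: String keys stand in for Text, String values for Parse.
class IterableModel implements Iterable<Map.Entry<String, String>> {
  private final Map<String, String> parses = new LinkedHashMap<>();

  void put(String url, String parse) {
    parses.put(url, parse);
  }

  // Expose the map's own entry iterator, so for-each visits every <url, parse> pair.
  @Override
  public Iterator<Map.Entry<String, String>> iterator() {
    return parses.entrySet().iterator();
  }

  // Example consumer: join the keys in iteration order, separated by commas.
  static String joinUrls(IterableModel model) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> entry : model) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(entry.getKey());
    }
    return sb.toString();
  }
}
```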
filter
public void filter()
Remove all results whose status is not successful (as determined by ParseStatus#isSuccess()). Note that the effects of this operation cannot be reversed.
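The interplay between `size()`, which counts failed outputs too, and the irreversible `filter()` can be seen in a small model. Here each stored parse is reduced to a `Boolean` standing in for the ParseStatus#isSuccess() check; `FilterModel` and its methods are hypothetical and only mirror the documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: each stored parse is reduced to its success flag,
// standing in for the ParseStatus#isSuccess() check.
class FilterModel {
  final Map<String, Boolean> successByUrl = new LinkedHashMap<>();

  void put(String url, boolean success) {
    successByUrl.put(url, success);
  }

  // Counts every stored output, successful or failed.
  int size() {
    return successByUrl.size();
  }

  // Drop every entry whose status is not successful. Entries are removed
  // from the map itself, so the operation cannot be undone.
  void filter() {
    successByUrl.values().removeIf(success -> !success);
  }
}
```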
isSuccess
public boolean isSuccess()
A convenience method which returns true only if all parses are successful. Parse success is determined by ParseStatus#isSuccess().
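The "all parses succeeded" rule is an all-match over the stored statuses. The sketch below models it with a hypothetical `SuccessModel` whose `Boolean` values stand in for ParseStatus#isSuccess(). In this model an empty result counts as successful, because `allMatch` on an empty stream returns true; the Javadoc above does not say how ParseResult treats the empty case, so check `isEmpty()` separately if that matters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: the Boolean per URL stands in for ParseStatus#isSuccess().
class SuccessModel {
  final Map<String, Boolean> successByUrl = new LinkedHashMap<>();

  void put(String url, boolean success) {
    successByUrl.put(url, success);
  }

  // True only if every stored parse succeeded. allMatch on an empty stream
  // returns true, so an empty model counts as successful here.
  boolean isSuccess() {
    return successByUrl.values().stream().allMatch(Boolean::booleanValue);
  }
}
```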