[TOC]

org.apache.nutch.parse

Class ParseResult


public class ParseResult
extends Object
implements Iterable<Map.Entry<org.apache.hadoop.io.Text,Parse>>

A utility class that stores the result of a parse. Internally a ParseResult stores <Text, Parse> pairs. Parsers may return multiple results, which correspond to parts of the original URL or to other documents associated with it.

There will usually be one parse result that corresponds directly to the original URL, and zero or more results that correspond to derived URLs (sub-URLs).
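The layout described above (one entry for the original URL plus zero or more entries for sub-URLs) can be illustrated with a minimal sketch in plain Java. This is not the Nutch implementation. UrlParseMap is a hypothetical name, and plain Strings stand in for org.apache.hadoop.io.Text keys and Parse values so that the example compiles without Nutch or Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for ParseResult (not the Nutch class):
 * String keys replace org.apache.hadoop.io.Text and String values replace Parse.
 */
class UrlParseMap {
  private final String originalUrl;
  // Insertion-ordered map from URL or sub-URL to the extracted text.
  private final Map<String, String> parses = new LinkedHashMap<String, String>();

  UrlParseMap(String originalUrl) {
    this.originalUrl = originalUrl;
  }

  void put(String url, String text) {
    parses.put(url, text);
  }

  /** Entry for the original URL, or null if the parser produced none. */
  String original() {
    return parses.get(originalUrl);
  }

  /** Every stored key except the original URL, i.e. the derived sub-URLs. */
  List<String> derivedUrls() {
    List<String> derived = new ArrayList<String>();
    for (String url : parses.keySet()) {
      if (!url.equals(originalUrl)) {
        derived.add(url);
      }
    }
    return derived;
  }
}
```

As an illustration with invented URLs, a parser for an archive at http://example.com/docs.zip could store the archive itself under that URL and each member under a hypothetical sub-URL such as http://example.com/docs.zip/a.txt; derivedUrls() would then return only the members.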

Field Summary

| Modifier and Type | Field and Description |
|---|---|
| static org.slf4j.Logger | LOG |

Constructor Summary

| Constructor | Description |
|---|---|
| ParseResult(String originalUrl) | Create a container for parse results. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static ParseResult | createParseResult(String url, Parse parse) | Convenience method for obtaining a ParseResult from a single Parse output. |
| void | filter() | Remove all results whose status is not successful (as determined by ParseStatus#isSuccess()). |
| Parse | get(String key) | Retrieve a single parse output. |
| Parse | get(org.apache.hadoop.io.Text key) | Retrieve a single parse output. |
| boolean | isEmpty() | Checks whether the result is empty. |
| boolean | isSuccess() | A convenience method which returns true only if all parses are successful. |
| Iterator<Map.Entry<org.apache.hadoop.io.Text,Parse>> | iterator() | Iterate over all entries in the map. |
| void | put(String key, ParseText text, ParseData data) | Store a result of parsing. |
| void | put(org.apache.hadoop.io.Text key, ParseText text, ParseData data) | Store a result of parsing. |
| int | size() | Return the number of parse outputs (both successful and failed). |

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

ParseResult

public ParseResult(String originalUrl)

Create a container for parse results.

  - Parameters:
  - <code>originalUrl</code> - the original URL from which all parse results have been obtained.

Method Detail

createParseResult

public static ParseResult createParseResult(String url,
                            Parse parse)

Convenience method for obtaining a ParseResult from a single Parse output.

  - Parameters:
  - <code>url</code> - canonical URL.
  - <code>parse</code> - single parse output. 
  - Returns:
  - result containing the single parse output.       
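The description suggests that createParseResult amounts to creating a container for url and storing the single parse under that same url. The sketch below models that assumption with hypothetical stand-ins (SketchParse for Parse, SketchParseResult for ParseResult, String for Text); it is not copied from the Nutch source.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Parse, holding only the extracted text. */
class SketchParse {
  final String text;

  SketchParse(String text) {
    this.text = text;
  }
}

/** Hypothetical stand-in for ParseResult with String keys. */
class SketchParseResult {
  private final String originalUrl;
  private final Map<String, SketchParse> parses = new HashMap<String, SketchParse>();

  SketchParseResult(String originalUrl) {
    this.originalUrl = originalUrl;
  }

  /** Models the factory: a new container holding one parse under its own URL. */
  static SketchParseResult createParseResult(String url, SketchParse parse) {
    SketchParseResult result = new SketchParseResult(url);
    result.parses.put(url, parse);
    return result;
  }

  int size() {
    return parses.size();
  }

  SketchParse get(String url) {
    return parses.get(url);
  }

  String getOriginalUrl() {
    return originalUrl;
  }
}
```

A parser that produces exactly one document can thus return a one-entry result, which callers read through the same get and iterator methods used for multi-document parses.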

isEmpty

public boolean isEmpty()

Checks whether the result is empty.

  - Returns:
  - <code>true</code> if this result contains no parse outputs, <code>false</code> otherwise.

size

public int size()

Return the number of parse outputs (both successful and failed).

get

public Parse get(String key)

Retrieve a single parse output.

  - Parameters:
  - <code>key</code> - sub-URL under which the parse output is stored.
  - Returns:
  - the parse output stored under this sub-URL, or null if there is none.

get

public Parse get(org.apache.hadoop.io.Text key)

Retrieve a single parse output.

  - Parameters:
  - <code>key</code> - sub-URL under which the parse output is stored.
  - Returns:
  - the parse output stored under this sub-URL, or null if there is none.
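Hadoop's Text implements equals and hashCode by content, so a key built from the same string finds the stored entry. The sketch below assumes that the String overload simply wraps its argument and delegates to the Text overload. TextKey (a stand-in for org.apache.hadoop.io.Text) and KeyedResults (a stand-in for ParseResult) are hypothetical, and String values replace Parse.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for org.apache.hadoop.io.Text: a key compared by content. */
final class TextKey {
  private final String value;

  TextKey(String value) {
    this.value = value;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof TextKey && ((TextKey) o).value.equals(value);
  }

  @Override
  public int hashCode() {
    return value.hashCode();
  }

  @Override
  public String toString() {
    return value;
  }
}

/** Hypothetical container with the two get overloads sharing one map. */
class KeyedResults {
  private final Map<TextKey, String> parses = new HashMap<TextKey, String>();

  void put(TextKey key, String parse) {
    parses.put(key, parse);
  }

  /** Returns the stored parse output, or null when no entry exists for the key. */
  String get(TextKey key) {
    return parses.get(key);
  }

  /** Convenience overload: wraps the String in a key and delegates. */
  String get(String key) {
    return get(new TextKey(key));
  }
}
```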

put

public void put(org.apache.hadoop.io.Text key,
       ParseText text,
       ParseData data)

Store a result of parsing.

  - Parameters:
  - <code>key</code> - URL or sub-URL of this parse result.
  - <code>text</code> - plain text result.
  - <code>data</code> - parse metadata corresponding to this result.

put

public void put(String key,
       ParseText text,
       ParseData data)

Store a result of parsing.

  - Parameters:
  - <code>key</code> - URL or sub-URL of this parse result.
  - <code>text</code> - plain text result.
  - <code>data</code> - parse metadata corresponding to this result.
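The two put overloads differ only in the key type. The sketch below shows the String form: it pairs the plain text with its metadata in one stored entry, and shows size() and isEmpty() changing as entries are added. StoredParse stands in for the ParseText/ParseData pair and PutSketch for ParseResult; both names, and the use of a String-to-String map for metadata, are assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pairing of plain text and metadata, standing in for ParseText + ParseData. */
final class StoredParse {
  final String text;
  final Map<String, String> metadata;

  StoredParse(String text, Map<String, String> metadata) {
    this.text = text;
    // Defensive copy so later changes to the caller's map do not leak in.
    this.metadata = Collections.unmodifiableMap(new HashMap<String, String>(metadata));
  }
}

/** Hypothetical container keyed by URL or sub-URL. */
class PutSketch {
  private final Map<String, StoredParse> parses = new LinkedHashMap<String, StoredParse>();

  /** Combines text and metadata into one entry stored under the URL or sub-URL. */
  void put(String key, String text, Map<String, String> metadata) {
    parses.put(key, new StoredParse(text, metadata));
  }

  boolean isEmpty() {
    return parses.isEmpty();
  }

  int size() {
    return parses.size();
  }

  StoredParse get(String key) {
    return parses.get(key);
  }
}
```

In this sketch, a second put under an existing key replaces the earlier entry, which is ordinary Map behavior; the method description above does not say whether ParseResult behaves the same way.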

iterator

public Iterator<Map.Entry<org.apache.hadoop.io.Text,Parse>> iterator()

Iterate over all entries (each URL or sub-URL paired with its Parse) stored in this result.

  - Specified by:
  - <code>iterator</code> in interface <code>Iterable&lt;Map.Entry&lt;org.apache.hadoop.io.Text,Parse&gt;&gt;</code>
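Because ParseResult implements Iterable, callers can walk every stored entry with a for-each loop. The sketch below reproduces that shape with a hypothetical IterableResults class over String keys and values. It uses a LinkedHashMap so the output order is predictable, whereas the documentation above does not promise any particular iteration order for ParseResult.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in: String keys and values replace Text and Parse. */
class IterableResults implements Iterable<Map.Entry<String, String>> {
  private final Map<String, String> parses = new LinkedHashMap<String, String>();

  void put(String url, String text) {
    parses.put(url, text);
  }

  /** Exposes the backing map's entries, so for-each loops work directly. */
  @Override
  public Iterator<Map.Entry<String, String>> iterator() {
    return parses.entrySet().iterator();
  }

  /** Example consumer: one "url=text" line per entry. */
  String dump() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> entry : this) {
      sb.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
    }
    return sb.toString();
  }
}
```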

filter

public void filter()

Remove all results whose status is not successful (as determined by ParseStatus#isSuccess()). Note that the effects of this operation cannot be reversed.
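The in-place removal can be sketched with an explicit Iterator, whose remove() is the safe way to delete map entries while iterating. In this hypothetical FilterSketch, a Boolean per URL stands in for ParseStatus#isSuccess() on the stored Parse; the class and its methods are illustrative only.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in: each URL maps to whether its parse succeeded. */
class FilterSketch {
  private final Map<String, Boolean> status = new LinkedHashMap<String, Boolean>();

  void put(String url, boolean success) {
    status.put(url, success);
  }

  int size() {
    return status.size();
  }

  boolean contains(String url) {
    return status.containsKey(url);
  }

  /** Drops failed entries in place; removed entries are gone for good. */
  void filter() {
    Iterator<Map.Entry<String, Boolean>> it = status.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Boolean> entry = it.next();
      if (!entry.getValue()) {
        it.remove(); // removing through the iterator avoids ConcurrentModificationException
      }
    }
  }
}
```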

isSuccess

public boolean isSuccess()

A convenience method which returns true only if all parses are successful. Parse success is determined by ParseStatus#isSuccess().
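An all-entries check like this can be sketched as a loop that stops at the first failure. SuccessSketch is hypothetical, and a Boolean per URL stands in for ParseStatus#isSuccess() on each stored Parse.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in: each URL maps to whether its parse succeeded. */
class SuccessSketch {
  private final Map<String, Boolean> status = new LinkedHashMap<String, Boolean>();

  void put(String url, boolean success) {
    status.put(url, success);
  }

  /** True only if no stored parse failed; an empty sketch therefore returns true. */
  boolean isSuccess() {
    for (boolean ok : status.values()) {
      if (!ok) {
        return false; // first failure decides the answer
      }
    }
    return true;
  }
}
```

With no entries the loop never finds a failure, so the sketch returns true for an empty container. The description above does not state what ParseResult#isSuccess() returns in that case, so code that needs at least one successful parse may want to check isEmpty() as well.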

Copyright © 2014 The Apache Software Foundation