org.apache.nutch.parse

Class ParseOutputFormat

All Implemented Interfaces:

    • org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse>

public class ParseOutputFormat
extends Object
implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse>

Constructor Summary

Constructors

    • ParseOutputFormat()

Method Summary

Methods (modifier and type, followed by the method signature)

    • void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job)
    • static String filterNormalize(String fromUrl, String toUrl, String fromHost, boolean ignoreExternalLinks, URLFilters filters, URLNormalizers normalizers)
    • static String filterNormalize(String fromUrl, String toUrl, String fromHost, boolean ignoreExternalLinks, URLFilters filters, URLNormalizers normalizers, String urlNormalizerScope)
    • org.apache.hadoop.mapred.RecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

ParseOutputFormat

public ParseOutputFormat()

Method Detail

checkOutputSpecs

public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs,
                    org.apache.hadoop.mapred.JobConf job)
                      throws IOException
  - Specified by: <code>checkOutputSpecs</code> in interface <code>org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse></code>
  - Throws: <code>IOException</code>

getRecordWriter

public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Parse> getRecordWriter(
                     org.apache.hadoop.fs.FileSystem fs,
                     org.apache.hadoop.mapred.JobConf job,
                     String name,
                     org.apache.hadoop.util.Progressable progress)
                       throws IOException
  - Specified by: <code>getRecordWriter</code> in interface <code>org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse></code>
  - Throws: <code>IOException</code>

filterNormalize

public static String filterNormalize(String fromUrl,
                     String toUrl,
                     String fromHost,
                     boolean ignoreExternalLinks,
                     URLFilters filters,
                     URLNormalizers normalizers)

filterNormalize

public static String filterNormalize(String fromUrl,
                     String toUrl,
                     String fromHost,
                     boolean ignoreExternalLinks,
                     URLFilters filters,
                     URLNormalizers normalizers,
                     String urlNormalizerScope)

Copyright © 2014 The Apache Software Foundation