org.apache.nutch.parse
Class ParseOutputFormat
- java.lang.Object
- org.apache.nutch.parse.ParseOutputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.OutputFormat
public class ParseOutputFormat extends Object implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse>
Constructor Summary
Constructor and Description
ParseOutputFormat()
Method Summary
Modifier and Type, Method and Description
void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job)
static String filterNormalize(String fromUrl, String toUrl, String fromHost, boolean ignoreExternalLinks, URLFilters filters, URLNormalizers normalizers)
static String filterNormalize(String fromUrl, String toUrl, String fromHost, boolean ignoreExternalLinks, URLFilters filters, URLNormalizers normalizers, String urlNormalizerScope)
org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Parse> getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
ParseOutputFormat
public ParseOutputFormat()
Method Detail
checkOutputSpecs
public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job) throws IOException
- Specified by:
- <code>checkOutputSpecs</code> in interface <code>org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse></code>
- Throws:
- <code>IOException</code>
getRecordWriter
public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Parse> getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) throws IOException
- Specified by:
- <code>getRecordWriter</code> in interface <code>org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Parse></code>
- Throws:
- <code>IOException</code>
filterNormalize
public static String filterNormalize(String fromUrl, String toUrl, String fromHost, boolean ignoreExternalLinks, URLFilters filters, URLNormalizers normalizers)
filterNormalize
public static String filterNormalize(String fromUrl, String toUrl, String fromHost, boolean ignoreExternalLinks, URLFilters filters, URLNormalizers normalizers, String urlNormalizerScope)
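Example (illustrative sketch only): this page gives the filterNormalize signatures without describing their behavior. The parameter names suggest that an outlink is dropped when it points back to its source page or, if ignoreExternalLinks is set, to a different host, and is otherwise normalized and then filtered. The JDK-only sketch below models that assumed flow. The class FilterNormalizeSketch is hypothetical, and the two UnaryOperator<String> arguments are stand-ins for URLFilters and URLNormalizers, not the Nutch types. The null-on-reject convention, the self-link check, and the normalize-before-filter order are all assumptions, not documented here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in; not part of Nutch. */
final class FilterNormalizeSketch {
    private FilterNormalizeSketch() {}

    /**
     * Returns the processed outlink, or null if it is rejected.
     * Every rule below is an assumption inferred from the parameter names.
     */
    static String filterNormalize(String fromUrl, String toUrl, String fromHost,
                                  boolean ignoreExternalLinks,
                                  UnaryOperator<String> filter,
                                  UnaryOperator<String> normalizer) {
        // Assumed: a link from a page to itself carries no information.
        if (fromUrl.equals(toUrl)) {
            return null;
        }
        // Assumed: with ignoreExternalLinks set, keep only same-host targets.
        if (ignoreExternalLinks) {
            String toHost;
            try {
                toHost = new URI(toUrl).getHost();
            } catch (URISyntaxException e) {
                return null; // unparsable target
            }
            if (toHost == null || !toHost.equalsIgnoreCase(fromHost)) {
                return null;
            }
        }
        // Assumed order: normalize first, then filter the normalized form.
        String normalized = normalizer.apply(toUrl);
        if (normalized == null) {
            return null;
        }
        return filter.apply(normalized); // stand-in filter returns null to reject
    }
}
```

With a stand-in normalizer that strips fragments and a stand-in filter that rejects `.jpg` URLs, a same-host link such as `http://example.com/about.html#team` would come back as `http://example.com/about.html`, while a link to another host would yield null whenever ignoreExternalLinks is true.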