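I. Snappy
1. snzip: a compression/decompression tool based on Snappy
- Version used: 1.0.4
- Supported file formats
  - framing-format
  - old framing-format
  - hadoop-snappy format (the file format used by Hadoop Snappy compression)
  - raw format
  - snappy-java
  - snappy-in-java
- snzip project page: https://github.com/kubo/snzip
For download, installation, and usage details, see the GitHub documentation.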
   1. Install
        tar xvfz snzip-1.0.4.tar.gz
        cd snzip-1.0.4
        ./configure --prefix=/usr/local/snappy
        make
        make install
   2. Add snzip to the shell environment
        vim ~/.bashrc
        # snzip (lines to add to ~/.bashrc)
        export SNZIP_HOME=/usr/local/snappy
        export PATH=${SNZIP_HOME}/bin:$PATH
        source ~/.bashrc
   3. snzip -h
        general options:
          -c       write to standard output, keep the original file unchanged
          -d       decompress
          -k       keep (do not delete) the input file
          -t name  file format name (see "supported formats" below)
          -h       give this help
        raw_format option:
          -s size  size of input data when compressing.
                   The default value is the file size if available.
        tuning options:
          -b num   internal block size in bytes
          -B num   internal block size. 'num'-th power of two.
          -R num   size of read buffer in bytes
          -W num   size of write buffer in bytes
          -T       trace for debug
        supported formats (values for -t):
          NAME            SUFFIX  URL
          ----            ------  ---
          framing2        sz      https://github.com/google/snappy/blob/master/framing_format.txt
          hadoop-snappy   snappy  https://code.google.com/p/hadoop-snappy/
          iwa             iwa     https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#snappy-compression
          framing         sz      https://github.com/google/snappy/blob/0755c815197dacc77d8971ae917c86d7aa96bf8e/framing_format.txt
          snzip           snz     https://github.com/kubo/snzip
          snappy-java     snappy  https://github.com/xerial/snappy-java
          snappy-in-java  snappy  https://github.com/dain/snappy
          comment-43      snappy  http://code.google.com/p/snappy/issues/detail?id=34#c43
   4. Compress to the format that Hadoop supports
        snzip -t hadoop-snappy -k file_name     # compress
        snzip -d compressed_file.snappy         # decompress
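The supported formats listed above are different containers around the same Snappy codec, which is why a file written with one -t value is generally not readable by a tool that expects another. As a rough illustration, the sketch below guesses the container from the first bytes of a file. The names MAGIC_PREFIXES, sniff_snappy_format, and sniff_file are hypothetical, and the magic bytes are taken from my reading of the framing_format.txt, snappy-java, and snzip documentation, so treat them as assumptions to verify. Formats that are not in the dictionary, such as hadoop-snappy and raw, simply come back as None.

```python
from typing import Optional

# Leading bytes of a few Snappy container formats.
# Assumption: values as described in the respective format documentation.
MAGIC_PREFIXES = {
    "framing2": b"\xff\x06\x00\x00sNaPpY",   # stream identifier chunk
    "snappy-java": b"\x82SNAPPY\x00",        # snappy-java stream header
    "snzip": b"SNZ",                         # snzip's own format
}


def sniff_snappy_format(head: bytes) -> Optional[str]:
    """Return the format name whose magic prefix matches, else None."""
    for name, magic in MAGIC_PREFIXES.items():
        if head.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first few bytes of a file and guess its container format."""
    with open(path, "rb") as f:
        return sniff_snappy_format(f.read(16))
```

For example, sniff_file("compressed_file.snappy") returning None for a file that Hadoop wrote would be consistent with that file using a container without one of these headers.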
2. Python compression/decompression interface (not compatible with HDFS-native Snappy)
   1. Install the dependencies
        Ubuntu:  sudo apt-get install libsnappy-dev
        CentOS:  sudo yum install libsnappy-devel
        Brew:    brew install snappy
      Install the package
        pip install python-snappy
        python -m snappy --help
   2. Compress/decompress a file
        python -m snappy -c uncompressed_file compressed_file.snappy
        python -m snappy -d compressed_file.snappy uncompressed_file
   3. Compress/decompress a stream
        cat uncompressed_data | python -m snappy -c > compressed_data.snappy
        cat compressed_data.snappy | python -m snappy -d > uncompressed_data
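The same package can also be called from Python code instead of through python -m snappy. The following is a minimal sketch, assuming python-snappy is installed as described above. The wrapper names (compress_bytes, decompress_bytes, compress_stream, decompress_stream) are hypothetical; only snappy.compress, snappy.decompress, snappy.stream_compress, and snappy.stream_decompress come from the library.

```python
import io

try:
    import snappy  # provided by `pip install python-snappy`
except ImportError:
    # The helpers below cannot work without the package; this guard only
    # keeps the module importable on machines where it is missing.
    snappy = None


def compress_bytes(data: bytes) -> bytes:
    """Compress one in-memory buffer as a raw Snappy block (no framing)."""
    return snappy.compress(data)


def decompress_bytes(blob: bytes) -> bytes:
    """Inverse of compress_bytes."""
    return snappy.decompress(blob)


def compress_stream(data: bytes) -> bytes:
    """Compress through the file-like stream API used for larger inputs."""
    src, dst = io.BytesIO(data), io.BytesIO()
    snappy.stream_compress(src, dst)
    return dst.getvalue()


def decompress_stream(blob: bytes) -> bytes:
    """Inverse of compress_stream."""
    src, dst = io.BytesIO(blob), io.BytesIO()
    snappy.stream_decompress(src, dst)
    return dst.getvalue()
```

The two pairs are not interchangeable: output of compress_bytes should be read back with decompress_bytes, and output of compress_stream with decompress_stream, because the stream functions wrap the data in a container while the block functions do not.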
3. Java compression/decompression interface
   1. Note: on Mac, unpack the jar and copy libsnappyjava.jnilib to libsnappyjava.dylib
        cp org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib org/xerial/snappy/native/Mac/x86_64/libsnappyjava.dylib
      Then repackage the jar
        jar cf snappy-java-1.0.4.1.jar org
   2. pom.xml: load the local jar
        <dependencies>
          <dependency>
            <groupId>org.xerial.snappy</groupId>
            <artifactId>snappy-java</artifactId>
            <version>1.0.4.1</version>
            <scope>system</scope>
            <systemPath>${basedir}/lib/snappy-java-1.0.4.1.jar</systemPath>
          </dependency>
        </dependencies>
II. Spark Configuration
# Spark settings for Snappy
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/snappy-java-1.0.4.1.jar

spark-sql --jars file:///etc/hive/auxlib/json-serde-1.3.7-jar-with-dependencies.jar,file:///usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar
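The settings above only help if the files they point to actually exist. Before starting spark-sql, a quick pre-flight check along the following lines may save some debugging. This is a sketch under stated assumptions: the function name missing_snappy_pieces is hypothetical, and it assumes that the native library file name starts with libsnappy and that the jar lives in $HADOOP_HOME/lib, as the exports above suggest.

```python
import glob
import os
from typing import List


def missing_snappy_pieces(
    hadoop_home: str, jar_name: str = "snappy-java-1.0.4.1.jar"
) -> List[str]:
    """Return a list of problems; an empty list means both pieces were found."""
    problems = []

    # Assumption: the native library is named libsnappy* (e.g. libsnappy.so.1).
    native_dir = os.path.join(hadoop_home, "lib", "native")
    if not glob.glob(os.path.join(native_dir, "libsnappy*")):
        problems.append(f"no libsnappy* found under {native_dir}")

    # The jar that SPARK_CLASSPATH refers to.
    jar_path = os.path.join(hadoop_home, "lib", jar_name)
    if not os.path.isfile(jar_path):
        problems.append(f"missing jar {jar_path}")

    return problems


if __name__ == "__main__":
    for problem in missing_snappy_pieces(os.environ.get("HADOOP_HOME", "")):
        print(problem)
```

Run as a script, it prints one line per missing piece and nothing when both are in place.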
