Compiling the LZO library

  1. Download the source package from the official website
  2. Install Maven

    1. Point Maven's repository at the Aliyun mirror (in Maven's `settings.xml`):

            <mirrors>
              <mirror>
                <id>nexus-aliyun</id>
                <mirrorOf>central</mirrorOf>
                <name>Nexus aliyun</name>
                <url>http://maven.aliyun.com/nexus/content/groups/public</url>
              </mirror>
            </mirrors>

    2. Configure the Maven environment variables

  3. Install the build dependencies with yum: yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool
  4. Generate the build configuration: ./configure --prefix=/usr/local/hadoop/lzo/
  5. Compile: make
  6. Install: make install
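
The build steps above can be collected into one script. This is only a minimal sketch: it assumes the LZO sources are already unpacked into the directory named by `LZO_SRC`, and `LZO_SRC`, `DRY_RUN` and the `run` helper are illustrative names, not part of LZO. By default it only prints the commands; setting `DRY_RUN=0` would execute them.

```bash
#!/usr/bin/env bash
# Sketch of the LZO build. LZO_SRC, DRY_RUN and run() are illustrative (assumptions).
LZO_SRC="${LZO_SRC:-./lzo-src}"        # hypothetical path to the unpacked LZO sources
LZO_PREFIX="/usr/local/hadoop/lzo"     # install prefix used in this article

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

build_lzo() {
  run yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool || return 1
  run cd "$LZO_SRC" || return 1
  run ./configure --prefix="$LZO_PREFIX" || return 1
  run make || return 1
  run make install
}

build_lzo
```

Printing first makes it easy to confirm the prefix matches the path exported later for the hadoop-lzo build.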

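Up to this point, only the LZO library itself has been installed. The next step is to build and install hadoop-lzo, which depends on it.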
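
Since the hadoop-lzo build runs through Maven, the mirror and environment setup from the Maven step can be sketched as a script. The assumptions here: `SETTINGS_DIR` is an illustrative variable that defaults to a temporary directory so nothing existing is overwritten (Maven reads user settings from `~/.m2/settings.xml`, so it would have to point there to take effect), and `/opt/module/maven` is a hypothetical install location.

```bash
#!/usr/bin/env bash
# Sketch: write the Aliyun mirror into a Maven settings.xml.
# SETTINGS_DIR is illustrative; point it at ~/.m2 for Maven to pick the file up.
SETTINGS_DIR="${SETTINGS_DIR:-$(mktemp -d)}"
mkdir -p "$SETTINGS_DIR"

cat > "$SETTINGS_DIR/settings.xml" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>
EOF

# Environment variables; /opt/module/maven is a hypothetical install path.
export MAVEN_HOME="${MAVEN_HOME:-/opt/module/maven}"
export PATH="$MAVEN_HOME/bin:$PATH"

echo "wrote $SETTINGS_DIR/settings.xml"
```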

Hadoop-LZO

  1. Download hadoop-lzo from its official site
  2. After unpacking, edit pom.xml and set the Hadoop version:

    <hadoop.current.version>2.7.2</hadoop.current.version>
    
  3. Export two variables (pointing at the LZO install path from above):

    export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
    export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
    
  4. Run mvn package -Dmaven.test.skip=true

  5. Copy the generated target/hadoop-lzo-0.4.20.jar into share/hadoop/common/ under the Hadoop root directory
  6. Distribute the jar to the other nodes (if there are any)
  7. Edit core-site.xml and add the following properties:

```xml
<property>
    <name>io.compression.codecs</name>
    <value>
        org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        org.apache.hadoop.io.compress.SnappyCodec,
        com.hadoop.compression.lzo.LzoCodec,
        com.hadoop.compression.lzo.LzopCodec
    </value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

8. Distribute `core-site.xml` to the other nodes and restart all nodes
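
The build, copy and distribution steps in this list can be sketched as one script. Everything configurable is an assumption: `HADOOP_LZO_SRC` is a hypothetical unpacked source directory, `NODES` holds hypothetical host names, `HADOOP_HOME` defaults to the path used by the test commands in this article, and `scp` is just one possible way to distribute the jar (the article does not say which tool was used). As written it only prints the commands; `DRY_RUN=0` would execute them.

```bash
#!/usr/bin/env bash
# Sketch of the hadoop-lzo build and jar distribution.
# HADOOP_LZO_SRC, NODES, DRY_RUN and run() are illustrative (assumptions).
HADOOP_LZO_SRC="${HADOOP_LZO_SRC:-./hadoop-lzo}"       # hypothetical source directory
HADOOP_HOME="${HADOOP_HOME:-/opt/module/hadoop-2.7.2}"
NODES="${NODES:-hadoop3 hadoop4}"                      # hypothetical worker hosts
JAR="target/hadoop-lzo-0.4.20.jar"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

build_hadoop_lzo() {
  # Point the native build at the LZO headers and libraries installed earlier.
  export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
  export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
  run cd "$HADOOP_LZO_SRC" || return 1
  run mvn package -Dmaven.test.skip=true || return 1
  run cp "$JAR" "$HADOOP_HOME/share/hadoop/common/" || return 1
  for node in $NODES; do
    run scp "$JAR" "$node:$HADOOP_HOME/share/hadoop/common/" || return 1
  done
}

build_hadoop_lzo
```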

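The codec class names in `core-site.xml` are easy to mistype (for example the digit `1` in place of the letter `l` in `lzo`), so a quick check before restarting can help. A minimal sketch follows; `check_lzo_codecs` and `CORE_SITE` are illustrative names, and the default path assumes the Hadoop 2.7.2 layout under `/opt/module` used by the test commands in this article.

```bash
#!/usr/bin/env bash
# Sketch: check that a core-site.xml names the LZO codecs with the exact spelling.
# check_lzo_codecs and CORE_SITE are illustrative names (assumptions).
check_lzo_codecs() {
  local file="$1"
  # -F matches fixed strings, so a typo such as "1zo" for "lzo" fails the check.
  grep -qF 'com.hadoop.compression.lzo.LzoCodec' "$file" &&
  grep -qF 'com.hadoop.compression.lzo.LzopCodec' "$file" &&
  grep -qF 'io.compression.codec.lzo.class' "$file"
}

CORE_SITE="${CORE_SITE:-/opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml}"
if [ ! -f "$CORE_SITE" ]; then
  echo "skipped: $CORE_SITE not found"
elif check_lzo_codecs "$CORE_SITE"; then
  echo "LZO codec entries found"
else
  echo "LZO codec entries missing or misspelled"
fi
```
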
<a name="ctWnV"></a>
# Testing

1. First, get an LZO test file. I put mine on a network drive; anyone who needs it can download it [here (extraction code: ricl)](https://pan.baidu.com/s/1UVqjkdmfmaCgUgBQQEBfpg)
1. Upload the test file to HDFS
   1. Create the directory: `hadoop fs -mkdir /input`
   1. Upload the file: `hadoop fs -put /home/codeleven/bigtable.lzo /input` 
3. Run the test: `hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output1` 
```bash
20/08/01 20:09:47 INFO client.RMProxy: Connecting to ResourceManager at hadoop2/192.168.127.102:8032
20/08/01 20:09:48 INFO input.FileInputFormat: Total input paths to process : 1
20/08/01 20:09:48 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
20/08/01 20:09:48 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 52decc77982b58949890770d22720a91adce0c3f]
20/08/01 20:09:49 INFO mapreduce.JobSubmitter: number of splits:1
20/08/01 20:09:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1596283728425_0001
20/08/01 20:09:49 INFO impl.YarnClientImpl: Submitted application application_1596283728425_0001
20/08/01 20:09:49 INFO mapreduce.Job: The url to track the job: http://hadoop2:8088/proxy/application_1596283728425_0001/
20/08/01 20:09:49 INFO mapreduce.Job: Running job: job_1596283728425_0001
```

Creating the index for splitting

    hadoop jar ../share/hadoop/common/hadoop-lzo-0.4.20.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /input/bigtable.lzo

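After it finishes successfully, an extra file appears in HDFS: the index file written next to the data file (`/input/bigtable.lzo.index`).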
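
Whether the index file was created can also be checked from the command line. This is a small sketch that relies on hadoop-lzo naming the index after the data file with an `.index` suffix; the variable names are illustrative, and the check is skipped when no `hadoop` command is on the `PATH`.

```bash
#!/usr/bin/env bash
# Sketch: check for the LZO index next to the data file.
# LZO_FILE and INDEX_FILE are illustrative variable names (assumptions).
LZO_FILE="/input/bigtable.lzo"
INDEX_FILE="${LZO_FILE}.index"   # the indexer writes <file>.index beside the data file

if command -v hadoop >/dev/null 2>&1; then
  # "hadoop fs -test -e" exits with 0 when the path exists.
  if hadoop fs -test -e "$INDEX_FILE"; then
    echo "index present: $INDEX_FILE"
  else
    echo "index missing: $INDEX_FILE"
  fi
else
  echo "hadoop not on PATH; expected index at $INDEX_FILE"
fi
```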
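Now run the test again: `hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output3`
The output is below. Look at the fifth line: the job is now divided into two splits. (The second line also reports two input paths, because the index file now sits in `/input` next to the data file. With the default input format the index file itself may account for the extra split; hadoop-lzo's `com.hadoop.mapreduce.LzoTextInputFormat` is the input format that actually reads the index.)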

    20/08/01 20:55:48 INFO client.RMProxy: Connecting to ResourceManager at hadoop2/192.168.127.102:8032
    20/08/01 20:55:50 INFO input.FileInputFormat: Total input paths to process : 2
    20/08/01 20:55:50 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
    20/08/01 20:55:50 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 52decc77982b58949890770d22720a91adce0c3f]
    20/08/01 20:55:50 INFO mapreduce.JobSubmitter: number of splits:2
    20/08/01 20:55:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1596283728425_0004
    20/08/01 20:55:50 INFO impl.YarnClientImpl: Submitted application application_1596283728425_0004
    20/08/01 20:55:50 INFO mapreduce.Job: The url to track the job: http://hadoop2:8088/proxy/application_1596283728425_0004/
    20/08/01 20:55:50 INFO mapreduce.Job: Running job: job_1596283728425_0004
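
To compare the two runs without reading the logs by eye, the split count can be pulled out of the job client output. A minimal sketch: `split_count` is a hypothetical helper name, and the two sample lines are copied from the logs in this article.

```bash
#!/usr/bin/env bash
# Sketch: extract the split count from job client log text on stdin.
# split_count is an illustrative helper name (assumption).
split_count() {
  sed -n 's/.*number of splits:\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Sample lines taken from the two runs shown in this article.
before='20/08/01 20:09:49 INFO mapreduce.JobSubmitter: number of splits:1'
after='20/08/01 20:55:50 INFO mapreduce.JobSubmitter: number of splits:2'

echo "before indexing: $(printf '%s\n' "$before" | split_count)"
echo "after indexing:  $(printf '%s\n' "$after" | split_count)"
```

On a real run the job output could be piped into the same function, for example by appending `2>&1 | split_count` to the `hadoop jar` command.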