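Compiling Impala 3.4.0

Hardware: more than 8 GB of memory and more than 40 GB of disk (cross-compiling takes a lot of disk space)
Environment: a separate machine whose environment matches the CDH cluster. Do not build on a cluster node: the build modifies some environment variables, which can cause the node to drop out of the cluster
Network: unrestricted internet access (for example through a proxy) is recommended; many dependency packages are large and their downloads fail easily
A pre-built package is available for download from Baidu Cloud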
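
The hardware thresholds above can be checked before starting. This is a minimal sketch, assuming a Linux build host with /proc/meminfo and bash; the function names and the default /opt path are illustrative and not part of Impala.

```shell
# Succeeds when the first integer is strictly greater than the second.
exceeds() {
  [ "$1" -gt "$2" ]
}

# Total physical memory in whole GiB, read from /proc/meminfo (Linux only).
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free space in whole GiB on the filesystem that holds the given directory.
disk_free_gib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Print a warning for each requirement that is not met; never aborts.
preflight() {
  local dir="${1:-/opt}"
  exceeds "$(mem_total_gib)" 8 || echo "WARN: memory is not above 8G"
  exceeds "$(disk_free_gib "$dir")" 40 || echo "WARN: less than 40G free under $dir"
}
```

Running `preflight /opt` prints nothing when both thresholds are met.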

1. Download the apache-impala-3.4.0 source

  git clone --single-branch --branch 3.4.0 https://github.com/apache/impala.git /opt/impala-3.4
  cd /opt/impala-3.4

Because the Cloudera Maven repo URL changed, pom.xml has to be modified before the build can succeed (IMPALA-9815). Apply the IMPALA-9815 commit: https://github.com/apache/impala/commit/481ea4ab0d476a4aa491f99c2a4e376faddc0b03

git fetch origin 481ea4ab0d476a4aa491f99c2a4e376faddc0b03
git cherry-pick 481ea4ab0d476a4aa491f99c2a4e376faddc0b03

If the git commands fail, edit pom.xml by hand to match that commit

2. Install the system environment with bootstrap_system.sh

export IMPALA_HOME=/opt/impala-3.4
$IMPALA_HOME/bin/bootstrap_system.sh

If the script fails, delete the following before running it again:

rm -rf /var/lib/pgsql/*

Apache Ant problem: the download link for the older version is dead, so change the version

# Change the Ant download URL in bootstrap_system.sh
vim $IMPALA_HOME/bin/bootstrap_system.sh

# Download ant for centos
# ("redhat" is a helper defined in bootstrap_system.sh that runs the command only on Red Hat family systems)
if [ ! -d /usr/local/apache-ant-1.10.12 ];then
  redhat sudo wget -nv \
    https://downloads.apache.org/ant/binaries/apache-ant-1.10.12-bin.tar.gz
  redhat sha512sum -c - <<< '2287dc5cfc21043c14e5413f9afb1c87c9f266ec2a9ba2d3bf2285446f6e4ccb59b558bf2e5c57911a05dfa293c7d5c7ad60ac9f744ba11406f4e6f9a27b2403  apache-ant-1.10.12-bin.tar.gz'
  redhat sudo tar -C /usr/local -xzf apache-ant-1.10.12-bin.tar.gz
  redhat sudo ln -s /usr/local/apache-ant-1.10.12/bin/ant /usr/local/bin
fi
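
When the Ant version in the script changes, the digest on the sha512sum line has to change with it, or the check fails. The same check can be sketched as a reusable function; verify_sha512 is a hypothetical helper name, and the expected digest should be taken from the .sha512 file published next to the tarball.

```shell
# Succeeds only if FILE hashes to the EXPECTED SHA-512 hex digest.
# Uses the same "digest<two spaces>filename" format as the script's sha512sum -c line.
verify_sha512() {
  local file="$1" expected="$2"
  echo "${expected}  ${file}" | sha512sum -c - >/dev/null 2>&1
}
```

For example, `verify_sha512 apache-ant-1.10.12-bin.tar.gz "$digest" || echo "checksum mismatch"` reports a corrupted or wrong download before it is unpacked.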

Ivy dependency download problem: change the URLs

# Edit the corresponding configuration files
vim /opt/hadoop-lzo/build.xml

# around line 96
  <property name="ivy_repo_url" value="https://repo.maven.apache.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"/>

vim /opt/hadoop-lzo/ivy/ivysettings.xml
# around line 15
    value="https://repo.maven.apache.org/maven2/"
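
The build.xml edit can also be scripted instead of made in an editor. This is a sketch assuming GNU sed and that the property sits on a single line as shown above; set_ivy_repo_url is a hypothetical helper name.

```shell
# Rewrite the value of the ivy_repo_url property in a build.xml file.
# The URL must not contain "|" or "&", which are special in this sed expression.
set_ivy_repo_url() {
  local build_xml="$1" url="$2"
  sed -i "s|\(<property name=\"ivy_repo_url\" value=\"\)[^\"]*\"|\1${url}\"|" "$build_xml"
}
```

Pass the URL in single quotes so that the shell leaves `${ivy.version}` for Ant to expand, for example `set_ivy_repo_url /opt/hadoop-lzo/build.xml 'https://repo.maven.apache.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar'`.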

git clone problem

# Edit the clone command in bootstrap_system.sh
vim $IMPALA_HOME/bin/bootstrap_system.sh

# before:
# git clone https://github.com/cloudera/hadoop-lzo.git "$HADOOP_LZO_HOME"
# after:
git clone git://github.com/cloudera/hadoop-lzo.git "$HADOOP_LZO_HOME"

3. buildall.sh

source $IMPALA_HOME/bin/impala-config.sh
$IMPALA_HOME/buildall.sh -noclean -notests -release

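Note: for testing purposes you can drop -release. The resulting impalad prints more information when it hits a bug; for example, a DCHECK can catch the bug early, which makes it easier to locate
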
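
The choice between the two build types can be captured in a small wrapper. This is only a sketch: buildall_flags and its mode names are hypothetical, and the flags are the ones used above.

```shell
# Print the buildall.sh flags for the requested build type.
#   release: optimized build, as used for deployment above
#   debug:   same flags without -release, so DCHECKs stay enabled
buildall_flags() {
  local mode="${1:-release}"
  case "$mode" in
    release) echo "-noclean -notests -release" ;;
    debug)   echo "-noclean -notests" ;;
    *)       echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

A test build would then be started with `$IMPALA_HOME/buildall.sh $(buildall_flags debug)`.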
buildall.sh troubleshooting summary

  1. You can switch the pip mirror to speed up Python package downloads:

```shell
vim $IMPALA_HOME/infra/python/deps/pip_download.py

# before:
# PYPI_MIRROR = os.environ.get('PYPI_MIRROR', 'https://pypi.python.org')
# after:
PYPI_MIRROR = os.environ.get('PYPI_MIRROR', 'http://mirrors.aliyun.com/pypi')
```

If some packages still cannot be downloaded, their names may need to be changed. Find the package in $IMPALA_HOME/infra/python/deps/*.requirements and edit it, for example:

```shell
#python_dateutil == 2.5.2
python-dateutil == 2.5.2

#Cython == 0.23.4
cython == 0.23.4

# About 4 or 5 packages are affected. Search the mirror for the real name; mostly it means changing _ to -
```

  2. bootstrap_toolchain.py: Python 2 defaults to ASCII encoding, so switch it to UTF-8

    vi $IMPALA_HOME/bin/bootstrap_toolchain.py
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
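
The renaming pattern in the two examples above (underscore to hyphen, upper case to lower case) can be applied mechanically. This is a heuristic sketch, not a guarantee that the mirror uses the resulting name; normalize_pkg_name is a hypothetical helper. Since pip_download.py reads PYPI_MIRROR from the environment, exporting the variable should work as an alternative to editing the file.

```shell
# Alternative to editing pip_download.py: the script reads this variable
# through os.environ.get('PYPI_MIRROR', ...), as shown in the line above.
export PYPI_MIRROR="http://mirrors.aliyun.com/pypi"

# Convert a package name to the hyphenated, lower-case form seen on the mirror.
normalize_pkg_name() {
  printf '%s\n' "$1" | tr '_' '-' | tr '[:upper:]' '[:lower:]'
}
```

For example, `normalize_pkg_name python_dateutil` prints `python-dateutil`; always confirm the result by searching the mirror before changing a requirements file.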
    

After a successful build

Check the two build artifacts, impalad and impala-frontend-0.1-SNAPSHOT.jar:

    $ ll -h be/build/latest/service/impalad fe/target/impala-frontend-0.1-SNAPSHOT.jar
    -rwxrwxr-x 1 root root 460M Jun 20 11:30 be/build/latest/service/impalad*
    -rw-rw-r-- 1 root root 7.5M Jun 20 11:33 fe/target/impala-frontend-0.1-SNAPSHOT.jar
    $ strings be/build/latest/service/impalad | grep 3.4.0
    3.4.0-RELEASE

The last command above should find a string like 3.4.0-RELEASE inside the impalad executable. The compiled impalad is over 400 MB because it contains a lot of symbol information; strip --strip-debug impalad reduces its size
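
The version check can be wrapped in a function that gives a yes/no answer. Note that this sketch swaps strings for grep -a with a fixed-string match, which avoids the unescaped dots in the pattern above; binary_contains is a hypothetical helper name.

```shell
# Succeeds if FILE (treated as text even when it is binary) contains the literal STRING.
binary_contains() {
  local file="$1" needle="$2"
  grep -aqF -- "$needle" "$file"
}
```

Usage: `binary_contains be/build/latest/service/impalad 3.4.0-RELEASE && echo "release build"`.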

Impala is statically linked by default, but it still has some dynamic dependencies. Use ldd to list them:

ldd be/build/latest/service/impalad  
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /opt/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /opt/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0)
    linux-vdso.so.1 =>  (0x00007ffd20ff6000)
    libjsig.so => /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64/jre/lib/amd64/libjsig.so (0x00007f1e8a775000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1e8a559000)
    libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f1e8a33c000)
    libjvm.so => not found
    libkudu_client.so.0 => /opt/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0 (0x00007f1e89bbd000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f1e899b5000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f1e897b1000)
    libssl.so.10 => /lib64/libssl.so.10 (0x00007f1e8953f000)
    libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f1e890dc000)
    libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f1e88df3000)
    libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f1e88ba6000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1e8889e000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f1e8859c000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1e88386000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f1e87fb8000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1e8a979000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f1e87d9e000)
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f1e87b67000)
    libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f1e87934000)
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f1e87730000)
    libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f1e87520000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f1e8730a000)
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f1e87106000)
    libfreebl3.so => /lib64/libfreebl3.so (0x00007f1e86f03000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f1e86cdc000)
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f1e86a7a000)

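Most of these .so files ship with the system or are already installed. Only the ones tied to the Impala version need to be copied, such as libkudu_client.so.0; the rest do not need to be copied along.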
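
To pick the relevant entries out of a long ldd listing, the output can be filtered. This is a sketch that assumes the "name => path (address)" line format shown above; both function names are hypothetical.

```shell
# Read ldd output on stdin and print the libraries that could not be resolved.
unresolved_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Read ldd output on stdin and print resolved paths inside the Impala toolchain,
# which are the build-specific libraries that have to be copied.
toolchain_libs() {
  awk '$2 == "=>" && $3 ~ /\/toolchain\// { print $3 }'
}
```

Usage: `ldd be/build/latest/service/impalad | toolchain_libs`. The version warning lines at the top of the listing are ignored because their second field is not `=>`.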

Upgrading Impala to 3.4.0 on CDH 6.3.2

1. The impala directory

Make a copy of the original CDH impala directory and modify the copy:

# cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib
# cp -r impala apache-impala-3.4
# cd apache-impala-3.4

Delete all the jars in the lib directory, leaving only the .so files:

# rm lib/*.jar
# ll lib
-rw-r--r-- 1 root root   89864 Jun 19 22:37 libgcc_s.so.1
lrwxrwxrwx 1 root root      36 Jun 19 22:37 libhadoop.so -> ../../hadoop/lib/native/libhadoop.so
lrwxrwxrwx 1 root root      42 Jun 19 22:37 libhadoop.so.1.0.0 -> ../../hadoop/lib/native/libhadoop.so.1.0.0
-rw-r--r-- 1 root root 6638528 Jun 19 22:37 libkudu_client.so.0
-rw-r--r-- 1 root root 6638528 Jun 19 22:37 libkudu_client.so.0.1.0
-rw-r--r-- 1 root root 1003416 Jun 19 22:37 libstdc++.so.6
-rw-r--r-- 1 root root 1003424 Jun 19 22:37 libstdc++.so.6.0.20

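libkudu_client.so.0

Replace libkudu_client.so.0 with the one used when building Impala 3.4. The ldd output above shows it at $IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0. The other .so files can be left alone

Jars that Impala 3.4 depends on

Copy all the jars that Impala 3.4 depends on into this lib directory as well. They are in the build tree under $IMPALA_HOME/fe/target/dependency/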

impala-frontend

Put the impala-frontend-0.1-SNAPSHOT.jar built from Impala 3.4 into the lib directory. Its path in the build tree is fe/target/impala-frontend-0.1-SNAPSHOT.jar

impala-data-source-api

Put the impala-data-source-api-1.0-SNAPSHOT.jar built from Impala 3.4 into the lib directory. Its path in the build tree is ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar

The sbin-retail directory

Replace the impalad in this directory with the one built from Apache Impala 3.4, found in the build tree at be/build/latest/service/impalad
Check that the catalogd and statestored symlinks both point to impalad:

# ll sbin-retail
lrwxrwxrwx 1 root root         7 Jun 19 22:37 catalogd -> impalad*
-rwxr-xr-x 1 root root 481420800 Jun 20 00:06 impalad*
lrwxrwxrwx 1 root root         7 Jun 19 22:37 statestored -> impalad*
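
The symlink check can be automated, which is handy when the directory is copied to many hosts. This is a minimal sketch; links_point_to_impalad is a hypothetical helper that expects relative links exactly as in the listing above.

```shell
# Succeeds only if catalogd and statestored in DIR are symlinks whose target is "impalad".
links_point_to_impalad() {
  local dir="$1" name
  for name in catalogd statestored; do
    [ -L "$dir/$name" ] && [ "$(readlink "$dir/$name")" = "impalad" ] || return 1
  done
}
```

Usage: `links_point_to_impalad sbin-retail || echo "fix the symlinks with ln -sf impalad <name>"`.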

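The www directory

This directory serves the web UI. Delete the old one and copy in the new one

Summary of the files needed from the new Impala build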

$IMPALA_HOME/fe/target/dependency/
$IMPALA_HOME/www/
$IMPALA_HOME/be/build/latest/service/impalad
$IMPALA_HOME/fe/target/impala-frontend-0.1-SNAPSHOT.jar
$IMPALA_HOME/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar
$IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0
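
The copy steps described in this section can be collected into one function. This is a sketch under assumptions: stage_impala is a hypothetical name, the destination is the copied apache-impala-3.4 directory with its lib, sbin-retail and www subdirectories, and the kudu-4ed0dbbd1 directory name is specific to this build.

```shell
# Copy the artifacts from the build tree SRC into the copied CDH impala directory DST.
# Runs in a subshell so that "set -e" stops at the first failed copy without
# affecting the calling shell.
stage_impala() (
  set -e
  src="$1"
  dst="$2"
  rm -f "$dst"/lib/*.jar                                  # drop the old CDH jars
  cp "$src"/fe/target/dependency/*.jar "$dst/lib/"        # FE dependencies
  cp "$src/fe/target/impala-frontend-0.1-SNAPSHOT.jar" "$dst/lib/"
  cp "$src/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar" "$dst/lib/"
  # -L copies the file itself in case the toolchain entry is a symlink
  cp -L "$src/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0" "$dst/lib/libkudu_client.so.0"
  cp "$src/be/build/latest/service/impalad" "$dst/sbin-retail/impalad"
  rm -rf "$dst/www"                                       # replace the old web UI
  cp -r "$src/www" "$dst/www"
)
```

Usage: `stage_impala "$IMPALA_HOME" /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4`.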

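2. Change the CM configuration and restart

Put the new Impala directory on every machine and make sure the copies are identical. Then in CM go to Impala -> Configuration -> env and add the environment variable IMPALA_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4
Then restart the whole Impala cluster. After a successful restart, catalogd, statestored, and impalad serve the new web pages and report version 3.4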
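
One way to confirm that the directory is identical on every machine is to compare a single digest per host. This is a sketch assuming GNU findutils and coreutils; dir_digest is a hypothetical helper, and it covers regular files only, so symlinks such as catalogd are not part of the digest.

```shell
# Print one SHA-256 digest that summarizes every regular file under DIR.
# File names are sorted bytewise so the result does not depend on the host locale.
dir_digest() {
  (
    cd "$1" &&
      find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum | sha256sum | awk '{ print $1 }'
  )
}
```

Running `dir_digest /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4` on each host should print the same value everywhere.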

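3. Verify and roll back

Finally, verify that the cluster works correctly, including some of the new features in Impala 3.4. If anything is incompatible, revert the CM configuration and restart the Impala cluster to return to the old version, since nothing in the old installation was touched

4. Summary

Impala can be upgraded on its own within CDH with the following steps:

  1. Create a new impala directory under /opt/cloudera/parcels/CDH/lib
  2. Copy the following from the new version into the matching locations in that directory: impalad, impala-frontend-0.1-SNAPSHOT.jar, impala-data-source-api-1.0-SNAPSHOT.jar, all jars the new FE depends on, the www directory, and the libkudu_client.so.0 the new version depends on
  3. In CM, set the IMPALA_HOME environment variable to the new directory in Impala Service Environment Advanced Configuration Snippet (Safety Valve)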
