1. Environment

System environment:

  • Operating system: CentOS 6.6
  • Hadoop: CDH 5.4
  • JDK: 1.7.0_71
  • Running user: root

Cluster role assignments:

  1. 192.168.56.121 cdh1: NameNode, ResourceManager, HBase, Hive metastore, Impala Catalog, Impala StateStore, Sentry
  2. 192.168.56.122 cdh2: DataNode, SecondaryNameNode, NodeManager, HBase, HiveServer2, Impala Server
  3. 192.168.56.123 cdh3: DataNode, HBase, NodeManager, HiveServer2, Impala Server

cdh1 acts as the master node and the other nodes as slaves. Use lowercase hostnames; otherwise you will run into errors when integrating Kerberos.
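
Kerberos principals are case-sensitive, so a mixed-case hostname will not match the lowercase principals generated below. A quick sketch for checking the local hostname (the `to_lower` helper is only illustrative; it mirrors the tr pipeline used later in /etc/default/impala):

```shell
# Illustrative helper: lowercase a hostname the same way the
# /etc/default/impala config below does with tr.
to_lower() {
  echo "$1" | tr '[:upper:]' '[:lower:]'
}

# Compare the node's FQDN with its lowercased form; they should match.
fqdn=$(hostname -f 2>/dev/null || hostname)
lower=$(to_lower "$fqdn")
if [ "$fqdn" = "$lower" ]; then
  echo "hostname OK: $fqdn"
else
  echo "WARNING: hostname contains uppercase letters: $fqdn"
fi
```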

2. Install Required Dependencies

Run the following commands on every node:

  $ yum install python-devel openssl-devel python-pip cyrus-sasl cyrus-sasl-gssapi cyrus-sasl-devel -y
  $ pip-python install ssl

3. Generate Keytabs

Run the following commands on cdh1, i.e. the KDC server node:

  $ cd /var/kerberos/krb5kdc/
  $ kadmin.local -q "addprinc -randkey impala/cdh1@JAVACHEN.COM"
  $ kadmin.local -q "addprinc -randkey impala/cdh2@JAVACHEN.COM"
  $ kadmin.local -q "addprinc -randkey impala/cdh3@JAVACHEN.COM"
  $ kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh1@JAVACHEN.COM"
  $ kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh2@JAVACHEN.COM"
  $ kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh3@JAVACHEN.COM"
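
The six per-host commands above can also be written as a loop. This is a sketch using this tutorial's host list and realm, guarded so it is a no-op on machines where the KDC tools are not installed:

```shell
REALM=JAVACHEN.COM
HOSTS="cdh1 cdh2 cdh3"

for h in $HOSTS; do
  p="impala/${h}@${REALM}"
  echo "creating and exporting $p"
  # Only invoke kadmin.local where the KDC is actually installed.
  if command -v kadmin.local >/dev/null 2>&1; then
    kadmin.local -q "addprinc -randkey $p"
    kadmin.local -q "xst -k impala-unmerge.keytab $p"
  fi
done
```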

In addition, if you use HAProxy for load balancing (see the official document "Using Impala through a Proxy for High Availability"), you also need to generate proxy.keytab:

  $ cd /var/kerberos/krb5kdc/
  # "proxy" is the host where HAProxy is installed
  $ kadmin.local -q "addprinc -randkey impala/proxy@JAVACHEN.COM"
  $ kadmin.local -q "xst -k proxy.keytab impala/proxy@JAVACHEN.COM"

Merge proxy.keytab and impala-unmerge.keytab into impala.keytab:

  $ ktutil
  ktutil: rkt proxy.keytab
  ktutil: rkt impala-unmerge.keytab
  ktutil: wkt impala.keytab
  ktutil: quit

Copy impala.keytab to the /etc/impala/conf directory on every node:

  $ scp impala.keytab cdh1:/etc/impala/conf
  $ scp impala.keytab cdh2:/etc/impala/conf
  $ scp impala.keytab cdh3:/etc/impala/conf

Then set ownership and permissions on cdh1, cdh2, and cdh3:

  $ ssh cdh1 "cd /etc/impala/conf/; chown impala:hadoop *.keytab; chmod 400 *.keytab"
  $ ssh cdh2 "cd /etc/impala/conf/; chown impala:hadoop *.keytab; chmod 400 *.keytab"
  $ ssh cdh3 "cd /etc/impala/conf/; chown impala:hadoop *.keytab; chmod 400 *.keytab"

A keytab is effectively a permanent credential: it requires no password (although the keytab becomes invalid if the principal's password is changed in the KDC). Any user with read access to the file can therefore impersonate the principals it contains when accessing Hadoop, so the keytab must be readable only by its owner (mode 0400).
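
A small sketch for auditing this. The `check_keytab_perms` helper is illustrative (not part of Impala or Kerberos) and reports whether a keytab file has the expected 0400 mode:

```shell
# Report whether a keytab file has the expected 0400 permissions.
# stat -c is GNU coreutils syntax, which is what CentOS ships.
check_keytab_perms() {
  f="$1"
  mode=$(stat -c '%a' "$f")
  owner=$(stat -c '%U' "$f")
  if [ "$mode" = "400" ]; then
    echo "OK: $f mode=$mode owner=$owner"
  else
    echo "WARN: $f mode=$mode owner=$owner (expected 400)"
  fi
}

# Example: check every keytab under /etc/impala/conf, if any exist.
for k in /etc/impala/conf/*.keytab; do
  if [ -e "$k" ]; then
    check_keytab_perms "$k"
  fi
done
```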

4. Modify the Impala Configuration

On cdh1, edit /etc/default/impala and add the following flags to IMPALA_CATALOG_ARGS, IMPALA_SERVER_ARGS, and IMPALA_STATE_STORE_ARGS:

  -kerberos_reinit_interval=60
  -principal=impala/_HOST@JAVACHEN.COM
  -keytab_file=/etc/impala/conf/impala.keytab

If you use HAProxy (for HAProxy setup, see the companion article on configuring Hive HA with HAProxy), change IMPALA_SERVER_ARGS to the following, where proxy is the HAProxy host name (here HAProxy is installed on cdh1):

  -kerberos_reinit_interval=60
  -be_principal=impala/_HOST@JAVACHEN.COM
  -principal=impala/proxy@JAVACHEN.COM
  -keytab_file=/etc/impala/conf/impala.keytab

Add to IMPALA_CATALOG_ARGS:

  -state_store_host=${IMPALA_STATE_STORE_HOST} \

Sync the modified file to the other nodes. The final /etc/default/impala looks like the following; to avoid an uppercase hostname, a hostname variable is substituted for _HOST:

  IMPALA_CATALOG_SERVICE_HOST=cdh1
  IMPALA_STATE_STORE_HOST=cdh1
  IMPALA_STATE_STORE_PORT=24000
  IMPALA_BACKEND_PORT=22000
  IMPALA_LOG_DIR=/var/log/impala
  IMPALA_MEM_DEF=$(free -m |awk 'NR==2{print $2-5120}')
  hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
  IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_host=${IMPALA_STATE_STORE_HOST} \
  -kerberos_reinit_interval=60 \
  -principal=impala/${hostname}@JAVACHEN.COM \
  -keytab_file=/etc/impala/conf/impala.keytab
  "
  IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
  -statestore_subscriber_timeout_seconds=15 \
  -kerberos_reinit_interval=60 \
  -principal=impala/${hostname}@JAVACHEN.COM \
  -keytab_file=/etc/impala/conf/impala.keytab
  "
  IMPALA_SERVER_ARGS=" \
  -log_dir=${IMPALA_LOG_DIR} \
  -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
  -state_store_port=${IMPALA_STATE_STORE_PORT} \
  -use_statestore \
  -state_store_host=${IMPALA_STATE_STORE_HOST} \
  -be_port=${IMPALA_BACKEND_PORT} \
  -kerberos_reinit_interval=60 \
  -be_principal=impala/${hostname}@JAVACHEN.COM \
  -principal=impala/cdh1@JAVACHEN.COM \
  -keytab_file=/etc/impala/conf/impala.keytab \
  -mem_limit=${IMPALA_MEM_DEF}m
  "
  ENABLE_CORE_DUMPS=false
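
To see concretely what the hostname substitution above produces, here is a sketch of the principal each node ends up with (`build_principal` is an illustrative helper, not an Impala flag; it just replays the tr pipeline from the config):

```shell
# Build the per-node principal the same way /etc/default/impala does:
# lowercase the FQDN, then splice it into impala/<host>@REALM.
build_principal() {
  host=$(echo "$1" | tr '[:upper:]' '[:lower:]')
  echo "impala/${host}@JAVACHEN.COM"
}

build_principal "CDH2"               # prints impala/cdh2@JAVACHEN.COM
build_principal "cdh3.javachen.com"  # prints impala/cdh3.javachen.com@JAVACHEN.COM
```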

Sync the modified file to the other nodes, cdh2 and cdh3:

  $ scp /etc/default/impala cdh2:/etc/default/impala
  $ scp /etc/default/impala cdh3:/etc/default/impala

Update the files under the Impala configuration directory and sync them to the other nodes:

  $ cp /etc/hadoop/conf/core-site.xml /etc/impala/conf/
  $ cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
  $ cp /etc/hive/conf/hive-site.xml /etc/impala/conf/
  $ scp -r /etc/impala/conf cdh2:/etc/impala
  $ scp -r /etc/impala/conf cdh3:/etc/impala

5. Start the Services

Start impala-state-store

impala-state-store runs as the impala user, so on cdh1 first obtain a ticket for the impala user, then start the service:

  $ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
  $ service impala-state-store start

Then check the log to confirm it started successfully:

  $ tailf /var/log/impala/statestored.INFO

Start impala-catalog

impala-catalog also runs as the impala user, so on cdh1 first obtain a ticket for the impala user, then start the service:

  $ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
  $ service impala-catalog start

Then check the log to confirm it started successfully:

  $ tailf /var/log/impala/catalogd.INFO

Start impala-server

impala-server likewise runs as the impala user, so first obtain a ticket for the impala user, then start the service:

  $ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
  $ service impala-server start

Then check the log to confirm it started successfully:

  $ tailf /var/log/impala/impalad.INFO

6. Testing

Test impala-shell

With Kerberos enabled, impala-shell must be run with the -k flag:

  $ impala-shell -k
  Starting Impala Shell using Kerberos authentication
  Using service name 'impala'
  Connected to cdh1:21000
  Server version: impalad version 1.3.1-cdh4 RELEASE (build 907481bf45b248a7bb3bb077d54831a71f484e5f)
  Welcome to the Impala shell. Press TAB twice to see a list of available commands.
  Copyright (c) 2012 Cloudera, Inc. All rights reserved.
  (Shell build version: Impala Shell v1.3.1-cdh4 (907481b) built on Wed Apr 30 14:23:48 PDT 2014)
  [cdh1:21000] >
  [cdh1:21000] > show tables;
  Query: show tables
  +------+
  | name |
  +------+
  | a    |
  | b    |
  | c    |
  | d    |
  +------+
  Returned 4 row(s) in 0.08s

7. Troubleshooting

If you see the following exception:

  [cdh1:21000] > select * from test limit 10;
  Query: select * from test limit 10
  ERROR: AnalysisException: Failed to load metadata for table: default.test
  CAUSED BY: TableLoadingException: Failed to load metadata for table: test
  CAUSED BY: TTransportException: java.net.SocketTimeoutException: Read timed out
  CAUSED BY: SocketTimeoutException: Read timed out

then increase the value of hive.metastore.client.socket.timeout (in seconds) in hive-site.xml:

  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>36000</value>
  </property>

8. Related Articles