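I. How it works

Each slave (standby) node serves one independent cluster. The keytab files for all clusters are generated on the master node and then synchronized to every slave node, so that each slave holds the authentication information of all clusters. This is what lets the clusters access each other's data.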

II. Deploying Kerberos with one master and multiple slaves

1. Ports used by Kerberos

Master node:
KDC: 88
kadmin: 749
Slave nodes:
kpropd: 754

2. Environment preparation

2.1 Configure hosts

On each node, run vim /etc/hosts and add the host entries:

  127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  10.1.236.92 testdp01
  10.1.236.93 testdp02
  10.1.241.73 testdp03
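
Every later step refers to the nodes by hostname, so it may be worth confirming that each node's hosts file really contains all three entries. The following is a minimal sketch for a POSIX shell; the function name check_hosts_file is hypothetical and not part of any Kerberos tooling.

```shell
#!/bin/sh
# check_hosts_file FILE NAME...
# Prints "missing: NAME" for every NAME that does not appear as a hostname
# in FILE, and returns non-zero if at least one name is missing.
check_hosts_file() {
    file=$1
    shift
    missing=0
    for name in "$@"; do
        # Skip comment lines, then look for NAME in any column after the IP address.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}
```

For example: check_hosts_file /etc/hosts testdp01 testdp02 testdp03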

2.2 Stop the firewall

  systemctl stop firewalld.service

3. Install krb5

On the master node:

  yum -y install krb5-server krb5-auth-dialog krb5-workstation krb5-devel krb5-libs

On the slave nodes:

  yum install -y krb5-server openldap-clients krb5-workstation krb5-libs

On the clients:

  yum install -y krb5-workstation krb5-devel

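4. Edit the configuration

4.1 Edit three files on the master node

In this setup testdp03 is the master, and testdp01 and testdp02 are the slaves.
vim /etc/krb5.conf
Note that the realm is configured as ocdp here.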

  # Configuration snippets may be placed in this directory as well
  includedir /etc/krb5.conf.d/
  [logging]
    default = FILE:/var/log/krb5libs.log
    kdc = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmind.log
  [libdefaults]
    dns_lookup_realm = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true
    rdns = false
    pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
    default_realm = ocdp
    default_ccache_name = KEYRING:persistent:%{uid}
  [realms]
    ocdp = {
      kdc = testdp01:88
      kdc = testdp02:88
      kdc = testdp03:88
      admin_server = testdp03:749
      default_domain = ocdp
    }
  [domain_realm]
    .ocdp = ocdp
  [kdc]
    profile = /var/kerberos/krb5kdc/kdc.conf
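
Since failover depends on every KDC being listed under the realm, it can help to print the kdc entries that a given krb5.conf actually defines. This is a rough sketch, assuming the file is laid out like the one above (one setting per line, braces on their own tokens); the function name list_kdcs is hypothetical.

```shell
#!/bin/sh
# list_kdcs FILE REALM
# Prints the value of every "kdc = ..." line inside the "REALM = { ... }" block of FILE.
list_kdcs() {
    awk -v realm="$2" '
        $1 == realm && $2 == "=" && $3 == "{" { inside = 1; next }
        inside && $1 == "}"                   { inside = 0 }
        inside && $1 == "kdc" && $2 == "="    { print $3 }
    ' "$1"
}
```

For example: list_kdcs /etc/krb5.conf ocdp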

vim /var/kerberos/krb5kdc/kadm5.acl

  */admin@ocdp *

4.2 Create the ACL file on the slave nodes

vim /var/kerberos/krb5kdc/kpropd.acl

  host/testdp01@ocdp
  host/testdp02@ocdp
  host/testdp03@ocdp
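
With more than a few KDC hosts, typing kpropd.acl by hand gets error-prone. A minimal sketch that generates the same lines from a realm and a host list; the function name gen_kpropd_acl is hypothetical.

```shell
#!/bin/sh
# gen_kpropd_acl REALM HOST...
# Prints one host/<HOST>@<REALM> line per KDC host, matching the kpropd.acl entries above.
gen_kpropd_acl() {
    realm=$1
    shift
    for host in "$@"; do
        printf 'host/%s@%s\n' "$host" "$realm"
    done
}
```

For example: gen_kpropd_acl ocdp testdp01 testdp02 testdp03 > /var/kerberos/krb5kdc/kpropd.acl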

5. Initialize the database and generate krb5.keytab on the master

5.1 Initialize the database

  kdb5_util create -r ocdp -s

5.2 Generate krb5.keytab

  kadmin.local -q "ank -randkey host/testdp01@ocdp"
  kadmin.local -q "ank -randkey host/testdp02@ocdp"
  kadmin.local -q "ank -randkey host/testdp03@ocdp"
  kadmin.local -q "xst host/testdp01@ocdp"
  kadmin.local -q "xst host/testdp02@ocdp"
  kadmin.local -q "xst host/testdp03@ocdp"
  klist -ket /etc/krb5.keytab
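
The six kadmin.local calls above follow one pattern per host. As a sketch, the commands can be generated from a host list and reviewed before running them; the function name host_principal_cmds is hypothetical, and the function only prints the commands, it does not execute anything.

```shell
#!/bin/sh
# host_principal_cmds REALM HOST...
# Prints the kadmin.local commands that first create (ank -randkey) and then
# export (xst) a host principal for each HOST. Nothing is executed here.
host_principal_cmds() {
    realm=$1
    shift
    for host in "$@"; do
        printf 'kadmin.local -q "ank -randkey host/%s@%s"\n' "$host" "$realm"
    done
    for host in "$@"; do
        printf 'kadmin.local -q "xst host/%s@%s"\n' "$host" "$realm"
    done
}
```

For example, host_principal_cmds ocdp testdp01 testdp02 testdp03 prints the same six commands as above; once they look right, the output can be piped to sh on the master.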

6. Copy the configuration files and keytab from the master to the slave nodes

  cd /var/kerberos/krb5kdc
  # The stash file is named .k5.<REALM>; the realm created above is ocdp
  scp .k5.ocdp testdp01:$PWD
  scp kadm5.acl testdp01:$PWD
  scp kdc.conf testdp01:$PWD
  scp .k5.ocdp testdp02:$PWD
  scp kadm5.acl testdp02:$PWD
  scp kdc.conf testdp02:$PWD
  cd /etc
  scp krb5.keytab testdp01:$PWD
  scp krb5.conf testdp01:$PWD
  scp krb5.keytab testdp01:/var/kerberos/krb5kdc/
  scp krb5.keytab testdp02:$PWD
  scp krb5.conf testdp02:$PWD
  scp krb5.keytab testdp02:/var/kerberos/krb5kdc/
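
The copy step repeats the same files for every slave, so a loop keeps it in sync when a slave is added. A minimal sketch under these assumptions: the helper names run and copy_kdc_files and the DRY_RUN variable are hypothetical, and the stash file path is passed as an argument because its name depends on the realm.

```shell
#!/bin/sh
# run CMD...: executes CMD, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# copy_kdc_files STASH_FILE HOST...
# Copies the KDC files and the keytab to the same locations on each HOST.
copy_kdc_files() {
    stash=$1
    shift
    for host in "$@"; do
        run scp "$stash" /var/kerberos/krb5kdc/kadm5.acl /var/kerberos/krb5kdc/kdc.conf "$host:/var/kerberos/krb5kdc/"
        run scp /etc/krb5.keytab /etc/krb5.conf "$host:/etc/"
        run scp /etc/krb5.keytab "$host:/var/kerberos/krb5kdc/"
    done
}
```

Running it once with DRY_RUN=1 shows exactly which scp commands would be issued before anything is copied.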

7. Start the master node

  systemctl enable krb5kdc.service
  systemctl enable kadmin.service
  systemctl start krb5kdc.service
  systemctl start kadmin.service

8. Add an administrator account on the master

  kadmin.local -q "addprinc admin/admin@ocdp"

8.1 Verify that the administrator account works

  kinit admin/admin@ocdp
  kadmin

If you can log in, the account works.

9. Start kpropd on the slave nodes

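  # kpropd -S runs kpropd in standalone mode; if it is already running this way,
  # the kprop systemd unit below may fail because port 754 is already taken
  kpropd -S
  systemctl start kprop
  systemctl status kprop
  systemctl enable kprop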

Tip:
If startup fails, check whether the kpropd port (754) is already in use:

  netstat -ntlp | grep 754
  lsof -i:754
  kill <PID>
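
To avoid reading the PID off the netstat output by eye, the lookup can be scripted. This is only a sketch: it assumes the usual netstat -ntlp column layout (local address in column 4, state in column 6, PID/program in column 7), and the function name pid_on_port is hypothetical.

```shell
#!/bin/sh
# pid_on_port PORT
# Reads "netstat -ntlp" output on stdin and prints the PID of each
# process listening on exactly PORT.
pid_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")   # column 7 looks like "1234/kpropd"
            print parts[1]
        }
    '
}
```

For example: netstat -ntlp | pid_on_port 754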

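10. Synchronize the data

Run on the master:

  kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
  kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01
  kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp02

Tip:
On success, kprop prints: Database propagation to hostname2: SUCCEEDED
This step is quite likely to fail. If it does, check whether the keytab file on the slave is usable, whether the firewall is in the way, and whether kpropd.acl on the slave is correct.
Error message: kprop: Decrypt integrity check failed while getting initial credentials
Solution: the keytab files on the master and the slave do not match; copy the keytab from the master to the slave again by hand.
After a successful run, new files whose names start with principal appear in /var/kerberos/krb5kdc on the slave.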
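
The tips above boil down to three outcomes of a kprop run. A minimal sketch that maps kprop's output to the corresponding hint, using only the messages quoted above; the function name diagnose_kprop is hypothetical.

```shell
#!/bin/sh
# diagnose_kprop
# Reads the output of a kprop run on stdin and prints a one-line verdict.
diagnose_kprop() {
    output=$(cat)
    case $output in
        *"SUCCEEDED"*)
            echo "ok" ;;
        *"Decrypt integrity check failed"*)
            echo "keytab mismatch: copy the keytab from the master to the slave again" ;;
        *)
            echo "failed: check the slave keytab, the firewall and kpropd.acl" ;;
    esac
}
```

For example: kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01 2>&1 | diagnose_kprop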

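11. Start the KDC on the slave nodes

  systemctl enable krb5kdc.service
  systemctl start krb5kdc.service

12. Testing

12.1 Check the KDC status on the master and on each slave

  systemctl status krb5kdc.service

12.2 Watch the log on each server

  tail -f /var/log/krb5kdc.log

12.3 Run kinit on the master and watch the logs: the new entries appear on testdp03.

12.4 Stop or kill the KDC on the master and run kinit again: the new entries now appear on testdp01 or testdp02.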

13. Database synchronization script

13.1 Write the synchronization script on the master node

vim /root/sync_db.sh

  #!/bin/sh
  # Slave KDC hosts, separated by spaces so that the for loop visits each one
  kdclist="testdp01 testdp02"
  echo "$(date) start to sync!"
  sudo kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
  for kdc in $kdclist
  do
      sudo kprop -f /var/kerberos/krb5kdc/slave_datatrans "$kdc"
  done
  echo "$(date) end to sync!"

13.2 Make the script executable

  chmod +x /root/sync_db.sh

13.3 Set up the cron job

crontab -e

  */1 * * * * /root/sync_db.sh >> /root/sync.log
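
A cron job like this keeps running even when one slave is unreachable, and the failure is easy to miss. A minimal sketch of a propagation loop that continues after a failure and reports it through its exit status; the function name sync_kdcs and the KPROP override variable are hypothetical, introduced only so the loop can be tried without a real KDC.

```shell
#!/bin/sh
# Command used for propagation; overridable so the loop can be exercised without a KDC.
KPROP=${KPROP:-kprop}

# sync_kdcs DUMPFILE HOST...
# Propagates DUMPFILE to each HOST, keeps going after a failure,
# and returns non-zero if any host failed.
sync_kdcs() {
    dump=$1
    shift
    failed=0
    for kdc in "$@"; do
        if "$KPROP" -f "$dump" "$kdc"; then
            echo "$(date '+%F %T') $kdc: propagated"
        else
            echo "$(date '+%F %T') $kdc: FAILED" >&2
            failed=1
        fi
    done
    return $failed
}
```

In sync_db.sh this would be called after the kdb5_util dump, for example: sync_kdcs /var/kerberos/krb5kdc/slave_datatrans testdp01 testdp02 || echo "sync incomplete"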

14. Verify that synchronization works

14.1 Add the user usertest1 on the master KDC

  kadmin.local
  kadmin.local: addprinc usertest1
  Principal "usertest1@ocdp" created.

14.2 Check on a slave KDC

  kadmin.local
  kadmin.local: list_principals
  K/M@ocdp
  admin/admin@ocdp
  usertest1@ocdp
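
Once the database holds more than a handful of principals, comparing the two listings by eye stops being practical. A minimal sketch, assuming the principal names from each node have been saved to a file, one per line; the function name missing_on_slave is hypothetical.

```shell
#!/bin/sh
# missing_on_slave MASTER_LIST SLAVE_LIST
# Prints the principals that are in MASTER_LIST but not in SLAVE_LIST.
missing_on_slave() {
    master_sorted=$(mktemp)
    slave_sorted=$(mktemp)
    sort "$1" > "$master_sorted"
    sort "$2" > "$slave_sorted"
    # comm -23 keeps only the lines unique to the first (master) file.
    comm -23 "$master_sorted" "$slave_sorted"
    rm -f "$master_sorted" "$slave_sorted"
}
```

Empty output means every principal on the master has reached the slave.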

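III. Enabling Kerberos in Ambari

1. Environment configuration

Download the JCE policy files and unzip them into JAVA_HOME/jre/lib/security. This is required on every Ambari node: http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html

  unzip -o -j -q jce_policy-8.zip -d $JAVA_HOME/jre/lib/security

Tip: if SELinux is enabled on the cluster, you may need to run restorecon -R -v /etc/krb5.conf after copying the file.

2. Steps to enable Kerberos in Ambari

2.1 Enable it by following the standard procedure

Figure 1:
image.png
Figure 2:
image.png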

3. Troubleshooting errors while enabling Kerberos

3.1 Check Kerberos fails with: kinit: Password incorrect while getting initial credentials

image.png
Cause: the KDC database has not been synchronized to the slave nodes.
Solution: synchronize the slave nodes manually.

  kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
  kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01
  kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp02

3.2 Check Kerberos fails with: not found in Kerberos database while getting initial credentials

image.png
Cause: not investigated yet.
Solution: copy /etc/security from the server hosting the master KDC to the slave KDC server you are currently on.

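3.3 Possible error: add_principal: Malformed representation of principal while parsing principal

image.png
Cause: testing on the command line confirmed that the principal mycluster-112020@admin/admin@OCDP.COM is syntactically invalid.
Solution: Ambari reads the principal name from its database, and this malformed name triggers the error, so the related records have to be deleted from the database.
The tables involved have foreign key checks, so disable the checks before deleting and restore them afterwards.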

  -- Disable foreign key checks
  -- (SET GLOBAL only applies to connections opened afterwards: reconnect before
  --  deleting, or use SET foreign_key_checks=0 for the current session)
  show global variables like "%foreign_key_checks%";
  set global foreign_key_checks=0;
  show global variables like "%foreign_key_checks%";
  -- Delete the offending principal from the three kerberos tables
  delete from kerberos_keytab where keytab_path="/etc/security/keytabs/kerberos.service_check.112020.keytab";
  delete from kerberos_principal where principal_name="mycluster-112020@admin/admin@OCDP.COM";
  delete from kerberos_keytab_principal where keytab_path="/etc/security/keytabs/kerberos.service_check.112020.keytab";
  -- Restore foreign key checks
  set global foreign_key_checks=1;
  show global variables like "%foreign_key_checks%";
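
The malformed name in 3.3 contains two @ separators. Before deleting anything from the Ambari database, a quick check can confirm which stored names have that problem. This is a deliberately simplified sketch: it only rejects empty names and names with more than one @, and it ignores backslash escaping; the function name valid_principal is hypothetical.

```shell
#!/bin/sh
# valid_principal NAME
# Succeeds unless NAME is empty or contains more than one "@" realm separator.
valid_principal() {
    case $1 in
        "" | *@*@*) return 1 ;;
        *)          return 0 ;;
    esac
}
```

For example, valid_principal "mycluster-112020@admin/admin@OCDP.COM" fails, while valid_principal "admin/admin@ocdp" succeeds.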

IV. Verifying access between clusters

Cluster 1
image.png
image.png
Cluster 2
image.png
image.png

Some of the tests are shown below:
su ocdp
kinit -kt nn.service.keytab nn/testdp02@ocdp
image.png
hadoop fs -ls hdfs://testdp01:8020/
image.png
Create a file on cluster 1 from cluster 2:
hadoop fs -touch hdfs://testdp01:8020/lixl/a.txt
image.png
Copy data:
image.png