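I. How it works
Each slave node serves one independent cluster. The keytab files for all clusters are generated on the master node and then synchronized to every slave node, so that each slave holds the authentication information for all clusters and the clusters can access each other's data.
II. Kerberos deployment with one master and multiple slaves
1. Ports used by Kerberos
Master node: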
KDC:88
kadmin:749
Slave nodes:
kpropd:754
2. Environment preparation
2.1 Configure hosts
On each node, run vim /etc/hosts and add the host entries:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.236.92 testdp01
10.1.236.93 testdp02
10.1.241.73 testdp03
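Editing /etc/hosts by hand on every node makes it easy to add the same entry twice. The sketch below appends an entry only when the hostname is not mapped yet. The function name add_host_entry and the HOSTS_FILE override are made up for this example and are not part of the original setup.

```shell
#!/bin/sh
# HOSTS_FILE is a hypothetical override so the sketch can be tried on a scratch file.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# Append "IP NAME" to the hosts file unless NAME is already mapped.
# Usage: add_host_entry IP NAME
add_host_entry() {
    ip="$1"
    name="$2"
    # Look for NAME as a whole field (column 2 or later) on a non-comment line.
    if awk -v n="$name" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$HOSTS_FILE" 2>/dev/null; then
        echo "skip: $name already present"
    else
        printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
        echo "added: $ip $name"
    fi
}
```

Calling add_host_entry 10.1.236.92 testdp01 twice adds the line once and reports a skip the second time.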
2.2 Stop the firewall
systemctl stop firewalld.service
3. Install krb5
Run on the master node:
yum -y install krb5-server krb5-auth-dialog krb5-workstation krb5-devel krb5-libs
Run on the slave nodes:
yum install -y krb5-server openldap-clients krb5-workstation krb5-libs
Run on the clients:
yum install -y krb5-workstation krb5-devel
4. Modify the configuration
4.1 Edit three files on the master node
In this setup testdp03 is the master, and testdp01 and testdp02 are the slaves.
vim /etc/krb5.conf
Note that the realm is set to ocdp here.
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
default_realm = ocdp
default_ccache_name = KEYRING:persistent:%{uid}

[realms]
ocdp = {
kdc = testdp01:88
kdc = testdp02:88
kdc = testdp03:88
admin_server = testdp03:749
default_domain = ocdp
}

[domain_realm]
.ocdp = ocdp

[kdc]
profile = /var/kerberos/krb5kdc/kdc.conf
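After copying krb5.conf around it is worth confirming that every node really ends up with the same realm and KDC list. The following is a rough sketch of two awk-based readers for that purpose. The function names krb5_default_realm and krb5_kdcs are invented here, and the parsing only covers the simple "key = value" layout used in this article.

```shell
#!/bin/sh
# Print the default_realm value from the [libdefaults] section.
# Usage: krb5_default_realm FILE
krb5_default_realm() {
    awk -F= '
        /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults\]/) }
        in_lib && $1 ~ /^[ \t]*default_realm[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
            exit
        }
    ' "$1"
}

# Print the "kdc = host:port" values from the [realms] section, in file order.
# The [logging] section also has a "kdc" key, so the section is tracked.
# Usage: krb5_kdcs FILE
krb5_kdcs() {
    awk -F= '
        /^[ \t]*\[/ { in_realms = ($0 ~ /^[ \t]*\[realms\]/); next }
        in_realms && $1 ~ /^[ \t]*kdc[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
        }
    ' "$1"
}
```

For example, krb5_default_realm /etc/krb5.conf should print ocdp on every node in this setup, and krb5_kdcs /etc/krb5.conf should list the three KDC entries.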
vim /var/kerberos/krb5kdc/kadm5.acl
*/admin@ocdp *
4.2 Create the following file on the slave nodes
vim /var/kerberos/krb5kdc/kpropd.acl
host/testdp01@ocdp
host/testdp02@ocdp
host/testdp03@ocdp
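kpropd.acl needs one principal per line, so a file whose entries have run together will not match anything. The sketch below checks that each expected host principal appears as an exact line. The helper names acl_has_principal and check_kpropd_acl are assumptions made for illustration.

```shell
#!/bin/sh
# Succeed if FILE contains PRINCIPAL as an exact, whole line.
# Usage: acl_has_principal FILE PRINCIPAL
acl_has_principal() {
    grep -Fxq -- "$2" "$1"
}

# Report expected host principals that are missing from an ACL file.
# Returns 0 if all are present, 1 otherwise.
# Usage: check_kpropd_acl FILE REALM HOST...
check_kpropd_acl() {
    file="$1"
    realm="$2"
    shift 2
    missing=0
    for h in "$@"; do
        if ! acl_has_principal "$file" "host/$h@$realm"; then
            echo "missing: host/$h@$realm"
            missing=1
        fi
    done
    return "$missing"
}
```

On a slave this could be run as check_kpropd_acl /var/kerberos/krb5kdc/kpropd.acl ocdp testdp01 testdp02 testdp03.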
5. Initialize the database and generate krb5.keytab on the master node
5.1 Initialize the database
kdb5_util create -r ocdp -s
5.2 Generate krb5.keytab
kadmin.local -q "ank -randkey host/testdp01@ocdp"
kadmin.local -q "ank -randkey host/testdp02@ocdp"
kadmin.local -q "ank -randkey host/testdp03@ocdp"
kadmin.local -q "xst host/testdp01@ocdp"
kadmin.local -q "xst host/testdp02@ocdp"
kadmin.local -q "xst host/testdp03@ocdp"
klist -ket /etc/krb5.keytab
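With more KDC nodes the command list above grows quickly. One possible approach is to generate it from a host list. The sketch below only prints the commands (a dry run), and the function name host_principal_cmds is made up for this example. Keep in mind that xst (ktadd) assigns new random keys each time it runs, so after re-running it the keytab has to be copied to the slaves again.

```shell
#!/bin/sh
# Print the kadmin.local commands that create and then export a host
# principal for each KDC node. Nothing is executed here; pipe the output
# to sh on the master to actually run it.
# Usage: host_principal_cmds REALM HOST...
host_principal_cmds() {
    realm="$1"
    shift
    for h in "$@"; do
        printf 'kadmin.local -q "ank -randkey host/%s@%s"\n' "$h" "$realm"
    done
    for h in "$@"; do
        printf 'kadmin.local -q "xst host/%s@%s"\n' "$h" "$realm"
    done
}

host_principal_cmds ocdp testdp01 testdp02 testdp03
```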
6. Copy the configuration files and keytab from the master to the slave nodes
The stash file is named .k5.<realm>; copy whichever .k5.* file exists in /var/kerberos/krb5kdc on your master.
cd /var/kerberos/krb5kdc
scp .k5.EXAMPLE.COM testdp01:$PWD
scp kadm5.acl testdp01:$PWD
scp kdc.conf testdp01:$PWD
scp .k5.EXAMPLE.COM testdp02:$PWD
scp kadm5.acl testdp02:$PWD
scp kdc.conf testdp02:$PWD
cd /etc
scp krb5.keytab testdp01:$PWD
scp krb5.conf testdp01:$PWD
scp krb5.keytab testdp01:/var/kerberos/krb5kdc/
scp krb5.keytab testdp02:$PWD
scp krb5.conf testdp02:$PWD
scp krb5.keytab testdp02:/var/kerberos/krb5kdc/
7. Start the services on the master node
systemctl enable krb5kdc.service
systemctl enable kadmin.service
systemctl start krb5kdc.service
systemctl start kadmin.service
8. Add an administrator account on the master
kadmin.local -q "addprinc admin/admin@ocdp"
8.1 Verify that the administrator account works
kinit admin/admin@ocdp
kadmin
If you can log in, the account works.
9. Start kpropd on the slave nodes
kpropd -S
systemctl start kprop
systemctl status kprop
systemctl enable kprop
Note:
If startup fails, check whether the kprop port (754) is already in use:
netstat -ntlp|grep 754
lsof -i:754
kill PID
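10. Synchronize the data
Run on the master:
kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp02
On success, kprop prints:
Database propagation to hostname2: SUCCEEDED
This step fails fairly often. If it does, check whether the keytab file on the slave is valid, whether the firewall is blocking the connection, and whether kpropd.acl on the slave is correct.
Error message: kprop: Decrypt integrity check failed while getting initial credentials
Solution: the keytab files on the master and the slave do not match; copy the keytab from the master to the slave again by hand.
After a successful run, new files whose names start with principal appear in /var/kerberos/krb5kdc on the slave nodes.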
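The checks listed above can be turned into a small triage helper that maps kprop output to the most likely cause. This is only a sketch: the function name diagnose_kprop is invented, and the "Connection refused" and "No route to host" patterns are assumed to be the standard system error strings rather than messages quoted in this article.

```shell
#!/bin/sh
# Map a line of kprop output to a short hint.
# Usage: diagnose_kprop "OUTPUT TEXT"
diagnose_kprop() {
    case "$1" in
        *SUCCEEDED*)
            echo "ok" ;;
        *"Decrypt integrity check failed"*)
            echo "keytab mismatch: re-copy /etc/krb5.keytab from the master" ;;
        *"Connection refused"*|*"No route to host"*)
            # Assumed patterns: generic OS-level connection errors.
            echo "network: check that kpropd listens on 754 and check the firewall" ;;
        *)
            echo "unknown: check kpropd.acl and the logs on the slave" ;;
    esac
}
```

For example, diagnose_kprop "$(kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01 2>&1)" prints a one-line hint for that slave.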
11. Start the KDC on the slave nodes
systemctl enable krb5kdc.service
systemctl start krb5kdc.service
12. Test
12.1 Check the KDC status on the master and on each slave
systemctl status krb5kdc.service
12.2 Watch the logs on each node
tail -f /var/log/krb5kdc.log
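12.3 Run kinit on the master server and watch the logs; the request is logged on testdp03.
12.4 Stop or kill the KDC on the master server and run kinit again; the request is now logged on testdp01 or testdp02.
13. Database synchronization script
13.1 Write the synchronization script on the master node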
vim /root/sync_db.sh
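#!/bin/sh
# Slave KDCs to propagate to; space-separated so the for loop splits them
kdclist="testdp01 testdp02"
echo "$(date) start to sync!"
# Dump the master database, then push the dump to each slave
sudo kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
for kdc in $kdclist; do
    sudo kprop -f /var/kerberos/krb5kdc/slave_datatrans "$kdc"
done
echo "$(date) end to sync!"
13.2 Add execute permission
chmod +x /root/sync_db.sh
13.3 Set up the scheduled job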
crontab -e
*/1 * * * * /root/sync_db.sh >> /root/sync.log
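Because the job fires every minute, a slow propagation could still be running when the next one starts. A minimal sketch of a guard against overlapping runs is shown below. It relies on mkdir being atomic; the LOCK_DIR path, the function name run_exclusive and the exit code 75 are assumptions chosen for this example.

```shell
#!/bin/sh
# LOCK_DIR is a hypothetical location; mkdir either creates it or fails,
# which makes it usable as a simple lock.
LOCK_DIR="${LOCK_DIR:-/tmp/sync_db.lock}"

# Run a command only if no other run holds the lock.
# Returns 75 if the lock is held, otherwise the command's exit status.
# Usage: run_exclusive COMMAND [ARG...]
run_exclusive() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous sync still running, skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}
```

A wrapper script called from cron could then end with run_exclusive /root/sync_db.sh. If a run is killed before it removes the lock directory, the directory has to be deleted by hand.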
14. Verify that synchronization works
14.1 Add the user usertest1 on the master KDC
kadmin.local
kadmin.local: addprinc usertest1
Principal "usertest1@ocdp" created.
14.2 Check on a slave KDC
kadmin.local
kadmin.local: list_principals
K/M@ocdp
admin/admin@ocdp
usertest1@ocdp
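Comparing the two principal lists by eye stops working once there are more than a handful of entries. The sketch below prints the principals that exist on the master but not on the slave. It assumes each list has been saved to a file with one principal per line, and the function name missing_on_slave is made up for illustration.

```shell
#!/bin/sh
# Print principals present in MASTER_LIST but absent from SLAVE_LIST.
# Both files are expected to hold one principal per line.
# Usage: missing_on_slave MASTER_LIST SLAVE_LIST
missing_on_slave() {
    master_sorted="$(mktemp)"
    slave_sorted="$(mktemp)"
    # comm requires sorted input
    sort -u "$1" > "$master_sorted"
    sort -u "$2" > "$slave_sorted"
    # -23 keeps only lines unique to the first file
    comm -23 "$master_sorted" "$slave_sorted"
    rm -f "$master_sorted" "$slave_sorted"
}
```

Empty output means the slave has every principal the master has.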
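III. Enabling Kerberos in Ambari
1. Environment configuration
Download the JCE policy files and extract them into JAVA_HOME/jre/lib/security. This is required on every Ambari node.
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html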
unzip -o -j -q jce_policy-8.zip -d $JAVA_HOME/jre/lib/security
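Note: if SELinux is enabled on the cluster, you may need to run restorecon -R -v /etc/krb5.conf after copying the file.
2. Steps to enable Kerberos in Ambari
2.1 Enable it through the standard procedure
3. Resolving errors while enabling Kerberos
3.1 Check Kerberos fails with: kinit: Password incorrect while getting initial credentials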

Cause: the KDC database has not been synchronized to the slave nodes.
Solution: synchronize the slave nodes manually:
kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp02
3.2 Check Kerberos fails with: not found in Kerberos database while getting initial credentials

Cause: not yet investigated.
Solution: copy /etc/security from the server hosting the master KDC to the slave KDC server you are working on.
3.3 Possible error: add_principal: Malformed representation of principal while parsing principal

Cause: testing on the command line confirmed that the principal mycluster-112020@admin/admin@OCDP.COM is syntactically invalid.
Solution: Ambari reads principal names from its database, and this malformed name triggers the error, so the related records have to be deleted from the database.
These tables are subject to foreign key checks, so disable the checks before deleting and restore them afterwards.
-- Disable foreign key checks
show global variables like "%foreign_key_checks%";
set global foreign_key_checks=0;
show global variables like "%foreign_key_checks%";
-- Delete the offending principal from the three Kerberos tables
delete from kerberos_keytab where keytab_path="/etc/security/keytabs/kerberos.service_check.112020.keytab";
delete from kerberos_principal where principal_name="mycluster-112020@admin/admin@OCDP.COM";
delete from kerberos_keytab_principal where keytab_path="/etc/security/keytabs/kerberos.service_check.112020.keytab";
-- Restore foreign key checks
set global foreign_key_checks=1;
show global variables like "%foreign_key_checks%";
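The name in 3.3 is rejected because it contains two @ separators, while a principal has at most one (between the name part and the realm). A simplified pre-check along those lines is sketched below. The function name valid_principal is hypothetical, and the check deliberately ignores backslash-escaped @ characters.

```shell
#!/bin/sh
# Succeed if NAME is non-empty, does not start with "@", and contains at
# most one "@". Simplified: escaped "@" characters are not handled.
# Usage: valid_principal NAME
valid_principal() {
    name="$1"
    case "$name" in
        ""|@*) return 1 ;;
    esac
    # Delete every character except "@" and count what is left.
    ats="$(printf '%s' "$name" | tr -cd '@' | wc -c | tr -d ' ')"
    [ "$ats" -le 1 ]
}
```

With this check, admin/admin@OCDP.COM passes, while mycluster-112020@admin/admin@OCDP.COM is rejected.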
IV. Cross-cluster access verification
Cluster 1

Cluster 2

Some of the tests:
su ocdp
kinit -kt nn.service.keytab nn/testdp02@ocdp
hadoop fs -ls hdfs://testdp01:8020/
Create a file on cluster 1 from cluster 2
hadoop fs -touch hdfs://testdp01:8020/lixl/a.txt
Copy data

