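I. How it works
Each slave node serves one independent cluster. The keytab files for all clusters are generated on the master node and then synchronized to every slave node, so each slave holds the authentication information for all clusters and the clusters can access each other's data.
II. Kerberos deployment with one master and multiple slaves
1. Ports used by Kerberos
Master node:
KDC:88
kadmin:749
Slave nodes:
kpropd:754 (and KDC:88 once krb5kdc is started on the slave in step 11)
2. Environment preparation
2.1 Configure hosts
On each node, run vim /etc/hosts and add the host entries: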
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.236.92 testdp01
10.1.236.93 testdp02
10.1.241.73 testdp03
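As a quick sanity check, a small helper can report which of the expected hostnames are still missing from a hosts file. This is only a sketch; the function name missing_hosts is hypothetical and not part of any Kerberos or system tool.

```shell
#!/bin/sh
# missing_hosts FILE NAME...
# Prints every NAME that has no entry in FILE (hosts format), one per line.
missing_hosts() (
    file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for NAME as a whole field after the address.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
            echo "$name"
        fi
    done
)
```

Usage on any of the three nodes: missing_hosts /etc/hosts testdp01 testdp02 testdp03. An empty result means all three names are present.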
2.2 Stop the firewall
systemctl stop firewalld.service
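Where stopping firewalld is not acceptable, the ports listed in section 1 could be opened instead. The sketch below only prints the firewall-cmd invocations so they can be reviewed first. The function name firewall_cmds and the choice of protocols (TCP and UDP for port 88, TCP for 749 and 754) are assumptions of this example, not something the original setup prescribes.

```shell
#!/bin/sh
# firewall_cmds ROLE   (ROLE is "master" or "slave")
# Prints the firewall-cmd commands instead of running them.
firewall_cmds() {
    case $1 in
        master) ports="88/tcp 88/udp 749/tcp" ;;
        slave)  ports="88/tcp 88/udp 754/tcp" ;;
        *) echo "usage: firewall_cmds master|slave" >&2; return 1 ;;
    esac
    for p in $ports; do
        echo "firewall-cmd --permanent --add-port=$p"
    done
    # Permanent rules only take effect after a reload.
    echo "firewall-cmd --reload"
}
```

After checking the output, it can be executed as root with firewall_cmds slave | sh.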
3. Install krb5
On the master node:
yum -y install krb5-server krb5-auth-dialog krb5-workstation krb5-devel krb5-libs
On the slave nodes:
yum install -y krb5-server openldap-clients krb5-workstation krb5-libs
On the clients:
yum install -y krb5-workstation krb5-devel
4. Modify the configuration
4.1 Modify three files on the master node
Here testdp03 is the master, and testdp01 and testdp02 are the slaves.
vim /etc/krb5.conf
Note that the realm is set to ocdp here.
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
default_realm = ocdp
default_ccache_name = KEYRING:persistent:%{uid}
[realms]
ocdp = {
kdc = testdp01:88
kdc = testdp02:88
kdc = testdp03:88
admin_server = testdp03:749
default_domain = ocdp
}
[domain_realm]
.ocdp = ocdp
[kdc]
profile = /var/kerberos/krb5kdc/kdc.conf
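To double-check which KDCs a krb5.conf actually lists for a realm, the realm block can be extracted with awk. This is a minimal sketch that assumes the simple layout shown above (one "realm = {" line, one "kdc = host" entry per line, a closing "}" on its own line). The function name list_kdcs is hypothetical.

```shell
#!/bin/sh
# list_kdcs FILE REALM
# Prints the value of every "kdc = ..." entry inside the REALM block of FILE.
list_kdcs() {
    awk -v realm="$2" '
        $1 == realm && $2 == "=" && $3 == "{" { inside = 1; next }
        inside && $1 == "}"                   { inside = 0 }
        inside && $1 == "kdc" && $2 == "="    { print $3 }
    ' "$1"
}
```

For the configuration above, list_kdcs /etc/krb5.conf ocdp should list the three KDC entries; the "kdc =" line in the [logging] section is ignored because it lies outside the realm block.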
vim /var/kerberos/krb5kdc/kadm5.acl
*/admin@ocdp *
4.2 Create the following file on the slave nodes
vim /var/kerberos/krb5kdc/kpropd.acl
host/testdp01@ocdp
host/testdp02@ocdp
host/testdp03@ocdp
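With more KDC hosts, typing the ACL by hand becomes error-prone. The entries above could be generated from a host list, as in this sketch; the function name kpropd_acl is hypothetical.

```shell
#!/bin/sh
# kpropd_acl REALM HOST...
# Prints one host/HOST@REALM line per HOST, in the format used by kpropd.acl.
kpropd_acl() (
    realm=$1
    shift
    for h in "$@"; do
        printf 'host/%s@%s\n' "$h" "$realm"
    done
)
```

Usage: kpropd_acl ocdp testdp01 testdp02 testdp03 > /var/kerberos/krb5kdc/kpropd.acl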
5. Initialize the database and generate krb5.keytab on the master node
5.1 Initialize the database
kdb5_util create -r ocdp -s
5.2 Generate krb5.keytab
kadmin.local -q "ank -randkey host/testdp01@ocdp"
kadmin.local -q "ank -randkey host/testdp02@ocdp"
kadmin.local -q "ank -randkey host/testdp03@ocdp"
kadmin.local -q "xst host/testdp01@ocdp"
kadmin.local -q "xst host/testdp02@ocdp"
kadmin.local -q "xst host/testdp03@ocdp"
klist -ket /etc/krb5.keytab
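Rather than reading the klist output by eye, the listing can be checked for the three host principals. The sketch below reads a klist listing on standard input and prints the host principals that are missing. It assumes each keytab entry is printed on one line with the principal as a separate whitespace-delimited field. The function name keytab_missing is hypothetical.

```shell
#!/bin/sh
# keytab_missing REALM HOST...
# Reads a keytab listing on stdin and prints every host/HOST@REALM
# principal that does not appear in it.
keytab_missing() (
    realm=$1
    shift
    listing=$(cat)
    for h in "$@"; do
        p="host/$h@$realm"
        # A principal counts as present when it appears as a whole field.
        printf '%s\n' "$listing" | awk -v p="$p" '
            { for (i = 1; i <= NF; i++) if ($i == p) found = 1 }
            END { exit !found }' || echo "$p"
    done
)
```

Usage: klist -ket /etc/krb5.keytab | keytab_missing ocdp testdp01 testdp02 testdp03. No output means all three host principals are in the keytab.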
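6. Copy the configuration files and keytab from the master to the slave nodes
cd /var/kerberos/krb5kdc
# By default the stash file is named .k5.<realm>; for the realm ocdp created in step 5.1 that is .k5.ocdp
scp .k5.ocdp testdp01:$PWD
scp kadm5.acl testdp01:$PWD
scp kdc.conf testdp01:$PWD
scp .k5.ocdp testdp02:$PWD
scp kadm5.acl testdp02:$PWD
scp kdc.conf testdp02:$PWD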
cd /etc
scp krb5.keytab testdp01:$PWD
scp krb5.conf testdp01:$PWD
scp krb5.keytab testdp01:/var/kerberos/krb5kdc/
scp krb5.keytab testdp02:$PWD
scp krb5.conf testdp02:$PWD
scp krb5.keytab testdp02:/var/kerberos/krb5kdc/
7. Start the services on the master node
systemctl enable krb5kdc.service
systemctl enable kadmin.service
systemctl start krb5kdc.service
systemctl start kadmin.service
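8. Add an administrator account on the master
kadmin.local -q "addprinc admin/admin@ocdp"
8.1 Verify that the administrator account works
kinit admin/admin@ocdp
kadmin
If the login succeeds, the account is working.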
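9. Start kpropd on the slave nodes
Start kpropd either by hand or through systemd, not both: an instance started by hand keeps port 754, and the systemd unit then fails to start.
kpropd -S
or
systemctl start kprop
systemctl status kprop
systemctl enable kprop
Tip:
If the service fails to start, check whether port 754 (kpropd) is already in use: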
netstat -ntlp|grep 754
lsof -i:754
kill PID
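The same port check can be scripted. The sketch below parses the output of ss -ntl (from iproute2, an alternative to the netstat command above) and assumes its usual column layout, with the state in the first column and the local address:port in the fourth. The function name port_listening is hypothetical.

```shell
#!/bin/sh
# port_listening PORT
# Reads "ss -ntl" output on stdin; succeeds if some socket listens on PORT.
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # The port is the part after the last ":" (handles [::]:754 too).
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}
```

Usage: ss -ntl | port_listening 754 && echo "port 754 is already in use"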
10. Synchronize the data
Run on the master:
kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp02
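Tip:
On success, kprop prints a line such as:
Database propagation to hostname2: SUCCEEDED
This step fails fairly often. If it does, check that the keytab file on the slave is valid, that the firewall is not blocking the connection, and that kpropd.acl on the slave is correct.
Error message: kprop: Decrypt integrity check failed while getting initial credentials
Solution: the keytab files on the master and the slave do not match; copy the keytab from the master to the slave again by hand.
After a successful run, files whose names start with principal appear in /var/kerberos/krb5kdc on the slave node.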
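Because propagation can fail for one slave and succeed for another, it helps to try every slave and report the failures at the end. The following is a sketch under the assumption that kprop exits with a non-zero status when propagation fails. The function name propagate_all is hypothetical.

```shell
#!/bin/sh
# propagate_all DUMPFILE HOST...
# Runs kprop for every HOST and keeps going after a failure.
# Prints the failed hosts on stderr and returns non-zero if any failed.
propagate_all() (
    dump=$1
    shift
    failed=""
    for h in "$@"; do
        if ! kprop -f "$dump" "$h"; then
            failed="$failed $h"
        fi
    done
    if [ -n "$failed" ]; then
        echo "propagation failed for:$failed" >&2
        exit 1
    fi
)
```

Usage on the master, after the kdb5_util dump: propagate_all /var/kerberos/krb5kdc/slave_datatrans testdp01 testdp02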
11. Start the KDC on the slave nodes
systemctl enable krb5kdc.service
systemctl start krb5kdc.service
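12. Test
12.1 Check the KDC status on the master and on each slave
systemctl status krb5kdc.service
12.2 Watch the log on each node
tail -f /var/log/krb5kdc.log
12.3 Run kinit on the master and watch the logs: the request is logged on testdp03.
12.4 Stop or kill the KDC on the master and run kinit again: the request is now logged on testdp01 or testdp02.
13. Database synchronization script
13.1 Write the synchronization script on the master node
vim /root/sync_db.sh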
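#!/bin/sh
# Slave KDCs to propagate to, separated by spaces so that the for loop splits them
kdclist="testdp01 testdp02"
echo "$(date) start to sync!"
# Dump the master database once, then push the dump to every slave
sudo kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
for kdc in $kdclist
do
    sudo kprop -f /var/kerberos/krb5kdc/slave_datatrans "$kdc"
done
echo "$(date) end to sync!"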
13.2 Make the script executable
chmod +x /root/sync_db.sh
13.3 Schedule it with cron
crontab -e
*/1 * * * * /root/sync_db.sh >> /root/sync.log 2>&1
14. Test the synchronization
14.1 Add the user usertest1 on the master KDC
kadmin.local
kadmin.local: addprinc usertest1
Principal "usertest1@ocdp" created.
14.2 Check on a slave KDC (after the next cron run, at most one minute later)
kadmin.local
kadmin.local: list_principals
K/M@ocdp
admin/admin@ocdp
usertest1@ocdp
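To compare master and slave without reading both lists by eye, the two principal lists can be saved to files and compared. This is a sketch only: it assumes each file holds one principal per line, and the function name principal_diff is hypothetical.

```shell
#!/bin/sh
# principal_diff MASTER_LIST SLAVE_LIST
# Prints the principals found in MASTER_LIST but not in SLAVE_LIST.
principal_diff() (
    m=$(mktemp) && s=$(mktemp) || exit 1
    trap 'rm -f "$m" "$s"' EXIT
    # comm needs sorted input; sort -u also drops duplicate lines.
    sort -u "$1" > "$m"
    sort -u "$2" > "$s"
    comm -23 "$m" "$s"
)
```

Usage: save the list_principals output from each node to a file, then run principal_diff master.txt slave.txt. An empty result means the slave has every principal the master has.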
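III. Enabling Kerberos in Ambari
1. Environment configuration
Download the JCE policy files and extract them into $JAVA_HOME/jre/lib/security. This is required on every Ambari node.
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
unzip -o -j -q jce_policy-8.zip -d $JAVA_HOME/jre/lib/security
Tip: if SELinux is enabled on the cluster, you may need to run restorecon -R -v /etc/krb5.conf after copying the file.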
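Since the policy files are needed on every Ambari node, a small check can confirm that the unzip step was done. The sketch assumes the archive contains local_policy.jar and US_export_policy.jar, the two policy jars shipped in jce_policy-8.zip. The function name jce_installed is hypothetical.

```shell
#!/bin/sh
# jce_installed DIR
# Succeeds when both JCE policy jars are present in DIR.
jce_installed() {
    [ -f "$1/local_policy.jar" ] && [ -f "$1/US_export_policy.jar" ]
}
```

Usage: jce_installed "$JAVA_HOME/jre/lib/security" || echo "JCE policy files are missing on $(hostname)"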
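2. Steps to enable Kerberos in Ambari
2.1 Enable it following the usual procedure.
3. Troubleshooting errors while enabling Kerberos
3.1 Check Kerberos fails with: kinit: Password incorrect while getting initial credentials
Cause: the KDC database has not been synchronized to the slave nodes.
Solution: synchronize the slave nodes manually: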
kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp01
kprop -f /var/kerberos/krb5kdc/slave_datatrans testdp02
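3.2 Check Kerberos fails with: not found in Kerberos database while getting initial credentials
Cause: not yet investigated.
Solution: copy /etc/security from the master KDC server to the current slave KDC server.
3.3 Possible error: add_principal: Malformed representation of principal while parsing principal
Cause: testing from the command line confirmed that the principal mycluster-112020@admin/admin@OCDP.COM is syntactically invalid.
Solution: Ambari reads principal names from its database, and this malformed name triggers the error, so the related records must be deleted from the database.
These tables are subject to foreign key checks, so disable the checks before deleting and restore them afterwards.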
-- Disable foreign key checks
-- (SET GLOBAL only affects sessions opened afterwards, so reconnect before deleting,
-- or also run "set foreign_key_checks=0;" in the current session)
show global variables like "%foreign_key_checks%";
set global foreign_key_checks=0;
show global variables like "%foreign_key_checks%";
-- Delete the malformed principal from the three Kerberos tables
delete from kerberos_keytab where keytab_path="/etc/security/keytabs/kerberos.service_check.112020.keytab";
delete from kerberos_principal where principal_name="mycluster-112020@admin/admin@OCDP.COM";
delete from kerberos_keytab_principal where keytab_path="/etc/security/keytabs/kerberos.service_check.112020.keytab";
-- Restore foreign key checks
set global foreign_key_checks=1;
show global variables like "%foreign_key_checks%";
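The malformed name in 3.3 contains two "@" characters, while a principal has at most one unescaped "@" separating the name from the realm. A rough pre-check for this case might look like the sketch below. It is deliberately simplistic: it does not handle backslash-escaped "@" characters and validates nothing else. The function name principal_ok is hypothetical.

```shell
#!/bin/sh
# principal_ok NAME
# Fails when NAME is empty or contains more than one "@".
principal_ok() {
    case $1 in
        "")    return 1 ;;   # empty name
        *@*@*) return 1 ;;   # more than one "@"
        *)     return 0 ;;
    esac
}
```

Usage: principal_ok 'mycluster-112020@admin/admin@OCDP.COM' || echo "malformed principal"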
IV. Verifying cross-cluster access
Cluster 1
Cluster 2
Some of the tests:
su ocdp
kinit -kt nn.service.keytab nn/testdp02@ocdp
hadoop fs -ls hdfs://testdp01:8020/
Create a file on cluster 1 from cluster 2:
hadoop fs -touch hdfs://testdp01:8020/lixl/a.txt
Copy data