This article documents the process of integrating Sentry with a secured Hive cluster. Kerberos authentication is already configured on Hive; for that setup, please refer to:

The roles planned for the cluster nodes are:

  1. 192.168.56.121 cdh1: NameNode, ResourceManager, HBase, Hive metastore, Impala Catalog, Impala statestore, Sentry
  2. 192.168.56.122 cdh2: DataNode, NodeManager, HBase, HiveServer2, Impala Server
  3. 192.168.56.123 cdh3: DataNode, HBase, NodeManager, HiveServer2, Impala Server

cdh1 serves as the master node and the other nodes serve as slave nodes. The Kerberos server is installed on cdh1 and the Kerberos client on the other nodes.

2. Installing and Configuring Sentry

For this part, refer to Installing and Configuring Sentry. Because Kerberos is enabled on the cluster, you need to generate the principal for the Sentry service on the KDC node (cdh1) and export it to a keytab:

```
$ cd /etc/sentry/conf
$ kadmin.local -q "addprinc -randkey sentry/cdh1@JAVACHEN.COM"
$ kadmin.local -q "xst -k sentry.keytab sentry/cdh1@JAVACHEN.COM"
$ chown sentry:hadoop sentry.keytab ; chmod 400 *.keytab
$ cp sentry.keytab /etc/sentry/conf
```
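
Optionally, you can verify that the exported keytab contains the expected Sentry principal before moving on (a quick check, not part of the original steps):

```
# list the principals and key versions stored in the Sentry keytab
$ klist -kt /etc/sentry/conf/sentry.keytab
```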

Then, modify the following parameters in /etc/sentry/conf/sentry-site.xml:

```
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/cdh1@JAVACHEN.COM</value>
</property>
<property>
  <name>sentry.service.server.keytab</name>
  <value>/etc/sentry/conf/sentry.keytab</value>
</property>
```

Obtain a ticket with the Sentry keytab, then start the sentry-store service:

```
$ kinit -k -t /etc/sentry/conf/sentry.keytab sentry/cdh1@JAVACHEN.COM
$ /etc/init.d/sentry-store start
```

3. Configuring Hive

Integrating the Hive metastore with Sentry

Add the following to /etc/hive/conf/hive-site.xml:

```
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
</property>
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
</property>
```

Integrating HiveServer2 with Sentry

With Kerberos authentication configured on Hive, integrating HiveServer2 with Sentry has the following requirements:

- Modify the permissions of /user/hive/warehouse:

```
$ kinit -k -t /etc/hadoop/conf/hdfs.keytab hdfs/cdh1@JAVACHEN.COM
$ hdfs dfs -chmod -R 770 /user/hive/warehouse
$ hdfs dfs -chown -R hive:hive /user/hive/warehouse
```

- Disable HiveServer2 impersonation:

```
<property>
  <name>hive.server2.enable.impersonation</name>
  <value>false</value>
</property>
```
- Confirm that `min.user.id=0` is set in /etc/hadoop/conf/container-executor.cfg.
- Modify /etc/hive/conf/hive-site.xml:
```
<property>
  <name>hive.server2.enable.impersonation</name>
  <value>false</value>
</property>
<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///etc/hive/conf/sentry-site.xml</value>
</property>
```
In addition, because the cluster is configured with Kerberos, the following needs to be added to /etc/hive/conf/sentry-site.xml:

```
<?xml version="1.0" encoding="UTF-8"?>

<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/_HOST@JAVACHEN.COM</value>
</property>
<property>
  <name>sentry.service.server.keytab</name>
  <value>/etc/sentry/conf/sentry.keytab</value>
</property>
```

Referring to the template [sentry-site.xml.hive-client.template](https://github.com/cloudera/sentry/blob/cdh5-1.4.0_5.4.0/conf%2Fsentry-site.xml.hive-client.template), create sentry-site.xml in the /etc/hive/conf/ directory:

```
<?xml version="1.0" encoding="UTF-8"?>

<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>cdh1</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.provider</name>
  <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
  <name>sentry.metastore.service.users</name>
  <value>hive</value>
</property>
<property>
  <name>sentry.hive.server</name>
  <value>server1</value>
</property>
<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>
```
> Note: `sentry.hive.provider.backend` is configured here as `org.apache.sentry.provider.db.SimpleDBProviderBackend`. The configuration for `org.apache.sentry.provider.file.SimpleFileProviderBackend` is described later.
Add Hive's dependency on Sentry by creating a symlink:

```
$ ln -s /usr/lib/sentry/lib/sentry-binding-hive.jar /usr/lib/hive/lib/sentry-binding-hive.jar
```

Restarting HiveServer2

Start or restart hiveserver2 on cdh1:

```
$ kinit -k -t /etc/hive/conf/hive.keytab hive/cdh1@JAVACHEN.COM
$ /etc/init.d/hive-server2 restart
```
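
To confirm that HiveServer2 came back up, you can check that its Thrift port is listening. Port 10001 here is an assumption taken from the beeline JDBC URLs used later in this article; adjust it if your hive.server2.thrift.port differs.

```
# verify that HiveServer2 is listening on its Thrift port (10001 in this setup)
$ netstat -tlnp | grep 10001
```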

4. Preparing Test Data

Referring to [Securing Impala for analysts](http://blog.evernote.com/tech/2014/06/09/securing-impala-for-analysts/), prepare the test data:

```
$ cat /tmp/events.csv
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag
```

Then, run the following SQL statements in Hive:

```
create database sensitive;

create table sensitive.events (
  ip STRING, country STRING, client STRING, action STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

load data local inpath '/tmp/events.csv' overwrite into table sensitive.events;

create database filtered;

create view filtered.events as select country, client, action from sensitive.events;

create view filtered.events_usonly as select * from filtered.events where country = 'US';
```

On cdh1, connect to hiveserver2 through beeline and run the following commands to create roles and groups:

```
$ beeline -u "jdbc:hive2://cdh1:10001/default;principal=hive/cdh1@JAVACHEN.COM"
```

Create the roles, groups, and so on by executing the following SQL statements:

```
create role admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE admin_role TO GROUP hive;

create role test_role;
GRANT ALL ON DATABASE filtered TO ROLE test_role;
GRANT ROLE test_role TO GROUP test;
```

The statements above create two roles:

- admin_role has administrator privileges: it can read and write all databases, and it is granted to the admin and hive groups (which map to operating-system groups).
- test_role can only read and write the filtered database, and it is granted to the test group.
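
As a sanity check, the new roles and grants can be listed from the same admin beeline session. The statements below are standard Sentry HiveQL commands and require a user with Sentry admin privileges; they are shown as a verification sketch rather than part of the original walkthrough.

```
# list all roles, the roles granted to the test group, and the privileges of test_role
$ beeline -u "jdbc:hive2://cdh1:10001/default;principal=hive/cdh1@JAVACHEN.COM" \
    -e "SHOW ROLES; SHOW ROLE GRANT GROUP test; SHOW GRANT ROLE test_role;"
```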
5. Testing

Testing with Kerberos

Taking the test user as an example, connect to hive-server2 through beeline:

```
$ su test
$ kinit -k -t test.keytab test/cdh1@JAVACHEN.COM
$ beeline -u "jdbc:hive2://cdh1:10001/default;principal=test/cdh1@JAVACHEN.COM"
```

Next, run some SQL queries to check whether permissions are enforced as expected.
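
For example, from the test user's session, a query against the filtered database should succeed while the same query against sensitive should be rejected with an authorization error. This is only a sketch of the kind of checks one might run; the queries are not from the original article.

```
# allowed: test_role has ALL on the filtered database
$ beeline -u "jdbc:hive2://cdh1:10001/default;principal=test/cdh1@JAVACHEN.COM" \
    -e "select * from filtered.events limit 5;"

# expected to fail: test_role has no grant on the sensitive database
$ beeline -u "jdbc:hive2://cdh1:10001/default;principal=test/cdh1@JAVACHEN.COM" \
    -e "select * from sensitive.events limit 5;"
```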
Testing with an LDAP user

On the LDAP server, create a system user yy_test, import it into LDAP with the migrationtools utilities, and then set the user's LDAP password.

Create the yy_test user:

```
useradd yy_test

grep -E "yy_test" /etc/passwd > /opt/passwd.txt
/usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
ldapadd -x -D "uid=ldapadmin,ou=people,dc=lashou,dc=com" -w secret -f /opt/passwd.ldif
```

Change the user's password with the following command; you will be prompted to enter the new password twice:

```
ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=lashou,dc=com' -w secret "uid=yy_test,ou=people,dc=lashou,dc=com" -S
```
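
Before testing beeline, you can optionally confirm that the yy_test entry is searchable in LDAP. The search below reuses the ldapadmin bind DN and the dc=lashou,dc=com suffix from the commands above; it is a verification sketch, not part of the original steps.

```
# look up the imported yy_test entry under the same suffix
ldapsearch -x -D "uid=ldapadmin,ou=people,dc=lashou,dc=com" -w secret \
  -b "ou=people,dc=lashou,dc=com" "uid=yy_test"
```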

On every DataNode machine, create the test group and add the yy_test user to it:

```
groupadd test ; useradd yy_test; usermod -G test,yy_test yy_test
```

Run beeline to check whether the LDAP user can connect to hiveserver2:

```
$ beeline -u "jdbc:hive2://cdh1:10001/" -n yy_test -p yy_test -d org.apache.hive.jdbc.HiveDriver
```

6. Other Notes

To configure the Sentry store with file-based storage instead, modify /etc/hive/conf/sentry-site.xml as follows:

```
<?xml version="1.0" encoding="UTF-8"?>

<property>
  <name>hive.sentry.server</name>
  <value>server1</value>
</property>
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.file.SimpleFileProviderBackend</value>
</property>
<property>
  <name>hive.sentry.provider</name>
  <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
</property>
<property>
  <name>hive.sentry.provider.resource</name>
  <value>/user/hive/sentry/sentry-provider.ini</value>
</property>
```
Create the sentry-provider.ini file and upload it to the /user/hive/sentry/ directory in HDFS:

```
$ cat /tmp/sentry-provider.ini
[databases]
# Defines the location of the per DB policy file for the customers DB/schema
db1 = hdfs://cdh1:8020/user/hive/sentry/db1.ini

[groups]
admin = any_operation
hive = any_operation
test = select_filtered

[roles]
any_operation = server=server1->db=*->table=*->action=*
select_filtered = server=server1->db=filtered->table=*->action=SELECT
select_us = server=server1->db=filtered->table=events_usonly->action=SELECT

[users]
test = test
hive = hive

$ hdfs dfs -rm -r /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -put /tmp/sentry-provider.ini /user/hive/sentry/
$ hdfs dfs -chown hive:hive /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -chmod 640 /user/hive/sentry/sentry-provider.ini
```

For the syntax of the sentry-provider.ini file, refer to the official documentation. Here the hive group is given full privileges and the hive user is placed in the hive group, while the other two groups have only partial privileges.

7. References