HBase Hook & Bridge

1. HBase Model

The HBase model includes the following types (an illustrative entity sketch follows the list):

  • Entity types:
    • hbase_namespace
      • super-types: Asset
      • attributes: qualifiedName, name, description, owner, clusterName, parameters, createTime, modifiedTime
    • hbase_table
      • super-types: DataSet
      • attributes: qualifiedName, name, description, owner, namespace, column_families, uri, parameters, createtime, modifiedtime, maxfilesize, isReadOnly, isCompactionEnabled, isNormalizationEnabled, ReplicaPerRegion, Durability
    • hbase_column_family
      • super-types: DataSet
      • attributes: qualifiedName, name, description, owner, columns, createTime, bloomFilterType, compressionType, compactionCompressionType, encryptionType, inMemoryCompactionPolicy, keepDeletedCells, maxversions, minVersions, datablockEncoding, storagePolicy, ttl, blockCachedEnabled, cacheBloomsOnWrite, cacheDataOnWrite, evictBlocksOnClose, prefetchBlocksOnOpen, newVersionsBehavior, isMobEnabled, mobCompactPartitionPolicy
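
For illustration only, a minimal hbase_table entity expressed in Atlas's JSON entity notation might carry attributes like the sketch below; the namespace, table, owner and cluster names are hypothetical, and only a few of the attributes listed above are shown:

  {
    "typeName": "hbase_table",
    "attributes": {
      "qualifiedName": "sales:customers@primary",
      "name": "customers",
      "description": "customer profile table",
      "owner": "hbase"
    }
  }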

HBase entities are created and deleted in Atlas using the unique attribute qualifiedName, whose value should be formatted as described below (examples follow the list). Note that namespaceName, tableName and columnFamilyName should be lowercase.

  1. hbase_namespace.qualifiedName: <namespaceName>@<clusterName>
  2. hbase_table.qualifiedName: <namespaceName>:<tableName>@<clusterName>
  3. hbase_column_family.qualifiedName: <namespaceName>:<tableName>.<columnFamilyName>@<clusterName>
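
For example, with the default cluster name primary and a hypothetical namespace sales containing a table customers with column family cf1, the resulting values would be:

  hbase_namespace.qualifiedName:     sales@primary
  hbase_table.qualifiedName:         sales:customers@primary
  hbase_column_family.qualifiedName: sales:customers.cf1@primary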

2. HBase Hook

The Atlas HBase hook registers with the HBase master as a coprocessor. On detecting changes to HBase namespaces/tables/column families, the Atlas hook updates the metadata in Atlas via Kafka notifications. Follow the instructions below to set up the Atlas hook in HBase:

Register the Atlas hook in hbase-site.xml by adding the following:

  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
  </property>
  • Untar apache-atlas-${project.version}-hbase-hook.tar.gz
  • cd apache-atlas-hbase-hook-${project.version}
  • Copy the entire contents of the folder apache-atlas-hbase-hook-${project.version}/hook/hbase to <atlas package>/hook/hbase
  • Link the Atlas hook jars into the HBase classpath: ln -s <atlas package>/hook/hbase/* <hbase-home>/lib/
  • Copy <atlas-conf>/atlas-application.properties to the HBase conf directory (see the consolidated command sketch after this list).
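
A minimal command sketch of the steps above, assuming the Atlas package is installed under /opt/apache-atlas, HBase under /opt/hbase, and <atlas-conf> is /opt/apache-atlas/conf (adjust the paths and ${project.version} to your environment):

  # untar the hook bundle and copy its contents under the Atlas package
  tar xzf apache-atlas-${project.version}-hbase-hook.tar.gz
  cd apache-atlas-hbase-hook-${project.version}
  cp -r hook/hbase/* /opt/apache-atlas/hook/hbase/

  # link the hook jars into the HBase classpath and copy the Atlas configuration
  ln -s /opt/apache-atlas/hook/hbase/* /opt/hbase/lib/
  cp /opt/apache-atlas/conf/atlas-application.properties /opt/hbase/conf/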

The following properties in atlas-application.properties control the thread pool and notification details:

  atlas.hook.hbase.synchronous=false   # whether to run the hook synchronously. false recommended to avoid delays in HBase operations. Default: false
  atlas.hook.hbase.numRetries=3   # number of retries for notification failure. Default: 3
  atlas.hook.hbase.queueSize=10000   # queue size for the threadpool. Default: 10000
  atlas.cluster.name=primary   # clusterName to use in qualifiedName of entities. Default: primary
  atlas.kafka.zookeeper.connect=   # Zookeeper connect URL for Kafka. Example: localhost:2181
  atlas.kafka.zookeeper.connection.timeout.ms=30000   # Zookeeper connection timeout. Default: 30000
  atlas.kafka.zookeeper.session.timeout.ms=60000   # Zookeeper session timeout. Default: 60000
  atlas.kafka.zookeeper.sync.time.ms=20   # Zookeeper sync time. Default: 20

Additional configurations for the Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.". For the list of configurations supported by Kafka producers, see Kafka Producer Configs.
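
For example, common Kafka producer settings such as bootstrap.servers, acks and retries can be passed to the notification producer as shown below; the host and values are illustrative:

  atlas.kafka.bootstrap.servers=kafkahost:9092   # Kafka brokers used for notifications (illustrative host/port)
  atlas.kafka.acks=1   # producer acknowledgement mode
  atlas.kafka.retries=3   # producer-side retries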

3. Notes

The Atlas HBase hook only captures create/update/delete operations on namespaces, tables and column families. Changes to columns are not captured.

4. Importing HBase Metadata

Apache Atlas provides a command-line utility, import-hbase.sh, to import metadata of Apache HBase namespaces and tables into Apache Atlas. The utility can be used to initialize Apache Atlas with the namespaces/tables present in an Apache HBase cluster. It supports importing metadata of a specific table, tables in a specific namespace, or all tables. Example invocations follow the usage summary below.

  Usage 1: <atlas package>/hook-bin/import-hbase.sh
  Usage 2: <atlas package>/hook-bin/import-hbase.sh [-n <namespace regex> OR --namespace <namespace regex>] [-t <table regex> OR --table <table regex>]
  Usage 3: <atlas package>/hook-bin/import-hbase.sh [-f <filename>]
           File Format:
             namespace1:tbl1
             namespace1:tbl2
             namespace2:tbl1
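
For example (the namespace/table names and the file name below are hypothetical):

  # import all tables in namespaces matching 'sales.*'
  <atlas package>/hook-bin/import-hbase.sh -n 'sales.*'

  # import tables matching 'customer.*' in the namespace 'sales'
  <atlas package>/hook-bin/import-hbase.sh -n sales -t 'customer.*'

  # import the namespace:table pairs listed in a file, using the format shown above
  <atlas package>/hook-bin/import-hbase.sh -f tables.txt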