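Preface
Apache Atlas is a widely used open-source metadata management tool. It ships with several hooks that make lineage management convenient. However, the official documentation and most online tutorials only cover the Hive hook, not the Impala hook. After a lot of searching on Baidu and Google and reading the source code, I finally got it working. Below are my conclusions and the detailed steps.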
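Conclusions
- The apache-atlas-2.1.0 source tree contains an impala-hook that can collect lineage, but the official documentation does not mention it.
- CDH 6.3.2 ships impala-3.2.0-cdh6.3.2, but the QueryEventHookManager class, which provides the hook methods, only exists from Impala 3.3.0 onward, so Impala has to be upgraded to 3.3 or later.
- Besides the Impala hook, Impala and Hive share the same metadata service (the Hive metastore), so in theory Atlas could also obtain metadata through the Hive metastore hook. In practice, tables created with either Hive or Impala were indeed searchable in Atlas, but they had no lineage. Reading the Hive source shows that the data returned by the metastore hook contains nothing lineage-related, so this approach does not work. (See the Hive metastore hook configuration guide for that setup.)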
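The version requirement in the second conclusion can be checked with a small script before touching anything else. This is only a sketch: the helper name `version_ge` is made up for illustration, it relies on GNU `sort -V`, and the version string is typed in by hand rather than read from a running cluster.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version bundled with CDH 6.3.2, entered manually for this example.
impala_version="3.2.0"

if version_ge "$impala_version" "3.3.0"; then
    echo "Impala ${impala_version}: QueryEventHookManager should be available"
else
    echo "Impala ${impala_version}: upgrade to 3.3 or later before configuring the hook"
fi
```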
Steps
1. Upgrade the Impala component on CDH 6.3.2
2. Build apache-atlas-2.1.0
Follow the Hive integration guide: https://blog.csdn.net/h952520296/article/details/110874432
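In addition, because of a jar conflict with impala-3.4.0, the jackson-databind version in the apache-atlas pom.xml also has to be changed from 2.10.0 to 2.9.10:
<!-- <jackson.databind.version>2.10.0</jackson.databind.version> -->
<jackson.databind.version>2.9.10</jackson.databind.version>
Otherwise the following error is reported: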
Java exception follows:
java.lang.VerifyError: Stack map does not match the one at exception handler 77
Exception Details:
Location:
com/fasterxml/jackson/databind/deser/std/StdDeserializer._parseDate(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/util/Date; @77: astore
Reason:
Type 'com/fasterxml/jackson/core/JsonParseException' (current frame, stack[0]) is not assignable to 'com/fasterxml/jackson/core/exc/StreamReadException' (stack map, stack[0])
Current Frame:
bci: @69
flags: { }
locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
stack: { 'com/fasterxml/jackson/core/JsonParseException' }
Stackmap Frame:
bci: @77
flags: { }
locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
stack: { 'com/fasterxml/jackson/core/exc/StreamReadException' }
Bytecode:
0x0000000: 2bb6 0035 aa00 XXXX-XXXX-XXXX-XXXX 0003
0x0000010: 0000 000b 0000 007a XXXX-XXXX-XXXX-XXXX
0x0000020: XXXX-XXXX-XXXX-XXXX XXXX-XXXX-XXXX-XXXX
0x0000030: XXXX-XXXX-XXXX-XXXX 2a2b b600 11b6 0012
0x0000040: 2cb6 006b b02b b600 4742 a700 223a 052c
0x0000050: 2ab4 0002 2bb6 006e 126f 03bd 0004 b600
0x0000060: 70c0 002d 3a06 1906 b600 4c42 bb00 7159
0x0000070: 21b7 0072 b02a 2cb6 0073 c000 71b0 2a2b
0x0000080: 2cb6 0074 b02c 2ab4 0002 2bb6 0025 c000
0x0000090: 71b0
Exception Handler Table:
bci [69, 74] => handler: 77
bci [69, 74] => handler: 77
Stackmap Table:
same_frame(@56)
same_frame(@69)
same_locals_1_stack_item_frame(@77,Object[#359])
append_frame(@108,Long)
chop_frame(@117,1)
same_frame(@126)
same_frame(@133)
at com.fasterxml.jackson.databind.deser.std.JdkDeserializers.<clinit>(JdkDeserializers.java:26)
at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findDefaultDeserializer(BasicDeserializerFactory.java:1852)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.findStdDeserializer(BeanDeserializerFactory.java:167)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanD
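The pom.xml change for step 2 can also be scripted instead of edited by hand. The sketch below is an illustration only: the function name `set_jackson_databind_version` is hypothetical, and it runs against a throwaway demo file rather than the real apache-atlas pom.xml, whose path you would pass in yourself.

```shell
# Rewrite the <jackson.databind.version> property in a pom file.
# sed -i.bak keeps a backup copy next to the edited file.
set_jackson_databind_version() {
    pom_file="$1"
    new_version="$2"
    sed -i.bak \
        "s|<jackson.databind.version>[^<]*</jackson.databind.version>|<jackson.databind.version>${new_version}</jackson.databind.version>|" \
        "$pom_file"
}

# Demo on a temporary file that mimics the relevant pom.xml fragment.
demo_pom="$(mktemp)"
printf '%s\n' \
    '<properties>' \
    '    <jackson.databind.version>2.10.0</jackson.databind.version>' \
    '</properties>' > "$demo_pom"

set_jackson_databind_version "$demo_pom" "2.9.10"
grep 'jackson.databind.version' "$demo_pom"
```

After running it against the real pom.xml, rebuild Atlas so that the hook package picks up the downgraded jackson-databind.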
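3. Configure the Impala hook in CDH
- Open the Impala configuration page in CM (Cloudera Manager), search for "Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)", and add query_event_hook_classes=org.apache.atlas.impala.hook.ImpalaLineageHook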
- vim ${ATLAS_HOME}/conf/atlas-application.properties. If you have already configured this file, just append the last section (impala configuration).
```properties
### Graph Database Configs
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

# Hbase
atlas.graph.storage.hostname=cdh1:2181,cdh2:2181,cdh3:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# Graph Search Index
atlas.graph.index.search.backend=solr

# Solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=cdh1:2181/solr,cdh2:2181/solr,cdh3:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

### Notification Configs
atlas.notification.embedded=false
atlas.kafka.data=/opt/apache-atlas-2.1.0/data/kafka
atlas.kafka.zookeeper.connect=cdh1:2181,cdh2:2181,cdh3:2181/kafka
atlas.kafka.bootstrap.servers=cdh2:9092,cdh3:9092,cdh1:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES,ATLAS_HIVEPROCESS,ATLAS_HIVEPARTITION
atlas.notification.hook.consumer.topic.names=ATLAS_HOOK,ATLAS_HIVEPROCESS
atlas.notification.entities.consumer.topic.names=ATLAS_ENTITIES
atlas.notification.hivepartition.consumer.topic.names=ATLAS_HIVEPARTITION
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000

# Server port configuration
# The default port 21000 conflicts with Impala
atlas.server.http.port=21001
atlas.server.https.port=21443

### Security Properties
# SSL config
atlas.enableTLS=false

# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

# ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

# user credentials file
atlas.authentication.method.file.filename=/opt/apache-atlas-2.1.0/conf/users-credentials.properties

### Server Properties
atlas.rest.address=http://192.168.80.123:21001

### Entity Audit Configs
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.80.121:2181,192.168.80.122:2181,192.168.80.123:2181

### High Availability Configuration
atlas.server.ha.enabled=false

### Atlas Authorization
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

### CSRF Configs
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

###### Atlas Metric/Stats configs
# Format: atlas.metric.query..
atlas.metric.query.cache.ttlInSecs=900

# Set to false to disable gremlin search.
atlas.search.gremlin.enable=false

### UI Configuration
atlas.ui.default.version=v1
atlas.cluster.name=primary

## HIVE HOOK Configuration
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=100

#### impala configuration
atlas.hook.impala.keepAliveTime=10
atlas.hook.impala.maxThreads=3
atlas.hook.impala.minThreads=3
atlas.hook.impala.numRetries=3
atlas.hook.impala.queueSize=100
atlas.hook.impala.synchronous=false
```
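If the properties file was already set up for the Hive hook, only the impala section needs to be appended. A minimal sketch of doing that idempotently is below. The function name `append_impala_hook_config` is made up for this example, the values are the ones listed in this post, and the demo works on a temporary file rather than the real ${ATLAS_HOME}/conf/atlas-application.properties.

```shell
# Append the impala hook settings unless some atlas.hook.impala.* key already exists.
append_impala_hook_config() {
    properties_file="$1"
    if grep -q '^atlas\.hook\.impala\.' "$properties_file"; then
        echo "impala hook settings already present in ${properties_file}"
        return 0
    fi
    cat >> "$properties_file" <<'EOF'

#### impala configuration
atlas.hook.impala.keepAliveTime=10
atlas.hook.impala.maxThreads=3
atlas.hook.impala.minThreads=3
atlas.hook.impala.numRetries=3
atlas.hook.impala.queueSize=100
atlas.hook.impala.synchronous=false
EOF
}

# Demo: a file that so far only contains Hive hook settings.
demo_properties="$(mktemp)"
echo 'atlas.hook.hive.synchronous=false' > "$demo_properties"

append_impala_hook_config "$demo_properties"
append_impala_hook_config "$demo_properties"   # second call changes nothing
grep -c '^atlas\.hook\.impala\.' "$demo_properties"
```

Checking for existing keys first means the function can be re-run safely, for example from a provisioning script.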
- mkdir ${ATLAS_HOME}/hook/impala
Extract apache-atlas-2.1.0-impala-hook.tar.gz and put the following three items into ${ATLAS_HOME}/hook/impala:
atlas-impala-plugin-impl
atlas-plugin-classloader-2.1.0.jar
impala-bridge-shim-2.1.0.jar
- Add atlas-application.properties to atlas-plugin-classloader-2.1.0.jar
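```shell
# -u updates the existing jar; -j drops the directory part so the file is stored
# at the root of the jar instead of under its absolute path.
zip -u -j ${ATLAS_HOME}/hook/impala/atlas-plugin-classloader-2.1.0.jar ${ATLAS_HOME}/conf/atlas-application.properties
```
Otherwise the following error is reported: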
Java exception follows:
org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:147)
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:100)
at org.apache.atlas.hook.AtlasHook.<clinit>(AtlasHook.java:80)
at org.apache.atlas.impala.hook.ImpalaHook.onImpalaStartup(ImpalaHook.java:47)
at org.apache.atlas.impala.hook.ImpalaLineageHook.onImpalaStartup(ImpalaLineageHook.java:79)
at org.apache.impala.hooks.QueryEventHookManager.<init>(QueryEventHookManager.java:148)
at org.apache.impala.hooks.QueryEventHookManager.createFromConfig(QueryEventHookManager.java:103)
at org.apache.impala.service.Frontend.<init>(Frontend.java:325)
at org.apache.impala.service.Frontend.<init>(Frontend.java:285)
at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:141)
Caused by: org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:259)
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:238)
at org.apache.commons.configuration.AbstractFileConfiguration.<init>(AbstractFileConfiguration.java:197)
at org.apache.commons.configuration.PropertiesConfiguration.<init>(PropertiesConfiguration.java:284)
at org.apache.atlas.ApplicationProperties.<init>(ApplicationProperties.java:83)
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:136)
... 9 more
I0114 17:15:36.668628 20758 jni-util.cc:288] java.lang.ExceptionInInitializerError
at org.apache.atlas.impala.hook.ImpalaHook.onImpalaStartup(ImpalaHook.java:47)
at org.apache.atlas.impala.hook.ImpalaLineageHook.onImpalaStartup(ImpalaLineageHook.java:79)
at org.apache.impala.hooks.QueryEventHookManager.<init>(QueryEventHookManager.java:148)
at org.apache.impala.hooks.QueryEventHookManager.createFromConfig(QueryEventHookManager.java:103)
at org.apache.impala.service.Frontend.<init>(Frontend.java:325)
at org.apache.impala.service.Frontend.<init>(Frontend.java:285)
at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:141)
Caused by: java.lang.NullPointerException
at org.apache.atlas.hook.AtlasHook.<clinit>(AtlasHook.java:85)
... 7 more
I0114 17:15:36.695966 20758 status.cc:126] ExceptionInInitializerError: null
CAUSED BY: NullPointerException: null
@ 0x1c91278 impala::Status::Status()
@ 0x24fe82c impala::JniUtil::GetJniExceptionMsg()
@ 0x2310e8d impala::Frontend::Frontend()
@ 0x2110a1d impala::ExecEnv::ExecEnv()
@ 0x2110586 impala::ExecEnv::ExecEnv()
@ 0x2332bdc ImpaladMain()
@ 0x1c3918f main
@ 0x7f61a9e0c554 __libc_start_main
@ 0x1c39006 (unknown)
F0114 17:15:36.696000 20758 frontend.cc:134] ExceptionInInitializerError: null
CAUSED BY: NullPointerException: null
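The "Cannot locate configuration source null" message suggests the hook could not find atlas-application.properties on its classpath, so it is worth confirming that the file sits at the root of the jar before restarting Impala. The helper below is a hypothetical sketch: `has_root_properties` reads a list of archive entry names from stdin, and the two listings are invented for the demo.

```shell
# Succeed when the entry list on stdin contains atlas-application.properties
# as a top-level entry (exact whole-line match, no directory prefix).
has_root_properties() {
    grep -qxF 'atlas-application.properties'
}

# Demo listings standing in for the contents of a jar.
printf '%s\n' 'META-INF/MANIFEST.MF' 'atlas-application.properties' \
    | has_root_properties && echo "found at jar root"

printf '%s\n' 'META-INF/MANIFEST.MF' 'opt/apache-atlas-2.1.0/conf/atlas-application.properties' \
    | has_root_properties || echo "not at jar root"
```

Against the real jar, the entry list could come from `unzip -Z1 ${ATLAS_HOME}/hook/impala/atlas-plugin-classloader-2.1.0.jar`, piped into the same function.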
- Copy the following three items into the Impala lib directory (/opt/cloudera/parcels/CDH/lib/apache-impala-3.4/lib) and distribute them to every Impala node.
atlas-impala-plugin-impl
atlas-plugin-classloader-2.1.0.jar
impala-bridge-shim-2.1.0.jar
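Distributing the three items to several nodes is easy to get wrong by hand, so a dry-run helper can print the copy commands for review first. This is a sketch under assumptions: the host names cdh1, cdh2 and cdh3 are borrowed from the ZooKeeper settings above and may not match your Impala daemon hosts, `print_copy_commands` is a made-up name, and the script only prints `scp` commands without executing them.

```shell
IMPALA_LIB_DIR="/opt/cloudera/parcels/CDH/lib/apache-impala-3.4/lib"
# Falls back to the install path used in this post when ATLAS_HOME is not set.
HOOK_DIR="${ATLAS_HOME:-/opt/apache-atlas-2.1.0}/hook/impala"

# Print one scp command per host; nothing is copied.
print_copy_commands() {
    for host in "$@"; do
        echo "scp -r ${HOOK_DIR}/atlas-impala-plugin-impl" \
             "${HOOK_DIR}/atlas-plugin-classloader-2.1.0.jar" \
             "${HOOK_DIR}/impala-bridge-shim-2.1.0.jar" \
             "${host}:${IMPALA_LIB_DIR}/"
    done
}

# Assumed host names; replace with the actual Impala daemon hosts.
print_copy_commands cdh1 cdh2 cdh3
```

Once the printed commands look right, they can be run manually or piped to `sh`.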
- Restart Impala in CM
Check the logs on an Impala daemon node, or view them in CM:
tail -f /var/log/impalad/impalad.INFO
Finally, check the lineage in Atlas.
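Rather than watching the whole log scroll by, it can help to filter for the hook-related class names that show up in the stack traces above. The sketch below is illustrative only: `hook_log_lines` is a made-up helper, and the demo log is a few lines copied from the error output in this post rather than a live impalad.INFO.

```shell
# Print only the log lines that mention the Impala hook machinery or an Atlas exception.
hook_log_lines() {
    grep -E 'QueryEventHookManager|ImpalaLineageHook|AtlasException' "$1"
}

# Demo log built from lines quoted earlier in this post.
demo_log="$(mktemp)"
cat > "$demo_log" <<'EOF'
org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.impala.hook.ImpalaLineageHook.onImpalaStartup(ImpalaLineageHook.java:79)
at org.apache.impala.service.Frontend.<init>(Frontend.java:325)
EOF

hook_log_lines "$demo_log"
```

On a real node the same function could be pointed at /var/log/impalad/impalad.INFO after the restart.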