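Preface

Apache Atlas is a widely used open-source metadata management tool. It ships with several hooks that make lineage tracking straightforward. However, the official documentation and most online tutorials only cover the Hive hook and say nothing about configuring the Impala hook. After a lot of searching on Baidu and Google and reading the source code, I finally got it working. Below are my conclusions and the concrete steps.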

Conclusions

  1. The apache-atlas-2.1.0 source code includes an impala-hook that can collect lineage, but the official documentation does not mention it.


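  2. CDH 6.3.2 ships impala-3.2.0-cdh6.3.2, but the QueryEventHookManager class, which provides the hook mechanism, only exists from Impala 3.3.0 onward, so Impala must be upgraded to 3.3 or later.

  3. Besides the Impala hook, there is another candidate: Impala and Hive share the same metadata service (the Hive metastore), so in theory Atlas could also obtain the metadata through the Hive metastore hook. In my tests, tables created with either Hive or Impala were indeed searchable in Atlas, but no lineage was recorded. Looking at the Hive source code, the data passed back by the metastore hook contains nothing lineage-related, so this approach does not work. (See the Hive metastore hook configuration guide.)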

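    Steps

    1. Upgrade the Impala component on CDH 6.3.2

    Upgrade to impala-3.4.0.

    2. Build apache-atlas-2.1.0

    Follow the Hive integration guide: https://blog.csdn.net/h952520296/article/details/110874432
    In addition, because of a jar conflict with impala-3.4.0, the apache-atlas pom.xml must also be modified:

    <!-- <jackson.databind.version>2.10.0</jackson.databind.version> -->
    <jackson.databind.version>2.9.10</jackson.databind.version>

    Otherwise the following error is reported (the loaded JsonParseException is not assignable to StreamReadException, which points to mixed Jackson versions on the classpath):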

    Java exception follows:
    java.lang.VerifyError: Stack map does not match the one at exception handler 77
    Exception Details:
    Location:
    com/fasterxml/jackson/databind/deser/std/StdDeserializer._parseDate(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/util/Date; @77: astore
    Reason:
     Type 'com/fasterxml/jackson/core/JsonParseException' (current frame, stack[0]) is not assignable to 'com/fasterxml/jackson/core/exc/StreamReadException' (stack map, stack[0])
    Current Frame:
     bci: @69
     flags: { }
     locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
     stack: { 'com/fasterxml/jackson/core/JsonParseException' }
    Stackmap Frame:
     bci: @77
     flags: { }
     locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
     stack: { 'com/fasterxml/jackson/core/exc/StreamReadException' }
    Bytecode:
     0x0000000: 2bb6 0035 aa00 XXXX-XXXX-XXXX-XXXX 0003
     0x0000010: 0000 000b 0000 007a XXXX-XXXX-XXXX-XXXX
     0x0000020: XXXX-XXXX-XXXX-XXXX XXXX-XXXX-XXXX-XXXX
     0x0000030: XXXX-XXXX-XXXX-XXXX 2a2b b600 11b6 0012
     0x0000040: 2cb6 006b b02b b600 4742 a700 223a 052c
     0x0000050: 2ab4 0002 2bb6 006e 126f 03bd 0004 b600
     0x0000060: 70c0 002d 3a06 1906 b600 4c42 bb00 7159
     0x0000070: 21b7 0072 b02a 2cb6 0073 c000 71b0 2a2b
     0x0000080: 2cb6 0074 b02c 2ab4 0002 2bb6 0025 c000
     0x0000090: 71b0                                   
    Exception Handler Table:
     bci [69, 74] => handler: 77
     bci [69, 74] => handler: 77
    Stackmap Table:
     same_frame(@56)
     same_frame(@69)
     same_locals_1_stack_item_frame(@77,Object[#359])
     append_frame(@108,Long)
     chop_frame(@117,1)
     same_frame(@126)
     same_frame(@133)
    
     at com.fasterxml.jackson.databind.deser.std.JdkDeserializers.<clinit>(JdkDeserializers.java:26)
     at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findDefaultDeserializer(BasicDeserializerFactory.java:1852)
     at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.findStdDeserializer(BeanDeserializerFactory.java:167)
     at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanD
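
To avoid the VerifyError above, the jackson.databind.version property has to be pinned before building Atlas. The edit can also be scripted. This is a minimal sketch, assuming the opening and closing tags of the property sit on one line; the function name `set_jackson_databind_version` is hypothetical and not part of Atlas.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Atlas): rewrite the value of
# <jackson.databind.version> in the given pom.xml.
# Assumes the opening and closing tags are on the same line.
set_jackson_databind_version() {
  local pom="$1" version="$2" tmp
  tmp="$(mktemp)" || return 1
  if sed -E "s|(<jackson\.databind\.version>)[^<]*(</jackson\.databind\.version>)|\1${version}\2|" "$pom" > "$tmp"; then
    cat "$tmp" > "$pom"   # overwrite in place, keeping the file's permissions
  fi
  rm -f "$tmp"
}

# Usage (run from the Atlas source root before building):
#   set_jackson_databind_version pom.xml 2.9.10
```

A copy of the same property that is commented out in the file is rewritten too, which is harmless because it stays inside the comment.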
    

    3. Configure the Impala hook on CDH

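  1. Open the Impala configuration page in CM, search for "Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)", and add --query_event_hook_classes=org.apache.atlas.impala.hook.ImpalaLineageHook (impalad startup flags are written with leading dashes).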
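
Once Impala has been restarted, it can be worth confirming that the flag actually reached the daemon. This is a minimal sketch, assuming the flags CM passes to impalad end up in a flag file with one flag per line. The helper name `has_atlas_hook_flag` and the example path are assumptions rather than something taken from the steps in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given impalad flag file registers
# the Atlas lineage hook via query_event_hook_classes.
has_atlas_hook_flag() {
  grep -Eq '^-{0,2}query_event_hook_classes=.*org\.apache\.atlas\.impala\.hook\.ImpalaLineageHook' "$1"
}

# Usage (the path is an assumption about a CM-managed node):
#   for f in /var/run/cloudera-scm-agent/process/*-impala-IMPALAD/impala-conf/impalad_flags; do
#     has_atlas_hook_flag "$f" && echo "hook flag present in $f"
#   done
```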


  2. Edit the configuration with vim ${ATLAS_HOME}/conf/atlas-application.properties. If it was configured previously, only the final "impala configuration" section needs to be added:

```properties
### Graph Database Configs
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

# Hbase
atlas.graph.storage.hostname=cdh1:2181,cdh2:2181,cdh3:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# Graph Search Index
atlas.graph.index.search.backend=solr

# Solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=cdh1:2181/solr,cdh2:2181/solr,cdh3:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

### Notification Configs
atlas.notification.embedded=false
atlas.kafka.data=/opt/apache-atlas-2.1.0/data/kafka
atlas.kafka.zookeeper.connect=cdh1:2181,cdh2:2181,cdh3:2181/kafka
atlas.kafka.bootstrap.servers=cdh2:9092,cdh3:9092,cdh1:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES,ATLAS_HIVEPROCESS,ATLAS_HIVEPARTITION
atlas.notification.hook.consumer.topic.names=ATLAS_HOOK,ATLAS_HIVEPROCESS
atlas.notification.entities.consumer.topic.names=ATLAS_ENTITIES
atlas.notification.hivepartition.consumer.topic.names=ATLAS_HIVEPARTITION
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000

# Server port configuration
# The default port 21000 conflicts with Impala
atlas.server.http.port=21001
atlas.server.https.port=21443

### Security Properties
# SSL config
atlas.enableTLS=false

# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

# ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

# user credentials file
atlas.authentication.method.file.filename=/opt/apache-atlas-2.1.0/conf/users-credentials.properties

### Server Properties
atlas.rest.address=http://192.168.80.123:21001

### Entity Audit Configs
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.80.121:2181,192.168.80.122:2181,192.168.80.123:2181

### High Availability Configuration
atlas.server.ha.enabled=false

### Atlas Authorization
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

### CSRF Configs
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

###### Atlas Metric/Stats configs
# Format: atlas.metric.query..
atlas.metric.query.cache.ttlInSecs=900

# Set to false to disable gremlin search.
atlas.search.gremlin.enable=false

### UI Configuration
atlas.ui.default.version=v1
atlas.cluster.name=primary

## HIVE HOOK Configuration
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=100

#### impala configuration
atlas.hook.impala.keepAliveTime=10
atlas.hook.impala.maxThreads=3
atlas.hook.impala.minThreads=3
atlas.hook.impala.numRetries=3
atlas.hook.impala.queueSize=100
atlas.hook.impala.synchronous=false
```
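
Since only the last section is new for an existing installation, a quick check that every Impala hook key made it into the file can save a restart cycle. This is a minimal sketch; the function name `missing_impala_hook_keys` is hypothetical, and the key list is simply the six atlas.hook.impala.* properties from the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each atlas.hook.impala.* key from the
# configuration above that has no "key=" line in the given properties file.
# Prints nothing when all six keys are present.
missing_impala_hook_keys() {
  local file="$1" key
  for key in keepAliveTime maxThreads minThreads numRetries queueSize synchronous; do
    grep -q "^atlas\.hook\.impala\.${key}=" "$file" || echo "atlas.hook.impala.${key}"
  done
}

# Usage:
#   missing_impala_hook_keys "${ATLAS_HOME}/conf/atlas-application.properties"
```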


  3. mkdir ${ATLAS_HOME}/hook/impala

Extract apache-atlas-2.1.0-impala-hook.tar.gz and put the following three items into ${ATLAS_HOME}/hook/impala:

atlas-impala-plugin-impl
atlas-plugin-classloader-2.1.0.jar
impala-bridge-shim-2.1.0.jar

  4. Add atlas-application.properties to atlas-plugin-classloader-2.1.0.jar:

```shell
# -u updates the archive; -j stores the file at the jar root.
# Without -j, zip keeps the file's directory path inside the archive.
zip -uj ${ATLAS_HOME}/hook/impala/atlas-plugin-classloader-2.1.0.jar ${ATLAS_HOME}/conf/atlas-application.properties
```

Otherwise Impala fails at startup with the following error:

```
Java exception follows:
org.apache.atlas.AtlasException: Failed to load application properties
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:147)
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:100)
    at org.apache.atlas.hook.AtlasHook.<clinit>(AtlasHook.java:80)
    at org.apache.atlas.impala.hook.ImpalaHook.onImpalaStartup(ImpalaHook.java:47)
    at org.apache.atlas.impala.hook.ImpalaLineageHook.onImpalaStartup(ImpalaLineageHook.java:79)
    at org.apache.impala.hooks.QueryEventHookManager.<init>(QueryEventHookManager.java:148)
    at org.apache.impala.hooks.QueryEventHookManager.createFromConfig(QueryEventHookManager.java:103)
    at org.apache.impala.service.Frontend.<init>(Frontend.java:325)
    at org.apache.impala.service.Frontend.<init>(Frontend.java:285)
    at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:141)
Caused by: org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
    at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:259)
    at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:238)
    at org.apache.commons.configuration.AbstractFileConfiguration.<init>(AbstractFileConfiguration.java:197)
    at org.apache.commons.configuration.PropertiesConfiguration.<init>(PropertiesConfiguration.java:284)
    at org.apache.atlas.ApplicationProperties.<init>(ApplicationProperties.java:83)
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:136)
    ... 9 more
I0114 17:15:36.668628 20758 jni-util.cc:288] java.lang.ExceptionInInitializerError
    at org.apache.atlas.impala.hook.ImpalaHook.onImpalaStartup(ImpalaHook.java:47)
    at org.apache.atlas.impala.hook.ImpalaLineageHook.onImpalaStartup(ImpalaLineageHook.java:79)
    at org.apache.impala.hooks.QueryEventHookManager.<init>(QueryEventHookManager.java:148)
    at org.apache.impala.hooks.QueryEventHookManager.createFromConfig(QueryEventHookManager.java:103)
    at org.apache.impala.service.Frontend.<init>(Frontend.java:325)
    at org.apache.impala.service.Frontend.<init>(Frontend.java:285)
    at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:141)
Caused by: java.lang.NullPointerException
    at org.apache.atlas.hook.AtlasHook.<clinit>(AtlasHook.java:85)
    ... 7 more
I0114 17:15:36.695966 20758 status.cc:126] ExceptionInInitializerError: null
CAUSED BY: NullPointerException: null
    @          0x1c91278  impala::Status::Status()
    @          0x24fe82c  impala::JniUtil::GetJniExceptionMsg()
    @          0x2310e8d  impala::Frontend::Frontend()
    @          0x2110a1d  impala::ExecEnv::ExecEnv()
    @          0x2110586  impala::ExecEnv::ExecEnv()
    @          0x2332bdc  ImpaladMain()
    @          0x1c3918f  main
    @     0x7f61a9e0c554  __libc_start_main
    @          0x1c39006  (unknown)
F0114 17:15:36.696000 20758 frontend.cc:134] ExceptionInInitializerError: null
CAUSED BY: NullPointerException: null
```

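
The "Cannot locate configuration source null" message above suggests the hook could not find atlas-application.properties on its classpath, so it helps to confirm that the file sits at the top level of the jar. This is a minimal sketch, assuming `unzip` is installed; `listing_has_root_entry` is a hypothetical helper that reads an archive listing (one entry name per line) from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read archive entry names (one per line) on stdin and
# succeed only if one of them is exactly the given name, i.e. the entry is
# stored at the archive root with no directory prefix.
listing_has_root_entry() {
  grep -Fxq "$1"
}

# Usage (unzip -Z1 lists entry names only, one per line):
#   unzip -Z1 "${ATLAS_HOME}/hook/impala/atlas-plugin-classloader-2.1.0.jar" \
#     | listing_has_root_entry atlas-application.properties \
#     && echo "properties file is at the jar root"
```

Matching the whole line is deliberate: an entry stored under a directory path, such as conf/atlas-application.properties, does not count.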
  5. Copy the following three items into the Impala lib directory (/opt/cloudera/parcels/CDH/lib/apache-impala-3.4/lib) and distribute them to every Impala node.

atlas-impala-plugin-impl
atlas-plugin-classloader-2.1.0.jar
impala-bridge-shim-2.1.0.jar
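
Distributing the three items to each node can be scripted. This is a minimal sketch that only prints the scp commands so they can be reviewed first. The host names cdh1 to cdh3 in the usage comment come from the configuration in this post and may not match your Impala nodes, and `print_distribute_commands` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one scp command per host that copies
# the three hook items from a source directory to the Impala lib directory.
print_distribute_commands() {
  local src="$1" dest="$2" host
  shift 2
  for host in "$@"; do
    echo "scp -r ${src}/atlas-impala-plugin-impl ${src}/atlas-plugin-classloader-2.1.0.jar ${src}/impala-bridge-shim-2.1.0.jar ${host}:${dest}/"
  done
}

# Usage: review the output, then pipe it to `sh` to execute.
#   print_distribute_commands "${ATLAS_HOME}/hook/impala" \
#     /opt/cloudera/parcels/CDH/lib/apache-impala-3.4/lib cdh1 cdh2 cdh3
```

Copying from ${ATLAS_HOME}/hook/impala means the jar that already contains atlas-application.properties is the one that gets distributed.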

  6. Restart Impala in CM.
  7. Check the log on an Impala daemon node, or view the log in CM:

    tail -f /var/log/impalad/impalad.INFO
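
Rather than watching the whole log, it can help to filter for hook-related lines. This is a minimal sketch; the patterns are just the class names that appear in the stack traces earlier in this post, not an official list of log markers, and `filter_hook_log_lines` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines (read from stdin) that mention
# the hook-related classes seen in the stack traces earlier in this post.
filter_hook_log_lines() {
  grep -E 'QueryEventHookManager|ImpalaLineageHook|AtlasException'
}

# Usage:
#   filter_hook_log_lines < /var/log/impalad/impalad.INFO
```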
    


  8. View the lineage in Atlas.


References

  1. https://www.jianshu.com/p/581e44f70044
  2. https://stackoverflow.com/questions/61689819/jar-conflicts-java-lang-verifyerror-stack-map-does-not-match-the-one-at-excep
  3. https://blog.csdn.net/h952520296/article/details/110874432
  4. impala-3.3.pdf