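1. What is Atlas?
Atlas is a data governance and metadata framework for Hadoop. Put simply, it is a software system, closely tied to Hadoop, for managing the metadata of all kinds of data.
Apache Atlas is an open-source project that the Hadoop community created to address metadata governance in the Hadoop ecosystem. It gives Hadoop clusters core metadata governance capabilities, including data classification, a centralized policy engine, data lineage, security, and lifecycle management.
Atlas itself is a typical Java web application.
Atlas website: https://atlas.apache.org/
- Atlas supports a wide range of Hadoop and non-Hadoop metadata types
- It provides a rich REST API for integration
- Lineage can be traced down to the field (column) level, a capability the author had not seen in other comparable frameworks at the time of writing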
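2. Architecture

Atlas consists of the following components:
- HBase: stores the metadata
- Solr: indexing
- Ingest/Export: components for ingesting and exporting metadata
- Type System: the type system
- Graph Engine: the graph engine
Atlas can collect metadata from a variety of sources: Hive, Sqoop, Storm, and others.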
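3. Core features
Apache Atlas provides the following features for Hadoop metadata governance:
3.1 Data classification
- Import or define business-oriented classification annotations for metadata
- Define, annotate, and automatically capture relationships between data sets and their underlying elements
- Export metadata to third-party systems
3.2 Centralized auditing
- Capture security access information for every application, process, and interaction with data
- Capture operational information about executions, steps, and activities
3.3 Search and lineage
- Predefined navigation paths for exploring data classification and audit information
- Text-based search to locate relevant data and audit events quickly and accurately
- Visual browsing of data set lineage, so that users can drill down into operational, security, and provenance information
3.4 Security and policy engine
- Rationalize compliance policy at runtime based on data classification schemes, attributes, and roles
- Advanced policy definitions, based on classification and prediction, for preventing data derivation
- Row/column-level masking based on cell values and attributes
4. Build and install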
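Compiling Atlas from source is quite a hassle, so it is easier to download and use a precompiled build.
The steps below start Apache Atlas together with a local Apache HBase and Apache Solr.
Run the following commands to start Apache Atlas:
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
bin/atlas_start.py
https://www.freesion.com/article/2646101710/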
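Once the start script has returned, one way to confirm that the server is answering is to request its version endpoint. The sketch below only builds that request. It assumes the default port 21000, the out-of-the-box admin/admin login, and the `/api/atlas/admin/version` path. None of these appear in the text above, so adjust them for your install.

```python
import base64
import urllib.request

# Assumptions (not stated in this article): default port and default login
# of a fresh local install.
ATLAS_URL = "http://localhost:21000"


def build_version_request(base_url=ATLAS_URL, user="admin", password="admin"):
    """Build (but do not send) a GET request for the Atlas version endpoint."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = base_url.rstrip("/") + "/api/atlas/admin/version"
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


request = build_version_request()
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` sends it. The bundled HBase and Solr may need some time to come up, so a connection error right after startup does not necessarily mean the install failed.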
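5. REST API (translated from the official documentation)
https://atlas.apache.org/api/v2/index.html
Atlas exposes a variety of REST endpoints for working with types, entities, lineage, and data discovery.
Resources
WADL document: describes the REST API
You can also explore the REST API through the interactive interface provided by Swagger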
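Because the endpoints are plain HTTP paths, a client mostly needs to assemble URLs. The sketch below builds URLs for two paths from the resource table in this section, `/v2/search/basic` and `/v2/lineage/{guid}`. The `/api/atlas` prefix, the host and port, and the query parameter names (`typeName`, `query`, `limit`, `direction`, `depth`) are assumptions based on my reading of the v2 API and should be checked against the linked documentation.

```python
from urllib.parse import quote, urlencode

# Assumed REST root of a local install; not stated in this article.
API_ROOT = "http://localhost:21000/api/atlas"


def basic_search_url(type_name, query=None, limit=10, api_root=API_ROOT):
    """URL for a basic search restricted to one entity type."""
    params = {"typeName": type_name, "limit": limit}
    if query:
        params["query"] = query
    return f"{api_root}/v2/search/basic?{urlencode(params)}"


def lineage_url(guid, direction="BOTH", depth=3, api_root=API_ROOT):
    """URL for the lineage of the entity identified by `guid`."""
    params = urlencode({"direction": direction, "depth": depth})
    return f"{api_root}/v2/lineage/{quote(guid, safe='')}?{params}"


print(basic_search_url("hive_table", query="sales"))
print(lineage_url("abc-123"))  # "abc-123" is a placeholder, not a real guid
```

Keeping URL construction in small pure functions makes it easy to check the paths without a running server; the actual call can then be made with any HTTP client.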
| name | path | methods | description |
|---|---|---|---|
| DiscoveryREST | /v2/search/attribute | GET | REST interface for data discovery using dsl or full text search |
| | /v2/search/basic | GET, POST | |
| | /v2/search/dsl | GET | |
| | /v2/search/fulltext | GET | |
| | /v2/search/quick | GET, POST | |
| | /v2/search/relationship | GET | |
| | /v2/search/saved | GET, POST, PUT | |
| | /v2/search/suggestions | GET | |
| | /v2/search/saved/{guid} | DELETE | |
| | /v2/search/saved/{name} | GET | |
| | /v2/search/saved/execute/{name} | GET | |
| | /v2/search/saved/execute/guid/{guid} | GET | |
| EntityREST | /v2/entity | POST | REST for a single entity |
| | /v2/entity/bulk | DELETE, GET, POST | |
| | /v2/entity/bulk/classification | POST | |
| | /v2/entity/bulk/headers | GET | |
| | /v2/entity/bulk/setClassifications | POST | |
| | /v2/entity/businessmetadata/import | POST | |
| | /v2/entity/guid/{guid} | DELETE, GET, PUT | |
| | /v2/entity/{guid}/audit | GET | |
| | /v2/entity/businessmetadata/import/template | GET | |
| | /v2/entity/guid/{guid}/businessmetadata | DELETE, POST | |
| | /v2/entity/guid/{guid}/classifications | GET, POST, PUT | |
| | /v2/entity/guid/{guid}/header | GET | |
| | /v2/entity/guid/{guid}/labels | DELETE, POST, PUT | |
| | /v2/entity/uniqueAttribute/type/{typeName} | DELETE, GET, PUT | |
| | /v2/entity/bulk/uniqueAttribute/type/{typeName} | GET | |
| | /v2/entity/guid/{guid}/businessmetadata/{bmName} | DELETE, POST | |
| | /v2/entity/guid/{guid}/classification/{classificationName} | DELETE, GET | |
| | /v2/entity/uniqueAttribute/type/{typeName}/classifications | POST, PUT | |
| | /v2/entity/uniqueAttribute/type/{typeName}/header | GET | |
| | /v2/entity/uniqueAttribute/type/{typeName}/labels | DELETE, POST, PUT | |
| | /v2/entity/uniqueAttribute/type/{typeName}/classification/{classificationName} | DELETE | |
| GlossaryREST | /v2/glossary | GET, POST | |
| | /v2/glossary/categories | POST | |
| | /v2/glossary/category | POST | |
| | /v2/glossary/import | POST | |
| | /v2/glossary/term | POST | |
| | /v2/glossary/terms | POST | |
| | /v2/glossary/{glossaryGuid} | DELETE, GET, PUT | |
| | /v2/glossary/category/{categoryGuid} | DELETE, GET, PUT | |
| | /v2/glossary/import/template | GET | |
| | /v2/glossary/term/{termGuid} | DELETE, GET, PUT | |
| | /v2/glossary/{glossaryGuid}/categories | GET | |
| | /v2/glossary/{glossaryGuid}/detailed | GET | |
| | /v2/glossary/{glossaryGuid}/partial | PUT | |
| | /v2/glossary/{glossaryGuid}/terms | GET | |
| | /v2/glossary/category/{categoryGuid}/partial | PUT | |
| | /v2/glossary/category/{categoryGuid}/related | GET | |
| | /v2/glossary/category/{categoryGuid}/terms | GET | |
| | /v2/glossary/term/{termGuid}/partial | PUT | |
| | /v2/glossary/terms/{termGuid}/assignedEntities | DELETE, GET, POST, PUT | |
| | /v2/glossary/terms/{termGuid}/related | GET | |
| | /v2/glossary/{glossaryGuid}/categories/headers | GET | |
| | /v2/glossary/{glossaryGuid}/terms/headers | GET | |
| LineageREST | /v2/lineage/{guid} | GET | REST interface for an entity’s lineage information |
| | /v2/lineage/uniqueAttribute/type/{typeName} | GET | |
| RelationshipREST | /v2/relationship | POST, PUT | REST interface for entity relationships. |
| | /v2/relationship/guid/{guid} | DELETE, GET | |
| TypesREST | /v2/types/typedefs | DELETE, GET, POST, PUT | REST interface for CRUD operations on type definitions |
| | /v2/types/typedefs/headers | GET | |
| | /v2/types/businessmetadatadef/guid/{guid} | GET | |
| | /v2/types/businessmetadatadef/name/{name} | GET | |
| | /v2/types/classificationdef/guid/{guid} | GET | |
| | /v2/types/classificationdef/name/{name} | GET | |
| | /v2/types/entitydef/guid/{guid} | GET | |
| | /v2/types/entitydef/name/{name} | GET | |
| | /v2/types/enumdef/guid/{guid} | GET | |
| | /v2/types/enumdef/name/{name} | GET | |
| | /v2/types/relationshipdef/guid/{guid} | GET | |
| | /v2/types/relationshipdef/name/{name} | GET | |
| | /v2/types/structdef/guid/{guid} | GET | |
| | /v2/types/structdef/name/{name} | GET | |
| | /v2/types/typedef/guid/{guid} | GET | |
| | /v2/types/typedef/name/{name} | GET | |
| | /v2/types/typedef/name/{typeName} | DELETE | |
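To make the EntityREST rows more concrete, here is a minimal sketch of a request body for `POST /v2/entity`, shaped like the AtlasEntityWithExtInfo type. The helper name `entity_payload`, the `sales.orders@dev` qualified name, and the `PII` classification are made up for illustration. A real type such as hive_table may require more mandatory attributes than the two shown.

```python
import json


def entity_payload(type_name, qualified_name, name, classifications=()):
    """Return a dict shaped like AtlasEntityWithExtInfo (hypothetical helper)."""
    entity = {
        "typeName": type_name,
        "attributes": {"qualifiedName": qualified_name, "name": name},
    }
    if classifications:
        # Each classification is referenced by the name of its type.
        entity["classifications"] = [{"typeName": c} for c in classifications]
    return {"entity": entity}


# Illustrative values only; "PII" must already exist as a classification type.
payload = entity_payload("hive_table", "sales.orders@dev", "orders", ["PII"])
print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent as the request body with a `Content-Type: application/json` header.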
Data types
JSON
| type | description |
|---|---|
| AtlasAggregationEntry | An instance of an entity - like hive_table, hive_database. |
| AtlasAttributeDef | class that captures details of a struct-attribute. |
| AtlasBaseModelObject | |
| AtlasBaseTypeDef | Base class that captures common-attributes for all Atlas types. |
| AtlasBusinessMetadataDef | class that captures details of a struct-type. |
| AtlasClassification | An instance of a classification; it doesn’t have an identity, this object exists only when associated with an entity. |
| AtlasClassificationDef | class that captures details of a classification-type. |
| AtlasClassifications | REST serialization friendly list. |
| AtlasConstraintDef | class that captures details of a constraint. |
| AtlasEntitiesWithExtInfo | An instance of an entity along with extended info - like hive_table, hive_database. |
| AtlasEntity | An instance of an entity - like hive_table, hive_database. |
| AtlasEntityDef | class that captures details of an entity-type. |
| AtlasEntityExtInfo | An instance of an entity along with extended info - like hive_table, hive_database. |
| AtlasEntityHeader | An instance of an entity - like hive_table, hive_database. |
| AtlasEntityHeaders | |
| AtlasEntityWithExtInfo | An instance of an entity along with extended info - like hive_table, hive_database. |
| AtlasEnumDef | class that captures details of an enum-type. |
| AtlasEnumElementDef | class that captures details of an enum-element. |
| AtlasFullTextResult | |
| AtlasGlossary | |
| AtlasGlossaryBaseObject | |
| AtlasGlossaryCategory | |
| AtlasGlossaryExtInfo | |
| AtlasGlossaryHeader | |
| AtlasGlossaryTerm | |
| AtlasGlossaryTermHeader | |
| AtlasLineageInfo | |
| AtlasObjectId | Reference to an object-instance of an Atlas type - like entity. |
| AtlasQueryType | |
| AtlasQuickSearchResult | |
| AtlasRelatedCategoryHeader | |
| AtlasRelatedObjectId | Reference to an object-instance of AtlasEntity type used in relationship attribute values |
| AtlasRelatedTermHeader | |
| AtlasRelationship | Atlas relationship instance. |
| AtlasRelationshipAttributeDef | class that captures details of a struct-attribute. |
| AtlasRelationshipDef | AtlasRelationshipDef is a TypeDef that defines a relationship. As with other typeDefs, the AtlasRelationshipDef has a name. Once created, the RelationshipDef has a guid. The name and the guid are the two ways that the RelationshipDef is identified. RelationshipDefs have two ends, each of which specifies cardinality, an EntityDef type name, an attribute name, and optionally whether the end is a container. RelationshipDefs can have AttributeDefs, though only primitive types are allowed. RelationshipDefs have a relationshipCategory specifying the UML type of relationship required. RelationshipDefs also have a PropagateTags setting, indicating which way tags could flow over the relationships. The way EntityDefs and RelationshipDefs are intended to be used is that EntityDefs define AttributeDefs, and these AttributeDefs do not specify an EntityDef type name as their types. RelationshipDefs introduce new attributes to the entity instances. For example, EntityDef A might have attributes attr1, attr2, attr3; EntityDef B might have attributes attr4, attr5, attr6; RelationshipDef AtoB might define two ends, end1: type A, name attr7, and end2: type B, name attr8. When an instance of EntityDef A is created, it will have attributes attr1, attr2, attr3, attr7. When an instance of EntityDef B is created, it will have attributes attr4, attr5, attr6, attr8. In this way relationshipDefs can be authored separately from entityDefs and can inject relationship attributes into the entity instances. |
| AtlasRelationshipEndDef | The relationshipEndDef represents an end of the relationship. The end of the relationship is defined by a type, an attribute name, cardinality and whether it is the container end of the relationship. |
| AtlasRelationshipWithExtInfo | |
| AtlasSearchResult | |
| AtlasStruct | Captures details of struct contents. Not instantiated directly, used only via AtlasEntity, AtlasClassification. |
| AtlasStructDef | class that captures details of a struct-type. |
| AtlasSuggestionsResult | |
| AtlasTermAssignmentHeader | |
| AtlasTermAssignmentStatus | |
| AtlasTermCategorizationHeader | |
| AtlasTermRelationshipStatus | |
| AtlasTypeDefHeader | |
| AtlasTypesDef | |
| AtlasUserSavedSearch | |
| AttributeSearchResult | |
| BulkImportResponse | |
| Cardinality | single-valued attribute or multi-valued attribute. |
| ClassificationAssociateRequest | |
| Condition | |
| DateFormat | |
| EntityAuditActionV2 | |
| EntityAuditEventV2 | Structure of v2 entity audit event |
| EntityAuditType | |
| EntityMutationResponse | |
| EntityOperation | |
| FilterCriteria | |
| Format | |
| ImportInfo | |
| ImportStatus | |
| IndexType | |
| LineageDirection | |
| LineageRelation | |
| NumberFormat | |
| Operator | NOTE : The names added in the String array should always contain first value in lower case |
| PList | Paginated-list, for returning search results. |
| PropagateTags | PropagateTags indicates whether tags should propagate across the relationship instance. Tags can propagate: NONE (not at all), ONE_TO_TWO (from end 1 to 2), TWO_TO_ONE (from end 2 to 1), BOTH (both ways). Care needs to be taken when specifying. The use cases we are aware of where this flag is useful: propagating confidentiality classifications from a table to columns (ONE_TO_TWO could be used here); propagating classifications around Glossary synonyms (BOTH could be used here). There is an expectation that further enhancements will allow more granular control of tag propagation and will address how to resolve conflicts. |
| QuickSearchParameters | |
| Relation | |
| RelationshipCategory | The Relationship category determines the style of relationship around containment and lifecycle. UML terminology is used for the values. ASSOCIATION is a relationship with no containment. COMPOSITION and AGGREGATION are containment relationships; they differ in the lifecycles of the container and its children. In the COMPOSITION case, the children cannot exist without the container. For AGGREGATION, the life cycles of the container and children are totally independent. |
| RoundingMode | |
| SavedSearchType | |
| SearchFilter | Generic filter, to specify search criteria using name/value pairs. |
| SearchParameters | |
| SortOrder | |
| SortType | Specifies whether the result should be sorted and, if so, whether ascending or descending. |
| Status | Status of the entity - can be active or deleted. Deleted entities are not removed from Atlas store. |
| Status | |
| TimeBoundary | Captures time-boundary details |
| TimeZone | |
| TypeCategory | |
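The AtlasRelationshipDef and PropagateTags entries in the table above are easier to follow with a toy model. The sketch below is not Atlas code: it is a simplified illustration, using plain dicts and hypothetical helper names, of how relationship ends add attributes to entity instances (the attr1 to attr8 example) and of the direction in which each PropagateTags value lets tags flow.

```python
from enum import Enum


class PropagateTags(Enum):
    NONE = "NONE"              # not at all
    ONE_TO_TWO = "ONE_TO_TWO"  # from end 1 to end 2
    TWO_TO_ONE = "TWO_TO_ONE"  # from end 2 to end 1
    BOTH = "BOTH"              # both ways


def tags_flow_from(setting, end):
    """Return True if tags on the entity at `end` (1 or 2) reach the other end."""
    if setting is PropagateTags.BOTH:
        return True
    if end == 1:
        return setting is PropagateTags.ONE_TO_TWO
    return setting is PropagateTags.TWO_TO_ONE


def instance_attributes(entity_defs, relationship_defs, type_name):
    """Own attributes of a type plus those injected by relationship ends."""
    attrs = list(entity_defs[type_name])
    for rel in relationship_defs:
        for end in (rel["end1"], rel["end2"]):
            if end["type"] == type_name:
                attrs.append(end["name"])
    return attrs


# The example from the AtlasRelationshipDef description, as toy data.
ENTITY_DEFS = {"A": ["attr1", "attr2", "attr3"], "B": ["attr4", "attr5", "attr6"]}
A_TO_B = {
    "name": "AtoB",
    "end1": {"type": "A", "name": "attr7"},
    "end2": {"type": "B", "name": "attr8"},
    "propagateTags": PropagateTags.ONE_TO_TWO,
}

print(instance_attributes(ENTITY_DEFS, [A_TO_B], "A"))  # ['attr1', 'attr2', 'attr3', 'attr7']
print(instance_attributes(ENTITY_DEFS, [A_TO_B], "B"))  # ['attr4', 'attr5', 'attr6', 'attr8']
print(tags_flow_from(A_TO_B["propagateTags"], 1))       # True
```

With ONE_TO_TWO, a confidentiality tag placed on the end-1 entity (for example a table) would reach the end-2 entity (its columns), but not the other way round.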
XML
| type | description |
|---|---|
| PList | Paginated-list, for returning search results. |
| searchFilter | Generic filter, to specify search criteria using name/value pairs. |
| sortType | Specifies whether the result should be sorted and, if so, whether ascending or descending. |
| timeBoundary | Captures time-boundary details |
