1. What is Atlas?

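Atlas is a data governance and metadata framework for Hadoop. Put simply, it is a software system, closely tied to Hadoop, for managing the metadata of all kinds of data.
Apache Atlas is an open-source project created by the Hadoop community to address metadata governance in the Hadoop ecosystem. It gives Hadoop clusters the core metadata governance capabilities: data classification, a centralized policy engine, data lineage, security, and lifecycle management.
Atlas itself is a typical Java web application.
Atlas website: https://atlas.apache.org/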

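  • Atlas supports a wide range of Hadoop and non-Hadoop metadata types
  • It provides a rich REST API for integration
  • Data lineage can be traced down to the field level, which no comparable framework appears to offer yet
  • It also offers good access control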

2. Architecture

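Atlas consists of the following components:
HBase: stores the metadata
Solr: indexing
Ingest/Export: the metadata ingestion and export components
TypeSystem: the type system
Graph Engine: the graph engine
Atlas can ingest metadata from a variety of sources: Hive, Sqoop, Storm, and others.

  • Metadata is stored in HBase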

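3. Core features

Apache Atlas provides the following features for Hadoop metadata governance:

3.1 Data classification management

Import or define business-oriented classification annotations for metadata
Define, annotate, and automatically capture relationships between datasets and their underlying elements
Export metadata to third-party systems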

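3.2 Centralized auditing

Capture security access information for every application, process, and interaction with data
Capture operational information about executions, steps, and activities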

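3.3 Search and lineage

Predefined navigation paths for exploring data classification and audit information
Text-based search for locating related data and audit events quickly and accurately
Visual browsing of dataset lineage, which lets users drill down into operational, security, and provenance information

3.4 Security and policy engine

Runtime compliance policies based on data classification schemes, attributes, and roles
Advanced, classification-based policy definitions that prevent data derivation
Row- and column-level masking based on cell attributes and values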

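4. Build and installation

Building Atlas from source is tedious, so you can use a prebuilt release directly. The download address is as follows:
Start Apache Atlas with a local Apache HBase and Apache Solr

Run the following commands to start Apache Atlas: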

  1. export MANAGE_LOCAL_HBASE=true
  2. export MANAGE_LOCAL_SOLR=true
  3. bin/atlas_start.py
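
The same startup steps can also be driven from a script. The following is only a minimal sketch: it assumes a POSIX system and an unpacked Atlas distribution whose directory is passed in as `atlas_home` (a hypothetical parameter), and it does nothing beyond setting the two environment variables and running `bin/atlas_start.py`.

```python
import os
import subprocess


def atlas_start_env(base_env=None):
    """Return a copy of the environment with the two flags from the steps above."""
    env = dict(os.environ if base_env is None else base_env)
    env["MANAGE_LOCAL_HBASE"] = "true"  # let Atlas manage a local HBase
    env["MANAGE_LOCAL_SOLR"] = "true"   # let Atlas manage a local Solr
    return env


def start_atlas(atlas_home):
    """Run bin/atlas_start.py inside atlas_home (hypothetical install directory)."""
    home = os.path.abspath(atlas_home)
    script = os.path.join(home, "bin", "atlas_start.py")
    # check=True raises CalledProcessError if the start script exits non-zero.
    return subprocess.run([script], cwd=home, env=atlas_start_env(), check=True)
```

Calling `start_atlas("/opt/atlas")` would then be equivalent to the three steps above, with `/opt/atlas` standing in for wherever the distribution was unpacked.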

https://www.freesion.com/article/2646101710/

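5. REST API (translated from the official documentation)

https://atlas.apache.org/api/v2/index.html
Atlas exposes a variety of REST endpoints for working with types, entities, lineage, and data discovery.

Resources

WADL document: describes the REST API
You can also explore the REST API through the interactive interface provided by Swagger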

DiscoveryREST: REST interface for data discovery using DSL or full-text search

| path | methods |
| --- | --- |
| /v2/search/attribute | GET |
| /v2/search/basic | GET, POST |
| /v2/search/dsl | GET |
| /v2/search/fulltext | GET |
| /v2/search/quick | GET, POST |
| /v2/search/relationship | GET |
| /v2/search/saved | GET, POST, PUT |
| /v2/search/suggestions | GET |
| /v2/search/saved/{guid} | DELETE |
| /v2/search/saved/{name} | GET |
| /v2/search/saved/execute/{name} | GET |
| /v2/search/saved/execute/guid/{guid} | GET |

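
To make the discovery endpoints more concrete, here is a minimal sketch of how a basic search request might be assembled with the Python standard library. Several things are assumptions rather than facts from this article: the server address `http://localhost:21000/api/atlas`, the `admin`/`admin` credentials (common defaults for a local install), and the body fields `typeName`, `query`, `limit`, `offset`, and `excludeDeletedEntities`, which follow my reading of the SearchParameters type. The function only builds the request; nothing is sent.

```python
import base64
import json
import urllib.request

ATLAS_BASE = "http://localhost:21000/api/atlas"  # assumption: local default address


def basic_search_request(type_name, query=None, limit=10, offset=0,
                         user="admin", password="admin"):
    """Build (but do not send) a POST request for /v2/search/basic."""
    body = {
        "typeName": type_name,            # e.g. "hive_table"
        "excludeDeletedEntities": True,   # skip soft-deleted entities
        "limit": limit,
        "offset": offset,
    }
    if query:
        body["query"] = query             # free-text filter
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ATLAS_BASE + "/v2/search/basic",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
        method="POST",
    )
```

Against a running server, the request could be sent with `urllib.request.urlopen(basic_search_request("hive_table", query="sales"))` and the response body parsed with `json.load`.
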

EntityREST: REST for a single entity

| path | methods |
| --- | --- |
| /v2/entity | POST |
| /v2/entity/bulk | DELETE, GET, POST |
| /v2/entity/bulk/classification | POST |
| /v2/entity/bulk/headers | GET |
| /v2/entity/bulk/setClassifications | POST |
| /v2/entity/businessmetadata/import | POST |
| /v2/entity/guid/{guid} | DELETE, GET, PUT |
| /v2/entity/{guid}/audit | GET |
| /v2/entity/businessmetadata/import/template | GET |
| /v2/entity/guid/{guid}/businessmetadata | DELETE, POST |
| /v2/entity/guid/{guid}/classifications | GET, POST, PUT |
| /v2/entity/guid/{guid}/header | GET |
| /v2/entity/guid/{guid}/labels | DELETE, POST, PUT |
| /v2/entity/uniqueAttribute/type/{typeName} | DELETE, GET, PUT |
| /v2/entity/bulk/uniqueAttribute/type/{typeName} | GET |
| /v2/entity/guid/{guid}/businessmetadata/{bmName} | DELETE, POST |
| /v2/entity/guid/{guid}/classification/{classificationName} | DELETE, GET |
| /v2/entity/uniqueAttribute/type/{typeName}/classifications | POST, PUT |
| /v2/entity/uniqueAttribute/type/{typeName}/header | GET |
| /v2/entity/uniqueAttribute/type/{typeName}/labels | DELETE, POST, PUT |
| /v2/entity/uniqueAttribute/type/{typeName}/classification/{classificationName} | DELETE |

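
As an illustration of the entity endpoints, the sketch below builds the JSON body that would be posted to `/v2/entity` and the path used to read an entity back by GUID. The body shape (an `entity` object holding `typeName` and `attributes`) is my understanding of the AtlasEntityWithExtInfo type; the `hdfs_path` type and the attribute values in the usage note are purely illustrative.

```python
import json
from urllib.parse import quote


def entity_payload(type_name, qualified_name, name, **extra_attributes):
    """Body for POST /v2/entity: one entity wrapped as {"entity": {...}}."""
    attributes = {"qualifiedName": qualified_name, "name": name}
    attributes.update(extra_attributes)  # type-specific attributes, if any
    return {"entity": {"typeName": type_name, "attributes": attributes}}


def entity_by_guid_path(guid):
    """Path for GET/PUT/DELETE on a single entity, relative to the API root."""
    return "/v2/entity/guid/" + quote(guid, safe="")


def to_json(payload):
    """Serialize a payload the way it would be sent in the request body."""
    return json.dumps(payload, sort_keys=True)
```

For example, `entity_payload("hdfs_path", "/data/raw@cluster1", "raw", path="/data/raw")` produces a body describing one HDFS directory; the GUID returned by the server could then be passed to `entity_by_guid_path`.
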

GlossaryREST

| path | methods |
| --- | --- |
| /v2/glossary | GET, POST |
| /v2/glossary/categories | POST |
| /v2/glossary/category | POST |
| /v2/glossary/import | POST |
| /v2/glossary/term | POST |
| /v2/glossary/terms | POST |
| /v2/glossary/{glossaryGuid} | DELETE, GET, PUT |
| /v2/glossary/category/{categoryGuid} | DELETE, GET, PUT |
| /v2/glossary/import/template | GET |
| /v2/glossary/term/{termGuid} | DELETE, GET, PUT |
| /v2/glossary/{glossaryGuid}/categories | GET |
| /v2/glossary/{glossaryGuid}/detailed | GET |
| /v2/glossary/{glossaryGuid}/partial | PUT |
| /v2/glossary/{glossaryGuid}/terms | GET |
| /v2/glossary/category/{categoryGuid}/partial | PUT |
| /v2/glossary/category/{categoryGuid}/related | GET |
| /v2/glossary/category/{categoryGuid}/terms | GET |
| /v2/glossary/term/{termGuid}/partial | PUT |
| /v2/glossary/terms/{termGuid}/assignedEntities | DELETE, GET, POST, PUT |
| /v2/glossary/terms/{termGuid}/related | GET |
| /v2/glossary/{glossaryGuid}/categories/headers | GET |
| /v2/glossary/{glossaryGuid}/terms/headers | GET |


LineageREST: REST interface for an entity's lineage information

| path | methods |
| --- | --- |
| /v2/lineage/{guid} | GET |
| /v2/lineage/uniqueAttribute/type/{typeName} | GET |

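
A lineage response is easier to reason about once it is turned into a simple graph. The sketch below assumes that the lineage endpoint accepts `direction` and `depth` query parameters and that the returned AtlasLineageInfo JSON carries a `guidEntityMap` (GUID to entity header, with a `displayText` field) and a `relations` list with `fromEntityId` and `toEntityId`. These field names reflect my reading of the v2 model and should be checked against the official documentation; the sample data in the usage note is invented.

```python
from collections import defaultdict
from urllib.parse import quote, urlencode


def lineage_path(guid, direction="BOTH", depth=3):
    """Path and query string for GET /v2/lineage/{guid}, relative to the API root."""
    if direction not in ("INPUT", "OUTPUT", "BOTH"):
        raise ValueError("direction must be INPUT, OUTPUT or BOTH")
    query = urlencode({"direction": direction, "depth": depth})
    return "/v2/lineage/" + quote(guid, safe="") + "?" + query


def downstream_map(lineage_info):
    """Turn a lineage response into {source name: [target names]}."""
    headers = lineage_info.get("guidEntityMap", {})

    def label(guid):
        # Fall back to the raw GUID when no display text is available.
        return headers.get(guid, {}).get("displayText", guid)

    edges = defaultdict(list)
    for relation in lineage_info.get("relations", []):
        edges[label(relation["fromEntityId"])].append(label(relation["toEntityId"]))
    return dict(edges)
```

Given a response in which table `orders` feeds a process `load_orders` that writes `orders_clean`, `downstream_map` would return `{"orders": ["load_orders"], "load_orders": ["orders_clean"]}`.
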

RelationshipREST: REST interface for entity relationships

| path | methods |
| --- | --- |
| /v2/relationship | POST, PUT |
| /v2/relationship/guid/{guid} | DELETE, GET |


TypesREST: REST interface for CRUD operations on type definitions

| path | methods |
| --- | --- |
| /v2/types/typedefs | DELETE, GET, POST, PUT |
| /v2/types/typedefs/headers | GET |
| /v2/types/businessmetadatadef/guid/{guid} | GET |
| /v2/types/businessmetadatadef/name/{name} | GET |
| /v2/types/classificationdef/guid/{guid} | GET |
| /v2/types/classificationdef/name/{name} | GET |
| /v2/types/entitydef/guid/{guid} | GET |
| /v2/types/entitydef/name/{name} | GET |
| /v2/types/enumdef/guid/{guid} | GET |
| /v2/types/enumdef/name/{name} | GET |
| /v2/types/relationshipdef/guid/{guid} | GET |
| /v2/types/relationshipdef/name/{name} | GET |
| /v2/types/structdef/guid/{guid} | GET |
| /v2/types/structdef/name/{name} | GET |
| /v2/types/typedef/guid/{guid} | GET |
| /v2/types/typedef/name/{name} | GET |
| /v2/types/typedef/name/{typeName} | DELETE |

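
New types are registered by posting a bundle of definitions to `/v2/types/typedefs`. The sketch below builds such a bundle containing a single classification. The top-level keys (`enumDefs`, `structDefs`, `classificationDefs`, `entityDefs`, `relationshipDefs`) and the classification fields follow my understanding of the AtlasTypesDef and AtlasClassificationDef types and are assumptions to verify; the `PII` name in the usage note is only an example.

```python
import json


def classification_typesdef(name, description="", super_types=()):
    """Body for POST /v2/types/typedefs that defines one classification type."""
    classification = {
        "category": "CLASSIFICATION",
        "name": name,
        "description": description,
        "superTypes": list(super_types),  # parent classifications, if any
        "attributeDefs": [],              # no extra attributes in this sketch
    }
    return {
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [classification],
        "entityDefs": [],
        "relationshipDefs": [],
    }


def to_request_body(typesdef):
    """Encode the bundle as UTF-8 JSON bytes, ready to use as a request body."""
    return json.dumps(typesdef).encode("utf-8")
```

`to_request_body(classification_typesdef("PII", "Personally identifiable information"))` yields bytes that could be posted with a `Content-Type: application/json` header.
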

Data types

JSON

| type | description |
| --- | --- |
| AtlasAggregationEntry | An instance of an entity - like hive_table, hive_database. |
| AtlasAttributeDef | class that captures details of a struct-attribute. |
| AtlasBaseModelObject | |
| AtlasBaseTypeDef | Base class that captures common-attributes for all Atlas types. |
| AtlasBusinessMetadataDef | class that captures details of a struct-type. |
| AtlasClassification | An instance of a classification; it doesn’t have an identity, this object exists only when associated with an entity. |
| AtlasClassificationDef | class that captures details of a classification-type. |
| AtlasClassifications | REST serialization friendly list. |
| AtlasConstraintDef | class that captures details of a constraint. |
| AtlasEntitiesWithExtInfo | An instance of an entity along with extended info - like hive_table, hive_database. |
| AtlasEntity | An instance of an entity - like hive_table, hive_database. |
| AtlasEntityDef | class that captures details of an entity-type. |
| AtlasEntityExtInfo | An instance of an entity along with extended info - like hive_table, hive_database. |
| AtlasEntityHeader | An instance of an entity - like hive_table, hive_database. |
| AtlasEntityHeaders | |
| AtlasEntityWithExtInfo | An instance of an entity along with extended info - like hive_table, hive_database. |
| AtlasEnumDef | class that captures details of an enum-type. |
| AtlasEnumElementDef | class that captures details of an enum-element. |
| AtlasFullTextResult | |
| AtlasGlossary | |
| AtlasGlossaryBaseObject | |
| AtlasGlossaryCategory | |
| AtlasGlossaryExtInfo | |
| AtlasGlossaryHeader | |
| AtlasGlossaryTerm | |
| AtlasGlossaryTermHeader | |
| AtlasLineageInfo | |
| AtlasObjectId | Reference to an object-instance of an Atlas type - like entity. |
| AtlasQueryType | |
| AtlasQuickSearchResult | |
| AtlasRelatedCategoryHeader | |
| AtlasRelatedObjectId | Reference to an object-instance of AtlasEntity type used in relationship attribute values. |
| AtlasRelatedTermHeader | |
| AtlasRelationship | Atlas relationship instance. |
| AtlasRelationshipAttributeDef | class that captures details of a struct-attribute. |
| AtlasRelationshipDef | AtlasRelationshipDef is a TypeDef that defines a relationship. As with other typeDefs the AtlasRelationshipDef has a name. Once created the RelationshipDef has a guid. The name and the guid are the 2 ways that the RelationshipDef is identified. RelationshipDefs have 2 ends, each of which specifies cardinality, an EntityDef type name, an attribute name, and optionally whether the end is a container. RelationshipDefs can have AttributeDefs, though only primitive types are allowed. RelationshipDefs have a relationshipCategory specifying the UML type of relationship required. RelationshipDefs also have a PropagateTags value, indicating which way tags could flow over the relationships. The way EntityDefs and RelationshipDefs are intended to be used is that EntityDefs will define AttributeDefs; these AttributeDefs will not specify an EntityDef type name as their types. RelationshipDefs introduce new attributes to the entity instances. For example: EntityDef A might have attributes attr1, attr2, attr3; EntityDef B might have attributes attr4, attr5, attr6; RelationshipDef AtoB might define 2 ends, end1: type A, name attr7 and end2: type B, name attr8. When an instance of EntityDef A is created, it will have attributes attr1, attr2, attr3, attr7. When an instance of EntityDef B is created, it will have attributes attr4, attr5, attr6, attr8. In this way relationshipDefs can be authored separately from entityDefs and can inject relationship attributes into the entity instances. |
| AtlasRelationshipEndDef | The relationshipEndDef represents an end of the relationship. The end of the relationship is defined by a type, an attribute name, cardinality and whether it is the container end of the relationship. |
| AtlasRelationshipWithExtInfo | |
| AtlasSearchResult | |
| AtlasStruct | Captures details of struct contents. Not instantiated directly, used only via AtlasEntity, AtlasClassification. |
| AtlasStructDef | class that captures details of a struct-type. |
| AtlasSuggestionsResult | |
| AtlasTermAssignmentHeader | |
| AtlasTermAssignmentStatus | |
| AtlasTermCategorizationHeader | |
| AtlasTermRelationshipStatus | |
| AtlasTypeDefHeader | |
| AtlasTypesDef | |
| AtlasUserSavedSearch | |
| AttributeSearchResult | |
| BulkImportResponse | |
| Cardinality | single-valued attribute or multi-valued attribute. |
| ClassificationAssociateRequest | |
| Condition | |
| DateFormat | |
| EntityAuditActionV2 | |
| EntityAuditEventV2 | Structure of v2 entity audit event. |
| EntityAuditType | |
| EntityMutationResponse | |
| EntityOperation | |
| FilterCriteria | |
| Format | |
| ImportInfo | |
| ImportStatus | |
| IndexType | |
| LineageDirection | |
| LineageRelation | |
| NumberFormat | |
| Operator | NOTE: The names added in the String array should always contain first value in lower case. |
| PList | Paginated-list, for returning search results. |
| PropagateTags | PropagateTags indicates whether tags should propagate across the relationship instance. Tags can propagate: NONE (not at all), ONE_TO_TWO (from end 1 to 2), TWO_TO_ONE (from end 2 to 1), BOTH (both ways). Care needs to be taken when specifying. The use cases we are aware of where this flag is useful: propagating confidentiality classifications from a table to columns (ONE_TO_TWO could be used here); propagating classifications around Glossary synonyms (BOTH could be used here). There is an expectation that further enhancements will allow more granular control of tag propagation and will address how to resolve conflicts. |
| QuickSearchParameters | |
| Relation | |
| RelationshipCategory | The Relationship category determines the style of relationship around containment and lifecycle. UML terminology is used for the values. ASSOCIATION is a relationship with no containment. COMPOSITION and AGGREGATION are containment relationships. The difference lies in the lifecycles of the container and its children. In the COMPOSITION case, the children cannot exist without the container. For AGGREGATION, the life cycles of the container and children are totally independent. |
| RoundingMode | |
| SavedSearchType | |
| SearchFilter | Generic filter, to specify search criteria using name/value pairs. |
| SearchParameters | |
| SortOrder | |
| SortType | to specify whether the result should be sorted? If yes, whether asc or desc. |
| Status | Status of the entity - can be active or deleted. Deleted entities are not removed from Atlas store. |
| Status | |
| TimeBoundary | Captures time-boundary details. |
| TimeZone | |
| TypeCategory | |
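
The AtoB example from the AtlasRelationshipDef description can be sketched in code to show how a relationship definition injects attributes into entity instances. This is an illustrative model rather than Atlas code: the dictionary keys (`endDef1`, `endDef2`, `relationshipCategory`, `propagateTags`, `type`, `name`, `cardinality`, `isContainer`) mirror what I understand the JSON field names to be, and `instance_attributes` is a hypothetical helper that only reproduces the rule described in the table.

```python
RELATIONSHIP_CATEGORIES = ("ASSOCIATION", "AGGREGATION", "COMPOSITION")
PROPAGATE_TAGS = ("NONE", "ONE_TO_TWO", "TWO_TO_ONE", "BOTH")


def end_def(type_name, attribute_name, cardinality="SINGLE", is_container=False):
    """One end of a relationship: entity type, injected attribute name, cardinality."""
    return {"type": type_name, "name": attribute_name,
            "cardinality": cardinality, "isContainer": is_container}


def relationship_def(name, end1, end2, category="ASSOCIATION", propagate_tags="NONE"):
    """A relationship definition with two ends, validated against the enum values."""
    if category not in RELATIONSHIP_CATEGORIES:
        raise ValueError("unknown relationship category: " + category)
    if propagate_tags not in PROPAGATE_TAGS:
        raise ValueError("unknown propagateTags value: " + propagate_tags)
    return {"name": name, "relationshipCategory": category,
            "propagateTags": propagate_tags, "endDef1": end1, "endDef2": end2}


def instance_attributes(type_name, own_attributes, relationship_defs):
    """Attributes an instance of type_name exposes: its own plus injected ones."""
    attributes = list(own_attributes)
    for rel in relationship_defs:
        for end in (rel["endDef1"], rel["endDef2"]):
            if end["type"] == type_name:
                attributes.append(end["name"])
    return attributes


a_to_b = relationship_def("AtoB", end_def("A", "attr7"), end_def("B", "attr8"))
```

With this definition, `instance_attributes("A", ["attr1", "attr2", "attr3"], [a_to_b])` gives the four attributes listed for A in the description, and the same call for B appends `attr8`.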

XML

| type | description |
| --- | --- |
| PList | Paginated-list, for returning search results. |
| searchFilter | Generic filter, to specify search criteria using name/value pairs. |
| sortType | to specify whether the result should be sorted? If yes, whether asc or desc. |
| timeBoundary | Captures time-boundary details. |