• step1: Identify data in the scope of the archive.

    For the same question, a different perspective changes which data needs to be preserved.

  • step2: Get data

  • step3: Fix data
  • step4: Make findable and usable.
  • step5: Ensure it will last forever.

  • big data: data sets so large that traditional database tools struggle to capture, store, manage, and analyze them
  • size is always a fast-moving target
  • Bigness:
  1. Quantification: IoT
  • The V’s
  • Volume
  • Variety
  • Velocity
  • Veracity (truthfulness)

zettabyte: 10^21 bytes
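
To get a feel for that scale, the sketch below expresses one zettabyte in smaller units. It assumes decimal (SI) prefixes, i.e. powers of ten rather than powers of two, and the names `SI_UNITS` and `convert` are illustrative only.

```python
# Decimal (SI) byte units as powers of ten; the names here are illustrative.
SI_UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(amount: int, src: str, dst: str) -> int:
    """Convert a whole number of `src` units into `dst` units (integer division)."""
    return amount * SI_UNITS[src] // SI_UNITS[dst]


if __name__ == "__main__":
    print(convert(1, "ZB", "TB"))  # one zettabyte expressed in terabytes
    print(convert(1, "ZB", "GB"))  # one zettabyte expressed in gigabytes
```

Under these assumptions one zettabyte is 10^9 terabytes, which is why volume alone can overwhelm conventional tools.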

  • Promise
  • Financial services
  • Education
  • Type: empiricists and skeptics
  • Training: coaching or data-driven decision-making

In many cases, some variables or situations are unknown or have never occurred before.

Big Data Ethics

  1. Identity
  • How do online and offline identities relate?
  2. Privacy
  • Who can access the data?
  3. Ownership
  • Who can own the data?
  4. Reputation
  • Who can be trusted?

Metadata

Types (5)

  1. Administrative
  • Manage and administer collections/information resources
  • Origin and maintenance of object, e.g. copyright info
  2. Descriptive
  • Identify and describe collections and information resources/objects
  3. Preservation
  • Related to preservation management of collections/information resources
  4. Technical
  • Related to how a system functions
  • E.g. hardware or software documentation
  5. Use
  • Related to level and type of use of collections and information resources
  • E.g. number of times a resource was downloaded

    Sources

  1. The original authoring/management systems
  2. The object itself
  3. The existing descriptive record, e.g. in catalog
  4. Other documentation
  • e.g. system documentation or manuals, data dictionaries
  5. Oral history
  • e.g. depositors, original users, or end users

    Automatic generation

    Automatic approaches:
  1. Derived: automatically generated according to system design (i.e. pre-programmed)
  • e.g. date created, who created, date modified, or resource size
  2. Extracted: generated by running automatic indexing algorithms on resource content
  • e.g. subject keywords or noun phrases
  3. Harvested: automatically gathered regardless of how it was generated originally
  • key in the Open Archives Initiative (OAI)
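
The first two approaches can be sketched in a few lines. This is a minimal illustration, not a production indexer: `derived_metadata` reads values the file system already records, and `extracted_keywords` stands in for an indexing algorithm with a naive word-frequency count. Both function names and the sample text are assumptions made for this example; harvesting is left out because it needs a remote repository.

```python
import os
import re
import tempfile
from collections import Counter
from datetime import datetime, timezone


def derived_metadata(path: str) -> dict:
    """Derived metadata: values the file system already records for the resource."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "size_bytes": info.st_size,
        "date_modified": modified.isoformat(),
    }


def extracted_keywords(text: str, top_n: int = 3) -> list:
    """Extracted metadata: naive keyword index based on word frequency."""
    # Keep lowercase words of four or more letters, then rank by count.
    words = re.findall(r"[a-z]{4,}", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


if __name__ == "__main__":
    sample = "archive archive archive metadata metadata records"
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write(sample)
        path = fh.name
    print(derived_metadata(path))
    print(extracted_keywords(sample))
    os.remove(path)
```

A real extraction step would usually add stop-word removal and phrase detection; the frequency count here only shows where such an algorithm plugs in.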

    Representation

  • Element = Attribute + Value
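
One way to picture this representation is as a pair type. The sketch below is an assumption-laden illustration: the `Element` class, the `to_record` helper, and the sample attribute names are hypothetical and not taken from any particular metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A metadata element: one attribute name paired with one value."""
    attribute: str
    value: str


def to_record(elements) -> dict:
    """Group elements by attribute; a repeated attribute collects several values."""
    record = {}
    for element in elements:
        record.setdefault(element.attribute, []).append(element.value)
    return record


if __name__ == "__main__":
    elements = [
        Element("title", "Course notes"),
        Element("subject", "metadata"),
        Element("subject", "big data"),
    ]
    print(to_record(elements))
```

Storing values in lists keeps repeatable attributes (such as several subjects for one resource) without overwriting earlier entries.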

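  • Meta-analysis: computing aggregates
  • What data management covers

(1) Establish the corresponding governance rules
(2) Maintain records of the data collection and processing workflow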
(3)

  • Metadata is data produced in digital form