Google paper (2015):
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
The Dataflow Model.pdf

1. Introduction

1.1 Unbounded/Bounded vs Streaming/Batch

1.2 Windowing

1.3 Time Domains

  • Event time
  • Processing time
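A minimal sketch of the two time domains, assuming integer timestamps in seconds and a hypothetical `Record` type (plain Python, not the Beam SDK). It shows that arrival order (processing time) can differ from occurrence order (event time), and computes each record's event-time skew (processing time minus event time).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    value: str
    event_time: int       # when the event actually happened (hypothetical units: seconds)
    processing_time: int  # when the pipeline observed it

def skew(r: Record) -> int:
    """Event-time skew: how far processing lags behind event time."""
    return r.processing_time - r.event_time

records = [
    Record("a", event_time=10, processing_time=12),
    Record("b", event_time=5, processing_time=13),   # happened first, arrives late
    Record("c", event_time=14, processing_time=15),
]

by_arrival = [r.value for r in sorted(records, key=lambda r: r.processing_time)]
by_event = [r.value for r in sorted(records, key=lambda r: r.event_time)]

print(by_arrival)                  # ['a', 'b', 'c']
print(by_event)                    # ['b', 'a', 'c']
print([skew(r) for r in records])  # [2, 8, 1]
```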

2. Dataflow Model

2.1 Core Primitives

  • ParDo
  • GroupByKey
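The two core primitives can be sketched in plain Python as below. This is an illustration, not the Beam API: `par_do`, `group_by_key`, and `expand_prefixes` are hypothetical helper names. `par_do` applies a function to each key/value pair independently (emitting zero or more pairs), while `group_by_key` collects all values sharing a key.

```python
from collections import defaultdict

def par_do(fn, elements):
    """Element-wise step: apply fn to each (key, value); fn may emit 0..n pairs."""
    out = []
    for kv in elements:
        out.extend(fn(kv))
    return out

def group_by_key(pairs):
    """Collect all values per key, keeping keys in first-seen order."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return list(groups.items())

def expand_prefixes(kv):
    """Hypothetical per-element function: emit every prefix of the key."""
    word, n = kv
    return [(word[:i], n) for i in range(1, len(word) + 1)]

expanded = par_do(expand_prefixes, [("fix", 1), ("fit", 2)])
grouped = group_by_key(expanded)
print(grouped)  # [('f', [1, 2]), ('fi', [1, 2]), ('fix', [1]), ('fit', [2])]
```

`par_do` needs no coordination between elements, whereas `group_by_key` must see all values for a key before emitting; that is why grouping over unbounded input needs windows.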

2.2 Windowing

2.2.1 Window Assignment
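A minimal sketch of window assignment, assuming integer timestamps, half-open windows `(start, end)`, and a window offset of zero; `assign_sliding` and `assign_windows` are hypothetical names. Each element is copied into every sliding window that contains its timestamp. Fixed windows are modeled as sliding windows whose period equals their size.

```python
def assign_sliding(ts, size, period):
    """Return every window [start, start + size) with start % period == 0 containing ts."""
    windows = []
    start = ts - (ts % period)      # latest window start at or before ts
    while start > ts - size:        # window still covers ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)

def assign_windows(elements, size, period):
    """Copy each (key, value, ts) into one element per assigned window."""
    return [(k, v, ts, w)
            for (k, v, ts) in elements
            for w in assign_sliding(ts, size, period)]

print(assign_sliding(1, size=2, period=1))  # [(0, 2), (1, 3)]
print(assign_sliding(7, size=5, period=5))  # fixed windows: [(5, 10)]
print(assign_windows([("k", "v1", 0), ("k", "v2", 1)], size=2, period=1))
```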

2.2.2 Window Merging
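Session windows are the usual motivation for merging: each element first gets a proto-window `[ts, ts + gap)`, and overlapping windows of the same key are then merged before grouping. The sketch below assumes integer timestamps and half-open windows (windows that merely touch are not merged); `merge_windows` and `sessions` are hypothetical names, not a library API.

```python
from collections import defaultdict

def merge_windows(windows):
    """Merge overlapping half-open windows (start, end)."""
    merged = []
    for start, end in sorted(windows):
        if merged and start < merged[-1][1]:          # overlaps the previous window
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def sessions(elements, gap):
    """Group (key, value, ts) elements into per-key session windows."""
    by_key = defaultdict(list)
    for k, v, ts in elements:
        by_key[k].append((v, (ts, ts + gap)))         # proto-window per element
    result = {}
    for k, items in by_key.items():
        groups = []
        for ms, me in merge_windows([w for _, w in items]):
            vals = [v for v, (s, e) in items if ms <= s and e <= me]
            groups.append(((ms, me), vals))
        result[k] = groups
    return result

events = [("k1", "v1", 2), ("k2", "v2", 14), ("k1", "v3", 57), ("k1", "v4", 20)]
print(sessions(events, gap=30))
# {'k1': [((2, 50), ['v1', 'v4']), ((57, 87), ['v3'])], 'k2': [((14, 44), ['v2'])]}
```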

2.2.3 API

2.3 Triggers & Incremental Processing
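A simplified sketch of incremental processing for one window computing a sum. It assumes a hypothetical count-based trigger (`panes_by_count`, fire after every `n` new elements) and compares three accumulation modes: discarding (emit only the new delta), accumulating (emit the full result so far), and accumulating and retracting (also retract the previous result). Retraction is modeled here as emitting the negated previous sum, which only works because sum is invertible. All names are illustrative, not a real API.

```python
def panes_by_count(values, n):
    """Hypothetical count trigger: fire after every n new elements."""
    return [values[i:i + n] for i in range(0, len(values), n)]

def fire_panes(pane_inputs, mode):
    """Return, per trigger firing, the values emitted downstream for a sum."""
    emitted = []
    total = 0              # running sum of everything seen so far
    last_emitted = None    # previous accumulated result (for retractions)
    for new_values in pane_inputs:
        pane_sum = sum(new_values)
        total += pane_sum
        if mode == "discarding":
            emitted.append([pane_sum])                 # delta since last firing
        elif mode == "accumulating":
            emitted.append([total])                    # full result so far
        elif mode == "accumulating_retracting":
            out = [] if last_emitted is None else [-last_emitted]  # retract old result
            out.append(total)
            emitted.append(out)
            last_emitted = total
        else:
            raise ValueError(f"unknown mode: {mode}")
    return emitted

panes = panes_by_count([3, 8, 1, 4], 2)   # [[3, 8], [1, 4]]
for mode in ("discarding", "accumulating", "accumulating_retracting"):
    out = fire_panes(panes, mode)
    print(mode, out, sum(v for pane in out for v in pane))
# discarding [[11], [5]] 16
# accumulating [[11], [16]] 27
# accumulating_retracting [[11], [-11, 16]] 16
```

Summing every emitted value recovers the true total in discarding and retracting modes, while accumulating mode double-counts; a downstream consumer must therefore know which mode produced its input.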

Reference

  1. The Dataflow Model.pptx