参考Timely Stream Processing

Introduction

timely stream processing是stateful stream processing的扩展

Time

  • Processing Time
    • 按operator的主机时间来计算窗口
    • 性能好,低延迟
    • 但是消息到达、处理和输出的速度是不确定的
    • 可能造成某些任务的结果异常
  • Event Time
    • 事件或者消息发生的时间
    • 在record中一般就有这个time字段
    • Event time programs must specify how to generate Event Time Watermarks, which is the mechanism that signals progress in event time

Event Time and Watermarks

  • 支持event time的stream operator需要一个方法,衡量event time的进度
  • The mechanism in Flink to measure progress in event time is watermarks.
  • Watermarks flow as part of the data stream and carry a timestamp t.
  • Watermark(t) 代表 event time已经达到了t
    • 所有早于t的数据都已经到达
    • 后续不应该再有早于t或等于的数据进来了

image.png
image.png

  • Once a watermark reaches an operator, the operator can advance its internal event time clock to the value of the watermark.


Watermarks in Parallel Streams

  • watermarks在source function处生成
  • 有多个source的情况,每个source处独立生成其watermark
  • operator接收到watermark后会把自己的时间修改到最新(不再接受watermark更早的数据了),同时把watermark再向后传递
  • operator接收多个stream时,如 keyBy或者partition function. 取多个流的watermark的最小值作为本operator的时间

image.png

Lateness

Windowing