Introduction
timely stream processing是stateful stream processing的扩展
Time
- Processing Time
- 按operator的主机时间来计算窗口
- 性能好,低延迟
- 但是消息到达、处理和输出的速度是不确定的
- 可能造成某些任务的结果异常
- Event Time
- 事件或者消息发生的时间
- 在record中一般就有这个time字段
- Event time programs must specify how to generate Event Time Watermarks, which is the mechanism that signals progress in event time
Event Time and Watermarks
- 支持event time的stream operator需要一个方法,衡量event time的进度
- The mechanism in Flink to measure progress in event time is watermarks.
- Watermarks flow as part of the data stream and carry a timestamp t.
- Watermark(t) 代表 event time已经达到了t
- 所有早于t的数据都已经到达
- 后续不应该再有早于t或等于的数据进来了
- Once a watermark reaches an operator, the operator can advance its internal event time clock to the value of the watermark.
Watermarks in Parallel Streams
- watermarks在source function处生成
- 有多个source的情况,每个source处独立生成其watermark
- operator接收到watermark后会把自己的时间修改到最新(不再接受watermark更早的数据了),同时把watermark再向后传递
- operator接收多个stream时,如 keyBy或者partition function. 取多个流的watermark的最小值作为本operator的时间