Dataflow is a managed service that runs Apache Beam pipelines for unified stream and batch data processing. It automates infrastructure provisioning and cluster management, with horizontal autoscaling.
Note: If you have an existing Hadoop/Spark ecosystem, use Dataproc instead.
Billed in per-second increments, on a per-job basis:
Windowing divides unbounded data streams into logical chunks for processing.
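To make the idea of windowing concrete, here is a minimal pure-Python sketch of how timestamps can be assigned to fixed (tumbling), sliding (hopping), and session windows. It illustrates the concept only and does not use the Apache Beam SDK; the function names, the integer-second timestamps, and the half-open `[start, end)` convention are assumptions made for this example.

```python
def fixed_windows(timestamp, size):
    """Return the single [start, end) fixed window that contains timestamp."""
    start = timestamp - (timestamp % size)
    return [(start, start + size)]


def sliding_windows(timestamp, size, period):
    """Return every [start, end) sliding window that contains timestamp.

    Windows are `size` long and a new one starts every `period` units,
    so one element can belong to several overlapping windows.
    """
    start = timestamp - (timestamp % period)  # latest window start <= timestamp
    windows = []
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions; a quiet period of at least `gap` starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # close enough to the previous element
        else:
            sessions.append([ts])    # gap exceeded: open a new session
    return sessions


if __name__ == "__main__":
    print(fixed_windows(125, size=60))
    print(sliding_windows(125, size=60, period=30))
    print(session_windows([1, 2, 3, 20, 21, 50], gap=10))
```

Fixed windows place each element in exactly one bucket, sliding windows can place it in several, and session windows depend on the spacing of the data rather than on the clock.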
A watermark is an estimate of the timestamp of the oldest unprocessed data. It represents the system's understanding of how complete the data stream is.
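The sketch below shows one simple way a watermark can be tracked and used to flag late data. It is a toy heuristic (largest event time seen so far minus a fixed allowance for out-of-order arrival), not how Dataflow actually computes its watermarks; the class name `HeuristicWatermark` and the parameter `max_out_of_orderness` are hypothetical and chosen only for this example.

```python
class HeuristicWatermark:
    """Toy watermark: max event time seen so far minus a fixed allowance.

    This is an illustrative assumption, not Dataflow's real algorithm.
    """

    def __init__(self, max_out_of_orderness):
        self.max_out_of_orderness = max_out_of_orderness
        self.max_event_time = None

    def current(self):
        """Return the current watermark, or None before any event is seen."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_out_of_orderness

    def observe(self, event_time):
        """Record an event and return True if it arrived behind the watermark."""
        watermark = self.current()
        late = watermark is not None and event_time < watermark
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time  # the watermark only moves forward
        return late


if __name__ == "__main__":
    wm = HeuristicWatermark(max_out_of_orderness=5)
    for t in [10, 12, 6, 8]:
        print(t, "late" if wm.observe(t) else "on time", "watermark:", wm.current())
```

An element whose timestamp falls behind the watermark is treated as late, which is the signal a pipeline would use to decide whether a window can be considered complete.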
Q1. When should you choose Dataflow over Dataproc?
Q2. What is a Watermark in Dataflow?
Q3. What are the three types of windowing supported by Dataflow?