Introduction Pravega is an open-source distributed storage system implementing streams as first-class primitive for storing/serving continuous and unbounded data [1]. A Pravega stream is a durable, elastic, append-only, and unbounded sequence of bytes providing a strong consistency model guaranteeing data durability, message ordering, and exactly-once support. Since Pravega stores sequences of bytes and not events or messages, it provides […]
If you missed my previous post, Pravega Flink connector 101, we strongly recommend you take the time to read that one first. It introduced how Flink DataStream API works with reading from and writing to Pravega streams, which lays the necessary foundation for the topics we’ll cover in this post. To briefly recap the last […]
Over the last 40 years, the European Union has built a powerful research framework through a variety of research programmes, such as FP1-9, Horizon 2020, and more recently, Horizon Europe, among others [1]. Research programmes are organized into calls that address timely and relevant societal, economic, and cultural challenges in the European landscape, including health […]
Change Data Capture (CDC) is becoming a popular technique for interconnecting disparate systems, for replicating state across traditional boundaries, for decomposing existing monoliths into microservices, and for the recordation of audit trails. CDC is the idea of emitting a changelog of all INSERT‘s, UPDATE‘s, DELETE‘s, and schema changes performed on a database. Debezium.io is an […]
Introduction Pravega is a storage system based on the stream abstraction, providing the ability to process tail data (low-latency streaming) and historical data (catchup and batch reads). Relatedly, Apache Flink is a widely-used real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency streaming data processing, as well as support for complex event […]
Introduction Today there are billions of sensors around the world, producing a massive amount of data. Some sensor data will be used only at the edge, and some will be sent to the cloud or data centers for aggregation, analytics, and AI efforts. These sensors may measure or produce images, video, lidar, audio, acceleration, GPS, […]
We are pleased to announce Pravega 0.9.0, our first release since Pravega became part of CNCF (Cloud Native Computing Foundation). This release continues to expand the Pravega feature-set and improves the performance of mission-critical use cases, and, of course, brings improved stability overall. In 2020, Pravega community delivered several significant releases. We introduced Streaming Cache […]
Raúl Gracia and Flavio Junqueira Introduction Streaming applications commonly ingest data from a wide range of elements – e.g., sensors, users, servers – concurrently to form a single stream of events. Using a single stream to capture the parallel data flows generated by multiple such elements enables applications to better reason about data and even […]
Raul Gracia and Flavio Junqueira Introduction Streaming systems continuously ingest and process data from a variety of data sources. They build on append-only data structures to enable efficient write and read access, targeting low-latency end-to-end. As more of the data sources in applications are machines, the expected volume of continuously generated data has been growing […]
Introduction The fundamentals of stream semantics in Pravega are learned through familiarity with its client APIs. In this article, we will overview Pravega’s client APIs with a handful of simple examples. As we reach the end, you should see Pravega in action, understand the guarantees afforded by Pravega streams, and have some familiarity with several […]