Pravega

Open Source

Streaming is motivating us to rethink fundamental data processing and storage principles. Drawing on its storage expertise, Dell EMC is doing its part by designing a new storage primitive purpose-built for streaming data. We are open sourcing Pravega under the Apache License 2.0 to accelerate the adoption of streaming technology. Open source is right for Pravega because we believe that disruptive technologies should be owned and driven by a community of passionate open source developers.

Why Pravega

Exactly-Once Semantics

Ensure that each event is delivered and processed exactly once, with ordering guaranteed for events that share a routing key, despite failures in clients, servers or the network.
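
A common way to make writes exactly-once in the presence of client retries is to tag every event with a writer ID and an increasing sequence number, so that the storage side can recognize and drop a resent event. The sketch below is a simplified, hypothetical in-memory model of that idea: `DedupSegment` and its `append` method are illustrative names, not part of Pravega's client API, and the model assumes a single segment and ignores server failures.

```python
from collections import defaultdict


class DedupSegment:
    """Hypothetical in-memory segment that discards retried appends.

    Each writer tags its events with (writer_id, seq). The segment remembers
    the highest sequence number accepted per writer and ignores any append
    at or below it, so a retry after a lost acknowledgement is harmless.
    """

    def __init__(self):
        self.events = []
        self.last_seq = defaultdict(lambda: -1)  # writer_id -> highest accepted seq

    def append(self, writer_id, seq, event):
        if seq <= self.last_seq[writer_id]:
            return False  # duplicate caused by a retry: already stored once
        self.last_seq[writer_id] = seq
        self.events.append(event)
        return True


segment = DedupSegment()
segment.append("writer-1", 0, "temp=21.5")
segment.append("writer-1", 1, "temp=21.7")
# The client times out waiting for the acknowledgement and resends event 1.
segment.append("writer-1", 1, "temp=21.7")
print(segment.events)  # ['temp=21.5', 'temp=21.7']
```

The retry is absorbed by the segment rather than by the application, which is what lets the writer simply resend on any doubt without producing duplicates.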

Auto-Scaling

Unlike systems with static partitioning, Pravega can automatically scale individual data streams to accommodate changes in data ingestion rate.
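
Pravega divides a stream into segments, each owning a slice of the routing-key space, and scales a stream by splitting busy segments and merging quiet ones. The sketch below models only the splitting half of that idea. The names (`Segment`, `key_position`, `segment_for`, `scale_up`) and the events-per-second threshold are hypothetical, and this is not how Pravega's controller actually measures load or performs a scaling event.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Segment:
    low: float         # inclusive lower bound of the key range
    high: float        # exclusive upper bound of the key range
    rate: float = 0.0  # observed events per second (supplied externally here)


def key_position(routing_key: str) -> float:
    """Map a routing key to a stable point in [0, 1)."""
    digest = hashlib.sha256(routing_key.encode()).digest()
    # Keep 53 bits so the result is exactly representable and strictly < 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def segment_for(segments, routing_key):
    """Return the one segment whose key range contains the routing key."""
    pos = key_position(routing_key)
    return next(s for s in segments if s.low <= pos < s.high)


def scale_up(segments, max_rate):
    """Split every segment whose rate exceeds max_rate into two halves."""
    result = []
    for s in segments:
        if s.rate > max_rate:
            mid = (s.low + s.high) / 2
            result += [Segment(s.low, mid), Segment(mid, s.high)]
        else:
            result.append(s)
    return result


segments = [Segment(0.0, 0.5, rate=50.0), Segment(0.5, 1.0, rate=900.0)]
segments = scale_up(segments, max_rate=500.0)
print([(s.low, s.high) for s in segments])
# [(0.0, 0.5), (0.5, 0.75), (0.75, 1.0)]
```

Because a routing key always hashes to the same point, every key still maps to exactly one segment after the split; only the hot part of the key space gains extra parallelism.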

Distributed Computing Primitive

Pravega is well suited to distributed computing: it can serve as a data store, as a messaging channel between processes, and as a foundation for other distributed computing services such as leader election.

Write Efficiency

Pravega shrinks write latency to milliseconds and seamlessly scales to handle high-throughput reads and writes from thousands of concurrent clients, making it ideal for IoT and other time-sensitive applications.

Unlimited Retention

Ingest, process and retain data in streams forever. Use the same paradigm to access both real-time and historical events stored in Pravega.

Storage Efficiency

Use Pravega to build data processing pipelines that combine batch, real-time and other applications without duplicating data at every step of the pipeline.

Durability

Don't compromise between performance, durability and consistency. Pravega persists and protects data before the write operation is acknowledged to the client.

Transaction Support

Use a Pravega Transaction to ensure that a set of events is written to a stream atomically: either all of the events in the transaction become visible to readers, or none of them do.
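
The all-or-nothing behavior of a transaction can be illustrated with a small model in which events are buffered privately and only published to the stream on commit. The sketch below is hypothetical: `Stream`, `Transaction`, `begin_txn`, `write_event`, `commit` and `abort` are illustrative names for an in-memory model, not the Pravega client API, and it does not model concurrency, timeouts or durability.

```python
class Stream:
    """Hypothetical in-memory stream; readers see only what is in self.events."""

    def __init__(self):
        self.events = []

    def begin_txn(self):
        return Transaction(self)


class Transaction:
    """Buffers events and publishes all of them together on commit."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = []
        self._open = True

    def _ensure_open(self):
        if not self._open:
            raise RuntimeError("transaction is already closed")

    def write_event(self, event):
        self._ensure_open()
        self._pending.append(event)  # buffered, invisible to readers

    def commit(self):
        self._ensure_open()
        self._stream.events.extend(self._pending)  # all events appear together
        self._open = False

    def abort(self):
        self._ensure_open()
        self._pending.clear()  # none of the events ever become visible
        self._open = False


stream = Stream()
txn = stream.begin_txn()
txn.write_event("debit account A")
txn.write_event("credit account B")
print(stream.events)  # []
txn.commit()
print(stream.events)  # ['debit account A', 'credit account B']
```

A reader of `stream.events` can therefore never observe the debit without the matching credit, which is the property the transaction exists to provide.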

Architecture

Use Cases

Consistent, high-performance storage, ideal for IoT

IoT Renewable Energy: Harnessing wind power at commercial scale requires a large number of wind turbines distributed over a wide area. Each wind turbine generates thousands of data points per second (e.g., temperature, rotation speed, wind direction, energy output). Collecting all of this data for historical and real-time analysis is necessary both to predict potential failures and to control power distribution networks so that they can manage the variable nature of renewable energy. Unique benefits that Pravega offers:

Consistent high performance for ingestion of both small and large events
Scalable data ingestion from a large number of sensors
Durable and low latency storage

Coordinating distributed applications

Distributed applications like micro-services: Pravega is a storage primitive, a messaging mechanism and a coordination framework for distributed computing. Using the State Synchronizer API, micro-services can share data through Pravega without the overhead of running a separate database. Solutions to other distributed computing problems, such as service discovery and leader election, can also be built on Pravega. Unique benefits that Pravega offers:

State Synchronizer for sharing data between processes with strong consistency and optimistic concurrency.
Leader election and other distributed computing patterns implemented without a direct application dependency on other middleware such as Apache ZooKeeper.
Consistent, reliable middleware for modern reactive and micro-service applications.
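
Optimistic concurrency, as mentioned above, means a process reads the current state together with its revision and applies an update only if nobody else has changed the state in the meantime, re-reading and retrying otherwise. The sketch below is a hypothetical single-process model of that pattern; `SharedState`, `update_with_retry` and `try_become_leader` are illustrative names, not Pravega's State Synchronizer API, and the lock merely stands in for an atomic compare-and-set on the storage side. It also shows how a simple leader election can be layered on a conditional update.

```python
import threading


class SharedState:
    """Hypothetical shared value guarded by a revision number."""

    def __init__(self, value):
        self._lock = threading.Lock()  # stands in for an atomic server-side check
        self._value = value
        self._revision = 0

    def read(self):
        with self._lock:
            return self._value, self._revision

    def conditional_update(self, expected_revision, new_value):
        """Apply new_value only if the state is still at expected_revision."""
        with self._lock:
            if expected_revision != self._revision:
                return False  # another process updated first
            self._value = new_value
            self._revision += 1
            return True


def update_with_retry(state, fn):
    """Apply fn to the latest value, retrying until no conflict occurs."""
    while True:
        value, revision = state.read()
        if state.conditional_update(revision, fn(value)):
            return


def try_become_leader(state, node_id):
    """Claim leadership only if no leader is recorded yet."""
    value, revision = state.read()
    if value is not None:
        return value == node_id
    return state.conditional_update(revision, node_id)


counter = SharedState(0)
workers = [
    threading.Thread(target=update_with_retry, args=(counter, lambda v: v + 1))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.read())  # (8, 8)

leader = SharedState(None)
print(try_become_leader(leader, "node-a"))  # True
print(try_become_leader(leader, "node-b"))  # False
```

No increment is lost even though eight threads race, because a stale update is rejected and retried instead of overwriting a newer value.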

Same storage API for both real-time and historical data

Telecommunications: Telecom companies operate vast networks of complex infrastructure distributed across the world. Each component generates its own log, representing its limited view of the global state. It is crucial that all of these logs be aggregated and analyzed in real time to detect issues before they disrupt service. Unique benefits that Pravega offers:

Same Streaming API for accessing and processing both real-time and historical data
Stream-level auto-scaling to accommodate bursts of data
Connectors to stream processing engines such as Apache Flink

Distributed Pub/Sub

Multiplayer Gaming: Low-latency, high-speed message delivery is vital for online multiplayer gaming platforms. Each player's movements, interactions and events need to be reflected across all connected devices in real time, as they happen. Game messaging data is also stored for uses ranging from generating leaderboard statistics from historical data to troubleshooting. Unique benefits that Pravega offers:

Scalable read/write parallelism that can support millions of simultaneous players
Reliable and exactly-once delivery of game state and events across all connected devices
Real-time delivery of score updates, game stats, and in-game notifications and alerts

Exactly-once stream processing with Apache Flink

Targeted Web Advertising: Targeting advertisements to a customer’s interests or needs is key to increasing their effectiveness. First, unique users are identified in streams of data by correlating specific identifiers from multiple sources. Next, a user profile is generated by aggregating historical session data, from which the interests or demographic criteria used to target the advertising are extrapolated. Finally, user interactions on a website are correlated with their profiles in real time to embed relevant ads from a catalogue of available options. Unique benefits that Pravega offers:

Exactly-once semantics at both source and sink, via transactional writes, checkpointing and deduplication
One API to access real-time and historical data
Integration with Apache Flink for elastically scalable data processing

Building modern, storage efficient data pipelines

Real-time and batch analytics: Historically, developers used a so-called Lambda architecture to build analytics platforms that process big data into accurate, real-time information. This approach required developing two applications: one producing accurate results over historical data with tools such as HDFS, and another producing approximate but timely results with a different set of tools, such as Apache Storm. With Pravega, developers can build a single application that serves both batch and real-time needs, eliminating complicated, hard-to-maintain dual infrastructures. Unique benefits that Pravega offers:

Data is stored once, in Pravega, not duplicated per middleware stack
Data is protected by Pravega, not replicated 3 (or more) times by middleware
Pravega's exactly-once semantics allow developers to build pipelines of applications whose results are both accurate and timely

Recent Blog

Change Data Capture with Pravega + Debezium

Presentations

DataWorks Summit: 2019

DataWorks Summit: 2018
