Open Source

Streaming is motivating us to rethink fundamental data processing and storage principles. As storage experts, Dell EMC is doing its part by designing a new storage primitive purpose-built for streaming data. We are open sourcing Pravega under the Apache 2.0 License to accelerate the adoption of streaming technology. Open source is right for Pravega because we believe that disruptive technologies should be owned and driven by a community of passionate open source developers.

Why Pravega

Exactly-Once Semantics

Ensure that each event is delivered and processed exactly once, with exact ordering guarantees, despite failures in clients, servers or the network.

Auto-Scaling

Unlike systems with static partitioning, Pravega can automatically scale individual data streams to accommodate changes in data ingestion rate.
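As a rough illustration of scaling a single stream, the sketch below models a stream as segments that each own a slice of the routing-key hash space, and splits any segment whose observed event rate exceeds a target. This is a simplified toy model under stated assumptions, not Pravega's implementation: `Segment`, `rescale`, and the rate figures are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment owning the hash range [low, high)."""
    low: float
    high: float


def rescale(segments, rates, target_rate):
    """Split every segment whose observed rate exceeds target_rate in half."""
    result = []
    for seg in segments:
        if rates.get(seg, 0.0) > target_rate:
            mid = (seg.low + seg.high) / 2
            result.append(Segment(seg.low, mid))
            result.append(Segment(mid, seg.high))
        else:
            result.append(seg)
    return result


# Two segments; the first one is receiving far more than the target rate.
stream = [Segment(0.0, 0.5), Segment(0.5, 1.0)]
observed = {Segment(0.0, 0.5): 250.0, Segment(0.5, 1.0): 40.0}
scaled = rescale(stream, observed, target_rate=100.0)
```

Only the hot segment is split, so the rest of the stream is left untouched. A fuller model would also merge cold segments when the rate drops.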

Distributed Computing Primitive

Pravega is great for distributed computing; it can be used as a data storage mechanism, for messaging between processes and for other distributed computing services such as leader election.

Write Efficiency

Pravega shrinks write latency to milliseconds and seamlessly scales to handle high-throughput reads and writes from thousands of concurrent clients, making it ideal for IoT and other time-sensitive applications.

Unlimited Retention

Ingest, process and retain data in streams forever. Use the same paradigm to access both real-time and historical events stored in Pravega.

Storage Efficiency

Use Pravega to build pipelines of data processing, combining batch, real-time and other applications without duplicating data for every step of the pipeline.

Durability

Don't compromise between performance, durability and consistency. Pravega persists and protects data before the write operation is acknowledged to the client.

Transaction Support

A developer uses a Pravega Transaction to ensure that a set of events is written to a stream atomically.
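The atomicity described here can be pictured with a small in-memory model: events written inside a transaction stay in a private buffer and become visible in the stream only on commit, all at once. This is a toy sketch, not the Pravega client API; `ToyStream`, `ToyTransaction`, and the event strings are hypothetical names invented for the example.

```python
class ToyTransaction:
    """Hypothetical transaction that buffers events until commit."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = []
        self._open = True

    def write_event(self, event):
        if not self._open:
            raise RuntimeError("transaction is closed")
        self._buffer.append(event)

    def commit(self):
        if not self._open:
            raise RuntimeError("transaction is closed")
        # All buffered events become visible together.
        self._stream.events.extend(self._buffer)
        self._open = False

    def abort(self):
        # Discard everything; the stream never sees these events.
        self._buffer.clear()
        self._open = False


class ToyStream:
    """Hypothetical append-only stream held in memory."""

    def __init__(self):
        self.events = []

    def begin_txn(self):
        return ToyTransaction(self)


stream = ToyStream()

txn = stream.begin_txn()
txn.write_event("debit:alice:10")
txn.write_event("credit:bob:10")
visible_before_commit = list(stream.events)  # nothing is visible yet
txn.commit()

aborted = stream.begin_txn()
aborted.write_event("debit:carol:5")
aborted.abort()  # this event is never appended
```

Readers of the stream see either both events of the committed transaction or neither, and never the event from the aborted one.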

Architecture

[Architecture diagram]

Use Cases

Consistent, high performance storage, ideal for IoT
IoT Renewable Energy: Harnessing wind power at commercial scale requires a large number of wind turbines distributed over a large area. Each wind turbine generates thousands of data points per second (e.g. temperature, rotation speed, wind direction, energy output). Collecting all this data for historical and real-time analysis is necessary for predicting potential failures and for controlling power distribution networks to manage the variable nature of renewable energy. Unique benefits that Pravega offers:
  • Consistent high performance for ingestion of both small and large events
  • Scalable data ingestion from a large number of sensors
  • Durable and low latency storage
Coordinating distributed applications
Distributed applications like micro-services: Pravega is a storage primitive, a messaging mechanism and a distributed computing coordination framework. Using the State Synchronizer API, micro-services can use Pravega as their shared data store, without the overhead of a separate database. Solutions to other distributed computing problems, such as discovery and leader election, can also be built using Pravega. Unique benefits that Pravega offers:
  • State synchronizer for sharing data between processes with strong consistency and optimistic concurrency.
  • Leader election and other distributed computing patterns implemented without a direct application dependency on other middleware such as Apache Zookeeper.
  • Consistent, reliable middleware for modern reactive, micro-services applications.
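The idea of sharing state with optimistic concurrency can be sketched as a versioned value with a conditional update: a writer reads the current version, computes a new value, and the update is accepted only if the version has not changed in the meantime. This is a minimal single-process illustration of the general pattern, not the State Synchronizer API; `VersionedState`, `update_with_retry`, and the `leader` field are hypothetical.

```python
class VersionedState:
    """Hypothetical shared state guarded by a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def conditional_update(self, new_value, expected_version):
        """Apply the update only if nobody else has written since the read."""
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


def update_with_retry(state, transform, max_attempts=10):
    """Re-read and retry until the conditional update is accepted."""
    for _ in range(max_attempts):
        value, version = state.read()
        if state.conditional_update(transform(value), version):
            return True
    return False


shared = VersionedState({"leader": None})

# A second process reads the state, then falls behind.
_, stale_version = shared.read()

# The first process claims leadership.
update_with_retry(shared, lambda s: {**s, "leader": "node-a"})

# The second process tries to write using its outdated version and is rejected.
stale_write_accepted = shared.conditional_update({"leader": "node-b"}, stale_version)
```

The rejected writer would normally re-read the state, observe that a leader already exists, and back off, which is the basis of a simple leader election.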
Same storage API for both real-time and historical data
Telecommunications: Companies have vast networks of complex infrastructure distributed across the world. Each component generates its own log representing its limited view of the global state. It is crucial that all logs are aggregated and analyzed in real-time to detect issues before they disrupt service. Unique benefits that Pravega offers:
  • Same Streaming API for accessing and processing both real-time and historical data
  • Stream level auto-scaling to accommodate bursts of data
  • Connectors to stream processing engines such as Flink
Distributed Pub/Sub
Multi-Player Gaming: Low-latency, high-speed message delivery is vital for online multi-player gaming platforms. Each player's movements, interactions and events need to be reflected across all connected devices as they happen in real-time. Game messaging data is also stored for purposes ranging from generating player leaderboard statistics from historical data to troubleshooting. Unique benefits that Pravega offers:
  • Scalable read/write parallelism that can support millions of simultaneous players
  • Reliable and exactly-once delivery of game state and events across all connected devices
  • Real-time delivery of score updates, game stats, and in-game notifications and alerts
Exactly-once stream processing with Apache Flink
Targeted Web Advertising: Targeting advertising to a customer’s interests or needs is key to increasing its effectiveness. First, unique users are identified in streams of data by correlating specific identifiers from multiple sources. Next, a user profile is generated by aggregating historical session data - extrapolating interests or demographic criteria against which the advertising can be targeted. Finally, user interactions on a website are correlated with their profiles in real-time to embed relevant ads from a catalogue of available options. Unique benefits that Pravega offers:
  • Exactly-once on sink/source with transactional writes, checkpointing and deduplication
  • One API to access real-time and historical data
  • Integration with Flink for elastically scalable data processing
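Deduplication, one of the ingredients listed above, can be illustrated with a sink that remembers the highest sequence number it has accepted from each writer. When a writer restarts from a checkpoint and replays events, the already-seen ones are dropped, so each event takes effect once. This is a generic sketch of the technique under simplifying assumptions, not how Pravega or its Flink connector implement it; `DedupSink`, the writer id, and the event strings are hypothetical.

```python
class DedupSink:
    """Hypothetical sink that ignores events it has already accepted."""

    def __init__(self):
        self.last_seq = {}  # writer id -> highest accepted sequence number
        self.output = []

    def write(self, writer_id, seq, event):
        """Accept the event only if its sequence number is new for this writer."""
        if seq <= self.last_seq.get(writer_id, -1):
            return False  # duplicate from a replay, dropped
        self.last_seq[writer_id] = seq
        self.output.append(event)
        return True


sink = DedupSink()
events = [(0, "click:ad-1"), (1, "click:ad-2"), (2, "click:ad-3")]

# Normal run: all three events are written.
for seq, event in events:
    sink.write("writer-1", seq, event)

# Simulated recovery: the writer restarts from a checkpoint taken before
# event 1 and replays everything after it.
replayed = [sink.write("writer-1", seq, event) for seq, event in events[1:]]
```

The replayed writes are rejected, so the output contains each event once even though two of them were sent twice.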
Building modern, storage efficient data pipelines
Real-time and batch analytics: Historically, developers used a so-called Lambda architecture to build analytics platforms that process big data into accurate and real-time information. This approach required duplicate application development: one application to provide accurate results over historical data using tools like HDFS, and another to provide approximate but timely results using a different set of tools like Apache Storm. With Pravega, developers can build one application that serves both batch and real-time needs, eliminating complicated, hard-to-maintain dual infrastructures. Unique benefits that Pravega offers:
  • Data is stored once, in Pravega, not duplicated per middleware stack
  • Data is protected by Pravega, not replicated 3 (or more) times by middleware
  • Pravega's exactly-once semantics allow developers to build pipelines of applications with both accurate and timely results

Presentations

DataWorks Summit: 2019

DataWorks Summit: 2018
