We are pleased to announce Pravega 0.9.0, our first release since Pravega became part of CNCF (Cloud Native Computing Foundation). This release continues to expand the Pravega feature-set and improves the performance of mission-critical use cases, and, of course, brings improved stability overall.
In 2020, the Pravega community delivered several significant releases. We introduced Streaming Cache in Pravega 0.7.0. In Pravega 0.8.0, we added Key-Value Tables support and invested effort in tuning and evaluating the performance of primary stream use cases. With Pravega 0.9.0, we continue to evolve rapidly and bring improvements in two key areas: tiered storage and data retention.
Tiered storage is a core concept and critical for Pravega to store historical streaming data. Long-Term Storage (LTS) is the layer that aggregates streaming data for efficient storage IO, organizes it on storage to ensure consistency, and adapts all necessary logic to support different scalable storage options. We took a stab at simplifying that layer to enable more storage options in this release, and we call this new version Simplified Long-Term Storage.
Storing historical data is a key concept in Pravega, and with it comes the ability to control what data to retain and for how long. Pravega exposes retention policies by time and space. In this release, we add another option that enables an application to indicate explicitly when it no longer needs the data. We call this feature Consumption-Based Retention (CBR).
Other relevant changes include performance improvements, additions to support new Pravega Client bindings, and many other exciting changes.
You can always find more details about the new features and changes in the release notes on our GitHub project page. Pravega 0.9.0 is available directly on GitHub, as well as in Maven Central and DockerHub.
Continue reading to get an overview of all the key features Pravega 0.9.0 has to offer and for tips on where to get more details.
Simplified Long-Term Storage (S-LTS)
The built-in capability to seamlessly move data to long-term storage is one of the unique features of Pravega, and, with the 0.9.0 release, we are making a significant effort to address known limitations of the existing Storage layer, such as the requirement that the storage system support an append operation and the lack of an easy way to implement a storage provider for a custom storage system. S-LTS allows Pravega to use not only filesystem-based storage systems but also object-based ones, opening up many possibilities for running Pravega in cloud-like environments.
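To make the append limitation concrete, here is a minimal sketch of one generic way to present an append-only segment on top of a store that only supports whole-object put and get. It is an illustration under stated assumptions, not the S-LTS implementation: `ChunkedSegmentSketch`, `ObjectStore`, and the chunk key format are hypothetical names made up for this example, and PDP-34 remains the source for the actual design.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not the Pravega S-LTS code. */
final class ChunkedSegmentSketch {

    /** Stand-in for an object store: whole-object put and get, no append. */
    static final class ObjectStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        void put(String key, byte[] data) {
            // Objects are write-once; overwriting an existing key is rejected.
            if (objects.putIfAbsent(key, data.clone()) != null) {
                throw new IllegalStateException("object already exists: " + key);
            }
        }

        byte[] get(String key) {
            return objects.get(key).clone();
        }
    }

    private final ObjectStore store;
    private final String name;
    // Ordered list of chunk keys; this plays the role of segment metadata.
    private final List<String> chunkKeys = new ArrayList<>();
    private long length = 0;

    ChunkedSegmentSketch(ObjectStore store, String name) {
        this.store = store;
        this.name = name;
    }

    /** "Appends" by writing a new immutable object and recording its key. */
    void append(byte[] data) {
        String key = name + "/chunk-" + chunkKeys.size();
        store.put(key, data);
        chunkKeys.add(key);
        length += data.length;
    }

    /** Reads the whole segment by concatenating its chunks in order. */
    byte[] readAll() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String key : chunkKeys) {
            byte[] chunk = store.get(key);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    long length() {
        return length;
    }
}
```

In this sketch the store never has to modify an existing object: every append becomes a new object, and the ordered key list is what turns the separate objects into one logical segment.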
With Pravega 0.9.0, the S-LTS feature is available as experimental, and the team is working to promote it to stable in upcoming releases.
There will be additional information shared on S-LTS in the coming weeks, but those who cannot wait are welcome to dive into technical details in PDP-34.
Consumption-Based Retention (CBR)
One of the principles we follow at Pravega is to let users focus on their data processing rather than on configuring their applications and streaming system. That is why you do not see many configuration knobs in the Pravega Client: the client is expected to make the right decision for the user. With Pravega 0.9.0, we are adding support for a retention policy based on actual consumption. The new policy removes the difficulty of getting retention settings right, whether that means truncating at fixed intervals or acknowledging individual events and complicating the application logic. All it takes now is to let the Pravega client and server work together to track active readers and trigger a truncation once all of them have moved beyond the same position in the stream. Of course, there are some knobs left on the server side for system administrators to control CBR behavior.
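The core idea, truncating only up to the point that every active reader has passed, can be sketched in a few lines. This is a simplified illustration and not the Pravega API: `CbrSketch` and `safeTruncationOffset` are hypothetical names, and a position is reduced here to a single numeric offset in one segment, which is an assumption made to keep the example short.

```java
import java.util.Collection;
import java.util.List;
import java.util.OptionalLong;

/** Illustrative only: hypothetical names, not the Pravega API. */
final class CbrSketch {

    /**
     * Returns the offset of the slowest reader, which is the furthest point
     * that every reader has already passed. Returns empty when no reader is
     * active, in which case this sketch truncates nothing.
     */
    static OptionalLong safeTruncationOffset(Collection<Long> readerOffsets) {
        return readerOffsets.stream().mapToLong(Long::longValue).min();
    }

    public static void main(String[] args) {
        // Three hypothetical readers report how far they have read.
        List<Long> positions = List.of(120L, 75L, 300L);
        // The slowest reader (offset 75) bounds what can be truncated.
        System.out.println(safeTruncationOffset(positions)); // OptionalLong[75]
    }
}
```

Taking the minimum is what keeps a slow reader from losing data it has not consumed yet; the server-side knobs mentioned above are outside the scope of this sketch.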
Owing to the complications in implementing this feature, we are making it public as experimental for now. We cannot wait to get feedback on it from our users and community to make it even better.
The best place to start is PDP-47, which provides all the necessary details. After that, give the new API a shot.
If you have followed us over the last year, you may have noticed the effort we have put into improving Pravega performance. We continue to work on making Pravega the fastest and most scalable streaming storage out there, and the team is dedicated to the battle for every single millisecond. In this release, we focused on making better use of internal memory buffers, to benefit from the move to JDK11, and on fine-tuning our automatic batching algorithms. Together, these changes allowed us to significantly reduce latency and increase throughput in scale-out scenarios with hundreds of producers writing to thousands of segments. This is exactly the kind of scenario that places Pravega at the center of stream processing.
In a new blog post, we outline how Pravega outperforms the competition from a scale-out perspective.
Earlier versions of Pravega provided a client only in Java. The rise of IoT (Internet of Things) and AI/ML applications brings a need for client bindings in other languages. With Pravega 0.9.0, we now support Rust and Python bindings. Yes! Now you are better equipped to deal with your IoT or AI/ML use cases with Pravega.
These bindings are just the beginning of the effort to expand Pravega to a new set of use cases, and we will soon announce bindings in other languages.
Last but not least
Of course, the items above are not the only additions.
In the Security area, a new authorization string format is introduced. It improves the identification of resources between Pravega components and lays the groundwork for new components coming soon. We also updated most of our third-party libraries to pick up fixes for known security vulnerabilities.
Moving the entire code base to JDK11 was another challenge the team addressed. Now we have access to all the goods of the Java 9+ platform. Those concerned about JRE8 still being widely used should not worry, as we kept the Pravega Java Client compatible with JRE8 by default.
Apache BookKeeper was upgraded to version 4.11.1, which brings numerous stability fixes to our Tier-1 sub-system.
The internal implementation of Batch API has undergone stability and performance improvements and is the basis for our recent addition to the connectors family – Spark Connector. The Spark Connector connects Pravega streams with Apache Spark for high-performance analytics.
The well-known Flink Connector received updates in this cycle as well. This release adds support for recent additions to Flink itself and introduces numerous fixes and other improvements across the board.
The complete list of changes is available in the release notes on the Pravega GitHub project page.
Want to provide feedback?
We would love to hear back from you, whether you want to know more about Pravega or need a hand with anything related to the Pravega ecosystem.
Slack Invite: https://pravega-slack-invite.herokuapp.com/