Pravega Byte Stream Client API 101

By Sandeep Shridhar, posted on February 2, 2024

Introduction

Pravega is an open-source distributed storage system that implements streams as a first-class primitive for storing and serving continuous, unbounded data [1]. A Pravega stream is a durable, elastic, append-only, unbounded sequence of bytes with a strong consistency model that guarantees data durability, message ordering, and exactly-once support. Because Pravega stores sequences of bytes rather than events or messages, it can also expose APIs that are not event-based.

A previous blog post [2] introduced the EventStream APIs of Pravega, which enable applications to append events to a stream. This post covers the ByteStreamClient APIs that a developer can use to write applications against Pravega. The ByteStreamClient API is useful when the data is large, has no clear message boundaries, or is indivisible. Its usage is explained with a simple example that appends large images to a Pravega stream and then displays the images by reading the stream.

Byte Stream Client APIs

Recap of EventStream APIs:

Using the EventStream APIs of Pravega, applications write individual events to a Pravega stream. These events are converted to and from ByteBuffers by serializers that the application provides, and this serialization is what lets the application read the events back individually. For more details and examples, please refer to the Pravega Client API 101 blog [3].

The ByteStreamWriter writes raw bytes directly to a stream. It does not frame the bytes, attach headers, or modify them in any way. Unlike with the EventStream APIs, the ByteStreamReader therefore cannot split the data back into the individual writes that produced it; if the application needs to split the data, it must implement its own framing logic. Also, because raw bytes are written directly, there is no notion of a routing key, and a byte stream always has exactly one segment.
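Because framing is left to the application, one common approach is to prefix each record with its length. The sketch below is a minimal, hypothetical helper (LengthPrefixFraming is not part of the Pravega API) that writes a 4-byte big-endian length followed by the payload, and reads records back the same way. It only uses plain java.io streams, so it can be handed a ByteStreamWriter (which extends java.io.OutputStream) and, assuming the reader is used as a java.io.InputStream as in the image example later in this post, a ByteStreamReader. End-of-stream handling is only exercised here with in-memory streams; how a live ByteStreamReader behaves at the current tail of a stream is not covered by this sketch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical application-level framing (not a Pravega API): each record is
 * a 4-byte big-endian length followed by exactly that many payload bytes.
 */
final class LengthPrefixFraming {
    private LengthPrefixFraming() {}

    /** Writes one framed record; flushing/closing `out` is left to the caller. */
    static void writeRecord(OutputStream out, byte[] payload) throws IOException {
        byte[] header = ByteBuffer.allocate(Integer.BYTES).putInt(payload.length).array();
        out.write(header);
        out.write(payload);
    }

    /** Reads one framed record, or returns null if the input ends cleanly before a new record. */
    static byte[] readRecord(InputStream in) throws IOException {
        byte[] header = in.readNBytes(Integer.BYTES);
        if (header.length == 0) {
            return null; // clean end of input between records
        }
        if (header.length < Integer.BYTES) {
            throw new EOFException("truncated length prefix");
        }
        int length = ByteBuffer.wrap(header).getInt();
        if (length < 0) {
            throw new IOException("negative record length: " + length);
        }
        byte[] payload = in.readNBytes(length);
        if (payload.length < length) {
            throw new EOFException("truncated record: expected " + length + " bytes, got " + payload.length);
        }
        return payload;
    }
}
```

In the sample, the writer side would call LengthPrefixFraming.writeRecord(byteWriter, bytes) and the reader side LengthPrefixFraming.readRecord(byteStreamReader). A length prefix is the simplest choice because the reader always knows how many bytes to consume next, without scanning for delimiters.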

A ByteStreamWriter is created from a ByteStreamClientFactory via createByteStreamWriter(streamName). The ByteStreamReader, which lets the application read raw bytes directly from a stream, is created from the same factory via createByteStreamReader(streamName).

Storing and processing Images using Byte Stream Client API of Pravega

With this background on the ByteStreamClient APIs, let's implement a simple application that stores images on a Pravega stream. The application also reads and processes the images on the stream and displays them.

Before jumping into the actual details of the application, we should list some of the concerns of using EventStream APIs for ingesting images and videos into Streams.

Is the image size too large to be stored as a single event on the stream? The current maximum event size supported by Pravega is 8MB [4].

Video can be thought of as another fast event stream in which every frame is an individual event. When ingesting video through the EventStream APIs, are frames (images) generated faster than the stream can ingest them as individual events?
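To make the first concern concrete, here is a hypothetical sketch (EventChunker is not a Pravega class) of the manual splitting an application would need if it used the EventStream APIs for payloads above the event size limit. It assumes the 8MB limit means 8 * 1024 * 1024 bytes. Each piece would also need its own ordering and identity metadata so that readers could reassemble it, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the chunking an event-per-image design would force. */
final class EventChunker {
    /** Assumed maximum event size: 8 MiB (8 * 1024 * 1024 bytes). */
    static final int MAX_EVENT_SIZE = 8 * 1024 * 1024;

    private EventChunker() {}

    /** Splits `data` into consecutive pieces of at most `maxSize` bytes each, preserving order. */
    static List<byte[]> split(byte[] data, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int from = 0;
        while (from < data.length) {
            // Compute the end without risking int overflow on very large arrays.
            int to = from + Math.min(maxSize, data.length - from);
            chunks.add(Arrays.copyOfRange(data, from, to));
            from = to;
        }
        return chunks;
    }
}
```

For example, a 20 MiB image would become three events of 8 MiB, 8 MiB, and 4 MiB, and the reader would have to put them back together. A byte stream avoids this bookkeeping because the image is never cut into events in the first place.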

Using the ByteStreamClient APIs is one way to address these concerns. The diagram below shows the basic functionality of our sample application.

Before running this application, let’s create a Pravega Stream using the StreamManager APIs.

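import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import lombok.Cleanup;
// Create scope and stream using the StreamManager APIs (controllerURI, scopeName, streamName defined elsewhere)
@Cleanup
StreamManager streamManager = StreamManager.create(controllerURI);
streamManager.createScope(scopeName);
// A byte stream always has exactly one segment, so make the fixed scaling policy explicit.
streamManager.createStream(scopeName, streamName, StreamConfiguration.builder().scalingPolicy(ScalingPolicy.fixed(1)).build());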

Once the stream has been created, let’s go ahead and create a ByteStreamWriter as shown in the snippet below.

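import io.pravega.client.ByteStreamClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.byteStream.ByteStreamWriter;
import lombok.Cleanup;
// Create a ByteStreamClientFactory for the scope.
@Cleanup
ByteStreamClientFactory bf = ByteStreamClientFactory.withScope(scopeName, ClientConfig.builder().controllerURI(controllerURI).build());
// Create a writer that appends raw bytes (images) to the Pravega Stream.
@Cleanup
ByteStreamWriter byteWriter = bf.createByteStreamWriter(streamName);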

The snippet below writes an image into the Pravega stream using this ByteStreamWriter.

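import java.awt.image.BufferedImage;
import java.io.File;
// Write an image into Pravega (getFilePath is defined in the sample code).
BufferedImage image = javax.imageio.ImageIO.read(new File(getFilePath("anatomy_of_log.jpg")));
javax.imageio.ImageIO.write(image, "jpg", byteWriter);
byteWriter.flush(); // blocks until the written bytes are durably persisted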

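The Pravega ByteStreamWriter is a java.io.OutputStream (it extends that abstract class). In this example, the write API of javax.imageio.ImageIO writes the image in JPEG format to this output stream. Because the writer appends raw bytes rather than a single framed event, the image is stored regardless of its size; the 8MB maximum event size does not apply. Calling flush() on the writer blocks until the written data has been durably persisted.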
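A side effect of going through ImageIO.read and ImageIO.write is that the JPEG is decoded and re-encoded, so the bytes stored in Pravega are a recompressed copy rather than the original file. Since ByteStreamWriter is an OutputStream, one possible alternative is to copy the file's bytes verbatim with java.nio.file.Files.copy(Path, OutputStream). The helper below (RawFileAppender is a hypothetical name) is a minimal sketch of that approach; in the sample you would pass byteWriter as out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: appends a file's bytes unchanged to any OutputStream. */
final class RawFileAppender {
    private RawFileAppender() {}

    /**
     * Copies every byte of `file` to `out` without decoding it, then flushes
     * `out` (for a ByteStreamWriter, flush waits for durable persistence).
     * Returns the number of bytes copied. Does not close `out`.
     */
    static long appendFile(Path file, OutputStream out) throws IOException {
        long copied = Files.copy(file, out);
        out.flush();
        return copied;
    }
}
```

Appending several files this way keeps them separable only if the file format (as with JPEG markers) or your own framing marks where each one ends.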

Next, let's read the images back using the ByteStreamReader. The snippet below creates a ByteStreamReader and uses it to read the image.

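import io.pravega.client.byteStream.ByteStreamReader;
import java.awt.image.BufferedImage;
import java.util.Iterator;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
// Read images from the Pravega Stream
@Cleanup
ByteStreamReader byteStreamReader = bf.createByteStreamReader(streamName);
@Cleanup
ImageInputStream in = javax.imageio.ImageIO.createImageInputStream(byteStreamReader);
// Get all registered ImageReaders that claim to be able to decode the image (JPG, PNG...)
Iterator<ImageReader> imageReaders = javax.imageio.ImageIO.getImageReaders(in);
ImageReader imageReader = imageReaders.next(); // throws NoSuchElementException if none match
imageReader.setInput(in);
BufferedImage firstImage = imageReader.read(0); // decode the first image in the stream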

Here, the javax.imageio.ImageReader supplies the framing logic: it can tell where one encoded image ends and the next begins in the byte stream, so it can read the images individually by index.
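The reading step can be extended to decode every image in the stream. The sketch below (ImageStreamDecoder is a hypothetical name) asks the matched ImageReader how many images the input holds and decodes each one by index. It assumes the input has already been fully written: getNumImages(true) may have to read through the whole input to count the images. In the sample, source would be the byteStreamReader.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: decodes every image found in a byte stream. */
final class ImageStreamDecoder {
    private ImageStreamDecoder() {}

    /** Decodes all images in `source`; closes the ImageIO wrapper but not `source` itself. */
    static List<BufferedImage> readAll(InputStream source) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        try (ImageInputStream in = ImageIO.createImageInputStream(source)) {
            if (in == null) {
                throw new IOException("no ImageInputStream available for this source");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("no registered ImageReader recognizes this data");
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly = false, so the reader may seek back after counting images.
                reader.setInput(in, false);
                int count = reader.getNumImages(true);
                for (int i = 0; i < count; i++) {
                    images.add(reader.read(i));
                }
            } finally {
                reader.dispose();
            }
        }
        return images;
    }
}
```

For JPEG input, the reader finds image boundaries from the markers inside each encoded image, which is why no extra application framing is needed in this particular case.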

The complete code of this example is available on GitHub:

https://github.com/pravega/blog-samples/tree/master/pravega-client-bytestream-api-101

Conclusion

The sample application above showed how to ingest and read raw bytes from a Pravega stream using the ByteStream APIs. As a next step, you can modify the sample application to write a large image file (for example, one larger than the 8MB maximum event size) into a Pravega stream and read it back. Of course, this is just the beginning.

Happy Hacking!

Acknowledgments

Special thanks to Flavio Junqueira, Derek Moore, Tom Kaitchuck, Claudio Fahey, and Ashish Batwara for their feedback, suggestions, and valuable insights to make this post better.

References:

[1] https://cncf.pravega.io/docs/latest/pravega-concepts/
[2] https://blog.pravega.io/2018/02/12/streams-in-and-out-of-pravega/
[3] https://blog.pravega.io/2020/09/22/pravega-client-api-101/
[4] https://github.com/pravega/pravega/blob/v0.8.0/client/src/main/java/io/pravega/client/stream/Serializer.java#L27