Deploying Pravega on Kubernetes 101¶
We show you how to deploy your first Pravega cluster on Kubernetes. We provide a step-by-step guide to deploy Pravega on both Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS). Our goal is to keep things as simple as possible and, at the same time, provide you with valuable insights on the services that form a Pravega cluster and the operators we developed to deploy them.
Creating and Setting Up the Kubernetes Cluster¶
First, we need to create the Kubernetes cluster on which to deploy Pravega. As a prerequisite, we assume that you have an account with at least one of the cloud providers mentioned above. If you already have an account for Google Cloud and/or AWS, then it is time to create a Kubernetes cluster for Pravega.
Creating a Kubernetes cluster in GKE is straightforward. The defaults are generally enough for running a demo Pravega cluster, but we suggest a couple of setting changes before deploying Pravega:
- Go to the `Kubernetes Engine` drop-down menu and select the `Clusters > Create Cluster` option.
- Pick a name for your Kubernetes cluster.
- As an important point, in the `Master version` section you should select Kubernetes version 1.15 or higher. The reason is that we are going to exercise the latest Pravega and Bookkeeper Operators, which require Kubernetes 1.15+.
- Also, as the Pravega cluster consists of several services, we need to select a slightly larger node flavor compared to the default one. Thus, go to `default-pool > Nodes > Machine type`, select `n1-standard-4` nodes (4 vCPUs, 15GB of RAM), and select 4 nodes instead of the default 3. Note that this deployment is still accessible with the trial account.
- Press the `Create` button, and that’s it.
Note that we use the Cloud Shell provided by GKE to deploy Pravega from the browser itself, without installing any CLI locally (but feel free to use the Google Cloud CLI instead).
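If you prefer to script the cluster creation instead of using the console, a roughly equivalent cluster can be created with the gcloud CLI. The sketch below uses placeholder values for the cluster name and zone, and you should pick a concrete 1.15.x version available in your zone:

```
# Placeholder name and zone; list available 1.15.x versions with
# "gcloud container get-server-config" and use one for --cluster-version.
gcloud container clusters create pravega-gke \
  --zone us-central1-a \
  --cluster-version "1.15" \
  --machine-type n1-standard-4 \
  --num-nodes 4
```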
The Pravega and Bookkeeper Operators also require elevated privileges in order to watch for custom resources. For this reason, in GKE you first need to grant those permissions by executing:
```
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value core/account)
```
In the case of AWS, we are going to use the EKS CLI (eksctl), which automates and simplifies different aspects of the cluster creation and configuration (e.g., VPC, subnets, etc.). You will need to install and configure the EKS CLI before proceeding with the cluster creation.
Once the EKS CLI is installed, we need just one command to create an EKS cluster:
```
eksctl create cluster \
  --name pravega-eks \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.xlarge \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4 \
  --ssh-access \
  --ssh-public-key ~/.ssh/pravega_aws.pub \
  --managed
```
Similar to the GKE case, the previous command uses a larger node type compared to the default one (`--node-type t3.xlarge`). Note that the `--ssh-public-key` parameter expects a public key, generated when installing the AWS CLI, that is used to securely connect to your cluster (for more info, please read this document).
Also, take into account that the region for the EKS cluster should match the configured region in your AWS CLI.
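If you are unsure, one quick way to print the region your AWS CLI is currently configured with (so that it matches the `--region` flag above) is:

```
aws configure get region
```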
Now, we are ready to prepare our Kubernetes cluster for the installation of Pravega.
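The following steps use the Helm 3 client. If you do not have it installed yet, one common way to get it is the official installer script (shown here as a convenience; check the Helm documentation for the recommended method on your platform):

```
curl -fsSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
```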
Once you install the Helm client, you just need to add the public charts we provide to deploy a Pravega cluster:
```
helm repo add pravega https://charts.pravega.io
helm repo update
```
Webhook conversion and Cert-Manager¶
The most recent versions of the Pravega Operator rely on the new Webhook Conversion feature, which has been in beta since Kubernetes 1.15. For this reason, Cert-Manager or some other certificate management solution must be deployed to manage webhook service certificates. To install Cert-Manager, just execute this command:
```
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.14.2/cert-manager.yaml
```
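Before moving on, it is a good idea to check that the Cert-Manager pods reach the Running state (they are created in the cert-manager namespace):

```
kubectl get pods -n cert-manager
```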
Next, we show you step by step how to deploy Pravega, which involves the deployment of Apache Zookeeper, Bookkeeper (journal), and Pravega (as well as their respective Operators). Also, given that Pravega moves "cold" data to what we call long-term storage (a.k.a. Tier 2), we need to instantiate a storage backend for that purpose.
Apache Zookeeper is a distributed system that provides reliable coordination services, such as consensus and group management. Pravega uses Zookeeper to store specific pieces of metadata as well as to offer a consistent view of data structures used by multiple service instances.
As part of the Pravega project, we have developed a Zookeeper Operator to manage the deployment of Zookeeper clusters in Kubernetes. Thus, deploying the Zookeeper Operator is the first step to deploy Zookeeper:
```
helm install zookeeper-operator pravega/zookeeper-operator --version=0.2.8
```
With the Zookeeper Operator up and running, the next step is to deploy Zookeeper. We can do so with the helm chart we published for Zookeeper:
```
helm install zookeeper pravega/zookeeper --version=0.2.8
```
This chart instantiates a Zookeeper cluster made of 3 instances, each with its own Persistent Volume Claim (PVC) of 20GB of storage, which is enough for a demo Pravega cluster.
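You can verify that the Zookeeper PVCs have been created (and eventually bound) with:

```
kubectl get pvc
```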
Once the previous command has been executed, you can see both Zookeeper Operator and Zookeeper running in the cluster:
```
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
zookeeper-0                           1/1     Running   0          3m46s
zookeeper-1                           1/1     Running   0          3m6s
zookeeper-2                           1/1     Running   0          2m25s
zookeeper-operator-6b9759bbcb-9j25s   1/1     Running   0          4m
```
Apache Bookkeeper is a distributed and reliable storage system that provides a distributed log abstraction. Bookkeeper excels at low-latency, append-only writes. This is the reason why Pravega uses Bookkeeper for journaling: Pravega writes data to Bookkeeper, which provides low-latency, persistent, and replicated storage for stream appends. Pravega uses the data in Bookkeeper to recover from failures, and that data is truncated once it is flushed to tiered long-term storage.
As in the case of Zookeeper, we have also developed a Bookkeeper Operator to manage the lifecycle of Bookkeeper clusters deployed in Kubernetes. Thus, the next step is to deploy the Bookkeeper Operator:
```
helm install bookkeeper-operator pravega/bookkeeper-operator --version=0.1.2
```
Once running, we can proceed to deploy Bookkeeper. In this case, we will use the Helm chart publicly available to quickly spin up a Bookkeeper cluster:
```
helm install bookkeeper pravega/bookkeeper --version=0.7.1
```
As a result, you can see below both Zookeeper and Bookkeeper up and running:
```
$ kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
bookkeeper-operator-85568f8949-d652z   1/1     Running   0          4m10s
bookkeeper-pravega-bk-bookie-0         1/1     Running   0          2m10s
bookkeeper-pravega-bk-bookie-1         1/1     Running   0          2m10s
bookkeeper-pravega-bk-bookie-2         1/1     Running   0          2m10s
zookeeper-0                            1/1     Running   0          8m59s
zookeeper-1                            1/1     Running   0          8m19s
zookeeper-2                            1/1     Running   0          7m38s
zookeeper-operator-6b9759bbcb-9j25s    1/1     Running   0          9m13s
```
We mentioned before that Pravega automatically moves data to Long-Term Storage (or Tier 2). This feature is very interesting, because it positions Pravega in a "sweet spot" in the latency vs throughput trade-off: Pravega achieves low latency writes by using Bookkeeper for appends. At the same time, it also provides high throughput reads when accessing historical data.
As our goal is to keep things as simple as possible, we deploy a simple storage option: the NFS Server provisioner. With such a provisioner, we have a pod that acts as an NFS Server for Pravega. To deploy it, execute the following command:
```
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm install stable/nfs-server-provisioner --generate-name
```
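The PVC we create below expects a storage class named nfs, which the NFS Server provisioner chart should register by default. You can list the storage classes available in your cluster to confirm it is there:

```
kubectl get storageclass
```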
Once the NFS Server provisioner is up and running, Pravega will require a PVC for long-term storage pointing to the NFS Server provisioner that we have just deployed. To create the PVC, you can just copy the following manifest into a file named tier2_pvc.yaml:
```
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pravega-tier2
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```
And create the PVC for long-term storage as follows:
```
kubectl apply -f tier2_pvc.yaml
```
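You can then check that the claim has been created and bound:

```
kubectl get pvc pravega-tier2
```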
As you may notice, the long-term storage option suggested in this post is for demo purposes only, just to keep things simple. If you really want to have a real Pravega cluster running in the cloud, then we suggest using actual storage services such as Filestore in GKE and EFS in AWS. There are instructions on how to deploy production long-term storage options in the Pravega Operator documentation.
We are almost there! The last step is to deploy the Pravega Operator and Pravega, pretty much as we have already done for Zookeeper and Bookkeeper. As usual, we first need to deploy the Pravega Operator (and its required certificate) as follows:
```
git clone https://github.com/pravega/pravega-operator
kubectl create -f pravega-operator/deploy/certificate.yaml
helm install pravega-operator pravega/pravega-operator --version=0.5.1
```
Once the operator is deployed, we can install Pravega with the publicly available default Helm chart as follows:
```
helm install pravega pravega/pravega --version=0.8.0
```
That's it! Once this command gets executed, you will have your first Pravega cluster up and running:
```
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
bookkeeper-operator-85568f8949-d652z         1/1     Running   0          11m
bookkeeper-pravega-bk-bookie-0               1/1     Running   0          9m6s
bookkeeper-pravega-bk-bookie-1               1/1     Running   0          9m6s
bookkeeper-pravega-bk-bookie-2               1/1     Running   0          9m6s
nfs-server-provisioner-1592297085-0          1/1     Running   0          5m26s
pravega-operator-6c6d9db459-mpjr4            1/1     Running   0          4m19s
pravega-pravega-controller-5b447c85b-t8jsx   1/1     Running   0          2m56s
pravega-pravega-segment-store-0              1/1     Running   0          2m56s
zookeeper-0                                  1/1     Running   0          15m
zookeeper-1                                  1/1     Running   0          15m
zookeeper-2                                  1/1     Running   0          14m
zookeeper-operator-6b9759bbcb-9j25s          1/1     Running   0          16m
```
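Besides the pods, you may also want to list the services to locate the Pravega Controller endpoint, which is the address that client applications (such as the samples below) connect to:

```
kubectl get svc
```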
Executing a Sample Application¶
Finally, we would like to help you exercise the Pravega cluster you just deployed. Let’s deploy a pod in our Kubernetes cluster to run samples and applications, like the one we propose in the manifest below (save it as test-pod.yaml):
```
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
    - name: test-pod
      image: ubuntu:18.04
      args: [bash, -c, 'for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 100; done']
```
You can directly use this manifest and create your Ubuntu 18.04 pod as follows:
```
kubectl create -f test-pod.yaml
```
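Optionally, you can wait for the pod to report Ready before logging in:

```
kubectl wait --for=condition=Ready pod/test-pod --timeout=120s
```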
Once the pod is up and running, we suggest logging into the pod and building the Pravega samples to interact with the Pravega cluster by executing the following commands:
```
kubectl exec -it test-pod -- /bin/bash
apt-get update
apt-get -y install git-core openjdk-8-jdk
git clone -b r0.8 https://github.com/pravega/pravega-samples
cd pravega-samples
./gradlew installDist
```
With this, we can go to the location where the Pravega samples executable files have been generated and execute one of them, making sure that we point to the Pravega Controller service:
```
cd pravega-client-examples/build/install/pravega-client-examples/
bin/consoleWriter -u tcp://pravega-pravega-controller:9090
```
That’s it, you have executed your first sample against the Pravega cluster! With the consoleWriter you will be able to write regular events or transactions to Pravega. We also encourage you to run the consoleReader in another terminal, so you will see how events are written and read at the same time (for more info, see the Pravega samples documentation).
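For instance, assuming the same test pod, a second terminal session could build on the commands above and start the reader (the flags below simply mirror the consoleWriter invocation; check the samples documentation for all available options):

```
kubectl exec -it test-pod -- /bin/bash
cd pravega-samples/pravega-client-examples/build/install/pravega-client-examples/
bin/consoleReader -u tcp://pravega-pravega-controller:9090
```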
There are many other interesting samples for Pravega in the repository, so please be curious and try them out.
What is next?¶
This guide (also available in this blog post) provides a high-level overview of how to deploy Pravega on Kubernetes. But there is much more to learn! We suggest continuing to explore Pravega with the following documents: