Predictive Maintenance for IoT
Harnessing wind power at commercial scale requires a large number of wind turbines distributed across a wide area. Each wind turbine generates thousands of data points per second (e.g., temperature, rotation speed, wind direction, energy output). Collecting all of this data for historical and real-time analysis is necessary both to predict potential failures and to control power distribution networks that must manage the variable nature of renewable energy. Pravega can be a key component of a predictive maintenance solution for a wide variety of equipment, including wind turbines, machines on a factory floor, trains, computer hardware, and even roller coasters.
Benefits that Pravega Offers
- Consistent high performance for ingestion of both small and large events
- Transactions to guarantee that events are never duplicated, lost, or out of order
- Watermark functionality to ensure accurate windowed aggregations
- Durable, low-latency storage
- Same API for reading both real-time and historical data
- Connectors to stream processing engines such as Flink and Spark
- Automatic deletion of older events based on a retention policy
- Scalable data ingestion from a large number of sensors
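To illustrate the transaction guarantee listed above, here is a minimal Python sketch of all-or-nothing commit semantics. It does not use the real Pravega client API; the `TransactionalWriter` class is a hypothetical stand-in showing why a committed batch appears exactly once and in order, while an aborted batch is never visible to readers:

```python
class TransactionalWriter:
    """Toy stand-in for a transactional stream writer (not the Pravega API)."""

    def __init__(self):
        self.stream = []    # committed events, visible to readers
        self._pending = []  # events staged inside an open transaction

    def begin(self):
        self._pending = []

    def write(self, event):
        self._pending.append(event)  # staged, not yet visible

    def commit(self):
        # All staged events become visible atomically, in write order.
        self.stream.extend(self._pending)
        self._pending = []

    def abort(self):
        # None of the staged events ever become visible.
        self._pending = []


writer = TransactionalWriter()

writer.begin()
writer.write({"sensor": "turbine-1", "temp_c": 41.5})
writer.write({"sensor": "turbine-1", "temp_c": 41.7})
writer.commit()  # both events appear, exactly once, in order

writer.begin()
writer.write({"sensor": "turbine-1", "temp_c": 99.9})
writer.abort()   # discarded; readers never see a partial batch
```

Because a failed or abandoned transaction is simply aborted, a writer that retries after a crash cannot introduce duplicates or partial batches into the stream.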
Example Solution Architecture
- IoT Device
- Pravega Sensor Collector or a similar component collects sensor data from IoT devices and forwards it to the Pravega stream Raw Sensors.
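A collector component essentially runs a loop of the following shape. This is a hypothetical pure-Python approximation, not Pravega Sensor Collector's actual implementation: `read_sample` stands in for real device I/O, and the in-memory `raw_sensors` list stands in for the Raw Sensors stream. Using the device id as the routing key preserves per-device event ordering:

```python
import json


def read_sample(device_id):
    """Hypothetical stand-in for polling a real sensor (e.g., over Modbus or OPC UA)."""
    return {"device": device_id, "temp_c": 40.0, "rpm": 1500}


raw_sensors = []  # in-memory stand-in for the Pravega stream "Raw Sensors"


def collect_once(device_ids):
    # One collection pass: read each device, serialize the sample,
    # and append it to the stream as a (routing key, payload) pair.
    for device_id in device_ids:
        payload = json.dumps(read_sample(device_id)).encode("utf-8")
        raw_sensors.append((device_id, payload))


collect_once(["turbine-1", "turbine-2"])
```

In a real deployment this loop runs continuously, and the writer batches small events together, which is how consistently high ingestion performance is achieved for small sensor readings.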
- Edge Cluster
- A Spark streaming job reads from the Raw Sensors stream, performs inference, and writes inference results to a new Pravega stream, Clean Sensors with Inference. Note that Pravega provides connectors for both Spark and Flink, which can be used interchangeably throughout this solution.
- A Flink job aggregates events from the Clean Sensors with Inference stream and updates a database for serving dashboard requests.
- The database can be PostgreSQL, Elasticsearch, Pravega Search, or anything else supported by Flink.
- A web server provides a UI to view a dashboard. If using Elasticsearch or Pravega Search, this can be Kibana.
- A Flink job is used to continuously copy stream events from an edge cluster to a data center or cloud instance of Pravega.
- Pravega stream retention is configured to keep only a few days of data in the edge cluster, reducing storage requirements at the edge. Older events are automatically deleted.
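The Flink aggregation step above relies on event-time windows that are closed by watermarks. The following self-contained Python sketch (hypothetical; a real deployment would use the Flink DataStream API and the Pravega connector's watermark support) shows the idea: events carry their own timestamps, the watermark trails the largest timestamp seen, and a window is only emitted once the watermark passes its end, so moderately late or out-of-order events are still aggregated correctly:

```python
from collections import defaultdict

WINDOW_SIZE = 60       # seconds; tumbling event-time windows
ALLOWED_LATENESS = 10  # the watermark trails the max seen timestamp by this much

windows = defaultdict(list)  # window start -> buffered temperatures
emitted = {}                 # window start -> average, once the window closes
max_ts = 0


def process(event):
    """Assign an event to its tumbling window, then close any finished windows."""
    global max_ts
    start = event["ts"] - event["ts"] % WINDOW_SIZE
    windows[start].append(event["temp_c"])
    max_ts = max(max_ts, event["ts"])
    watermark = max_ts - ALLOWED_LATENESS
    # Emit every window whose end time the watermark has passed.
    for w in [s for s in list(windows) if s + WINDOW_SIZE <= watermark]:
        emitted[w] = sum(windows[w]) / len(windows[w])
        del windows[w]


for e in [{"ts": 5,  "temp_c": 40.0},
          {"ts": 50, "temp_c": 42.0},
          {"ts": 30, "temp_c": 41.0},   # out of order, still lands in [0, 60)
          {"ts": 75, "temp_c": 43.0}]:  # advances the watermark to 65, closing [0, 60)
    process(e)
```

Without watermarks, the out-of-order reading at `ts=30` would either be dropped or trigger a premature, incorrect average for the first window.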
- Data Center / Cloud
- A Flink job in the data center or cloud can aggregate events from multiple edge clusters and write to the Aggregated stream.
- A Spark batch job can train or retrain an AI model based on the long-term historical events in the Aggregated stream. The new model can then be deployed as a new Spark AI Inference job in the edge cluster.
- By default, all stream data is stored on Long-Term Storage (HDFS, NFS, or Dell ECS S3) indefinitely. Retention policies can be applied as needed to delete older events.
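As a sketch of the retraining step, the batch job below scans long-term history from the Aggregated stream and derives a simple per-device anomaly threshold. This is a hypothetical pure-Python stand-in for a Spark batch job, and the mean-plus-three-sigma "model" is a placeholder for whatever ML library a real solution would use; the point is the flow: read historical events, fit a model, and ship the result to the edge inference job:

```python
from statistics import mean, stdev

# Stand-in for long-term history read from the "Aggregated" stream.
history = [
    {"device": "turbine-1", "temp_c": t}
    for t in [40.0, 41.0, 39.5, 40.5, 42.0, 40.2]
]


def train_threshold(events, sigmas=3.0):
    """Derive a per-device alert threshold: mean + sigmas * stddev of temperature."""
    by_device = {}
    for e in events:
        by_device.setdefault(e["device"], []).append(e["temp_c"])
    return {dev: mean(vals) + sigmas * stdev(vals)
            for dev, vals in by_device.items()}


model = train_threshold(history)
# `model` maps each device to its alert threshold; these values would be
# deployed to the Spark AI Inference job running in the edge cluster.
```

Because the same Pravega read API serves both real-time and historical data, the training job reads the full history with the same connector the streaming jobs use.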