Predictive Maintenance for IoT
Harnessing wind power at commercial scale requires a large number of wind turbines distributed across a wide area. Each wind turbine generates thousands of data points per second (e.g., temperature, rotation speed, wind direction, energy output). Collecting all of this data for historical and real-time analysis is necessary both to predict potential failures and to control power distribution networks that must manage the variable nature of renewable energy. Pravega can be a key component of a predictive maintenance solution for a wide variety of equipment, including wind turbines, machines on a factory floor, trains, computer hardware, and even roller coasters.
Benefits that Pravega Offers
- Consistent high performance for ingestion of both small and large events
- Transactions to guarantee that events are never duplicated, lost, or out of order
- Watermark functionality to ensure accurate windowed aggregations
- Durable, low-latency storage
- Same API for reading both real-time and historical data
- Connectors to stream processing engines such as Flink and Spark
- Automatic deletion of older events based on a retention policy
- Scalable data ingestion from a large number of sensors
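To illustrate the transaction guarantee listed above, here is a minimal Python sketch of all-or-nothing commit semantics. It does not use the real Pravega client API; the `TransactionalWriter` class is a hypothetical stand-in showing why a committed batch appears exactly once and in order, while an aborted batch is never visible to readers:

```python
class TransactionalWriter:
    """Toy stand-in for a transactional stream writer (not the Pravega API)."""

    def __init__(self):
        self.stream = []    # committed events, visible to readers
        self._pending = []  # events staged inside an open transaction

    def begin(self):
        self._pending = []

    def write(self, event):
        self._pending.append(event)  # staged, not yet visible

    def commit(self):
        # All staged events become visible atomically, in write order.
        self.stream.extend(self._pending)
        self._pending = []

    def abort(self):
        # None of the staged events ever become visible.
        self._pending = []


writer = TransactionalWriter()

writer.begin()
writer.write({"sensor": "turbine-1", "temp_c": 41.5})
writer.write({"sensor": "turbine-1", "temp_c": 41.7})
writer.commit()  # both events appear, exactly once, in order

writer.begin()
writer.write({"sensor": "turbine-1", "temp_c": 99.9})
writer.abort()   # discarded; readers never see a partial batch
```

Because a failed or abandoned transaction is simply aborted, a writer that retries after a crash cannot introduce duplicates or partial batches into the stream.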
Example Solution Architecture
- IoT Device
- Pravega Sensor Collector or a similar component collects sensor data from IoT devices and forwards it to the Pravega stream Raw Sensors.
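A collector component essentially runs a loop of the following shape. This is a hypothetical pure-Python approximation, not Pravega Sensor Collector's actual implementation: `read_sample` stands in for real device I/O, and the in-memory `raw_sensors` list stands in for the Raw Sensors stream. Using the device id as the routing key preserves per-device event ordering:

```python
import json


def read_sample(device_id):
    """Hypothetical stand-in for polling a real sensor (e.g., over Modbus or OPC UA)."""
    return {"device": device_id, "temp_c": 40.0, "rpm": 1500}


raw_sensors = []  # in-memory stand-in for the Pravega stream "Raw Sensors"


def collect_once(device_ids):
    # One collection pass: read each device, serialize the sample,
    # and append it to the stream as a (routing key, payload) pair.
    for device_id in device_ids:
        payload = json.dumps(read_sample(device_id)).encode("utf-8")
        raw_sensors.append((device_id, payload))


collect_once(["turbine-1", "turbine-2"])
```

In a real deployment this loop runs continuously, and the writer batches small events together, which is how consistently high ingestion performance is achieved for small sensor readings.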
- Edge Cluster
- A Spark streaming job reads from the Raw Sensors stream, performs inference, and writes inference results to a new Pravega stream, Clean Sensors with Inference. Note that Pravega provides connectors for both Spark and Flink, which can be used interchangeably throughout this solution.
- A Flink job aggregates events from the Clean Sensors with Inference stream and updates a database for serving dashboard requests.
- The database can be PostgreSQL, Elasticsearch, Pravega Search, or anything else supported by Flink.
- A web server provides a UI to view a dashboard. If using Elasticsearch or Pravega Search, this can be Kibana.
- A Flink job is used to continuously copy stream events from an edge cluster to a data center or cloud instance of Pravega.
- Pravega stream retention is configured to keep only a few days of data in the edge cluster, reducing storage requirements at the edge. Older events are automatically deleted.
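The Flink aggregation step above relies on event-time windows that are closed by watermarks. The following self-contained Python sketch (hypothetical; a real deployment would use the Flink DataStream API and the Pravega connector's watermark support) shows the idea: events carry their own timestamps, the watermark trails the largest timestamp seen, and a window is only emitted once the watermark passes its end, so moderately late or out-of-order events are still aggregated correctly:

```python
from collections import defaultdict

WINDOW_SIZE = 60       # seconds; tumbling event-time windows
ALLOWED_LATENESS = 10  # the watermark trails the max seen timestamp by this much

windows = defaultdict(list)  # window start -> buffered temperatures
emitted = {}                 # window start -> average, once the window closes
max_ts = 0


def process(event):
    """Assign an event to its tumbling window, then close any finished windows."""
    global max_ts
    start = event["ts"] - event["ts"] % WINDOW_SIZE
    windows[start].append(event["temp_c"])
    max_ts = max(max_ts, event["ts"])
    watermark = max_ts - ALLOWED_LATENESS
    # Emit every window whose end time the watermark has passed.
    for w in [s for s in list(windows) if s + WINDOW_SIZE <= watermark]:
        emitted[w] = sum(windows[w]) / len(windows[w])
        del windows[w]


for e in [{"ts": 5,  "temp_c": 40.0},
          {"ts": 50, "temp_c": 42.0},
          {"ts": 30, "temp_c": 41.0},   # out of order, still lands in [0, 60)
          {"ts": 75, "temp_c": 43.0}]:  # advances the watermark to 65, closing [0, 60)
    process(e)
```

Without watermarks, the out-of-order reading at `ts=30` would either be dropped or trigger a premature, incorrect average for the first window.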
- Data Center / Cloud
- A Flink job in the data center or cloud can aggregate events from multiple edge clusters and write to the Aggregated stream.
- A Spark batch job can train or retrain an AI model based on the long-term historical events in the Aggregated stream. The new model can then be deployed as a new Spark AI Inference job in the edge cluster.
- By default, all stream data is stored on Long-Term Storage (HDFS, NFS, or Dell ECS S3) indefinitely. Retention policies can be applied as needed to delete older events.
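As a sketch of the retraining step, the batch job below scans long-term history from the Aggregated stream and derives a simple per-device anomaly threshold. This is a hypothetical pure-Python stand-in for a Spark batch job, and the mean-plus-three-sigma "model" is a placeholder for whatever ML library a real solution would use; the point is the flow: read historical events, fit a model, and ship the result to the edge inference job:

```python
from statistics import mean, stdev

# Stand-in for long-term history read from the "Aggregated" stream.
history = [
    {"device": "turbine-1", "temp_c": t}
    for t in [40.0, 41.0, 39.5, 40.5, 42.0, 40.2]
]


def train_threshold(events, sigmas=3.0):
    """Derive a per-device alert threshold: mean + sigmas * stddev of temperature."""
    by_device = {}
    for e in events:
        by_device.setdefault(e["device"], []).append(e["temp_c"])
    return {dev: mean(vals) + sigmas * stdev(vals)
            for dev, vals in by_device.items()}


model = train_threshold(history)
# `model` maps each device to its alert threshold; these values would be
# deployed to the Spark AI Inference job running in the edge cluster.
```

Because the same Pravega read API serves both real-time and historical data, the training job reads the full history with the same connector the streaming jobs use.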