Edge-based Data Profiling and Repair as a Service for IoT
Tverdal, Simeon; Goknil, Arda; Nguyen, Phu Hong; Husom, Erik Johannes; Sen, Sagar; Ruh, Jan; Flamigni, Francesca
Chapter
Published version
Date
2024
Collections
- Publikasjoner fra CRIStin - SINTEF AS [5850]
- SINTEF Digital [2523]
Original version
IoT '23: Proceedings of the 13th International Conference on the Internet of Things. 2024, 17-24. 10.1145/3627050.3627065
Abstract
With the proliferation of IoT devices and the consequent exponential growth in data generation, ensuring data quality has become a critical challenge in IoT applications. Erroneous data can significantly impact the reliability and effectiveness of decision-making processes and downstream analytics. Leveraging the computational abilities of edge devices enables data profiling and repair tasks at the edge, allowing for immediate remediation of erroneous data within the data stream and improved scalability through distributed repair across multiple edge devices. Cloud-based data profiling and repair methods have been extensively researched, but limited computational resources constrain their applicability at edge/fog devices. To overcome this limitation and enhance generalizability, Machine Learning (ML) offers a promising solution, allowing sensor substitution, missing value prediction, and corrupt data replacement. ML-based data repair techniques can be flexibly deployed at the edge using containerized repair services for real-time data repair. In this paper, we propose and assess EDPRaaS (Edge-based Data Profiling and Repair as a Service), a novel approach designed for efficient data quality profiling and repair in IoT environments. EDPRaaS incorporates an ML-based data repair component, enabling real-time data repair at the edge. It leverages pandas profiling and Great Expectations tools for data profiling, providing comprehensive insights into the dataset and detecting data quality issues.
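The profile-then-repair idea described in the abstract can be illustrated with a minimal sketch. This is not the EDPRaaS implementation: the abstract names pandas profiling and Great Expectations for profiling and an unspecified ML model for repair, whereas the sketch below substitutes a plain range check and a least-squares line fitted against a second, correlated sensor. The function names (`profile`, `fit_line`, `repair`), the range bounds, and the assumption that the reference sensor has no missing values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def profile(readings, lo, hi):
    """Return indices of readings that are missing (None) or outside [lo, hi]."""
    return [i for i, v in enumerate(readings)
            if v is None or not (lo <= v <= hi)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b (assumes xs are not all equal)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def repair(target, reference, lo, hi):
    """Replace flagged target readings with values predicted from the reference sensor.

    Assumption: `reference` is a complete, trusted series of the same length.
    """
    bad = set(profile(target, lo, hi))
    # Train only on positions where the target reading passed the profile check.
    good = [(r, t) for i, (r, t) in enumerate(zip(reference, target))
            if i not in bad]
    a, b = fit_line([r for r, _ in good], [t for _, t in good])
    return [a * reference[i] + b if i in bad else v
            for i, v in enumerate(target)]


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical trusted sensor
    target = [2.0, 4.0, None, 8.0, 999.0]    # one missing value, one corrupt value
    print(profile(target, 0.0, 100.0))
    print(repair(target, reference, 0.0, 100.0))
```

In a setting like the one the abstract describes, a function such as `repair` would presumably sit inside a containerized service on the edge device and be applied to each incoming window of the data stream; the linear model here stands in for whatever learned model is actually deployed.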