This service provides a statistical analysis of datasets and dataset splits used in machine learning pipelines, with the objective of verifying their distributional consistency. It supports the validation of training, validation, and test splits by detecting statistically significant differences that could bias model training or invalidate performance evaluation. The analysis is conducted in a controlled and documented execution environment to ensure traceability and repeatability of results.
The service is particularly relevant for AI systems in the health domain, where dataset representativeness directly affects the reliability of evaluation, which in turn is critical for regulatory compliance and clinical trustworthiness. The analysis is performed using robust statistical techniques and is delivered as a structured, interpretable report. Service execution follows ISO 9001-certified processes.
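To illustrate what detecting a statistically significant difference between splits can look like, the following is a minimal sketch only. It is not the method used by the service, which is not specified here. It assumes a single numeric feature per split and swaps in a two-sample Kolmogorov-Smirnov statistic with a permutation-based p-value, using only the Python standard library. The function names (`ks_statistic`, `permutation_p_value`, `compare_splits`), the significance level `alpha=0.05`, and the synthetic data are all hypothetical choices made for this example.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def permutation_p_value(a, b, n_permutations=500, seed=0):
    """Share of random relabelings whose KS statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def compare_splits(splits, alpha=0.05, **kwargs):
    """Compare every pair of named splits for one numeric feature.

    Assumption: alpha is an illustrative threshold; no correction for
    multiple comparisons is applied in this sketch.
    """
    names = list(splits)
    results = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            p = permutation_p_value(splits[first], splits[second], **kwargs)
            results.append({
                "pair": (first, second),
                "ks": ks_statistic(splits[first], splits[second]),
                "p_value": p,
                "differs": p < alpha,
            })
    return results


if __name__ == "__main__":
    # Synthetic example: the test split is drawn from a shifted distribution.
    rng = random.Random(1)
    demo = {
        "train": [rng.gauss(60, 10) for _ in range(200)],
        "validation": [rng.gauss(60, 10) for _ in range(100)],
        "test": [rng.gauss(70, 10) for _ in range(100)],
    }
    for row in compare_splits(demo, n_permutations=200):
        print(row)
```

A real analysis would typically cover every feature (with a different test for categorical ones), correct for multiple comparisons, and record the environment and random seeds so that results remain traceable and repeatable.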
Service Inputs
Client datasets or predefined dataset splits (e.g. training / validation / test).
Service Outputs
Dataset analysis report (statistical differences measured by our method on the client's data, stating whether differences between datasets or splits were found).
Dependencies & Restrictions
The handling of personal data is subject to the GDPR. A Data Processing Agreement (DPA) must be signed between the parties before we can process the data.
Certification Support
AI Act
MDR
Service Standards
AI Act Art. 10
MDR
Comments
This service supports data quality assessment and risk mitigation in machine learning workflows, contributing to the development and evaluation of trustworthy AI systems. The resulting analysis report can be used as supporting evidence in the technical documentation required under the Medical Device Regulation (MDR). By detecting distributional inconsistencies across dataset splits, the service supports compliance with Article 10 (data quality obligations) of the EU AI Act. It also aligns with key principles of the EU AI Act for high-risk AI systems, notably those related to performance evaluation, transparency, and technical documentation.