Presentation + Paper
7 June 2024
Mind the (domain) gap: metrics for the differences in synthetic and real data distributions
Abstract
Synthetic data are frequently used to supplement a small set of real images and create a dataset with diverse features, but this may not improve the equivariance of a computer vision model. Our work addresses two questions. First, what metrics are useful for measuring a domain gap between real and synthetic data distributions? Second, is there an effective method for bridging an observed domain gap? We explore these questions by presenting a pathological case in which the inclusion of synthetic data did not improve model performance, and then measuring the difference between the real and synthetic distributions in the image space, the latent space, and the model prediction space. We find that applying pixel-level augmentation to the dataset effectively reduces the observed domain gap and improves the model F1 score to 0.95, compared to 0.43 for un-augmented data. We also observe that an increase in the average cross entropy of the latent space feature vectors is positively correlated with increased model equivariance and with the closing of the domain gap. We explain these results using a framework of model regularization effects.
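The abstract names the average cross entropy of latent space feature vectors as a signal tied to the domain gap, but does not give its exact definition. The sketch below is one possible reading, under stated assumptions: each feature vector is turned into a probability distribution with a softmax, and the cross entropy H(p, q) = -Σ p_i log q_i is averaged over every (real, synthetic) pair. The function names `softmax`, `cross_entropy`, and `average_cross_entropy` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def softmax(v: Sequence[float]) -> list[float]:
    # Assumption: feature vectors are mapped to distributions via softmax.
    # Subtract the max before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0).
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


def average_cross_entropy(real_feats: Sequence[Sequence[float]],
                          synth_feats: Sequence[Sequence[float]]) -> float:
    # Hypothetical metric: mean H(softmax(r), softmax(s)) over all
    # pairs of real feature vectors r and synthetic feature vectors s.
    total = 0.0
    count = 0
    for r in real_feats:
        p = softmax(r)
        for s in synth_feats:
            total += cross_entropy(p, softmax(s))
            count += 1
    return total / count


if __name__ == "__main__":
    real = [[1.0, 2.0, 3.0]]
    print(average_cross_entropy(real, [[1.0, 2.0, 3.0]]))  # matched features
    print(average_cross_entropy(real, [[3.0, 2.0, 1.0]]))  # mismatched features
```

For a fixed real distribution p, H(p, q) is smallest when q equals p, so mismatched synthetic features yield a larger value than matched ones. How the authors aggregate over samples (pairwise, per class, or against a mean embedding) is not stated in the abstract; the pairwise average here is only an illustrative choice.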
Conference Presentation
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Ashley S. Dale, William R. Reindl, Edwin Sanchez, Albert William, and Lauren Christopher "Mind the (domain) gap: metrics for the differences in synthetic and real data distributions", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350Z (7 June 2024); https://doi.org/10.1117/12.3015657
KEYWORDS
Data modeling
Education and training
Performance modeling
Principal component analysis
Binary data
Visual process modeling
Interpolation