Data Clinic Improvement Case Study
Introducing data improvement cases from Data Clinic. Resolve the data quality issues you are struggling with using three data improvement solutions: Data Bulk-Up, Data Diet, and Data Replica.
Pebblous DAL, <Pebble Design System, Grid #1>, Code Painting, Digital Media, Joo-Haeng Lee
Three Improvement Solutions for Data Clinic

Data Bulk-Up
Find where data is lacking and generate precisely targeted synthetic data suited to the dataset.

Data Diet
Remove duplicate and near-duplicate data to make the dataset lighter while preserving AI model performance.

Data Replica
To protect the original data and enable safe distribution, generate virtual data with similar statistical distribution characteristics.
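As a rough illustration of the Data Diet idea, the sketch below removes near-duplicates from a set of embedding vectors. It is a minimal example, not Data Clinic's actual method: the function name `drop_near_duplicates`, the cosine-similarity criterion, and the `threshold` value are all assumptions made for illustration.

```python
import numpy as np

def drop_near_duplicates(embeddings, threshold=0.95):
    """Greedy filter (hypothetical helper): keep a sample only if its cosine
    similarity to every already-kept sample is below `threshold`."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # unit-length rows
    kept = []
    for i in range(len(unit)):
        # Compare the candidate against all samples kept so far.
        if not kept or np.max(unit[kept] @ unit[i]) < threshold:
            kept.append(i)
    return kept

# Toy embeddings: rows 0 and 1 point almost the same way, as do rows 2 and 3.
points = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [0.0, 2.0]])
kept = drop_near_duplicates(points, threshold=0.95)
print(kept)
```

Whether model performance is actually preserved after such filtering has to be checked by retraining and evaluating on held-out data; the threshold is a tuning choice, not a fixed rule.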
Learn more about how to improve your data
Bulk up data with precision targeting
Pebblous uses its proprietary precision targeting technique to find areas where data is lacking and add the optimal volume of synthetic data to improve AI performance.
Data distribution in embedding space
Capturing boundary points between different clusters
Targeting
Boundary point targeting technique
Density targeting technique
Bulk up
Bulk up at the center of the boundary point
Bulk up around low density points
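The two targeting ideas above can be sketched in a few lines. Pebblous describes its technique as proprietary, so this is only a generic stand-in under stated assumptions: boundary points are taken to be samples with at least one nearest neighbor from a different cluster, and low-density points are those with the largest mean distance to their nearest neighbors. The function names and the parameters `k` and `fraction` are hypothetical.

```python
import numpy as np

def knn(x, k):
    """Return indices and distances of the k nearest neighbors of each row."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

def boundary_points(x, labels, k=3):
    """Indices of samples with at least one neighbor from another cluster."""
    idx, _ = knn(x, k)
    differs = labels[idx] != labels[:, None]
    return np.where(differs.any(axis=1))[0]

def low_density_points(x, k=3, fraction=0.2):
    """Indices of the sparsest samples, ranked by mean k-NN distance."""
    _, dist = knn(x, k)
    score = dist.mean(axis=1)  # larger score means a sparser neighborhood
    n = max(1, int(round(fraction * len(x))))
    return np.argsort(score)[::-1][:n]

# Toy embedding space: two clusters on a line, plus one isolated sample.
x = np.array([[0, 0], [1, 0], [2, 0], [3, 0],
              [4, 0], [5, 0], [6, 0], [12, 0]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

boundary = boundary_points(x, labels, k=2)
sparse = low_density_points(x, k=2, fraction=0.1)
print(boundary, sparse)
```

The returned indices only mark where to bulk up. Generating the synthetic samples themselves would need a separate generative step, which is not shown here.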
Case study: improving classification performance through Data Bulk-Up
In this example, an actual dataset was diagnosed and about 5% synthetic data was added using precision targeting. Performance improved by about 2% with no changes to the classification model or training process.
Original Birds-450 dataset

Precision-targeted synthetic data

Improving AI model performance with precision-targeted synthetic data

Resolve the quality issues found during data diagnosis with our improvement services!