A Step-by-Step Guide to Data Quality Diagnostics

Is your data quality good enough? Find out in just a few clicks. DataClinic: The easiest way to inspect your AI training data.
Pebbly
May 15, 2026

When AI performance falls short of expectations, the bottleneck is rarely the model. More often, it is the data. If your dataset is imbalanced, redundant, or mislabeled, even the most advanced model is likely to produce unstable results.

DataClinic is a data quality diagnostic service that detects these hidden vulnerabilities early and provides a quantitative, objective evaluation.

What Does Data Quality Diagnostics Provide?

A DataClinic diagnostic is more than a simple statistical report.

It meticulously evaluates whether your training data is truly AI-Ready across the following dimensions:

  • Is the data volume sufficient?

  • Is the class distribution balanced?

  • Are there redundant images or corrupted files?

  • Is the label structure optimized for training?

  • Is the train/test split configured correctly?

Our automated diagnostic engine analyzes all of these factors and delivers actionable, human-readable results.
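To make two of these dimensions concrete, here is a minimal local pre-check you could run before requesting a diagnostic. It is only a sketch and not how the DataClinic engine works: it assumes the one-subfolder-per-class layout described later in this guide, and the function names `count_images_per_class` and `find_exact_duplicates` are hypothetical. It catches only byte-identical duplicates, not resized or re-encoded copies.

```python
import hashlib
from collections import Counter, defaultdict
from pathlib import Path

# Extensions listed in this guide; matching them case-insensitively is an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def _is_image(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS


def count_images_per_class(split_dir) -> Counter:
    """Count image files in each class subfolder of a split such as 'train'."""
    counts = Counter()
    for class_dir in Path(split_dir).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if _is_image(f))
    return counts


def find_exact_duplicates(split_dir) -> list:
    """Group image files whose bytes are identical (same SHA-256 digest)."""
    by_digest = defaultdict(list)
    for f in sorted(Path(split_dir).rglob("*")):
        if _is_image(f):
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            by_digest[digest].append(f)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A large gap between the biggest and smallest class counts, or any duplicate group, is worth a look before you spend credits on a diagnostic.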

Ready to check the health of your data? Here is a step-by-step guide on how to diagnose your data using DataClinic!

Currently, diagnostics are available exclusively for image data. We plan to expand support to multimodal datasets, including performance charts, video, and sensor data, in the future.

Requesting a Data Quality Diagnostic

1️⃣ Click [Request Diagnostic]

When you land on the DataClinic diagnostic page, the most prominent button you will see is [Request Diagnostic].

Data Quality Diagnosis Page

“Is it safe to just click it?” Yes! At this stage, no credits are deducted, and you are only reviewing the configuration. Feel free to click it without any pressure. (Button: Request Data Diagnostic)

2️⃣ Review the Process and Click [Continue]

Data quality diagnosis process

The next screen provides an at-a-glance overview of the DataClinic diagnostic workflow:

  • Level I · II · III Diagnostics

  • Comprehensive Evaluation & Improvement Suggestions

Take a quick look to understand the process, and then click the [Continue] button.

3️⃣ Check Your Available Credits and Click [Proceed]

Data Quality Diagnosis Credit

Here is a feature many users appreciate!

A pop-up will clearly display:

✔ Your currently available diagnostic credits

✔ How many credits will be consumed for this specific diagnostic

  • "Enough credits?" 👉 Click [Proceed]

  • "Not enough?" 👉 Recharge your credits and then proceed

Click [Proceed] to move to the next step.

4️⃣ Name Your Dataset

Now it's time for your data.

Write the diagnostic dataset name

In this step, simply assign a name to the dataset you wish to diagnose.

Quick naming tips:

  • Alphanumeric combinations recommended: e.g., AnimalFaceDataset_1

  • Underscores (_) and hyphens (-) are allowed.

  • Other special characters are not permitted.

  • Names can be edited later in [My Page].

📌 Since your final diagnostic report will be saved under this name, we recommend choosing something easily identifiable.
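If you generate dataset names in a script, the naming tips above can be checked with a short validator. This is a sketch under assumptions: it treats "alphanumeric" as ASCII letters and digits, and the function name `is_valid_dataset_name` is hypothetical. DataClinic may apply further rules, such as a length limit, that this guide does not state.

```python
import re

# Assumed rule from the tips above: ASCII letters, digits, underscores, and hyphens only.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the whole name consists only of the permitted characters."""
    return _DATASET_NAME.fullmatch(name) is not None
```

For example, `is_valid_dataset_name("AnimalFaceDataset_1")` returns `True`, while a name containing a space or an exclamation mark returns `False`.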

5️⃣ Organize Your Dataset Folder Structure

Upload data for quality diagnosis

For this step, you just need to match our baseline format. Please refer to the folder structure guidelines below:

Dataset form for diagnosis application

✔ Supported image extensions: .jpg, .png, .jpeg

✔ Both 'train' and 'test' folders are required.

✔ Class (label) names must be identical across both folders.

To ensure DataClinic can accurately analyze your data, it is crucial to strictly adhere to this basic structure!

  • 1. 'train' folder (mandatory diagnostic data): This folder contains your training images. Inside the 'train' folder, create a subfolder for each class (label), and place the corresponding image files (.jpg, .png, .jpeg) inside it.

  • 2. 'test' folder (reference for quality improvement): While not directly used for the primary diagnostic, this folder is analyzed to provide holistic data quality improvement strategies. Create class-specific subfolders and organize the images exactly as you did in the 'train' folder.
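The structure rules above can be checked locally before you compress anything. The following is a minimal sketch, not an official DataClinic tool: the function name `check_dataset_layout` is hypothetical, and matching extensions case-insensitively is an assumption, since this guide lists only lowercase extensions.

```python
from pathlib import Path

# Extensions listed in this guide; case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_dataset_layout(root) -> list:
    """Return a list of problems found; an empty list means the layout looks fine."""
    root = Path(root)
    problems = []
    classes = {}
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing required folder: {split}")
            continue
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        classes[split] = {d.name for d in class_dirs}
        for class_dir in class_dirs:
            for f in sorted(class_dir.iterdir()):
                if f.is_file() and f.suffix.lower() not in SUPPORTED_EXTENSIONS:
                    problems.append(f"unsupported file: {f.relative_to(root)}")
    # Class names must be identical across both folders.
    if len(classes) == 2 and classes["train"] != classes["test"]:
        mismatch = sorted(classes["train"] ^ classes["test"])
        problems.append(f"class folders differ between train and test: {mismatch}")
    return problems
```

Calling `check_dataset_layout("path/to/dataset")` on a folder that contains 'train' and 'test' returns an empty list when both splits exist, share the same class names, and hold only supported image files.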

6️⃣ Compress and Upload

Once your folders are organized, you are ready to upload.

Quality diagnosis data upload complete

  • File Compression: Compress both the 'train' and 'test' folders into a single .zip file.

  • Upload: Drag and drop the .zip file into the upload window, or click the [Upload File] button. Uploads of up to 1 TB are supported, so file size is rarely a constraint.

  • Continue: Once everything is ready, click the [Continue] button.

💡 Pro Tip! Confused about the data format? Click 'Download Sample Data' on the screen to review the template beforehand. It will make the process much clearer.
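If you prefer to build the archive from a script rather than your file manager, the compression step can be sketched with Python's standard `zipfile` module. The function name `zip_dataset` is hypothetical, and placing 'train' and 'test' at the top level of the archive is an assumption; compare the result against the sample data if you are unsure.

```python
import zipfile
from pathlib import Path


def zip_dataset(root, archive_path) -> Path:
    """Write the 'train' and 'test' folders under root into a single .zip file."""
    root = Path(root)
    archive_path = Path(archive_path)
    for split in ("train", "test"):
        if not (root / split).is_dir():
            raise FileNotFoundError(f"missing required folder: {root / split}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for split in ("train", "test"):
            for f in sorted((root / split).rglob("*")):
                if f.is_file():
                    # Store paths relative to root, e.g. "train/cat/a.jpg".
                    zf.write(f, arcname=f.relative_to(root).as_posix())
    return archive_path
```

Write the archive outside the dataset folder, for example `zip_dataset("my_dataset", "my_dataset.zip")`, so the .zip file does not end up inside itself. JPEG and PNG files are already compressed, so expect the archive to be close to the size of the original folders.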

7️⃣ Confirm Estimated Credit Usage

Information on Available Diagnostic Credits

This step outlines exactly how many images will be diagnosed and how many credits will be consumed. If everything looks good 👉 Click [Proceed].

8️⃣ Final Review and Click [Continue]

Final application for data quality diagnosis

This is the final step. From here, DataClinic takes over and runs the diagnostics automatically!

If you've made it this far, your diagnostic request is complete! 🎉

Data quality diagnosis application completed

To recap, requesting a DataClinic diagnostic boils down to three main phases. Much simpler than it sounds, right?

1️⃣ Name your dataset

2️⃣ Organize your folder structure

3️⃣ Compress and upload

If your AI performance is unstable, inspect your data first. Models don't lie. They only perform as well as the data you feed them.

Diagnose the health of your data right now with DataClinic. It is the first step to take before any AI training begins.
