Pebblous contributes to sustainable artificial intelligence through 'Green Data' and 'Data Greenhouse'.
Data quality and the market problems it causes
Ambiguous data pricing, poor model performance, wasted GPU resources, and an inadequate response to tightening AI regulations
Problem 01

Persistent and diverse data quality issues
Complex quality issues spanning the syntax, semantics, and distribution of data
Chronic data shortages and misdirected data collection
Problem 02

AI performance degradation and wasted GPU resources
AI performance degradation due to low data quality
GPU and energy waste due to model-centric iterative training
Problem 03

Tightening AI regulations and a growing need for governance
A growing need for AI industry regulation and data governance
Increasing demand for AI-ready data (Gartner link)
Pebblous solution

Solution 01
Green Data* that satisfies AI suitability
High-efficiency, eco-friendly data that meets AI suitability criteria, provided through data quality assessment and optimization

Solution 02
Data Greenhouse* SaaS solution
A data management solution for next-generation governance that continuously evaluates and improves the AI suitability of data.
Green data generation through semantics-based data operations
An effective semantics-based data management system
Response to AI industry regulations
*What is Green Data?
High-efficiency, eco-friendly data that meets AI suitability criteria through data quality assessment and optimization
*What is Data Greenhouse?
A data governance solution that continuously manages green data
Key features of Data Clinic

Supports various data formats (modalities)
Text, images, time series, structured DBs, formulas, and multimodal data. Quality diagnosis and improvement are possible regardless of data format.

Applicable across industries and work domains
Data Clinic is being applied in industries such as mobility, defense, sports, metaverse, resource recycling, manufacturing/logistics, pharmaceuticals/medical, fashion, and finance.

Efficient synthetic data generation through precision targeting
Synthetic data, too, is about quality rather than quantity. Pebblous accurately identifies where data is lacking based on the Data Clinic diagnosis and adds just the necessary amount of synthetic data.
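
The idea of adding only as much synthetic data as a diagnosis calls for can be illustrated with a minimal sketch. It assumes, purely for illustration, that "where data is lacking" is measured as per-class sample counts falling below a target; the actual Data Clinic diagnosis is not described on this page and is likely richer. The function name `synthetic_targets` and the parameter `min_per_class` are hypothetical.

```python
from collections import Counter

def synthetic_targets(labels, min_per_class):
    """Return {class: number of synthetic samples to add} for sparse classes.

    Hypothetical criterion: a class is "lacking" if it has fewer than
    min_per_class samples. Classes at or above the target get nothing.
    """
    counts = Counter(labels)
    return {
        cls: min_per_class - n
        for cls, n in counts.items()
        if n < min_per_class
    }

# Made-up label list, not taken from any real dataset.
labels = ["sparrow"] * 120 + ["heron"] * 35 + ["kiwi"] * 8
plan = synthetic_targets(labels, min_per_class=50)
print(plan)  # {'heron': 15, 'kiwi': 42}
```

Only the two sparse classes receive synthetic samples, and each receives just enough to reach the target, which is the "quality rather than quantity" point in miniature.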

Data Clinic also applies to structured data in DBs
Important information such as financial and medical records is stored in structured form in DBs. For structured data, Data Clinic provides diagnosis as well as replica data and combination evaluation.

Manage AI training data efficiently
The Data Diet feature of Data Clinic slims datasets down effectively. It optimizes the volume of AI training data by removing unnecessary data and maximizes resource efficiency.
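
As a rough illustration of slimming a dataset by removing redundant samples, the sketch below greedily drops near-duplicates. It assumes that each sample already has an embedding vector and that "unnecessary" means "almost identical to a sample already kept"; both are assumptions, as are the name `data_diet` and the `threshold` value. This is not a description of how Data Clinic itself selects data.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def data_diet(embeddings, threshold=0.98):
    """Return indices of samples kept after dropping near-duplicates.

    A sample is kept only if its similarity to every already-kept
    sample is below the (hypothetical) threshold.
    """
    kept = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Made-up 2-D embeddings for four samples.
embeddings = [
    [1.0, 0.0],
    [0.999, 0.01],  # nearly identical to sample 0
    [0.0, 1.0],
    [0.7, 0.7],
]
kept = data_diet(embeddings)
print(kept)  # [0, 2, 3]
```

The pairwise comparison is quadratic in the number of kept samples, so a real pipeline would need an approximate nearest-neighbor index or clustering instead; the sketch only shows the selection principle.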

Data quality is the first step to explainable AI
Ethical and fair use of datasets and compliance with data governance are the beginning of explainable AI. Data Clinic supports a variety of data quality standards.
Data Clinic's differentiated technology
Through DataLens, which is built from the latest deep learning neural networks, you can precisely observe the multidimensional characteristics of AI training data.

AI training data / big data
DataLens
* US trademark application filed for 'DataLens'

General type
Uses ready-made deep learning neural networks that can be deployed quickly

Customized type
Uses deep learning neural networks optimized for customer datasets
Data geometry/statistics analysis
A geometric/statistical analysis technique, based on the latest algorithms, that clearly reveals complex relationships and patterns in data.
Data visualization
Converts complex data into an intuitive, easy-to-understand visual format, strengthening user understanding and decision making.
Improving data quality
Customized improvements based on proprietary analytics address the flaws in your data and provide data of optimal quality.
Securing intellectual property rights for core technologies
Patents
• Domestic applications/registrations: 35/7
• US applications/registrations: 5/2
• PCT applications: 5
Papers
• 360° Reconstruction From a Single Image Using Space Carved Outpainting
(with POSTECH, SIGGRAPH ASIA 2023, Oral)
• Expandable Facial Expression Dataset via Embedding Analysis and Synthesis
(with GIANTSTEP, SIGGRAPH ASIA 2023, Poster)
• FacialX: A Robust Facial Expression Tracking System based on Multifaceted Expression Embedding
(with GIANTSTEP, SIGGRAPH ASIA 2024, Tech. Comm.)
Why Data Clinic?
Fast quality assessment in 1 hour, based on a 100,000-image dataset
Quality assessment and visualization of AI training datasets
2% performance improvement with 5% synthetic data
Precision-targeted synthetic data generation based on quality diagnosis
5x GPU efficiency by slimming data by 80%
Optimize training efficiency by reducing redundant data
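
The 80% and 5x figures are consistent with a simple back-of-the-envelope relationship, sketched below. The page does not say how the figures were derived, so this is only one plausible reading: it assumes that training cost per epoch scales linearly with dataset size and that the slimmed dataset reaches the same model quality. The function name `training_speedup` is hypothetical.

```python
def training_speedup(reduction):
    """Ideal speedup when per-epoch cost scales linearly with dataset size.

    reduction is the fraction of data removed, e.g. 0.80 for 80%.
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)

speedup = training_speedup(0.80)
print(round(speedup, 2))  # 5.0
```

Keeping 20% of the data means each epoch processes one fifth as many samples, hence the factor of five under these assumptions.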
Quality assessment and diagnostic report for customer data
Example: Kaggle Bird-450 dataset
Comprehensive Evaluation
From a comprehensive perspective, we synthesize the results of Level I, II, and III diagnostics to evaluate data quality and suggest directions for improvement.
Summary of diagnostic results
Diagnostic Report Issue Date: September 8, 2024
The Bird-450 dataset is of good quality overall, but some areas clearly need improvement. The data is well cleaned and the total number of images is sufficient, but the number of classes is large and there are too few images per class. In addition, some classes have too much diversity.

Quality Improvement Suggestions
If the boundaries between classes are not clear, you can add synthetic data to make the classes easier to distinguish. (This applies to both the training and test datasets.) For this dataset, since the number of images per class is relatively small compared to the number of classes, we suggest adding synthetic data amounting to about 10% of the total.
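
To make the "about 10% of the total" suggestion concrete, here is a minimal sketch of one way such a budget could be spread across classes. The report does not specify an allocation rule; the smallest-class-first rule, the function name `allocate_synthetic`, and the class sizes below are all assumptions for illustration and are not taken from the Bird-450 report.

```python
import heapq

def allocate_synthetic(counts, fraction=0.10):
    """Spread a synthetic-data budget over the smallest classes first.

    counts maps class name to its current number of samples. The budget
    is `fraction` of the total; each unit goes to whichever class is
    currently smallest (ties broken by class name).
    """
    budget = round(fraction * sum(counts.values()))
    heap = [(n, cls) for cls, n in counts.items()]
    heapq.heapify(heap)
    added = {cls: 0 for cls in counts}
    for _ in range(budget):
        n, cls = heapq.heappop(heap)
        added[cls] += 1
        heapq.heappush(heap, (n + 1, cls))
    return added

counts = {"a": 100, "b": 60, "c": 40}  # hypothetical class sizes
allocation = allocate_synthetic(counts)
print(allocation)  # {'a': 0, 'b': 0, 'c': 20}
```

With 200 samples in total, the budget is 20 synthetic samples, and all of them go to the smallest class until it catches up with the next one.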
Data Bulk-up

Data Replica

Data Diet

Diagnostic status is always available
Once the diagnosis is complete, an email will be sent.
