Diagnostic Report Guide

Level I Diagnostic Results

Measuring Data Integrity

Missing Value Measurement

Class Balance Measurement

Statistical Measurement

Level II Diagnostic Results

Data Lens Selection and Imaging

Observing geometric properties

Observation of distribution properties

Level I Diagnostic Results - Basic Diagnosis

Data integrity

Image size

Image Channel

Label Consistency

Check for missing values

Statistical Measurement

Overall Statistics

Overall average image

The overall average image can be used to gauge the overall trends in color, shape, and pattern of the dataset.
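
The overall average image is a pixel-wise mean over the dataset. Below is a minimal sketch in Python with NumPy. It assumes the images are already loaded as one array shaped (N, H, W, C) and share a common size. The function name `overall_average_image` is hypothetical and is not part of the Pebblous tooling.

```python
import numpy as np


def overall_average_image(images: np.ndarray) -> np.ndarray:
    """Return the pixel-wise mean of a stack of images shaped (N, H, W, C)."""
    if images.ndim != 4:
        raise ValueError("expected an array shaped (N, H, W, C)")
    # Accumulate in float64 so uint8 pixel values cannot overflow.
    return images.astype(np.float64).mean(axis=0)


# Three tiny 2x2 RGB images with constant pixel values 0, 90, and 255.
imgs = np.stack([np.full((2, 2, 3), v, dtype=np.uint8) for v in (0, 90, 255)])
avg = overall_average_image(imgs)
print(avg[0, 0])  # [115. 115. 115.]
```

Averaging images of different sizes would first require resizing them, which is one reason the image-size check comes first.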

Pixel histogram of the overall average image

This histogram shows how pixel values are distributed in each color channel of the overall average image.
[Chart: pixel histograms of the red, green, and blue channels]
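
The per-channel histogram can be sketched as follows. This sketch assumes the average image is an (H, W, 3) array in RGB order with values on a 0-255 scale. The helper `channel_histograms` is hypothetical, and using one bin per intensity level is an assumption.

```python
import numpy as np


def channel_histograms(avg_image: np.ndarray) -> dict:
    """Count pixel values per channel of an (H, W, 3) RGB image, one bin per level 0-255."""
    hists = {}
    for idx, name in enumerate(("red", "green", "blue")):
        values = avg_image[..., idx].ravel()
        # 256 unit-width bins over [0, 256], so bin i counts values in [i, i + 1).
        counts, _ = np.histogram(values, bins=256, range=(0.0, 256.0))
        hists[name] = counts
    return hists


# A 4x4 "average image" with a distinct constant level in each channel.
avg = np.zeros((4, 4, 3))
avg[..., 0], avg[..., 1], avg[..., 2] = 200.0, 10.5, 255.0
h = channel_histograms(avg)
```

An average image usually has fractional pixel values. The fixed range places every value in the bin of its integer part, which keeps histograms from different datasets comparable.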

Level II Diagnostic Results - Basic Diagnosis

Apply DataLens

Select Data Lens

Base neural network

Observation Resources

Data Imaging

Observing geometric properties

Macroscopic property observation

Overall data distribution

This is a 2D PCA projection used to visualize the high-dimensional imaging results produced by DataLens. In the chart, the origin marks the origin (zero vector) of the imaging space. The mean image feature is the vector obtained by imaging the overall average image with DataLens. The more diverse the images, the greater the distance between the mean feature (the average of the individual feature vectors) and the mean image feature.
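
The projection can be sketched with a plain SVD-based PCA. DataLens itself is not reproduced here. The random feature matrix and the `mean_image_feature` vector stand in for the lens output on the dataset and on the average image, and `pca_2d` is a hypothetical name.

```python
import numpy as np


def pca_2d(features: np.ndarray, reference_points: np.ndarray):
    """Project features and reference vectors onto the top two principal axes of features."""
    center = features.mean(axis=0)
    # Rows of vt are the principal directions of the centered feature matrix.
    _, _, vt = np.linalg.svd(features - center, full_matrices=False)
    axes = vt[:2].T  # shape (D, 2)
    return (features - center) @ axes, (reference_points - center) @ axes


rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, size=(200, 16))  # stand-in for DataLens feature vectors
mean_feature = feats.mean(axis=0)             # average of the individual feature vectors
mean_image_feature = np.full(16, 2.5)         # hypothetical: lens applied to the average image
refs = np.stack([np.zeros(16), mean_feature, mean_image_feature])  # origin, mean feature, mean image feature
xy, ref_xy = pca_2d(feats, refs)
gap = np.linalg.norm(mean_feature - mean_image_feature)  # measured in the full space, not in 2D
```

Because PCA centers on the data mean, the mean feature always lands at (0, 0) in the 2D chart, while the origin of the imaging space generally does not. The gap between the two mean vectors is best measured in the full dimension, since the 2D view can shrink it.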

Manifold shape measurement (I) Macroscopic

The data imaging results are observed as a manifold in a multidimensional space. The horizontal axis shows representative classes. The vertical axis is the average magnitude (norm) of the feature vectors belonging to each class, which corresponds to their average distance from the origin. The minimum and maximum distances from the origin are displayed alongside it, to estimate the overall size of the manifold and the specificity of each class. The average image of highly diverse data does not resemble any image in the dataset, so it usually lies outside the minimum/maximum interval. However, because the data lens used for Level II diagnosis is domain-neutral, the average image in practice usually lies within the minimum/maximum interval as well.
[Chart legend: maximum distance, minimum distance, average image distance, average image of each class]
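
The quantities in this chart reduce to L2 norms of feature vectors. This minimal sketch assumes features are rows of a NumPy array with one label per row, and that the lens output for the average image is already available. The function `class_norm_profile` is hypothetical. The per-class average-image distance would be computed the same way from each class's average image.

```python
import numpy as np


def class_norm_profile(features, labels, avg_image_feature):
    """Per-class (mean, min, max) feature norms plus the average image's distance from the origin."""
    norms = np.linalg.norm(features, axis=1)  # distance of each vector from the origin
    per_class = {}
    for cls in np.unique(labels):
        cls_norms = norms[labels == cls]
        per_class[str(cls)] = (float(cls_norms.mean()), float(cls_norms.min()), float(cls_norms.max()))
    avg_dist = float(np.linalg.norm(avg_image_feature))
    # Does the average image fall inside the overall [min, max] distance interval?
    inside = bool(norms.min() <= avg_dist <= norms.max())
    return per_class, avg_dist, inside


feats = np.array([[3.0, 4.0], [6.0, 8.0], [1.0, 0.0], [0.0, 2.0]])
labels = np.array(["cat", "cat", "dog", "dog"])
profile, avg_dist, inside = class_norm_profile(feats, labels, np.array([0.0, 3.0]))
print(profile["cat"], avg_dist, inside)  # (7.5, 5.0, 10.0) 3.0 True
```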

Observation of local properties

Distance-based similarity measurement

This is the result of a distance-based similarity search for representative images of each class. For example, for a given sample, the search retrieves and shows the 10 closest and the 10 furthest samples. This reveals local singularities within the dataset, which helps identify outliers and duplicate images.
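
Here is a brute-force version of this search. It assumes Euclidean distance between DataLens feature vectors, since the metric the report actually uses is not stated here. The helper `nearest_and_furthest` is hypothetical.

```python
import numpy as np


def nearest_and_furthest(features, query_index, k=10):
    """Indices of the k nearest and k furthest samples to one query sample (Euclidean)."""
    dists = np.linalg.norm(features - features[query_index], axis=1)
    order = np.argsort(dists, kind="stable")
    order = order[order != query_index]  # drop the query itself
    return order[:k], order[::-1][:k]


feats = np.arange(20, dtype=float).reshape(-1, 1)  # 20 samples on a line
near, far = nearest_and_furthest(feats, query_index=0, k=3)
print(near, far)  # [1 2 3] [19 18 17]
```

A nearest neighbor at a distance close to zero is a likely duplicate. A query whose neighbors are all far away points to an outlier. For large datasets, an approximate nearest-neighbor index would typically replace this brute-force scan.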

Density Measurement (I)

Density is computed from the distances between each sample and its neighboring samples on the multidimensional manifold produced by data imaging. A sample surrounded by many other samples has high density, and a sample with few neighbors has low density. High-density samples are more likely to be duplicates, and low-density samples are more likely to be outliers. Density is computed in the observation dimension but visualized with a two-dimensional PCA. In the chart, darker red indicates higher density. For density measurement by class, 12 classes that represent the range of the density distribution are selected, and their results are displayed.

Density chart: full data (calculated in the observation dimension and visualized in two dimensions)

Overall density chart
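
The text does not give the exact density formula. The sketch below assumes one common choice: the inverse of the mean distance to the k nearest neighbors, computed in the full observation dimension. The function `knn_density` and the value `k = 5` are hypothetical. A 2D PCA would be applied only afterwards, for display.

```python
import numpy as np


def knn_density(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Density per sample as the inverse mean distance to its k nearest neighbours."""
    # Full pairwise Euclidean distance matrix: O(N^2) memory, fine for a sketch.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    mean_knn = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    return 1.0 / (mean_knn + 1e-12)  # epsilon guards exact duplicates


rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.1, size=(50, 8))
outlier = np.full((1, 8), 5.0)
feats = np.vstack([cluster, outlier])
density = knn_density(feats)
print(int(np.argmin(density)))  # 50: the isolated sample has the lowest density
```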

Distance-density measurement

This chart shows the shape of the multidimensional manifold produced by data imaging together with the density of each sample. The horizontal axis is each feature vector's distance from the origin, and the vertical axis is the density of the corresponding sample. For a dataset whose samples are well spread across distance and density, the distance-density chart takes the shape of a single feather, which is why it is also called a feather chart. Similar or redundant samples usually sit in the dense area at the top of the feather.
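
The feather chart pairs two per-sample numbers. This minimal sketch assumes a kNN density, defined as the inverse mean distance to the k nearest neighbors, because the report's exact formula is not given. The function `feather_chart_data` and the 95th-percentile cut for "top of the feather" are hypothetical choices. The demo plants near-copies of one sample, which should rise to the top.

```python
import numpy as np


def feather_chart_data(features, k=5, top_quantile=0.95):
    """x = distance from origin, y = kNN density, plus a mask for the dense top of the feather."""
    x = np.linalg.norm(features, axis=1)
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    y = 1.0 / (np.sort(d, axis=1)[:, :k].mean(axis=1) + 1e-12)
    top = y >= np.quantile(y, top_quantile)  # candidates for similar/redundant data
    return x, y, top


rng = np.random.default_rng(1)
base = rng.normal(size=(100, 8))
near_dups = base[:1] + rng.normal(scale=1e-3, size=(10, 8))  # 10 near-copies of sample 0
x, y, top = feather_chart_data(np.vstack([base, near_dups]))
print(np.flatnonzero(top))
```

Plotting `y` against `x` as a scatter gives the feather. The flagged points are where a data diet would look first.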

Density distribution over space

Density Measurement (II)

This is similar to density measurement (I), but it adds contour lines so that the density distribution can be observed together with the macroscopic distribution of the data. Viewing the two together makes clusters in the macroscopic distribution easier to detect.

Density contour lines: full data

[Color scale: red, from low to high density]
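
Contour lines of this kind are usually drawn from a smoothed density estimate over the 2D projection. The sketch below assumes a Gaussian kernel density estimate with a Scott's-rule bandwidth, written in plain NumPy. The function `kde_grid` and the bandwidth choice are assumptions and are not the report's documented method.

```python
import numpy as np


def kde_grid(points_2d: np.ndarray, grid_size: int = 50):
    """Gaussian kernel density of 2-D points evaluated on a regular grid."""
    n = len(points_2d)
    # Scott's rule for d = 2: scale the average axis std by n ** (-1 / (d + 4)).
    h = points_2d.std(axis=0).mean() * n ** (-1 / 6)
    gx = np.linspace(points_2d[:, 0].min(), points_2d[:, 0].max(), grid_size)
    gy = np.linspace(points_2d[:, 1].min(), points_2d[:, 1].max(), grid_size)
    xx, yy = np.meshgrid(gx, gy)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    sq = ((grid[:, None, :] - points_2d[None, :, :]) ** 2).sum(axis=-1)
    zz = np.exp(-sq / (2 * h**2)).sum(axis=1) / (n * 2 * np.pi * h**2)
    return xx, yy, zz.reshape(xx.shape)


rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([-3, 0], 0.5, (100, 2)), rng.normal([3, 0], 0.5, (100, 2))])
xx, yy, zz = kde_grid(pts)
```

If matplotlib is installed, `plt.contour(xx, yy, zz)` draws the contour lines over a scatter plot of the projected points. The two synthetic clusters then appear as two separate rings.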

Observation of distribution properties

Observe statistical properties

Manifold shape measurement (II) Statistical

This chart observes the data imaging results statistically, as a manifold in the multidimensional space. The horizontal axis is the distance of each vector from the reference point, and the vertical axis is the number of vectors at that distance. The dotted line shows the average for the corresponding class, which indicates where that class sits within the manifold. Four charts are shown, differing in reference point and grouping: (1) distance from the origin of the manifold, (2) distance from the origin by class, (3) distance from the center of the data, and (4) distance from the center of the data by class.

Distributional shape of the manifold: (1) Distance from the origin

[Chart legend: origin]

Distributional shape of the manifold: (3) Distance from the center of data

[Chart legend: class average, origin]
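
The four charts can be produced from two distance arrays. This sketch makes three assumptions: distances are Euclidean, the "center of the data" is the feature mean, and the value behind the dotted line is the per-class mean distance. The function `distance_distributions` is hypothetical.

```python
import numpy as np


def distance_distributions(features, labels, bins=20):
    """Histograms of distance from the origin and from the data center, overall and per class."""
    refs = {"origin": np.zeros(features.shape[1]), "center": features.mean(axis=0)}
    out = {}
    for name, ref in refs.items():
        d = np.linalg.norm(features - ref, axis=1)
        edges = np.histogram_bin_edges(d, bins=bins)  # shared edges keep charts comparable
        per_class = {}
        for cls in np.unique(labels):
            d_cls = d[labels == cls]
            counts, _ = np.histogram(d_cls, bins=edges)
            per_class[str(cls)] = {"counts": counts, "mean": float(d_cls.mean())}  # mean -> dotted line
        out[name] = {"edges": edges, "overall": np.histogram(d, bins=edges)[0], "per_class": per_class}
    return out


feats = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 1.0], [0.0, 2.0]])
labels = np.array(["a", "a", "b", "b"])
dist = distance_distributions(feats, labels, bins=4)
print(dist["origin"]["per_class"]["a"]["mean"])  # 7.5
```

Using the same bin edges for every class lets the per-class histograms in charts (2) and (4) be overlaid directly.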

Density Measurement (III) Distributional Properties

The distribution of per-sample density is shown in two charts. The first is a histogram: the horizontal axis is the density value, and the vertical axis is the number of samples with that density. The histogram shows the overall density distribution of the dataset and is especially helpful for understanding how outliers such as edge cases are distributed. The second chart shows the density distribution of each representative class as a box-and-whisker chart. The representative classes are arranged in order of density so that each can be compared with the mean density. After you improve data quality (bulk-up or diet), these charts also let you confirm that the density distribution has improved.

Density histogram: full data

Density Box Chart

[Chart legend: mean density]
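
Both charts can be computed from a per-sample density array. This minimal sketch assumes Tukey's 1.5 IQR whisker convention, which is common in box plots but not confirmed for this report. It also orders classes by median density. The function `density_box_stats` is hypothetical.

```python
import numpy as np


def density_box_stats(density, labels):
    """Tukey box-plot statistics of density per class, sorted by median, plus the mean density."""
    stats = []
    for cls in np.unique(labels):
        d = density[labels == cls]
        q1, med, q3 = np.percentile(d, [25, 50, 75])
        lo_fence, hi_fence = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        inside = d[(d >= lo_fence) & (d <= hi_fence)]
        stats.append({
            "class": str(cls), "q1": float(q1), "median": float(med), "q3": float(q3),
            "whiskers": (float(inside.min()), float(inside.max())),  # most extreme non-outlier values
            "n_outliers": int(d.size - inside.size),                  # points drawn beyond the whiskers
        })
    stats.sort(key=lambda s: s["median"])
    return stats, float(density.mean())


density = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 11.0, 12.0, 13.0, 14.0])
labels = np.array(["x"] * 5 + ["y"] * 5)
counts, edges = np.histogram(density, bins=10)  # the full-data density histogram
stats, mean_density = density_box_stats(density, labels)
print([s["class"] for s in stats], stats[0]["n_outliers"], mean_density)  # ['x', 'y'] 1 17.0
```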

Special sample examples

A singular sample from a density perspective

Following the quality diagnosis, this section shows samples worth re-examining with domain knowledge, starting with outliers in terms of density. It shows the 20 most dense and the 20 least dense samples, both across the entire distribution and by class. High-density samples are likely to be similar or duplicate data and are candidates for a future data diet. Low-density samples are potential outliers. Depending on the target task, you may keep them as edge cases, remove them as outliers, or bulk up by adding data around them to increase their density.
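
Selecting these samples comes down to sorting the density values. This sketch assumes a precomputed per-sample density array and class labels. The function `singular_samples` is hypothetical, and its default `k = 20` matches the count stated above.

```python
import numpy as np


def singular_samples(density, labels, k=20):
    """Indices of the k most and k least dense samples, overall and per class."""
    order = np.argsort(density, kind="stable")  # ascending density
    overall = {"most_dense": order[::-1][:k], "least_dense": order[:k]}
    per_class = {}
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        cls_order = idx[np.argsort(density[idx], kind="stable")]
        per_class[str(cls)] = {"most_dense": cls_order[::-1][:k], "least_dense": cls_order[:k]}
    return overall, per_class


density = np.array([5.0, 1.0, 9.0, 3.0, 7.0, 2.0])
labels = np.array(["a", "a", "a", "b", "b", "b"])
overall, per_class = singular_samples(density, labels, k=2)
print(overall["most_dense"], per_class["b"]["least_dense"])  # [2 4] [5 3]
```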

Copyright © Pebblous.ai All rights reserved

E-mail|contact@pebblous.ai
Tel|044-589-3824
Office|South Korea: Sejong, Seoul, Daejeon
Web|pebblous.ai