Overview

Dataset statistics

Number of variables: 4
Number of observations: 12
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 480.0 B
Average record size in memory: 40.0 B

Variable types

Numeric: 2
Categorical: 2

Warnings

Period has constant value "2001-2019" (Constant)
Lake Victoria is highly correlated with Simiyu (High correlation)
Simiyu is highly correlated with Lake Victoria (High correlation)
Simiyu is highly correlated with Month (High correlation)
Month is highly correlated with Simiyu and 1 other field (High correlation)
Lake Victoria is highly correlated with Month (High correlation)
Month is highly correlated with Period (High correlation)
Period is highly correlated with Month (High correlation)
Month is uniformly distributed (Uniform)
Lake Victoria has unique values (Unique)
Simiyu has unique values (Unique)
Month has unique values (Unique)
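
The Constant and Unique warnings can be illustrated with a distinct-value count per column. The following is a minimal sketch, not the profiling tool's own logic: `column_flags` is a hypothetical helper, and the column values are copied from the sample rows of this report.

```python
# Column values copied from the report's sample rows.
month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
period = ["2001-2019"] * 12


def column_flags(values):
    """Return simple warning labels based on the number of distinct values.

    Hypothetical helper for illustration only.
    """
    distinct = len(set(values))
    flags = []
    if distinct == 1:
        flags.append("Constant")  # every row holds the same value
    if len(values) > 1 and distinct == len(values):
        flags.append("Unique")    # no value is repeated
    return flags


print("Month:", column_flags(month))
print("Period:", column_flags(period))
```

A column with one distinct value carries no information for modelling, which is consistent with Period being marked as rejected further down.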

Reproduction

Analysis started: 2022-05-05 15:11:44.021659
Analysis finished: 2022-05-05 15:11:51.568778
Duration: 7.55 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Lake Victoria
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 12
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.524877193
Minimum: 1.764421053
Maximum: 9.362789474
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 192.0 B

Quantile statistics

Minimum: 1.764421053
5-th percentile: 2.340936842
Q1: 3.366657895
Median: 4.0735
Q3: 5.168460526
95-th percentile: 8.065744737
Maximum: 9.362789474
Range: 7.598368421
Interquartile range (IQR): 1.801802632
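
The quantile figures above are consistent with linear interpolation between neighbouring order statistics. The sketch below recomputes them under that assumption; `quantile` is a hypothetical helper, and the inputs are the Lake Victoria values from the sample rows, which are rounded to six decimals, so results agree with the report only to about that precision.

```python
# Lake Victoria values, Jan to Dec, copied from the report's sample rows.
lake_victoria = [3.176000, 3.477000, 4.687053, 7.004526, 9.362789, 3.430211,
                 1.764421, 2.812632, 3.978895, 5.318421, 5.118474, 4.168105]


def quantile(values, p):
    """Quantile with linear interpolation between the two nearest order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p        # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


q1 = quantile(lake_victoria, 0.25)
median = quantile(lake_victoria, 0.50)
q3 = quantile(lake_victoria, 0.75)
iqr = q3 - q1
value_range = max(lake_victoria) - min(lake_victoria)

print(f"Q1={q1:.6f} median={median:.6f} Q3={q3:.6f}")
print(f"IQR={iqr:.6f} range={value_range:.6f}")
```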

Descriptive statistics

Standard deviation: 2.037278089
Coefficient of variation (CV): 0.4502394214
Kurtosis: 1.993378036
Mean: 4.524877193
Median Absolute Deviation (MAD): 0.971236842
Skewness: 1.264509715
Sum: 54.29852632
Variance: 4.150502013
Monotonicity: Not monotonic
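
Several of the descriptive statistics above can be checked with Python's standard library. This is a sketch under two assumptions: the standard deviation and variance use the sample (n - 1) denominator, and the MAD is the median of absolute deviations from the median. The inputs are the rounded values from the sample rows, so agreement is to about six decimals.

```python
import statistics

# Lake Victoria values, Jan to Dec, copied from the report's sample rows.
lake_victoria = [3.176000, 3.477000, 4.687053, 7.004526, 9.362789, 3.430211,
                 1.764421, 2.812632, 3.978895, 5.318421, 5.118474, 4.168105]

total = sum(lake_victoria)
mean = statistics.mean(lake_victoria)
variance = statistics.variance(lake_victoria)  # sample variance, n - 1 denominator
std = statistics.stdev(lake_victoria)          # square root of the sample variance
cv = std / mean                                # coefficient of variation

median = statistics.median(lake_victoria)
mad = statistics.median([abs(v - median) for v in lake_victoria])

print(f"sum={total:.6f} mean={mean:.6f}")
print(f"variance={variance:.6f} std={std:.6f} cv={cv:.6f}")
print(f"mad={mad:.6f}")
```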
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
9.362789474       1      8.3%
5.318421053       1      8.3%
5.118473684       1      8.3%
4.168105263       1      8.3%
1.764421053       1      8.3%
4.687052632       1      8.3%
2.812631579       1      8.3%
3.477             1      8.3%
3.176             1      8.3%
3.978894737       1      8.3%
Other values (2)  2      16.7%

Minimum 10 values

Value             Count  Frequency (%)
1.764421053       1      8.3%
2.812631579       1      8.3%
3.176             1      8.3%
3.430210526       1      8.3%
3.477             1      8.3%
3.978894737       1      8.3%
4.168105263       1      8.3%
4.687052632       1      8.3%
5.118473684       1      8.3%
5.318421053       1      8.3%

Maximum 10 values

Value             Count  Frequency (%)
9.362789474       1      8.3%
7.004526316       1      8.3%
5.318421053       1      8.3%
5.118473684       1      8.3%
4.687052632       1      8.3%
4.168105263       1      8.3%
3.978894737       1      8.3%
3.477             1      8.3%
3.430210526       1      8.3%
3.176             1      8.3%

Simiyu
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 12
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.394868421
Minimum: 0.1952105263
Maximum: 4.753578947
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 192.0 B

Quantile statistics

Minimum: 0.1952105263
5-th percentile: 0.2713421052
Q1: 1.166118421
Median: 2.681605263
Q3: 3.291078948
95-th percentile: 4.381721052
Maximum: 4.753578947
Range: 4.558368421
Interquartile range (IQR): 2.124960527

Descriptive statistics

Standard deviation: 1.489299991
Coefficient of variation (CV): 0.6218713219
Kurtosis: -1.109718615
Mean: 2.394868421
Median Absolute Deviation (MAD): 1.302157895
Skewness: -0.06066814636
Sum: 28.73842105
Variance: 2.218014463
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
1.046947368       1      8.3%
3.091421053       1      8.3%
0.1952105263      1      8.3%
1.8               1      8.3%
2.908473684       1      8.3%
4.077473684       1      8.3%
2.981052632       1      8.3%
3.890052632       1      8.3%
0.3336315789      1      8.3%
1.205842105       1      8.3%
Other values (2)  2      16.7%

Minimum 10 values

Value             Count  Frequency (%)
0.1952105263      1      8.3%
0.3336315789      1      8.3%
1.046947368       1      8.3%
1.205842105       1      8.3%
1.8               1      8.3%
2.454736842       1      8.3%
2.908473684       1      8.3%
2.981052632       1      8.3%
3.091421053       1      8.3%
3.890052632       1      8.3%

Maximum 10 values

Value             Count  Frequency (%)
4.753578947       1      8.3%
4.077473684       1      8.3%
3.890052632       1      8.3%
3.091421053       1      8.3%
2.981052632       1      8.3%
2.908473684       1      8.3%
2.454736842       1      8.3%
1.8               1      8.3%
1.205842105       1      8.3%
1.046947368       1      8.3%

Month
Categorical

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 12
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 192.0 B
[Bar chart of category frequencies: Jan, Nov, Apr, Sep, Mar, Other values (7)]

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 36
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
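
As an illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories for the Month and Period values. It is not the profiling tool's own code; the category codes `Lu`, `Ll`, `Nd` and `Pd` are the standard abbreviations for Uppercase Letter, Lowercase Letter, Decimal Number and Dash Punctuation.

```python
import unicodedata
from collections import Counter

# Values copied from the report's sample rows.
month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
period = ["2001-2019"] * 12


def category_counts(values):
    """Count Unicode general categories over all characters of a text column."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


month_chars = "".join(month)
month_categories = category_counts(month)
period_categories = category_counts(period)

print("Month: total", len(month_chars), "distinct", len(set(month_chars)))
print("Month categories:", dict(month_categories))
print("Period categories:", dict(period_categories))
```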

Unique

Unique: 12
Unique (%): 100.0%

Sample

1st row: Jan
2nd row: Feb
3rd row: Mar
4th row: Apr
5th row: May

Common Values

Value             Count  Frequency (%)
Jan               1      8.3%
Nov               1      8.3%
Apr               1      8.3%
Sep               1      8.3%
Mar               1      8.3%
Jun               1      8.3%
Dec               1      8.3%
Jul               1      8.3%
May               1      8.3%
Oct               1      8.3%
Other values (2)  2      16.7%

Length

Histogram of lengths of the category
Value             Count  Frequency (%)
jun               1      8.3%
dec               1      8.3%
may               1      8.3%
jul               1      8.3%
apr               1      8.3%
sep               1      8.3%
feb               1      8.3%
jan               1      8.3%
nov               1      8.3%
oct               1      8.3%
Other values (2)  2      16.7%

Most occurring characters

Value              Count  Frequency (%)
J                  3      8.3%
a                  3      8.3%
e                  3      8.3%
u                  3      8.3%
n                  2      5.6%
M                  2      5.6%
r                  2      5.6%
A                  2      5.6%
p                  2      5.6%
c                  2      5.6%
Other values (12)  12     33.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  24     66.7%
Uppercase Letter  12     33.3%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
a                 3      12.5%
e                 3      12.5%
u                 3      12.5%
n                 2      8.3%
r                 2      8.3%
p                 2      8.3%
c                 2      8.3%
b                 1      4.2%
y                 1      4.2%
l                 1      4.2%
Other values (4)  4      16.7%

Uppercase Letter
Value             Count  Frequency (%)
J                 3      25.0%
M                 2      16.7%
A                 2      16.7%
F                 1      8.3%
S                 1      8.3%
O                 1      8.3%
N                 1      8.3%
D                 1      8.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  36     100.0%

Most frequent character per script

Latin
Value              Count  Frequency (%)
J                  3      8.3%
a                  3      8.3%
e                  3      8.3%
u                  3      8.3%
n                  2      5.6%
M                  2      5.6%
r                  2      5.6%
A                  2      5.6%
p                  2      5.6%
c                  2      5.6%
Other values (12)  12     33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  36     100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
J                  3      8.3%
a                  3      8.3%
e                  3      8.3%
u                  3      8.3%
n                  2      5.6%
M                  2      5.6%
r                  2      5.6%
A                  2      5.6%
p                  2      5.6%
c                  2      5.6%
Other values (12)  12     33.3%

Period
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 192.0 B
[Bar chart of category frequencies: 2001-2019 (12)]

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 108
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2001-2019
2nd row: 2001-2019
3rd row: 2001-2019
4th row: 2001-2019
5th row: 2001-2019

Common Values

Value      Count  Frequency (%)
2001-2019  12     100.0%

Length

Histogram of lengths of the category

Pie chart

Value      Count  Frequency (%)
2001-2019  12     100.0%

Most occurring characters

Value  Count  Frequency (%)
0      36     33.3%
2      24     22.2%
1      24     22.2%
-      12     11.1%
9      12     11.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    96     88.9%
Dash Punctuation  12     11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      36     37.5%
2      24     25.0%
1      24     25.0%
9      12     12.5%

Dash Punctuation
Value  Count  Frequency (%)
-      12     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  108    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      36     33.3%
2      24     22.2%
1      24     22.2%
-      12     11.1%
9      12     11.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  108    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      36     33.3%
2      24     22.2%
1      24     22.2%
-      12     11.1%
9      12     11.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
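
The calculation just described can be sketched in plain Python. `pearson_r` is a hypothetical helper written for illustration, and the inputs are the Lake Victoria and Simiyu columns from the sample rows of this report. The n - 1 factors of the covariance and the standard deviations cancel, so the code works with raw sums of deviations.

```python
import math

# Columns copied from the report's sample rows (Jan to Dec).
lake_victoria = [3.176000, 3.477000, 4.687053, 7.004526, 9.362789, 3.430211,
                 1.764421, 2.812632, 3.978895, 5.318421, 5.118474, 4.168105]
simiyu = [2.908474, 1.800000, 2.981053, 4.753579, 4.077474, 1.046947,
          0.195211, 0.333632, 1.205842, 2.454737, 3.091421, 3.890053]


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


r = pearson_r(lake_victoria, simiyu)
print(f"Pearson r (Lake Victoria vs Simiyu): {r:.3f}")

# Invariance under location and scale: rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2.0 * v + 3.0 for v in lake_victoria], simiyu)
```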

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
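
A minimal sketch of this rank-based calculation follows. `average_ranks` and `spearman_rho` are hypothetical helpers; assigning tied values their average rank is an assumption of this sketch, since the text above does not say how ties are handled.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A nonlinear but strictly increasing relationship is a perfect monotonic one.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("rho:", spearman_rho(x, y), "r:", pearson_r(x, y))
```

In the usage lines, ρ reaches 1 for the cubic relationship while Pearson's r stays below 1, which is the difference the paragraph above describes.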

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
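
The pair counting can be sketched directly. `kendall_tau` is a hypothetical helper implementing the formula as stated here, with the total number of pairs n(n - 1)/2 in the denominator; tied pairs count as neither concordant nor discordant, and tie-adjusted variants of τ are not covered by this sketch.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations give three pairs: two concordant, one discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```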

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
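
One commonly cited form of the bias correction can be sketched as follows. This is an assumption about the formula, not a copy of the tool's implementation: `cramers_v_corrected` is a hypothetical helper that takes a contingency table of counts, shrinks the chi-squared based φ² by (k - 1)(r - 1)/(n - 1), and adjusts the row and column counts r and k accordingly.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corrected - 1, k_corrected - 1)
    if denominator <= 0:
        return 0.0  # degenerate table, e.g. a single row or column
    return math.sqrt(phi2_corrected / denominator)


independent = [[10, 10], [10, 10]]
perfect = [[50, 0], [0, 50]]
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The two usage tables cover the ends of the range: equal counts in every cell give 0, and a table with all counts on the diagonal gives 1.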

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

    Lake Victoria  Simiyu    Month  Period
0   3.176000       2.908474  Jan    2001-2019
1   3.477000       1.800000  Feb    2001-2019
2   4.687053       2.981053  Mar    2001-2019
3   7.004526       4.753579  Apr    2001-2019
4   9.362789       4.077474  May    2001-2019
5   3.430211       1.046947  Jun    2001-2019
6   1.764421       0.195211  Jul    2001-2019
7   2.812632       0.333632  Aug    2001-2019
8   3.978895       1.205842  Sep    2001-2019
9   5.318421       2.454737  Oct    2001-2019

Last rows

    Lake Victoria  Simiyu    Month  Period
2   4.687053       2.981053  Mar    2001-2019
3   7.004526       4.753579  Apr    2001-2019
4   9.362789       4.077474  May    2001-2019
5   3.430211       1.046947  Jun    2001-2019
6   1.764421       0.195211  Jul    2001-2019
7   2.812632       0.333632  Aug    2001-2019
8   3.978895       1.205842  Sep    2001-2019
9   5.318421       2.454737  Oct    2001-2019
10  5.118474       3.091421  Nov    2001-2019
11  4.168105       3.890053  Dec    2001-2019