Summaries
A summary object can be generated for a DataFrame using to_summary():
>>> dataframe
strings            f  g     h
group              1  0     1
floats bool  group
1.1    False 0     1  A     2
2.2    False 0     2  B  None
3.3    True  2     3  C     B
4.4    True  1     4  D  None
>>> summary = dataframe.to_summary()
>>> type(summary)
<class 'metaframe.src.dataframe.summary.base.Summary'>
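The example DataFrame above carries labels on both axes. The sketch below rebuilds it with plain pandas, assuming the row labels (floats, bool, group) and column labels (strings, group) behave like MultiIndex levels. This is an illustration only: the library's own DataFrame class may be constructed differently, and the names `rows`, `cols` and `frame` are hypothetical.

```python
import pandas as pd

# Row labels: three levels, named as in the printed example.
rows = pd.MultiIndex.from_tuples(
    [(1.1, False, 0), (2.2, False, 0), (3.3, True, 2), (4.4, True, 1)],
    names=["floats", "bool", "group"],
)

# Column labels: two levels.
cols = pd.MultiIndex.from_tuples(
    [("f", 1), ("g", 0), ("h", 1)],
    names=["strings", "group"],
)

# The twelve cells, row by row (None marks a missing value).
data = [
    [1, "A", 2],
    [2, "B", None],
    [3, "C", "B"],
    [4, "D", None],
]
frame = pd.DataFrame(data, index=rows, columns=cols)

print(frame.shape)                                # (4, 3)
print(frame.index.to_frame(index=False).shape)    # (4, 3)
print(frame.columns.to_frame(index=False).shape)  # (3, 2)
```

The three shapes line up with the DataFrame, MFR and MFC rows reported by basic().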
Pre-built summaries
Multiple pre-built summaries are available:
basic(): for DataFrame and MetaFrames dimensions
>>> summary.basic()
           Rows  Columns  Cells
DataFrame     4        3     12
MFR           4        3     12
MFC           3        2      6
whole(): for the whole DataFrame matrix
>>> summary.whole()
                                                 DataFrame
dtype       Mode        Metric        Type
all         Describe    Num. elements count             12
                        NAs           count            2.0
                                      %               16.7
Not Numeric Describe    Num. elements count            5.0
                                      %               41.7
                        unique        count              4
                                      %               80.0
                        top           top                B
                        freq          freq               2
            Value Count B             count              2
                                      %               40.0
                        A             count              1
                                      %               20.0
                        C             count              1
                                      %               20.0
                        D             count              1
                                      %               20.0
Numeric     Describe    Num. elements count            5.0
                                      %               41.7
                        mean          mean             2.4
                        std           std             1.14
                        min           min              1.0
                        25%           percentile       2.0
                        50%           percentile       2.0
                        75%           percentile       3.0
                        max           max              4.0
                        sum           sum             12.0
            Custom      zeros         count            0.0
                                      %                0.0
                        filled        count            5.0
                                      %              100.0
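The whole() figures can be cross-checked with plain pandas. The sketch below is not the library's implementation. It assumes that a cell counts as numeric when it holds a non-boolean number, and the helper `is_number` and the `whole` dictionary are hypothetical names used only for this check.

```python
import numbers

import pandas as pd

# The twelve cells of the example DataFrame, flattened row by row.
cells = pd.Series(
    [1, "A", 2, 2, "B", None, 3, "C", "B", 4, "D", None], dtype=object
)


def is_number(value):
    # Assumption: booleans are not treated as numbers.
    return isinstance(value, numbers.Number) and not isinstance(value, bool)


n_cells = len(cells)
n_na = int(cells.isna().sum())

# Split the non-missing cells into numeric and non-numeric values.
filled = cells.dropna()
numeric_mask = filled.map(is_number).astype(bool)
numeric = filled[numeric_mask].astype(float)
non_numeric = filled[~numeric_mask]

whole = {
    "count": n_cells,
    "na_count": n_na,
    "na_pct": round(100 * n_na / n_cells, 1),
    "numeric_count": len(numeric),
    "numeric_pct": round(100 * len(numeric) / n_cells, 1),
    "mean": numeric.mean(),
    "std": round(numeric.std(), 2),  # sample standard deviation (ddof=1)
    "sum": numeric.sum(),
    "non_numeric_count": len(non_numeric),
    "unique": non_numeric.nunique(),
    "top": non_numeric.mode().iloc[0],
}
print(whole)
```

Under these assumptions the values agree with the table: 12 cells, 2 NAs (16.7 %), 5 numeric cells with mean 2.4 and standard deviation 1.14, and 5 non-numeric cells with 4 unique values, the most frequent being B.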
row(): for per-row summary
>>> summary.row()
floats                                              1.1   2.2   3.3   4.4
bool                                              False False  True  True
group                                                 0     0     2     1
dtype       Mode        Metric        Type
all         Describe    Num. elements count           3     3     3     3
                        NAs           count         0.0   1.0   0.0   1.0
                                      %             0.0  33.3   0.0  33.3
Not Numeric Describe    Num. elements count         1.0   1.0   2.0   1.0
                                      %            33.3  33.3  66.7  33.3
                        unique        count           1     1     2     1
                                      %           100.0 100.0 100.0 100.0
                        top           top             A     B     C     D
                        freq          freq            1     1     1     1
            Value Count A             count           1   NaN   NaN   NaN
                                      %           100.0   NaN   NaN   NaN
                        B             count         NaN     1     1   NaN
                                      %             NaN 100.0  50.0   NaN
                        C             count         NaN   NaN     1   NaN
                                      %             NaN   NaN  50.0   NaN
                        D             count         NaN   NaN   NaN     1
                                      %             NaN   NaN   NaN 100.0
Numeric     Describe    Num. elements count         2.0   1.0   1.0   1.0
                                      %            66.7  33.3  33.3  33.3
                        mean          mean          1.5   2.0   3.0   4.0
                        std           std          0.71   NaN   NaN   NaN
                        min           min           1.0   2.0   3.0   4.0
                        25%           percentile   1.25   2.0   3.0   4.0
                        50%           percentile    1.5   2.0   3.0   4.0
                        75%           percentile   1.75   2.0   3.0   4.0
                        max           max           2.0   2.0   3.0   4.0
                        sum           sum           3.0   2.0   3.0   4.0
            Custom      zeros         count         0.0   0.0   0.0   0.0
                                      %             0.0   0.0   0.0   0.0
                        filled        count         2.0   1.0   1.0   1.0
                                      %           100.0 100.0 100.0 100.0
col(): for per-column summary
>>> summary.col()
strings                                                f      g      h
group                                                  1      0      1
dtype       Mode        Metric        Type
all         Describe    Num. elements count         4.00      4      4
                        NAs           count         0.00    0.0    2.0
                                      %             0.00    0.0   50.0
Numeric     Describe    Num. elements count         4.00    NaN    1.0
                                      %           100.00    NaN   25.0
                        mean          mean          2.50    NaN    2.0
                        std           std           1.29    NaN    NaN
                        min           min           1.00    NaN    2.0
                        25%           percentile    1.75    NaN    2.0
                        50%           percentile    2.50    NaN    2.0
                        75%           percentile    3.25    NaN    2.0
                        max           max           4.00    NaN    2.0
                        sum           sum          10.00    NaN    2.0
            Custom      zeros         count         0.00    NaN    0.0
                                      %             0.00    NaN    0.0
                        filled        count         4.00    NaN    1.0
                                      %           100.00    NaN  100.0
Not Numeric Describe    Num. elements count          NaN    4.0    1.0
                                      %              NaN  100.0   25.0
                        unique        count          NaN      4      1
                                      %              NaN  100.0  100.0
                        top           top            NaN      A      B
                        freq          freq           NaN      1      1
            Value Count A             count          NaN      1    NaN
                                      %              NaN   25.0    NaN
                        B             count          NaN      1      1
                                      %              NaN   25.0  100.0
                        C             count          NaN      1    NaN
                                      %              NaN   25.0    NaN
                        D             count          NaN      1    NaN
                                      %              NaN   25.0    NaN
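The metrics listed under the Describe mode resemble what pandas reports from Series.describe(). The sketch below applies describe() to the values of columns f and g from the example. That the library computes these rows the same way is an assumption, not something stated in this section.

```python
import pandas as pd

# Values of column f (numeric) and column g (text) from the example.
f = pd.Series([1, 2, 3, 4], name="f")
g = pd.Series(["A", "B", "C", "D"], name="g")

# Numeric data: count, mean, std, min, 25%, 50%, 75%, max.
numeric_stats = f.describe()

# Text data: count, unique, top, freq.
text_stats = g.describe()

print(numeric_stats.round(2))
print(text_stats)
```

The numeric result (mean 2.5, std 1.29, quartiles 1.75, 2.5 and 3.25) matches the f column of the col() output, and the text result reports 4 unique values with a top frequency of 1, as in the g column.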
Custom summaries
Summaries can be customized in two ways:
- Using pre-built summaries and passing a custom d_func dictionary of the form {'<metric_type>': {'<metric>': <func>}}, where func is a function or lambda that takes a Series and returns a single value.
>>> summary.whole(d_func={'my_metric_type': {'my_metric': lambda s: (s<0).sum()}})
                                        DataFrame
dtype   Mode   Metric    Type
...
Numeric Custom my_metric my_metric_type       0.0
- Using the summary() method with a custom function that takes a DataFrame and a DataFrameSummaryOpts object as input and returns a DataFrame.
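Both kinds of custom function can be tried on plain pandas objects before they are handed to the library. The sketch below is illustrative only. The helpers `count_negative`, `value_range` and `shape_summary` are hypothetical, the DataFrame is assumed to behave like a pandas DataFrame, and `opts` stands in for the DataFrameSummaryOpts object, whose fields are not described here and which is therefore left unused.

```python
import pandas as pd


def count_negative(s: pd.Series) -> int:
    """Number of strictly negative values in the Series."""
    return int((s < 0).sum())


def value_range(s: pd.Series) -> float:
    """Difference between the largest and smallest value."""
    return float(s.max() - s.min())


# Same shape as the d_func argument: {'<metric_type>': {'<metric>': <func>}}
d_func = {
    "my_metric_type": {
        "my_metric": count_negative,
        "range": value_range,
    }
}

# Apply every function to a numeric Series to confirm that each
# one returns a single value.
numeric = pd.Series([1.0, 2.0, 2.0, 3.0, 4.0])
results = {
    (metric_type, metric): func(numeric)
    for metric_type, metrics in d_func.items()
    for metric, func in metrics.items()
}
print(results)


def shape_summary(df: pd.DataFrame, opts=None) -> pd.DataFrame:
    """Custom summary sketch: takes a DataFrame and options, returns a DataFrame."""
    # opts is a placeholder for DataFrameSummaryOpts and is ignored here.
    return pd.DataFrame(
        {"value": [df.shape[0], df.shape[1], df.size]},
        index=["Rows", "Columns", "Cells"],
    )


print(shape_summary(pd.DataFrame({"a": [1, 2], "b": [3, 4]})))
```

Named functions are used instead of lambdas so that each metric can be documented and tested on its own. Lambdas, as in the example above, fit the same dictionary shape.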