Summaries

A summary object can be generated for a DataFrame using to_summary():

>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> summary = dataframe.to_summary()
>>> type(summary)
<class 'metaframe.src.dataframe.summary.base.Summary'>
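
To follow along, the example table above can be rebuilt with plain pandas. This is only a sketch: the index and column level names are read off the printed output, and whether metaframe expects this exact pandas object (or its own DataFrame type) before to_summary() is called is an assumption not confirmed by this page.

```python
import pandas as pd

# Row index with three levels, as printed above: floats, bool, group.
rows = pd.MultiIndex.from_tuples(
    [(1.1, False, 0), (2.2, False, 0), (3.3, True, 2), (4.4, True, 1)],
    names=["floats", "bool", "group"],
)

# Column index with two levels, as printed above: strings, group.
cols = pd.MultiIndex.from_tuples(
    [("f", 1), ("g", 0), ("h", 1)],
    names=["strings", "group"],
)

# Column h mixes a number, a string and missing values on purpose.
dataframe = pd.DataFrame(
    [[1, "A", 2], [2, "B", None], [3, "C", "B"], [4, "D", None]],
    index=rows,
    columns=cols,
)

print(dataframe)
```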

Pre-built summaries

Multiple pre-built summaries are available:

  • basic(): for DataFrame and MetaFrames dimensions
>>> summary.basic()
           Rows  Columns  Cells
DataFrame     4        3     12
MFR           4        3     12
MFC           3        2      6
  • whole(): for the whole DataFrame matrix
>>> summary.whole()
                                                 DataFrame
dtype       Mode        Metric        Type                
all         Describe    Num. elements count             12
                        NAs           count            2.0
                                      %               16.7
Not Numeric Describe    Num. elements count            5.0
                                      %               41.7
                        unique        count              4
                                      %               80.0
                        top           top                B
                        freq          freq               2
            Value Count B             count              2
                                      %               40.0
                        A             count              1
                                      %               20.0
                        C             count              1
                                      %               20.0
                        D             count              1
                                      %               20.0
Numeric     Describe    Num. elements count            5.0
                                      %               41.7
                        mean          mean             2.4
                        std           std             1.14
                        min           min              1.0
                        25%           percentile       2.0
                        50%           percentile       2.0
                        75%           percentile       3.0
                        max           max              4.0
                        sum           sum             12.0
            Custom      zeros         count            0.0
                                      %                0.0
                        filled        count            5.0
                                      %              100.0
  • row(): for a per-row summary
>>> summary.row()
floats                                              1.1    2.2    3.3    4.4
bool                                              False  False  True   True 
group                                                 0      0      2      1
dtype       Mode        Metric        Type                                  
all         Describe    Num. elements count           3      3      3      3
                        NAs           count         0.0    1.0    0.0    1.0
                                      %             0.0   33.3    0.0   33.3
Not Numeric Describe    Num. elements count         1.0    1.0    2.0    1.0
                                      %            33.3   33.3   66.7   33.3
                        unique        count           1      1      2      1
                                      %           100.0  100.0  100.0  100.0
                        top           top             A      B      C      D
                        freq          freq            1      1      1      1
            Value Count A             count           1    NaN    NaN    NaN
                                      %           100.0    NaN    NaN    NaN
                        B             count         NaN      1      1    NaN
                                      %             NaN  100.0   50.0    NaN
                        C             count         NaN    NaN      1    NaN
                                      %             NaN    NaN   50.0    NaN
                        D             count         NaN    NaN    NaN      1
                                      %             NaN    NaN    NaN  100.0
Numeric     Describe    Num. elements count         2.0    1.0    1.0    1.0
                                      %            66.7   33.3   33.3   33.3
                        mean          mean          1.5    2.0    3.0    4.0
                        std           std          0.71    NaN    NaN    NaN
                        min           min           1.0    2.0    3.0    4.0
                        25%           percentile   1.25    2.0    3.0    4.0
                        50%           percentile    1.5    2.0    3.0    4.0
                        75%           percentile   1.75    2.0    3.0    4.0
                        max           max           2.0    2.0    3.0    4.0
                        sum           sum           3.0    2.0    3.0    4.0
            Custom      zeros         count         0.0    0.0    0.0    0.0
                                      %             0.0    0.0    0.0    0.0
                        filled        count         2.0    1.0    1.0    1.0
                                      %           100.0  100.0  100.0  100.0
  • col(): for a per-column summary
>>> summary.col()
strings                                                f      g      h
group                                                  1      0      1
dtype       Mode        Metric        Type                            
all         Describe    Num. elements count         4.00      4      4
                        NAs           count         0.00    0.0    2.0
                                      %             0.00    0.0   50.0
Numeric     Describe    Num. elements count         4.00    NaN    1.0
                                      %           100.00    NaN   25.0
                        mean          mean          2.50    NaN    2.0
                        std           std           1.29    NaN    NaN
                        min           min           1.00    NaN    2.0
                        25%           percentile    1.75    NaN    2.0
                        50%           percentile    2.50    NaN    2.0
                        75%           percentile    3.25    NaN    2.0
                        max           max           4.00    NaN    2.0
                        sum           sum          10.00    NaN    2.0
            Custom      zeros         count         0.00    NaN    0.0
                                      %             0.00    NaN    0.0
                        filled        count         4.00    NaN    1.0
                                      %           100.00    NaN  100.0
Not Numeric Describe    Num. elements count          NaN    4.0    1.0
                                      %              NaN  100.0   25.0
                        unique        count          NaN      4      1
                                      %              NaN  100.0  100.0
                        top           top            NaN      A      B
                        freq          freq           NaN      1      1
            Value Count A             count          NaN      1    NaN
                                      %              NaN   25.0    NaN
                        B             count          NaN      1      1
                                      %              NaN   25.0  100.0
                        C             count          NaN      1    NaN
                                      %              NaN   25.0    NaN
                        D             count          NaN      1    NaN
                                      %              NaN   25.0    NaN
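
To make the figures in the whole() table easier to read, the sketch below recomputes a few of them with plain pandas on the twelve cell values of the example. It does not use metaframe and is not a description of how the library computes them internally; the numeric/non-numeric split shown here (Python int or float versus everything else that is not missing) is an assumption chosen because it reproduces the printed numbers.

```python
import pandas as pd

# The twelve cells of the example DataFrame, read row by row.
values = pd.Series(
    [1, "A", 2, 2, "B", None, 3, "C", "B", 4, "D", None], dtype=object
)

# "all" block: number of elements and missing values.
n_total = len(values)
n_na = int(values.isna().sum())
pct_na = round(100 * n_na / n_total, 1)

# Assumed split: int/float cells are numeric, other non-missing cells are not.
is_num = values.map(
    lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
)
numeric = values[is_num].astype(float)
text = values[~is_num & values.notna()]

print(n_total, n_na, pct_na)            # 12 2 16.7
print(len(numeric), numeric.sum())      # 5 12.0
print(round(numeric.mean(), 1))         # 2.4
print(round(numeric.std(), 2))          # 1.14 (sample standard deviation)
print(len(text), text.nunique())        # 5 4
print(text.value_counts().idxmax())     # B
```

The percentages in the "Numeric" and "Not Numeric" blocks are consistent with counts taken relative to all twelve cells (5 / 12 = 41.7 %), while the "unique" and "Value Count" percentages are relative to the five non-numeric cells (4 / 5 = 80 %).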

Custom summaries

Summaries can be customized in two ways:

  • Using pre-built summaries and passing a custom d_func dictionary of the form {'<metric_type>': {'<metric>': <func>}}, where <func> is a function or lambda that takes a Series and returns a single value.
>>> summary.whole(d_func={'my_metric_type': {'my_metric': lambda s: (s<0).sum()}})
                                                     DataFrame
dtype       Mode        Metric        Type                    
...
Numeric     Custom      my_metric     my_metric_type       0.0
  • Using the summary() method with a custom function that takes a DataFrame and a DataFrameSummaryOpts object as input and returns a DataFrame.
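
The two kinds of callables described above can be sketched with plain pandas, as follows. The names count_negative and shape_summary are hypothetical, and the opts parameter only stands in for the DataFrameSummaryOpts object, whose attributes are not documented on this page. How the second function is handed to summary() is likewise not shown here, so the sketch stops at defining and testing the functions on their own.

```python
import pandas as pd


def count_negative(s: pd.Series) -> int:
    """Metric function for d_func: takes a Series, returns a single value."""
    return int((s < 0).sum())


# Same shape as the d_func dictionary described above:
# {'<metric_type>': {'<metric>': <func>}}
d_func = {"my_metric_type": {"my_metric": count_negative}}


def shape_summary(df: pd.DataFrame, opts=None) -> pd.DataFrame:
    """Hypothetical custom summary: DataFrame and options in, DataFrame out.

    `opts` is a placeholder for a DataFrameSummaryOpts object and is
    not used in this sketch.
    """
    return pd.DataFrame(
        {"Rows": [df.shape[0]], "Columns": [df.shape[1]], "Cells": [df.size]},
        index=["DataFrame"],
    )


# Trying the metric on the numeric cells of the example: none are negative.
numeric_cells = pd.Series([1.0, 2.0, 2.0, 3.0, 4.0])
print(d_func["my_metric_type"]["my_metric"](numeric_cells))  # 0

# Trying the custom summary on a small stand-in DataFrame.
print(shape_summary(pd.DataFrame({"a": [1, 2], "b": [3, 4]})))
```

Writing the metric as a named function rather than a lambda makes it easy to test on its own before passing it in d_func.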