Series

Bases: _SeriesCoreMixin, _SeriesSummaryMixin, pandas.Series


Inheritance diagram: _SeriesCoreMixin --> Series, _SeriesSummaryMixin --> Series

Extended pandas Series with dataframe-aware helpers and summaries.

This subclass behaves like pandas.Series but guarantees that operations returning new objects preserve the custom Series or project DataFrame types. It also provides additional helpers for:

- construction from DataFrames or Index objects
- regex matching
- structured statistical summaries

Source code in metaframe/src/series/base.py

class Series(
    _SeriesCoreMixin,
    _SeriesSummaryMixin,
    pd.Series
):
    """
    Extended pandas ``Series`` with dataframe-aware helpers and summaries.

    This subclass behaves like `pandas.Series` but guarantees that
    operations returning new objects preserve the custom ``Series`` or
    project ``DataFrame`` types. It also provides additional helpers for:

    * construction from DataFrames or Index objects
    * regex matching
    * structured statistical summaries
    """

    # ------------------------------------------------------------------
    # Constructors
    # ------------------------------------------------------------------

    @property
    def _constructor(self) -> Type["Series"]:
        """
        Series constructor used internally by pandas.

        Ensures pandas operations that produce a new Series return this
        subclass instead of ``pandas.Series``.

        Returns
        -------
        Series
        """
        return Series

    @property
    def _constructor_expanddim(self) -> Type:
        """
        DataFrame constructor used when dimensionality increases.

        Used internally when a Series becomes a DataFrame.

        Returns
        -------
        DataFrame
        """
        from metaframe.src.dataframe import DataFrame
        return DataFrame

`_constructor` `property`

Series constructor used internally by pandas.

Ensures pandas operations that produce a new Series return this subclass instead of pandas.Series.

Returns:

Type	Description
`Series`
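
As a minimal sketch of this mechanism, assuming nothing beyond plain pandas, the hypothetical `MySeries` class below (not part of metaframe) overrides `_constructor` in the same way. It suggests why arithmetic and slicing then return the subclass rather than `pandas.Series`.

```python
import pandas as pd


class MySeries(pd.Series):
    """Hypothetical stand-in for the project's Series subclass."""

    @property
    def _constructor(self):
        # pandas calls this whenever an operation builds a new Series.
        return MySeries


s = MySeries([1, 2, 3])
doubled = s * 2   # arithmetic builds its result through _constructor
head = s.head(2)  # so does slicing

print(type(doubled).__name__, type(head).__name__)
```

Without the override, both results would be plain `pandas.Series` objects and any extra methods defined on the subclass would be lost after the first operation.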

`_constructor_expanddim` `property`

DataFrame constructor used when dimensionality increases.

Used internally when a Series becomes a DataFrame.

Returns:

Type	Description
`DataFrame`
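
The following sketch illustrates the dimensionality change with plain pandas. `MySeries` and `MyFrame` are hypothetical classes standing in for the project's `Series` and `DataFrame`; the pairing of `_constructor_expanddim` with `_constructor_sliced` is an assumption about how the two project classes relate, based on the standard pandas subclassing hooks.

```python
import pandas as pd


class MyFrame(pd.DataFrame):
    """Hypothetical stand-in for the project's DataFrame subclass."""

    @property
    def _constructor(self):
        return MyFrame

    @property
    def _constructor_sliced(self):
        # Used when a DataFrame is reduced to a single column or row.
        return MySeries


class MySeries(pd.Series):
    """Hypothetical stand-in for the project's Series subclass."""

    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        # Used when a Series grows into a DataFrame, e.g. via to_frame().
        return MyFrame


frame = MySeries([1, 2], name="x").to_frame()
column = frame["x"]

print(type(frame).__name__, type(column).__name__)
```

In the source above, the `DataFrame` import sits inside the property body, which is a common way to avoid a circular import between two modules that reference each other.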

`summary(**kwargs)`

Compute a structured summary of the Series.

Produces a MultiIndex Series describing counts, missing values, descriptive statistics, value frequencies, and optional custom metrics.

The resulting Series will have a MultiIndex with the following levels:

- dtype -> 'all', 'Not Numeric' or 'Numeric'. Subset of the original Series, by dtype, on which the metric was computed.
- Mode -> 'Describe', 'Value Count' or 'Custom'. Source that produced the metric:
    - Describe: pandas describe method
    - Value Count: pandas value_counts method
    - Custom: user-defined function from the d_func parameter
- Metric: name of the computed metric.
- Type: kind of metric. Each 'count' metric is followed by its associated '%' row.
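
To make the four-level layout concrete, here is a small sketch that builds a Series by hand with the same level names as the example output on this page (`dtype`, `Mode`, `Metric`, `Type`) and selects from it with `Series.xs`. The values are copied from that example; the object is a plain `pandas.Series`, not the result of calling `summary()`.

```python
import pandas as pd

# Hand-built stand-in for a summary result, using the level names and
# labels shown in the example output on this page.
index = pd.MultiIndex.from_tuples(
    [
        ("all", "Describe", "Num. elements", "count"),
        ("all", "Describe", "NAs", "count"),
        ("all", "Describe", "NAs", "%"),
        ("Numeric", "Describe", "Num. elements", "count"),
        ("Numeric", "Describe", "mean", "mean"),
    ],
    names=["dtype", "Mode", "Metric", "Type"],
)
summary = pd.Series([9, 1.0, 11.1, 4.0, 2.0], index=index, dtype=object)

# Every metric computed on the numeric subset (the 'dtype' level is dropped).
numeric = summary.xs("Numeric", level="dtype")

# Every percentage row, whatever subset or mode produced it.
percentages = summary.xs("%", level="Type")

# A single cell is addressed by the full four-part key.
na_count = summary[("all", "Describe", "NAs", "count")]

print(numeric)
print(percentages)
```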

Behavior depends on value_counts:

- None -> automatically split numeric and non-numeric data
- True -> frequency-based summary only (on numeric and non-numeric data)
- False -> descriptive statistics only (on numeric data)
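
The automatic split used when `value_counts` is `None` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: `is_number` and `pct` are hypothetical helpers, only the built-in `bool` is excluded from the numeric group, and percentages are taken relative to the full length of the data, which is what the example output on this page suggests.

```python
from numbers import Number
from statistics import mean, stdev

values = [1, 2, "a", 2, "b", "a", 3, "a", None]


def is_number(value):
    # Booleans are instances of Number, so they are excluded explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def pct(part, total):
    # Percentage of the whole, rounded to one decimal place.
    return round(100 * part / total, 1)


present = [v for v in values if v is not None]
numeric = [v for v in present if is_number(v)]
non_numeric = [v for v in present if not is_number(v)]

total = len(values)
print("NAs:", total - len(present), pct(total - len(present), total))
print("Numeric:", len(numeric), pct(len(numeric), total))
print("Not Numeric:", len(non_numeric), pct(len(non_numeric), total))
print("mean/std:", mean(numeric), round(stdev(numeric), 2))
```

The numeric group then receives descriptive statistics and the non-numeric group receives frequency counts, mirroring the `False` and `True` modes respectively.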

Parameters:

Name	Type	Description	Default
`kwargs`		Keyword arguments forwarded to `SeriesSummaryOpts`.	`{}`

Returns:

Type	Description
`Series`	MultiIndex Series containing the summary statistics.

Examples:

>>> s = Series([1, 2, 'a', 2, 'b', 'a', 3, 'a', None])
>>> s.summary()
dtype        Mode         Metric         Type      
all          Describe     Num. elements  count             9
                          NAs            count           1.0
                                         %              11.1
Not Numeric  Describe     Num. elements  count           4.0
                                         %              44.4
                          unique         count             2
                                         %              50.0
                          top            top               a
                          freq           freq              3
             Value Count  a              count             3
                                         %              75.0
                          b              count             1
                                         %              25.0
Numeric      Describe     Num. elements  count           4.0
                                         %              44.4
                          mean           mean            2.0
                          std            std            0.82
                          min            min             1.0
                          25%            percentile     1.75
                          50%            percentile      2.0
                          75%            percentile     2.25
                          max            max             3.0
                          sum            sum             8.0
             Custom       zeros          count           0.0
                                         %               0.0
                          filled         count           4.0
                                         %             100.0
dtype: object

Source code in metaframe/src/series/summary.py

def summary(self, **kwargs) -> Self:
    """
    Compute a structured summary of the Series.

    Produces a MultiIndex Series describing counts, missing values,
    descriptive statistics, value frequencies, and optional custom metrics.

    The resulting Series will have a MultiIndex with the following levels:

    - dtype
        -> 'all', 'Not Numeric' or 'Numeric'
        Subset of the original Series, by dtype, on which the metric was computed
    - Mode
        -> 'Describe', 'Value Count' or 'Custom'
        Source that produced the metric
        - Describe: pandas describe method
        - Value Count: pandas value_counts method
        - Custom: user-defined function from the d_func parameter
    - Metric
        Name of the computed metric
    - Type
        Kind of metric
        Each 'count' metric is followed by its associated '%' row

    Behavior depends on `value_counts`:

    - None  -> automatically split numeric and non-numeric data
    - True  -> frequency-based summary only (on numeric and non-numeric data)
    - False -> descriptive statistics only (on numeric data)

    Parameters
    ----------
    kwargs:
        Keyword arguments forwarded to ``SeriesSummaryOpts``.

    Returns
    -------
    Series
        MultiIndex Series containing the summary statistics.

    Examples
    --------
    >>> s = Series([1, 2, 'a', 2, 'b', 'a', 3, 'a', None])
    >>> s.summary()
    dtype        Mode         Metric         Type      
    all          Describe     Num. elements  count             9
                              NAs            count           1.0
                                             %              11.1
    Not Numeric  Describe     Num. elements  count           4.0
                                             %              44.4
                              unique         count             2
                                             %              50.0
                              top            top               a
                              freq           freq              3
                 Value Count  a              count             3
                                             %              75.0
                              b              count             1
                                             %              25.0
    Numeric      Describe     Num. elements  count           4.0
                                             %              44.4
                              mean           mean            2.0
                              std            std            0.82
                              min            min             1.0
                              25%            percentile     1.75
                              50%            percentile      2.0
                              75%            percentile     2.25
                              max            max             3.0
                              sum            sum             8.0
                 Custom       zeros          count           0.0
                                             %               0.0
                              filled         count           4.0
                                             %             100.0
    dtype: object
    """
    if self.empty:
        return self._constructor()
    opts = SeriesSummaryOpts(**kwargs)
    l_summary_names = [opts.label_type, opts.label_mode, opts.label_metric, opts.label_metric_type]
    summary_mi = pd.MultiIndex.from_tuples([], names=l_summary_names)
    processed_summary = self._constructor(index=summary_mi)
    na_summary = self._constructor(index=summary_mi)
    count_summary = self._constructor(index=summary_mi)
    count_summary[(opts.label_type_all, opts.label_mode_desc, opts.label_metric_count, opts.label_metric_type_count)] = self.shape[0]
    if not opts.skip_na:
        na_summary[(opts.label_type_all, opts.label_mode_desc, opts.label_metric_nas, opts.label_metric_type_count)] = self.isna().sum()
        na_summary = summary_perc(na_summary, self.shape[0], opts)
    if opts.value_counts is None:
        num_mask = self.dropna().apply(lambda x: not isinstance(x, (bool, np_bool)) and isinstance(x, Number))
        processed_summary_not_num = self[num_mask[~num_mask].index].summary(**asdict(replace(opts, value_counts=True, skip_na=True, _tot_shape=self.shape[0])))
        if not processed_summary_not_num.empty:
            processed_summary_not_num = processed_summary_not_num.rename({opts.label_type_all: opts.label_type_not_num}, level=opts.label_type)
        processed_summary_num = self[num_mask[num_mask].index].infer_objects().summary(**asdict(replace(opts, value_counts=False, skip_na=True, _tot_shape=self.shape[0])))
        if not processed_summary_num.empty:
            processed_summary_num = processed_summary_num.rename({opts.label_type_all: opts.label_type_num}, level=opts.label_type)
        to_concat = [e for e in [processed_summary_not_num, processed_summary_num] if not e.empty]
        if to_concat:
            processed_summary = pd.concat(to_concat)
    elif opts.value_counts:
        processed_summary = self.astype(str)._summary_not_num(l_summary_names=l_summary_names, opts=opts)
    else:
        if pd.api.types.is_numeric_dtype(self.dtype):
            processed_summary = self._summary_num(l_summary_names=l_summary_names, summary_mi=summary_mi, opts=opts)
        else:
            count_summary.iloc[0] = 0
    if opts._tot_shape is not None:
        count_summary = summary_perc(count_summary, opts._tot_shape, opts)
    summary = pd.concat([e for e in [count_summary, na_summary, processed_summary] if not e.empty])
    summary.name = self.name
    return summary

`_summary_not_num(l_summary_names, opts)`

Compute summary statistics for non-numeric data.

Includes descriptive metrics and value frequencies, with optional percentage computation.

Parameters:

Name	Type	Description	Default
`l_summary_names`	`List[str]`	Names of the four MultiIndex levels of the summary.	required
`opts`	`SeriesSummaryOpts`	Summary options.	required

Returns:

Type	Description
`Series`	MultiIndex summary for non-numeric values.
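
A rough sketch of the two ingredients named above, using plain pandas on the non-numeric values from the `summary()` example. It only shows the `describe` and `value_counts` calls plus a percentage step; the MultiIndex labelling and the project's `summary_perc` helper are left out, and the rounding to one decimal is an assumption taken from the example output.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "a"])

# Descriptive metrics for non-numeric data: unique, top, freq.
# 'count' is dropped because the element count is reported separately.
desc = s.describe().drop("count")

# Frequency of each distinct value, most frequent first.
counts = s.value_counts()

# Each count expressed as a share of the number of elements.
perc = (100 * counts / len(s)).round(1)

print(desc)
print(counts)
print(perc)
```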

Source code in metaframe/src/series/summary.py

def _summary_not_num(self, l_summary_names: List[str], opts: SeriesSummaryOpts) -> Self:
    """
    Compute summary statistics for non-numeric data.

    Includes descriptive metrics and value frequencies, with optional
    percentage computation.

    Parameters
    ----------
    l_summary_names: List[str]
    opts : SeriesSummaryOpts

    Returns
    -------
    Series
        MultiIndex summary for non-numeric values.
    """
    s_summary_desc = self.describe(include='all', **opts.describe_kwargs).drop('count')
    s_summary_desc.index = pd.MultiIndex.from_tuples([(opts.label_type_all, opts.label_mode_desc, e, e if e!='unique' else opts.label_metric_type_count) for e in s_summary_desc.index], names=l_summary_names)
    s_summary_values = self.value_counts().fillna(0)
    s_summary_values.index = pd.MultiIndex.from_tuples([(opts.label_type_all, opts.label_mode_value_count, str(e), opts.label_metric_type_count) for e in s_summary_values.index], names=l_summary_names)
    return summary_perc(pd.concat([s_summary_desc, s_summary_values]), self.shape[0], opts)

`_summary_num(l_summary_names, summary_mi, opts)`

Compute summary statistics for numeric data.

Includes descriptive statistics (mean, std, min, percentiles, max, sum) and optional custom metrics provided through d_func.

Parameters:

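Name	Type	Description	Default
`l_summary_names`	`List[str]`	Names of the four MultiIndex levels of the summary.	required
`summary_mi`	`MultiIndex`	Empty MultiIndex carrying those level names, used to initialise the result.	required
`opts`	`SeriesSummaryOpts`	Summary options.	required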

Returns:

Type	Description
`Series`	MultiIndex summary for numeric values.
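
The sketch below reproduces the numeric part of the `summary()` example with plain pandas. The `d_func` layout, a mapping of metric type to `{metric name: function}`, is inferred from the loop in the source code; the two functions used here (`zeros`, `filled`) are hypothetical implementations chosen to match the names in the example output, not the library's defaults.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3])

# Assumed shape of d_func: {metric_type: {metric_name: function}}.
d_func = {
    "count": {
        "zeros": lambda x: (x == 0).sum(),
        "filled": lambda x: x.notna().sum(),
    }
}

# Descriptive statistics without 'count', rounded to two decimals.
desc = s.describe().drop("count").round(2)

# 'sum' is not part of describe(), so it is computed separately.
total = s.sum()

# Apply every custom function to the whole Series.
custom = {
    (metric_type, name): func(s)
    for metric_type, funcs in d_func.items()
    for name, func in funcs.items()
}

print(desc)
print("sum:", total)
print(custom)
```

Custom metrics registered under the `count` type are the ones that would also receive a percentage row, since percentages are derived from `count` entries.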

Source code in metaframe/src/series/summary.py

def _summary_num(self, 
                 l_summary_names: List[str], 
                 summary_mi: pd.MultiIndex, 
                 opts: SeriesSummaryOpts) -> Self:
    """
    Compute summary statistics for numeric data.

    Includes descriptive statistics (mean, std, min, percentiles, max, sum)
    and optional custom metrics provided through `d_func`.

    Parameters
    ----------
    l_summary_names: List[str]
    summary_mi: pd.MultiIndex
    opts: SeriesSummaryOpts

    Returns
    -------
    Series
        MultiIndex summary for numeric values.
    """
    summary_desc = self.describe(**opts.describe_kwargs).drop('count').round(opts.round_desc)
    summary_desc.index = pd.MultiIndex.from_tuples([(opts.label_type_num, opts.label_mode_desc, e, e if not e.endswith('%') else opts.label_metric_type_percentile) for e in summary_desc.index], names=l_summary_names)
    processed_summary = self._constructor(name=self.name, index=summary_mi)
    processed_summary[(opts.label_type_num, opts.label_mode_desc, 'sum', 'sum')] = self.sum()
    for metric_type, d in opts.d_func.items():
        for metric_name, func in d.items():
            processed_summary[(opts.label_type_num, opts.label_mode_custom, metric_name, metric_type)] = func(self)
    return summary_perc(pd.concat([summary_desc, processed_summary.round(opts.round_desc)]), self.shape[0], opts)

`fullmatch(pattern, **kwargs)`

Test whether each value fully matches a regex pattern.

Each value is cast to string and matched using re.fullmatch. Missing values return False.

Parameters:

Name	Type	Description	Default
`pattern`	`str`	Regular expression pattern.	required
`**kwargs`		Additional arguments forwarded to `re.fullmatch`.	`{}`

Returns:

Type	Description
`Series of bool`

Examples:

>>> s = Series(["A1", "B2", "AA"])
>>> s.fullmatch(r"[A-Z]\d")
0     True
1     True
2    False
dtype: bool
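
The difference between a full match and a prefix match is easy to see with the standard `re` module on its own. This sketch uses plain Python lists rather than the `Series` method, and the `flags` keyword stands in for the extra arguments that would be forwarded to `re.fullmatch`.

```python
import re

values = ["A1", "B2", "AA", "A1x"]
pattern = r"[A-Z]\d"

# fullmatch: the whole string must match the pattern.
full = [re.fullmatch(pattern, v) is not None for v in values]

# match: only the beginning of the string has to match.
prefix = [re.match(pattern, v) is not None for v in values]

# Extra keyword arguments, such as flags, change how the pattern is applied.
ignore_case = re.fullmatch(pattern, "a1", flags=re.IGNORECASE) is not None

# Non-string values are compared through their string form.
number_ok = re.fullmatch(r"\d+", str(2024)) is not None

print(full)    # [True, True, False, False]
print(prefix)  # [True, True, False, True]
```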

Source code in metaframe/src/series/core.py

def fullmatch(self, pattern: str, **kwargs) -> Self:
    """
    Test whether each value fully matches a regex pattern.

    Each value is cast to string and matched using `re.fullmatch`.
    Missing values return ``False``.

    Parameters
    ----------
    pattern : str
        Regular expression pattern.
    **kwargs
        Additional arguments forwarded to ``re.fullmatch``.

    Returns
    -------
    Series of bool

    Examples
    --------
    >>> s = Series(["A1", "B2", "AA"])
    >>> s.fullmatch(r"[A-Z]\\d")
    0     True
    1     True
    2    False
    dtype: bool
    """
    if not isinstance(pattern, str):
        pattern = str(pattern)
    return self.astype(str).apply(lambda x: re_fullmatch(pattern, x, **kwargs) is not None if x==x else False)

`to_int(start_at=0)`

Encode unique values as consecutive integers.

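Identical values receive identical integers, assigned in order of first appearance. Missing values are replaced by a placeholder before encoding, so they receive an integer label of their own.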

Parameters:

Name	Type	Description	Default
`start_at`	`int`	Starting integer label.	`0`

Returns:

Type	Description
`Series of int`

Examples:

>>> s = Series(["a", "b", "a"])
>>> s.to_int()
0    0
1    1
2    0
dtype: int64
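
The encoding can be sketched in pure Python. The `encode` function below is a hypothetical helper, not the library's implementation; it assigns labels in order of first appearance, which is the order `unique()` yields in the source code.

```python
def encode(values, start_at=0):
    """Map each distinct value to a consecutive integer label."""
    codes = {}
    for value in values:
        if value not in codes:
            # The next free label is the number of values seen so far.
            codes[value] = len(codes) + start_at
    return [codes[value] for value in values]


print(encode(["a", "b", "a"]))              # [0, 1, 0]
print(encode(["a", "b", "a"], start_at=1))  # [1, 2, 1]
```

Shifting `start_at` changes only the first label; the labels stay consecutive.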

Source code in metaframe/src/series/core.py

def to_int(self, start_at: int=0) -> Self:
    """
    Encode unique values as consecutive integers.

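    Identical values receive identical integers, assigned in order of first
    appearance. Missing values are replaced by a placeholder before encoding,
    so they receive an integer label of their own.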

    Parameters
    ----------
    start_at : int, default 0
        Starting integer label.

    Returns
    -------
    Series of int

    Examples
    --------
    >>> s = Series(["a", "b", "a"])
    >>> s.to_int()
    0    0
    1    1
    2    0
    dtype: int64
    """
    # When dtype is 'O' and the Series holds pseudo-numeric values
    # (numbers, numeric strings, NaN), NaN values are not replaced.
    filled = self.fillna(PLACEHOLDER)
    # Map each distinct value, in order of first appearance, to an integer label.
    codes = {v: i + start_at for i, v in enumerate(filled.unique())}
    return filled.replace(codes).astype(int)
