
DataFrame

Bases: _DataFrameCoreMixin, _DataFrameIOMixin, _DataFrameUtilsMixin, _DataFrameGetStringMixin, DataFrame



Metadata-aware pandas DataFrame.

DataFrame is a subclass of pandas.DataFrame that adds first-class support for metadata, semantic indexing, and structured table representations.

It behaves like a standard pandas DataFrame while providing additional abstractions for working with metadata-rich datasets. All pandas operations remain available and return metaframe.DataFrame objects whenever possible.

The core idea of DataFrame is that structure is data: rows and columns are treated as semantic entities that can carry metadata (held in a metadata DataFrame, the MetaFrame) and be selected using expressive, readable selectors.

Key features include:

  • MetaFrame-backed index/column management (mfr / mfc)
  • MetaFrame-aware indexers (q, gs, mfloc, mfiloc)
  • Safe manipulation of index and columns without breaking structure
  • Structured file import/export
  • Compatibility with all standard pandas APIs
Properties

_constructor: Pandas _constructor overridden to return metaframe.DataFrame.

_constructor_sliced: Pandas _constructor_sliced overridden to return metaframe.Series.

mfr: MetaFrameRow view of DataFrame index as _MetaFrame. Settable with MetaFrame or compatible DataFrame.

mfc: MetaFrameCol view of DataFrame columns as _MetaFrame. Settable with MetaFrame or compatible DataFrame.

q: Query indexer (obj.q[row/col] or obj.q[row, col]). Returns _GetStringIndexer with __getitem__/__setitem__ support. Tries rows first, then columns for non-table DataFrame, columns only for table DataFrame.

gs: Get-string indexer (obj.gs[row/col] or obj.gs[row, col]). Returns _GetStringIndexer with __getitem__/__setitem__ support. Tries rows first, then columns for non-table DataFrame, columns only for table DataFrame.

mfloc: MetaFrame .loc indexer (obj.mfloc[mfr_key/mfc_key]). Returns _MfIndexer mirroring mfr.loc/mfc.loc with DataFrame row/col selection. Supports __setitem__.

mfiloc: MetaFrame .iloc indexer (obj.mfiloc[mfr_pos/mfc_pos]). Returns _MfIndexer mirroring mfr.iloc/mfc.iloc with DataFrame row/col selection. Supports __setitem__.

is_table: bool indicating table format (simple index/columns, no MultiIndex).

is_metaframe: bool indicating MetaFrame format (table + numeric index).

Notes

The class behaves like a normal pandas DataFrame.

You can always access the underlying pandas behavior, and you can freely mix pandas and MetaFrame operations within the same workflow.

Source code in metaframe/src/dataframe/base.py
class DataFrame(
    _DataFrameCoreMixin,
    _DataFrameIOMixin,
    _DataFrameUtilsMixin,
    _DataFrameGetStringMixin,
    pd.DataFrame,
):
    """
    Metadata-aware pandas DataFrame.

    ``DataFrame`` is a subclass of `pandas.DataFrame` that adds
    first-class support for metadata, semantic indexing, and structured
    table representations.

    It behaves like a standard pandas DataFrame while providing additional
    abstractions for working with metadata-rich datasets. All pandas operations 
    remain available and return ``metaframe.DataFrame`` objects whenever possible.

    The core idea of ``DataFrame`` is that **structure is data**:
    rows and columns are treated as semantic entities that can carry
    metadata (held in a metadata DataFrame, the MetaFrame) and be
    selected using expressive, readable selectors.

    Key features include:

    - MetaFrame-backed index/column management (``mfr`` / ``mfc``)
    - MetaFrame-aware indexers (``q``, ``gs``, ``mfloc``, ``mfiloc``)
    - Safe manipulation of index and columns without breaking structure
    - Structured file import/export
    - Compatibility with all standard pandas APIs

    Properties
    ----------
    _constructor: 
        Pandas `_constructor` overridden to return `metaframe.DataFrame`.

    _constructor_sliced: 
        Pandas `_constructor_sliced` overridden to return `metaframe.Series`.

    mfr: 
        MetaFrameRow view of DataFrame index as `_MetaFrame`. Settable with MetaFrame or
        compatible DataFrame.

    mfc: 
        MetaFrameCol view of DataFrame columns as `_MetaFrame`. Settable with MetaFrame or
        compatible DataFrame.

    q:
        Query indexer (`obj.q[row/col]` or `obj.q[row, col]`). Returns `_GetStringIndexer`
        with `__getitem__`/`__setitem__` support. Tries rows first, then columns for
        non-table DataFrame, columns only for table DataFrame.

    gs: 
        Get-string indexer (`obj.gs[row/col]` or `obj.gs[row, col]`). Returns `_GetStringIndexer`
        with `__getitem__`/`__setitem__` support. Tries rows first, then columns for
        non-table DataFrame, columns only for table DataFrame.

    mfloc: 
        MetaFrame `.loc` indexer (`obj.mfloc[mfr_key/mfc_key]`). Returns `_MfIndexer`
        mirroring `mfr.loc`/`mfc.loc` with DataFrame row/col selection. Supports `__setitem__`.

    mfiloc: 
        MetaFrame `.iloc` indexer (`obj.mfiloc[mfr_pos/mfc_pos]`). Returns `_MfIndexer`
        mirroring `mfr.iloc`/`mfc.iloc` with DataFrame row/col selection. Supports `__setitem__`.

    is_table: 
        `bool` indicating table format (simple index/columns, no MultiIndex).

    is_metaframe: 
        `bool` indicating MetaFrame format (table + numeric index).

    Notes
    -----
    The class behaves like a normal pandas DataFrame.

    You can always access the underlying pandas behavior, and you can
    freely mix pandas and MetaFrame operations within the same workflow.
    """

    # Override
    @property
    def _constructor(self) -> Type:
        """
        Return the MetaFrame DataFrame constructor.

        This ensures that pandas operations returning a DataFrame
        (such as slicing, arithmetic operations, or transformations)
        preserve the ``metaframe.DataFrame`` type.

        Returns
        -------
        Type
            The ``metaframe.DataFrame`` class.
        """
        return DataFrame

    # Override
    @property
    def _constructor_sliced(self) -> Type:
        """
        Return the MetaFrame Series constructor.

        This ensures that pandas operations returning a Series
        (such as column access or row selection) return a
        ``metaframe.Series`` when possible.

        Returns
        -------
        Type
            The ``metaframe.Series`` class.
        """
        from metaframe.src.series import Series
        return Series

gs property

General semantic selector for rows and columns.

gs provides expressive, MetaFrame-aware selection using semantic labels, operations or regular expressions.

The selector attempts row selection first, then column selection, unless the DataFrame is in table format, in which case column selection is preferred.

Returns:

Type Description
_GetStringIndexer

An indexer supporting __getitem__ and __setitem__.

Notes

Read-only property.

gs is designed for readability and intent, not positional access.

It complements rather than replaces loc/iloc.

For more information on the Get-Strings format and usage, see the 'Get-Strings' wiki page!

Examples:

Getter

>>> from metaframe.testing import dataframe, metaframe_row
>>> metaframe_row
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
>>> metaframe_row.gs["floats:(>3 or =1.1) and group:!2"]
   floats   bool  group
0     1.1  False      0
3     4.4   True      1
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.gs["bool:'.*se$'", "group:0,2"]
strings             g
group               0
floats bool  group   
1.1    False 0      A
2.2    False 0      B

Setter

>>> metaframe_row.gs["floats:(>3 or =1.1) and group:!2"] = [0.0, True, -1]
>>> metaframe_row
   floats   bool  group
0     0.0   True     -1
1     2.2  False      0
2     3.3   True      2
3     0.0   True     -1
>>> dataframe.gs["bool:'.*se$'", "group:0,2"] = 'E'
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  E     2
2.2    False 0      2  E  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None

q property

General query selector for rows and columns.

q provides expressive, MetaFrame-aware selection using pandas query strings.

The selector attempts row selection first, then column selection, unless the DataFrame is in table format, in which case column selection is preferred.

Returns:

Type Description
_GetStringIndexer

An indexer supporting __getitem__ and __setitem__.

Notes

Read-only property.

q is designed for readability and intent, not positional access.

It complements rather than replaces loc/iloc.

For more information on the query format, see the pandas.DataFrame.query documentation!
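As a rough plain-pandas analogue of what a query-based selector does, the sketch below runs a pandas query string against a metadata frame built from the row index and then selects the matching rows by position. It is not the q implementation; the frame is rebuilt by hand to resemble the `dataframe` fixture, and `row_meta`, `query` and `positions` are names chosen for this illustration only.

```python
import pandas as pd

# Hand-built stand-in for the `dataframe` fixture used in the examples (assumption).
rows = pd.MultiIndex.from_arrays(
    [[1.1, 2.2, 3.3, 4.4], [False, False, True, True], [0, 0, 2, 1]],
    names=["floats", "bool", "group"],
)
cols = pd.MultiIndex.from_arrays(
    [["f", "g", "h"], [1, 0, 1]], names=["strings", "group"]
)
df = pd.DataFrame(
    [[1, "A", 2], [2, "B", None], [3, "C", "B"], [4, "D", None]],
    index=rows,
    columns=cols,
)

# Turn the row index into an ordinary frame: one column per index level.
row_meta = df.index.to_frame(index=False)

# Run a standard pandas query string against the metadata, not the data matrix.
query = "(floats > 3 or floats == 1.1) and group != 2"
positions = row_meta.query(query).index.to_list()

# Use the surviving positions to pick the matching rows of the original frame.
selected = df.iloc[positions, :]
print(selected)
```

Querying the metadata frame rather than the data matrix is what lets index levels be addressed by name inside the query string.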

mfr property writable

DataFrame view of the DataFrame index (MetaFrameRow).

mfr exposes the DataFrame rows as a structured MetaFrame, allowing metadata-aware inspection, selection, and modification of the index.

Returns:

Type Description
_MetaFrame

DataFrame representation of the DataFrame index.

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mfr
     floats   bool  group
0       1.1  False      0
1       2.2  False      0
2       3.3   True      2
3       4.4   True      1

The extra space between the column names and the matrix is expected, as _MetaFrame objects have a specific index name identifier.

mfc property writable

DataFrame view of the DataFrame columns (MetaFrameCol).

mfc exposes the DataFrame columns as a structured MetaFrame, allowing metadata-aware inspection, selection, and modification of the columns.

Returns:

Type Description
_MetaFrame

DataFrame representation of the DataFrame columns.

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mfc
    strings  group
0         f      1
1         g      0
2         h      1

The extra space between the column names and the matrix is expected, as _MetaFrame objects have a specific index name identifier.
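For readers who want to see the idea behind mfr and mfc without the library, plain pandas can produce a similar table from a MultiIndex with `MultiIndex.to_frame`. This is only an approximation under the assumption that the views are essentially "one column per index level"; the real _MetaFrame objects carry extra behavior (such as the index name identifier mentioned above) that this sketch does not reproduce.

```python
import pandas as pd

# Hand-built stand-in for the `dataframe` fixture used in the examples (assumption).
rows = pd.MultiIndex.from_arrays(
    [[1.1, 2.2, 3.3, 4.4], [False, False, True, True], [0, 0, 2, 1]],
    names=["floats", "bool", "group"],
)
cols = pd.MultiIndex.from_arrays(
    [["f", "g", "h"], [1, 0, 1]], names=["strings", "group"]
)
df = pd.DataFrame(
    [[1, "A", 2], [2, "B", None], [3, "C", "B"], [4, "D", None]],
    index=rows,
    columns=cols,
)

# Row metadata: one row per DataFrame row, one column per index level.
row_meta = df.index.to_frame(index=False)

# Column metadata: one row per DataFrame column, one column per column level.
col_meta = df.columns.to_frame(index=False)

print(row_meta)
print(col_meta)
```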

is_table property

Indicate whether the DataFrame is in table format.

A table-format DataFrame has a simple (non-MultiIndex) index and columns, and is typically suitable for export or display.

Returns:

Type Description
bool

True if the DataFrame is in table format.

Notes

Read-only property.

Examples:

>>> from metaframe import dataframe, metaframe_col
>>> metaframe_col
  strings  group
0       f      1
1       g      0
2       h      1
>>> metaframe_col.is_table
True
>>> metaframe_col.set_index('strings').is_table
True
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.is_table
False

is_metaframe property

Indicate whether the DataFrame conforms to MetaFrame format.

A MetaFrame-format DataFrame is a table-format DataFrame with a numeric index suitable for representing structured metadata.

Returns:

Type Description
bool

True if the DataFrame conforms to MetaFrame format.

Notes

Read-only property.

Examples:

>>> from metaframe import dataframe, metaframe_col
>>> metaframe_col
  strings  group
0       f      1
1       g      0
2       h      1
>>> metaframe_col.is_metaframe
True
>>> metaframe_col.set_index('strings').is_metaframe
False
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.is_metaframe
False
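The two format checks can be approximated in plain pandas as follows. The helper names `looks_like_table` and `looks_like_metaframe` are hypothetical, and reading "numeric index" as "the index has a numeric dtype" is an assumption; the library's own checks may be stricter.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def looks_like_table(frame: pd.DataFrame) -> bool:
    """True when neither the index nor the columns are a MultiIndex."""
    return not isinstance(frame.index, pd.MultiIndex) and not isinstance(
        frame.columns, pd.MultiIndex
    )


def looks_like_metaframe(frame: pd.DataFrame) -> bool:
    """True for a table whose index has a numeric dtype (assumed meaning)."""
    return looks_like_table(frame) and is_numeric_dtype(frame.index.dtype)


# Stand-in for the `metaframe_col` fixture shown in the examples (assumption).
meta = pd.DataFrame({"strings": ["f", "g", "h"], "group": [1, 0, 1]})
by_name = meta.set_index("strings")

print(looks_like_table(meta), looks_like_metaframe(meta))
print(looks_like_table(by_name), looks_like_metaframe(by_name))
```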

mfloc property

MetaFrame-aware label-based indexer.

Indexes MetaFrameRow/MetaFrameCol via .loc semantics (obj.mfloc[mfr_key/mfc_key] and obj.mfloc[mfr_key, mfc_key]).

mfloc mirrors pandas .loc semantics while operating on MetaFrame row and column representations. It allows selection using MetaFrame-compatible keys rather than raw labels.

Returns:

Type Description
_MfIndexer

A MetaFrame-aware label-based indexer.

Notes

Read-only property.

Examples:

Getter

>>> from metaframe import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None

Non-tuple values are applied to the MetaFrame columns:

>>> dataframe.mfloc['floats']
strings  f  g     h
group    1  0     1
floats             
1.1      1  A     2
2.2      2  B  None
3.3      3  C     B
4.4      4  D  None
>>> dataframe.mfloc[:, 'strings']
strings             f  g     h
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mfloc['floats', 'strings']
strings  f  g     h
floats             
1.1      1  A     2
2.2      2  B  None
3.3      3  C     B
4.4      4  D  None

Multiple selection, with lists or slices, is also possible:

>>> dataframe.mfloc[['floats', 'group']]
strings       f  g     h
group         1  0     1
floats group            
1.1    0      1  A     2
2.2    0      2  B  None
3.3    2      3  C     B
4.4    1      4  D  None
>>> dataframe.mfloc[slice('floats', 'group')]
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None

Tuples enable row selection:

>>> dataframe.mfloc[(0, ['floats', 'group']), ([1, 2], 'strings')]
strings       g  h
floats group      
1.1    0      A  2

':' cannot be used within a tuple! Use '' or 'slice(None)' instead:

>>> dataframe.mfloc[(0,), (slice(None), 'strings')]
strings             f  g  h
floats bool  group         
1.1    False 0      1  A  2

Setter

The mfloc property also supports setting, in a similar fashion to pandas DataFrame loc (including the creation of new columns). Setting only affects the corresponding MetaFrame matrices, never the DataFrame matrix!

>>> from metaframe import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mfloc['group'] = 5
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 5      1  A     2
2.2    False 5      2  B  None
3.3    True  5      3  C     B
4.4    True  5      4  D  None
>>> dataframe.mfloc[:, 'strings'] = ['i', 'j', 'k']
>>> dataframe
strings             i  j     k
group               1  0     1
floats bool  group            
1.1    False 5      1  A     2
2.2    False 5      2  B  None
3.3    True  5      3  C     B
4.4    True  5      4  D  None
>>> dataframe.mfloc[(0, 'New'),] = 'foo'
>>> dataframe
strings                 i  j     k
group                   1  0     1
floats bool  group New            
1.1    False 5     foo  1  A     2
2.2    False 5     nan  2  B  None
3.3    True  5     nan  3  C     B
4.4    True  5     nan  4  D  None
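The setter behavior shown above (metadata changes, data matrix untouched) can be mimicked in plain pandas by editing a frame built from the index and then rebuilding the index from it. This is a sketch of the idea only, not how mfloc is implemented; `row_meta` and `updated` are illustrative names and the frame is a hand-built stand-in for the fixture.

```python
import pandas as pd

# Hand-built stand-in for the `dataframe` fixture used in the examples (assumption).
rows = pd.MultiIndex.from_arrays(
    [[1.1, 2.2, 3.3, 4.4], [False, False, True, True], [0, 0, 2, 1]],
    names=["floats", "bool", "group"],
)
cols = pd.MultiIndex.from_arrays(
    [["f", "g", "h"], [1, 0, 1]], names=["strings", "group"]
)
df = pd.DataFrame(
    [[1, "A", 2], [2, "B", None], [3, "C", "B"], [4, "D", None]],
    index=rows,
    columns=cols,
)

# Edit the row metadata with ordinary .loc semantics.
row_meta = df.index.to_frame(index=False)
row_meta.loc[:, "group"] = 5      # overwrite an existing metadata column
row_meta.loc[0, "New"] = "foo"    # .loc may create a new column; other rows get NaN

# Rebuild the index from the edited metadata; the data matrix is left as it was.
updated = df.copy()
updated.index = pd.MultiIndex.from_frame(row_meta)
print(updated)
```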

mfiloc property

MetaFrame-aware positional indexer.

Indexes MetaFrameRow/MetaFrameCol via .iloc semantics (obj.mfiloc[mfr_pos/mfc_pos] and obj.mfiloc[mfr_pos, mfc_pos]).

mfiloc mirrors pandas .iloc semantics while preserving MetaFrame structure and metadata during positional selection.

Returns:

Type Description
_MfIndexer

A MetaFrame-aware positional indexer.

Notes

Read-only property.

Examples:

Getter

>>> from metaframe import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None

Non-tuple values are applied to the MetaFrame columns:

>>> dataframe.mfiloc[0]
strings  f  g     h
group    1  0     1
floats             
1.1      1  A     2
2.2      2  B  None
3.3      3  C     B
4.4      4  D  None
>>> dataframe.mfiloc[:, 0]
strings             f  g     h
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mfiloc[0, 0]
strings  f  g     h
floats             
1.1      1  A     2
2.2      2  B  None
3.3      3  C     B
4.4      4  D  None

Multiple selection, with lists or slices, is also possible:

>>> dataframe.mfiloc[[0, 2]]
strings       f  g     h
group         1  0     1
floats group            
1.1    0      1  A     2
2.2    0      2  B  None
3.3    2      3  C     B
4.4    1      4  D  None
>>> dataframe.mfiloc[slice(0, 3)]
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None

Tuples enable row selection:

>>> dataframe.mfiloc[(0, [0, 2]), ([1, 2], 0)]
strings       g  h
floats group      
1.1    0      A  2

':' cannot be used within a tuple! Use '' or 'slice(None)' instead:

>>> dataframe.mfiloc[(0,), (slice(None), 0)]
strings             f  g  h
floats bool  group         
1.1    False 0      1  A  2

Setter

The mfiloc property also supports setting, in a similar fashion to pandas DataFrame iloc (it does NOT support the creation of new columns). Setting only affects the corresponding MetaFrame matrices, never the DataFrame matrix!

>>> from metaframe import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mfiloc[2] = 5
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 5      1  A     2
2.2    False 5      2  B  None
3.3    True  5      3  C     B
4.4    True  5      4  D  None
>>> dataframe.mfiloc[:, 0] = ['i', 'j', 'k']
>>> dataframe
strings             i  j     k
group               1  0     1
floats bool  group            
1.1    False 5      1  A     2
2.2    False 5      2  B  None
3.3    True  5      3  C     B
4.4    True  5      4  D  None
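The positional variant can be pictured the same way in plain pandas: select metadata columns by position, rebuild the index from what is kept, and note that iloc refuses to enlarge its target. This is an illustrative sketch rather than the mfiloc implementation; `row_meta`, `kept` and `enlarged` are names invented for the example.

```python
import pandas as pd

# Hand-built stand-in for the `dataframe` fixture used in the examples (assumption).
rows = pd.MultiIndex.from_arrays(
    [[1.1, 2.2, 3.3, 4.4], [False, False, True, True], [0, 0, 2, 1]],
    names=["floats", "bool", "group"],
)
cols = pd.MultiIndex.from_arrays(
    [["f", "g", "h"], [1, 0, 1]], names=["strings", "group"]
)
df = pd.DataFrame(
    [[1, "A", 2], [2, "B", None], [3, "C", "B"], [4, "D", None]],
    index=rows,
    columns=cols,
)

# Keep the first and third metadata columns by position.
row_meta = df.index.to_frame(index=False)
kept = row_meta.iloc[:, [0, 2]]

# Rebuild the row index from the kept metadata columns only.
selected = df.copy()
selected.index = pd.MultiIndex.from_frame(kept)
print(selected)

# Unlike .loc, .iloc cannot create a column at a position that does not exist.
try:
    row_meta.iloc[:, 3] = "foo"
    enlarged = True
except IndexError:
    enlarged = False
print(enlarged)
```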

_constructor property

Return the MetaFrame DataFrame constructor.

This ensures that pandas operations returning a DataFrame (such as slicing, arithmetic operations, or transformations) preserve the metaframe.DataFrame type.

Returns:

Type Description
Type

The metaframe.DataFrame class.

_constructor_sliced property

Return the MetaFrame Series constructor.

This ensures that pandas operations returning a Series (such as column access or row selection) return a metaframe.Series when possible.

Returns:

Type Description
Type

The metaframe.Series class.
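The two overrides above follow the standard pandas subclassing pattern. A minimal, library-independent sketch of that pattern is shown here; `TaggedFrame` and `TaggedSeries` are hypothetical classes used only to illustrate how pandas consults `_constructor` and `_constructor_sliced`.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    """Hypothetical Series subclass paired with TaggedFrame."""

    @property
    def _constructor(self):
        # Series-returning operations on a TaggedSeries stay TaggedSeries.
        return TaggedSeries

    @property
    def _constructor_expanddim(self):
        # Operations that grow a Series into a frame yield a TaggedFrame.
        return TaggedFrame


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that survives pandas operations."""

    @property
    def _constructor(self):
        # Frame-returning operations (filtering, arithmetic, ...) use this class.
        return TaggedFrame

    @property
    def _constructor_sliced(self):
        # Operations that reduce the frame by one dimension use this class.
        return TaggedSeries


df = TaggedFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
filtered = df[df["a"] > 1]   # boolean filtering returns a frame
column = df["a"]             # column access returns a series

print(type(filtered).__name__, type(column).__name__)
```

Without these overrides, most pandas operations would hand back plain `pandas.DataFrame` and `pandas.Series` objects and the subclass would be lost after the first transformation.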

fullmatch(pattern, names=None, clean=True, **kwargs)

Filter rows whose values fully match a regex pattern.

Applies re.fullmatch to selected columns and keeps rows where at least one column matches.

Parameters:

Name Type Description Default
pattern str

Regular expression.

required
names str or list of str

Columns to evaluate. Defaults to all.

None
clean bool

Drop rows that become entirely NaN.

True
**kwargs

Passed to the matching helper.

{}

Returns:

Type Description
Self

Filtered DataFrame.

Examples:

>>> from metaframe.testing import metaframe_row
>>> metaframe_row
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
>>> metaframe_row.fullmatch(".*1$")
   floats bool  group
0     1.1  NaN    NaN
3     NaN  NaN    1.0
>>> metaframe_row.fullmatch(".*1$", clean=False)
   floats bool  group
0     1.1  NaN    NaN
1     NaN  NaN    NaN
2     NaN  NaN    NaN
3     NaN  NaN    1.0
Source code in metaframe/src/dataframe/getstring.py
def fullmatch(self, pattern: str, names: str | List[str] | None=None, clean: bool=True, **kwargs) -> Self:
    """
    Filter rows whose values fully match a regex pattern.

    Applies `re.fullmatch` to selected columns and keeps rows where
    at least one column matches.

    Parameters
    ----------
    pattern : str
        Regular expression.
    names : str or list of str, optional
        Columns to evaluate. Defaults to all.
    clean : bool, default True
        Drop rows that become entirely NaN.
    **kwargs
        Passed to the matching helper.

    Returns
    -------
    Self
        Filtered DataFrame.

    Examples
    --------
    >>> from metaframe.testing import metaframe_row
    >>> metaframe_row
       floats   bool  group
    0     1.1  False      0
    1     2.2  False      0
    2     3.3   True      2
    3     4.4   True      1
    >>> metaframe_row.fullmatch(".*1$")
       floats bool  group
    0     1.1  NaN    NaN
    3     NaN  NaN    1.0
    >>> metaframe_row.fullmatch(".*1$", clean=False)
       floats bool  group
    0     1.1  NaN    NaN
    1     NaN  NaN    NaN
    2     NaN  NaN    NaN
    3     NaN  NaN    1.0
    """
    if not names:
        names = self.columns
    res = self[names].map(lambda x: re_fullmatch(pattern, str(x), **kwargs)).notnull()
    return self[res].dropna(how='all', axis=0) if clean else self[res]
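The same masking technique can be tried on an ordinary pandas DataFrame. The sketch below is a standalone approximation, not the method itself: it uses the standard library `re.fullmatch` directly and a per-column `Series.map` instead of the library's `re_fullmatch` helper, and the frame is a hand-built copy of the `metaframe_row` fixture.

```python
import re

import pandas as pd

# Hand-built copy of the `metaframe_row` fixture from the examples (assumption).
df = pd.DataFrame(
    {
        "floats": [1.1, 2.2, 3.3, 4.4],
        "bool": [False, False, True, True],
        "group": [0, 0, 2, 1],
    }
)

pattern = r".*1$"

# Stringify every cell and test it against the pattern, column by column.
mask = df.apply(
    lambda col: col.map(lambda x: re.fullmatch(pattern, str(x)) is not None)
)

# Indexing with a boolean frame keeps matching cells and blanks out the rest.
masked = df[mask]

# Equivalent of clean=True: drop rows where nothing matched.
cleaned = masked.dropna(how="all", axis=0)
print(cleaned)
```

Because cells are compared as strings, numeric values match on their printed form, which is why `1.1` and `1` both satisfy `.*1$` here.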

_eval_get_string(get_str, obj_name, axis=1)

Evaluates parsed get-string expression for selection.

Executes safely-eval'd get_str on appropriate DataFrame (self for simple Index, mf(axis) for MultiIndex). Returns selected rows/columns based on result index.

Parameters:

Name Type Description Default
get_str str

Parsed get-string from parse_get_string().

required
obj_name str

Variable name used in get-string expression.

required
axis (Literal[0, 1], optional)

0=rows (MetaFrameRow), 1=columns (MetaFrameCol).

1

Returns:

Type Description
Self

DataFrame with selected rows (axis=0/1 simple) or columns (axis=1 MultiIndex).

Raises:

Type Description
ValueError

Get-strings invalid on simple row indexes.

Source code in metaframe/src/dataframe/getstring.py
def _eval_get_string(self, get_str: str, obj_name: str, axis: Literal[0, 1]=1) -> Self:
    """
    Evaluates parsed get-string expression for selection.

    Executes safely-eval'd `get_str` on appropriate DataFrame (self for simple Index,
    `mf(axis)` for MultiIndex). Returns selected rows/columns based on result index.

    Parameters
    ----------
    get_str : str
        Parsed get-string from `parse_get_string()`.
    obj_name : str
        Variable name used in get-string expression.
    axis : Literal[0, 1], optional 
        0=rows (MetaFrameRow), 1=columns (MetaFrameCol).

    Returns
    -------
    Self
        DataFrame with selected rows (axis=0/1 simple) or columns (axis=1 MultiIndex).

    Raises
    ------
    ValueError
        Get-strings invalid on simple row indexes.
    """
    check_is(axis, is_axis)
    if axis == 0:
        index = self.index
        mf = self.mf(axis=0)
    else:
        index = self.columns
        mf = self.mf(axis=1)
    is_simple_idx = is_simple_index(index)
    if is_simple_idx and axis == 0:
        raise ValueError("Can not use get-strings on simple index rows!")
    # Apply the get-string on self if it is a simple index columns, on mf else
    idx = eval(get_str, {}, {obj_name: self if is_simple_idx else mf}).index
    if is_simple_idx:
        # For a simple index column, we return the selected rows
        return self.loc[idx, :]
    if axis == 0:
        # For a multi-index row, we return the selected rows
        return self.iloc[idx, :]
    # For a multi-index column, we return the selected columns
    return self.iloc[:, idx]
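To make the evaluate-then-select flow concrete, here is a small standalone sketch of the multi-index row case: an expression string is evaluated against a metadata frame, and the index of the result is used for positional selection. The expression string is hypothetical; the actual output format of parse_get_string() is not shown in this page, so treat it purely as an illustration.

```python
import pandas as pd

# Hand-built stand-in for the `dataframe` fixture used in the examples (assumption).
rows = pd.MultiIndex.from_arrays(
    [[1.1, 2.2, 3.3, 4.4], [False, False, True, True], [0, 0, 2, 1]],
    names=["floats", "bool", "group"],
)
cols = pd.MultiIndex.from_arrays(
    [["f", "g", "h"], [1, 0, 1]], names=["strings", "group"]
)
df = pd.DataFrame(
    [[1, "A", 2], [2, "B", None], [3, "C", "B"], [4, "D", None]],
    index=rows,
    columns=cols,
)

# Metadata frame for the rows, with a 0..n-1 positional index.
mf = df.index.to_frame(index=False)

# Hypothetical expression; the real parsed get-string may look different.
expr = "mf[(mf['floats'] > 3) & (mf['group'] != 2)]"

# Evaluate with empty globals and only the metadata frame exposed as a local.
idx = eval(expr, {}, {"mf": mf}).index

# The surviving positions select rows of the original frame.
selected = df.iloc[list(idx), :]
print(selected)
```

Note that passing an empty globals dictionary does not sandbox `eval`, since Python still makes builtins available, so this approach is only appropriate for expression strings from a trusted source.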

natsort_values(*args, key=None, **kwargs)

Sort using natural (human) ordering.

Wraps sort_values with a numeric-aware key based on the natsort library. If this first natsort fails (e.g., a float+dates comparison), the series is converted to strings before natsorting. Passing a custom key is disallowed to avoid conflicts.

Parameters:

Name Type Description Default
*args

Forwarded to sort_values.

()
**kwargs

Forwarded to sort_values.

{}
key None

Must not be provided.

None

Returns:

Type Description
Self

Naturally sorted DataFrame.

Raises:

Type Description
ValueError

If a custom key is supplied.

Examples:

>>> from metaframe.testing import mfr
>>> mfr
  strings  integers  floats   bool  ... bool_with_missing  dates_with_missing  mixed_numeric mixed_types
0       a         1     1.1  False  ...             False          2024-02-01              1           1
1       b         2     2.2  False  ...               NaN                 NaN            2.2           a
2       c         3     3.3   True  ...               NaN          2024-02-03              3        3.14
3       d         4     4.4   True  ...              True                 NaN            4.4         NaN
4       e         5     5.5  False  ...               NaN          2024-02-05            NaN  2024-03-01
[5 rows x 14 columns]
>>> mfr.natsort_values('mixed_types')
  strings  integers  floats   bool  ... bool_with_missing  dates_with_missing  mixed_numeric mixed_types
3       d         4     4.4   True  ...              True                 NaN            4.4         NaN
0       a         1     1.1  False  ...             False          2024-02-01              1           1
2       c         3     3.3   True  ...               NaN          2024-02-03              3        3.14
4       e         5     5.5  False  ...               NaN          2024-02-05            NaN  2024-03-01
1       b         2     2.2  False  ...               NaN                 NaN            2.2           a
[5 rows x 14 columns]
Source code in metaframe/src/dataframe/utils.py
def natsort_values(self, *args, key=None, **kwargs) -> Self:
    """
    Sort using natural (human) ordering.

    Wraps `sort_values` with a numeric-aware key based on the
    ``natsort`` library. If this first natsort fails (e.g., a float+dates
    comparison), the series is converted to strings before
    natsorting.
    Passing a custom `key` is disallowed to avoid conflicts.

    Parameters
    ----------
    *args, **kwargs
        Forwarded to `sort_values`.
    key : None
        Must not be provided.

    Returns
    -------
    Self
        Naturally sorted DataFrame.

    Raises
    ------
    ValueError
        If a custom ``key`` is supplied.

    Examples
    --------
    >>> from metaframe.testing import mfr
    >>> mfr
      strings  integers  floats   bool  ... bool_with_missing  dates_with_missing  mixed_numeric mixed_types
    0       a         1     1.1  False  ...             False          2024-02-01              1           1
    1       b         2     2.2  False  ...               NaN                 NaN            2.2           a
    2       c         3     3.3   True  ...               NaN          2024-02-03              3        3.14
    3       d         4     4.4   True  ...              True                 NaN            4.4         NaN
    4       e         5     5.5  False  ...               NaN          2024-02-05            NaN  2024-03-01
    [5 rows x 14 columns]
    >>> mfr.natsort_values('mixed_types')
      strings  integers  floats   bool  ... bool_with_missing  dates_with_missing  mixed_numeric mixed_types
    3       d         4     4.4   True  ...              True                 NaN            4.4         NaN
    0       a         1     1.1  False  ...             False          2024-02-01              1           1
    2       c         3     3.3   True  ...               NaN          2024-02-03              3        3.14
    4       e         5     5.5  False  ...               NaN          2024-02-05            NaN  2024-03-01
    1       b         2     2.2  False  ...               NaN                 NaN            2.2           a
    [5 rows x 14 columns]
    """
    if key is not None:
        raise ValueError("Cannot apply natural sorting when a custom key is provided!")
    return self.sort_values(*args, key=key_natsort, **kwargs)
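The effect of a natural-ordering key can be seen with plain pandas and the standard library. The sketch below does not use natsort or the library's `key_natsort`; `natural_key` and `natural_rank` are hypothetical stand-ins that only handle the simple digits-within-text case, which is enough to show why a `key` passed to `sort_values` changes the result.

```python
import re

import pandas as pd


def natural_key(value):
    """Split text into digit and non-digit chunks so numbers compare as numbers."""
    chunks = re.split(r"(\d+)", str(value))
    # Tag each chunk so integers and strings are never compared with each other.
    return tuple((0, int(c)) if c.isdigit() else (1, c) for c in chunks if c)


def natural_rank(series):
    """Vectorized key for sort_values: map each value to its natural-order rank."""
    ordered = sorted(series.unique(), key=natural_key)
    rank = {value: position for position, value in enumerate(ordered)}
    return series.map(rank)


df = pd.DataFrame({"name": ["file10", "file2", "file1"], "size": [3, 2, 1]})

lexicographic = df.sort_values("name")               # plain string ordering
natural = df.sort_values("name", key=natural_rank)   # numeric-aware ordering

print(list(lexicographic["name"]))
print(list(natural["name"]))
```

Returning integer ranks from the key keeps the values that pandas actually sorts simple, which avoids relying on how object-dtype keys are compared internally.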

order_values(sorter=None, by=None, *args, axis=0, **kwargs)

Sort using explicit categorical order instead of lexicographic order.

Supports explicit order lists per column (dict), multi-column lists, or appearance-order.

Parameters:

Name Type Description Default
sorter list, list[list], dict, or None

Ordering definition:

  • None: appearance order
  • list: single column
  • list[list]: multi-column
  • dict: {column: order}, where order is one of the above; by is then ignored

None
by str or list of str

Columns to sort by. If None, use all columns. Ignored if sorter is a dictionary.

None
axis (0, 1)

Axis to sort.

0
*args

Passed to sort_values.

()
**kwargs

Passed to sort_values.

()

Raises:

Type Description
ValueError

If sorter is a list and by is neither a string nor a list of strings.

Returns:

Type Description
Self

Sorted DataFrame.

Examples:

>>> from metaframe.testing import mfr
>>> mfr[['bool', 'group']]
    bool  group
0  False      0
1  False      0
2   True      2
3   True      1
4  False      2
>>> mfr[['bool', 'group']].order_values({'group': [2, 1, 0], 'bool': None})
    bool  group
4  False      2
2   True      2
3   True      1
0  False      0
1  False      0
Source code in metaframe/src/dataframe/utils.py
def order_values(
    self,
    sorter: List[Any] | List[List[Any]] | Dict[str, List[Any]] | None = None,
    by: str | List[str] | None = None,
    *args,
    axis: Literal[0, 1] = 0,
    **kwargs
) -> Self:
    """
    Sort using explicit categorical order instead of lexicographic order.

    Supports explicit order lists per column (dict), multi-column lists, or appearance-order.

    Parameters
    ----------
    sorter : list, list[list], dict, or None, optional
        Ordering definition.
        * None: appearance order
        * list: single column
        * list[list]: multi-column
        * dict: {column: order}, with order being one of the above;
        `by` is then ignored.
    by : str or list of str, optional
        Columns to sort by.
        If None, use all columns.
        Ignored if sorter is set to a dictionary.
    axis : {0, 1}, default 0
        Axis to sort.
    *args, **kwargs
        Passed to `sort_values`.

    Raises
    ------
    ValueError
        If sorter is a list and by is neither a string nor a list of strings.

    Returns
    -------
    Self
        Sorted DataFrame.

    Examples
    --------
    >>> from metaframe.testing import mfr
    >>> mfr[['bool', 'group']]
        bool  group
    0  False      0
    1  False      0
    2   True      2
    3   True      1
    4  False      2
    >>> mfr[['bool', 'group']].order_values({'group': [2, 1, 0], 'bool': None})
        bool  group
    4  False      2
    2   True      2
    3   True      1
    0  False      0
    1  False      0
    """
    check_is(axis, is_axis)
    if axis == 1:
        return self.T.order_values(sorter, by, *args, axis=0, **kwargs).T
    dtypes = self.dtypes.to_dict()
    if sorter is None:
        if by is None:
            by = self.columns
        elif isinstance(by, str):
            by = [by]
        sorter = {col: None for col in by}
    elif not isinstance(sorter, dict):
        if isinstance(by, list):
            sorter = {col: order for col, order in zip(by, sorter, strict=True)}
        elif isinstance(by, str):
            sorter = {by: sorter}
        else:
            raise ValueError("Invalid 'by' parameter!")
    to_order = self.copy().dropna(subset=sorter.keys())
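    for col, order in sorter.items():
        if order is None:
            # No explicit order: keep the order of first appearance.
            categories = to_order[col].unique()
        elif isinstance(order, list):
            categories = order
        else:
            # A single scalar is treated as a one-element order.
            categories = [order]
        to_order[col] = pd.Categorical(to_order[col], categories)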
    return to_order.dropna(subset=sorter.keys()).sort_values(list(sorter.keys()), *args, axis=0, **kwargs).astype(dtypes)
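
The categorical trick used by order_values can be sketched with plain pandas. This is a minimal illustration under the assumption that the data matches the example above; it uses only pandas calls and does not reproduce the library's handling of by, axis, or missing values.

```python
import pandas as pd

frame = pd.DataFrame({
    "bool": [False, False, True, True, False],
    "group": [0, 0, 2, 1, 2],
})
original_dtypes = frame.dtypes.to_dict()

ordered = frame.copy()
# Explicit order for 'group': 2 first, then 1, then 0.
ordered["group"] = pd.Categorical(ordered["group"], categories=[2, 1, 0])
# Appearance order for 'bool': categories follow first occurrence.
ordered["bool"] = pd.Categorical(ordered["bool"], categories=ordered["bool"].unique())

# Sorting categoricals follows the category order, not the natural value order.
result = ordered.sort_values(["group", "bool"]).astype(original_dtypes)
print(result)
```

Values that are absent from an explicit order become NaN when converted to a categorical, which is why the source above calls dropna a second time before sorting.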

auto_sort(*args, axis=None, **kwargs)

Automatically sort rows and/or columns.

Behavior depends on structure:
* Table: columns sorted by increasing uniqueness, then rows naturally sorted
* Non-Table: recursively sorts MetaFrames and reindexes accordingly

Parameters:

Name Type Description Default
axis (0, 1) or None

Specific axis to sort. None sorts both.

None
*args

Passed to natsort_values.

()
**kwargs

Passed to natsort_values.

()

Returns:

Type Description
Self

Sorted DataFrame.

Examples:

>>> from metaframe.testing import df
>>> print(df)
##############
# DataFrame  #
##############
strings                         f         g         h
id                              1         2         3
none_values                   NaN       NaN       NaN
strings integers floats                              
a       1        1.1     0.944497  0.464098  0.192795
b       2        2.2     0.620084  0.684224  0.103438
c       3        3.3     0.281979  0.753425  0.792706
(First 3 DataFrame and MetaFrames rows & columns showed)
################
# MetaFrameRow #
################
  strings  integers  floats   bool  ... bool_with_missing  dates_with_missing  mixed_numeric mixed_types
0       a         1     1.1  False  ...             False          2024-02-01              1           1
1       b         2     2.2  False  ...               NaN                 NaN            2.2           a
2       c         3     3.3   True  ...               NaN          2024-02-03              3        3.14
3       d         4     4.4   True  ...              True                 NaN            4.4         NaN
4       e         5     5.5  False  ...               NaN          2024-02-05            NaN  2024-03-01
[5 rows x 14 columns]
################
# MetaFrameCol #
################
  strings  id  none_values  group
0       f   1          NaN      1
1       g   2          NaN      0
2       h   3          NaN      1
3       i   4          NaN      3
[Row levels]: strings, integers, floats, bool, dates, none_values, group, strings_with_missing, ints_with_missing, floats_with_missing, bool_with_missing, dates_with_missing, mixed_numeric, mixed_types
[Col levels]: strings, id, none_values, group
DF : [5 rows x 4 columns]
MFR: [5 rows x 14 columns]
MFC: [4 rows x 4 columns]
Is Table:   False
Is MetaFrame: False
>>> print(df.auto_sort())
##############
# DataFrame  #
##############
none_values                   NaN                    
group                           0         1          
id                              2         1         3
none_values bool  group                              
NaN         False 0      0.684224  0.620084  0.103438
                  0      0.464098  0.944497  0.192795
                  2      0.573377  0.595951  0.704949
(First 3 DataFrame and MetaFrames rows & columns showed)
################
# MetaFrameRow #
################
   none_values   bool  group bool_with_missing  ... integers  floats  mixed_numeric mixed_types
0          NaN  False      0               NaN  ...        2     2.2            2.2           a
1          NaN  False      0             False  ...        1     1.1              1           1
2          NaN  False      2               NaN  ...        5     5.5            NaN  2024-03-01
3          NaN   True      1              True  ...        4     4.4            4.4         NaN
4          NaN   True      2               NaN  ...        3     3.3              3        3.14
[5 rows x 14 columns]
################
# MetaFrameCol #
################
   none_values  group  id strings
0          NaN      0   2       g
1          NaN      1   1       f
2          NaN      1   3       h
3          NaN      3   4       i
[Row levels]: none_values, bool, group, bool_with_missing, dates_with_missing, floats_with_missing, ints_with_missing, strings_with_missing, dates, strings, integers, floats, mixed_numeric, mixed_types
[Col levels]: none_values, group, id, strings
DF : [5 rows x 4 columns]
MFR: [5 rows x 14 columns]
MFC: [4 rows x 4 columns]
Is Table:   False
Is MetaFrame: False
Source code in metaframe/src/dataframe/utils.py
def auto_sort(self, *args, axis: Literal[0, 1] | None = None, **kwargs) -> Self:
    """
    Automatically sort rows and/or columns.

    Behavior depends on structure:
    * Table: columns sorted by increasing uniqueness, then rows naturally sorted
    * Non-Table: recursively sorts MetaFrames and reindexes accordingly

    Parameters
    ----------
    axis : {0, 1} or None, optional
        Specific axis to sort. ``None`` sorts both.
    *args, **kwargs
        Passed to `natsort_values`.

    Returns
    -------
    Self
        Sorted DataFrame.

    Examples
    --------
    >>> from metaframe.testing import df
    >>> print(df)
    ##############
    # DataFrame  #
    ##############
    strings                         f         g         h
    id                              1         2         3
    none_values                   NaN       NaN       NaN
    strings integers floats                              
    a       1        1.1     0.944497  0.464098  0.192795
    b       2        2.2     0.620084  0.684224  0.103438
    c       3        3.3     0.281979  0.753425  0.792706
    (First 3 DataFrame and MetaFrames rows & columns showed)
    ################
    # MetaFrameRow #
    ################
      strings  integers  floats   bool  ... bool_with_missing  dates_with_missing  mixed_numeric mixed_types
    0       a         1     1.1  False  ...             False          2024-02-01              1           1
    1       b         2     2.2  False  ...               NaN                 NaN            2.2           a
    2       c         3     3.3   True  ...               NaN          2024-02-03              3        3.14
    3       d         4     4.4   True  ...              True                 NaN            4.4         NaN
    4       e         5     5.5  False  ...               NaN          2024-02-05            NaN  2024-03-01
    [5 rows x 14 columns]
    ################
    # MetaFrameCol #
    ################
      strings  id  none_values  group
    0       f   1          NaN      1
    1       g   2          NaN      0
    2       h   3          NaN      1
    3       i   4          NaN      3
    [Row levels]: strings, integers, floats, bool, dates, none_values, group, strings_with_missing, ints_with_missing, floats_with_missing, bool_with_missing, dates_with_missing, mixed_numeric, mixed_types
    [Col levels]: strings, id, none_values, group
    DF : [5 rows x 4 columns]
    MFR: [5 rows x 14 columns]
    MFC: [4 rows x 4 columns]
    Is Table:   False
    Is MetaFrame: False
    >>> print(df.auto_sort())
    ##############
    # DataFrame  #
    ##############
    none_values                   NaN                    
    group                           0         1          
    id                              2         1         3
    none_values bool  group                              
    NaN         False 0      0.684224  0.620084  0.103438
                      0      0.464098  0.944497  0.192795
                      2      0.573377  0.595951  0.704949
    (First 3 DataFrame and MetaFrames rows & columns showed)
    ################
    # MetaFrameRow #
    ################
       none_values   bool  group bool_with_missing  ... integers  floats  mixed_numeric mixed_types
    0          NaN  False      0               NaN  ...        2     2.2            2.2           a
    1          NaN  False      0             False  ...        1     1.1              1           1
    2          NaN  False      2               NaN  ...        5     5.5            NaN  2024-03-01
    3          NaN   True      1              True  ...        4     4.4            4.4         NaN
    4          NaN   True      2               NaN  ...        3     3.3              3        3.14
    [5 rows x 14 columns]
    ################
    # MetaFrameCol #
    ################
       none_values  group  id strings
    0          NaN      0   2       g
    1          NaN      1   1       f
    2          NaN      1   3       h
    3          NaN      3   4       i
    [Row levels]: none_values, bool, group, bool_with_missing, dates_with_missing, floats_with_missing, ints_with_missing, strings_with_missing, dates, strings, integers, floats, mixed_numeric, mixed_types
    [Col levels]: none_values, group, id, strings
    DF : [5 rows x 4 columns]
    MFR: [5 rows x 14 columns]
    MFC: [4 rows x 4 columns]
    Is Table:   False
    Is MetaFrame: False
    """
    df = self.copy()
    if axis is None:
        if self.is_table:
            axis = 1
        else:
            if is_mi(df.index):
                sorted_idx = df.mf(axis=0).auto_sort(axis=1)
                df = df.iloc[sorted_idx.index, :]
                df.mfr = sorted_idx
            else:
                df = df.auto_sort(axis=0)
            if is_mi(df.columns):
                sorted_col = df.mf(axis=1).auto_sort(axis=1)
                df = df.iloc[:, sorted_col.index]
                df.mfc = sorted_col
            else:
                df = df.auto_sort(axis=1)
            return df
    check_is(axis, is_axis)
    other_axis = 0 if axis == 1 else 1
    sorted_axis = df.nunique(axis=other_axis, dropna=False).sort_values().index.tolist()
    df = df[sorted_axis] if axis == 1 else df.loc[sorted_axis]
    return df.natsort_values(sorted_axis, *args, axis=other_axis, **kwargs)
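
For a plain table, the last four lines above reduce to "order columns by how many distinct values they hold, then sort rows by those columns". The sketch below shows that idea with plain pandas on made-up data; it substitutes the regular sort_values for the library's natsort_values, so it is an approximation rather than the actual behavior.

```python
import pandas as pd

table = pd.DataFrame({
    "sample": ["s3", "s1", "s2", "s4"],
    "batch": ["b", "a", "b", "a"],
    "species": ["x", "x", "x", "x"],
})

# Count distinct values per column (NaN counts as a value) and
# put the least varied columns first.
uniqueness = table.nunique(axis=0, dropna=False).sort_values(kind="stable")
column_order = uniqueness.index.tolist()

# Reorder the columns, then sort rows using that same column priority.
result = table[column_order].sort_values(column_order)
print(result)
```

Placing low-cardinality columns first groups rows into broad blocks before finer-grained columns break ties.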

from_input(input=None, **kwargs) classmethod

Create a DataFrame from any DataFrame-convertible input.

The input is first converted to a pandas DataFrame using input_to_df and then wrapped in this subclass.

Parameters:

Name Type Description Default
input Any

File path, URL, pandas DataFrame, or other convertible object.

None
**kwargs

Passed to input_to_df.

{}

Returns:

Type Description
Self

New DataFrame instance.

Source code in metaframe/src/dataframe/io.py
@classmethod
def from_input(cls, input: Any=None, **kwargs) -> Self:
    """
    Create a DataFrame from any DataFrame-convertible input.

    The input is first converted to a pandas DataFrame using
    ``input_to_df`` and then wrapped in this subclass.

    Parameters
    ----------
    input : Any, optional
        File path, URL, pandas DataFrame, or other convertible object.
    **kwargs
        Passed to ``input_to_df``.

    Returns
    -------
    Self
        New DataFrame instance.
    """
    # Read the input and translate it to a pandas DataFrame
    df = input_to_df(input, **kwargs)
    return cls(df)

from_table(df, mf_names=None, mf_iloc=None, mf_from_to=None, name=None, axis=0, header=0, **kwargs) classmethod

Create a MultiIndex DataFrame from a table-format DataFrame.

Selected columns are promoted to a MultiIndex on rows or columns.

Parameters:

Name Type Description Default
df Any

Table-format DataFrame (simple index/columns).

required
mf_names str or list of str

Column names to elevate.

None
mf_iloc int or list of int

Column positions to elevate.

None
mf_from_to str or list of str

Column range to elevate.

None
name str

Name of the resulting MultiIndex.

None
axis (0, 1)

0: rows, 1: columns.

0
header int

Header row for input parsing.

0
**kwargs

Passed to input_to_df.

{}

Returns:

Type Description
Self

Raises:

Type Description
ValueError

If input is not table format or no columns are specified.

Examples:

>>> import metaframe as mf
>>> from metaframe.testing import metaframe_row
>>> metaframe_row
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
>>> df = mf.DataFrame.from_table(metaframe_row, mf_names='group')
>>> df
       floats   bool
group               
0         1.1  False
0         2.2  False
2         3.3   True
1         4.4   True
>>> df.index
MultiIndex([(0,),
            (0,),
            (2,),
            (1,)],
        names=['group'])
Source code in metaframe/src/dataframe/io.py
@classmethod
def from_table(
    cls,
    df: Any,
    mf_names: str | list[str] | None = None,
    mf_iloc: int | list[int] | None = None,
    mf_from_to: str | list[str] | None = None,
    name: str | None = None,
    axis: Literal[0, 1] = 0,
    header: int = 0,
    **kwargs
) -> Self:
    """
    Create a MultiIndex DataFrame from a table-format DataFrame.

    Selected columns are promoted to a MultiIndex on rows or columns.

    Parameters
    ----------
    df : Any
        Table-format DataFrame (simple index/columns).
    mf_names : str or list of str, optional
        Column names to elevate.
    mf_iloc : int or list of int, optional
        Column positions to elevate.
    mf_from_to : str or list of str, optional
        Column range to elevate.
    name : str, optional
        Name of the resulting MultiIndex.
    axis : {0, 1}, default 0
        0: rows, 1: columns.
    header : int, default 0
        Header row for input parsing.
    **kwargs
        Passed to ``input_to_df``.

    Returns
    -------
    Self

    Raises
    ------
    ValueError
        If input is not table format or no columns are specified.

    Examples
    --------
    >>> import metaframe as mf
    >>> from metaframe.testing import metaframe_row
    >>> metaframe_row
       floats   bool  group
    0     1.1  False      0
    1     2.2  False      0
    2     3.3   True      2
    3     4.4   True      1
    >>> df = mf.DataFrame.from_table(metaframe_row, mf_names='group')
    >>> df
           floats   bool
    group               
    0         1.1  False
    0         2.2  False
    2         3.3   True
    1         4.4   True
    >>> df.index
    MultiIndex([(0,),
                (0,),
                (2,),
                (1,)],
            names=['group'])
    """
    if not is_table(df):
        raise ValueError('The input dataframe must be a table (pd.Index index & columns)')
    check_is(axis, is_axis)
    if mf_names is None and mf_iloc is None and not mf_from_to:
        raise ValueError('Either mf_names or mf_iloc (or both) or mf_from_to must be set!')
    df = input_to_df(df, header=header, **kwargs)
    if isinstance(mf_names, str):
        mf_names = [mf_names]
    if isinstance(mf_iloc, int):
        mf_iloc = [mf_iloc]
    if not mf_names:
        mf_names = []
    if mf_from_to:
        if isinstance(mf_from_to, str):
            mf_from_to = [df.columns[0], mf_from_to]
        elif len(mf_from_to) != 2:
            raise ValueError('Parameter mf_from_to must be str or a two element list!')
        if not mf_iloc:
            mf_iloc = []
        mf_iloc += [e for e in list(range(df.columns.get_loc(mf_from_to[0]), df.columns.get_loc(mf_from_to[1])+1)) if e not in mf_iloc]
    if mf_iloc:
        mf_names += [df.columns[i] for i in mf_iloc if df.columns[i] not in mf_names]
    this_mf = df[mf_names]
    df = df[[e for e in df.columns if e not in mf_names]]
    df.index = pd.MultiIndex.from_frame(this_mf)
    df.columns.name = name
    if axis == 1:
        df = df.T.infer_objects()
    return cls(df)
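
The core of from_table, promoting selected columns to a MultiIndex, can be sketched with plain pandas. The data mirrors the example above; the variable names are illustrative, and the sketch leaves out mf_iloc, mf_from_to, and the axis=1 transpose.

```python
import pandas as pd

table = pd.DataFrame({
    "floats": [1.1, 2.2, 3.3, 4.4],
    "bool": [False, False, True, True],
    "group": [0, 0, 2, 1],
})

promoted = ["group"]  # columns to elevate into the index

# Keep the remaining columns as data.
data = table[[col for col in table.columns if col not in promoted]].copy()
# from_frame always builds a MultiIndex, even from a single column.
data.index = pd.MultiIndex.from_frame(table[promoted])

print(data)
print(type(data.index).__name__)
```

This is why df.index in the example above is a one-level MultiIndex rather than a plain Index.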

from_elements(mtx, mfr=None, mfc=None, header=0, **kwargs) classmethod

Assemble a DataFrame from matrix data and optional MetaFrames.

Combines matrix data with MetaFrameRow (index) and MetaFrameCol (columns) DataFrames. Can ignore row/col names on matrix input. Validates MetaFrame compatibility.

Parameters:

Name Type Description Default
mtx Any

Matrix-like data.

required
mfr DataFrame

Row MetaFrame.

None
mfc DataFrame

Column MetaFrame.

None
header int

Header row for input parsing.

0
**kwargs

Passed to input_to_df.

{}

Returns:

Type Description
Self

Examples:

>>> import metaframe as mf
>>> from metaframe.testing import metaframe_row, metaframe_col, mtx
>>> mtx = mf.DataFrame.from_dict({0: {0: 'A', 1: 'B', 2: 'C', 3: 'D'}, 1: {0: 'E', 1: 'F', 2: 'G', 3: 'H'}, 2: {0: 'I', 1: 'J', 2: 'K', 3: 'L'}})
>>> mtx
   0  1  2
0  A  E  I
1  B  F  J
2  C  G  K
3  D  H  L
>>> metaframe_row
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
>>> metaframe_col
  strings  group
0       f      1
1       g      0
2       h      1
>>> mf.DataFrame.from_elements(mtx, metaframe_row, metaframe_col)
strings             f  g  h
group               1  0  1
floats bool  group         
1.1    False 0      A  E  I
2.2    False 0      B  F  J
3.3    True  2      C  G  K
4.4    True  1      D  H  L
Source code in metaframe/src/dataframe/io.py
@classmethod
def from_elements(
    cls,
    mtx: Any,
    mfr: pd.DataFrame | None = None,
    mfc: pd.DataFrame | None = None,
    header: int = 0,
    **kwargs
) -> Self:
    """
    Assemble a DataFrame from matrix data and optional MetaFrames.

    Combines matrix data with MetaFrameRow (index) and MetaFrameCol (columns) DataFrames.
    Can ignore row/col names on matrix input. Validates MetaFrame compatibility.

    Parameters
    ----------
    mtx : Any
        Matrix-like data.
    mfr : DataFrame, optional
        Row MetaFrame.
    mfc : DataFrame, optional
        Column MetaFrame.
    header : int, default 0
        Header row for input parsing.
    **kwargs
        Passed to ``input_to_df``.

    Returns
    -------
    Self

    Examples
    --------
    >>> import metaframe as mf
    >>> from metaframe.testing import metaframe_row, metaframe_col, mtx
    >>> mtx = mf.DataFrame.from_dict({0: {0: 'A', 1: 'B', 2: 'C', 3: 'D'}, 1: {0: 'E', 1: 'F', 2: 'G', 3: 'H'}, 2: {0: 'I', 1: 'J', 2: 'K', 3: 'L'}})
    >>> mtx
       0  1  2
    0  A  E  I
    1  B  F  J
    2  C  G  K
    3  D  H  L
    >>> metaframe_row
       floats   bool  group
    0     1.1  False      0
    1     2.2  False      0
    2     3.3   True      2
    3     4.4   True      1
    >>> metaframe_col
      strings  group
    0       f      1
    1       g      0
    2       h      1
    >>> mf.DataFrame.from_elements(mtx, metaframe_row, metaframe_col)
    strings             f  g  h
    group               1  0  1
    floats bool  group         
    1.1    False 0      A  E  I
    2.2    False 0      B  F  J
    3.3    True  2      C  G  K
    4.4    True  1      D  H  L
    """
    obj = cls.from_input(mtx, header=header, **kwargs)
    if mfr is not None:
        mfr = input_to_df(mfr, header=header, **kwargs)
        obj.mfr = mfr
    if mfc is not None:
        mfc = input_to_df(mfc, header=header, **kwargs)
        obj.mfc = mfc
    return obj
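
The mfr and mfc setters are not shown on this page. Assuming they turn each MetaFrame into a MultiIndex on the matching axis (which is what the example output suggests), the assembly can be approximated with plain pandas as follows. This is a sketch under that assumption, without the compatibility validation the description mentions.

```python
import pandas as pd

matrix = pd.DataFrame([
    ["A", "E", "I"],
    ["B", "F", "J"],
    ["C", "G", "K"],
    ["D", "H", "L"],
])
row_meta = pd.DataFrame({
    "floats": [1.1, 2.2, 3.3, 4.4],
    "bool": [False, False, True, True],
    "group": [0, 0, 2, 1],
})
col_meta = pd.DataFrame({"strings": ["f", "g", "h"], "group": [1, 0, 1]})

assembled = matrix.copy()
# One metadata row per matrix row, one metadata row per matrix column.
assembled.index = pd.MultiIndex.from_frame(row_meta)
assembled.columns = pd.MultiIndex.from_frame(col_meta)
print(assembled)
```

The row MetaFrame needs as many rows as the matrix has rows, and the column MetaFrame as many rows as the matrix has columns; pandas raises a length-mismatch error otherwise.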

to_index()

Convert this DataFrame to a pandas Index or MultiIndex.

Returns:

Type Description
Index or MultiIndex

Examples:

>>> from metaframe.testing import dataframe, metaframe_row
>>> metaframe_row.to_index()
MultiIndex([(1.1, False, 0),
            (2.2, False, 0),
            (3.3,  True, 2),
            (4.4,  True, 1)],
        names=['floats', 'bool', 'group'])
Source code in metaframe/src/dataframe/io.py
def to_index(self) -> pd.Index | pd.MultiIndex:
    """
    Convert this DataFrame to a pandas Index or MultiIndex.

    Returns
    -------
    Index or MultiIndex

    Examples
    --------
    >>> from metaframe.testing import dataframe, metaframe_row
    >>> metaframe_row.to_index()
    MultiIndex([(1.1, False, 0),
                (2.2, False, 0),
                (3.3,  True, 2),
                (4.4,  True, 1)],
            names=['floats', 'bool', 'group'])
    """
    return index_from_frame(self)

to_table(idx=None, col=None, reset_idx_name=False, reset_col_name=True)

Converts MultiIndex DataFrame to table format (simple index/columns).

Parameters:

Name Type Description Default
idx str

Row level to extract.

None
col str

Column level to extract.

None
reset_idx_name bool

If True, clear the name of the resulting index.

False
reset_col_name bool

If True, clear the name of the resulting columns.

True

Returns:

Type Description
Self

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.to_table()
   0  1     2
0  1  A     2
1  2  B  None
2  3  C     B
3  4  D  None
>>> dataframe.to_table('floats', 'strings')
        f  g     h
floats            
1.1     1  A     2
2.2     2  B  None
3.3     3  C     B
4.4     4  D  None
>>> dataframe.to_table('floats', 'strings', reset_idx_name = True)
     f  g     h
1.1  1  A     2
2.2  2  B  None
3.3  3  C     B
4.4  4  D  None
Source code in metaframe/src/dataframe/io.py
def to_table(
    self,
    idx: str | None = None,
    col: str | None = None,
    reset_idx_name: bool = False,
    reset_col_name: bool = True
) -> Self:
    """
    Converts MultiIndex DataFrame to table format (simple index/columns).

    Parameters
    ----------
    idx : str, optional
        Row level to extract.
    col : str, optional
        Column level to extract.
    reset_idx_name : bool, default False
        If True, clear the name of the resulting index.
    reset_col_name : bool, default True
        If True, clear the name of the resulting columns.

    Returns
    -------
    Self

    Examples
    --------
    >>> from metaframe.testing import dataframe
    >>> dataframe
    strings             f  g     h
    group               1  0     1
    floats bool  group            
    1.1    False 0      1  A     2
    2.2    False 0      2  B  None
    3.3    True  2      3  C     B
    4.4    True  1      4  D  None
    >>> dataframe.to_table()
       0  1     2
    0  1  A     2
    1  2  B  None
    2  3  C     B
    3  4  D  None
    >>> dataframe.to_table('floats', 'strings')
            f  g     h
    floats            
    1.1     1  A     2
    2.2     2  B  None
    3.3     3  C     B
    4.4     4  D  None
    >>> dataframe.to_table('floats', 'strings', reset_idx_name = True)
         f  g     h
    1.1  1  A     2
    2.2  2  B  None
    3.3  3  C     B
    4.4  4  D  None
    """
    if self.is_table:
        return self
    df = self.__class__(self.values)
    if col:
        df.columns = pd.Index(self.mfc[col])
        if reset_col_name or df.columns.name == 0:
            df.columns.name = None
    if idx:
        df.index = pd.Index(self.mfr[idx])
        if reset_idx_name or df.index.name == 0:
            df.index.name = None
    return df
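
The flattening step can be approximated with plain pandas by taking the raw values and picking one level from each MultiIndex. The sketch below uses get_level_values in place of the mfr and mfc accessors, which are not shown on this page, and a reduced two-row version of the example data.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1.1, False, 0), (2.2, False, 0)], names=["floats", "bool", "group"]
)
columns = pd.MultiIndex.from_tuples([("f", 1), ("g", 0)], names=["strings", "group"])
nested = pd.DataFrame([[1, "A"], [2, "B"]], index=index, columns=columns)

# Start from the bare values: this yields simple 0..n-1 labels on both axes.
flat = pd.DataFrame(nested.values)

# Use a single level as the new row labels and keep its name.
flat.index = pd.Index(nested.index.get_level_values("floats"))

# Use a single level as the new column labels and clear its name,
# mirroring the reset_col_name=True default.
flat.columns = pd.Index(nested.columns.get_level_values("strings"))
flat.columns.name = None
print(flat)
```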

to_metaframe(*args, names=None, **kwargs)

Converts DataFrame to MetaFrame format (table + numeric range index).

For tables, uses reset_index(). For non-table structures, applies melt() then reset_index().

Parameters:

Name Type Description Default
*args

Positional arguments passed to melt().

()
names str or list of str

Names for reset index levels. Passed to reset_index().

None
**kwargs

Keyword arguments passed to melt().

{}

Returns:

Type Description
Self

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.to_metaframe(names=['floats', 'bool', 'group_row'])
    floats   bool  group_row strings  group value
0      1.1  False          0       f      1     1
1      2.2  False          0       f      1     2
2      3.3   True          2       f      1     3
3      4.4   True          1       f      1     4
4      1.1  False          0       g      0     A
5      2.2  False          0       g      0     B
6      3.3   True          2       g      0     C
7      4.4   True          1       g      0     D
8      1.1  False          0       h      1     2
9      2.2  False          0       h      1  None
10     3.3   True          2       h      1     B
11     4.4   True          1       h      1  None
Source code in metaframe/src/dataframe/io.py
def to_metaframe(self, *args, names: Any=None, **kwargs) -> Self:
    """
    Converts DataFrame to MetaFrame format (table + numeric range index).

    For tables, uses `reset_index()`.
    For non-table structures, applies `melt()` then `reset_index()`.

    Parameters
    ----------
    *args: 
        Positional arguments passed to `melt()`.
    names : str or list of str, optional
        Names for reset index levels. Passed to `reset_index()`.
    **kwargs: 
        Keyword arguments passed to `melt()`.

    Returns
    -------
    Self

    Examples
    --------
    >>> from metaframe.testing import dataframe
    >>> dataframe
    strings             f  g     h
    group               1  0     1
    floats bool  group            
    1.1    False 0      1  A     2
    2.2    False 0      2  B  None
    3.3    True  2      3  C     B
    4.4    True  1      4  D  None
    >>> dataframe.to_metaframe(names=['floats', 'bool', 'group_row'])
        floats   bool  group_row strings  group value
    0      1.1  False          0       f      1     1
    1      2.2  False          0       f      1     2
    2      3.3   True          2       f      1     3
    3      4.4   True          1       f      1     4
    4      1.1  False          0       g      0     A
    5      2.2  False          0       g      0     B
    6      3.3   True          2       g      0     C
    7      4.4   True          1       g      0     D
    8      1.1  False          0       h      1     2
    9      2.2  False          0       h      1  None
    10     3.3   True          2       h      1     B
    11     4.4   True          1       h      1  None
    """
    if self.is_metaframe:
        return self
    if self.is_table:
        return self.reset_index(names=names)
    return self.melt(*args, ignore_index=False, **kwargs).reset_index(names=names)
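
The non-table branch is a melt followed by reset_index. The sketch below reproduces it with plain pandas on a reduced two-row frame. It assumes pandas 1.5 or later, since reset_index only accepts names from that version on.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1.1, False, 0), (2.2, False, 0)], names=["floats", "bool", "group"]
)
columns = pd.MultiIndex.from_tuples([("f", 1), ("g", 0)], names=["strings", "group"])
nested = pd.DataFrame([[1, "A"], [2, "B"]], index=index, columns=columns)

# melt turns each column level into a variable column and stacks the values;
# ignore_index=False keeps the row MultiIndex so it can be reset afterwards.
melted = nested.melt(ignore_index=False)

# Renaming the row-level 'group' avoids a clash with the column-level 'group'.
long_form = melted.reset_index(names=["floats", "bool", "group_row"])
print(long_form)
```

Each original cell becomes one row, so a frame with n rows and m columns yields n * m rows.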

to_dtale(idx=None, col=None, **kwargs)

Launches interactive D-Tale viewer for this DataFrame (must be table format).

Converts to table format first using to_table(), then calls dtale.show().

Parameters:

Name Type Description Default
idx str

MetaFrameRow level for table index. Defaults to None.

None
col str

MetaFrameCol level for table columns. Defaults to None.

None
**kwargs

Arguments passed to dtale.show().

{}

Returns:

Type Description
dtale.views.DtaleData

D-Tale instance with interactive DataFrame viewer.

Source code in metaframe/src/dataframe/io.py
def to_dtale(self, idx: str | None = None, col: str | None = None, **kwargs) -> 'dtale.views.DtaleData':  # type: ignore # noqa: F821
    """
    Launches interactive D-Tale viewer for this DataFrame (must be table format).

    Converts to table format first using `to_table()`, then calls `dtale.show()`.

    Parameters
    ----------
    idx : str, optional
        MetaFrameRow level for table index. Defaults to None.
    col : str, optional
        MetaFrameCol level for table columns. Defaults to None.
    **kwargs: 
        Arguments passed to `dtale.show()`.

    Returns
    -------
    dtale.views.DtaleData
        D-Tale instance with interactive DataFrame viewer.
    """
    import dtale
    return dtale.show(self.to_table(idx=idx, col=col), **kwargs)

to_numeric(invalid=None, errors='coerce')

Converts all DataFrame values to numeric types with custom NA handling.

Applies pd.to_numeric to each column. When invalid is given, values that could not be converted are replaced with invalid, while positions that were already missing in the original data are set to DEFAULT_NA.

Parameters:

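| Name | Type | Description | Default |
| --- | --- | --- | --- |
| invalid | scalar | Replacement for coerced values. | None |
| errors | {'raise', 'coerce', 'ignore'} | Error-handling mode passed to pd.to_numeric. | 'coerce' |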

Returns:

Type Description
Self

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.to_numeric(invalid='X')
strings             f  g    h
group               1  0    1
floats bool  group           
1.1    False 0      1  X  2.0
2.2    False 0      2  X  NaN
3.3    True  2      3  X    X
4.4    True  1      4  X  NaN
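
The distinction between values that fail conversion and values that were already missing can be sketched in plain Python for a single column. This is an illustration only, not the library's implementation: coerce_column, _is_missing and MISSING are hypothetical names, and MISSING merely stands in for DEFAULT_NA, whose actual value is not shown on this page.

```python
import math

MISSING = float("nan")  # assumption: stand-in for the library's DEFAULT_NA


def _is_missing(value):
    """True for None and float NaN, the values treated as already missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def coerce_column(values, invalid=None):
    """Convert values to numbers; mark failed conversions with `invalid`."""
    result = []
    for value in values:
        if _is_missing(value):
            # Originally missing: stays missing, never replaced by `invalid`.
            result.append(MISSING)
            continue
        try:
            result.append(float(value))
        except (TypeError, ValueError):
            # Failed conversion: NaN by default, `invalid` when one is given.
            result.append(MISSING if invalid is None else invalid)
    return result


# Same values as column "h" in the example above.
converted = coerce_column([2, None, "B", None], invalid="X")
print(converted)  # [2.0, nan, 'X', nan]
```

As in the example output, the unconvertible "B" becomes 'X' while the two None entries stay missing.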
Source code in metaframe/src/dataframe/io.py
def to_numeric(self, invalid: int | float | None = None, errors: str = 'coerce') -> Self:
    """
    Converts all DataFrame values to numeric types with custom NA handling.

    Applies `pd.to_numeric` across columns, then replaces new NaNs with `invalid`
    value while preserving original NaN positions with `DEFAULT_NA`.

    Parameters
    ----------
    invalid : scalar, optional
        Replacement for coerced values.
    errors : {'raise', 'coerce', 'ignore'}, default 'coerce'

    Returns
    -------
    Self

    Examples
    --------

    >>> from metaframe.testing import dataframe
    >>> dataframe
    strings             f  g     h
    group               1  0     1
    floats bool  group            
    1.1    False 0      1  A     2
    2.2    False 0      2  B  None
    3.3    True  2      3  C     B
    4.4    True  1      4  D  None
    >>> dataframe.to_numeric(invalid='X')
    strings             f  g    h
    group               1  0    1
    floats bool  group           
    1.1    False 0      1  X  2.0
    2.2    False 0      2  X  NaN
    3.3    True  2      3  X    X
    4.4    True  1      4  X  NaN
    """
    na_mask = self.isna()
    df = self.apply(pd.to_numeric, errors=errors)
    if invalid is not None:
        df = df.fillna(invalid)
        df[na_mask] = DEFAULT_NA
    return df

to_int(start_at=0, axis=0, whole=False)

Converts DataFrame values to consecutive integers, deduplicating within scope.

Maps unique values to sequential integers starting at start_at. Use whole=True for global deduplication across entire DataFrame, or axis for per-column/row.

Parameters:

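| Name | Type | Description | Default |
| --- | --- | --- | --- |
| start_at | int | First integer of the sequence. | 0 |
| axis | {0, 1} | Axis along which values are mapped when whole is False. | 0 |
| whole | bool | Apply mapping globally instead of per-axis. | False |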

Returns:

Type Description
Self

Examples:

>>> from metaframe.testing import metaframe_row
>>> metaframe_row
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
>>> metaframe_row.to_int()
   floats  bool  group
0       0     0      0
1       1     0      0
2       2     1      1
3       3     1      2
>>> metaframe_row.to_int(axis=1)
   floats  bool  group
0       0     1      1
1       0     1      1
2       0     1      2
3       0     1      1
>>> metaframe_row.to_int(whole=True)
   floats  bool  group
0       1     0      0
1       2     0      0
2       3     4      5
3       6     4      1
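
The difference between per-column and whole-table mapping can be sketched without pandas. The helpers below are hypothetical (to_int_sequence and to_int_table are not part of metaframe) and represent a table as a dict of column lists. One assumption differs from the source listing that follows: the sketch numbers values in order of first appearance, whereas the whole=True branch of the library enumerates a set, so the integers it assigns may come out in a different order.

```python
def to_int_sequence(values, start_at=0, mapping=None):
    """Map each distinct value to a consecutive integer, by first appearance."""
    mapping = {} if mapping is None else mapping
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = start_at + len(mapping)
        codes.append(mapping[value])
    return codes


def to_int_table(columns, start_at=0, whole=False):
    """`columns` maps a column name to its list of values (hypothetical stand-in for a DataFrame)."""
    # One shared mapping for the whole table, or a fresh mapping per column.
    shared = {} if whole else None
    return {
        name: to_int_sequence(values, start_at=start_at, mapping=shared)
        for name, values in columns.items()
    }


table = {"strings": ["f", "g", "h"], "letters": ["g", "g", "a"]}
per_column = to_int_table(table)
whole_table = to_int_table(table, whole=True)
print(per_column)   # {'strings': [0, 1, 2], 'letters': [0, 0, 1]}
print(whole_table)  # {'strings': [0, 1, 2], 'letters': [1, 1, 3]}
```

With whole=True the value "g" receives the same integer in both columns, which is the global deduplication described above.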
Source code in metaframe/src/dataframe/io.py
def to_int(self, start_at: int = 0, axis: Literal[0, 1] = 0, whole: bool = False) -> Self:
    """
    Converts DataFrame values to consecutive integers, deduplicating within scope.

    Maps unique values to sequential integers starting at `start_at`. Use `whole=True`
    for global deduplication across entire DataFrame, or `axis` for per-column/row.

    Parameters
    ----------
    start_at : int, default 0
    axis : {0, 1}, default 0
    whole : bool, default False
        Apply mapping globally instead of per-axis.

    Returns
    -------
    Self

    Examples
    --------

    >>> from metaframe.testing import metaframe_row
    >>> metaframe_row
       floats   bool  group
    0     1.1  False      0
    1     2.2  False      0
    2     3.3   True      2
    3     4.4   True      1
    >>> metaframe_row.to_int()
       floats  bool  group
    0       0     0      0
    1       1     0      0
    2       2     1      1
    3       3     1      2
    >>> metaframe_row.to_int(axis=1)
       floats  bool  group
    0       0     1      1
    1       0     1      1
    2       0     1      2
    3       0     1      1
    >>> metaframe_row.to_int(whole=True)
       floats  bool  group
    0       1     0      0
    1       2     0      0
    2       3     4      5
    3       6     4      1
    """
    check_is(axis, is_axis)
    if whole:
        return self.replace({v: (i+start_at) for i, v in enumerate(set(self.values.flatten()))}).astype(int)
    return self.apply(lambda x: x.to_int(start_at=start_at), axis=axis)

to_file(output, **kwargs)

Writes DataFrame to file, with MetaFrame-aware handling.

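Detects the file format from the extension of output (.xlsx, .csv, .tsv, or .txt/.lst/.list for single-column DataFrames; a path without extension defaults to .xlsx) and converts between table and MetaFrame representations. Excel extensive mode enables lossless round-tripping with dedicated MetaFrame sheets.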

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output | str or ExcelWriter | Destination file or writer. | required |
| **kwargs | | Passed to format-specific writers. | {} |

Raises:

| Type | Description |
| --- | --- |
| ValueError | For invalid formats or incompatible shapes. |
| NotImplementedError | For unsupported extensions. |

Notes

For more information on the parameters, see the 'Excel Output' wiki page.

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe.to_file('path/to/excel.file.xlsx')
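
The extension-based dispatch can be summarised with a small standalone function. It follows the branches visible in the source listing below, but pick_writer is a hypothetical helper that only returns the name of the method that would be called, and the error messages are paraphrased.

```python
from os import path as os_path

SINGLE_COLUMN_EXTENSIONS = {".txt", ".lst", ".list"}


def pick_writer(output, n_columns):
    """Return the name of the writer method that would handle `output`."""
    _, file_ext = os_path.splitext(os_path.basename(output))
    if file_ext == "":
        file_ext = ".xlsx"  # no extension: Excel is the fallback
    if file_ext == ".xlsx":
        return "to_file_excel"
    if file_ext == ".tsv":
        return "to_file_tsv"
    if file_ext in SINGLE_COLUMN_EXTENSIONS and n_columns != 1:
        raise ValueError(f"The {file_ext} extension requires a 1-column DataFrame")
    if file_ext == ".csv" or file_ext in SINGLE_COLUMN_EXTENSIONS:
        return "to_file_csv"
    if file_ext == ".xlsm":
        raise ValueError("pandas cannot write .xlsm files")
    raise NotImplementedError(f"Extension {file_ext} is not supported")


print(pick_writer("path/to/excel.file.xlsx", n_columns=3))  # to_file_excel
print(pick_writer("ids.lst", n_columns=1))                  # to_file_csv
```

Only the last dot-separated suffix counts, so 'excel.file.xlsx' is treated as an .xlsx file.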
Source code in metaframe/src/dataframe/io.py
def to_file(
    self,
    output: str | pd.ExcelWriter,
    **kwargs
) -> None:
    """
    Writes DataFrame to file, with MetaFrame-aware handling.

    Automatically detects file format (.csv, .tsv, .xlsx) and converts
    between table and MetaFrame representations. Excel ``extensive`` mode
    enables lossless round-tripping with dedicated MetaFrame sheets.

    Parameters
    ----------
    output : str or pandas.ExcelWriter
        Destination file or writer.
    **kwargs
        Passed to format-specific writers.

    Raises
    ------
    ValueError
        For invalid formats or incompatible shapes.
    NotImplementedError
        For unsupported extensions.

    Notes
    -----
    For more information on the parameters, see the 'Excel Output' wiki page.

    Examples
    --------
    >>> from metaframe.testing import dataframe
    >>> dataframe.to_file('path/to/excel.file.xlsx')
    """
    if isinstance(output, pd.ExcelWriter):
        file_ext = '.xlsx'
    else:
        directory, file_name = os_path.split(output)
        _, file_ext = os_path.splitext(file_name)
        directory = os_path.abspath(directory)
    if file_ext in [None, '']:
        file_ext = '.xlsx'
        output += file_ext
    if file_ext in ['.xlsx']:
        return self.to_file_excel(output=output, **kwargs)
    if file_ext in ['.csv', '.tsv', '.txt', '.lst', '.list']:
        if file_ext == '.tsv':
            return self.to_file_tsv(output, **kwargs)
        elif file_ext in ['.txt', '.lst', '.list'] and self.shape[1] != 1:
            raise ValueError(f"The {file_ext} extension require a 1-column DataFrame!")
        return self.to_file_csv(output, **kwargs)
    if file_ext == '.xlsm':
        raise ValueError("Pandas can not convert a dataframe to .xlsm file!")
    else:
        raise NotImplementedError(f"Extension {file_ext} is not supported (yet!).")

to_file_excel(*args, **kwargs)

Explicit Excel file export.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| args | | ExcelExporter parameters. | () |
| kwargs | | ExcelExporter parameters. | {} |
Notes

See the to_file method and the Excel export wiki page for more information.

Source code in metaframe/src/dataframe/io.py
def to_file_excel(self, *args, **kwargs) -> None:
    """
    Explicit Excel file export.

    Parameters
    ----------
    args, kwargs:
        ExcelExporter parameters.

    Notes
    -----
    See the `to_file` method and the `Excel export` wiki page for more information.
    """
    from metaframe.src.dataframe.exporter import ExcelExporter
    ExcelExporter(self, *args, **kwargs).export()

to_file_csv(*args, **kwargs)

Explicit CSV file export.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| args | | TxtExporter parameters. | () |
| kwargs | | TxtExporter parameters. | {} |
Notes

See the to_file method and the Txt export wiki page for more information.

Source code in metaframe/src/dataframe/io.py
def to_file_csv(self, *args, **kwargs) -> None:
    """
    Explicit CSV file export.

    Parameters
    ----------
    args, kwargs:
        TxtExporter parameters.

    Notes
    -----
    See the `to_file` method and the `Txt export` wiki page for more information.
    """
    from metaframe.src.dataframe.exporter import TxtExporter
    TxtExporter(self, *args, sep=',', **kwargs).export()

to_file_tsv(*args, **kwargs)

Explicit TSV file export.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| args | | TxtExporter parameters. | () |
| kwargs | | TxtExporter parameters. | {} |
Notes

See the to_file method and the Txt export wiki page for more information.
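
to_file_csv and to_file_tsv differ only in the separator they pass to the same exporter. The pattern can be sketched with the standard csv module; export_text, export_csv and export_tsv are hypothetical stand-ins, since the internals of TxtExporter are not shown on this page.

```python
import csv
import io


def export_text(rows, sep):
    """Serialise rows (lists of values) to delimited text with the given separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def export_csv(rows, **kwargs):
    """Thin wrapper that fixes the separator to a comma."""
    return export_text(rows, sep=",", **kwargs)


def export_tsv(rows, **kwargs):
    """Thin wrapper that fixes the separator to a tab."""
    return export_text(rows, sep="\t", **kwargs)


rows = [["strings", "group"], ["f", 1], ["g", 0]]
csv_text = export_csv(rows)
tsv_text = export_tsv(rows)
print(csv_text)
```

Because each wrapper passes sep explicitly, supplying sep again through the keyword arguments raises a TypeError in this sketch. The same Python rule applies to the call shape shown in the source listings.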

Source code in metaframe/src/dataframe/io.py
def to_file_tsv(self, *args, **kwargs) -> None:
    """
    Explicit TSV file export.

    Parameters
    ----------
    args, kwargs:
        TxtExporter parameters.

    Notes
    -----
    See the `to_file` method and the `Txt export` wiki page for more information.
    """
    from metaframe.src.dataframe.exporter import TxtExporter
    TxtExporter(self, *args, sep="\t", **kwargs).export()

to_summary(**kwargs)

Make a Summary representation of self.

Several summary types can be produced from a Summary:

.row(...)
.col(...)
.whole(...)
.basic(...)

For more information on the Summary parameters, see the Summary wiki page.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| kwargs | | DataFrameSummaryOpts keyword arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| Summary | A Summary representation of self. |

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe_summary = dataframe.to_summary()
>>> # dataframe_summary.row()
>>> # dataframe_summary.col()
>>> # dataframe_summary.whole()
>>> # dataframe_summary.basic()
Source code in metaframe/src/dataframe/io.py
def to_summary(self, **kwargs) -> 'Summary':  # type: ignore # noqa: F821
    """
    Make a Summary representation of self.

    Multiple summary types can be launched from a Summary:
    .row(...)
    .col(...)
    .whole(...)
    .basic(...)
    For more information on the Summary parameters, see the ``Summary``
    wiki page.

    Parameters
    ----------
    kwargs
        DataFrameSummaryOpts keyword arguments.

    Returns
    -------
    Summary
        a Summary representation of self

    Examples
    --------
    >>> from metaframe.testing import dataframe
    >>> dataframe_summary = dataframe.to_summary()
    >>> # dataframe_summary.row()
    >>> # dataframe_summary.col()
    >>> # dataframe_summary.whole()
    >>> # dataframe_summary.basic()
    """
    from metaframe.src.dataframe.summary.base import Summary
    return Summary(self, **kwargs)

to_frame()

Return a plain pandas DataFrame copy.

Returns:

Type Description
DataFrame

Examples:

>>> from metaframe.testing import dataframe
>>> type(dataframe)
<class 'metaframe.src.dataframe.base.DataFrame'>
>>> type(dataframe.to_frame())
<class 'pandas.DataFrame'>
Source code in metaframe/src/dataframe/core.py
def to_frame(self) -> pd.DataFrame:
    """
    Return a plain pandas DataFrame copy.

    Returns
    -------
    pandas.DataFrame

    Examples
    --------
    >>> from metaframe.testing import dataframe
    >>> type(dataframe)
    <class 'metaframe.src.dataframe.base.DataFrame'>
    >>> type(dataframe.to_frame())
    <class 'pandas.DataFrame'>
    """
    return pd.DataFrame(self)

mf(axis)

Returns DataFrame view of index (axis=0) or columns (axis=1).

Converts the index (rows) or columns into a table-format DataFrame using from_index. The returned object is a DataFrame, not a _MetaFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| axis | {0, 1} | Axis to view: 0 for rows (index), 1 for columns. | required |

Returns:

| Type | Description |
| --- | --- |
| Self | DataFrame representation of the selected axis. |

Examples:

>>> from metaframe.testing import dataframe
>>> dataframe
strings             f  g     h
group               1  0     1
floats bool  group            
1.1    False 0      1  A     2
2.2    False 0      2  B  None
3.3    True  2      3  C     B
4.4    True  1      4  D  None
>>> dataframe.mf(axis=0)
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
>>> type(dataframe.mf(axis=0))
<class 'metaframe.src.dataframe.base.DataFrame'>
>>> dataframe.mf(axis=1)
  strings  group
0       f      1
1       g      0
2       h      1
>>> type(dataframe.mf(axis=1))
<class 'metaframe.src.dataframe.base.DataFrame'>
Source code in metaframe/src/dataframe/core.py
def mf(self, axis: Literal[0, 1]) -> Self:
    """
    Returns ``DataFrame`` view of index (axis=0) or columns (axis=1).

    Converts the index (rows) or columns into a table-format DataFrame using
    `from_index`.
    The returned object is a ``DataFrame``, not a ``_MetaFrame``.

    Parameters
    ----------
    axis : {0, 1}
        Axis to view.
        * 0: rows (index)
        * 1: columns

    Returns
    -------
    Self
        DataFrame representation of the selected axis.

    Examples
    --------
    >>> from metaframe.testing import dataframe
    >>> dataframe
    strings             f  g     h
    group               1  0     1
    floats bool  group            
    1.1    False 0      1  A     2
    2.2    False 0      2  B  None
    3.3    True  2      3  C     B
    4.4    True  1      4  D  None
    >>> dataframe.mf(axis=0)
       floats   bool  group
    0     1.1  False      0
    1     2.2  False      0
    2     3.3   True      2
    3     4.4   True      1
    >>> type(dataframe.mf(axis=0))
    <class 'metaframe.src.dataframe.base.DataFrame'>
    >>> dataframe.mf(axis=1)
      strings  group
    0       f      1
    1       g      0
    2       h      1
    >>> type(dataframe.mf(axis=1))
    <class 'metaframe.src.dataframe.base.DataFrame'>
    """
    check_is(axis, is_axis)
    idx = self.index if axis == 0 else self.columns
    return self.from_index(idx)

merge(*args, indicator=False, only=False, conserve_index=False, **kwargs)

Adds extra parameters to pandas' merge method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| only | bool | Whether to keep only rows whose indicator ends with '_only'. Deprecated since the introduction of the 'left_anti' and 'right_anti' values of the how parameter. | False |
| conserve_index | bool | Whether to conserve the index of left in the result. | False |
| args | | pandas' merge parameters. | () |
| indicator | bool | pandas' merge parameter. | False |
| kwargs | | pandas' merge parameters. | {} |

Returns:

Type Description
Self

Examples:

>>> from metaframe.testing import metaframe_row, metaframe_col

The left DataFrame index can be lost with native pandas merge:

>>> metaframe_col.merge(metaframe_row[['floats', 'group']], how='outer')
  strings  group  floats
0       g      0     1.1
1       g      0     2.2
2       f      1     4.4
3       h      1     4.4
4     NaN      2     3.3

Use conserve_index=True to preserve the left index in the result:

>>> metaframe_col.merge(metaframe_row[['floats', 'group']], how='outer', conserve_index=True)
    strings  group  floats
1.0       g      0     1.1
1.0       g      0     2.2
0.0       f      1     4.4
2.0       h      1     4.4
NaN     NaN      2     3.3
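
The only parameter relies on the merge indicator: pandas labels each merged row 'left_only', 'right_only' or 'both' in a "_merge" column, and only=True keeps the rows whose label ends with '_only'. The sketch below reproduces that filtering on lists of dicts. outer_merge and keep_only are hypothetical helpers, and the sketch does not reproduce pandas' row ordering or its NaN filling of absent columns.

```python
def outer_merge(left, right, key):
    """Outer-join two lists of dicts on `key`, tagging each row's origin in "_merge"."""
    merged = []
    matched_right = set()
    for left_row in left:
        partners = [i for i, row in enumerate(right) if row[key] == left_row[key]]
        if not partners:
            merged.append({**left_row, "_merge": "left_only"})
        for i in partners:
            matched_right.add(i)
            merged.append({**right[i], **left_row, "_merge": "both"})
    for i, right_row in enumerate(right):
        if i not in matched_right:
            merged.append({**right_row, "_merge": "right_only"})
    return merged


def keep_only(rows, keep_indicator=False):
    """Keep rows found on one side only; drop "_merge" unless it was requested."""
    kept = [row for row in rows if row["_merge"].endswith("_only")]
    if keep_indicator:
        return kept
    return [{k: v for k, v in row.items() if k != "_merge"} for row in kept]


left = [
    {"strings": "f", "group": 1},
    {"strings": "g", "group": 0},
    {"strings": "h", "group": 1},
]
right = [{"group": 0, "floats": 1.1}, {"group": 2, "floats": 3.3}]
unmatched = keep_only(outer_merge(left, right, key="group"))
print(unmatched)
```

Here the row for "g" matches on group 0 and is dropped, while "f", "h" and the right-hand row with group 2 remain.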
Source code in metaframe/src/dataframe/core.py
def merge(self, *args, indicator: bool=False, only: bool=False, conserve_index: bool=False, **kwargs) -> Self:
    """
    Adds extra parameters to pandas' merge method.

    Parameters
    ----------
    only : bool, default False
        Whether to keep only rows whose indicator ends with '_only'.
        Deprecated since the introduction of the 'left_anti' and 'right_anti'
        values of the 'how' parameter.
    conserve_index : bool, default False
        Whether to conserve the index of left in the result.
    args, indicator, kwargs
        pandas' merge parameters.

    Returns
    -------
    Self

    Examples
    --------
    >>> from metaframe.testing import metaframe_row, metaframe_col

    The left DataFrame index can be lost with native pandas merge:

    >>> metaframe_col.merge(metaframe_row[['floats', 'group']], how='outer')
      strings  group  floats
    0       g      0     1.1
    1       g      0     2.2
    2       f      1     4.4
    3       h      1     4.4
    4     NaN      2     3.3

    Use ``conserve_index=True`` to preserve the left index in the result:

    >>> metaframe_col.merge(metaframe_row[['floats', 'group']], how='outer', conserve_index=True)
        strings  group  floats
    1.0       g      0     1.1
    1.0       g      0     2.2
    0.0       f      1     4.4
    2.0       h      1     4.4
    NaN     NaN      2     3.3
    """
    indicator_param = (indicator or only)
    if not conserve_index:
        result = super().merge(*args, indicator=indicator_param, **kwargs)
    else:
        self.insert(0, COLNAME_METAFRAME_UNIQUE_ID, self.index)
        try:
            result = pd.DataFrame.merge(self, *args, indicator=indicator_param, **kwargs).set_index(COLNAME_METAFRAME_UNIQUE_ID)
        finally:
            self.drop(COLNAME_METAFRAME_UNIQUE_ID, axis=1, inplace=True)
        result.index.name = self.index.name
    if only:
        result = result.loc[[e.endswith('_only') for e in result["_merge"]]]
        if not indicator:
            result = result.drop("_merge", axis=1)
    return result

from_index(idx) classmethod

Creates DataFrame from pandas Index or MultiIndex.

Converts index to DataFrame via to_frame(index=False). Handles empty index case explicitly.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| idx | Index or MultiIndex | Index to convert. | required |

Returns:

Type Description
Self

Examples:

>>> import metaframe as mf
>>> from metaframe.testing import dataframe
>>> dataframe.index
MultiIndex([(1.1, False, 0),
            (2.2, False, 0),
            (3.3,  True, 2),
            (4.4,  True, 1)],
           names=['floats', 'bool', 'group'])
>>> mf.DataFrame.from_index(dataframe.index)
   floats   bool  group
0     1.1  False      0
1     2.2  False      0
2     3.3   True      2
3     4.4   True      1
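
Conceptually, the conversion turns one tuple per row into one column per index level. A plain-Python sketch of that transposition follows. index_to_columns is a hypothetical helper; it returns an empty mapping for any empty input, whereas the library special-cases only an empty, unnamed index.

```python
def index_to_columns(tuples, names):
    """Turn index entries (one tuple per row) into a mapping of level name -> values."""
    if not tuples:
        # Empty index: return an empty table instead of zero-length columns.
        return {}
    return {name: list(level) for name, level in zip(names, zip(*tuples))}


index_tuples = [(1.1, False, 0), (2.2, False, 0), (3.3, True, 2), (4.4, True, 1)]
table = index_to_columns(index_tuples, ["floats", "bool", "group"])
print(table["group"])  # [0, 0, 2, 1]
```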
Source code in metaframe/src/dataframe/core.py
@classmethod
def from_index(cls, idx: pd.Index | pd.MultiIndex) -> Self:
    """
    Creates DataFrame from pandas Index or MultiIndex.

    Converts index to DataFrame via `to_frame(index=False)`. Handles empty index
    case explicitly.

    Parameters
    ----------
    idx : Index or MultiIndex

    Returns
    -------
    Self

    Examples
    --------
    >>> import metaframe as mf
    >>> from metaframe.testing import dataframe
    >>> dataframe.index
    MultiIndex([(1.1, False, 0),
                (2.2, False, 0),
                (3.3,  True, 2),
                (4.4,  True, 1)],
               names=['floats', 'bool', 'group'])
    >>> mf.DataFrame.from_index(dataframe.index)
       floats   bool  group
    0     1.1  False      0
    1     2.2  False      0
    2     3.3   True      2
    3     4.4   True      1
    """
    if idx.names == [None] and idx.empty:
        df = pd.DataFrame()
    else:
        df = idx.to_frame(index=False)
    return cls(df)

_update_from_mf_obj(mf, axis, _ignore_id=False)

Updates the DataFrame in-place using a _MetaFrame object.

Reindexes the DataFrame along the specified axis based on the _MetaFrame. Updates the corresponding MetaFrameRow (axis=0) or MetaFrameCol (axis=1).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mf | _MetaFrame | MetaFrame object to use for updating the DataFrame. | required |
| axis | Literal[0, 1] | Axis along which to update (0=row, 1=column). | required |
| _ignore_id | bool | If True, skips unique ID consistency check. | False |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the MetaFrame is missing a unique index name. |
| ValueError | If the MetaFrame axis does not match the target axis. |
| ValueError | If the MetaFrame and DataFrame are misaligned (unless _ignore_id is True). |

Notes

This will fail if there are any duplicated rows in the selected MetaFrame.

mfr or mfc attributes of the updated DataFrame are replaced with the MetaFrame's reset index converted to a DataFrame.

The update is applied in-place.
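
The reindexing step translates each entry of the MetaFrame index into a label of the DataFrame axis. An entry that is a valid position is replaced by the label at that position. NaN or out-of-range entries are kept unchanged, and pandas' reindex then creates NaN-filled rows or columns for them. The sketch below mirrors the list comprehension in the source listing; resolve_labels is a hypothetical helper.

```python
import math


def resolve_labels(axis_labels, positions):
    """Translate MetaFrame index entries into labels of the DataFrame axis."""
    size = len(axis_labels)
    resolved = []
    for entry in positions:
        is_nan = isinstance(entry, float) and math.isnan(entry)
        if not is_nan and entry < size:
            resolved.append(axis_labels[int(entry)])  # existing position
        else:
            resolved.append(entry)  # NaN or out of range: kept as a new label
    return resolved


labels = ["a", "b", "c"]
new_order = resolve_labels(labels, [2, 0, 5])
print(new_order)  # ['c', 'a', 5]
```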

Source code in metaframe/src/dataframe/core.py
def _update_from_mf_obj(self, mf: '_MetaFrame', axis: Literal[0, 1], _ignore_id: bool=False) -> None: # type: ignore  # noqa: F821
    """
    Updates the DataFrame in-place using a ``_MetaFrame`` object.

    Reindexes the DataFrame along the specified axis based on the `_MetaFrame`.
    Updates the corresponding MetaFrameRow (axis=0) or MetaFrameCol (axis=1).

    Parameters
    ----------
    mf : _MetaFrame
        MetaFrame object to use for updating the DataFrame.
    axis : Literal[0, 1]
        Axis along which to update (0=row, 1=column).
    _ignore_id : bool, optional
        If True, skips unique ID consistency check.
        Defaults to False.

    Raises
    ------
    ValueError
        If the MetaFrame is missing a unique index name.
    ValueError
        If the MetaFrame axis does not match the target axis.
    ValueError
        If the MetaFrame and DataFrame are misaligned (unless `_ignore_id` is True).

    Notes
    -----
    This will fail if there are any duplicated rows in the selected MetaFrame.

    `mfr` or `mfc` attributes of the updated DataFrame are replaced with
    the MetaFrame's reset index converted to a DataFrame.

    The update is applied **in-place**.
    """
    if mf.is_broken:
        raise ValueError(BROKEN_METAFRAME.format(mf._break_source))
    if mf.axis != axis:
        raise ValueError(METAFRAME_AXIS_ERROR)
    idx_col = self.index if axis == 0 else self.columns
    if not _ignore_id and id(idx_col) != mf._id:
        raise ValueError(METAFRAME_DIVERGENCE_ERROR)
    idx_max = len(idx_col)
    res = self.reindex([idx_col[int(e)] if e==e and e < idx_max else e for e in mf.index], axis=axis)
    if axis == 0:
        res.mfr = mf.to_frame().reset_index(drop=True)
    else:
        res.mfc = mf.to_frame().reset_index(drop=True)
    super().__init__(res)