Core

class checkedframe._core.InterrogationResult(df: nwt.IntoDataFrame, mask: nwt.IntoDataFrame, is_good: nwt.IntoSeries, summary: nwt.IntoDataFrame)

The result of Schema.interrogate. All DataFrames and Series it holds are of the same type as the original input DataFrame, e.g. pandas in, pandas out.

df

The input DataFrame with successful transforms (casting) applied

Type:

nwt.IntoDataFrame

mask

A boolean DataFrame in the same row order as the input DataFrame, where each column records, row by row, whether the corresponding check passed

Type:

nwt.IntoDataFrame

is_good

A boolean Series in the same row order as the input DataFrame that indicates whether each row passed all checks

Type:

nwt.IntoSeries

summary

A DataFrame with the columns id, column, operation, n_failed, and pct_failed, with each row identified by id. Usually, column and operation are enough to identify a row, but the same operation can be applied multiple times to the same column. column is the column the check was attached to (and is set to “__dataframe__” for frame-level checks). operation describes the check applied to the column, e.g. “cast” or “check_length_lt_3”. n_failed and pct_failed are the number and percentage of rows that fail the operation for that column.

Type:

nwt.IntoDataFrame
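The relationship between mask, is_good, and summary described above can be sketched in plain Python. This is only an illustrative model, not checkedframe's implementation: the check ids and the "column.operation" naming are hypothetical, and pct_failed is assumed here to be a fraction between 0 and 1 (the library may scale it differently).

```python
# Plain-Python model of the InterrogationResult fields (not checkedframe code).
# The check ids below are hypothetical; real ids and column names may differ.
mask = {
    "col1.cast": [True, True, True, True],
    "col1.check_length": [True, False, True, False],
}
n_rows = 4

# is_good: a row is good only if every check passed for that row.
is_good = [all(row) for row in zip(*mask.values())]

# summary: one record per check, with the failure count and failure share.
summary = []
for check_id, passed in mask.items():
    column, operation = check_id.split(".", 1)
    n_failed = passed.count(False)
    summary.append(
        {
            "id": check_id,
            "column": column,
            "operation": operation,
            "n_failed": n_failed,
            # Assumption: a fraction in [0, 1] rather than a 0-100 percentage.
            "pct_failed": n_failed / n_rows,
        }
    )

print(is_good)
for record in summary:
    print(record)
```

In this model, rows 1 and 3 fail the length check, so is_good is False for exactly those rows and the second summary record reports two failures.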

class checkedframe._core.Schema(expected_schema: Mapping[str, TypedColumn | CfUnion], checks: Iterable[Check] | None = None)

A lightweight schema representing a DataFrame. Briefly, a schema consists of columns and their associated data types. In addition, the schema stores checks that can be run either on a specific column or on the entire DataFrame. Because checkedframe is built on Narwhals, any Narwhals-compatible DataFrame (pandas, Polars, Modin, PyArrow, cuDF) is valid.

A Schema can be used in two ways: it can be initialized directly from a dictionary, or it can be subclassed. The class-based method should be preferred.

Parameters:
  • expected_schema (Mapping[str, TypedColumn | CfUnion]) – A mapping of column names to data types

  • checks (Iterable[Check] | None, optional) – Checks to run, by default None

classmethod columns() -> list[str]

Returns the column names of the schema.

Return type:

list[str]

classmethod filter(df: IntoDataFrameT) -> IntoDataFrameT

Filter the given DataFrame to passing rows.

Parameters:

df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame, see https://narwhals-dev.github.io/narwhals/ for more information

Returns:

The input DataFrame filtered to passing rows

Return type:

nwt.IntoDataFrameT
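Conceptually, filtering to passing rows amounts to keeping the rows for which an is_good-style flag is True. The sketch below models that idea in plain Python with lists of dicts; it is an assumption about the semantics, not checkedframe code, and the length rule is just a stand-in for a schema's checks.

```python
from itertools import compress

# Rows of a toy "DataFrame" with a single string column.
rows = [{"col1": "abc"}, {"col1": "ef"}, {"col1": "xyz"}]

# Stand-in for the per-row pass/fail flag: the value must be 3 characters long.
is_good = [len(row["col1"]) == 3 for row in rows]

# Keep only the rows whose flag is True, preserving the original order.
passing = list(compress(rows, is_good))

print(passing)
```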

classmethod interrogate(df: IntoDataFrameT) -> InterrogationResult

Interrogate the DataFrame, returning the input DataFrame, a validation mask, a boolean Series indicating which rows pass, and a summary of passes / failures.

Parameters:

df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame, see https://narwhals-dev.github.io/narwhals/ for more information

Return type:

InterrogationResult

classmethod validate(df: IntoDataFrameT) -> IntoDataFrameT

Validate the given DataFrame.

Parameters:

df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame, see https://narwhals-dev.github.io/narwhals/ for more information

Returns:

The original DataFrame

Return type:

nwt.IntoDataFrameT

Raises:

SchemaError – If validation fails
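The return-or-raise contract described above can be illustrated with a small plain-Python stand-in. Everything here is hypothetical: the SchemaError class is a local placeholder rather than the library's exception, and the validate function is a toy model that takes a list of values and a single check.

```python
class SchemaError(Exception):
    """Local stand-in for the library's SchemaError (hypothetical)."""


def validate(values, check):
    """Return values unchanged if every value passes check, else raise."""
    failed = [i for i, value in enumerate(values) if not check(value)]
    if failed:
        raise SchemaError(
            f"{len(failed)} of {len(values)} rows failed: indices {failed}"
        )
    return values


def has_length_3(value):
    return len(value) == 3


data = ["abc", "ef"]
try:
    validate(data, has_length_3)
    outcome = "passed"
except SchemaError as exc:
    outcome = f"failed: {exc}"

print(outcome)
```

A caller that wants to inspect failures instead of handling an exception would use an interrogate-style call, which reports failures rather than raising.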

Examples

Let’s say we have a Polars DataFrame we want to validate. It has one string column whose values should be exactly 3 characters long. The second value, "ef", is too short, so validation fails for this DataFrame.

import polars as pl

df = pl.DataFrame({"col1": ["abc", "ef"]})

Via inheritance:

import checkedframe as cf

class MySchema(cf.Schema):
    col1 = cf.String()

    @cf.Check(columns="col1")
    def check_length(s: pl.Series) -> pl.Series:
        return s.str.len_bytes() == 3

MySchema.validate(df)

Via explicit construction:

import checkedframe as cf

MySchema = cf.Schema({
    "col1": cf.String(
        checks=[cf.Check(lambda s: s.str.len_bytes() == 3)]
    )
})

MySchema.validate(df)