Core

class checkedframe._core.InterrogationResult(df: 'nwt.IntoDataFrame', mask: 'nwt.IntoDataFrame', is_good: 'nwt.IntoSeries', summary: 'nwt.IntoDataFrame')
class checkedframe._core.Schema(expected_schema: dict[str, _TypedColumn], checks: list[Check] | None = None)

A lightweight schema representing a DataFrame. Briefly, a schema consists of columns and their associated data types. The schema also stores checks, each of which runs either on a specific column or on the entire DataFrame. Because checkedframe is built on Narwhals, any Narwhals-compatible DataFrame (pandas, Polars, Modin, PyArrow, cuDF) is valid.

A Schema can be used in two ways: constructed directly from a dictionary, or subclassed, with the columns declared as class attributes.

Parameters:
  • expected_schema (dict[str, Column]) – A dictionary mapping column names to data types

  • checks (Optional[Sequence[Check]], optional) – A list of checks to run, by default None

Examples

Let’s say we have a Polars DataFrame we want to validate. It has one string column, and every value in it should be 3 characters long.

import polars as pl

df = pl.DataFrame({"col1": ["abc", "ef"]})

Via inheritance:

import checkedframe as cf

class MySchema(cf.Schema):
    col1 = cf.String()

    @cf.Check(columns="col1")
    def check_length(s: pl.Series) -> pl.Series:
        return s.str.len_bytes() == 3

MySchema.validate(df)

Via explicit construction:

import checkedframe as cf

MySchema = cf.Schema({
    "col1": cf.String(
        checks=[cf.Check(lambda s: s.str.len_bytes() == 3)]
    )
})

MySchema.validate(df)
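
To make the check in these examples concrete, here is a minimal pure-Python sketch of the same predicate, without Polars or checkedframe. It assumes that `str.len_bytes()` counts UTF-8 bytes, so the helper `byte_length_is_3` (a hypothetical name used only here) encodes each string before measuring it.

```python
# Pure-Python stand-in for the check `s.str.len_bytes() == 3`.
# `byte_length_is_3` is a hypothetical helper for illustration only.


def byte_length_is_3(values: list[str]) -> list[bool]:
    """Return one boolean per value: True when its UTF-8 encoding is 3 bytes."""
    return [len(v.encode("utf-8")) == 3 for v in values]


# The column from the example DataFrame.
col1 = ["abc", "ef"]
result = byte_length_is_3(col1)
print(result)  # [True, False]

# Byte length and character length differ for non-ASCII text:
# "a\u00e9" has 2 characters but 3 UTF-8 bytes (U+00E9 takes 2 bytes).
non_ascii = byte_length_is_3(["a\u00e9"])
print(non_ascii)  # [True]
```

The second value, "ef", fails the check, so validating the example DataFrame would be expected to fail. If the intent is three characters rather than three bytes, Polars also provides `str.len_chars()`.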
classmethod validate(df: IntoDataFrameT, cast: bool = False) -> IntoDataFrameT

Validate the given DataFrame against the schema.

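Parameters:
  • df (IntoDataFrameT) – The DataFrame to validate

  • cast (bool, optional) – Whether to cast columns to the expected data types, by default False
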
Returns:

Your original DataFrame

Return type:

nwt.IntoDataFrameT

Raises:

SchemaError – If validation fails
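
The contract described above (return the input unchanged on success, raise on failure) can be sketched without the library. Everything in this block is hypothetical: `ToySchemaError`, `toy_validate`, and the dict-of-lists stand-in for a DataFrame are assumptions made for illustration, not checkedframe's implementation. The `cast` option is not modeled.

```python
from typing import Callable

# Toy stand-ins, all hypothetical: a "frame" is a dict of column name -> values,
# and a check maps a column's values to one boolean per value.
Frame = dict[str, list]
Check = Callable[[list], list]


class ToySchemaError(Exception):
    """Hypothetical stand-in for a schema validation error."""


def toy_validate(df: Frame, expected: dict[str, type], checks: dict[str, Check]) -> Frame:
    """Return `df` unchanged if every column passes, else raise ToySchemaError."""
    problems = []
    for name, dtype in expected.items():
        if name not in df:
            problems.append(f"missing column {name!r}")
            continue
        if not all(isinstance(v, dtype) for v in df[name]):
            problems.append(f"column {name!r} is not of type {dtype.__name__}")
            continue  # skip the check when the type is already wrong
        check = checks.get(name)
        if check is not None:
            failed = [i for i, ok in enumerate(check(df[name])) if not ok]
            if failed:
                problems.append(f"column {name!r} failed its check at rows {failed}")
    if problems:
        raise ToySchemaError("; ".join(problems))
    return df


good = {"col1": ["abc", "xyz"]}
bad = {"col1": ["abc", "ef"]}
length_check = {"col1": lambda values: [len(v) == 3 for v in values]}

# On success the very same object comes back.
print(toy_validate(good, {"col1": str}, length_check) is good)

# On failure an exception describes what went wrong.
message = ""
try:
    toy_validate(bad, {"col1": str}, length_check)
except ToySchemaError as err:
    message = str(err)
    print(message)
```

Collecting every problem before raising, rather than stopping at the first one, is a design choice of this sketch; it makes no claim about how the real `validate` reports failures.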