Core
- class checkedframe._core.InterrogationResult(df: 'nwt.IntoDataFrame', mask: 'nwt.IntoDataFrame', is_good: 'nwt.IntoSeries', summary: 'nwt.IntoDataFrame')
- class checkedframe._core.Schema(expected_schema: dict[str, _TypedColumn], checks: list[Check] | None = None)
A lightweight schema representing a DataFrame. A schema consists of columns and their associated data types, plus checks that can run either on a specific column or on the entire DataFrame. Because checkedframe is built on Narwhals, any Narwhals-compatible DataFrame (pandas, Polars, Modin, PyArrow, cuDF) can be validated.
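To make that structure concrete, here is a toy, pure-Python sketch of the same idea; it is not checkedframe's implementation. Each column maps to an expected type plus per-column checks, and the schema can also hold checks over the whole frame. `ToyColumn`, `ToySchema`, and the dict-of-lists "frame" are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = dict[str, list]


@dataclass
class ToyColumn:
    dtype: type  # expected Python type of every value in the column
    checks: list[Callable[[list], list[bool]]] = field(default_factory=list)


@dataclass
class ToySchema:
    columns: dict[str, ToyColumn]
    frame_checks: list[Callable[[Frame], bool]] = field(default_factory=list)

    def errors(self, frame: Frame) -> list[str]:
        """Collect every problem instead of stopping at the first one."""
        problems = []
        for name, col in self.columns.items():
            if name not in frame:
                problems.append(f"missing column {name!r}")
                continue
            values = frame[name]
            if not all(isinstance(v, col.dtype) for v in values):
                problems.append(f"column {name!r} is not all {col.dtype.__name__}")
                continue
            # Column-level checks return one boolean per row.
            for check in col.checks:
                bad_rows = [i for i, ok in enumerate(check(values)) if not ok]
                if bad_rows:
                    problems.append(f"column {name!r} failed a check at rows {bad_rows}")
        # Frame-level checks see the whole frame and return a single boolean.
        for check in self.frame_checks:
            if not check(frame):
                problems.append("a frame-level check failed")
        return problems


schema = ToySchema(
    columns={"col1": ToyColumn(str, checks=[lambda s: [len(v) == 3 for v in s]])},
    frame_checks=[lambda f: any(len(v) > 0 for v in f.values())],  # at least one row
)
print(schema.errors({"col1": ["abc", "ef"]}))
# ["column 'col1' failed a check at rows [1]"]
```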
A Schema can be used in two ways: it can be constructed directly from a dictionary, or subclassed with its columns declared as class attributes.
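One common Python pattern for supporting both styles is `__init_subclass__`, which lets a base class gather column declarations from each subclass body. The sketch below assumes that approach purely for illustration; it is not taken from checkedframe, and `Column` and `DualSchema` are hypothetical names.

```python
from typing import Any


class Column:
    """Hypothetical column marker, playing the role of something like cf.String()."""

    def __init__(self, dtype: type) -> None:
        self.dtype = dtype


class DualSchema:
    """Hypothetical base class that supports both construction styles."""

    expected_schema: dict[str, Column] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Start from the parent's columns (resolved through the MRO), then add
        # any Column attributes declared directly in this subclass's body.
        collected = dict(cls.expected_schema)
        collected.update(
            {name: value for name, value in vars(cls).items() if isinstance(value, Column)}
        )
        cls.expected_schema = collected

    def __init__(self, expected_schema: dict[str, Column]) -> None:
        # Explicit construction: the mapping is stored on the instance.
        self.expected_schema = dict(expected_schema)


# Style 1: declare columns as class attributes.
class Inherited(DualSchema):
    col1 = Column(str)


class Child(Inherited):
    col2 = Column(int)


# Style 2: pass the mapping directly.
direct = DualSchema({"col1": Column(str)})

print(list(Inherited.expected_schema))  # ['col1']
print(list(Child.expected_schema))      # ['col1', 'col2']
print(list(direct.expected_schema))     # ['col1']
```

Copying the parent's mapping before adding new columns keeps each subclass's schema independent, so declaring `Child` does not alter `Inherited`.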
- Parameters:
expected_schema (dict[str, Column]) – A dictionary of column names and data types
checks (Optional[Sequence[Check]], optional) – A list of checks to run, by default None
Examples
Suppose we want to validate a Polars DataFrame with a single string column whose values should each be 3 characters long.
```python
import polars as pl

df = pl.DataFrame({"col1": ["abc", "ef"]})
```
Via inheritance:
```python
import checkedframe as cf

class MySchema(cf.Schema):
    col1 = cf.String()

    @cf.Check(columns="col1")
    def check_length(s: pl.Series) -> pl.Series:
        return s.str.len_bytes() == 3

# "ef" fails check_length, so this raises SchemaError
MySchema.validate(df)
```
Via explicit construction:
```python
import checkedframe as cf

MySchema = cf.Schema({
    "col1": cf.String(
        checks=[cf.Check(lambda s: s.str.len_bytes() == 3)]
    )
})

# "ef" fails the length check, so this raises SchemaError
MySchema.validate(df)
```
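Both examples check `str.len_bytes()`, which counts UTF-8 bytes rather than characters. For ASCII data like this example the two agree, but they diverge for other text; Polars' `str.len_chars()` counts characters instead. The same distinction in plain Python:

```python
s = "h\u00e9\u00e9"  # "héé", written with the precomposed character U+00E9

n_chars = len(s)                  # counts code points (here, 3 characters)
n_bytes = len(s.encode("utf-8"))  # counts UTF-8 bytes; each "é" takes 2

print(n_chars, n_bytes)  # 3 5
```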
- classmethod validate(df: IntoDataFrameT, cast: bool = False) → IntoDataFrameT
Validate the given DataFrame against this schema.
- Parameters:
df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame; see https://narwhals-dev.github.io/narwhals/ for more information
cast (bool, optional) – Whether to cast columns, by default False
- Returns:
Your original DataFrame
- Return type:
nwt.IntoDataFrameT
- Raises:
SchemaError – If validation fails
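The contract documented above (hand back the input on success, raise `SchemaError` on failure) can be sketched in plain Python as follows. This is an illustrative toy, not checkedframe's code: `toy_validate` and the local `SchemaError` class are hypothetical, the dict-of-lists "frame" stands in for a real DataFrame, and the `cast` option is not modeled.

```python
from typing import Callable

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = dict[str, list]


class SchemaError(Exception):
    """Hypothetical stand-in for checkedframe's SchemaError."""


def toy_validate(frame: Frame, checks: dict[str, Callable[[list], list[bool]]]) -> Frame:
    """Return `frame` itself if every column check passes; otherwise raise SchemaError."""
    failures = []
    for name, check in checks.items():
        if name not in frame:
            failures.append(f"{name}: missing column")
            continue
        bad_rows = [i for i, ok in enumerate(check(frame[name])) if not ok]
        if bad_rows:
            failures.append(f"{name}: rows {bad_rows} failed")
    if failures:
        # Report every failure together rather than stopping at the first.
        raise SchemaError("; ".join(failures))
    return frame  # the same object, so validation can sit inside a pipeline


length_is_3 = {"col1": lambda s: [len(v) == 3 for v in s]}

good = {"col1": ["abc", "xyz"]}
validated = toy_validate(good, length_is_3)

try:
    toy_validate({"col1": ["abc", "ef"]}, length_is_3)
except SchemaError as exc:
    message = str(exc)

print(validated is good)  # True
print(message)            # col1: rows [1] failed
```

Returning the input object unchanged means a call like this can be dropped into an existing data pipeline without altering what flows through it.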