Core
- class checkedframe._core.InterrogationResult(df: nwt.IntoDataFrame, mask: nwt.IntoDataFrame, is_good: nwt.IntoSeries, summary: nwt.IntoDataFrame)
All DataFrames and Series are of the same type as the original input DataFrame, e.g. pandas in, pandas out.
- df
The input DataFrame with successful transforms (casting) applied
- Type:
nwt.IntoDataFrame
- mask
A boolean DataFrame in the same row order as the input DataFrame, where each column indicates, row by row, whether the corresponding check passed
- Type:
nwt.IntoDataFrame
- is_good
A boolean Series in the same row order as the input DataFrame that indicates whether the row passed all checks or not
- Type:
nwt.IntoSeries
- summary
A DataFrame with the columns id, column, operation, n_failed, and pct_failed, where each row is identified by id. Usually, column and operation are enough to identify a check, but the same operation can be applied multiple times to the same column. column is the column the check was attached to (set to “__dataframe__” for frame-level checks). operation describes the check applied to the column, e.g. “cast” or “check_length_lt_3”. n_failed and pct_failed are the number and percentage of rows that fail the operation for that column.
- Type:
nwt.IntoDataFrame
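How mask, is_good, and summary relate to one another can be sketched in plain Python. This is an illustration only, not the library's implementation: the list-of-dicts layout, the integer id values, the check name check_length, and the 0-100 scale for pct_failed are all assumptions made for the example.

```python
# Plain-Python stand-ins for the per-check columns of `mask`.
# The ids, the "check_length" name, and this layout are hypothetical.
checks = [
    {"id": 0, "column": "col1", "operation": "cast",
     "passed": [True, True, True, True]},
    {"id": 1, "column": "col1", "operation": "check_length",
     "passed": [True, False, True, False]},
]
n_rows = 4

# is_good: a row is good only if every check passed for that row.
is_good = [all(row) for row in zip(*(c["passed"] for c in checks))]

# summary: one entry per check with the count and share of failing rows.
# Assumption: pct_failed is expressed on a 0-100 scale.
summary = []
for c in checks:
    n_failed = c["passed"].count(False)
    summary.append({
        "id": c["id"],
        "column": c["column"],
        "operation": c["operation"],
        "n_failed": n_failed,
        "pct_failed": 100 * n_failed / n_rows,
    })

print(is_good)
for row in summary:
    print(row)
```

Read this way, is_good is a row-wise reduction of mask, while summary is a column-wise reduction of the same data.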
- class checkedframe._core.Schema(expected_schema: Mapping[str, TypedColumn | CfUnion], checks: Iterable[Check] | None = None)
A lightweight schema representing a DataFrame. Briefly, a schema consists of columns and their associated data types. In addition, the schema stores checks that can be run either on a specific column or the entire DataFrame. Since checkedframe leverages narwhals, any Narwhals-compatible DataFrame (Pandas, Polars, Modin, PyArrow, cuDF) is valid.
A Schema can be used in two ways: it can be initialized directly from a dictionary, or it can be subclassed. The class-based method should be preferred.
- Parameters:
expected_schema (Mapping[str, TypedColumn | CfUnion]) – A mapping of column names to data types
checks (Iterable[Check] | None, optional) – Checks to run, by default None
- classmethod columns() -> list[str]
Returns the column names of the schema.
- Return type:
list[str]
- classmethod filter(df: IntoDataFrameT) -> IntoDataFrameT
Filter the given DataFrame to passing rows.
- Parameters:
df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame, see https://narwhals-dev.github.io/narwhals/ for more information
- Returns:
The input DataFrame filtered to passing rows
- Return type:
nwt.IntoDataFrameT
- classmethod interrogate(df: IntoDataFrameT) -> InterrogationResult
Interrogate the DataFrame, returning the input DataFrame, a validation mask, a boolean Series indicating which rows pass, and a summary of passes / failures.
- Parameters:
df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame, see https://narwhals-dev.github.io/narwhals/ for more information
- Return type:
InterrogationResult
- classmethod validate(df: IntoDataFrameT) -> IntoDataFrameT
Validate the given DataFrame.
- Parameters:
df (nwt.IntoDataFrameT) – Any Narwhals-compatible DataFrame, see https://narwhals-dev.github.io/narwhals/ for more information
- Returns:
Your original DataFrame
- Return type:
nwt.IntoDataFrameT
- Raises:
SchemaError – If validation fails
Examples
Let’s say we have a Polars DataFrame we want to validate. It has one string column, and each value should be exactly 3 characters long.
import polars as pl

df = pl.DataFrame({"col1": ["abc", "ef"]})
Via inheritance:
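import checkedframe as cf
import polars as pl

class MySchema(cf.Schema):
    col1 = cf.String()

    # Column-level check attached to "col1"
    @cf.Check(columns="col1")
    def check_length(s: pl.Series) -> pl.Series:
        return s.str.len_bytes() == 3

# "ef" is only 2 bytes long, so validation is expected to fail with a SchemaError
MySchema.validate(df)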
Via explicit construction:
import checkedframe as cf

MySchema = cf.Schema({
    "col1": cf.String(
        checks=[cf.Check(lambda s: s.str.len_bytes() == 3)]
    )
})

MySchema.validate(df)
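The difference between filter and validate on the same data can be sketched without the library. The block below is a plain-Python illustration under stated assumptions, not how checkedframe is implemented: SchemaError is redefined locally as a stand-in, and sketch_filter, sketch_validate, and is_three_bytes are hypothetical helpers that operate on a list of strings rather than a DataFrame.

```python
class SchemaError(Exception):
    """Local stand-in for the library's SchemaError, for this sketch only."""


def is_three_bytes(value: str) -> bool:
    # Mirrors the len_bytes() == 3 check from the examples above.
    return len(value.encode("utf-8")) == 3


def sketch_filter(values: list[str]) -> list[str]:
    # filter: keep only the passing rows, never raise.
    return [v for v in values if is_three_bytes(v)]


def sketch_validate(values: list[str]) -> list[str]:
    # validate: return the input unchanged, or raise if any row fails.
    failing = [v for v in values if not is_three_bytes(v)]
    if failing:
        raise SchemaError(f"{len(failing)} row(s) failed the length check")
    return values


col1 = ["abc", "ef"]

kept = sketch_filter(col1)

try:
    sketch_validate(col1)
    raised = False
except SchemaError:
    raised = True

print(kept, raised)
```

In this sketch, filter suits pipelines that should continue with the clean subset, validate suits pipelines that should stop on bad data, and interrogate suits cases where the failures themselves need to be inspected.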