Bootstrap

Classes:

Name	Description
`Bootstrap`	Computes a two-sided bootstrap confidence interval of a statistic.
`BootstrappedConfusionMatrix`	Result object returned by `rapidstats.Bootstrap().confusion_matrix`.

`Bootstrap`

Computes a two-sided bootstrap confidence interval of a statistic. Because the interval is two-sided, \( \alpha \) is defined as \( \frac{1 - \text{confidence}}{2} \). Regardless of method, the result will be a three-tuple of (lower, point, upper), where point is the point estimate. The process is as follows:

- Resample 100% of the data with replacement `iterations` times
- Compute the statistic on each resample
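
As an illustration of these two steps, here is a minimal pure-Python sketch. The helper `bootstrap_statistics` is hypothetical and not part of rapidstats; it draws every resample from a single seeded generator, whereas the library's `run` method derives one seed per iteration from `seed`.

```python
import random
import statistics


def bootstrap_statistics(data, stat_func, iterations=1_000, seed=None):
    """Hypothetical helper: apply `stat_func` to `iterations` resamples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(iterations):
        # Draw n observations with replacement, i.e. resample 100% of the data.
        resample = rng.choices(data, k=n)
        stats.append(stat_func(resample))
    return stats


boot = bootstrap_statistics([1, 1, 2, 3], statistics.fmean, iterations=200, seed=0)
print(len(boot), min(boot), max(boot))
```

Every interval method described below starts from a vector like `boot`.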

If the method is `standard`,

- Compute the statistic on the original data \( \hat{\theta} \)
- Compute the standard error \( \hat{\sigma} \) of the bootstrap statistics. Note that the standard error of any statistic is defined as the standard deviation of its sampling distribution.
- Compute the Z-score

\[ z_{\alpha} = \phi^{-1}(\alpha) \]

where \( \phi^{-1} \) is the quantile function (also called the inverse CDF or percent-point function) of the standard normal distribution

Then the "Standard" or "First-Order Normal Approximation" interval is

\[ \hat{\theta} \pm z_{\alpha} \times \hat{\sigma} \]
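
A minimal sketch of the "Standard" interval using only the Python standard library. The function name `standard_interval_sketch` is hypothetical, and estimating the standard error with the sample standard deviation (`statistics.stdev`) is an assumption of this sketch; the library's internal `_standard_interval` may differ in such details.

```python
from statistics import NormalDist, stdev


def standard_interval_sketch(original_stat, bootstrap_stats, alpha):
    """Hypothetical helper returning (lower, point, upper)."""
    # Standard error, estimated here by the sample standard deviation
    # of the bootstrap statistics (an assumption of this sketch).
    se = stdev(bootstrap_stats)
    # z_alpha is negative for alpha < 0.5, so use its magnitude as the multiplier.
    half_width = abs(NormalDist().inv_cdf(alpha)) * se
    return (original_stat - half_width, original_stat, original_stat + half_width)


lower, point, upper = standard_interval_sketch(2.0, [1.0, 2.0, 3.0], alpha=0.025)
print(lower, point, upper)
```

The interval is symmetric about the statistic on the original data by construction, which is why this method relies on the normal approximation being reasonable.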

If the method is `percentile`, we stop after resampling and compute the equal-tailed interval of the bootstrap distribution, which cuts off a fraction \( \alpha \) in each tail and so contains `confidence` of the bootstrap statistics. Then the "Percentile" interval is

\[ [\text{percentile}(\hat{\theta}^{*}, \alpha), \text{percentile}(\hat{\theta}^{*}, 1 - \alpha)] \]

where \( \hat{\theta}^{*} \) is the vector of bootstrap statistics.
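
A minimal sketch of the "Percentile" interval. Both helper names are hypothetical, and the linear-interpolation quantile rule is an assumption of this sketch; the library may use a different interpolation rule.

```python
def quantile_sketch(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, for 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def percentile_interval_sketch(original_stat, bootstrap_stats, alpha):
    """Hypothetical helper returning (lower, point, upper)."""
    ordered = sorted(bootstrap_stats)
    return (
        quantile_sketch(ordered, alpha),
        original_stat,
        quantile_sketch(ordered, 1 - alpha),
    )


# Toy bootstrap vector 0, 1, ..., 100 so the cut points are easy to read off.
lower, point, upper = percentile_interval_sketch(50.0, list(range(101)), alpha=0.025)
print(lower, point, upper)
```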

If the method is `basic`,

- Compute the statistic on the original data
- Compute the "Percentile" interval

Then the "Basic" or "Reverse Percentile" interval is

\[ [2\hat{\theta} - PCI_u, 2\hat{\theta} - PCI_l] \]

where \( \hat{\theta} \) is the statistic on the original data, \( PCI_u \) is the upper bound of the "Percentile" interval, and \( PCI_l \) is the lower bound of the "Percentile" interval.
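
A minimal sketch of the reflection step, assuming the "Percentile" bounds have already been computed. The function name `basic_interval_sketch` is hypothetical.

```python
def basic_interval_sketch(original_stat, pci_lower, pci_upper):
    """Hypothetical helper: reflect the percentile bounds around the original statistic."""
    # The upper percentile bound yields the lower basic bound, and vice versa.
    return (
        2 * original_stat - pci_upper,
        original_stat,
        2 * original_stat - pci_lower,
    )


# A percentile interval that is skewed to the right of the statistic...
print(basic_interval_sketch(2.0, 1.5, 3.0))
# ...becomes a basic interval that is skewed to the left: (1.0, 2.0, 2.5)
```

When the percentile interval is symmetric about the original statistic, the two methods coincide.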

If the method is `BCa`,

- Compute the statistic on the original data \( \hat{\theta} \)
- Compute the statistic on the data with the \( i^{th} \) row deleted (jackknife)
- Compute the bias correction factor as

\[ \hat{z_0} = \phi^{-1}( \frac{\sum_{i=1}^B \mathbb{1}[\hat{\theta_i}^{*} < \hat{\theta}] + \sum_{i=1}^B \mathbb{1}[\hat{\theta_i}^{*} \leq \hat{\theta}]}{2 * B} ) \]

where \( \hat{\theta}^{*} \) is the vector of bootstrap statistics, \( B \) is the length of that vector, and \( \mathbb{1}[\cdot] \) is 1 when the condition holds and 0 otherwise.
- Compute the acceleration factor as

\[ \hat{a} = \frac{1}{6} \frac{ \sum_{i=1}^{N} (\hat{\theta_{(.)}} - \hat{\theta_i})^3 }{ [\sum_{i=1}^{N} (\hat{\theta_{(.)}} - \hat{\theta_i})^2]^{1.5} } \]

where \( \hat{\theta_{(.)}} \) is the mean of the jackknife statistics, \( \hat{\theta_i} \) is the \( i^{th} \) element of the jackknife vector, and \( N \) is the length of that vector.
- Compute the lower and upper percentiles as

\[ \alpha_l = \phi( \hat{z_0} + \frac{\hat{z_0} + z_{\alpha}}{1 - \hat{a}(\hat{z_0} + z_{\alpha})} ) \]

and

\[ \alpha_u = \phi( \hat{z_0} + \frac{ \hat{z_0} + z_{1 - \alpha} }{ 1 - \hat{a}(\hat{z_0} + z_{1-\alpha}) } ) \]

where \( \phi \) is the standard normal CDF.

Then the "BCa" or "Bias-Corrected and Accelerated" interval is

\[ [\text{percentile}(\hat{\theta}^{*}, \alpha_l), \text{percentile}(\hat{\theta}^{*}, \alpha_u)] \]

where \( \hat{\theta}^{*} \) is the vector of bootstrap statistics.
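
The adjusted percentiles can be sketched with the standard library alone. The function name `bca_percentiles_sketch` is hypothetical, the bias correction uses the mid-rank convention (strictly-below count plus at-or-below count, divided by \( 2B \)), and degenerate inputs are not handled; the library's internal `_bca_interval` may treat edge cases differently.

```python
from statistics import NormalDist, fmean


def bca_percentiles_sketch(original_stat, bootstrap_stats, jackknife_stats, alpha):
    """Hypothetical helper returning the adjusted percentiles (alpha_l, alpha_u)."""
    norm = NormalDist()
    b = len(bootstrap_stats)

    # Bias correction: mid-rank proportion of bootstrap statistics at or below
    # the original statistic. inv_cdf raises if this proportion is 0 or 1.
    below = sum(s < original_stat for s in bootstrap_stats)
    at_or_below = sum(s <= original_stat for s in bootstrap_stats)
    z0 = norm.inv_cdf((below + at_or_below) / (2 * b))

    # Acceleration: skewness-like ratio of the jackknife deviations.
    # Raises ZeroDivisionError if all jackknife statistics are identical.
    mean_jk = fmean(jackknife_stats)
    diffs = [mean_jk - s for s in jackknife_stats]
    a = sum(d**3 for d in diffs) / (6 * sum(d**2 for d in diffs) ** 1.5)

    def adjusted(z):
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    return adjusted(norm.inv_cdf(alpha)), adjusted(norm.inv_cdf(1 - alpha))


# Symmetric toy inputs: no bias and no acceleration, so the percentiles are unchanged.
alpha_l, alpha_u = bca_percentiles_sketch(0.0, [-2, -1, 0, 1, 2], [-1, 0, 1], 0.025)
print(alpha_l, alpha_u)
```

With \( \hat{z_0} = 0 \) and \( \hat{a} = 0 \) the adjusted percentiles reduce to \( \alpha \) and \( 1 - \alpha \), so BCa falls back to the "Percentile" interval.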

Parameters:

Name	Type	Description	Default
`iterations`	`int`	How many times to resample the data, by default 1_000	`1000`
`confidence`	`float`	The confidence level, by default 0.95	`0.95`
`method`	`Literal['standard', 'percentile', 'basic', 'BCa']`	Which interval to return: Standard, Percentile, Basic / Reverse Percentile, or Bias-Corrected and Accelerated, by default "percentile"	`'percentile'`
`sampling_method`	`Literal['poisson', 'multinomial']`	How to sample. If "multinomial", sample with replacement. If "poisson", simulate number of draws via a Poisson(1) distribution. Note that "poisson" is usually much more performant, especially since order is preserved, which allows certain functions to avoid sorting every iteration. However, "poisson" is still in a beta stage, by default "multinomial"	`'multinomial'`
`seed`	`Optional[int]`	Seed that controls resampling. Set this to any integer to make results reproducible, by default None	`None`
`n_jobs`	`Optional[int]`	How many threads to run with. None means let the executor decide, and 1 means run sequentially, by default None	`None`
`chunksize`	`Optional[int]`	The chunksize for each thread. None means let the executor decide, by default None	`None`
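
To illustrate the difference between the two `sampling_method` options, here is a pure-Python sketch. All three function names are hypothetical and this is not the library's implementation; it only shows why Poisson resampling keeps rows in their original order, while the resample size becomes random (equal to the data size only on average).

```python
import math
import random


def poisson1(rng):
    """Draw from Poisson(1) with Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson_resample(data, rng):
    # Repeat each row Poisson(1) times, walking the data in its original order.
    out = []
    for row in data:
        out.extend([row] * poisson1(rng))
    return out


def multinomial_resample(data, rng):
    # Classic bootstrap: exactly len(data) draws with replacement, in random order.
    return rng.choices(data, k=len(data))


data = list(range(50))  # already sorted
p_sample = poisson_resample(data, random.Random(0))
m_sample = multinomial_resample(data, random.Random(0))
print(len(p_sample), p_sample == sorted(p_sample))
print(len(m_sample), m_sample == sorted(m_sample))
```

Because the Poisson resample of sorted data is itself sorted, a statistic that needs sorted input can skip the per-iteration sort.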

Raises:

Type	Description
`ValueError`	If the method is not one of `standard`, `percentile`, `basic`, or `BCa`
`ValueError`	If the sampling method is not one of `poisson` or `multinomial`

Examples:

import rapidstats as rs
ci = rs.Bootstrap(seed=208).mean([1, 1, 2, 3])

(1.0, 1.75, 2.5)

Methods:

Name	Description
`adverse_impact_ratio`	Bootstrap AIR. See rapidstats.metrics.adverse_impact_ratio for more details.
`adverse_impact_ratio_at_thresholds`	Bootstrap AIR at thresholds. See rapidstats.metrics.adverse_impact_ratio_at_thresholds for more details.
`average_precision`	Bootstrap average precision. See rapidstats.metrics.average_precision for more details.
`brier_loss`	Bootstrap Brier loss. See rapidstats.metrics.brier_loss for more details.
`confusion_matrix`	Bootstrap confusion matrix. See rapidstats.metrics.confusion_matrix for more details.
`confusion_matrix_at_thresholds`	Bootstrap confusion matrix at thresholds. See rapidstats.metrics.confusion_matrix_at_thresholds for more details.
`max_ks`	Bootstrap Max-KS. See rapidstats.metrics.max_ks for more details.
`mean`	Bootstrap mean.
`mean_squared_error`	Bootstrap MSE. See rapidstats.metrics.mean_squared_error for more details.
`r2`	Bootstrap R2. See rapidstats.metrics.r2 for more details.
`roc_auc`	Bootstrap ROC-AUC. See rapidstats.metrics.roc_auc for more details.
`root_mean_squared_error`	Bootstrap RMSE. See rapidstats.metrics.root_mean_squared_error for more details.
`run`	Run bootstrap for an arbitrary function that accepts a Polars DataFrame and returns a scalar real number.

Source code in python/rapidstats/_bootstrap.py

class Bootstrap:
    r"""Computes a two-sided bootstrap confidence interval of a statistic. Note that
    \( \alpha \) is then defined as \( \frac{1 - \text{confidence}}{2} \). Regardless
    of method, the result will be a three-tuple of (lower, point, upper), where point is
    the point estimate. The process is as follows:

    - Resample 100% of the data with replacement for `iterations`
    - Compute the statistic on each resample

    If the method is `standard`,

    - Compute the statistic on the original data \( \hat{\theta} \)
    - Compute the standard error \( \hat{\sigma} \) of the bootstrap statistics. Note
    that the standard error of any statistic is defined as the standard deviation of
    its sampling distribution.
    - Compute the Z-score

        \[ z_{\alpha} = \phi^{-1}(\alpha) \]

        where \( \phi^{-1} \) is the quantile function (also called the inverse CDF or
        percent-point function) of the standard normal distribution

    Then the "Standard" or "First-Order Normal Approximation" interval is

    \[ \hat{\theta} \pm z_{\alpha} \times \hat{\sigma} \]

    If the method is `percentile`, we stop after resampling and compute the
    equal-tailed interval of the bootstrap distribution, which cuts off a fraction
    \( \alpha \) in each tail and so contains `confidence` of the bootstrap
    statistics. Then the "Percentile" interval is

    \[
        [\text{percentile}(\hat{\theta}^{*}, \alpha),
        \text{percentile}(\hat{\theta}^{*}, 1 - \alpha)]
    \]

    where \( \hat{\theta}^{*} \) is the vector of bootstrap statistics.

    If the method is `basic`,

    - Compute the statistic on the original data
    - Compute the "Percentile" interval

    Then the "Basic" or "Reverse Percentile" interval is

    \[
        [2\hat{\theta} - PCI_u,
        2\hat{\theta} - PCI_l]
    \]

    where \( \hat{\theta} \) is the statistic on the original data, \( PCI_u \) is the
    upper bound of the "Percentile" interval, and \( PCI_l \) is the lower bound of the
    "Percentile" interval.

    If the method is `BCa`,

    - Compute the statistic on the original data \( \hat{\theta} \)
    - Compute the statistic on the data with the \( i^{th} \) row deleted (jackknife)
    - Compute the bias correction factor as

        \[
            \hat{z_0} = \phi^{-1}(
                \frac{\sum_{i=1}^B \mathbb{1}[\hat{\theta_i}^{*} < \hat{\theta}]
                + \sum_{i=1}^B \mathbb{1}[\hat{\theta_i}^{*} \leq \hat{\theta}]}{2 * B}
            )
        \]

        where \( \hat{\theta}^{*} \) is the vector of bootstrap statistics, \( B \)
        is the length of that vector, and \( \mathbb{1}[\cdot] \) is 1 when the
        condition holds and 0 otherwise.

    - Compute the acceleration factor as

        \[
            \hat{a} = \frac{1}{6} \frac{
                \sum_{i=1}^{N} (\hat{\theta_{(.)}} - \hat{\theta_i})^3
            }{
                [\sum_{i=1}^{N} (\hat{\theta_{(.)}} - \hat{\theta_i})^2]^{1.5}
            }
        \]

        where \( \hat{\theta_{(.)}} \) is the mean of the jackknife statistics,
        \( \hat{\theta_i} \) is the \( i^{th} \) element of the jackknife vector, and
        \( N \) is the length of that vector.

    - Compute the lower and upper percentiles as

        \[
            \alpha_l = \phi(
                \hat{z_0} + \frac{\hat{z_0} + z_{\alpha}}{1 - \hat{a}(\hat{z_0} + z_{\alpha})}
            )
        \]

        and

        \[
            \alpha_u = \phi(
                \hat{z_0} + \frac{
                    \hat{z_0} + z_{1 - \alpha}
                }{
                    1 - \hat{a}(\hat{z_0} + z_{1-\alpha})
                }
            )
        \]

        where \( \phi \) is the standard normal CDF.

    Then the "BCa" or "Bias-Corrected and Accelerated" interval is

    \[
        [\text{percentile}(\hat{\theta}^{*}, \alpha_l),
        \text{percentile}(\hat{\theta}^{*}, \alpha_u)]
    \]

    where \( \hat{\theta}^{*} \) is the vector of bootstrap statistics.

    Parameters
    ----------
    iterations : int, optional
        How many times to resample the data, by default 1_000
    confidence : float, optional
        The confidence level, by default 0.95
    method : Literal["standard", "percentile", "basic", "BCa"], optional
        Which interval to return: Standard, Percentile, Basic / Reverse Percentile, or
        Bias-Corrected and Accelerated, by default "percentile"
    sampling_method: Literal["poisson", "multinomial"], optional
        How to sample. If "multinomial", sample with replacement. If "poisson", simulate
        number of draws via a Poisson(1) distribution. Note that "poisson" is usually
        much more performant, especially since order is preserved, which allows certain
        functions to avoid sorting every iteration. However, "poisson" is still in a
        beta stage, by default "multinomial"
    seed : Optional[int], optional
        Seed that controls resampling. Set this to any integer to make results
        reproducible, by default None
    n_jobs: Optional[int], optional
        How many threads to run with. None means let the executor decide, and 1 means
        run sequentially, by default None
    chunksize: Optional[int], optional
        The chunksize for each thread. None means let the executor decide, by default
        None

    Raises
    ------
    ValueError
        If the method is not one of `standard`, `percentile`, `basic`, or `BCa`
    ValueError
        If the sampling method is not one of `poisson` or `multinomial`

    Examples
    --------
    ``` py
    import rapidstats as rs
    ci = rs.Bootstrap(seed=208).mean([1, 1, 2, 3])
    ```
    (1.0, 1.75, 2.5)
    """

    def __init__(
        self,
        iterations: int = 1_000,
        confidence: float = 0.95,
        method: Literal["standard", "percentile", "basic", "BCa"] = "percentile",
        sampling_method: Literal["poisson", "multinomial"] = "multinomial",
        seed: Optional[int] = None,
        n_jobs: Optional[int] = None,
        chunksize: Optional[int] = None,
    ) -> None:
        if method not in ("standard", "percentile", "basic", "BCa"):
            raise ValueError(
                f"Invalid confidence interval method `{method}`, only `standard`, `percentile`, `basic`, and `BCa` are supported",
            )

        if sampling_method not in ("poisson", "multinomial"):
            raise ValueError(
                f"Invalid sampling method `{sampling_method}`, only `poisson` and `multinomial` are supported"
            )

        self.iterations = iterations
        self.confidence = confidence
        self.seed = seed
        self.alpha = (1 - confidence) / 2
        self.method = method
        self.sampling_method = sampling_method
        self.n_jobs = n_jobs
        self.chunksize = chunksize

        self._params = {
            "iterations": self.iterations,
            "alpha": self.alpha,
            "method": self.method,
            "seed": self.seed,
            "n_jobs": self.n_jobs,
            "chunksize": self.chunksize,
            "poisson": self.sampling_method == "poisson",
        }

    def run(
        self, df: pl.DataFrame, stat_func: StatFunc, **kwargs
    ) -> ConfidenceInterval:
        """Run bootstrap for an arbitrary function that accepts a Polars DataFrame and
        returns a scalar real number.

        Parameters
        ----------
        df : pl.DataFrame
            The data to pass to `stat_func`
        stat_func : StatFunc
            A callable that takes a Polars DataFrame as its first argument and returns
            a scalar real number.

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Added in version 0.1.0
        ----------------------
        """
        default = {"executor": "threads", "preserve_order": False}
        for k, v in default.items():
            if k not in kwargs:
                kwargs[k] = v

        if self._params["poisson"]:
            func = functools.partial(
                _poisson_bs_func, df=df, df_height=df.height, stat_func=stat_func
            )
        else:
            func = functools.partial(_bs_func, df=df, stat_func=stat_func)

        if self.seed is None:
            iterable = (None for _ in range(self.iterations))
        else:
            iterable = (self.seed + i for i in range(self.iterations))

        bootstrap_stats = [
            x for x in _run_concurrent(func, iterable, **kwargs) if not math.isnan(x)
        ]

        original_stat = stat_func(df)

        if len(bootstrap_stats) == 0:
            return (math.nan, math.nan, math.nan)

        if self.method == "standard":
            return _standard_interval(original_stat, bootstrap_stats, self.alpha)
        elif self.method == "percentile":
            return _percentile_interval(original_stat, bootstrap_stats, self.alpha)
        elif self.method == "basic":
            return _basic_interval(original_stat, bootstrap_stats, self.alpha)
        elif self.method == "BCa":
            jacknife_stats = [x for x in _jacknife(df, stat_func) if not math.isnan(x)]

            return _bca_interval(
                original_stat, bootstrap_stats, jacknife_stats, self.alpha
            )
        else:
            # We shouldn't hit this since we check method in __init__, but it makes the
            # type-checker happy
            raise ValueError("Invalid method")

    def confusion_matrix(
        self,
        y_true: ArrayLike,
        y_pred: ArrayLike,
        beta: float = 1.0,
        sample_weight: Optional[ArrayLike] = None,
    ) -> BootstrappedConfusionMatrix:
        r"""Bootstrap confusion matrix. See [rapidstats.metrics.confusion_matrix][] for
        more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_pred : ArrayLike
            Predicted target
        beta : float, optional
            \( \beta \) to use in \( F_\beta \), by default 1
        sample_weight: Optional[ArrayLike], optional
            Sample weights, set to 1 if None

            !!! Version
                Added 0.2.0

        Returns
        -------
        BootstrappedConfusionMatrix
            A dataclass of confusion matrix metrics as (lower, point, upper). See
            [rapidstats._bootstrap.BootstrappedConfusionMatrix][] for more details.

        Added in version 0.1.0
        ----------------------
        """
        df = _y_true_y_pred_to_df(y_true, y_pred, sample_weight).with_columns(
            pl.col("y_true").cast(pl.UInt8)
        )

        return BootstrappedConfusionMatrix(
            *_bootstrap_confusion_matrix(df, beta, **self._params)
        )

    def confusion_matrix_at_thresholds(
        self,
        y_true: ArrayLike,
        y_score: ArrayLike,
        thresholds: Optional[list[float]] = None,
        metrics: Iterable[ConfusionMatrixMetric] = DefaultConfusionMatrixMetrics,
        strategy: LoopStrategy = "auto",
        beta: float = 1.0,
        sample_weight: Optional[ArrayLike] = None,
    ) -> pl.DataFrame:
        r"""Bootstrap confusion matrix at thresholds. See
        [rapidstats.metrics.confusion_matrix_at_thresholds][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores
        thresholds : Optional[list[float]], optional
            The thresholds to compute `y_pred` at, i.e. y_score >= t. If None,
            uses every score present in `y_score`, by default None
        metrics : Iterable[ConfusionMatrixMetric], optional
            The metrics to compute, by default DefaultConfusionMatrixMetrics
        strategy : LoopStrategy, optional
            Computation method, by default "auto"
        beta : float, optional
            \( \beta \) to use in \( F_\beta \), by default 1
        sample_weight: Optional[ArrayLike], optional
            Sample weights, set to 1 if None

            !!! Version
                Added 0.2.0

        Returns
        -------
        pl.DataFrame
            A DataFrame of `threshold`, `metric`, `lower`, `point`, and `upper`

        Raises
        ------
        NotImplementedError
            When `strategy` is `cum_sum` and `method` is `BCa`

        Added in version 0.1.0
        ----------------------
        """
        df = (
            _y_true_y_score_to_df(y_true, y_score, sample_weight)
            .rename({"y_score": "threshold"})
            .sort("threshold", descending=True)
        )
        final_cols = ["threshold", "metric", "lower", "point", "upper"]

        strategy = _set_loop_strategy(thresholds, strategy)

        if strategy == "loop":
            cms: list[pl.DataFrame] = []
            for t in tqdm(set(thresholds or y_score)):
                cm = (
                    self.confusion_matrix(
                        df["y_true"],
                        df["threshold"].ge(t),
                        beta=beta,
                        sample_weight=df["sample_weight"],
                    )
                    .to_polars()
                    .with_columns(pl.lit(t).alias("threshold"))
                )
                cms.append(cm)

            return pl.concat(cms, how="vertical").with_columns(
                pl.col("lower", "point", "upper").fill_nan(None)
            )
        elif strategy == "cum_sum":
            if thresholds is None:
                thresholds = df["threshold"].unique()

            if self._params["poisson"]:
                _matrix_func = _base_confusion_matrix_at_thresholds_sorted
                _sample_func = functools.partial(_poisson_sample, df_height=df.height)
                df = df.lazy()
            else:
                _matrix_func = _base_confusion_matrix_at_thresholds
                _sample_func = _multinomial_sample

            def _cm_inner(pf: PolarsFrame) -> pl.LazyFrame:
                return (
                    pf.lazy()
                    .pipe(_matrix_func)
                    .pipe(_full_confusion_matrix_from_base, beta=beta)
                    .unique("threshold")
                    .pipe(_map_to_thresholds, thresholds)
                    .drop("_threshold_actual")
                )

            def _cm(i: int) -> pl.LazyFrame:
                sample_df = _sample_func(df, seed=i)

                return _cm_inner(sample_df)

            cms: list[pl.LazyFrame] = _run_concurrent(
                _cm,
                (
                    (self.seed + i for i in range(self.iterations))
                    if self.seed is not None
                    else (None for _ in range(self.iterations))
                ),
            )

            def _process_results(lf: pl.LazyFrame) -> pl.LazyFrame:
                return (
                    lf.select("threshold", *metrics)
                    .unpivot(index="threshold")
                    .rename({"variable": "metric"})
                )

            bootstrap_lf = pl.concat(cms, how="vertical").pipe(_process_results)

            lf = bootstrap_lf.group_by("threshold", "metric")

            original = (
                _cm_inner(df)
                .select("threshold", *metrics)
                .pipe(_map_to_thresholds, thresholds)
                .unpivot(index="threshold")
                .rename({"variable": "metric", "value": "point"})
            )

            if self.method == "standard":
                return (
                    _standard_interval_polars(lf, self.alpha)
                    .join(
                        original,
                        on=["threshold", "metric"],
                        how="left",
                        validate="1:1",
                    )
                    .select(final_cols)
                    .collect()
                )
            elif self.method == "percentile":
                return (
                    _percentile_interval_polars(lf, self.alpha)
                    .join(
                        original,
                        on=["threshold", "metric"],
                        how="left",
                        validate="1:1",
                    )
                    .select(final_cols)
                    .collect()
                )
            elif self.method == "basic":
                return (
                    _percentile_interval_polars(lf, self.alpha)
                    .join(
                        original,
                        on=["threshold", "metric"],
                        how="left",
                        validate="1:1",
                    )
                    .pipe(_basic_interval_polars)
                    .select(final_cols)
                    .collect()
                )
            elif self.method == "BCa":
                raise NotImplementedError(
                    "Method `BCa` not implemented for strategy `cum_sum` due to https://github.com/pola-rs/polars/issues/20951"
                )
                original_lf = (
                    _cm_inner(df)
                    .select("threshold", *metrics)
                    .pipe(_map_to_thresholds, thresholds)
                    .unpivot(index="threshold")
                    .rename({"variable": "metric", "value": "original_value"})
                )
                jacknife_lf = pl.concat(_jacknife(df, _cm_inner), how="vertical").pipe(
                    _process_results
                )

                return (
                    _bca_interval_polars(
                        original_lf,
                        bootstrap_lf=bootstrap_lf,
                        jacknife_lf=jacknife_lf,
                        alpha=self.alpha,
                        by=["threshold", "metric"],
                    )
                    .select(final_cols)
                    .collect()
                )
            else:
                raise ValueError()

    def roc_auc(
        self,
        y_true: ArrayLike,
        y_score: ArrayLike,
        sample_weight: Optional[ArrayLike] = None,
    ) -> ConfidenceInterval:
        """Bootstrap ROC-AUC. See [rapidstats.metrics.roc_auc][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores
        sample_weight: Optional[ArrayLike], optional
            Sample weights, set to 1 if None

            !!! Version
                Added 0.2.0

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Changelog
        ---------
        - Added in version 0.1.0
        - Returns point estimate instead of mean starting version 0.3.0
        """
        df = _y_true_y_score_to_df(y_true, y_score, sample_weight).with_columns(
            pl.col("y_true").cast(pl.Float64)
        )

        if self._params["poisson"]:
            df = df.sort("y_score")
            _f = _bootstrap_roc_auc_sorted
        else:
            _f = _bootstrap_roc_auc

        return _f(df, **self._params)

    def average_precision(
        self,
        y_true: ArrayLike,
        y_score: ArrayLike,
        sample_weight: Optional[ArrayLike] = None,
    ) -> ConfidenceInterval:
        """Bootstrap average precision. See [rapidstats.metrics.average_precision][] for more
        details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores
        sample_weight: Optional[ArrayLike], optional
            Sample weights, set to 1 if None

            !!! Version
                Added 0.2.0

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Changelog
        ----------------------
        - Added in version 0.1.0
        - Returns point estimate instead of mean starting version 0.3.0
        """
        df = (
            _y_true_y_score_to_df(y_true, y_score, sample_weight)
            .rename({"y_score": "threshold"})
            .drop_nulls()
        )

        def _cm_inner(pf: PolarsFrame) -> pl.LazyFrame:
            return (
                pf.lazy()
                .pipe(_base_confusion_matrix_at_thresholds)
                .pipe(_full_confusion_matrix_from_base)
                .select("threshold", "precision", "tpr")
            )

        def _cm(i: int) -> pl.LazyFrame:
            sample_df = df.sample(fraction=1, with_replacement=True, seed=i)

            return _cm_inner(sample_df)

        cms: list[pl.LazyFrame] = _run_concurrent(
            _cm,
            (
                (self.seed + i for i in range(self.iterations))
                if self.seed is not None
                else (None for _ in range(self.iterations))
            ),
        )

        cms = [
            cm.with_columns(pl.lit(i).alias("iteration")) for i, cm in enumerate(cms)
        ]

        bootstrap_stats = (
            pl.concat(cms, how="vertical")
            .sort("threshold")
            .group_by("iteration", maintain_order=True)
            .agg(
                _ap_from_pr_curve(pl.col("precision"), pl.col("tpr")).alias(
                    "average_precision"
                )
            )
            .collect()["average_precision"]
            .to_list()
        )

        original_stat = _ap(y_true, y_score)

        if self.method == "standard":
            return _standard_interval(original_stat, bootstrap_stats, self.alpha)
        elif self.method == "percentile":
            return _percentile_interval(original_stat, bootstrap_stats, self.alpha)
        elif self.method == "basic":
            return _basic_interval(original_stat, bootstrap_stats, self.alpha)
        elif self.method == "BCa":

            def _cm_jacknife(i):
                j_df = df.filter(pl.col("index").ne(i))

                return _cm_inner(j_df).with_columns(pl.lit(i).alias("iteration"))

            df = df.with_row_index("index")
            cms = _run_concurrent(_cm_jacknife, range(df.height))
            jacknife_stats = (
                pl.concat(cms, how="vertical")
                .sort("threshold")
                .group_by("iteration", maintain_order=True)
                .agg(
                    _ap_from_pr_curve(pl.col("precision"), pl.col("tpr")).alias(
                        "average_precision"
                    )
                )
                .collect()["average_precision"]
                .to_list()
            )

            return _bca_interval(
                original_stat, bootstrap_stats, jacknife_stats, self.alpha
            )

    def max_ks(self, y_true: ArrayLike, y_score: ArrayLike) -> ConfidenceInterval:
        """Bootstrap Max-KS. See [rapidstats.metrics.max_ks][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Changelog
        ----------------------
        - Added in version 0.1.0
        - Returns point estimate instead of mean starting version 0.3.0
        """
        df = _y_true_y_score_to_df(y_true, y_score)

        return _bootstrap_max_ks(df, **self._params)

    def brier_loss(self, y_true: ArrayLike, y_score: ArrayLike) -> ConfidenceInterval:
        """Bootstrap Brier loss. See [rapidstats.metrics.brier_loss][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)
        """
        df = _y_true_y_score_to_df(y_true, y_score)

        return _bootstrap_brier_loss(df, **self._params)

    def mean(self, y: ArrayLike) -> ConfidenceInterval:
        """Bootstrap mean.

        Parameters
        ----------
        y : ArrayLike
            A 1D-array

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Added in version 0.1.0
        ----------------------
        """
        df = pl.DataFrame({"y": y})

        return _bootstrap_mean(df, **self._params)

    def adverse_impact_ratio(
        self,
        y_pred: ArrayLike,
        protected: ArrayLike,
        control: ArrayLike,
        sample_weight: Optional[ArrayLike] = None,
    ) -> ConfidenceInterval:
        """Bootstrap AIR. See [rapidstats.metrics.adverse_impact_ratio][] for more details.

        Parameters
        ----------
        y_pred : ArrayLike
            Predicted target
        protected : ArrayLike
            An array of booleans identifying the protected class
        control : ArrayLike
            An array of booleans identifying the control class
        sample_weight: Optional[ArrayLike], optional
            Sample weights, set to 1 if None

            !!! Version
                Added 0.2.0

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Changelog
        ----------------------
        - Added in version 0.1.0
        """
        df = (
            pl.DataFrame(
                {
                    "y_pred": y_pred,
                    "protected": protected,
                    "control": control,
                    "sample_weight": 1.0 if sample_weight is None else sample_weight,
                }
            )
            .with_columns(pl.col("y_pred", "protected", "control").cast(pl.Boolean))
            .with_columns(pl.col("y_pred").cast(pl.Float64))
        )

        return _bootstrap_adverse_impact_ratio(df, **self._params)

    def adverse_impact_ratio_at_thresholds(
        self,
        y_score: ArrayLike,
        protected: ArrayLike,
        control: ArrayLike,
        sample_weight: Optional[ArrayLike] = None,
        thresholds: Optional[list[float]] = None,
        strategy: LoopStrategy = "auto",
    ) -> pl.DataFrame:
        """Bootstrap AIR at thresholds. See
        [rapidstats.metrics.adverse_impact_ratio_at_thresholds][] for more details.

        Parameters
        ----------
        y_score : ArrayLike
            Predicted scores
        protected : ArrayLike
            An array of booleans identifying the protected class
        control : ArrayLike
            An array of booleans identifying the control class
        sample_weight : Optional[ArrayLike], optional
            Sample weights, set to 1 if None

            !!! Version
                Added 0.2.0
        thresholds : Optional[list[float]], optional
            The thresholds to compute `is_predicted_negative` at, i.e. y_score < t.
            If None, uses every score present in `y_score`, by default None
        strategy : LoopStrategy, optional
            Computation method, by default "auto"

        Returns
        -------
        pl.DataFrame
            A DataFrame of `threshold`, `lower`, `point`, and `upper`

        Raises
        ------
        NotImplementedError
            When `strategy` is `cum_sum` and `method` is `BCa`
        """
        has_sample_weight = sample_weight is not None
        df = pl.DataFrame(
            {"y_score": y_score, "protected": protected, "control": control}
        ).with_columns(
            pl.col("protected", "control").cast(pl.Boolean),
            pl.col("y_score").cast(pl.Float64),
        )

        if has_sample_weight:
            df = df.with_columns(
                pl.Series("sample_weight", sample_weight).cast(pl.Float64)
            )

        strategy = _set_loop_strategy(thresholds, strategy)

        if strategy == "loop":
            airs: list[dict[str, float]] = []
            for t in tqdm(set(thresholds or y_score)):
                lower, point, upper = self.adverse_impact_ratio(
                    df["y_score"].lt(t),
                    df["protected"],
                    df["control"],
                    sample_weight=sample_weight,
                )
                airs.append(
                    {"threshold": t, "lower": lower, "point": point, "upper": upper}
                )

            return pl.DataFrame(airs).fill_nan(None).pipe(_fill_infinite, None)

        elif strategy == "cum_sum":
            if thresholds is None:
                thresholds = df["y_score"]

            if self._params["poisson"]:
                _air_func = _air_at_thresholds_core_sorted
                _sample_func = functools.partial(_poisson_sample, df_height=df.height)
                df = df.lazy()
            else:
                _air_func = _air_at_thresholds_core
                _sample_func = _multinomial_sample

            def _air(i: int) -> pl.LazyFrame:
                sample_df = _sample_func(df, seed=i)

                return _air_func(sample_df, thresholds, has_sample_weight)

            airs: list[pl.LazyFrame] = _run_concurrent(
                _air,
                (
                    (self.seed + i for i in range(self.iterations))
                    if self.seed is not None
                    else (None for _ in range(self.iterations))
                ),
            )
            bootstrap_lf = (
                pl.concat(airs, how="vertical")
                .rename({"air": "value"})
                .with_columns(
                    _expr_fill_infinite(pl.col("value").fill_nan(None)).alias("value")
                )
            )

            lf = bootstrap_lf.group_by("threshold")

            final_cols = ["threshold", "lower", "point", "upper"]

            original = (
                _air_at_thresholds_core(df, thresholds, has_sample_weight)
                .rename({"air": "point"})
                .unique("threshold")
            )

            if self.method == "standard":
                return (
                    _standard_interval_polars(lf, self.alpha)
                    .join(original, on="threshold", how="left", validate="1:1")
                    .select(final_cols)
                    .collect()
                )
            elif self.method == "percentile":
                return (
                    _percentile_interval_polars(lf, self.alpha)
                    .join(original, on="threshold", how="left", validate="1:1")
                    .select(final_cols)
                    .collect()
                )
            elif self.method == "basic":
                return (
                    _percentile_interval_polars(lf, self.alpha)
                    .join(original, on="threshold", how="left", validate="1:1")
                    .pipe(_basic_interval_polars)
                    .select(final_cols)
                    .collect()
                )
            elif self.method == "BCa":
                raise NotImplementedError(
                    "Method `BCa` not implemented for strategy `cum_sum` due to https://github.com/pola-rs/polars/issues/20951"
                )
                # NOTE: the remainder of this branch is unreachable while the raise above is in place.
                original_lf = (
                    _air_at_thresholds_core(df, thresholds, has_sample_weight)
                    .rename({"air": "original_value"})
                    .unique("threshold")
                )

                tmp = functools.partial(
                    _air_at_thresholds_core,
                    thresholds=thresholds,
                    has_sample_weight=has_sample_weight,
                )
                jacknife_lf = (
                    pl.concat(_jacknife(df, tmp), how="vertical")
                    .rename({"air": "value"})
                    .unique("threshold")
                )

                return (
                    _bca_interval_polars(
                        original_lf,
                        bootstrap_lf=bootstrap_lf.rename({"air": "value"}),
                        jacknife_lf=jacknife_lf,
                        alpha=self.alpha,
                        by=["threshold"],
                    )
                    .select(final_cols)
                    .collect()
                )

    def mean_squared_error(
        self, y_true: ArrayLike, y_score: ArrayLike
    ) -> ConfidenceInterval:
        r"""Bootstrap MSE. See [rapidstats.metrics.mean_squared_error][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Added in version 0.1.0
        ----------------------
        """
        return _bootstrap_mean_squared_error(
            _regression_to_df(y_true, y_score), **self._params
        )

    def root_mean_squared_error(
        self, y_true: ArrayLike, y_score: ArrayLike
    ) -> ConfidenceInterval:
        r"""Bootstrap RMSE. See [rapidstats.metrics.root_mean_squared_error][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Added in version 0.1.0
        ----------------------
        """
        return _bootstrap_root_mean_squared_error(
            _regression_to_df(y_true, y_score), **self._params
        )

    def r2(self, y_true: ArrayLike, y_score: ArrayLike) -> ConfidenceInterval:
        """Bootstrap R2. See [rapidstats.metrics.r2][] for more details.

        Parameters
        ----------
        y_true : ArrayLike
            Ground truth target
        y_score : ArrayLike
            Predicted scores

        Returns
        -------
        ConfidenceInterval
            A tuple of (lower, point, upper)

        Added in version 0.1.0
        ----------------------
        """
        return _bootstrap_r2(_regression_to_df(y_true, y_score), **self._params)

`adverse_impact_ratio(y_pred, protected, control, sample_weight=None)`

Bootstrap AIR. See rapidstats.metrics.adverse_impact_ratio for more details.

Parameters:

Name	Type	Description	Default
`y_pred`	`ArrayLike`	Predicted target	required
`protected`	`ArrayLike`	An array of booleans identifying the protected class	required
`control`	`ArrayLike`	An array of booleans identifying the control class	required
`sample_weight`	`Optional[ArrayLike]`	Sample weights, set to 1 if None. Added in version 0.2.0.	`None`

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Changelog

Added in version 0.1.0
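
As a rough illustration of what this method estimates, the sketch below bootstraps an adverse impact ratio in pure Python with a percentile interval. It assumes AIR is the rate of positive predictions in the protected class divided by the rate in the control class. The helper names (`air`, `bootstrap_air`) and the nearest-rank percentile are illustrative choices, not rapidstats internals.

```python
import random


def air(y_pred, protected, control):
    """Assumed definition: positive rate of protected / positive rate of control."""
    prot = [p for p, m in zip(y_pred, protected) if m]
    ctrl = [p for p, m in zip(y_pred, control) if m]
    if not prot or not ctrl or sum(ctrl) == 0:
        return None  # ratio is undefined for this sample
    return (sum(prot) / len(prot)) / (sum(ctrl) / len(ctrl))


def bootstrap_air(y_pred, protected, control, iterations=200, confidence=0.95, seed=0):
    rows = list(zip(y_pred, protected, control))
    alpha = (1 - confidence) / 2
    stats = []
    for i in range(iterations):
        # One derived seed per iteration, mirroring the `self.seed + i`
        # pattern used by the thresholded methods.
        rng = random.Random(seed + i)
        sample = [rng.choice(rows) for _ in rows]  # resample 100% with replacement
        value = air(*zip(*sample))
        if value is not None:
            stats.append(value)
    stats.sort()
    lower = stats[int(alpha * (len(stats) - 1))]
    upper = stats[int((1 - alpha) * (len(stats) - 1))]
    return lower, air(y_pred, protected, control), upper


y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
protected = [True] * 8 + [False] * 8
control = [False] * 8 + [True] * 8
lower, point, upper = bootstrap_air(y_pred, protected, control, seed=42)
```

Here the protected rate is 4/8 and the control rate is 6/8, so the point estimate is 2/3. The bounds depend on the seed.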

Source code in python/rapidstats/_bootstrap.py

def adverse_impact_ratio(
    self,
    y_pred: ArrayLike,
    protected: ArrayLike,
    control: ArrayLike,
    sample_weight: Optional[ArrayLike] = None,
) -> ConfidenceInterval:
    """Bootstrap AIR. See [rapidstats.metrics.adverse_impact_ratio][] for more details.

    Parameters
    ----------
    y_pred : ArrayLike
        Predicted target
    protected : ArrayLike
        An array of booleans identifying the protected class
    control : ArrayLike
        An array of booleans identifying the control class
    sample_weight : Optional[ArrayLike], optional
        Sample weights, set to 1 if None

        !!! Version
            Added 0.2.0

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Changelog
    ----------------------
    - Added in version 0.1.0
    """
    df = (
        pl.DataFrame(
            {
                "y_pred": y_pred,
                "protected": protected,
                "control": control,
                "sample_weight": 1.0 if sample_weight is None else sample_weight,
            }
        )
        .with_columns(pl.col("y_pred", "protected", "control").cast(pl.Boolean))
        .with_columns(pl.col("y_pred").cast(pl.Float64))
    )

    return _bootstrap_adverse_impact_ratio(df, **self._params)

`adverse_impact_ratio_at_thresholds(y_score, protected, control, sample_weight=None, thresholds=None, strategy='auto')`

Bootstrap AIR at thresholds. See rapidstats.metrics.adverse_impact_ratio_at_thresholds for more details.

Parameters:

Name	Type	Description	Default
`y_score`	`ArrayLike`	Predicted scores	required
`protected`	`ArrayLike`	An array of booleans identifying the protected class	required
`control`	`ArrayLike`	An array of booleans identifying the control class	required
`sample_weight`	`Optional[ArrayLike]`	Sample weights, set to 1 if None. Added in version 0.2.0.	`None`
`thresholds`	`Optional[list[float]]`	The thresholds to compute `is_predicted_negative` at, i.e. y_score < t. If None, uses every score present in `y_score`, by default None	`None`
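`strategy`	`LoopStrategy`	Computation strategy. The source handles `"loop"` (one bootstrap per threshold) and `"cum_sum"` (all thresholds evaluated per resample); `"auto"` lets `_set_loop_strategy` choose based on `thresholds`.	`'auto'`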

Returns:

Type	Description
`DataFrame`	A DataFrame of `threshold`, `lower`, `point`, and `upper`

Raises:

Type	Description
`NotImplementedError`	When `strategy` is `cum_sum` and `method` is `BCa`
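
The thresholding rule `y_score < t` can be sketched without any bootstrapping. The example below computes only the point estimate per threshold, using an assumed AIR definition (positive rate of the protected class over that of the control class). The names `air` and `air_at_thresholds` are hypothetical helpers. In the `"loop"` strategy shown in this method's source, each boolean vector is instead passed to `adverse_impact_ratio` for a full bootstrap.

```python
def air(y_pred, protected, control):
    """Assumed definition: positive rate of protected / positive rate of control."""
    prot = [p for p, m in zip(y_pred, protected) if m]
    ctrl = [p for p, m in zip(y_pred, control) if m]
    if not prot or not ctrl or sum(ctrl) == 0:
        return None  # undefined ratio, reported as a missing value
    return (sum(prot) / len(prot)) / (sum(ctrl) / len(ctrl))


def air_at_thresholds(y_score, protected, control, thresholds=None):
    # Fall back to every distinct score when no thresholds are given.
    candidates = sorted(set(y_score if thresholds is None else thresholds))
    rows = []
    for t in candidates:
        y_pred = [s < t for s in y_score]  # predicted negative at threshold t
        rows.append({"threshold": t, "point": air(y_pred, protected, control)})
    return rows


y_score = [0.1, 0.4, 0.6, 0.9, 0.2, 0.3, 0.5, 0.8]
protected = [True] * 4 + [False] * 4
control = [False] * 4 + [True] * 4
rows = air_at_thresholds(y_score, protected, control, thresholds=[0.35, 0.5])
all_rows = air_at_thresholds(y_score, protected, control)
```

At the lowest score no row satisfies `y_score < t`, so the ratio is undefined there. The source fills such NaN or infinite values with nulls.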

Source code in python/rapidstats/_bootstrap.py

def adverse_impact_ratio_at_thresholds(
    self,
    y_score: ArrayLike,
    protected: ArrayLike,
    control: ArrayLike,
    sample_weight: Optional[ArrayLike] = None,
    thresholds: Optional[list[float]] = None,
    strategy: LoopStrategy = "auto",
) -> pl.DataFrame:
    """Bootstrap AIR at thresholds. See
    [rapidstats.metrics.adverse_impact_ratio_at_thresholds][] for more details.

    Parameters
    ----------
    y_score : ArrayLike
        Predicted scores
    protected : ArrayLike
        An array of booleans identifying the protected class
    control : ArrayLike
        An array of booleans identifying the control class
    sample_weight : Optional[ArrayLike], optional
        Sample weights, set to 1 if None

        !!! Version
            Added 0.2.0
    thresholds : Optional[list[float]], optional
        The thresholds to compute `is_predicted_negative` at, i.e. y_score < t.
        If None, uses every score present in `y_score`, by default None
    strategy : LoopStrategy, optional
        Computation method, by default "auto"

    Returns
    -------
    pl.DataFrame
        A DataFrame of `threshold`, `lower`, `point`, and `upper`

    Raises
    ------
    NotImplementedError
        When `strategy` is `cum_sum` and `method` is `BCa`
    """
    has_sample_weight = sample_weight is not None
    df = pl.DataFrame(
        {"y_score": y_score, "protected": protected, "control": control}
    ).with_columns(
        pl.col("protected", "control").cast(pl.Boolean),
        pl.col("y_score").cast(pl.Float64),
    )

    if has_sample_weight:
        df = df.with_columns(
            pl.Series("sample_weight", sample_weight).cast(pl.Float64)
        )

    strategy = _set_loop_strategy(thresholds, strategy)

    if strategy == "loop":
        airs: list[dict[str, float]] = []
        for t in tqdm(set(thresholds or y_score)):
            lower, point, upper = self.adverse_impact_ratio(
                df["y_score"].lt(t),
                df["protected"],
                df["control"],
                sample_weight=sample_weight,
            )
            airs.append(
                {"threshold": t, "lower": lower, "point": point, "upper": upper}
            )

        return pl.DataFrame(airs).fill_nan(None).pipe(_fill_infinite, None)

    elif strategy == "cum_sum":
        if thresholds is None:
            thresholds = df["y_score"]

        if self._params["poisson"]:
            _air_func = _air_at_thresholds_core_sorted
            _sample_func = functools.partial(_poisson_sample, df_height=df.height)
            df = df.lazy()
        else:
            _air_func = _air_at_thresholds_core
            _sample_func = _multinomial_sample

        def _air(i: int) -> pl.LazyFrame:
            sample_df = _sample_func(df, seed=i)

            return _air_func(sample_df, thresholds, has_sample_weight)

        airs: list[pl.LazyFrame] = _run_concurrent(
            _air,
            (
                (self.seed + i for i in range(self.iterations))
                if self.seed is not None
                else (None for _ in range(self.iterations))
            ),
        )
        bootstrap_lf = (
            pl.concat(airs, how="vertical")
            .rename({"air": "value"})
            .with_columns(
                _expr_fill_infinite(pl.col("value").fill_nan(None)).alias("value")
            )
        )

        lf = bootstrap_lf.group_by("threshold")

        final_cols = ["threshold", "lower", "point", "upper"]

        original = (
            _air_at_thresholds_core(df, thresholds, has_sample_weight)
            .rename({"air": "point"})
            .unique("threshold")
        )

        if self.method == "standard":
            return (
                _standard_interval_polars(lf, self.alpha)
                .join(original, on="threshold", how="left", validate="1:1")
                .select(final_cols)
                .collect()
            )
        elif self.method == "percentile":
            return (
                _percentile_interval_polars(lf, self.alpha)
                .join(original, on="threshold", how="left", validate="1:1")
                .select(final_cols)
                .collect()
            )
        elif self.method == "basic":
            return (
                _percentile_interval_polars(lf, self.alpha)
                .join(original, on="threshold", how="left", validate="1:1")
                .pipe(_basic_interval_polars)
                .select(final_cols)
                .collect()
            )
        elif self.method == "BCa":
            raise NotImplementedError(
                "Method `BCa` not implemented for strategy `cum_sum` due to https://github.com/pola-rs/polars/issues/20951"
            )
            # NOTE: the remainder of this branch is unreachable while the raise above is in place.
            original_lf = (
                _air_at_thresholds_core(df, thresholds, has_sample_weight)
                .rename({"air": "original_value"})
                .unique("threshold")
            )

            tmp = functools.partial(
                _air_at_thresholds_core,
                thresholds=thresholds,
                has_sample_weight=has_sample_weight,
            )
            jacknife_lf = (
                pl.concat(_jacknife(df, tmp), how="vertical")
                .rename({"air": "value"})
                .unique("threshold")
            )

            return (
                _bca_interval_polars(
                    original_lf,
                    bootstrap_lf=bootstrap_lf.rename({"air": "value"}),
                    jacknife_lf=jacknife_lf,
                    alpha=self.alpha,
                    by=["threshold"],
                )
                .select(final_cols)
                .collect()
            )

`average_precision(y_true, y_score, sample_weight=None)`

Bootstrap average precision. See rapidstats.metrics.average_precision for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required
`sample_weight`	`Optional[ArrayLike]`	Sample weights, set to 1 if None. Added in version 0.2.0.	`None`

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Changelog

Added in version 0.1.0
Returns point estimate instead of mean starting version 0.3.0
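
For orientation, the statistic being bootstrapped can be sketched as the usual step-wise sum over the precision-recall curve, \( \sum_n (R_n - R_{n-1}) P_n \), with thresholds taken in descending order. This is an assumption about the definition. How `_ap_from_pr_curve` treats ties or sample weights is not visible in this listing, and the function name below is illustrative.

```python
def average_precision(y_true, y_score):
    """Step-wise AP: sum of (recall increase) * precision over descending thresholds."""
    total_pos = sum(1 for y in y_true if y == 1)
    ap = 0.0
    prev_recall = 0.0
    for t in sorted(set(y_score), reverse=True):
        # Everything scored at or above t is predicted positive.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


ap_mixed = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ap_perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

In the first call the recall steps are 0.5 at precision 1 and 0.5 at precision 2/3, giving 5/6. A perfect ranking gives 1.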

Source code in python/rapidstats/_bootstrap.py

def average_precision(
    self,
    y_true: ArrayLike,
    y_score: ArrayLike,
    sample_weight: Optional[ArrayLike] = None,
) -> ConfidenceInterval:
    """Bootstrap average precision. See [rapidstats.metrics.average_precision][] for more
    details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores
    sample_weight : Optional[ArrayLike], optional
        Sample weights, set to 1 if None

        !!! Version
            Added 0.2.0

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Changelog
    ----------------------
    - Added in version 0.1.0
    - Returns point estimate instead of mean starting version 0.3.0
    """
    df = (
        _y_true_y_score_to_df(y_true, y_score, sample_weight)
        .rename({"y_score": "threshold"})
        .drop_nulls()
    )

    def _cm_inner(pf: PolarsFrame) -> pl.LazyFrame:
        return (
            pf.lazy()
            .pipe(_base_confusion_matrix_at_thresholds)
            .pipe(_full_confusion_matrix_from_base)
            .select("threshold", "precision", "tpr")
        )

    def _cm(i: int) -> pl.LazyFrame:
        sample_df = df.sample(fraction=1, with_replacement=True, seed=i)

        return _cm_inner(sample_df)

    cms: list[pl.LazyFrame] = _run_concurrent(
        _cm,
        (
            (self.seed + i for i in range(self.iterations))
            if self.seed is not None
            else (None for _ in range(self.iterations))
        ),
    )

    cms = [
        cm.with_columns(pl.lit(i).alias("iteration")) for i, cm in enumerate(cms)
    ]

    bootstrap_stats = (
        pl.concat(cms, how="vertical")
        .sort("threshold")
        .group_by("iteration", maintain_order=True)
        .agg(
            _ap_from_pr_curve(pl.col("precision"), pl.col("tpr")).alias(
                "average_precision"
            )
        )
        .collect()["average_precision"]
        .to_list()
    )

    original_stat = _ap(y_true, y_score)

    if self.method == "standard":
        return _standard_interval(original_stat, bootstrap_stats, self.alpha)
    elif self.method == "percentile":
        return _percentile_interval(original_stat, bootstrap_stats, self.alpha)
    elif self.method == "basic":
        return _basic_interval(original_stat, bootstrap_stats, self.alpha)
    elif self.method == "BCa":

        def _cm_jacknife(i):
            j_df = df.filter(pl.col("index").ne(i))

            return _cm_inner(j_df).with_columns(pl.lit(i).alias("iteration"))

        df = df.with_row_index("index")
        cms = _run_concurrent(_cm_jacknife, range(df.height))
        jacknife_stats = (
            pl.concat(cms, how="vertical")
            .sort("threshold")
            .group_by("iteration", maintain_order=True)
            .agg(
                _ap_from_pr_curve(pl.col("precision"), pl.col("tpr")).alias(
                    "average_precision"
                )
            )
            .collect()["average_precision"]
            .to_list()
        )

        return _bca_interval(
            original_stat, bootstrap_stats, jacknife_stats, self.alpha
        )

`brier_loss(y_true, y_score)`

Bootstrap Brier loss. See rapidstats.metrics.brier_loss for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)
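
A minimal sketch of the "standard" interval applied to the Brier loss, assuming the loss is the mean squared difference between score and label. The function names are hypothetical. The interval follows \( \hat{\theta} \pm z_{\alpha} \times \hat{\sigma} \), with \( \hat{\sigma} \) taken as the standard deviation of the bootstrap statistics.

```python
import random
from statistics import NormalDist, stdev


def brier_loss(y_true, y_score):
    """Mean squared difference between predicted score and 0/1 label."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def standard_interval(y_true, y_score, iterations=300, confidence=0.95, seed=0):
    rows = list(zip(y_true, y_score))
    rng = random.Random(seed)
    boot = []
    for _ in range(iterations):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        boot.append(brier_loss(*zip(*sample)))
    alpha = (1 - confidence) / 2
    z = NormalDist().inv_cdf(alpha)  # negative, since alpha < 0.5
    point = brier_loss(y_true, y_score)
    half_width = abs(z) * stdev(boot)  # stdev of bootstrap stats = standard error
    return point - half_width, point, point + half_width


y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4]
lower, point, upper = standard_interval(y_true, y_score)
```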

Source code in python/rapidstats/_bootstrap.py

def brier_loss(self, y_true: ArrayLike, y_score: ArrayLike) -> ConfidenceInterval:
    """Bootstrap Brier loss. See [rapidstats.metrics.brier_loss][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)
    """
    df = _y_true_y_score_to_df(y_true, y_score)

    return _bootstrap_brier_loss(df, **self._params)

`confusion_matrix(y_true, y_pred, beta=1.0, sample_weight=None)`

Bootstrap confusion matrix. See rapidstats.metrics.confusion_matrix for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_pred`	`ArrayLike`	Predicted target	required
`beta`	`float`	\( \beta \) to use in \( F_\beta \), by default 1	`1.0`
`sample_weight`	`Optional[ArrayLike]`	Sample weights, set to 1 if None Version Added 0.2.0	`None`

Returns:

Type	Description
`BootstrappedConfusionMatrix`	A dataclass of confusion matrix metrics as (lower, point, upper). See rapidstats._bootstrap.BootstrappedConfusionMatrix for more details.

Added in version 0.1.0
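
To make the roles of `beta` and `sample_weight` concrete, here is a small sketch of weighted confusion-matrix counts and \( F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R} \). The helper names are illustrative, and the full set of metrics on `BootstrappedConfusionMatrix` is not reproduced.

```python
def confusion_counts(y_true, y_pred, sample_weight=None):
    """Return weighted (tp, fp, fn, tn); weights default to 1 per row."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    tp = fp = fn = tn = 0.0
    for y, p, w in zip(y_true, y_pred, sample_weight):
        if y and p:
            tp += w
        elif not y and p:
            fp += w
        elif y and not p:
            fn += w
        else:
            tn += w
    return tp, fp, fn, tn


def f_beta(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f_beta(tp, fp, fn)            # precision 1, recall 2/3
f2 = f_beta(tp, fp, fn, beta=2.0)  # weights recall more heavily
weighted = confusion_counts(y_true, y_pred, sample_weight=[2, 1, 1, 1, 1, 1])
```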

Source code in python/rapidstats/_bootstrap.py

def confusion_matrix(
    self,
    y_true: ArrayLike,
    y_pred: ArrayLike,
    beta: float = 1.0,
    sample_weight: Optional[ArrayLike] = None,
) -> BootstrappedConfusionMatrix:
    r"""Bootstrap confusion matrix. See [rapidstats.metrics.confusion_matrix][] for
    more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_pred : ArrayLike
        Predicted target
    beta : float, optional
        \( \beta \) to use in \( F_\beta \), by default 1
    sample_weight : Optional[ArrayLike], optional
        Sample weights, set to 1 if None

        !!! Version
            Added 0.2.0

    Returns
    -------
    BootstrappedConfusionMatrix
        A dataclass of confusion matrix metrics as (lower, point, upper). See
        [rapidstats._bootstrap.BootstrappedConfusionMatrix][] for more details.

    Added in version 0.1.0
    ----------------------
    """
    df = _y_true_y_pred_to_df(y_true, y_pred, sample_weight).with_columns(
        pl.col("y_true").cast(pl.UInt8)
    )

    return BootstrappedConfusionMatrix(
        *_bootstrap_confusion_matrix(df, beta, **self._params)
    )

`confusion_matrix_at_thresholds(y_true, y_score, thresholds=None, metrics=DefaultConfusionMatrixMetrics, strategy='auto', beta=1.0, sample_weight=None)`

Bootstrap confusion matrix at thresholds. See rapidstats.metrics.confusion_matrix_at_thresholds for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required
`thresholds`	`Optional[list[float]]`	The thresholds to compute `y_pred` at, i.e. y_score >= t. If None, uses every score present in `y_score`, by default None	`None`
`metrics`	`Iterable[ConfusionMatrixMetric]`	The metrics to compute, by default DefaultConfusionMatrixMetrics	`DefaultConfusionMatrixMetrics`
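`strategy`	`LoopStrategy`	Computation strategy. The source handles `"loop"` (one bootstrap per threshold) and `"cum_sum"` (all thresholds evaluated per resample); `"auto"` lets `_set_loop_strategy` choose based on `thresholds`.	`'auto'`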
`beta`	`float`	\( \beta \) to use in \( F_\beta \), by default 1	`1.0`
`sample_weight`	`Optional[ArrayLike]`	Sample weights, set to 1 if None. Added in version 0.2.0.	`None`

Returns:

Type	Description
`DataFrame`	A DataFrame of `threshold`, `metric`, `lower`, `point`, and `upper`

Raises:

Type	Description
`NotImplementedError`	When `strategy` is `cum_sum` and `method` is `BCa`

Added in version 0.1.0
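
The two strategies can be contrasted with a toy sketch. It assumes, from the name only, that `"cum_sum"` obtains the counts for every threshold from one descending sort plus cumulative sums, while `"loop"` recomputes `y_score >= t` per threshold. The functions below are hypothetical and track only true and false positives.

```python
from itertools import accumulate


def counts_loop(y_true, y_score):
    """Recompute (tp, fp) from scratch for each distinct threshold."""
    out = {}
    for t in set(y_score):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and not y)
        out[t] = (tp, fp)
    return out


def counts_cum_sum(y_true, y_score):
    """One descending sort, then running totals give (tp, fp) at every score."""
    pairs = sorted(zip(y_score, y_true), reverse=True)
    cum_tp = accumulate(1 if y else 0 for _, y in pairs)
    cum_fp = accumulate(0 if y else 1 for _, y in pairs)
    out = {}
    for (s, _), tp, fp in zip(pairs, cum_tp, cum_fp):
        out[s] = (tp, fp)  # the last row of a tied score holds the full count
    return out


y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.8, 0.4, 0.4, 0.1]
by_loop = counts_loop(y_true, y_score)
by_cum_sum = counts_cum_sum(y_true, y_score)
```

Both approaches agree on this data, including the tied scores. The cumulative version does one pass per resample rather than one pass per threshold.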

Source code in python/rapidstats/_bootstrap.py

def confusion_matrix_at_thresholds(
    self,
    y_true: ArrayLike,
    y_score: ArrayLike,
    thresholds: Optional[list[float]] = None,
    metrics: Iterable[ConfusionMatrixMetric] = DefaultConfusionMatrixMetrics,
    strategy: LoopStrategy = "auto",
    beta: float = 1.0,
    sample_weight: Optional[ArrayLike] = None,
) -> pl.DataFrame:
    r"""Bootstrap confusion matrix at thresholds. See
    [rapidstats.metrics.confusion_matrix_at_thresholds][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores
    thresholds : Optional[list[float]], optional
        The thresholds to compute `y_pred` at, i.e. y_score >= t. If None,
        uses every score present in `y_score`, by default None
    metrics : Iterable[ConfusionMatrixMetric], optional
        The metrics to compute, by default DefaultConfusionMatrixMetrics
    strategy : LoopStrategy, optional
        Computation method, by default "auto"
    beta : float, optional
        \( \beta \) to use in \( F_\beta \), by default 1
    sample_weight : Optional[ArrayLike], optional
        Sample weights, set to 1 if None

        !!! Version
            Added 0.2.0

    Returns
    -------
    pl.DataFrame
        A DataFrame of `threshold`, `metric`, `lower`, `point`, and `upper`

    Raises
    ------
    NotImplementedError
        When `strategy` is `cum_sum` and `method` is `BCa`

    Added in version 0.1.0
    ----------------------
    """
    df = (
        _y_true_y_score_to_df(y_true, y_score, sample_weight)
        .rename({"y_score": "threshold"})
        .sort("threshold", descending=True)
    )
    final_cols = ["threshold", "metric", "lower", "point", "upper"]

    strategy = _set_loop_strategy(thresholds, strategy)

    if strategy == "loop":
        cms: list[pl.DataFrame] = []
        for t in tqdm(set(thresholds or y_score)):
            cm = (
                self.confusion_matrix(
                    df["y_true"],
                    df["threshold"].ge(t),
                    beta=beta,
                    sample_weight=df["sample_weight"],
                )
                .to_polars()
                .with_columns(pl.lit(t).alias("threshold"))
            )
            cms.append(cm)

        return pl.concat(cms, how="vertical").with_columns(
            pl.col("lower", "point", "upper").fill_nan(None)
        )
    elif strategy == "cum_sum":
        if thresholds is None:
            thresholds = df["threshold"].unique()

        if self._params["poisson"]:
            _matrix_func = _base_confusion_matrix_at_thresholds_sorted
            _sample_func = functools.partial(_poisson_sample, df_height=df.height)
            df = df.lazy()
        else:
            _matrix_func = _base_confusion_matrix_at_thresholds
            _sample_func = _multinomial_sample

        def _cm_inner(pf: PolarsFrame) -> pl.LazyFrame:
            return (
                pf.lazy()
                .pipe(_matrix_func)
                .pipe(_full_confusion_matrix_from_base, beta=beta)
                .unique("threshold")
                .pipe(_map_to_thresholds, thresholds)
                .drop("_threshold_actual")
            )

        def _cm(i: int) -> pl.LazyFrame:
            sample_df = _sample_func(df, seed=i)

            return _cm_inner(sample_df)

        cms: list[pl.LazyFrame] = _run_concurrent(
            _cm,
            (
                (self.seed + i for i in range(self.iterations))
                if self.seed is not None
                else (None for _ in range(self.iterations))
            ),
        )

        def _process_results(lf: pl.LazyFrame) -> pl.LazyFrame:
            return (
                lf.select("threshold", *metrics)
                .unpivot(index="threshold")
                .rename({"variable": "metric"})
            )

        bootstrap_lf = pl.concat(cms, how="vertical").pipe(_process_results)

        lf = bootstrap_lf.group_by("threshold", "metric")

        original = (
            _cm_inner(df)
            .select("threshold", *metrics)
            .pipe(_map_to_thresholds, thresholds)
            .unpivot(index="threshold")
            .rename({"variable": "metric", "value": "point"})
        )

        if self.method == "standard":
            return (
                _standard_interval_polars(lf, self.alpha)
                .join(
                    original,
                    on=["threshold", "metric"],
                    how="left",
                    validate="1:1",
                )
                .select(final_cols)
                .collect()
            )
        elif self.method == "percentile":
            return (
                _percentile_interval_polars(lf, self.alpha)
                .join(
                    original,
                    on=["threshold", "metric"],
                    how="left",
                    validate="1:1",
                )
                .select(final_cols)
                .collect()
            )
        elif self.method == "basic":
            return (
                _percentile_interval_polars(lf, self.alpha)
                .join(
                    original,
                    on=["threshold", "metric"],
                    how="left",
                    validate="1:1",
                )
                .pipe(_basic_interval_polars)
                .select(final_cols)
                .collect()
            )
        elif self.method == "BCa":
            raise NotImplementedError(
                "Method `BCa` not implemented for strategy `cum_sum` due to https://github.com/pola-rs/polars/issues/20951"
            )
            # NOTE: the remainder of this branch is unreachable while the raise above is in place.
            original_lf = (
                _cm_inner(df)
                .select("threshold", *metrics)
                .pipe(_map_to_thresholds, thresholds)
                .unpivot(index="threshold")
                .rename({"variable": "metric", "value": "original_value"})
            )
            jacknife_lf = pl.concat(_jacknife(df, _cm_inner), how="vertical").pipe(
                _process_results
            )

            return (
                _bca_interval_polars(
                    original_lf,
                    bootstrap_lf=bootstrap_lf,
                    jacknife_lf=jacknife_lf,
                    alpha=self.alpha,
                    by=["threshold", "metric"],
                )
                .select(final_cols)
                .collect()
            )
        else:
            raise ValueError()

`max_ks(y_true, y_score)`

Bootstrap Max-KS. See rapidstats.metrics.max_ks for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Changelog

Added in version 0.1.0
Returns point estimate instead of mean starting version 0.3.0
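
As a point of reference, Max-KS is commonly taken to be the largest gap between the empirical score distributions of the positive and negative classes. The sketch below assumes that definition. The rapidstats implementation may differ in detail, and the function name is illustrative.

```python
def max_ks(y_true, y_score):
    """Largest absolute gap between the class-wise empirical CDFs of the scores."""
    n_pos = sum(1 for y in y_true if y)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in set(y_score):
        cdf_pos = sum(1 for y, s in zip(y_true, y_score) if y and s <= t) / n_pos
        cdf_neg = sum(1 for y, s in zip(y_true, y_score) if not y and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


overlapping = max_ks([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.6, 0.5, 0.8, 0.9])
separated = max_ks([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
```

Perfectly separated classes give 1. The overlapping example peaks at 2/3.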

Source code in python/rapidstats/_bootstrap.py

def max_ks(self, y_true: ArrayLike, y_score: ArrayLike) -> ConfidenceInterval:
    """Bootstrap Max-KS. See [rapidstats.metrics.max_ks][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Changelog
    ----------------------
    - Added in version 0.1.0
    - Returns point estimate instead of mean starting version 0.3.0
    """
    df = _y_true_y_score_to_df(y_true, y_score)

    return _bootstrap_max_ks(df, **self._params)

`mean(y)`

Bootstrap mean.

Parameters:

Name	Type	Description	Default
`y`	`ArrayLike`	A 1D-array	required

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Added in version 0.1.0
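
The mean is a convenient statistic for showing the "basic" (reverse percentile) interval, \( [2\hat{\theta} - PCI_u, 2\hat{\theta} - PCI_l] \). The sketch below is plain Python with hypothetical helper names and a simple nearest-rank percentile. It is not the library's implementation.

```python
import random


def percentile(sorted_values, q):
    """Nearest-rank style percentile on an already sorted list."""
    return sorted_values[int(q * (len(sorted_values) - 1))]


def basic_interval_mean(y, iterations=500, confidence=0.95, seed=0):
    rng = random.Random(seed)
    n = len(y)
    # Bootstrap distribution of the mean: resample n values with replacement.
    boot = sorted(sum(rng.choices(y, k=n)) / n for _ in range(iterations))
    alpha = (1 - confidence) / 2
    pci_l = percentile(boot, alpha)
    pci_u = percentile(boot, 1 - alpha)
    point = sum(y) / n
    # Reflect the percentile bounds around the point estimate.
    return 2 * point - pci_u, point, 2 * point - pci_l


lower, point, upper = basic_interval_mean([1.0, 2.0, 3.0, 4.0, 5.0])
constant = basic_interval_mean([2.0] * 10)
```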

Source code in python/rapidstats/_bootstrap.py

def mean(self, y: ArrayLike) -> ConfidenceInterval:
    """Bootstrap mean.

    Parameters
    ----------
    y : ArrayLike
        A 1D-array

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Added in version 0.1.0
    ----------------------
    """
    df = pl.DataFrame({"y": y})

    return _bootstrap_mean(df, **self._params)

`mean_squared_error(y_true, y_score)`

Bootstrap MSE. See rapidstats.metrics.mean_squared_error for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Added in version 0.1.0

Source code in python/rapidstats/_bootstrap.py

def mean_squared_error(
    self, y_true: ArrayLike, y_score: ArrayLike
) -> ConfidenceInterval:
    r"""Bootstrap MSE. See [rapidstats.metrics.mean_squared_error][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Added in version 0.1.0
    ----------------------
    """
    return _bootstrap_mean_squared_error(
        _regression_to_df(y_true, y_score), **self._params
    )

`r2(y_true, y_score)`

Bootstrap R2. See rapidstats.metrics.r2 for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Added in version 0.1.0

Source code in python/rapidstats/_bootstrap.py

def r2(self, y_true: ArrayLike, y_score: ArrayLike) -> ConfidenceInterval:
    """Bootstrap R2. See [rapidstats.metrics.r2][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Added in version 0.1.0
    ----------------------
    """
    return _bootstrap_r2(_regression_to_df(y_true, y_score), **self._params)

`roc_auc(y_true, y_score, sample_weight=None)`

Bootstrap ROC-AUC. See rapidstats.metrics.roc_auc for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required
`sample_weight`	`Optional[ArrayLike]`	Sample weights; every sample gets weight 1 if `None`. Added in version 0.2.0.	`None`

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Changelog

Added in version 0.1.0
Returns point estimate instead of mean starting version 0.3.0
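
The source for this method branches on a `poisson` parameter. As background, a Poisson bootstrap typically replaces "draw n rows with replacement" by giving each row an independent Poisson(1) repeat count. The sketch below illustrates that general technique in plain Python. `poisson_draw` and `poisson_resample` are hypothetical helpers, and nothing here is taken from the library's internals.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def poisson_draw(rng: random.Random, lam: float = 1.0) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def poisson_resample(rows: Sequence[T], rng: random.Random) -> List[T]:
    """Repeat each row an independent Poisson(1) number of times, in order."""
    resampled: List[T] = []
    for row in rows:
        resampled.extend([row] * poisson_draw(rng))
    return resampled
```

One property this makes visible is that rows keep their original relative order, so data sorted once before resampling stays sorted in every resample. That may be why the Poisson branch sorts by `y_score` up front, though this is an inference rather than something the source states. Note also that the resample size varies around n instead of being exactly n.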

Source code in python/rapidstats/_bootstrap.py

def roc_auc(
    self,
    y_true: ArrayLike,
    y_score: ArrayLike,
    sample_weight: Optional[ArrayLike] = None,
) -> ConfidenceInterval:
    """Bootstrap ROC-AUC. See [rapidstats.metrics.roc_auc][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores
    sample_weight: Optional[ArrayLike], optional
        Sample weights, set to 1 if None

        !!! Version
            Added 0.2.0

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Changelog
    ---------
    - Added in version 0.1.0
    - Returns point estimate instead of mean starting version 0.3.0
    """
    df = _y_true_y_score_to_df(y_true, y_score, sample_weight).with_columns(
        pl.col("y_true").cast(pl.Float64)
    )

    if self._params["poisson"]:
        df = df.sort("y_score")
        _f = _bootstrap_roc_auc_sorted
    else:
        _f = _bootstrap_roc_auc

    return _f(df, **self._params)

`root_mean_squared_error(y_true, y_score)`

Bootstrap RMSE. See rapidstats.metrics.root_mean_squared_error for more details.

Parameters:

Name	Type	Description	Default
`y_true`	`ArrayLike`	Ground truth target	required
`y_score`	`ArrayLike`	Predicted scores	required

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Added in version 0.1.0

Source code in python/rapidstats/_bootstrap.py

def root_mean_squared_error(
    self, y_true: ArrayLike, y_score: ArrayLike
) -> ConfidenceInterval:
    r"""Bootstrap RMSE. See [rapidstats.metrics.root_mean_squared_error][] for more details.

    Parameters
    ----------
    y_true : ArrayLike
        Ground truth target
    y_score : ArrayLike
        Predicted scores

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Added in version 0.1.0
    ----------------------
    """
    return _bootstrap_root_mean_squared_error(
        _regression_to_df(y_true, y_score), **self._params
    )

`run(df, stat_func, **kwargs)`

Run bootstrap for an arbitrary function that accepts a Polars DataFrame and returns a scalar real number.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	The data to pass to `stat_func`	required
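`stat_func`	`StatFunc`	A callable that takes a Polars DataFrame as its first argument and returns a scalar real number.	required
`**kwargs`		Additional keyword arguments forwarded to the internal `_run_concurrent` helper. `executor` defaults to `"threads"` and `preserve_order` to `False` when not supplied.	`{}`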

Returns:

Type	Description
`ConfidenceInterval`	A tuple of (lower, point, upper)

Added in version 0.1.0
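
The flow of `run` with the percentile method can be sketched in plain Python. This is a simplified, single-threaded stand-in that works on a list of floats rather than a Polars DataFrame. `bootstrap_percentile_sketch` and `_quantile` are hypothetical names, and the linear-interpolation quantile is an assumption rather than the library's rule. It does mirror three details of the method's source: per-iteration seeds of `seed + i`, dropping NaN bootstrap statistics, and returning (lower, point, upper) with the point estimate computed on the original data.

```python
import math
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple


def _quantile(sorted_values: List[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_percentile_sketch(
    data: Sequence[float],
    stat_func: Callable[[Sequence[float]], float],
    iterations: int = 500,
    confidence: float = 0.95,
    seed: Optional[int] = None,
) -> Tuple[float, float, float]:
    alpha = (1 - confidence) / 2
    n = len(data)
    bootstrap_stats = []
    for i in range(iterations):
        # Each iteration gets its own seed (seed + i), or fresh entropy if seed is None.
        rng = random.Random(None if seed is None else seed + i)
        resample = [data[rng.randrange(n)] for _ in range(n)]
        value = stat_func(resample)
        if not math.isnan(value):  # NaN results are discarded
            bootstrap_stats.append(value)

    original_stat = stat_func(data)
    if not bootstrap_stats:
        return (math.nan, math.nan, math.nan)

    bootstrap_stats.sort()
    lower = _quantile(bootstrap_stats, alpha)
    upper = _quantile(bootstrap_stats, 1 - alpha)
    return (lower, original_stat, upper)
```

Passing `statistics.fmean` as `stat_func` gives a percentile interval for the mean. A `stat_func` that always returns NaN yields a tuple of three NaNs, matching the early return in the source.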

Source code in python/rapidstats/_bootstrap.py

def run(
    self, df: pl.DataFrame, stat_func: StatFunc, **kwargs
) -> ConfidenceInterval:
    """Run bootstrap for an arbitrary function that accepts a Polars DataFrame and
    returns a scalar real number.

    Parameters
    ----------
    df : pl.DataFrame
        The data to pass to `stat_func`
    stat_func : StatFunc
        A callable that takes a Polars DataFrame as its first argument and returns
        a scalar real number.

    Returns
    -------
    ConfidenceInterval
        A tuple of (lower, point, upper)

    Added in version 0.1.0
    ----------------------
    """
    default = {"executor": "threads", "preserve_order": False}
    for k, v in default.items():
        if k not in kwargs:
            kwargs[k] = v

    if self._params["poisson"]:
        func = functools.partial(
            _poisson_bs_func, df=df, df_height=df.height, stat_func=stat_func
        )
    else:
        func = functools.partial(_bs_func, df=df, stat_func=stat_func)

    if self.seed is None:
        iterable = (None for _ in range(self.iterations))
    else:
        iterable = (self.seed + i for i in range(self.iterations))

    bootstrap_stats = [
        x for x in _run_concurrent(func, iterable, **kwargs) if not math.isnan(x)
    ]

    original_stat = stat_func(df)

    if len(bootstrap_stats) == 0:
        return (math.nan, math.nan, math.nan)

    if self.method == "standard":
        return _standard_interval(original_stat, bootstrap_stats, self.alpha)
    elif self.method == "percentile":
        return _percentile_interval(original_stat, bootstrap_stats, self.alpha)
    elif self.method == "basic":
        return _basic_interval(original_stat, bootstrap_stats, self.alpha)
    elif self.method == "BCa":
        jacknife_stats = [x for x in _jacknife(df, stat_func) if not math.isnan(x)]

        return _bca_interval(
            original_stat, bootstrap_stats, jacknife_stats, self.alpha
        )
    else:
        # We shouldn't hit this since we check method in __init__, but it makes the
        # type-checker happy
        raise ValueError("Invalid method")
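
For the BCa branch, `_jacknife(df, stat_func)` supplies leave-one-out statistics. The sketch below shows that idea on a list of floats instead of a DataFrame, together with the textbook acceleration constant \( a = \frac{\sum_i (\bar{\theta} - \theta_{(i)})^3}{6 \left[ \sum_i (\bar{\theta} - \theta_{(i)})^2 \right]^{3/2}} \), where \( \theta_{(i)} \) is the statistic with observation \( i \) removed and \( \bar{\theta} \) is their mean. `jackknife_stats` and `acceleration` are hypothetical names, not the library's helpers, and returning 0 for a zero denominator is an assumption.

```python
import statistics
from typing import Callable, List, Sequence


def jackknife_stats(
    data: Sequence[float], stat_func: Callable[[Sequence[float]], float]
) -> List[float]:
    """Compute the statistic n times, each time leaving one observation out."""
    return [stat_func(list(data[:i]) + list(data[i + 1:])) for i in range(len(data))]


def acceleration(jack: Sequence[float]) -> float:
    """Textbook BCa acceleration constant from leave-one-out statistics."""
    mean_jack = statistics.fmean(jack)
    diffs = [mean_jack - value for value in jack]
    numerator = sum(d ** 3 for d in diffs)
    denominator = 6 * sum(d ** 2 for d in diffs) ** 1.5
    # Assumption: treat the degenerate case (all values equal) as no skew.
    return numerator / denominator if denominator else 0.0
```

For symmetric data such as `[1, 2, 3, 4, 5]` with the mean as the statistic, the acceleration is 0. A right-skewed sample such as `[1, 1, 1, 1, 10]` gives a positive value.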

`BootstrappedConfusionMatrix` `dataclass`

Result object returned by rapidstats.Bootstrap().confusion_matrix.

See rapidstats.metrics.ConfusionMatrix for a detailed breakdown of the attributes stored in this class. Instead of a single statistic, each attribute holds the bootstrapped confidence interval as a tuple of (lower, point, upper).

Methods:

Name	Description
`to_polars`	Transform the dataclass to a long Polars DataFrame with columns `metric`, `lower`, `point`, and `upper`.

Source code in python/rapidstats/_bootstrap.py

@dataclasses.dataclass
class BootstrappedConfusionMatrix:
    """Result object returned by `rapidstats.Bootstrap().confusion_matrix`.

    See [rapidstats.metrics.ConfusionMatrix][] for a detailed breakdown of the attributes stored in
    this class. However, instead of storing the statistic, it stores the bootstrapped
    confidence interval as (lower, point, upper).
    """

    tn: ConfidenceInterval
    fp: ConfidenceInterval
    fn: ConfidenceInterval
    tp: ConfidenceInterval
    tpr: ConfidenceInterval
    fpr: ConfidenceInterval
    fnr: ConfidenceInterval
    tnr: ConfidenceInterval
    prevalence: ConfidenceInterval
    prevalence_threshold: ConfidenceInterval
    informedness: ConfidenceInterval
    precision: ConfidenceInterval
    false_omission_rate: ConfidenceInterval
    plr: ConfidenceInterval
    nlr: ConfidenceInterval
    acc: ConfidenceInterval
    balanced_accuracy: ConfidenceInterval
    fbeta: ConfidenceInterval
    folkes_mallows_index: ConfidenceInterval
    mcc: ConfidenceInterval
    threat_score: ConfidenceInterval
    markedness: ConfidenceInterval
    fdr: ConfidenceInterval
    npv: ConfidenceInterval
    dor: ConfidenceInterval
    ppr: ConfidenceInterval
    pnr: ConfidenceInterval

    def to_polars(self) -> pl.DataFrame:
        """Transform the dataclass to a long Polars DataFrame with columns
        `metric`, `lower`, `point`, and `upper`.

        Returns
        -------
        pl.DataFrame
            A DataFrame with columns `metric`, `lower`, `point`, and `upper`

        Changelog
        ---------
        Return point estimate instead of mean starting version 0.3.0
        """
        dct = self.__dict__
        lower = []
        point = []
        upper = []
        for l, p, u in dct.values():  # noqa: E741
            lower.append(l)
            point.append(p)
            upper.append(u)

        return pl.DataFrame(
            {
                "metric": dct.keys(),
                "lower": lower,
                "point": point,
                "upper": upper,
            }
        )

`to_polars()`

Transform the dataclass to a long Polars DataFrame with columns metric, lower, point, and upper.

Returns:

Type	Description
`DataFrame`	A DataFrame with columns `metric`, `lower`, `point`, and `upper`

Changelog

Return point estimate instead of mean starting version 0.3.0
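
The reshaping can be illustrated without Polars. The sketch below uses a hypothetical two-field dataclass, `TinyResult`, standing in for `BootstrappedConfusionMatrix`, and a hypothetical helper, `to_long_columns`, that builds the same four columns as a plain dict. Such a dict could then be passed to `pl.DataFrame`.

```python
import dataclasses
from typing import Dict, Tuple

ConfidenceInterval = Tuple[float, float, float]


@dataclasses.dataclass
class TinyResult:
    """Hypothetical stand-in with two of the many interval fields."""

    tpr: ConfidenceInterval
    fpr: ConfidenceInterval


def to_long_columns(result) -> Dict[str, list]:
    """Unpack every (lower, point, upper) field into one row per metric."""
    columns: Dict[str, list] = {"metric": [], "lower": [], "point": [], "upper": []}
    for field in dataclasses.fields(result):
        lower, point, upper = getattr(result, field.name)
        columns["metric"].append(field.name)
        columns["lower"].append(lower)
        columns["point"].append(point)
        columns["upper"].append(upper)
    return columns
```

Each dataclass field becomes one row, so the full class would produce one row per metric in declaration order.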

Source code in python/rapidstats/_bootstrap.py

def to_polars(self) -> pl.DataFrame:
    """Transform the dataclass to a long Polars DataFrame with columns
    `metric`, `lower`, `point`, and `upper`.

    Returns
    -------
    pl.DataFrame
        A DataFrame with columns `metric`, `lower`, `point`, and `upper`

    Changelog
    ---------
    Return point estimate instead of mean starting version 0.3.0
    """
    dct = self.__dict__
    lower = []
    point = []
    upper = []
    for l, p, u in dct.values():  # noqa: E741
        lower.append(l)
        point.append(p)
        upper.append(u)

    return pl.DataFrame(
        {
            "metric": dct.keys(),
            "lower": lower,
            "point": point,
            "upper": upper,
        }
    )
