Polars

Functions:

Name	Description
`auc`	Computes the area under the curve (AUC) via numerical integration.
`is_close`	Checks element-wise whether the inputs are equal within a tolerance.

`auc(x, y, method='trapezoidal')`

Computes the area under the curve (AUC) via numerical integration.

Parameters:

Name	Type	Description	Default
`x`	`Expr | str`	The x-axis	required
`y`	`Expr | str`	The y-axis	required
`method`	`Literal['rectangular', 'trapezoidal']`	Integration rule to use: "rectangular" or "trapezoidal".	`'trapezoidal'`

Returns:

Type	Description
`Expr`	Expression that evaluates to a single `f64` value, the AUC

Raises:

Type	Description
`ValueError`	If `method` is not one of `rectangular` or `trapezoidal`

Examples:

import polars as pl
import rapidstats.polars as rps

df = pl.DataFrame({"x": [1, 2, 3], "y": [5, 6, 7]})
df.select(rps.auc("x", "y"))

output

shape: (1, 1)
┌──────┐
│ x    │
│ ---  │
│ f64  │
╞══════╡
│ 12.0 │
└──────┘

Added in version 0.2.0
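
For intuition, the two integration rules can be sketched in plain Python. This is only an illustrative reference, not the library's implementation (the actual work happens in the compiled `pl_auc` plugin, which is not shown here). The name `auc_reference` is hypothetical, and using the left endpoint of each interval for the rectangular rule is an assumption.

```python
from typing import Sequence


def auc_reference(
    x: Sequence[float],
    y: Sequence[float],
    method: str = "trapezoidal",
) -> float:
    """Hypothetical pure-Python reference for numerical integration."""
    if method not in ("rectangular", "trapezoidal"):
        raise ValueError("`method` must be one of `rectangular` or `trapezoidal`")
    if len(x) != len(y):
        raise ValueError("`x` and `y` must have the same length")

    total = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        if method == "trapezoidal":
            # Area of the trapezoid spanned by two neighbouring points
            total += width * (y[i - 1] + y[i]) / 2.0
        else:
            # Assumption: the rectangle height is taken at the left endpoint
            total += width * y[i - 1]
    return total


print(auc_reference([1, 2, 3], [5, 6, 7]))
```

With the data from the example above, the trapezoidal rule sums (5 + 6) / 2 and (6 + 7) / 2, which gives 12.0 and agrees with the documented output.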

Source code in python/rapidstats/_polars/_numeric.py

def auc(
    x: IntoExprColumn,
    y: IntoExprColumn,
    method: Literal["rectangular", "trapezoidal"] = "trapezoidal",
) -> pl.Expr:
    """Computes the area under the curve (AUC) via numerical integration.

    Parameters
    ----------
    x : pl.Expr | str
        The x-axis
    y : pl.Expr | str
        The y-axis
    method : Literal["rectangular", "trapezoidal"], optional
        If "rectangular", use rectangular integration, if "trapezoidal", use
        trapezoidal integration, by default "trapezoidal"

    Returns
    -------
    pl.Expr

    Raises
    ------
    ValueError
        If `method` is not one of `rectangular` or `trapezoidal`

    Examples
    --------
    ``` py
    import polars as pl
    import rapidstats.polars as rps

    df = pl.DataFrame({"x": [1, 2, 3], "y": [5, 6, 7]})
    df.select(rps.auc("x", "y"))
    ```
    ``` title="output"
    shape: (1, 1)
    ┌──────┐
    │ x    │
    │ ---  │
    │ f64  │
    ╞══════╡
    │ 12.0 │
    └──────┘
    ```

    Added in version 0.2.0
    ----------------------
    """
    if method == "trapezoidal":
        is_trapezoidal = True
    elif method == "rectangular":
        is_trapezoidal = False
    else:
        raise ValueError("`method` must be one of `rectangular` or `trapezoidal`")

    return pl.plugins.register_plugin_function(
        plugin_path=_PLUGIN_PATH,
        function_name="pl_auc",
        args=[
            _str_to_expr(x).cast(pl.Float64),
            _str_to_expr(y).cast(pl.Float64),
            pl.lit(is_trapezoidal),
        ],
        returns_scalar=True,
    )

`is_close(x, y, rtol=1e-05, atol=1e-08, null_equal=False)`

Checks element-wise whether the inputs are equal within a tolerance.

Parameters:

Name	Type	Description	Default
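`x`	`Expr | str | float`	First input: an expression, a column name, or a numeric literal	required
`y`	`Expr | str | float`	Second input: an expression, a column name, or a numeric literal; the tolerance is scaled by its absolute value	required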
`rtol`	`float`	Relative tolerance, by default 1e-05	`1e-05`
`atol`	`float`	Absolute tolerance, by default 1e-08	`1e-08`
`null_equal`	`bool`	If True, considers nulls to be equal, by default False	`False`

Returns:

Type	Description
`Expr`	Boolean expression with one result per row

Examples:

import polars as pl
import rapidstats.polars as rps

df = pl.DataFrame({"x": [1.0, 1.1], "y": [.999999999, 5]})
df.select(rps.is_close("x", "y"))

output

shape: (2, 1)
┌───────┐
│ x     │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
│ false │
└───────┘

Added in version 0.2.0

Source code in python/rapidstats/_polars/_numeric.py

def is_close(
    x: IntoExprColumn | NumericLiteral,
    y: IntoExprColumn | NumericLiteral,
    rtol: float = 1e-05,
    atol: float = 1e-08,
    null_equal: bool = False,
) -> pl.Expr:
    """Compares the relative equality of the inputs.

    Parameters
    ----------
    x : pl.Expr | str | float
    y : pl.Expr | str | float
    rtol : float, optional
        Relative tolerance, by default 1e-05
    atol : float, optional
        Absolute tolerance, by default 1e-08
    null_equal : bool, optional
        If True, considers nulls to be equal, by default False

    Returns
    -------
    pl.Expr

    Examples
    --------
    ``` py
    import polars as pl
    import rapidstats.polars as rps

    df = pl.DataFrame({"x": [1.0, 1.1], "y": [.999999999, 5]})
    df.select(rps.is_close("x", "y"))
    ```
    ``` title="output"
    shape: (2, 1)
    ┌───────┐
    │ x     │
    │ ---   │
    │ bool  │
    ╞═══════╡
    │ true  │
    │ false │
    └───────┘
    ```

    Added in version 0.2.0
    ----------------------
    """
    x = _numeric_to_expr(x)
    y = _numeric_to_expr(y)

    res = x.sub(y).abs().le(pl.lit(atol).add(rtol).mul(y.abs()))

    if null_equal:
        res = res.or_(x.is_null().and_(y.is_null()))

    return res
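
To make the comparison rule concrete, the listing above can be mirrored for scalars in plain Python. Read literally, the chained calls `pl.lit(atol).add(rtol).mul(y.abs())` evaluate to `(atol + rtol) * |y|`, whereas `numpy.isclose` uses `atol + rtol * |y|`. The sketch below shows both for comparison. The function names are hypothetical, `None` stands in for a null, and the null handling assumes Polars' usual null propagation; this is an observation about the listing, not a statement of intended behavior.

```python
from typing import Optional


def is_close_as_written(
    x: Optional[float],
    y: Optional[float],
    rtol: float = 1e-05,
    atol: float = 1e-08,
    null_equal: bool = False,
) -> Optional[bool]:
    """Scalar mirror of the chained expression in the listing (hypothetical)."""
    if x is None or y is None:
        # Assumption: a comparison involving a null stays null unless
        # `null_equal` is set and both sides are null.
        if null_equal and x is None and y is None:
            return True
        return None
    # .add(rtol) is applied before .mul(y.abs()), so the bound is (atol + rtol) * |y|
    return abs(x - y) <= (atol + rtol) * abs(y)


def is_close_numpy_style(
    x: float, y: float, rtol: float = 1e-05, atol: float = 1e-08
) -> bool:
    """The bound used by numpy.isclose, shown for comparison."""
    return abs(x - y) <= atol + rtol * abs(y)


# The two bounds agree on the documented example rows ...
print(is_close_as_written(1.0, 0.999999999), is_close_as_written(1.1, 5))
# ... but differ when y is zero, where the first bound collapses to 0
print(is_close_as_written(1e-9, 0.0), is_close_numpy_style(1e-9, 0.0))
```

The practical difference shows up near zero: with `y == 0` the first bound is exactly 0, so only an exact match passes, while the NumPy-style bound still allows a difference of up to `atol`.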

Functions:

Name	Description
`format`	Format expressions as a string using Python f-string syntax.

`format(f_string, *args)`

Format expressions as a string using Python f-string syntax.

Parameters:

Name	Type	Description	Default
`f_string`	`str`	A string with placeholders, mimicking Python f-string syntax. For example, "{:.3f}". Currently, the only supported types are "f" and "%". Width, alignment, and fill are not supported.	required
`args`	`Union[Expr, str, float]`	Expression(s) that fill the placeholders. Note that strings are treated as literal values, NOT parsed as column names.	`()`

Returns:

Type	Description
`Expr`	String expression containing the formatted text

Raises:

Type	Description
`ValueError`	If the number of placeholders does not match the number of expressions

Examples:

import polars as pl
import rapidstats.polars as rps

df = pl.DataFrame({"x": 1123.09873, "y": "foo"})
df.select(
    rps.format(
        "{:,.3f} is {} is {}", pl.col("x"), pl.col("y"), "bar"
    )
)

output

shape: (1, 1)
┌─────────────────────────┐
│ x                       │
│ ---                     │
│ str                     │
╞═════════════════════════╡
│ 1,123.099 is foo is bar │
└─────────────────────────┘

Added in version 0.2.0

Source code in python/rapidstats/_polars/_format.py

def format(f_string: str, *args: Union[pl.Expr, str, float]) -> pl.Expr:
    """Format expressions as a string using Python f-string syntax.

    Parameters
    ----------
    f_string : str
        A string with placeholders, mimicking Python f-string syntax. For example,
        "{:.3f}". Currently, the only supported types are "f" and "%". Width, alignment,
        and fill are not supported.
    args
        Expression(s) that fill the placeholders. Note that strings are NOT parsed as
        columns.

    Returns
    -------
    pl.Expr

    Raises
    ------
    ValueError
        If the number of placeholders does not match the number of expressions

    Examples
    --------
    ``` py
    import polars as pl
    import rapidstats.polars as rps

    df = pl.DataFrame({"x": 1123.09873, "y": "foo"})
    df.select(
        rps.format(
            "{:,.3f} is {} is {}", pl.col("x"), pl.col("y"), "bar"
        )
    )
    ```
    ``` title="output"
    shape: (1, 1)
    ┌─────────────────────────┐
    │ x                       │
    │ ---                     │
    │ str                     │
    ╞═════════════════════════╡
    │ 1,123.099 is foo is bar │
    └─────────────────────────┘
    ```

    Added in version 0.2.0
    ----------------------
    """
    parts = _parse_format_string(f_string)
    formatters = [
        _parse_formatter(p) for p, is_formatter in zip(*parts) if is_formatter
    ]

    len_formatters = len(formatters)
    len_args = len(args)
    if len_formatters != len(args):
        raise ValueError(
            f"Number of placeholders `{len_formatters}` does not match number of arguments `{len_args}`"
        )

    outputs = [
        _apply_formatter(s if isinstance(s, pl.Expr) else pl.lit(s), format_spec)
        for s, format_spec in zip(args, formatters)
    ]

    i = 0
    to_concat = []
    for p, is_formatter in zip(*parts):
        if is_formatter:
            to_concat.append(outputs[i])
            i += 1
        else:
            to_concat.append(pl.lit(p))

    return pl.concat_str(*to_concat)
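
The control flow of the listing above (count the placeholders, check them against the arguments, then interleave literal text with formatted values) can be sketched on plain Python values. This is an analogy only: the private helpers `_parse_format_string`, `_parse_formatter`, and `_apply_formatter` are not shown in this document, so the sketch substitutes the standard library's `string.Formatter.parse` and the built-in `format`, and the name `format_reference` is hypothetical.

```python
from string import Formatter


def format_reference(f_string: str, *args: object) -> str:
    """Hypothetical scalar analogue of the placeholder-filling logic."""
    # Each parsed piece is (literal_text, field_name, format_spec, conversion);
    # field_name is None when the piece carries no placeholder.
    pieces = list(Formatter().parse(f_string))
    n_placeholders = sum(1 for _, field, _, _ in pieces if field is not None)

    if n_placeholders != len(args):
        raise ValueError(
            f"Number of placeholders `{n_placeholders}` does not match "
            f"number of arguments `{len(args)}`"
        )

    remaining = iter(args)
    out = []
    for literal, field, spec, _ in pieces:
        out.append(literal)
        if field is not None:
            # Apply the placeholder's format spec to the next argument
            out.append(format(next(remaining), spec or ""))
    return "".join(out)


print(format_reference("{:,.3f} is {} is {}", 1123.09873, "foo", "bar"))
```

As in the documented example, the string `"bar"` is used as a literal value rather than looked up as a name, and a mismatch between placeholders and arguments raises `ValueError` before any formatting happens.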