
Polars

Functions:

| Name | Description |
| --- | --- |
| auc | Computes the area under the curve (AUC) via numerical integration. |
| is_close | Compares the relative equality of the inputs. |

auc(x, y, method='trapezoidal')

Computes the area under the curve (AUC) via numerical integration.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Expr \| str | The x-axis values. | required |
| y | Expr \| str | The y-axis values. | required |
| method | Literal['rectangular', 'trapezoidal'] | Integration rule: "rectangular" for rectangular integration or "trapezoidal" for trapezoidal integration. | 'trapezoidal' |

Returns:

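| Type | Description |
| --- | --- |
| Expr | An expression that evaluates to a single f64 value: the area under the curve. |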

Raises:

| Type | Description |
| --- | --- |
| ValueError | If `method` is not one of "rectangular" or "trapezoidal". |

Examples:

import polars as pl
import rapidstats.polars as rps

df = pl.DataFrame({"x": [1, 2, 3], "y": [5, 6, 7]})
df.select(rps.auc("x", "y"))
output
shape: (1, 1)
┌──────┐
│ x    │
│ ---  │
│ f64  │
╞══════╡
│ 12.0 │
└──────┘
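To make the two integration rules concrete, here is a minimal pure-Python sketch of what `auc` computes for one pair of columns. It is not the library's Rust plugin: the name `auc_sketch` is hypothetical, the points are assumed to be sorted by x already, and the rectangular rule is assumed to use left-endpoint rectangles, which this page does not specify. The trapezoidal result reproduces the 12.0 from the example above.

```python
def auc_sketch(x: list[float], y: list[float], method: str = "trapezoidal") -> float:
    """Hypothetical pure-Python model of rps.auc on one pair of columns.

    Assumes the points are already sorted by x.
    """
    if method not in ("rectangular", "trapezoidal"):
        raise ValueError("`method` must be one of `rectangular` or `trapezoidal`")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    area = 0.0
    for i in range(1, len(x)):
        dx = x[i] - x[i - 1]
        if method == "trapezoidal":
            # Trapezoid: interval width times the mean of the two heights.
            area += dx * (y[i - 1] + y[i]) / 2.0
        else:
            # ASSUMPTION: left-endpoint rectangle (height taken at x[i - 1]).
            area += dx * y[i - 1]
    return area


print(auc_sketch([1, 2, 3], [5, 6, 7]))                 # 12.0
print(auc_sketch([1, 2, 3], [5, 6, 7], "rectangular"))  # 11.0
```

Under the left-endpoint assumption, the rectangular rule gives 5 + 6 = 11.0 on the example data, while a right-endpoint rule would give 6 + 7 = 13.0, so check the library's output before relying on either convention.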

Added in version 0.2.0
Source code in python/rapidstats/_polars/_numeric.py
def auc(
    x: IntoExprColumn,
    y: IntoExprColumn,
    method: Literal["rectangular", "trapezoidal"] = "trapezoidal",
) -> pl.Expr:
    """Computes the area under the curve (AUC) via numerical integration.

    Parameters
    ----------
    x : pl.Expr | str
        The x-axis values
    y : pl.Expr | str
        The y-axis values
    method : Literal["rectangular", "trapezoidal"], optional
        If "rectangular", use rectangular integration; if "trapezoidal", use
        trapezoidal integration. By default "trapezoidal"

    Returns
    -------
    pl.Expr

    Raises
    ------
    ValueError
        If `method` is not one of `rectangular` or `trapezoidal`

    Examples
    --------
    ``` py
    import polars as pl
    import rapidstats.polars as rps

    df = pl.DataFrame({"x": [1, 2, 3], "y": [5, 6, 7]})
    df.select(rps.auc("x", "y"))
    ```
    ``` title="output"
    shape: (1, 1)
    ┌──────┐
    │ x    │
    │ ---  │
    │ f64  │
    ╞══════╡
    │ 12.0 │
    └──────┘
    ```

    Added in version 0.2.0
    ----------------------
    """
    if method == "trapezoidal":
        is_trapezoidal = True
    elif method == "rectangular":
        is_trapezoidal = False
    else:
        raise ValueError("`method` must be one of `rectangular` or `trapezoidal`")

    return pl.plugins.register_plugin_function(
        plugin_path=_PLUGIN_PATH,
        function_name="pl_auc",
        args=[
            _str_to_expr(x).cast(pl.Float64),
            _str_to_expr(y).cast(pl.Float64),
            pl.lit(is_trapezoidal),
        ],
        returns_scalar=True,
    )

is_close(x, y, rtol=1e-05, atol=1e-08, null_equal=False)

Compares the relative equality of the inputs.

Parameters:

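| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Expr \| str \| float | The values to compare. | required |
| y | Expr \| str \| float | The reference values; the relative tolerance is scaled by the absolute value of y. | required |
| rtol | float | Relative tolerance. | 1e-05 |
| atol | float | Absolute tolerance. | 1e-08 |
| null_equal | bool | If True, a pair of nulls compares as equal (true) instead of producing null. | False |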

Returns:

| Type | Description |
| --- | --- |
| Expr | A Boolean expression that is true where x and y are close. |

Examples:

import polars as pl
import rapidstats.polars as rps

df = pl.DataFrame({"x": [1.0, 1.1], "y": [.999999999, 5]})
df.select(rps.is_close("x", "y"))
output
shape: (2, 1)
┌───────┐
│ x     │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
│ false │
└───────┘
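For reference, the usual tolerance test (the one documented for NumPy's `numpy.isclose`) is |x - y| <= atol + rtol * |y|: `atol` acts as a fixed floor, while `rtol` scales with the magnitude of `y`, which also makes the check asymmetric in `x` and `y`. The sketch below applies that test to a single pair of values, using `None` to stand in for a Polars null. The name `is_close_sketch` is hypothetical; its null handling mirrors the expression logic in the source (a comparison involving null stays null unless both sides are null and `null_equal=True`).

```python
from typing import Optional


def is_close_sketch(
    x: Optional[float],
    y: Optional[float],
    rtol: float = 1e-05,
    atol: float = 1e-08,
    null_equal: bool = False,
) -> Optional[bool]:
    """Hypothetical scalar model of rps.is_close; None plays the role of null."""
    if x is None or y is None:
        # Null propagates, except that two nulls count as equal when null_equal=True.
        if null_equal and x is None and y is None:
            return True
        return None
    # atol is a fixed floor; rtol scales with |y| only.
    return abs(x - y) <= atol + rtol * abs(y)


print(is_close_sketch(1.0, 0.999999999))  # True
print(is_close_sketch(1.1, 5.0))          # False
print(is_close_sketch(1e-9, 0.0))         # True: within atol even though y == 0
```

The first two calls correspond to the two rows of the example above; the third shows why `atol` matters when `y` is zero, since a purely relative tolerance would then be zero.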

Added in version 0.2.0
Source code in python/rapidstats/_polars/_numeric.py
def is_close(
    x: IntoExprColumn | NumericLiteral,
    y: IntoExprColumn | NumericLiteral,
    rtol: float = 1e-05,
    atol: float = 1e-08,
    null_equal: bool = False,
) -> pl.Expr:
    """Compares the relative equality of the inputs.

    Parameters
    ----------
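    x : pl.Expr | str | float
        The values to compare
    y : pl.Expr | str | float
        The reference values; the relative tolerance is scaled by the absolute
        value of `y`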
    rtol : float, optional
        Relative tolerance, by default 1e-05
    atol : float, optional
        Absolute tolerance, by default 1e-08
    null_equal : bool, optional
        If True, considers nulls to be equal, by default False

    Returns
    -------
    pl.Expr

    Examples
    --------
    ``` py
    import polars as pl
    import rapidstats.polars as rps

    df = pl.DataFrame({"x": [1.0, 1.1], "y": [.999999999, 5]})
    df.select(rps.is_close("x", "y"))
    ```
    ``` title="output"
    shape: (2, 1)
    ┌───────┐
    │ x     │
    │ ---   │
    │ bool  │
    ╞═══════╡
    │ true  │
    │ false │
    └───────┘
    ```

    Added in version 0.2.0
    ----------------------
    """
    x = _numeric_to_expr(x)
    y = _numeric_to_expr(y)

    # |x - y| <= atol + rtol * |y|: atol is absolute, only rtol scales with |y|
    res = x.sub(y).abs().le(pl.lit(atol).add(pl.lit(rtol).mul(y.abs())))

    if null_equal:
        res = res.or_(x.is_null().and_(y.is_null()))

    return res

Functions:

| Name | Description |
| --- | --- |
| format | Format expressions as a string using Python f-string syntax. |

format(f_string, *args)

Format expressions as a string using Python f-string syntax.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| f_string | str | A string with placeholders, mimicking Python f-string syntax, for example "{:.3f}". Currently, the only supported presentation types are "f" and "%"; width, alignment, and fill are not supported. | required |
| args | Union[Expr, str, float] | Expression(s) that fill the placeholders. Strings are NOT parsed as columns; they are inserted as literal values. | () |

Returns:

| Type | Description |
| --- | --- |
| Expr | A string expression containing the formatted text. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the number of placeholders does not match the number of expressions. |

Examples:

import polars as pl
import rapidstats.polars as rps

df = pl.DataFrame({"x": 1123.09873, "y": "foo"})
df.select(
    rps.format(
        "{:,.3f} is {} is {}", pl.col("x"), pl.col("y"), "bar"
    )
)
output
shape: (1, 1)
┌─────────────────────────┐
│ x                       │
│ ---                     │
│ str                     │
╞═════════════════════════╡
│ 1,123.099 is foo is bar │
└─────────────────────────┘
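The following sketch models, in pure Python, what `format` does for a single row: split the template into literal text and placeholder specs, check that the placeholder count matches the number of arguments, then format each value and concatenate the pieces. It relies on the standard library's `string.Formatter.parse`; the helper names `split_format_string` and `format_row` are hypothetical and are not the library's private `_parse_format_string` or `_apply_formatter`. Note that Python's built-in `format()` also accepts width, alignment, and fill, which `rps.format` is documented not to support.

```python
from string import Formatter


def split_format_string(f_string: str) -> tuple[list[str], list[bool]]:
    """Hypothetical helper: split a template into literal text and format specs."""
    parts: list[str] = []
    is_formatter: list[bool] = []
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    for literal, field_name, format_spec, _conversion in Formatter().parse(f_string):
        if literal:
            parts.append(literal)
            is_formatter.append(False)
        if field_name is not None:
            # A placeholder such as "{:,.3f}" contributes its spec ",.3f".
            parts.append(format_spec or "")
            is_formatter.append(True)
    return parts, is_formatter


def format_row(f_string: str, *values: object) -> str:
    """Hypothetical scalar model of rps.format applied to one row."""
    parts, is_formatter = split_format_string(f_string)
    n_placeholders = sum(is_formatter)
    if n_placeholders != len(values):
        raise ValueError(
            f"Number of placeholders `{n_placeholders}` does not match "
            f"number of arguments `{len(values)}`"
        )
    it = iter(values)
    # Interleave literal text with formatted values, then concatenate.
    return "".join(
        format(next(it), part) if is_fmt else part
        for part, is_fmt in zip(parts, is_formatter)
    )


print(format_row("{:,.3f} is {} is {}", 1123.09873, "foo", "bar"))
# 1,123.099 is foo is bar
print(format_row("{:.1%}", 0.1234))  # 12.3%
```

The first call reproduces the row from the example above; the second exercises the "%" presentation type.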

Added in version 0.2.0
Source code in python/rapidstats/_polars/_format.py
def format(f_string: str, *args: Union[pl.Expr, str, float]) -> pl.Expr:
    """Format expressions as a string using Python f-string syntax.

    Parameters
    ----------
    f_string : str
        A string with placeholders, mimicking Python f-string syntax, for example
        "{:.3f}". Currently, the only supported presentation types are "f" and "%";
        width, alignment, and fill are not supported.
    args
        Expression(s) that fill the placeholders. Note that strings are NOT parsed as
        columns.

    Returns
    -------
    pl.Expr

    Raises
    ------
    ValueError
        If the number of placeholders does not match the number of expressions

    Examples
    --------
    ``` py
    import polars as pl
    import rapidstats.polars as rps

    df = pl.DataFrame({"x": 1123.09873, "y": "foo"})
    df.select(
        rps.format(
            "{:,.3f} is {} is {}", pl.col("x"), pl.col("y"), "bar"
        )
    )
    ```
    ``` title="output"
    shape: (1, 1)
    ┌─────────────────────────┐
    │ x                       │
    │ ---                     │
    │ str                     │
    ╞═════════════════════════╡
    │ 1,123.099 is foo is bar │
    └─────────────────────────┘
    ```

    Added in version 0.2.0
    ----------------------
    """
    parts = _parse_format_string(f_string)
    formatters = [
        _parse_formatter(p) for p, is_formatter in zip(*parts) if is_formatter
    ]

    len_formatters = len(formatters)
    len_args = len(args)
    if len_formatters != len_args:
        raise ValueError(
            f"Number of placeholders `{len_formatters}` does not match number of arguments `{len_args}`"
        )

    outputs = [
        _apply_formatter(s if isinstance(s, pl.Expr) else pl.lit(s), format_spec)
        for s, format_spec in zip(args, formatters)
    ]

    i = 0
    to_concat = []
    for p, is_formatter in zip(*parts):
        if is_formatter:
            to_concat.append(outputs[i])
            i += 1
        else:
            to_concat.append(pl.lit(p))

    return pl.concat_str(*to_concat)