
Visualization

Classes:

Name             Description
ScreenTransform  Transforms the data from raw units into "screen" space, e.g. pixels.

Functions:

Name         Description
thin_points  Given a set of points, select points such that each point is visually distinct from the others.

ScreenTransform

Transforms the data from raw units into "screen" space, e.g. pixels.

Parameters:

width : float (required)
    The width of the screen.
height : float (required)
    The height of the screen.
xmin : float | None (default: None)
    The minimum x value. If None, the minimum observed x value is used.
xmax : float | None (default: None)
    The maximum x value. If None, the maximum observed x value is used.
ymin : float | None (default: None)
    The minimum y value. If None, the minimum observed y value is used.
ymax : float | None (default: None)
    The maximum y value. If None, the maximum observed y value is used.
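The per-axis mapping that `ScreenTransform` applies can be sketched in plain Python. This is a minimal illustration, not the library's implementation. The helper `to_screen` is hypothetical and works on a list of floats rather than a DataFrame column. Omitted bounds fall back to the observed minimum and maximum, and a zero-width range collapses every value to the middle of the screen, mirroring the `__call__` method in the source.

```python
from __future__ import annotations


def to_screen(
    values: list[float],
    size: float,
    vmin: float | None = None,
    vmax: float | None = None,
) -> list[float]:
    """Map raw values onto [0, size] with the affine map u(v) = a * v + b.

    Hypothetical helper that mirrors the per-axis logic of ScreenTransform.
    """
    # Unspecified bounds default to the observed extremes.
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    if hi < lo:
        raise ValueError("vmax must be >= vmin")

    span = hi - lo
    if span == 0:
        # Degenerate range: every value maps to the middle of the screen.
        a, b = 0.0, size / 2
    else:
        # Solve u(lo) = 0 and u(hi) = size for a and b.
        a = size / span
        b = -a * lo
    return [a * v + b for v in values]


# 800-pixel-wide screen, bounds taken from the data (10 to 30).
print(to_screen([10.0, 20.0, 30.0], size=800))  # [0.0, 400.0, 800.0]
# Explicit bounds override the observed extremes.
print(to_screen([0.0, 5.0], size=100, vmin=0.0, vmax=10.0))  # [0.0, 50.0]
# Constant column: everything collapses to the middle.
print(to_screen([3.0, 3.0], size=600))  # [300.0, 300.0]
```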
Source code in python/rapidstats/viz.py
import narwhals as nw
import narwhals.typing as nwt


class ScreenTransform:
    """Transforms the data from raw units into "screen" space, e.g. pixels.

    Parameters
    ----------
    width : float
        The width of the screen
    height : float
        The height of the screen
    xmin : float | None, optional
        The minimum x value. If None, the minimum observed x value is used, by default
        None
    xmax : float | None, optional
        The maximum x value. If None, the maximum observed x value is used, by default
        None
    ymin : float | None, optional
        The minimum y value. If None, the minimum observed y value is used, by default
        None
    ymax : float | None, optional
        The maximum y value. If None, the maximum observed y value is used, by default
        None
    """

    def __init__(
        self,
        width: float,
        height: float,
        xmin: float | None = None,
        xmax: float | None = None,
        ymin: float | None = None,
        ymax: float | None = None,
    ):
        self.width = width
        self.height = height
        self.xmin = xmin
        self.xmax = xmax
        self.ymin = ymin
        self.ymax = ymax

    def __call__(self, df: nwt.IntoDataFrameT, x: str, y: str) -> nwt.IntoDataFrameT:
        nw_df = nw.from_native(df)

        xmin = self.xmin
        xmax = self.xmax
        ymin = self.ymin
        ymax = self.ymax

        if xmin is None:
            xmin = nw_df[x].min()

        if xmax is None:
            xmax = nw_df[x].max()

        if ymin is None:
            ymin = nw_df[y].min()

        if ymax is None:
            ymax = nw_df[y].max()

        if xmax < xmin:
            raise ValueError("xmax must be >= xmin")

        if ymax < ymin:
            raise ValueError("ymax must be >= ymin")

        # We want an affine map u(v) = a * v + b for each axis. It is found by solving a
        # system of two equations:
        # 1. The smallest data value maps to the origin edge of the screen
        #   u(xmin) = 0 (x-axis), u(ymin) = 0 (y-axis)
        # 2. The largest data value maps to the opposite edge
        #   u(xmax) = width (x-axis), u(ymax) = height (y-axis)
        # Solving these for the x-axis gives (the y-axis is analogous):
        #   a = width / (xmax - xmin)
        #   b = -a * xmin
        # If xmax == xmin, there is no unique solution, so we set a = 0 and b to a
        # constant: width / 2 or height / 2 (the middle of the screen).

        xrange = xmax - xmin
        yrange = ymax - ymin

        if xrange == 0:
            ax = 0.0
            bx = self.width / 2
        else:
            ax = self.width / xrange
            bx = -ax * xmin

        if yrange == 0:
            ay = 0.0
            by = self.height / 2
        else:
            ay = self.height / yrange
            by = -ay * ymin

        # Apply u(v) = a * v + b to each axis; the columns keep their names.
        return nw_df.with_columns(
            nw.col(x) * ax + bx,
            nw.col(y) * ay + by,
        ).to_native()

thin_points(df, x, y, min_distance, always_keep=None, order=None, transform=None)

Given a set of points, select points such that each point is visually distinct from the others.

Parameters:

df : IntoDataFrameT (required)
    The DataFrame containing the points.
x : str (required)
    The column denoting the x-axis.
y : str (required)
    The column denoting the y-axis.
min_distance : float (required)
    The minimum distance required between kept points, measured in the units produced by transform (raw units if transform is None). Points flagged by always_keep are exempt. Must be >= 0, otherwise a ValueError is raised.
always_keep : str | None (default: None)
    A boolean column marking points that are always kept, regardless of distance. If None, no points are forced (equivalent to an all-false boolean column).
order : str | None (default: None)
    A column, cast to UInt64, that sets which points in a cluster are kept: lower values take priority. If None, points are prioritized in insertion (row) order.
transform : Callable[[IntoDataFrameT, str, str], IntoDataFrameT] | None (default: None)
    A callable that accepts df, x, and y and transforms the data before the thinning algorithm runs. It receives only the x, y, always_keep, and order columns. For instance, ScreenTransform can map the data to pixel space, so that min_distance refers to pixels instead of raw units. The returned DataFrame still holds the original, untransformed values. If None, no transformation is applied.

Returns:

IntoDataFrameT
    The original DataFrame, filtered to the thinned points.
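The thinning routine itself (`_thin_points_greedy`) is not shown on this page, so what follows is only a sketch of one plausible greedy scheme. It rests on several assumptions:

- Distance is Euclidean.
- A point survives if it lies at least `min_distance` from every point kept so far.
- `always_keep` points are accepted first.
- The remaining candidates are visited by ascending `order`, or in row order when `order` is None.

The helper `thin_points_sketch` is hypothetical and works on plain lists rather than DataFrames. It returns a keep-mask in row order, analogous to the `to_keep` mask that `thin_points` passes to `filter`.

```python
from __future__ import annotations

import math


def thin_points_sketch(
    xs: list[float],
    ys: list[float],
    min_distance: float,
    always_keep: list[bool] | None = None,
    order: list[int] | None = None,
) -> list[bool]:
    """Greedy thinning; returns a keep-mask in the original row order.

    Hypothetical helper: assumes Euclidean distance and a ">= min_distance" rule.
    """
    if min_distance < 0:
        raise ValueError("`min_distance` must be >= 0")
    n = len(xs)
    forced = always_keep if always_keep is not None else [False] * n
    # Lower `order` wins; without it, earlier rows win. sorted() is stable, so
    # ties keep insertion order.
    rank = order if order is not None else list(range(n))
    candidates = sorted(range(n), key=lambda i: rank[i])

    keep = [False] * n
    kept: list[int] = []
    # Forced points are accepted first and then block nearby candidates.
    for i in range(n):
        if forced[i]:
            keep[i] = True
            kept.append(i)
    for i in candidates:
        if keep[i]:
            continue
        # Accept only if far enough from every point kept so far.
        if all(math.hypot(xs[i] - xs[j], ys[i] - ys[j]) >= min_distance for j in kept):
            keep[i] = True
            kept.append(i)
    return keep


xs = [0.0, 1.0, 5.0, 5.5]
ys = [0.0, 0.0, 0.0, 0.0]
print(thin_points_sketch(xs, ys, 2.0))  # [True, False, True, False]
print(thin_points_sketch(xs, ys, 2.0, order=[3, 0, 2, 1]))  # [False, True, False, True]
print(
    thin_points_sketch(xs, ys, 2.0, always_keep=[False, True, False, False])
)  # [False, True, True, False]
```

Visiting candidates in priority order is what makes `order` decide which member of a cluster survives. In the second call, rows 1 and 3 win their clusters because they carry the lowest `order` values.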

Source code in python/rapidstats/viz.py
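from collections.abc import Callable

import narwhals as nw
import narwhals.typing as nwt
import polars as pl

# `_thin_points_greedy` is a private rapidstats helper defined elsewhere (not shown).


def thin_points(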
    df: nwt.IntoDataFrameT,
    x: str,
    y: str,
    min_distance: float,
    always_keep: str | None = None,
    order: str | None = None,
    transform: (
        Callable[[nwt.IntoDataFrameT, str, str], nwt.IntoDataFrameT] | None
    ) = None,
) -> nwt.IntoDataFrameT:
    """Given a set of points, select points such that each point is visually distinct
    from the others.

    Parameters
    ----------
    df : nwt.IntoDataFrameT
        The DataFrame containing the points
    x : str
        The column denoting the x-axis
    y : str
        The column denoting the y-axis
    min_distance : float
        The minimum distance required between kept points, in the units produced by
        `transform` (raw units if `transform` is None). Must be >= 0
    always_keep : str | None, optional
        A boolean column denoting whether the point should always be kept regardless of
        distance. If None, no points are always kept (equivalent to a boolean column of
        all false), by default None
    order : str | None, optional
        A column, cast to UInt64 (lower values take priority), that controls which
        points in a cluster are kept. If None, points are prioritized in insertion
        order, by default None
    transform : Callable[[nwt.IntoDataFrameT, str, str], nwt.IntoDataFrameT] | None, optional
        A callable that accepts `df`, `x`, and `y` used to transform the data before
        applying the thinning algorithm. For instance, `ScreenTransform` can map the
        data to pixel space, so that `min_distance` can refer to pixels instead of raw
        units. If None, no transformations are applied, by default None

    Returns
    -------
    nwt.IntoDataFrameT
        The original DataFrame filtered to the thinned points
    """
    if min_distance < 0:
        raise ValueError("`min_distance` must be >= 0")

    to_select = [
        c
        for c in [
            x,
            y,
            always_keep,
            order,
        ]
        if c is not None
    ]

    selected = nw.from_native(native_object=df).select(to_select).to_native()

    if transform is not None:
        selected = transform(selected, x, y)

    sanitized = (
        nw.from_native(selected)
        .to_polars()
        .with_columns(
            pl.col(x, y).cast(pl.Float64),
        )
    )

    if always_keep is None:
        always_keep = "__rapidstats_always_keep__"
        sanitized = sanitized.with_columns(pl.lit(False).alias(always_keep))
    else:
        sanitized = sanitized.with_columns(pl.col(always_keep).cast(pl.Boolean))

    if order is not None:
        sanitized = sanitized.with_columns(pl.col(order).cast(pl.UInt64))

    to_keep = _thin_points_greedy(
        df=sanitized,
        x=x,
        y=y,
        min_distance=min_distance,
        always_keep=always_keep,
        order=order,
    )

    return nw.from_native(df).filter(to_keep).to_native()
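To see why the `transform` parameter matters, consider axes on very different scales. The sketch below uses two hypothetical helpers, neither of which is part of rapidstats:

- `to_pixels` is a simplified stand-in for `ScreenTransform` that always uses the observed bounds.
- `greedy_keep` performs row-order greedy thinning with Euclidean distance, which is an assumption about the algorithm.

With a raw-unit threshold of 10, two points one y-unit apart count as duplicates, even though they sit at opposite edges of the plot. After both axes are mapped to a 500-by-500 canvas, the same `min_distance=10` means 10 pixels.

```python
from __future__ import annotations

import math


def to_pixels(values: list[float], size: float) -> list[float]:
    """Hypothetical per-axis ScreenTransform stand-in using observed bounds."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate range: place everything in the middle.
        return [size / 2] * len(values)
    return [(v - lo) * size / (hi - lo) for v in values]


def greedy_keep(xs: list[float], ys: list[float], min_distance: float) -> list[bool]:
    """Hypothetical greedy thinning in row order with Euclidean distance."""
    kept: list[int] = []
    for i in range(len(xs)):
        if all(math.hypot(xs[i] - xs[j], ys[i] - ys[j]) >= min_distance for j in kept):
            kept.append(i)
    return [i in kept for i in range(len(xs))]


# x spans 0 to 1000 while y spans only 0 to 1: the four corners of the plot.
xs = [0.0, 0.0, 1000.0, 1000.0]
ys = [0.0, 1.0, 0.0, 1.0]

# Raw units: the top and bottom corners are only 1 unit apart, so they collide.
print(greedy_keep(xs, ys, min_distance=10.0))  # [True, False, True, False]

# Pixel space: every corner is at least 500 px from the others, so all survive.
px, py = to_pixels(xs, 500.0), to_pixels(ys, 500.0)
print(greedy_keep(px, py, min_distance=10.0))  # [True, True, True, True]
```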