Binning

Functions:

Name Description
doane Bin count from Doane's rule
freedman_diaconis Bin count from the Freedman-Diaconis bin width
rice Bin count from Rice's rule
scott Bin count from Scott's bin width
sqrt Bin count from the square root rule
sturges Bin count from Sturges' rule

doane(x)

Doane's rule defines the bin count as

\[ k = 1 + \log_{2}(n) + \log_{2}\left(1 + \frac{|g_{1}|}{\sigma_{g_{1}}}\right) \]

where

\[ \sigma_{g_{1}} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}} \]
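
As a rough check of the formula, the following stdlib-only sketch recomputes \(k\) by hand. It assumes \(g_{1}\) is the biased (population) sample skewness and that the result is truncated to an integer; the name `doane_bin_count` is hypothetical and not part of the library.

```python
import math


def doane_bin_count(values):
    """Hypothetical stdlib-only sketch of Doane's rule (not the library function)."""
    n = len(values)
    if n <= 2:
        # sigma_g1 is 0 for n == 2, so the ratio |g1| / sigma_g1 is undefined
        raise ValueError("Doane's rule requires at least 3 observations")

    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2**1.5  # assumption: biased skewness; fails for constant data (m2 == 0)

    sg1 = math.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))
    return int(1 + math.log2(n) + math.log2(1 + abs(g1) / sg1))


# Symmetric data has zero skewness, so only the 1 + log2(n) part contributes.
print(doane_bin_count([1, 2, 3, 4, 5, 6, 7, 8]))
# A single large outlier raises the skewness term and therefore the bin count.
print(doane_bin_count([1, 1, 1, 1, 1, 1, 1, 100]))
```

For symmetric data the skewness term vanishes, so the extra bins appear only when the sample is noticeably skewed.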

Parameters:

Name Type Description Default
x ArrayLike
required

Returns:

Type Description
int

Bin count (the value of \(k\), truncated to an integer)

Raises:

Type Description
ValueError

If \(n \le 2\)

Source code in python/rapidstats/bin.py
def doane(x: ArrayLike) -> int:
    r"""Doane's rule defines the bin count as

    \[
        k = 1 + \log_{2}(n) + \log_{2}\left(1 + \frac{|g_{1}|}{\sigma_{g_{1}}}\right)
    \]

    where

    \[
        \sigma_{g_{1}} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}}
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count

    Raises
    ------
    ValueError
        If $n \le 2$
    """
    x = pl.Series(x)
    x_len = x.len()

    if x_len <= 2:
        raise ValueError("Doane's rule requires at least 3 observations")

    g1 = abs(x.skew())
    sg1 = math.sqrt(6.0 * (x_len - 2) / ((x_len + 1.0) * (x_len + 3)))

    return int(1 + math.log2(x_len) + math.log2(1 + (g1 / sg1)))

freedman_diaconis(x)

The Freedman-Diaconis rule defines the bin width as

\[ h = 2\frac{IQR(x)}{\sqrt[3]{n}} \]

where \(x\) is the input array and \(n\) is the length of \(x\).

The bin width is converted to a bin count via

\[ k = \lceil \frac{\max{x} - \min{x}}{h} \rceil \]

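If the IQR is 0, the numerator \(2\,IQR(x)\) is replaced by a generalized IQR, \((q_{1-\alpha} - q_{\alpha}) / (1 - 2\alpha)\), computed over successively wider quantile pairs (\(\alpha = 1/8, 1/16, \dots\) instead of \(\alpha = 1/4\)) until it is nonzero. As a last resort, 3.5 times the standard deviation is used as the numerator.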
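
The fallback chain can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: `linear_quantile` and `fd_bin_count` are hypothetical names, the width-to-count step is assumed to be the ceiling formula above, and returning 1 bin for constant data is an added guard.

```python
import math


def linear_quantile(sorted_values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def fd_bin_count(values):
    """Hypothetical sketch of the Freedman-Diaconis rule with its fallbacks."""
    xs = sorted(values)
    n = len(xs)

    # Step 1: the usual numerator, 2 * IQR
    h = 2 * (linear_quantile(xs, 0.75) - linear_quantile(xs, 0.25))

    # Step 2: if the IQR is 0, widen the quantile pair (1/8, 1/16, ...)
    alpha, alpha_min = 1 / 4, 1 / 512
    while h == 0 and alpha >= alpha_min:
        alpha /= 2
        spread = linear_quantile(xs, 1 - alpha) - linear_quantile(xs, alpha)
        h = spread / (1 - 2 * alpha)

    # Step 3: last resort, 3.5 times the sample standard deviation
    if h == 0:
        mean = sum(xs) / n
        h = 3.5 * math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))

    if h == 0:
        return 1  # assumption: constant data gets a single bin

    width = h * n ** (-1.0 / 3.0)
    return max(1, math.ceil((xs[-1] - xs[0]) / width))


print(fd_bin_count(list(range(1, 11))))  # ordinary case, IQR > 0
print(fd_bin_count([0] * 90 + [1] * 10))  # IQR is 0, wider quantiles are used
```

Dividing by \(1 - 2\alpha\) rescales the wider spread so that \(\alpha = 1/4\) reproduces \(2\,IQR(x)\) exactly.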

Parameters:

Name Type Description Default
x ArrayLike
required

Returns:

Type Description
int

Bin count

Source code in python/rapidstats/bin.py
def freedman_diaconis(x: ArrayLike) -> int:
    r"""The Freedman-Diaconis rule defines the bin width as

    \[
        h = 2\frac{IQR(x)}{\sqrt[3]{n}}
    \]

    where $x$ is the input array and $n$ is the length of $x$.

    The bin width is converted to a bin count via

    \[
        k = \lceil \frac{\max{x} - \min{x}}{h} \rceil
    \]

    If the IQR is 0, the numerator $2\,IQR(x)$ is replaced by a generalized IQR,
    $(q_{1-\alpha} - q_{\alpha}) / (1 - 2\alpha)$, computed over successively wider
    quantile pairs ($\alpha = 1/8, 1/16, \dots$) until it is nonzero. As a last resort,
    3.5 times the standard deviation is used as the numerator.

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    x = pl.Series(x)

    iqr = x.quantile(0.75, interpolation="linear") - x.quantile(
        0.25, interpolation="linear"
    )
    h = 2 * iqr

    # It's possible that the IQR is 0. In that case, we try and compute the generalized
    # IQR using successively wider quantiles. Taken from R's hist.default function and
    # this answer: https://stats.stackexchange.com/questions/455237/what-to-do-when-iqr-returns-0-in-freedman-diaconis-rule
    alpha = 1 / 4
    alpha_min = 1 / 512

    while h == 0 and alpha >= alpha_min:
        alpha /= 2

        # Upper quantile minus lower quantile, so the generalized IQR is non-negative
        h = (
            x.quantile(1 - alpha, interpolation="linear")
            - x.quantile(alpha, interpolation="linear")
        ) / (1 - 2 * alpha)

    # As a last ditch, use 3.5 times the standard deviation
    if h == 0:
        h = 3.5 * x.std()

    bin_width = h * x.len() ** (-1.0 / 3.0)

    return _bin_width_to_count(x, bin_width)

rice(x)

Rice's rule defines the bin count as

\[ k = \lceil 2n^{\frac{1}{3}} \rceil \]

Parameters:

Name Type Description Default
x ArrayLike
required

Returns:

Type Description
int

Bin count

Source code in python/rapidstats/bin.py
def rice(x: ArrayLike) -> int:
    r"""Rice's rule defines the bin count as

    \[
        k = \lceil 2n^{\frac{1}{3}} \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    return math.ceil(2 * (len(x) ** (1 / 3)))

scott(x)

Scott's rule defines the bin width as

\[ h = 3.49\sigma n^{-\frac{1}{3}} \]

The bin count is given by

\[ k = \lceil \frac{\max{x} - \min{x}}{h} \rceil \]
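
The two formulas above can be worked through with the standard library. This sketch assumes \(\sigma\) is the sample standard deviation (one delta degree of freedom); `scott_bin_count` is a hypothetical name, not the library function.

```python
import math
import statistics


def scott_bin_count(values):
    """Hypothetical sketch of Scott's rule: bin width first, then bin count."""
    n = len(values)
    sigma = statistics.stdev(values)  # assumption: sample standard deviation
    width = 3.49 * sigma * n ** (-1.0 / 3.0)  # zero for constant data
    return math.ceil((max(values) - min(values)) / width)


print(scott_bin_count(list(range(1, 11))))
print(scott_bin_count(list(range(100))))
```

Because the width shrinks only as \(n^{-1/3}\), the bin count grows slowly with the sample size.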

Parameters:

Name Type Description Default
x ArrayLike
required

Returns:

Type Description
int

Bin count

Source code in python/rapidstats/bin.py
def scott(x: ArrayLike) -> int:
    r"""Scott's rule defines the bin width as

    \[
        h = 3.49\sigma n^{-\frac{1}{3}}
    \]

    The bin count is given by

    \[
        k = \lceil \frac{\max{x} - \min{x}}{h} \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    x = pl.Series(x)

    bin_width = (3.49 * x.std()) / (len(x) ** (1 / 3))

    return _bin_width_to_count(x, bin_width)

sqrt(x)

The square root rule defines the bin count as

\[ k = \lceil\sqrt{n}\rceil \]

Parameters:

Name Type Description Default
x ArrayLike
required

Returns:

Type Description
int

Bin count

Source code in python/rapidstats/bin.py
def sqrt(x: ArrayLike) -> int:
    r"""The square root rule defines the bin count as

    \[
        k = \lceil\sqrt{n}\rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    return math.ceil(math.sqrt(len(x)))

sturges(x)

Sturges' rule defines the bin count as

\[ k = \lceil 1 + \log_{2}(n) \rceil \]
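
Sturges' rule, like the square root rule and Rice's rule, depends only on the sample size \(n\), so the three can be compared directly. The helper names below are hypothetical and take \(n\) rather than the data array.

```python
import math


def sqrt_count(n):
    return math.ceil(math.sqrt(n))


def sturges_count(n):
    return math.ceil(1 + math.log2(n))


def rice_count(n):
    return math.ceil(2 * n ** (1 / 3))


# Compare how quickly each rule's bin count grows with the sample size.
for n in (100, 500, 10_000):
    print(n, sqrt_count(n), sturges_count(n), rice_count(n))
```

Sturges' rule grows logarithmically, so for large samples it suggests far fewer bins than the other two.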

Parameters:

Name Type Description Default
x ArrayLike
required

Returns:

Type Description
int

Bin count

Source code in python/rapidstats/bin.py
def sturges(x: ArrayLike) -> int:
    r"""Sturges' rule defines the bin count as

    \[
        k = \lceil 1 + \log_{2}(n) \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    return math.ceil(math.log2(len(x))) + 1