
Binning

Functions:

doane: Bin count from Doane's rule.

freedman_diaconis: Bin count from the Freedman-Diaconis rule.

rice: Bin count from Rice's rule.

scott: Bin count from Scott's rule.

sqrt: Bin count from the square root rule.

sturges: Bin count from Sturges' rule.

doane(x)

Doane's rule defines the bin count as

\[ k = 1 + \log_{2}(n) + \log_{2}\left(1 + \frac{|g_{1}|}{\sigma_{g_{1}}}\right) \]

where \(n\) is the length of \(x\), \(g_{1}\) is the sample skewness of \(x\), and

\[ \sigma_{g_{1}} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}} \]

Parameters:

x (ArrayLike, required): Input data.

Returns:

int: Bin count (the formula value truncated to an integer).

Raises:

ValueError: If \(n < 3\).

Source code in python/rapidstats/bin.py
def doane(x: ArrayLike) -> int:
    r"""Doane's rule defines the bin count as

    \[
        k = 1 + \log_{2}(n) + \log_{2}\left(1 + \frac{|g_{1}|}{\sigma_{g_{1}}}\right)
    \]

    where

    \[
        \sigma_{g_{1}} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}}
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count

    Raises
    ------
    ValueError
        If $n < 3$
    """
    x = pl.Series(x)
    x_len = x.len()

    if x_len <= 2:
        raise ValueError("Doane's rule requires at least 3 observations")

    g1 = abs(x.skew())
    sg1 = math.sqrt(6.0 * (x_len - 2) / ((x_len + 1.0) * (x_len + 3)))

    return int(1 + math.log2(x_len) + math.log2(1 + (g1 / sg1)))
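For readers without polars at hand, the rule can be sketched in plain Python. This is a minimal sketch, not the rapidstats implementation: it assumes that polars' `Series.skew()` default (`bias=True`) is the moment-based skewness \(g_{1} = m_{3} / m_{2}^{3/2}\), and the helper name `doane_bin_count` is hypothetical. It also assumes constant data should be treated as having zero skewness.

```python
import math


def doane_bin_count(data: list[float]) -> int:
    """Doane's rule in plain Python (hypothetical helper, not part of rapidstats)."""
    n = len(data)
    if n <= 2:
        raise ValueError("Doane's rule requires at least 3 observations")

    mean = sum(data) / n
    # Biased central moments give the moment-based skewness g1 = m3 / m2**1.5.
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    # Assumption: constant data (m2 == 0) is treated as having zero skewness.
    g1 = abs(m3 / m2 ** 1.5) if m2 > 0 else 0.0

    sigma_g1 = math.sqrt(6.0 * (n - 2) / ((n + 1.0) * (n + 3)))
    # int() truncates toward zero, mirroring the source above.
    return int(1 + math.log2(n) + math.log2(1 + g1 / sigma_g1))


print(doane_bin_count([1, 2, 3, 4, 5]))   # symmetric, g1 = 0: prints 3
print(doane_bin_count([1, 1, 1, 1, 10]))  # right-skewed, g1 = 1.5: prints 5
```

The skewness term only ever adds bins: for symmetric data it vanishes and the result reduces to a truncated Sturges-style count, while skewed data gets extra bins.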

freedman_diaconis(x)

The Freedman-Diaconis rule defines the bin width as

\[ h = 2\frac{IQR(x)}{\sqrt[3]{n}} \]

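where \(x\) is the input array, \(n\) is the length of \(x\), and \(IQR(x)\) is the interquartile range of \(x\) (the 75th minus the 25th percentile, computed with linear interpolation).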

The bin width is converted to a bin count via

\[ k = \lceil \frac{\max{x} - \min{x}}{h} \rceil \]

Parameters:

x (ArrayLike, required): Input data.

Returns:

int: Bin count.

Source code in python/rapidstats/bin.py
def freedman_diaconis(x: ArrayLike) -> int:
    r"""The Freedman-Diaconis rule defines the bin width as

    \[
        h = 2\frac{IQR(x)}{\sqrt[3]{n}}
    \]

    where $x$ is the input array and $n$ is the length of $x$.

    The bin width is converted to a bin count via

    \[
        k = \lceil \frac{\max{x} - \min{x}}{h} \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    x = pl.Series(x)

    iqr = x.quantile(0.75, interpolation="linear") - x.quantile(
        0.25, interpolation="linear"
    )

    bin_width = 2.0 * iqr * x.len() ** (-1.0 / 3.0)

    return _bin_width_to_count(x, bin_width)
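The source delegates the width-to-count step to `_bin_width_to_count`, whose body is not shown here. The sketch below is a plain-Python illustration that assumes that helper follows the documented formula \(k = \lceil (\max x - \min x) / h \rceil\), and that polars' `interpolation="linear"` places the quantile at position \(q(n-1)\). All names (`linear_quantile`, `width_to_count`, `freedman_diaconis_bin_count`) are hypothetical, and the single-bin fallback for a zero width is an assumption.

```python
import math


def linear_quantile(sorted_data: list[float], q: float) -> float:
    """Quantile at position q * (n - 1), linearly interpolated between neighbors."""
    pos = q * (len(sorted_data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def width_to_count(data: list[float], width: float) -> int:
    """k = ceil((max - min) / h); hypothetical stand-in for _bin_width_to_count."""
    if width <= 0:
        # Assumption: a zero width (e.g. IQR == 0) falls back to a single bin.
        return 1
    return math.ceil((max(data) - min(data)) / width)


def freedman_diaconis_bin_count(data: list[float]) -> int:
    s = sorted(data)
    iqr = linear_quantile(s, 0.75) - linear_quantile(s, 0.25)
    # h = 2 * IQR / n**(1/3)
    width = 2.0 * iqr * len(s) ** (-1.0 / 3.0)
    return width_to_count(s, width)


print(freedman_diaconis_bin_count(list(range(10))))  # prints 3
```

For the integers 0 to 9 the quartiles are 2.25 and 6.75, so \(IQR = 4.5\), \(h = 9 / \sqrt[3]{10} \approx 4.18\), and the range of 9 spans about 2.15 widths, which rounds up to 3 bins.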

rice(x)

Rice's rule defines the bin count as

\[ k = \lceil 2n^{\frac{1}{3}} \rceil \]

where \(n\) is the length of \(x\).

Parameters:

x (ArrayLike, required): Input data.

Returns:

int: Bin count.

Source code in python/rapidstats/bin.py
def rice(x: ArrayLike) -> int:
    r"""Rice's rule defines the bin count as

    \[
        k = \lceil 2n^{\frac{1}{3}} \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    return math.ceil(2 * (len(x) ** (1 / 3)))
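Rice's rule depends only on the number of observations, never on their values. The sketch below reproduces the one-line formula for a few sample sizes; the name `rice_bin_count` is hypothetical, and the input lists are placeholders whose contents do not matter.

```python
import math


def rice_bin_count(data: list[float]) -> int:
    """k = ceil(2 * n**(1/3)), with n the number of observations."""
    return math.ceil(2 * len(data) ** (1 / 3))


for n in (50, 100, 1000):
    # Only len(data) matters, so a list of zeros is enough.
    print(n, rice_bin_count([0.0] * n))
# prints:
# 50 8
# 100 10
# 1000 20
```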

scott(x)

Scott's rule defines the bin width as

\[ h = 3.49\sigma n^{-\frac{1}{3}} \]

where \(\sigma\) is the sample standard deviation of \(x\) and \(n\) is the length of \(x\).

The bin count is given by

\[ k = \lceil \frac{\max{x} - \min{x}}{h} \rceil \]

Parameters:

x (ArrayLike, required): Input data.

Returns:

int: Bin count.

Source code in python/rapidstats/bin.py
def scott(x: ArrayLike) -> int:
    r"""Scott's rule defines the bin width as

    \[
        h = 3.49\sigma n^{-\frac{1}{3}}
    \]

    The bin count is given by

    \[
        k = \lceil \frac{\max{x} - \min{x}}{h} \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    x = pl.Series(x)

    bin_width = (3.49 * x.std()) / (len(x) ** (1 / 3))

    return _bin_width_to_count(x, bin_width)
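A plain-Python sketch of Scott's rule, under two assumptions: that polars' `Series.std()` uses its default `ddof=1` (matched here by `statistics.stdev`), and that `_bin_width_to_count` applies \(k = \lceil (\max x - \min x) / h \rceil\) as documented. The name `scott_bin_count` and the single-bin fallback for a zero width are hypothetical.

```python
import math
import statistics


def scott_bin_count(data: list[float]) -> int:
    n = len(data)
    # Sample standard deviation (n - 1 denominator), assumed to match
    # polars' Series.std() default of ddof=1.
    sigma = statistics.stdev(data)
    # h = 3.49 * sigma * n**(-1/3)
    width = 3.49 * sigma / n ** (1 / 3)
    if width <= 0:
        # Assumption: constant data falls back to a single bin.
        return 1
    return math.ceil((max(data) - min(data)) / width)


print(scott_bin_count(list(range(100))))  # prints 5
```

For the integers 0 to 99, \(\sigma \approx 29.0\) and \(h \approx 21.8\), so the range of 99 needs about 4.54 widths, rounded up to 5 bins.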

sqrt(x)

The square root rule defines the bin count as

\[ k = \lceil\sqrt{n}\rceil \]

where \(n\) is the length of \(x\).

Parameters:

x (ArrayLike, required): Input data.

Returns:

int: Bin count.

Source code in python/rapidstats/bin.py
def sqrt(x: ArrayLike) -> int:
    r"""The square root rule defines the bin count as

    \[
        k = \lceil\sqrt{n}\rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    return math.ceil(math.sqrt(len(x)))
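The square root rule can also be evaluated without floating point. The sketch below is illustrative only: the names are hypothetical and the integer-only variant is not part of rapidstats. It relies on the identity \(\lceil\sqrt{n}\rceil = \lfloor\sqrt{n-1}\rfloor + 1\) for \(n \ge 1\), which sidesteps rounding concerns for very large counts.

```python
import math


def sqrt_bin_count(data: list[float]) -> int:
    """k = ceil(sqrt(n)), computed in floating point as in the source above."""
    return math.ceil(math.sqrt(len(data)))


def sqrt_bin_count_exact(data: list[float]) -> int:
    """Integer-only variant: for n >= 1, ceil(sqrt(n)) == isqrt(n - 1) + 1."""
    n = len(data)
    return 0 if n == 0 else math.isqrt(n - 1) + 1


for n in (1, 99, 100, 101):
    data = [0.0] * n
    print(n, sqrt_bin_count(data), sqrt_bin_count_exact(data))
# prints:
# 1 1 1
# 99 10 10
# 100 10 10
# 101 11 11
```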

sturges(x)

Sturges' rule defines the bin count as

\[ k = \lceil 1 + \log_{2}(n) \rceil \]

where \(n\) is the length of \(x\).

Parameters:

x (ArrayLike, required): Input data.

Returns:

int: Bin count.

Source code in python/rapidstats/bin.py
def sturges(x: ArrayLike) -> int:
    r"""Sturges' rule defines the bin count as

    \[
        k = \lceil 1 + \log_{2}(n) \rceil
    \]

    Parameters
    ----------
    x : ArrayLike

    Returns
    -------
    int
        Bin count
    """
    return math.ceil(math.log2(len(x))) + 1
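The source computes \(\lceil \log_{2}(n) \rceil + 1\), which equals the documented \(\lceil 1 + \log_{2}(n) \rceil\) because adding an integer commutes with the ceiling. The sketch below (hypothetical names, not part of rapidstats) checks that form against an integer-only variant based on the identity \(\lceil \log_{2}(n) \rceil = \) `(n - 1).bit_length()` for \(n \ge 1\).

```python
import math


def sturges_bin_count(data: list[float]) -> int:
    """k = ceil(log2(n)) + 1, which equals ceil(1 + log2(n))."""
    return math.ceil(math.log2(len(data))) + 1


def sturges_bin_count_exact(data: list[float]) -> int:
    """Integer-only variant: for n >= 1, ceil(log2(n)) == (n - 1).bit_length()."""
    return (len(data) - 1).bit_length() + 1


for n in (100, 1000, 1024, 1025):
    data = [0.0] * n
    print(n, sturges_bin_count(data), sturges_bin_count_exact(data))
# prints:
# 100 8 8
# 1000 11 11
# 1024 11 11
# 1025 12 12
```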