Correlation
Classes:
| Name | Description |
|---|---|
| CorrelationBatchOptions | Options to control batching in rapidstats.correlation_matrix. |
Functions:
| Name | Description |
|---|---|
| correlation_matrix | Compute the null-aware correlation matrix between two lists of columns. |
CorrelationBatchOptions
dataclass
Options to control batching in rapidstats.correlation_matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch_size | int \| float | The number of combinations (where a combination is a pair of features) to compute in each batch. If a float between 0 and 1, it is interpreted as a fraction of all combinations, by default 0.1 | 0.1 |
| cache_dir | str \| Path \| None | The directory in which to save the results of each batch. If None, creates a folder called "rapidstats_correlation_cache" in the current working directory, by default None | None |
| start_iteration | int \| None | The iteration to start at. If None, will start at the latest iteration available in | None |
| delete_ok | bool | Whether to delete | False |
| quiet | bool | Whether to print progress information, by default False | False |
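The way batch_size splits the work can be sketched in plain Python. This is only an illustration of the description above, not the library's code: the helper names `resolve_batch_size` and `iter_batches` are hypothetical, and two details are assumptions, namely that a fractional value is rounded up and that the combinations are unordered pairs of columns.

```python
import math
from itertools import combinations, islice


def resolve_batch_size(batch_size, n_pairs):
    """Turn an int or fractional batch_size into a number of pairs per batch.

    Assumption: a float strictly between 0 and 1 is a fraction of all pairs,
    rounded up so that every batch holds at least one pair.
    """
    if isinstance(batch_size, float) and 0 < batch_size < 1:
        return max(1, math.ceil(batch_size * n_pairs))
    return max(1, int(batch_size))


def iter_batches(pairs, size):
    """Yield consecutive lists of at most `size` pairs."""
    it = iter(pairs)
    while batch := list(islice(it, size)):
        yield batch


columns = ["a", "b", "c", "d", "e"]
pairs = list(combinations(columns, 2))       # 10 unordered feature pairs
size = resolve_batch_size(0.25, len(pairs))  # 25% of 10 pairs, rounded up
batches = list(iter_batches(pairs, size))

for i, batch in enumerate(batches):
    print(i, batch)
```

Under these assumptions, each batch would be a natural unit to write to cache_dir, which is what would let a later run resume from start_iteration instead of recomputing earlier batches.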
Source code in python/rapidstats/_corr.py
correlation_matrix(data, l1=None, l2=None, method='pearson', format='wide', index='', batch_options=None)
Warning
If you know that your data has no nulls, you should use np.corrcoef instead.
While this function will return the correct result and is reasonably fast,
computing the null-aware correlation matrix will always be slower than assuming
that there are no nulls.
Compute the null-aware correlation matrix between two lists of columns. If both
lists are None, then the correlation matrix is over all columns in the input
DataFrame. If l1 is not None, and is a list of 2-tuples, l1 is interpreted
as the combinations of columns to compute the correlation for.
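To make "null-aware" concrete, here is a minimal pure-Python sketch of a Pearson correlation that skips missing values. It assumes that null-aware means pairwise deletion (a row is dropped for a given pair of columns only when one of the two values is missing), which is a common convention but is not confirmed by this page. The function name `pairwise_pearson` and the sample data are hypothetical, and the library's actual implementation will differ.

```python
import math


def pairwise_pearson(x, y):
    """Pearson correlation over rows where both values are present.

    Assumption: "null-aware" means pairwise deletion, i.e. a row is dropped
    for a pair of columns only if either of the two values is None.
    """
    kept = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(kept)
    if n < 2:
        return math.nan
    mean_x = sum(a for a, _ in kept) / n
    mean_y = sum(b for _, b in kept) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in kept)
    var_x = sum((a - mean_x) ** 2 for a, _ in kept)
    var_y = sum((b - mean_y) ** 2 for _, b in kept)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


data = {
    "a": [1.0, 2.0, None, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, None, 10.0],
    "c": [5.0, None, 3.0, 2.0, 1.0],
}

# Wide layout: one row and one column per feature.
matrix = {r: {c: pairwise_pearson(data[r], data[c]) for c in data} for r in data}

for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Each pair of columns here uses a different subset of rows, which is the extra bookkeeping that makes a null-aware matrix slower than a computation that assumes complete data.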
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | IntoFrameT | The input data | required |
| l1 | Union[list[str], list[tuple[str, str]]] | A list of columns to appear as the columns of the correlation matrix, by default None | None |
| l2 | list[str] | A list of columns to appear as the rows of the correlation matrix, by default None | None |
| method | Literal['pearson', 'spearman'] | How to calculate the correlation, by default "pearson" | 'pearson' |
| format | Literal['wide', 'long'] | The format the correlation matrix is returned in. If "wide", it is the classic correlation matrix. If "long", it is a DataFrame with the columns (Added in version 0.4.0) | 'wide' |
| index | str | The name of the (Added in version 0.2.0; renamed from "index_name" to "index" in version 0.4.0) | '' |
| batch_options | CorrelationBatchOptions \| None | Parameters that control how to compute the correlation matrix in a batched manner. If None, does not use batching, by default None | None |
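The two values of method differ in what they measure: Pearson captures linear association, while Spearman's coefficient is the Pearson correlation of the rank-transformed values. The sketch below shows that relationship in plain Python on data without nulls. The helper names are hypothetical, ties receive average ranks (an assumption about tie handling), and none of this is taken from the library's source.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation for complete, non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]  # y = x**3: monotonic but not linear

print("pearson :", round(pearson(x, y), 3))
print("spearman:", round(spearman(x, y), 3))
```

Because y increases monotonically with x, the ranks agree exactly and the Spearman value reaches 1, while the Pearson value stays below 1 since the relationship is not linear.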
Returns:
| Type | Description |
|---|---|
| DataFrame | A correlation matrix with |
Added in version 0.0.24
Source code in python/rapidstats/_corr.py