Closed
Hey,
I was playing about with histogram bin widths recently and wanted to test my implementation of the Freedman-Diaconis estimator by checking it against np.histogram_bin_edges with bins='fd'. The bin width I calculate differs from the spacing of the edges NumPy returns, so I suspect some rounding is going on, and I just want to check whether that is intentional.
I suspect rounding because my results match when the length of my data is a cube number. In other cases my answers differ (see the example below).
Apologies if this is expected behaviour or if the bug is in my method.
Reproducing code example:
import numpy as np

def freedmanDiaconus(data):
    # number of data points
    n = len(data)
    # calculate quartiles
    x_q1, x_q3 = np.percentile(data, [25, 75])
    # calculate IQR
    x_iqr = x_q3 - x_q1
    # Freedman-Diaconis bin width: 2 * IQR * n^(-1/3)
    freedman_diaconus = 2*x_iqr*n**(-1/3)
    return freedman_diaconus

x = [1,2,3,4,5,6,7,8,9]
np_bins = np.histogram_bin_edges(x, bins='fd')
np_bin_width = np_bins[1] - np_bins[0]
fd_width = freedmanDiaconus(x)
print(fd_width, np_bin_width)
Output:
3.8459988541530894 2.6666666666666665
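One way to test the rounding suspicion is sketched below. It assumes (this is my guess, not something confirmed in this report) that the estimator's width is only used to pick a whole number of bins, rounded up, and that the data range is then split evenly into that many bins. The names raw_width, n_bins and adjusted_width are purely illustrative.

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Raw Freedman-Diaconis width: 2 * IQR * n^(-1/3)
q1, q3 = np.percentile(x, [25, 75])
raw_width = 2 * (q3 - q1) * len(x) ** (-1 / 3)

# Assumption: the bin count is the data range divided by the raw width,
# rounded up to a whole number, and the range is then divided evenly.
data_range = max(x) - min(x)
n_bins = int(np.ceil(data_range / raw_width))
adjusted_width = data_range / n_bins

# Spacing of the edges NumPy actually returns
np_bins = np.histogram_bin_edges(x, bins='fd')
np_bin_width = np_bins[1] - np_bins[0]

print(raw_width, n_bins, adjusted_width, np_bin_width)
```

If the assumption holds, adjusted_width should agree with np_bin_width while raw_width reproduces the first number in the output above. For this data the range is 8 and the raw width is about 3.85, so rounding up gives 3 bins and a spacing of 8/3, which is the second number in the output.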
NumPy/Python version information:
1.19.5 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0]