-
-
Save ruoyu0088/70effade57483355bbd18b31dc370f2a to your computer and use it in GitHub Desktop.
Thanks for posting this!
Here are a few suggested edits to pick the number of segments automatically by optimizing AIC and/ or BIC. The implementation is similar to the heuristic strategy presented in this paper: https://discovery.ucl.ac.uk/id/eprint/10070516/1/AIC_BIC_Paper.pdf
def segments_fit(X, Y, maxcount):
xmin = X.min()
xmax = X.max()
n = len(X)
AIC_ = float('inf')
BIC_ = float('inf')
r_ = None
for count in range(1, maxcount+1):
seg = np.full(count - 1, (xmax - xmin) / count)
px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.1].mean() for x in px_init])
def func(p):
seg = p[:count - 1]
py = p[count - 1:]
px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
return px, py
def err(p): # This is RSS / n
px, py = func(p)
Y2 = np.interp(X, px, py)
return np.mean((Y - Y2)**2)
r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
# Compute AIC/ BIC.
AIC = n * np.log10(err(r.x)) + 4 * count
BIC = n * np.log10(err(r.x)) + 2 * count * np.log(n)
if((BIC < BIC_) & (AIC < AIC_)): # Continue adding complexity.
r_ = r
AIC_ = AIC
BIC_ = BIC
else: # Stop.
count = count - 1
break
return func(r_.x) ## Return the last (n-1)
Great stuff, thanks! I used this to model draining out a tank to capture the formation of vorticies.
That's really helpful, thanks. I want to use a part from your implementation in a project I am working on. Is that ok? Could you add a license, or specify whether you want some reference etc?
No objections from me - feel free to use anything I've contributed under MIT no attribution license: https://opensource.org/licenses/MIT-0.
Be sure to credit ruoyu0088 for their contributions as they wish!
Hi! Sometimes, depending on the number of segments and the signals I pass, I get None in the return value. I believe it stems from the py_init initializing some indices to NaN. Why could it be? It leads me to a question: why do you multiply by 0.01?
Thank you!
@KateZi the multiplication factor of 0.01 ensures that only a small set of y-axis data (here 1% around the initial x-values) is selected to initialize the y-values corresponding to the initial x-values.
I had a data set which had a relatively larger difference between x-intervals than 1%, wherein I had to increase this parameter value to 0.02 (2%). That ensured that there are some y-values to initialize the optimization parameters.
Hello, thanks for the material, really useful! I have a question: is there any such code available to accomodate for 3 dimensions?
Thanks for posting this!
Here are a few suggested edits to pick the number of segments automatically by optimizing AIC and/ or BIC. The implementation is similar to the heuristic strategy presented in this paper: https://discovery.ucl.ac.uk/id/eprint/10070516/1/AIC_BIC_Paper.pdf
def segments_fit(X, Y, maxcount): xmin = X.min() xmax = X.max() n = len(X) AIC_ = float('inf') BIC_ = float('inf') r_ = None for count in range(1, maxcount+1): seg = np.full(count - 1, (xmax - xmin) / count) px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax] py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.1].mean() for x in px_init]) def func(p): seg = p[:count - 1] py = p[count - 1:] px = np.r_[np.r_[xmin, seg].cumsum(), xmax] return px, py def err(p): # This is RSS / n px, py = func(p) Y2 = np.interp(X, px, py) return np.mean((Y - Y2)**2) r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead') # Compute AIC/ BIC. AIC = n * np.log10(err(r.x)) + 4 * count BIC = n * np.log10(err(r.x)) + 2 * count * np.log(n) if((BIC < BIC_) & (AIC < AIC_)): # Continue adding complexity. r_ = r AIC_ = AIC BIC_ = BIC else: # Stop. count = count - 1 break return func(r_.x) ## Return the last (n-1)
Hi dankoc! Thanks for sharing the code which really helps me. One little clarification, shouldn't it be natural log in the AIC/BIC formula?
ruoyu0088 Thanks for the original version - it was exactly what I needed for a piecewise linear approximation of a CDF I wanted.
This can be done automatically by Linear Trees... A sklearn compatible implementation is available here