Fix NLinear normalization #2757
Conversation
Some tests still fail, and I'm not sure why. They can be fixed by changing the model to … For example:
…es not learn weight for linear time series by design
…rmalize=False because normalize=True is the new default
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2757    +/-   ##
==========================================
- Coverage   43.95%   43.92%    -0.04%
==========================================
  Files         223      223
  Lines       33252    33274       +22
==========================================
- Hits        14617    14614        -3
- Misses      18635    18660       +25
Thanks for the PR and the fixes @turbotimon 🚀 I agree with setting normalize=True as the default.
I left some comments regarding the past covariates normalization, where I believe we should try to find a more long-term solution.
Also, it looks like there is still something fishy going on with normalize=True and auto-regression (forecast horizon n > output_chunk_length). This is already in the current master head, so not introduced by this PR. I left a comment in the tests. Would you mind having a look at that?
seq_last = x[:, -1:, : self.output_dim].detach().clone()
# normalize the target features only (ignore the covariates)
x[:, :, : self.output_dim] = x[:, :, : self.output_dim] - seq_last
# get last values for all x but not future covariates
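For readers following along, here is a minimal sketch of the normalization idea this diff implements (subtract the last observed value of the target channels before the linear layer, add it back to the output). The shapes and the single linear layer are illustrative assumptions, not the exact Darts implementation:

import torch

# Illustrative shapes only: x is (batch, input_chunk_length, n_components), where the
# first `output_dim` components are the target and the rest are past covariates.
batch, icl, ocl, output_dim, n_components = 8, 24, 12, 1, 3
x = torch.randn(batch, icl, n_components)
linear = torch.nn.Linear(icl, ocl)  # stand-in for the model's linear layer

# NLinear-style normalization: remove the last observed target value ...
seq_last = x[:, -1:, :output_dim].detach().clone()
x = x.clone()  # avoid mutating the caller's tensor in place
x[:, :, :output_dim] = x[:, :, :output_dim] - seq_last

# ... project the normalized target over the time dimension ...
y = linear(x[:, :, :output_dim].transpose(1, 2)).transpose(1, 2)

# ... and add the last value back so the forecast lives in the original value space.
y = y + seq_last
print(y.shape)  # torch.Size([8, 12, 1])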
Thanks for giving this a go. However, I believe we should find the more general solution that we discussed in the issue.
Every change that we introduce here will be a "breaking" change for users who want to load the weights of a model stored with an earlier Darts version (in the sense that the weights were trained on an input with a different value space).
So if we also want to add this for future covariates (the non-datetime attrs), then we would have to "break" it again.
In my opinion, either we find a robust solution that covers everything, or we leave the normalization only on the target for now.
By robust I'm thinking about the RINorm changes that we discussed in #2758. That would also allow using the old behavior (target-only) and optionally having fine-grained control over the covariates normalization.
How does that sound?
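For reference, a rough sketch of the reversible-instance-normalization idea behind the RINorm discussion in #2758; this shows only the concept, not the actual Darts RINorm API:

import torch
import torch.nn as nn

class ReversibleInstanceNorm(nn.Module):
    # Sketch of a RevIN-like layer: normalize each instance with its own
    # statistics on the way in, undo the transformation on the way out.
    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, components)
        self.mean = x.mean(dim=1, keepdim=True).detach()
        self.std = x.std(dim=1, keepdim=True).detach() + self.eps
        return (x - self.mean) / self.std

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        # map model outputs back to the original value space
        return y * self.std + self.mean

Applying such a layer only to selected components would give the fine-grained control over target vs. covariates normalization mentioned above.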
I agree to some extent, but I think we shouldn't overthink this model. It is, as the authors say, an "embarrassingly simple model", and I'm not sure how many extra features it can handle. Enabling the normalization for covariates is simply closest to the paper. An exception might be future covariates (at least the actual future part, not historic future covariates), which are handled by a separate layer in this implementation anyway, so it's fine to treat them differently. So I'm not sure it's a good idea to make the implementation more and more complicated if the model most likely cannot handle it.
"normalize": True, | ||
"pl_trainer_kwargs": tfm_kwargs["pl_trainer_kwargs"], | ||
}, | ||
80.0, |
I was curious why this would have a larger error than the non-normalized one, so I tried it out (I ran a simplification of the test that was failing with 50, e.g. test_single_ts, ...):
import matplotlib.pyplot as plt
import numpy as np
from darts.models import NLinearModel
from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load().astype(np.float32)
model = NLinearModel(
    input_chunk_length=24,
    output_chunk_length=12,
    random_state=0,
    n_epochs=100,
    normalize=True,
)
model.fit(series[:-36])
preds = model.predict(n=36, series=series[:-36])
series.plot()
preds.plot()
plt.show()
Seems like auto-regression is not performing as expected (this also happens in the current master):
(plot: the forecast diverges from the AirPassengers series once auto-regression kicks in)
If I see it right, the problem happens somewhere here, where the input_past (in your example len=24) gets filled with 12 zeros on the left, followed by the first output (len=12).
In my opinion, this is wrong and the zeros should be the past series instead. So the step size should be only output_chunk_length (12) and not input_chunk_length (24). I tried using roll_size=12 to overcome this, but got an exception...
You can see this drift effect clearly with a constant series: the input_past will then be e.g. [0.0]*12 + [value]*12, and therefore the forecast drops by value/2 in every auto-regressive step.
import matplotlib.pyplot as plt
from darts.models import NLinearModel
import darts.utils.timeseries_generation as tg

series = tg.constant_timeseries(length=240, value=10, freq="M")  # value=10.0
model = NLinearModel(
    input_chunk_length=24,  # drift does not happen when same size as output_chunk_length
    output_chunk_length=12,
    random_state=1,
    n_epochs=100,
    normalize=True,
)
model.fit(series[:-36])
preds = model.predict(
    n=36,
    series=series[:-36],
    # roll_size=12  # exception (but maybe I don't fully understand the parameter)
)
series.plot()
preds.plot()
plt.show()
It is, however, not present when input_chunk_length is the same size as output_chunk_length.
Is this a general bug?
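To make the value/2 drop concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that the trained linear layer behaves like an average over the input window, which is one weight configuration consistent with a constant training series:

import numpy as np

value = 10.0  # constant series value
ocl = 12      # output_chunk_length

# Auto-regressive input window as described above:
# 12 left-padded zeros followed by the 12 values of the previous forecast.
window = np.array([0.0] * ocl + [value] * ocl)

# NLinear-style normalization: subtract the last value of the window.
seq_last = window[-1]
normalized = window - seq_last  # [-10.0]*12 + [0.0]*12

# Illustrative assumption: the linear layer acts like a simple average of the window.
prediction = normalized.mean() + seq_last

print(prediction)  # 5.0 -> the forecast drops by value/2 in one auto-regressive step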
For me, setting roll_size=12 works fine (still the issue though with auto-regression). Also, the roll_size is set to output_chunk_length if the user doesn't provide it. See here:
roll_size = self.output_chunk_length
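As a quick illustration of that fallback (reusing the model and series from the snippets above), leaving roll_size unset should behave the same as passing output_chunk_length explicitly:

# roll_size defaults to output_chunk_length when not provided,
# so these two calls are expected to be equivalent.
preds_default = model.predict(n=36, series=series[:-36])
preds_explicit = model.predict(n=36, series=series[:-36], roll_size=12)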
Great! Yes, now I see why it is necessary to clone x as well. Two things:
- Maybe a comment for x.clone() is reasonable, because we have seen this is not obvious?
- I think detach() (even if it doesn't change the outcome) is a helpful optimization because it gets excluded from the differentiation graph. See https://pytorch.org/docs/stable/generated/torch.clone.html
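A tiny PyTorch snippet illustrating that point: a plain clone() remains part of the autograd graph, while detach().clone() copies the data but is excluded from it:

import torch

x = torch.randn(3, requires_grad=True)

cloned = x.clone()             # still tracked by autograd
detached = x.detach().clone()  # same data, but excluded from the differentiation graph

print(cloned.requires_grad)    # True
print(detached.requires_grad)  # False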
Co-authored-by: Dennis Bader <dennis.bader@gmx.ch>
Hi @turbotimon, I added the comment related to cloning x.
Everything is ready now. Thanks again for the great PR and apologies that this took a while :) 🚀
Checklist before merging this PR:
Fixes #2699.
Summary
- Fixes normalization for the shared_weights=True path (not covered by fix nlinear norm #2724)
- normalize=True is the default now, with an updated docstring.
Full discussion and details, see #2699.
Other Information