forecasting | Time Series Forecasting Best Practices & Examples | Machine Learning library
kandi X-RAY | forecasting Summary
- Create features from pred_round
- Returns the week of the given date_time
- Generate a dataframe from a list of dictionaries
- Compute moving averages for a moving window
- Return a pandas dataframe with the given lag
- Combine features
- Define the training data schema
- Specify data schema
- Check if time format is correct
- Check that the feature column is a static feature column
- Check the frequency of a time series
- Check that the column names in df_col_names are valid
- Download OJdata files
- Download a file
- Returns the git repository path
- Get or create an Azure workspace
- Return an auth type
- Return the week of the given date_time
- Creates a dataframe from a cartesian product
forecasting Key Features
forecasting Examples and Code Snippets
gdown https://drive.google.com/uc?id=1PXzIOHGScWqhAWXnO4q70fch8wF1DJUx
python evaluate_outputs.py -gt ./outputs/ground_truth/ -pred ./outputs/sted/
import pandas as pd
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 8, 9],
    "x": [1, 2, 3, 4, 10, 11],
    "y": [5, 6, 7, 8, 12, 13],
})
from tsfresh.utilities.dataframe_functions import roll_time_series
df_rolled = roll_time_series(df, column_id="id", column_sort="time")
from tsfresh import extract_features
df_features = extract_features(df_rolled, column_id="id", column_sort="time")
Trending Discussions on forecasting
QUESTION
So I'm relatively new to R, and I was wondering what's wrong with my loop for multi-step time series forecasting.
I first have this loop to mimic the information set at time τ: it estimates the models on a rolling window of 1,000 observations and makes a one-step-ahead out-of-sample forecast, for 726 out-of-sample observations.
require(highfrequency)
require(quantmod)
require(xts)
getSymbols("^VIX")
VIX_fcst <- VIX[, "VIX.Close"]
VIX_fcst$pred <- NA
for (i in 2000:2726) {
HAREstimated <- HARmodel(data = VIX_fcst[i: (i+ 999), "VIX.Close"], periods = c(1, 5 , 22), type = "HAR", inputType = "RM")
VIX_fcst$pred[i + 1000] <- predict(HAREstimated)
}
Basically, I run a regression on 1,000 observations and store the t+1 prediction in a separate column, and it works great.
Now I would like to take it further with a 5-step-ahead multi-step forecast by adding an inner loop that builds the t+5 prediction from the predicted values, rather than the actual prices, for t+1, t+2, ..., t+5. Here's an example for the first T+5 prediction of the 726 observations:
for (i in 2000:2004) {
HAREstimated2 <- HARmodel(data = VIX_fcst[i: (i+ 999), "VIX.Close"], periods = c(1, 5 , 22), type = "HAR", inputType = "RM")
VIX_fcst$VIX.Close[i + 1000] <- predict(HAREstimated2)
}
I thought storing the value in the VIX.Close column would do the trick, but the loop doesn't seem to pick up the predictions, which brings me back to the one-step-ahead results.
Any ideas?
ANSWER
Answered 2022-Apr-05 at 01:12
I finally found the solution myself; here it is for anyone who runs into this problem:
for (i in 2000:2722) {
  for (j in i:(i + 4)) {
    if (j < (i + 4)) {
      # intermediate steps: write the prediction into VIX.Close so the next
      # iteration's rolling window uses the predicted value
      HAREstimated <- HARmodel(data = VIX_fcst[j:(j + 999), "VIX.Close"], periods = c(1, 5, 22), type = "HAR", inputType = "RM")
      VIX_fcst$VIX.Close[j + 1000] <- predict(HAREstimated)
    }
    else {
      # final step: store the t+5 forecast in pred, then restore the actual
      # closing prices before moving the window forward
      HAREstimated <- HARmodel(data = VIX_fcst[j:(j + 999), "VIX.Close"], periods = c(1, 5, 22), type = "HAR", inputType = "RM")
      VIX_fcst$pred[j + 1000] <- predict(HAREstimated)
      VIX_fcst$VIX.Close <- VIX[, "VIX.Close"]
    }
  }
}
Note that you need to remove 4 observations in the sample you want to forecast.
QUESTION
Given a time series, I have a multi-step forecasting task where I want to produce as many forecasts as there are time steps in a given sequence of the time series. If I have the following model:
input1 = Input(shape=(n_timesteps, n_channels))
lstm = LSTM(units=100, activation='relu')(input1)
outputs = Dense(n_timesteps, activation="softmax")(lstm)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss="mse", optimizer="adam",
metrics=["accuracy"])
The n_timesteps in the Dense layer means that I will have n_timesteps predictions. But if I wrap the Dense layer in a TimeDistributed layer (or, equivalently, set return_sequences=True in the LSTM layer), does the number of units still have to be n_timesteps, or is it 1, since with TimeDistributed I would be applying the Dense layer to all the time steps in the sequence?
ANSWER
Answered 2022-Mar-30 at 16:16
Based on the example you posted, the TimeDistributed will essentially apply a Dense layer with a softmax activation function to each timestep:
import tensorflow as tf
n_timesteps = 10
n_channels = 30
input1 = tf.keras.layers.Input(shape=(n_timesteps, n_channels))
lstm = tf.keras.layers.LSTM(units=100, activation='relu', return_sequences=True)(input1)
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_channels, activation="softmax"))(lstm)
model = tf.keras.Model(inputs=input1, outputs=outputs)
Note that the fully connected layer's output size is equal to n_channels, in order to give each channel a fair chance of being predicted at timestep n.
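As a quick sanity check (a minimal sketch, assuming the model defined just above with n_timesteps = 10 and n_channels = 30), you can confirm that the wrapped Dense layer produces one distribution over the channels per timestep:
# the time dimension is preserved, so there is one softmax vector per timestep
print(model.output_shape)  # expected: (None, 10, 30)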
If you are working on a multi-label problem, you can try something like this:
import tensorflow as tf
n_timesteps = 10
features = 3
input1 = tf.keras.layers.Input(shape=(n_timesteps, features))
lstm = tf.keras.layers.LSTM(units=100, activation='relu', return_sequences=False)(input1)
outputs = tf.keras.layers.Dense(n_timesteps, activation="sigmoid")(lstm)
model = tf.keras.Model(inputs=input1, outputs=outputs)
x = tf.random.normal((1, n_timesteps, features))
y = tf.random.uniform((1, n_timesteps), dtype=tf.int32, maxval=2)
print(x)
print(y)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x, y, epochs=2)
QUESTION
I have a .csv database file which looks like this:
Day Hour N1 N2 N3 N4 N5 ... N14 N15 N16 N17 N18 N19 N20
0 1996-03-18 15:00 4 9 10 16 21 ... 48 62 66 68 73 76 78
1 1996-03-19 15:00 6 12 15 19 28 ... 63 64 67 69 71 75 77
2 1996-03-21 15:00 2 4 6 7 15 ... 52 54 69 72 73 75 77
3 1996-03-22 15:00 3 8 15 17 19 ... 49 60 61 64 67 68 75
4 1996-03-25 15:00 2 10 11 14 18 ... 55 60 61 66 67 75 79
... ... ... .. .. .. .. .. ... ... ... ... ... ... ... ...
13596 2022-01-04 22:50 17 18 22 26 27 ... 64 65 71 72 73 76 80
13597 2022-01-05 15:00 1 5 8 14 15 ... 47 54 59 67 70 72 76
13598 2022-01-05 22:50 6 7 14 15 16 ... 54 55 59 61 70 71 80
13599 2022-01-06 15:00 9 10 11 17 28 ... 51 55 65 67 72 76 78
13600 2022-01-06 22:50 1 2 6 9 11 ... 51 52 54 67 68 73 75
I have found this article: https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
But I am trying to develop a modified version of that 1D CNN model, using a softmax function on the last layer and SparseCategoricalCrossentropy() as the loss function, and also adding new functions to the code to make it different.
This is my code so far and the model I am trying to build and use:
# multivariate output 1d cnn example
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # or any {'0', '1', '2'}
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
# multivariate output 1d cnn example
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import *
from tensorflow.keras.losses import *
from tensorflow.keras.layers import *
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.callbacks import ModelCheckpoint
# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', Learning Rate: {:.7f}'.format(lr))
printlr = printlearningrate()
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
df = pd.read_csv('DrawsDB.csv')
print(df)
# df['Time'] = df[['Day', 'Hour']].agg(' '.join, axis=1)
df.insert(0, 'Time', df[['Day', 'Hour']].agg(' '.join, axis=1))
df.drop(columns=['Day', 'Hour'], inplace=True)
df.set_index('Time', inplace=True)
print(df)
numpy_array = df.to_numpy()
print(type(numpy_array))
print(numpy_array)
# choose a number of time steps
n_steps = 10
# convert into input/output
X, y = split_sequences(numpy_array, n_steps)
print(X.shape, y.shape)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# Reduce learning rate when nothing happens to lower more the loss:
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.9888888888888889,
patience=10, min_lr=0.0000001, verbose=1)
epochs = 10
# saving best model every epoch with ModelCheckpoint:
checkpoint_filepath = 'C:\\Path\\To\\Saved\\CheckPoint\\model\\'
model_checkpoint_callback = ModelCheckpoint(
filepath=checkpoint_filepath,
monitor='loss',
save_best_only=True,
save_weights_only=True,
verbose=1)
# define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation=LeakyReLU(), input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation=LeakyReLU()))
model.add(Dense(n_features))
model.compile(optimizer=Nadam(lr=0.09), loss=SparseCategoricalCrossentropy(),
metrics=['accuracy', mean_squared_error, mean_absolute_error, mean_absolute_percentage_error])
# fit model
model.fit(X, y, epochs=10, verbose=2, callbacks=[printlr, reduce_lr, model_checkpoint_callback])
The split_sequences function, as its name says, splits the database by taking just N rows from it as input and using the entire (N+1)-th row as the output to predict (see the small demo below).
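To make that concrete, here is a small standalone demo of the same sliding-window split, using made-up numbers (it simply repeats the split_sequences logic so it can run on its own):
import numpy as np

def split_sequences_demo(sequences, n_steps):
    # identical logic to split_sequences above, repeated here so the demo is standalone
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + n_steps
        if end_ix > len(sequences) - 1:
            break
        X.append(sequences[i:end_ix, :])
        y.append(sequences[end_ix, :])
    return np.array(X), np.array(y)

demo = np.arange(12).reshape(6, 2)                # 6 time steps, 2 columns
X_demo, y_demo = split_sequences_demo(demo, n_steps=3)
print(X_demo.shape, y_demo.shape)                 # (3, 3, 2) (3, 2): 3 rows in, the next full row out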
However, I think there is a problem, because I get this error every time I try to run the Python script:
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,20] and labels shape [640]
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits
(defined at C:\Users\UserName\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\backend.py:5114)
]] [Op:__inference_train_function_1228]
Any idea on how to fix this problem, please?
Thank you in advance!
ANSWER
Answered 2022-Feb-06 at 19:25
Assuming the labels are integers, they have the wrong shape for SparseCategoricalCrossentropy. Check the docs. Try converting your y to one-hot encoded labels:
y = tf.keras.utils.to_categorical(y, num_classes=20)
and change your loss function to CategoricalCrossentropy:
model.compile(optimizer=Nadam(lr=0.09), loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy', mean_squared_error, mean_absolute_error, mean_absolute_percentage_error])
and it should work.
QUESTION
I'm currently using an ARIMA model, SARIMAX(0,1,0), to predict a stock price. I want to forecast stock prices on the test dataset with a 95% confidence interval. I'm following this tutorial posted in July 2021, but some things have changed and I can't figure out what.
The original code is as follows:
# Forecast
fc, se, conf = fitted.forecast(321, alpha=0.05) # 95% conf
# Make as pandas series
fc_series = pd.Series(fc, index=test_data.index)
lower_series = pd.Series(conf[:, 0], index=test_data.index)
upper_series = pd.Series(conf[:, 1], index=test_data.index)
# Plot
plt.figure(figsize=(10,5), dpi=100)
plt.plot(train_data, label='training data')
plt.plot(test_data, color = 'blue', label='Actual Stock Price')
plt.plot(fc_series, color = 'orange',label='Predicted Stock Price')
plt.fill_between(lower_series.index, lower_series, upper_series,
color='k', alpha=.10)
plt.title('ARCH CAPITAL GROUP Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('ARCH CAPITAL GROUP Stock Price')
plt.legend(loc='upper left', fontsize=8)
plt.show()
My code is as follows:
model = sm.tsa.statespace.SARIMAX(df_log, trend='c', order=(0,1,0))
fitted = model.fit(disp=False)
print(fitted.summary())
result = fitted.forecast(57, alpha =0.05)
# Make as pandas series
fc_series = pd.Series(result[564:620],test.index)
lower_series = pd.Series(result[564], test.index)
upper_series = pd.Series(result[620], test.index)
# Plot
plt.figure(figsize=(10,5), dpi=100)
plt.plot(df_log, label='training data')
plt.plot(test, color = 'blue', label='Actual Stock Price')
plt.plot(fc_series, color = 'orange',label='Predicted Stock Price')
plt.fill_between(lower_series.index, lower_series, upper_series,
color='gray', alpha=.10)
plt.title('TSLA Stock Price Prediction')
plt.xlabel('Date')
plt.ylabel('TSLA Stock Price')
plt.legend(loc='best', fontsize=8)
plt.show()
I wanted the graph to look similarly as follows:
However, the predicted stock price doesn't show up on mine. When I try to plot the predicted stock price on its own, nothing shows at all. It seems the series failed to adopt the test.index consisting of dates.
Please help T.T
ANSWER
Answered 2022-Mar-28 at 11:23
Note that to set the forecast indices and confidence intervals, we subtract 57 from the total number of elements. The upper and lower confidence intervals are also requested so that they can be drawn (conf_ins = fitted.get_forecast(57).summary_frame()).
import numpy as np  # needed for np.arange below
import statsmodels.api as sm
import pandas_datareader.data as web
import matplotlib.pyplot as plt
df = web.DataReader('^GSPC', 'yahoo', start='2020-05-15', end='2021-10-01')
total = len(df)
aaa = 57
hist = total - aaa
model = sm.tsa.statespace.SARIMAX(df['Close'].values[:hist], trend='c', order=(0,1,0))
fitted = model.fit(disp=False)
result = fitted.forecast(aaa, alpha =0.05)
conf_ins = fitted.get_forecast(aaa).summary_frame()
ind = np.arange(total)
fig, ax = plt.subplots()
ax.plot(ind, df['Close'].values, label='Actual Stock Price')
ax.plot(ind[hist:], result,label='Predicted Stock Price')
ax.plot(ind[hist:], conf_ins['mean_ci_lower'])
ax.plot(ind[hist:], conf_ins['mean_ci_upper'])
ax.legend()
fig.autofmt_xdate()
plt.show()
Here the time is set on the axis, but the forecasts are then drawn stepwise; I can't figure out yet what this is related to, so I leave both options. If it fits, then please vote.
import numpy as np  # needed for np.arange below
import statsmodels.api as sm
import pandas_datareader.data as web
import matplotlib.pyplot as plt
df = web.DataReader('^GSPC', 'yahoo', start='2020-05-15', end='2021-10-01')
total = len(df)
aaa = 57
hist = total - aaa
model = sm.tsa.statespace.SARIMAX(df['Close'].values[:hist], trend='c', order=(0,1,0))
fitted = model.fit(disp=False)
result = fitted.forecast(aaa, alpha = 0.05)
conf_ins = fitted.get_forecast(aaa).summary_frame()
ind = np.arange(total)
fig, ax = plt.subplots()
ax.plot(df.index, df['Close'].values, label='Actual Stock Price')
ax.plot(df.index[hist:], result, label='Predicted Stock Price')
ax.plot(df.index[hist:], conf_ins['mean_ci_lower'])
ax.plot(df.index[hist:], conf_ins['mean_ci_upper'])
ax.legend()
fig.autofmt_xdate()
plt.show()
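If you prefer the shaded confidence band from the original tutorial over two separate lines, here is a minimal sketch (assuming the df, hist, result and conf_ins objects created in the code above):
fig, ax = plt.subplots()
ax.plot(df.index, df['Close'].values, label='Actual Stock Price')
ax.plot(df.index[hist:], result, label='Predicted Stock Price')
# shade the area between the lower and upper bounds of the confidence interval
ax.fill_between(df.index[hist:], conf_ins['mean_ci_lower'], conf_ins['mean_ci_upper'],
                color='k', alpha=0.1, label='confidence interval')
ax.legend()
fig.autofmt_xdate()
plt.show()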
QUESTION
I want to use Dask for operations of the form
df.groupby(some_columns).apply(some_function)
where some_function() may compute some summary statistics, perform timeseries forecasting, or even just save the group to a single file in AWS S3.
Dask documentation states (and several other StackOverflow answers cite) that groupby-apply is not appropriate for aggregations:
Pandas’ groupby-apply can be used to apply arbitrary functions, including aggregations that result in one row per group. Dask’s groupby-apply will apply func once to each partition-group pair, so when func is a reduction you’ll end up with one row per partition-group pair. To apply a custom aggregation with Dask, use dask.dataframe.groupby.Aggregation.
It is not clear whether Aggregation supports operations on multiple columns. However, this DataFrames tutorial seems to do exactly what I'm suggesting, with roughly some_function = lambda x: LinearRegression().fit(...). The example seems to work as intended, and I've similarly had no problems so far with e.g. some_function = lambda x: x.to_csv(...).
Under what conditions can I expect that some_function will be passed all rows of the group? If this is never guaranteed, is there a way to break the LinearRegression example? And most importantly, what is the best way to handle these use cases?
ANSWER
Answered 2021-Dec-10 at 06:13
This is speculative, but maybe one way to work around the scenario where partition-group leads to rows from a single group split across partitions is to explicitly re-partition the data in a way that ensures each group is associated with a unique partition.
One way to achieve that is by creating an index that is the same as the group identifier column. This in general is not a cheap operation, but it can be helped by pre-processing the data in a way that the group identifier is already sorted.
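As a rough illustration of that idea, here is a minimal sketch (my own example with made-up column names id and x, not taken from the question): setting the index to the group key shuffles each group into a single partition before the per-group work.
import dask.dataframe as dd
import pandas as pd

# Toy data: "id" is the group key, "x" is the value column.
pdf = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3], "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
ddf = dd.from_pandas(pdf, npartitions=3)

# Setting the index to the group key shuffles the data so rows sharing an id
# end up in the same partition (an expensive step, cheaper if data is pre-sorted).
ddf_by_id = ddf.set_index("id")

# Each partition now holds whole groups, so a plain pandas groupby inside
# map_partitions sees complete groups and returns one row per group.
result = ddf_by_id.map_partitions(
    lambda part: part.groupby(level="id")["x"].mean(),
    meta=("x", "float64"),
)
print(result.compute())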
QUESTION
I'm going through the documentation of the sktime package. One thing I just cannot find is the feature importance (that we'd get with sklearn models) or model summary (like the one we can obtain from statsmodels). Is it something that is just not implemented yet?
It seems that this functionality is implemented for models like AutoETS or AutoARIMA.
from matplotlib import pyplot as plt
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.base import ForecastingHorizon
y = load_airline()
y_train,y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
from sktime.forecasting.ets import AutoETS
model = AutoETS(trend='add',seasonal='mul',sp=12)
model.fit(y_train,fh=y_test.index)
model.summary()
I wonder if these summaries are accessible from instances like ForecastingPipeline.
ANSWER
Answered 2022-Mar-21 at 09:35
Ok, I was able to solve it myself. I'm really glad the functionality is there!
The source code for ForecastingPipeline indicates that an instance of this class has an attribute steps_ - it holds the fitted instance of the model in a pipeline.
from sktime.forecasting.compose import ForecastingPipeline
model = ForecastingPipeline(steps=[
("forecaster", AutoETS(sp=1))])
model.fit(y_train)
model.steps_[-1][1].summary() # model.steps[-1][1].summary() would throw an error
The output of model.steps_ is [('forecaster', AutoETS())] (as mentioned before, AutoETS() is in this case already fitted).
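A related convenience worth mentioning (my addition, not part of the original answer, and assuming a reasonably recent sktime version): fitted estimators also expose the parameters learned during fit() via the generic get_fitted_params() method, which avoids indexing into steps_ by position.
# Assumption: `model` is the fitted ForecastingPipeline from the answer above.
# get_fitted_params() returns a dict of fitted parameters, including those of
# nested components such as the AutoETS forecaster.
fitted_params = model.get_fitted_params()
print(list(fitted_params.keys()))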
QUESTION
I'm trying to classify time series data using SQL. I have data for a reference data point that occurs over 3 years, so the reference occurs 36 times, once for each month. Sometimes the quantity is 0; other times it may be 25 or even higher for a given row. What I want to know is how to calculate these equations using SQL (MSSQL in particular).
Then, similarly, I want to classify the data into Erratic, Smooth, Lumpy, and/or Intermittent as seen here.
Smooth demand (ADI < 1.32 and CV² < 0.49). The demand is very regular in time and in quantity. It is therefore easy to forecast and you won’t have trouble reaching a low forecasting error level.
Intermittent demand (ADI >= 1.32 and CV² < 0.49). The demand history shows very little variation in demand quantity but a high variation in the interval between two demands. Though specific forecasting methods tackle intermittent demands, the forecast error margin is considerably higher.
Erratic demand (ADI < 1.32 and CV² >= 0.49). The demand has regular occurrences in time with high quantity variations. Your forecast accuracy remains shaky.
Lumpy demand (ADI >= 1.32 and CV² >= 0.49). The demand is characterized by a large variation in quantity and in time. It is actually impossible to produce a reliable forecast, no matter which forecasting tools you use. This particular type of demand pattern is unforecastable.
Here is the query that produces the table that I am working with.
SELECT
distinct
CHAN_ID
,PROD_CD
,CP_REF
,PARENT
,ORDERED_DATE
,QUANTITY
FROM DF_ALL_DEMAND_BY_ROW_V
where parent is not null
ANSWER
Answered 2022-Feb-09 at 22:00
with data as (
select
CP_REF,
count(*) * 1.0 /
nullif(count(case when QUANTITY > 0 then 1 end), 0) as ADI,
stdevp(QUANTITY) / nullif(avg(QUANTITY), 0) as COV
from DF_ALL_DEMAND_BY_ROW_V
where parent is not null
group by CP_REF
)
select
CP_REF, ADI, COV,
case
when ADI < 1.32 and COV < 0.49 then 'Smooth'
when ADI >= 1.32 and COV < 0.49 then 'Intermittent'
when ADI < 1.32 and COV >= 0.49 then 'Erratic'
when ADI >= 1.32 and COV >= 0.49 then 'Lumpy'
else 'Smooth'
end as DEMAND
from data;
Double check that you want to use stdevp() and not stdev. I wish I were more knowledgeable about statistics.
QUESTION
I have converted a normal data frame into a tsibble object and used it for my time-series forecasting. While fitting the model I get a date format error: "Error in decimal_date.default(x) : date(s) not in POSIXt or Date format". As you can see from the code below, the converted tsibble object clearly identifies the column "Week.1" as a week date type. Could you please help me clarify why I'm still getting this date format error when I fit forecast models to the tsibble object?
library(dplyr)
library(tsibble)
library(fpp3)
library(forecast)
library(fable)
Original.df<- structure(list(YearWeek = c("201901", "201902", "201903", "201904",
"201905", "201906", "201907", "201908", "201909", "201910", "201911",
"201912", "201913", "201914", "201915", "201916", "201917", "201918",
"201919", "201920", "201921", "201922", "201923", "201924", "201925",
"201926", "201927", "201928", "201929", "201930", "201931", "201932",
"201933", "201934", "201935", "201936", "201937", "201938", "201939",
"201940", "201941", "201942", "201943", "201944", "201945", "201946",
"201947", "201948", "201949", "201950", "201951", "201952", "202001",
"202002", "202003", "202004", "202005", "202006", "202007", "202008",
"202009", "202010", "202011", "202012", "202013", "202014", "202015",
"202016", "202017", "202018", "202019", "202020", "202021", "202022",
"202023", "202024", "202025", "202026", "202027", "202028", "202029",
"202030", "202031", "202032", "202033", "202034", "202035", "202036",
"202037", "202038", "202039", "202040", "202041", "202042", "202043",
"202044", "202045", "202046", "202047", "202048", "202049", "202050",
"202051", "202052", "202053", "202101", "202102", "202103", "202104",
"202105", "202106", "202107", "202108", "202109", "202110", "202111",
"202112", "202113", "202114", "202115", "202116", "202117", "202118",
"202119", "202120", "202121", "202122", "202123", "202124", "202125",
"202126", "202127", "202128", "202129", "202130", "202131", "202132",
"202133", "202134", "202135", "202136", "202137", "202138", "202139",
"202140", "202141", "202142", "202143"), Shipment = c(418, 1442,
1115, 1203, 1192, 1353, 1191, 1411, 933, 1384, 1362, 1353, 1739,
1751, 1595, 1380, 1711, 2058, 1843, 1602, 2195, 2159, 2009, 1812,
2195, 1763, 821, 1892, 1781, 2071, 1789, 1789, 1732, 1384, 1435,
1247, 1839, 2034, 1963, 1599, 1596, 1548, 1084, 1350, 1856, 1882,
1979, 1021, 1311, 2031, 1547, 591, 724, 1535, 1268, 1021, 1269,
1763, 1275, 1411, 1847, 1379, 1606, 1473, 1180, 926, 800, 840,
1375, 1755, 1902, 1921, 1743, 1275, 1425, 1088, 1416, 1168, 842,
1185, 1570, 1435, 1209, 1470, 1368, 1926, 1233, 1189, 1245, 1465,
1226, 887, 1489, 1369, 1358, 1179, 1200, 1226, 1066, 823, 1913,
2308, 1842, 910, 794, 1098, 1557, 1417, 1851, 1876, 1010, 160,
1803, 1607, 1185, 1347, 1700, 981, 1191, 1058, 1464, 1513, 1333,
1169, 1294, 978, 962, 1254, 987, 1290, 758, 436, 579, 636, 614,
906, 982, 649, 564, 502, 274, 473, 506, 902, 639, 810, 398, 488
), Production = c(0, 198, 1436, 1055, 1396, 1330, 1460, 1628,
1513, 1673, 1737, 1274, 1726, 1591, 2094, 1411, 2009, 1909, 1759,
1693, 1748, 1455, 2078, 1717, 1737, 1886, 862, 1382, 1779, 1423,
1460, 1454, 1347, 1409, 1203, 1235, 1397, 1563, 1411, 1455, 1706,
688, 1446, 1336, 1618, 1404, 1759, 746, 1560, 1665, 1317, 0,
441, 1390, 1392, 1180, 1477, 1265, 1485, 1495, 1543, 1584, 1575,
1609, 1233, 1420, 908, 1008, 1586, 1392, 1385, 1259, 1010, 973,
1053, 905, 1101, 1196, 891, 1033, 925, 889, 1136, 1058, 1179,
1047, 967, 900, 904, 986, 1014, 945, 1030, 1066, 1191, 1143,
1292, 574, 1174, 515, 1296, 1315, 1241, 0, 0, 1182, 1052, 1107,
1207, 1254, 1055, 258, 1471, 1344, 1353, 1265, 1444, 791, 1397,
1186, 1264, 1032, 949, 1059, 954, 798, 956, 1074, 1136, 1209,
975, 833, 994, 1127, 1153, 1202, 1234, 1336, 1484, 1515, 1151,
1175, 976, 1135, 1272, 869, 1900, 1173), Net.Production.Qty = c(22,
188, 1428, 1031, 1382, 1368, 1456, 1578, 1463, 1583, 1699, 1318,
1582, 1537, 2118, 1567, 1961, 1897, 1767, 1603, 1666, 1419, 2186,
1621, 1677, 1840, 698, 1290, 1411, 927, 1754, 1222, 1411, 1549,
1491, 1359, 1179, 1945, 1463, 1465, 1764, 764, 810, 1308, 1830,
1542, 1695, 544, 1482, 1673, 1659, 0, 445, 1358, 1364, 1224,
1417, 1239, 1387, 1595, 1469, 1624, 1643, 1763, 1217, 1456, 568,
1290, 1666, 1428, 1327, 773, 1118, 1231, 1143, 921, 1083, 1124,
935, 903, 937, 849, 1132, 1032, 1143, 1081, 891, 886, 880, 1002,
1072, 969, 1000, 996, 1243, 1183, 1306, 650, 1226, 553, 1306,
1379, 1359, 0, 0, 1182, 988, 1099, 1173, 1244, 1039, 254, 1425,
1318, 1385, 1221, 1364, 739, 1397, 1112, 1160, 924, 971, 1015,
978, 828, 868, 994, 1090, 1165, 783, 887, 934, 1023, 1045, 1114,
1052, 1186, 1456, 1401, 1249, 779, 430, 1625, 1498, 883, 1860,
1101), isoweek = c("2019-W01-1", "2019-W02-1", "2019-W03-1",
"2019-W04-1", "2019-W05-1", "2019-W06-1", "2019-W07-1", "2019-W08-1",
"2019-W09-1", "2019-W10-1", "2019-W11-1", "2019-W12-1", "2019-W13-1",
"2019-W14-1", "2019-W15-1", "2019-W16-1", "2019-W17-1", "2019-W18-1",
"2019-W19-1", "2019-W20-1", "2019-W21-1", "2019-W22-1", "2019-W23-1",
"2019-W24-1", "2019-W25-1", "2019-W26-1", "2019-W27-1", "2019-W28-1",
"2019-W29-1", "2019-W30-1", "2019-W31-1", "2019-W32-1", "2019-W33-1",
"2019-W34-1", "2019-W35-1", "2019-W36-1", "2019-W37-1", "2019-W38-1",
"2019-W39-1", "2019-W40-1", "2019-W41-1", "2019-W42-1", "2019-W43-1",
"2019-W44-1", "2019-W45-1", "2019-W46-1", "2019-W47-1", "2019-W48-1",
"2019-W49-1", "2019-W50-1", "2019-W51-1", "2019-W52-1", "2020-W01-1",
"2020-W02-1", "2020-W03-1", "2020-W04-1", "2020-W05-1", "2020-W06-1",
"2020-W07-1", "2020-W08-1", "2020-W09-1", "2020-W10-1", "2020-W11-1",
"2020-W12-1", "2020-W13-1", "2020-W14-1", "2020-W15-1", "2020-W16-1",
"2020-W17-1", "2020-W18-1", "2020-W19-1", "2020-W20-1", "2020-W21-1",
"2020-W22-1", "2020-W23-1", "2020-W24-1", "2020-W25-1", "2020-W26-1",
"2020-W27-1", "2020-W28-1", "2020-W29-1", "2020-W30-1", "2020-W31-1",
"2020-W32-1", "2020-W33-1", "2020-W34-1", "2020-W35-1", "2020-W36-1",
"2020-W37-1", "2020-W38-1", "2020-W39-1", "2020-W40-1", "2020-W41-1",
"2020-W42-1", "2020-W43-1", "2020-W44-1", "2020-W45-1", "2020-W46-1",
"2020-W47-1", "2020-W48-1", "2020-W49-1", "2020-W50-1", "2020-W51-1",
"2020-W52-1", "2020-W53-1", "2021-W01-1", "2021-W02-1", "2021-W03-1",
"2021-W04-1", "2021-W05-1", "2021-W06-1", "2021-W07-1", "2021-W08-1",
"2021-W09-1", "2021-W10-1", "2021-W11-1", "2021-W12-1", "2021-W13-1",
"2021-W14-1", "2021-W15-1", "2021-W16-1", "2021-W17-1", "2021-W18-1",
"2021-W19-1", "2021-W20-1", "2021-W21-1", "2021-W22-1", "2021-W23-1",
"2021-W24-1", "2021-W25-1", "2021-W26-1", "2021-W27-1", "2021-W28-1",
"2021-W29-1", "2021-W30-1", "2021-W31-1", "2021-W32-1", "2021-W33-1",
"2021-W34-1", "2021-W35-1", "2021-W36-1", "2021-W37-1", "2021-W38-1",
"2021-W39-1", "2021-W40-1", "2021-W41-1", "2021-W42-1", "2021-W43-1"
), date = structure(c(17896, 17903, 17910, 17917, 17924, 17931,
17938, 17945, 17952, 17959, 17966, 17973, 17980, 17987, 17994,
18001, 18008, 18015, 18022, 18029, 18036, 18043, 18050, 18057,
18064, 18071, 18078, 18085, 18092, 18099, 18106, 18113, 18120,
18127, 18134, 18141, 18148, 18155, 18162, 18169, 18176, 18183,
18190, 18197, 18204, 18211, 18218, 18225, 18232, 18239, 18246,
18253, 18260, 18267, 18274, 18281, 18288, 18295, 18302, 18309,
18316, 18323, 18330, 18337, 18344, 18351, 18358, 18365, 18372,
18379, 18386, 18393, 18400, 18407, 18414, 18421, 18428, 18435,
18442, 18449, 18456, 18463, 18470, 18477, 18484, 18491, 18498,
18505, 18512, 18519, 18526, 18533, 18540, 18547, 18554, 18561,
18568, 18575, 18582, 18589, 18596, 18603, 18610, 18617, 18624,
18631, 18638, 18645, 18652, 18659, 18666, 18673, 18680, 18687,
18694, 18701, 18708, 18715, 18722, 18729, 18736, 18743, 18750,
18757, 18764, 18771, 18778, 18785, 18792, 18799, 18806, 18813,
18820, 18827, 18834, 18841, 18848, 18855, 18862, 18869, 18876,
18883, 18890, 18897, 18904, 18911, 18918, 18925), class = "Date")), row.names = c(NA,
148L), class = "data.frame")
# Converting the df to accomodate leap year for weekly observations
Original.df <- Original.df %>%
mutate(
isoweek =stringr::str_replace(YearWeek, "^(\\d{4})(\\d{2})$", "\\1-W\\2-1"),
date = ISOweek::ISOweek2date(isoweek)
)
View(Original.df)
# creating test and train data
Original.train.df <- Original.df %>%
filter(date >= "2018-12-31", date <= "2021-03-29")
Original.test.df <- Original.df %>%
filter(date >= "2021-04-05", date <= "2021-10-25")
# splitting the original train data with multiple variables to have only one variable(univariate time series)
Net.Production.train.df <- Original.train.df %>%
mutate(Week.1 = yearweek(ISOweek::ISOweek(date))) %>%
select(-YearWeek, -Shipment, -Production, -date,-isoweek) %>%
as_tsibble(index = Week.1)
Net.Production.train.df
class(Net.Production.train.df$Week.1)
#Fitting forecast model(Arima errors) to Net.Production.qty
bestfit.Net.Prod <- list(aicc=Inf)
for(K in seq(25))
{
fit.Net.Prod <- auto.arima(Net.Production.train.df, xreg=fourier(Net.Production.train.df, K=K), seasonal=FALSE,approximation = F)
if(fit.Net.Prod$aicc < bestfit.Net.Prod$aicc)
{
bestfit.Net.Prod <- fit.Net.Prod
bestK.Net.Prod <- K
}
}
forecast.net.prod<- forecast(bestfit.Net.Prod,xreg = fourier(Net.Production.train.df,K=bestK.Net.Prod,h=30))
forecast.net.prod
Please advise. Thank you!
ANSWER
Answered 2021-Dec-07 at 13:28
You are mixing two different ways of doing forecasts: you either use fable or you use forecast. auto.arima is from the forecast package. Though it does work with fable, it is better to keep everything within the same package ecosystem. fable is the successor of forecast. Your library loading probably conflicted somewhere.
For arima forecasts check out chapter 9.7 from Forecasting: Principles and Practice 3rd edition.
I adjusted your code to work with fable. I have included 2 ways of doing this. My preference is the second one, because then you can see the difference in AICc values and see that they are very close to each other.
library(fpp3)
#... your code until just before the loop
# placeholder for the AICc
bestfit.Net.AICc <- Inf
for(K in seq(25)){
fit <- Net.Production.train.df %>%
model(ARIMA(Net.Production.Qty ~ fourier(K = K), approximation = FALSE))
if(purrr::pluck(glance(fit), "AICc") < bestfit.Net.AICc)
{
bestfit.Net.AICc <- purrr::pluck(glance(fit), "AICc")
bestfit.Net.Prod <- fit
bestK.Net.Prod <- K
}
}
bestK.Net.Prod # in my case 13
glance(bestfit.Net.Prod)
# A tibble: 1 x 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
1 ARIMA(Net.Production.Qty ~ fourier(K = K), approximation = FALSE) 96156. -822. 1702. 1722. 1782.
# run a forecast and plot it
bestfit.Net.Prod %>%
forecast(h = 30) %>%
autoplot(Net.Production.train.df)
Second option:
#... your code until just before the loop
fit_all_models <- list()
for(K in seq(25)){
fit <- Net.Production.train.df %>%
model(ARIMA(Net.Production.Qty ~ fourier(K = K), approximation = FALSE))
names(fit) <- paste0("arima_", K)
fit_all_models <- bind_cols(fit_all_models, fit)
}
glance(fit_all_models) %>% arrange(AICc) %>% select(.model:BIC)
# A tibble: 25 x 6
.model sigma2 log_lik AIC AICc BIC
1 arima_13 96156. -822. 1702. 1722. 1782.
2 arima_11 104327. -829. 1709. 1723. 1778.
3 arima_12 102962. -827. 1709. 1726. 1783.
4 arima_14 95447. -820. 1702. 1726. 1788.
5 arima_10 108961. -833. 1713. 1726. 1780.
6 arima_5 127801. -848. 1725. 1730. 1767.
7 arima_8 117956. -839. 1721. 1730. 1779.
8 arima_15 95685. -819. 1704. 1731. 1795.
9 arima_6 127660. -846. 1727. 1733. 1774.
10 arima_9 123129. -842. 1724. 1733. 1779.
# ... with 15 more rows
best_model <- glance(fit_all_models) %>%
filter(AICc == min(AICc)) %>%
select(.model) %>%
as.character
# run a forecast and plot it
fit_all_models %>%
select(best_model) %>%
forecast(h = 30) %>%
autoplot(Net.Production.train.df)
QUESTION
I am doing some time series forecasting analysis with the fable and fabletools packages, and I am interested in comparing the accuracy of individual models as well as a mixed model (consisting of the individual models I am using).
Here is some example code with a mock dataframe:-
library(fable)
library(fabletools)
library(distributional)
library(tidyverse)
library(imputeTS)
#creating mock dataframe
set.seed(1)
Date<-seq(as.Date("2018-01-01"), as.Date("2021-03-19"), by = "1 day")
Count<-rnorm(length(Date),mean = 2086, sd= 728)
Count<-round(Count)
df<-data.frame(Date,Count)
df
#===================redoing with new model================
df$Count<-abs(df$Count)#in case there is any negative values, force them to be absolute
count_data<-as_tsibble(df)
count_data<-imputeTS::na.mean(count_data)
testfrac<-count_data%>%arrange(Date)%>%sample_frac(0.8)
lastdate<-last(testfrac$Date)
#train data
train <- count_data %>%
#sample_frac(0.8)
filter(Date<=as.Date(lastdate))
set.seed(1)
fit <- train %>%
model(
ets = ETS(Count),
arima = ARIMA(Count),
snaive = SNAIVE(Count),
croston= CROSTON(Count),
ave=MEAN(Count),
naive=NAIVE(Count),
neural=NNETAR(Count),
lm=TSLM(Count ~ trend()+season())
) %>%
mutate(mixed = (ets + arima + snaive + croston + ave + naive + neural + lm) /8)# creates a combined model using the averages of all individual models
fc <- fit %>% forecast(h = 7)
accuracy(fc,count_data)
fc_accuracy <- accuracy(fc, count_data,
measures = list(
point_accuracy_measures,
interval_accuracy_measures,
distribution_accuracy_measures
)
)
fc_accuracy
# A tibble: 9 x 13
# .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1 winkler percentile CRPS
#
#1 arima Test -191. 983. 744. -38.1 51.8 0.939 0.967 -0.308 5769. 567. 561.
#2 ave Test -191. 983. 744. -38.1 51.8 0.939 0.967 -0.308 5765. 566. 561.
#3 croston Test -191. 983. 745. -38.2 51.9 0.940 0.968 -0.308 29788. 745. 745.
#4 ets Test -189. 983. 743. -38.0 51.7 0.938 0.967 -0.308 5759. 566. 560.
#5 lm Test -154. 1017. 742. -36.5 51.1 0.937 1.00 -0.307 6417. 583. 577.
#6 mixed Test -173. 997. 747. -36.8 51.1 0.944 0.981 -0.328 29897. 747. 747.
#7 naive Test 99.9 970. 612. -19.0 38.7 0.772 0.954 -0.308 7856. 692. 685.
#8 neural Test -322. 1139. 934. -49.6 66.3 1.18 1.12 -0.404 26361. 852. 848.
#9 snaive Test -244 1192. 896. -37.1 55.5 1.13 1.17 -0.244 4663. 690. 683.
I demonstrate how to create a mixed model. However, some individual models can hamper the performance of the mixed model when added to it; in other words, the mixed model could potentially be improved if it did not include the individual models that skew its accuracy in a detrimental way.
Desired outcome
What I would like to achieve is to test all of the possible combinations of individual models and return the mixed model with the best performance on one of the accuracy metrics, for instance Mean Absolute Error (MAE). But I am not sure how to do this in an automated way, as there are many potential combinations.
Can someone suggest or share some code as to how I could do this?
ANSWER
Answered 2021-Dec-07 at 11:04
A couple of things to consider:
- While it's definitely desirable to quickly evaluate the performance of many combination models, it's pretty impractical. The best option would be to evaluate your models individually, and then create a simpler combination using, e.g., the 2 or 3 best ones.
- As an example, consider that you can actually have weighted combinations, e.g. 0.75 * ets + 0.25 * arima. The possibilities are now literally endless, so you start to see the limitations of the brute-force method (N.B. I don't think fable actually supports these kinds of combinations yet, though).
That, said, here's one approach you could use to generate all the possible combinations. Note that this might take a prohibitively long time to run - but should give you what you're after.
# Get a table of models to get combinations from
fit <- train %>%
model(
ets = ETS(Count),
arima = ARIMA(Count),
snaive = SNAIVE(Count),
croston= CROSTON(Count),
ave=MEAN(Count),
naive=NAIVE(Count),
neural=NNETAR(Count),
lm=TSLM(Count ~ trend()+season())
)
# Start with a vector containing all the models we want to combine
models <- c("ets", "arima", "snaive", "croston", "ave", "naive", "neural", "lm")
# Generate a table of combinations - if a value is 1, that indicates that
# the model should be included in the combinations
combinations <- models %>%
purrr::set_names() %>%
map(~0:1) %>%
tidyr::crossing(!!!.)
combinations
#> # A tibble: 256 x 8
#> ets arima snaive croston ave naive neural lm
#>
#> 1 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0 0 1
#> 3 0 0 0 0 0 0 1 0
#> 4 0 0 0 0 0 0 1 1
#> 5 0 0 0 0 0 1 0 0
#> 6 0 0 0 0 0 1 0 1
#> 7 0 0 0 0 0 1 1 0
#> 8 0 0 0 0 0 1 1 1
#> 9 0 0 0 0 1 0 0 0
#> 10 0 0 0 0 1 0 0 1
#> # ... with 246 more rows
# This just filters for combinations with at least 2 models
relevant_combinations <- combinations %>%
filter(rowSums(across()) > 1)
# We can use this table to generate the code we would put in a call to `mutate()`
# to generate the combination. {fable} does something funny with code
# evaluation here, meaning that more elegant approaches are more trouble
# than they're worth
specs <- relevant_combinations %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = "model", values_to = "flag_present") %>%
filter(flag_present == 1) %>%
group_by(id) %>%
summarise(
desc = glue::glue_collapse(model, "_"),
model = glue::glue(
"({model_sums}) / {n_models}",
model_sums = glue::glue_collapse(model, " + "),
n_models = n()
)
) %>%
select(-id) %>%
pivot_wider(names_from = desc, values_from = model)
# This is what the `specs` table looks like:
specs
#> # A tibble: 1 x 247
#> neural_lm naive_lm naive_neural naive_neural_lm ave_lm ave_neural
#>
#> 1 (neural + lm) / 2 (naive +~ (naive + neu~ (naive + neural ~ (ave +~ (ave + ne~
#> # ... with 241 more variables: ave_neural_lm , ave_naive ,
#> # ave_naive_lm , ave_naive_neural , ave_naive_neural_lm ,
#> # croston_lm , croston_neural , croston_neural_lm ,
#> # croston_naive , croston_naive_lm , croston_naive_neural ,
#> # croston_naive_neural_lm , croston_ave , croston_ave_lm ,
#> # croston_ave_neural , croston_ave_neural_lm ,
#> # croston_ave_naive , croston_ave_naive_lm , ...
# We can combine our two tables and evaluate the generated code to produce
# combination models as follows:
combinations <- fit %>%
bind_cols(rename_with(specs, ~paste0("spec_", .))) %>%
mutate(across(starts_with("spec"), ~eval(parse(text = .))))
# Compute the accuracy for 2 random combinations to demonstrate:
combinations %>%
select(sample(seq_len(ncol(.)), 2)) %>%
forecast(h = 7) %>%
accuracy(count_data, measures = list(
point_accuracy_measures,
interval_accuracy_measures,
distribution_accuracy_measures
))
#> # A tibble: 2 x 13
#> .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1 winkler
#>
#> 1 spec_ets_arima~ Test -209. 1014. 771. -40.1 54.0 0.973 0.998 -0.327 30825.
#> 2 spec_ets_snaiv~ Test -145. 983. 726. -34.5 48.9 0.917 0.967 -0.316 29052.
#> # ... with 2 more variables: percentile , CRPS
QUESTION
I want to predict daily volatility with an EGARCH(1,1) model using the arch package.
Prediction interval: 01-04-2015 to 12-06-2018 (mm-dd-yyyy format).
Hence I should grab data (for example) from 2013 until 2015, fit the EGARCH(1,1) model on it, and then predict daily volatility for 01-04-2015 to 12-06-2018. So I tried to write it like this:
# Packages That we need
from pandas_datareader import data as web
from arch import arch_model
import pandas as pd
import numpy as np  # needed for np.log below
#---------------------------------------
# grab Microsoft daily adjusted close price data from '01-03-2013' to '12-06-2018' and store it in DataFrame
df = pd.DataFrame(web.get_data_yahoo('MSFT' , start='01-03-2013' , end='12-06-2018')['Adj Close'])
#---------------------------------------
# calculate daily rate of return that is necessary for predicting daily Volatility by EGARCH
daily_rate_of_return_EGARCH = np.log(df.loc[ : '01-04-2015']/df.loc[ : '01-04-2015'].shift())
# drop NaN values
daily_rate_of_return_EGARCH = daily_rate_of_return_EGARCH.dropna()
#---------------------------------------
# Volatility Forecasting By EGARCH(1,1)
model_EGARCH = arch_model(daily_rate_of_return_EGARCH, vol='EGARCH' , p = 1 , o = 0 , q = 1)
fitted_EGARCH = model_EGARCH.fit(disp='off')
#---------------------------------------
# and finally, Forecasting step
# Note that as mentioned in `purpose` section, predict interval should be from '01-04-2015' to end of the data frame
horizon = len(df.loc['01-04-2015' : ])
volatility_FORECASTED = fitted_EGARCH.forecast(horizon = horizon , method='simulation')
and then I got this error:
MemoryError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12900/1021856026.py in
1 horizon = len(df.loc['01-04-2015':])
----> 2 volatility_FORECASTED = fitted_EGARCH.forecast(horizon = horizon , method='simulation')
MemoryError: Unable to allocate 3.71 GiB for an array with shape (503, 1000, 989) and data type float64
It seems arch is going to store a huge amount of data.
Expected Result
What I expect is a simple pandas.Series that contains daily volatility predictions from '01-04-2015' until '12-06-2018'. Precisely, I mean something like this:
(Note: date format --> mm-dd-yyyy)
(DATE) (VOLATILITY)
'01-04-2015' .....
'01-05-2015' .....
'01-06-2015' .....
. .
. .
. .
'12-06-2018' .....
How can I achieve this?
ANSWER
Answered 2021-Nov-25 at 09:50
You only need to pass the reindex=False keyword and the memory requirement drops dramatically. You need a recent version of the arch package to use this feature; it changes the output shape of the forecast to include only the forecast values, so the alignment is different from the historical behavior.
# Packages That we need
from pandas_datareader import data as web
from arch import arch_model
import pandas as pd
import numpy as np  # needed for np.log below
#---------------------------------------
# grab Microsoft daily adjusted close price data from '01-03-2013' to '12-06-2018' and store it in DataFrame
df = pd.DataFrame(web.get_data_yahoo('MSFT' , start='01-03-2013' , end='12-06-2018')['Adj Close'])
#---------------------------------------
# calculate daily rate of return that is necessary for predicting daily Volatility by EGARCH
daily_rate_of_return_EGARCH = np.log(df.loc[ : '01-04-2015']/df.loc[ : '01-04-2015'].shift())
# drop NaN values
daily_rate_of_return_EGARCH = daily_rate_of_return_EGARCH.dropna()
#---------------------------------------
# Volatility Forecasting By EGARCH(1,1)
model_EGARCH = arch_model(daily_rate_of_return_EGARCH, vol='EGARCH' , p = 1 , o = 0 , q = 1)
fitted_EGARCH = model_EGARCH.fit(disp='off')
#---------------------------------------
# and finally, Forecasting step
# Note that as mentioned in `purpose` section, predict interval should be from '01-04-2015' to end of the data frame
horizon = len(df.loc['01-04-2015' : ])
volatility_FORECASTED = fitted_EGARCH.forecast(horizon = horizon , method='simulation', reindex=False)
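To turn the forecast object into the dated volatility Series described under "Expected Result", here is a minimal sketch (my addition, assuming the df and volatility_FORECASTED objects created above, and that with reindex=False the variance frame has a single row with one column per horizon step h.1 ... h.n):
# take the square root of the forecast variances and attach the out-of-sample dates
variance_path = volatility_FORECASTED.variance.iloc[0]   # one variance per horizon step
volatility_series = pd.Series(
    np.sqrt(variance_path.values),
    index=df.loc['01-04-2015':].index,
    name='volatility',
)
print(volatility_series.head())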
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install forecasting
Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
Clone the repository: git clone https://github.com/microsoft/forecasting and then cd forecasting/
Run the setup scripts to create the conda environment. Execute one of the following commands from the root of the Forecasting repo, depending on your operating system: ./tools/environment_setup.sh on Linux, or tools\environment_setup.bat on Windows. Note that on Windows you need to run the batch script from the Anaconda Prompt. The script creates a conda environment named forecasting_env and installs the forecasting utility library fclib.
Start the Jupyter notebook server: jupyter notebook
Run the LightGBM single-round notebook under the 00_quick_start folder. Make sure that the selected Jupyter kernel is forecasting_env.
We assume you already have R installed on your machine. If not, simply follow the instructions on CRAN to download and install R. The recommended editor is RStudio, which supports interactive editing and previewing of R notebooks. However, you can use any editor or IDE that supports RMarkdown. In particular, Visual Studio Code with the R extension can be used to edit and render the notebook files. The rendered .nb.html files can be viewed in any modern web browser. The examples use the Tidyverts family of packages, which is a modern framework for time series analysis that builds on the widely-used Tidyverse family. The Tidyverts framework is still under active development, so it's recommended that you update your packages regularly to get the latest bug fixes and features.