End to End Project - Forecast Sales Quantities

Forecast the sales quantities for each store and each product.

Input data available:

  1. Historical sales values (Location: /cxldata/datasets/project/sales_historical_sales_value.csv)

    • Store ID

    • Product ID

    • Datetime

    • Sales value (dependent variable)

  2. Historical Disposable Personal Income values (Location: /cxldata/datasets/project/sales_disposable_personal_income.csv)

    • Datetime

    • Disposable Personal Income value
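As a starting point, the two inputs can be read with pandas and joined on their datetime column. The following is only a minimal sketch: the column headers (`store_id`, `product_id`, `datetime`, `sales_value`, `dpi`) and the helper `load_table` are assumptions made for illustration, since the actual headers in the CSV files are not listed here, and the sample rows are made-up values standing in for the real files.

```python
import io

import pandas as pd


def load_table(source, datetime_col):
    """Read a CSV (path or file-like object) and parse its datetime column.

    `datetime_col` is an assumed column name; adjust it to the real header.
    """
    df = pd.read_csv(source)
    df[datetime_col] = pd.to_datetime(df[datetime_col])
    # Sort chronologically so later lag and split steps see ordered data.
    return df.sort_values(datetime_col).reset_index(drop=True)


# Tiny in-memory stand-ins for the two files (made-up values, assumed headers).
sample_sales = io.StringIO(
    "store_id,product_id,datetime,sales_value\n"
    "1,10,2015-02-01,35\n"
    "1,10,2015-01-01,42\n"
)
sample_dpi = io.StringIO(
    "datetime,dpi\n"
    "2015-01-01,13000.5\n"
    "2015-02-01,13050.0\n"
)

sales = load_table(sample_sales, "datetime")
dpi = load_table(sample_dpi, "datetime")

# Left join: every sales row keeps the income value recorded for the same date.
merged = sales.merge(dpi, on="datetime", how="left")
```

To use the real data, pass the file locations listed above to `load_table` in place of the in-memory samples.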

Additional features that can be computed are:

  1. Disposable Personal Income: As a leading indicator, this index changes before sales do. Try several lags and identify the one that relates best to sales.

  2. Modeling parameters, including test.length, seasonality, observation.freq, and timeformat, need to be supplied as well.

  3. Datetime

    • Date features: year, month, week of month, etc.

    • Time features

    • Season features

    • Weekday-and-weekend features

    • Holiday features: New Year, U.S. Labor Day, U.S. Thanksgiving, Cyber Monday, Christmas, etc.
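The datetime and income features listed above could be derived roughly as follows. This is a sketch under stated assumptions, not a prescribed method: the column names, the helper names, the day-based definition of week of month, the month-to-season mapping, and the use of correlation to compare lags are all illustrative choices. pandas' `USFederalHolidayCalendar` covers New Year, Labor Day, Thanksgiving, and Christmas, but not Cyber Monday, which would need a custom rule.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def add_date_features(df, datetime_col="datetime"):
    """Add calendar features derived from an already-parsed datetime column."""
    out = df.copy()
    dt = out[datetime_col].dt
    out["year"] = dt.year
    out["month"] = dt.month
    # Assumed definition: days 1-7 are week 1, days 8-14 are week 2, and so on.
    out["week_of_month"] = (dt.day - 1) // 7 + 1
    out["day_of_week"] = dt.dayofweek  # Monday = 0, Sunday = 6
    out["is_weekend"] = (dt.dayofweek >= 5).astype(int)
    # Assumed mapping: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov.
    out["season"] = (dt.month % 12) // 3
    holidays = USFederalHolidayCalendar().holidays(
        start=out[datetime_col].min(), end=out[datetime_col].max()
    )
    out["is_holiday"] = dt.normalize().isin(holidays).astype(int)
    return out


def add_lagged_dpi(dpi, lag, value_col="dpi"):
    """Shift the income series by `lag` rows (assumes sorted, evenly spaced rows)."""
    out = dpi.copy()
    out[f"{value_col}_lag{lag}"] = out[value_col].shift(lag)
    return out


def lag_correlations(sales, dpi, lags):
    """Correlation between sales and the income series shifted by each lag."""
    return {lag: sales.corr(dpi.shift(lag)) for lag in lags}
```

A higher absolute correlation at a given lag suggests, but does not prove, that income at that lag is a useful predictor; validating candidate lags on held-out data is a safer check.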


Data preparation requirements:

  1. Consider only sales values greater than 20.

  2. Split the dataset into a training set covering the first 2 years and a test set covering the last 1 year.

  3. Apply a log transformation to the sales value (the dependent variable).
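The three steps above can be sketched in a single helper. The column names, the function name `prepare`, and the choice of the natural logarithm are assumptions for illustration. The split below places the cutoff one year before the latest timestamp in the data, which matches "2 years of training, last 1 year of test" only if the data spans three years.

```python
import numpy as np
import pandas as pd


def prepare(df, datetime_col="datetime", value_col="sales_value",
            min_value=20, test_years=1):
    """Filter, log-transform, and split into (train, test) by time."""
    # Step 1: keep only rows whose sales value exceeds the threshold.
    out = df[df[value_col] > min_value].copy()
    # Step 3: natural log of the dependent variable (assumed base e).
    out["log_sales"] = np.log(out[value_col])
    # Step 2: everything after the cutoff is the test set.
    cutoff = df[datetime_col].max() - pd.DateOffset(years=test_years)
    train = out[out[datetime_col] <= cutoff]
    test = out[out[datetime_col] > cutoff]
    return train, test
```

Because the model is then fitted on `log_sales`, predictions need to be mapped back with `np.exp` before they are compared with actual sales values.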

Please use the forum below to discuss the problem and post queries.

Data source acknowledgement: The sales dataset is taken from the UCI machine learning repository Azure-Blog-Storage-Template Data, and the disposable income data is taken from https://fred.stlouisfed.org/