Using time series analysis (Part 1)

I got the dataset from Kaggle for practicing time series analysis. 

 https://www.kaggle.com/felixzhao/productdemandforecasting  

This dataset includes historical product demand by products and warehouses between 2011 and 2017. I loaded data into Incorta, and use the Incorta API to read the data. 


I first did data profiling to summarize the data. I found the data can be categorized by product code, warehouse, and product category. I plan to find the time series based on different product categories and warehouses. 


It was quite challenging to use pandas time series related functions. I need to define the index on a DateTime field, but I can not directly use the date or timestamp field from Spark. Finally, It worked after I use a string field and cover it to date time with Pandas. 

If we don't consider product categories, it does not look like we have time series pattern or trend. 



Comments

Popular posts from this blog

How to create histogram in Incorta use bin function.

Using Github to version and manage notebooks(Jupyter notebooks)

Using Incorta and PySpark Linear Regression ML package to predict eCommerce Customer