Posts

Showing posts from March, 2021

Using the pickle library to save the model and use the model in Incorta

In the previous blog, I used time series analysis with Prophet: https://suziepyspark.blogspot.com/2021/03/using-time-series-analysis-by-prophet.html In this blog, I will use pickle to save the model and use it in Incorta.

In machine learning, we often need to store a trained model so that we can load it directly when making a decision instead of retraining it, which saves a lot of time. The pickle module provided by Python solves this problem well. It can serialize objects, save them to disk, and read them back when needed. Most Python objects can be serialized this way. The workflow has two steps: save the trained model, then load it wherever predictions are needed.

Reference:
"Pickle Serialization Study Notes - Programmer Sought". Programmersought.Com, 2021, https://www.programmersought.com/article/15805994125/.
"Save Model For Python · Issue #725 · Facebook/Prophet". Github, 2021, https://github.com/facebook/prophet/issues/725.
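As a minimal sketch of the save and load steps, the example below pickles a stand-in "model" and reads it back. The dictionary of fitted parameters, the `predict` helper, and the file name `demand_model.pkl` are all hypothetical and only stand in for a trained model; this is not the Prophet model from the post, just the general pickle pattern.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model object to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read a previously saved model back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model, periods):
    """Hypothetical predictor: repeat the stored mean for each period."""
    return [model["mean"]] * periods


# Hypothetical stand-in for a fitted model: just its learned parameters.
history = [120.0, 135.0, 150.0]
model = {"name": "mean_forecaster", "mean": sum(history) / len(history)}

# Assumed file location; any writable path works.
model_path = Path(tempfile.gettempdir()) / "demand_model.pkl"

save_model(model, model_path)        # done once, after training
restored = load_model(model_path)    # done later, without retraining

print(predict(restored, 2))
```

Only load pickle files that come from a source you trust, because unpickling can execute arbitrary code.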

Using time series analysis by prophet

I got the dataset from Kaggle to practice time series analysis: https://www.kaggle.com/felixzhao/productdemandforecasting This dataset includes historical product demand by product and warehouse between 2011 and 2017.

I ran the time series analysis in Incorta notebooks using Facebook Prophet. The Prophet library is designed to make predictions on univariate time series data sets. It is easy to use, is designed to automatically find a good set of hyperparameters for the model, and produces good forecasts for data with trend and seasonal structure.

First, I loaded the dataset into Spark, then converted it into a Pandas data frame, and then trained the Prophet model on that data. The chart displayed on Incorta's dashboard shows the predicted trend for 2018. The challenge I encountered with this Prophet model was that I had no way to save prophet.fit(monthly_npdf), so for now I can only predict the order demand for one product. Using th
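The train and predict steps described above might look roughly like the sketch below. The column names `Date` and `Order_Demand`, the sample rows, and both helper functions are assumptions for illustration, not the code from the original notebook. Prophet expects a data frame with a `ds` date column and a `y` value column; the import assumes the package is installed as `prophet` (older releases were published as `fbprophet`).

```python
import pandas as pd


def to_prophet_frame(orders, date_col, value_col):
    """Aggregate orders to monthly totals in Prophet's ds/y layout."""
    monthly = (
        orders.assign(**{date_col: pd.to_datetime(orders[date_col])})
        .groupby(pd.Grouper(key=date_col, freq="MS"))[value_col]
        .sum()
        .reset_index()
    )
    return monthly.rename(columns={date_col: "ds", value_col: "y"})


def forecast_next_year(monthly_df):
    """Fit Prophet on monthly data and forecast 12 more months."""
    # Imported lazily so the data preparation runs without Prophet installed.
    from prophet import Prophet

    model = Prophet()
    model.fit(monthly_df)
    future = model.make_future_dataframe(periods=12, freq="MS")
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical sample rows; in the notebook this frame would come from
# converting the Spark data frame with toPandas().
orders = pd.DataFrame({
    "Date": ["2016-01-05", "2016-01-20", "2016-02-03"],
    "Order_Demand": [100, 50, 70],
})

monthly_pdf = to_prophet_frame(orders, "Date", "Order_Demand")
print(monthly_pdf)
```

With real data covering several years, `forecast_next_year(monthly_pdf)` would return the forecast together with its uncertainty interval.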

Using time series analysis (Part 1)

I got the dataset from Kaggle to practice time series analysis: https://www.kaggle.com/felixzhao/productdemandforecasting This dataset includes historical product demand by product and warehouse between 2011 and 2017. I loaded the data into Incorta and used the Incorta API to read it.

I first did data profiling to summarize the data. I found that the data can be categorized by product code, warehouse, and product category. I plan to build the time series for different product categories and warehouses.

It was quite challenging to use the pandas time series functions. I needed to define the index on a DateTime field, but I could not directly use the date or timestamp field from Spark. It finally worked after I used a string field and converted it to a datetime with Pandas.

If we don't consider product categories, the data does not appear to show a time series pattern or trend.
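The workaround of converting a string field to a datetime and using it as the index can be sketched as follows. The column names, the `%Y/%m/%d` date format, and the sample rows are assumptions for illustration; the real dataset may differ.

```python
import pandas as pd

# Hypothetical rows; the date arrives as a plain string column.
raw = pd.DataFrame({
    "Product_Category": ["Category_A", "Category_A", "Category_B", "Category_A"],
    "Date": ["2016/01/05", "2016/01/20", "2016/01/11", "2016/02/03"],
    "Order_Demand": [100, 50, 30, 70],
})

# Convert the string field to datetime, then use it as the index.
raw["Date"] = pd.to_datetime(raw["Date"], format="%Y/%m/%d")
indexed = raw.set_index("Date").sort_index()

# Monthly demand per product category ("MS" = month start).
monthly = indexed.groupby(
    ["Product_Category", pd.Grouper(freq="MS")]
)["Order_Demand"].sum()

print(monthly)
```

Grouping by category before aggregating by month keeps each category as its own series, which matches the plan of looking for patterns per product category rather than in the combined data.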