Posts

Using Time Series Analysis on Electric Production with FB Prophet

I got the dataset from Kaggle for practicing time series analysis: https://www.kaggle.com/kandij/electric-production. The dataset has two columns: one is the date, and the other is the consumption percentage. It covers Dec 31, 1984 to Dec 31, 2017. To predict future electricity consumption, I ran a time series analysis in Incorta notebooks using Facebook Prophet. Here is the result in Incorta: the blue line shows the original data, and the green line shows the predicted electric production.

Using Time Series Analysis on Electric Production with the ARIMA Model

I got the dataset from Kaggle for practicing time series analysis: https://www.kaggle.com/kandij/electric-production. The dataset has two columns: one is the date, and the other is the consumption percentage. It covers Dec 31, 1984 to Dec 31, 2017, and I used it to predict future electricity consumption. I loaded the data into Incorta and used the Incorta API to read it from a Jupyter notebook; I trained and saved the model in the external notebook and used the data in Incorta. In short: first, I imported the data into Incorta; then I used the Incorta API to read it in Jupyter notebooks; finally, I used the model in Incorta. Here is the result in Incorta: the blue line shows the original data, and the green line shows the predicted electric production.

Using the pickle library to save the model and use the model in Incorta

In the previous blog, Using Time Series Analysis by Prophet (https://suziepyspark.blogspot.com/2021/03/using-time-series-analysis-by-prophet.html), I trained a Prophet model. In this blog, I will use pickle to save the model and use it in Incorta. In machine learning, we often need to store a trained model so that we can read it back when making a prediction, without retraining it, which greatly saves time. The pickle module provided by Python solves this problem well: it can serialize objects, save them to disk, and read them out when needed. Almost any Python object can be serialized. Below is how to save the model, and then how to use it.

References:
"Pickle Serialization Study Notes - Programmer Sought". programmersought.com, 2021, https://www.programmersought.com/article/15805994125/.
"Save Model For Python · Issue #725 · facebook/prophet". GitHub, 2021, https://github.com/facebook/prophet/issues/725.
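The save-and-reload pattern can be sketched as below. A plain dict stands in for the fitted model so the sketch runs without extra dependencies; in the actual workflow the pickled object would be the fitted Prophet model, which (per the GitHub issue cited above) pickles the same way.

```python
import pickle

# Stand-in for a fitted estimator (e.g. a trained Prophet model).
model = {"trend": 0.05, "seasonality": "yearly"}

# Save the trained model to disk once, after training.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, in another notebook or session, read it back
# and predict without retraining.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored)
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.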

Using time series analysis by Prophet

I got the dataset from Kaggle for practicing time series analysis: https://www.kaggle.com/felixzhao/productdemandforecasting. This dataset includes historical product demand by product and warehouse between 2011 and 2017. I ran a time series analysis in Incorta notebooks using Facebook Prophet. The Prophet library is designed to make predictions on univariate time series datasets. It is easy to use, automatically finds a good set of hyperparameters for the model, and makes solid predictions on data with trend and seasonal structure. First, I loaded the dataset into Spark, then converted it into a Pandas data frame, and used the Prophet model to train on the data. The picture below is displayed on Incorta's dashboard; we can see the successful prediction of the 2018 trend. The challenge I encountered with this Prophet model was that I had no way to save the result of prophet.fit(monthly_npdf), so for now I can only predict the order demand for one product.
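The aggregation step behind `monthly_npdf` might look like the sketch below: filter one product, parse the date strings, and roll daily orders up to monthly totals in Prophet's `ds`/`y` shape. The column names and the product code are assumptions based on the Kaggle dataset, and an inline frame stands in for the Spark-to-pandas conversion.

```python
import pandas as pd

# Inline stand-in for spark_df.toPandas(); assumed columns:
# Product_Code, Date, Order_Demand (Product_1359 is a made-up code).
raw = pd.DataFrame({
    "Product_Code": ["Product_1359"] * 6,
    "Date": ["2011-01-08", "2011-01-20", "2011-02-03",
             "2011-02-15", "2011-03-01", "2011-03-30"],
    "Order_Demand": [100, 50, 80, 20, 60, 40],
})

# Keep one product and roll its demand up to monthly totals.
one = raw[raw["Product_Code"] == "Product_1359"].copy()
one["Date"] = pd.to_datetime(one["Date"])
monthly_npdf = (
    one.set_index("Date")["Order_Demand"]
       .resample("MS").sum()                 # month-start buckets
       .reset_index()
       .rename(columns={"Date": "ds", "Order_Demand": "y"})
)
print(monthly_npdf)
```

This frame is what `prophet.fit(monthly_npdf)` consumes; repeating the filter per product code would extend the analysis beyond a single product.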

Using time series analysis (Part 1)

I got the dataset from Kaggle for practicing time series analysis: https://www.kaggle.com/felixzhao/productdemandforecasting. This dataset includes historical product demand by product and warehouse between 2011 and 2017. I loaded the data into Incorta and used the Incorta API to read it. I first did data profiling to summarize the data, and found it can be categorized by product code, warehouse, and product category. I plan to build time series based on different product categories and warehouses. It was quite challenging to use pandas' time-series-related functions: I needed to define the index on a DateTime field, but I could not directly use the date or timestamp field from Spark. It finally worked after I used a string field and converted it to datetime with Pandas. If we don't consider product categories, the data does not look like it has a time series pattern or trend.
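The string-to-datetime workaround described above can be sketched as follows (column names are illustrative, and an inline frame stands in for the data read via the Incorta API):

```python
import pandas as pd

# Keep the date as a plain string column on the Spark side,
# then convert it with pandas after pulling the data down.
pdf = pd.DataFrame({
    "date_str": ["2011-01-31", "2011-02-28", "2011-03-31"],
    "demand": [150, 100, 100],
})

pdf["date"] = pd.to_datetime(pdf["date_str"])   # string -> Timestamp
pdf = pdf.set_index("date").sort_index()        # DatetimeIndex enables
                                                # resample(), rolling(), etc.
print(pdf)
```

With a proper `DatetimeIndex` in place, pandas' time series machinery (partial-string slicing like `pdf.loc["2011-02"]`, `resample`, `rolling`) works as expected.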

Using GitHub to version and manage notebooks (Jupyter notebooks)

Version control and management is a vital part of data science workflows. Across multiple experiments, it is essential to know what changed and which updates were made by which team member. We can use GitHub to version and manage Jupyter notebooks by following the steps below.

Step 1: SSH into the Incorta environment from a terminal.
$ ssh -i <key file> incorta@<IP address>

Step 2: Find the Jupyter file path with the commands below:
$ cd /
$ find . -name '*.ipynb' -print
We can see the Jupyter notebooks file path is /home/incorta/Notebooks.

Step 3: Go to GitHub and create a new repository.

Step 4: Clone this repository under the Notebooks directory.
$ git clone https://github.com/SuzieJi/Jupyter-Notebooks
We can see the folder in Jupyter notebooks.

Step 5: Go to the git repository directory that we cloned from GitHub.
$ cd <directory>/
Configure git (the first time only):
$ git config --global user.name "xxx"
$ git config --global user.email "xxxxx"
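After the configuration in Step 5, the remaining workflow is the usual add/commit/push cycle. The sketch below runs in a throwaway scratch repository so it is self-contained; the notebook name is made up, and in the real setup these commands run inside the clone under /home/incorta/Notebooks, followed by a push.

```shell
# Scratch repo standing in for the clone of Jupyter-Notebooks.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git config user.name  "xxx"
git config user.email "xxxxx"

# Track a notebook and commit it (notebook name is an example).
echo '{}' > my_analysis.ipynb
git add my_analysis.ipynb
git commit -q -m "Add time series analysis notebook"
git log --oneline

# In the real clone, publish the commit to GitHub:
# git push origin main   # may be 'master' on older repositories
```

Committing after each meaningful change is what makes the "what changed, and by whom" question answerable later via `git log` and `git diff`.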

Read parquet file via data lake connector in Incorta

Incorta can read parquet files; here is how to read a parquet file via the data lake connector in Incorta.

Step 1: Save the parquet file from the external notebook (Jupyter notebooks).
Step 2: In Incorta's external data sources, add a new data source.
Step 3: Select 'data lake - local files' and give the directory path.
Step 4: Go to the Incorta Schema and add a new data lake table.