Posts

Showing posts from August, 2020

Using Incorta and PySpark Linear Regression ML package to predict eCommerce Customer

Project Overview

I got a dataset from kaggle.com.

Assumption: An eCommerce company based in New York City sells clothing online and also offers in-store style and clothing advice sessions. Customers come into the store, have sessions or meetings with a personal stylist, and then go home and order the clothes they want through either a mobile app or the website.

We need to predict 'Yearly Amount Spent'. Here are the features or attributes collected in the dataset:
'Avg__Session_Length'
'Time_on_App'
'Time_on_Website'
'Length_of_Membership'

Step 1: Upload the CSV file to Incorta. Upload the CSV file to Incorta, and add a file table named Ecommerce_Customer to the schema.

Step 2: Read the Ecommerce Customer file. Use PySpark to read the table named SparkTesting.Ecommerce_Customer. The CSV file loaded into Incorta can be read into PySpark using df=read("SparkTesting.Ecommerce_Customer")

Step 3: VectorAssemblerTest Use...
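As a rough illustration of the prediction task described in this post, here is a minimal plain-Python sketch of linear regression fitted by ordinary least squares. It is a stand-in for the idea only, not the PySpark ML code the post uses. The function name fit_linear_regression and the toy rows are hypothetical, and only two of the four features are used to keep the example small.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 so the first weight acts as the intercept.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The matrix is now diagonal, so each weight is a simple division.
    return [A[i][n] / A[i][i] for i in range(n)]


# Toy, noise-free rows (hypothetical values, NOT taken from the Kaggle dataset):
# each row is (Time_on_App, Length_of_Membership).
rows = [(12.0, 4.0), (11.0, 2.5), (13.5, 3.0), (12.5, 5.0), (10.0, 1.0)]
# Synthetic target built from known coefficients so the fit can be checked.
targets = [100.0 + 40.0 * app + 60.0 * member for app, member in rows]

weights = fit_linear_regression(rows, targets)
intercept, w_app, w_member = weights

# Predict 'Yearly Amount Spent' for a hypothetical new customer.
predicted = intercept + w_app * 12.0 + w_member * 3.5
print(weights, predicted)
```

Because the toy target is generated without noise, the fitted weights should recover the coefficients used to build it. Real data would leave residual error, which is why a held-out evaluation step normally follows.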

How to convert date to 'yyyymmdd' format in MV

I got this question from community.incorta.com.

Question: I want to convert CURRENT_DATE to the format 'yyyyMMdd' in a materialized view. I tried the CONVERT and FORMAT functions, but they are not supported in MV SQL. How can I do that?

Answer:
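The original answer is not shown here. One approach that may work, assuming the MV SQL is executed as Spark SQL, is the date_format function with the pattern 'yyyyMMdd'. The sketch below keeps that query as an untested string (the alias today_key is hypothetical) and uses Python's standard library to show the output format the question asks for.

```python
from datetime import date

# Assumption: the MV runs Spark SQL, where date_format(date, pattern) is available.
# This string is illustrative only and is not executed here.
SPARK_SQL_SKETCH = "SELECT date_format(current_date(), 'yyyyMMdd') AS today_key"


def to_yyyymmdd(d: date) -> str:
    """Format a date as an 8-character yyyyMMdd string."""
    # %Y%m%d in Python corresponds to the Spark pattern yyyyMMdd.
    return d.strftime("%Y%m%d")


example = to_yyyymmdd(date(2020, 8, 31))
print(example)  # 20200831
```

Note that the month letters in the pattern are uppercase: in Spark date patterns, lowercase mm means minutes.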

Merge onlinebookstore and classicmodels schemas as unifiedmodel in Incorta

I have two schemas: onlinebookstore and classicmodels, the latter from the MySQL Sample Database. I want to merge them into a schema named unifiedmodel.

onlinebookstore ERD
classicmodels ERD

Here are some details about the mapping between the two schemas. Source_Table and Source_Column come from those two schemas. Target_Table and Target_Column come from the unifiedmodel schema. Name and address are conformed to the unifiedmodel schema.

Customers Level:
Orders Level:
Products Level:

I used the Materialized View SQL language in Incorta to merge the data into a new schema, unifiedmodel. Four tables are created here: Customers, Orders, Orderdetails, and Products.

Here is the Customers table:
Here is the Orders table:
Here is the Orderdetails table:
Here is the Products table:

After merging the schemas, I can show the Top and Bottom 10 Sales from both businesses.
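To illustrate the kind of merge described above, here is a minimal sketch that conforms two customer tables into one result with UNION ALL. It runs against an in-memory SQLite database as a stand-in for Incorta's MV SQL. The table names, column names, and sample rows are hypothetical, since the actual mapping is only shown in the post's images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source tables standing in for the two schemas.
conn.executescript("""
    CREATE TABLE bookstore_customers (
        customer_id INTEGER, first_name TEXT, last_name TEXT, city TEXT);
    CREATE TABLE classic_customers (
        customerNumber INTEGER, customerName TEXT, city TEXT);
    INSERT INTO bookstore_customers VALUES (1, 'Ann', 'Lee', 'Austin');
    INSERT INTO classic_customers VALUES (103, 'Atelier graphique', 'Nantes');
""")

# Conform both sources to one target layout and stack them.
# || concatenates strings in SQLite; Spark SQL also offers concat().
merged = conn.execute("""
    SELECT 'onlinebookstore' AS Source_Schema,
           customer_id AS Customer_ID,
           first_name || ' ' || last_name AS Customer_Name,
           city AS City
    FROM bookstore_customers
    UNION ALL
    SELECT 'classicmodels', customerNumber, customerName, city
    FROM classic_customers
    ORDER BY Source_Schema
""").fetchall()

for row in merged:
    print(row)
```

Keeping a Source_Schema column is a design choice in this sketch: it prevents ID collisions between the two businesses and lets later reports split sales by origin.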

Verify Primary Key in Incorta

I’m not sure what the primary key of a table is; take this Orderitems table as an example. Here is how I verified it. My assumption is that an order can have multiple items and that an item is listed only once per order. Grouping Orderitems by orderNumber and productCode should then give one row per group. If any group had more than one row, we could conclude that the combination of orderNumber and productCode is not unique. The result shows that no data was returned, which confirms my assumption.
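The check described above can be sketched with a GROUP BY ... HAVING query. The example runs against an in-memory SQLite database as a stand-in for Incorta. The sample rows, the quantity column, and the helper name find_duplicate_keys are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only orderNumber and productCode matter here.
conn.executescript("""
    CREATE TABLE Orderitems (orderNumber INTEGER, productCode TEXT, quantity INTEGER);
    INSERT INTO Orderitems VALUES (1001, 'B01', 2), (1001, 'B02', 1), (1002, 'B01', 1);
""")


def find_duplicate_keys(connection):
    """Return every (orderNumber, productCode) pair that occurs more than once."""
    return connection.execute("""
        SELECT orderNumber, productCode, COUNT(*) AS row_count
        FROM Orderitems
        GROUP BY orderNumber, productCode
        HAVING COUNT(*) > 1
    """).fetchall()


# No rows back means the column pair is unique, so it can serve as a key.
unique_case = find_duplicate_keys(conn)
print(unique_case)

# Adding a repeated pair makes the same query report the violation.
conn.execute("INSERT INTO Orderitems VALUES (1001, 'B01', 5)")
duplicate_case = find_duplicate_keys(conn)
print(duplicate_case)
```

An empty result only shows that the current data has no duplicates. It does not guarantee that future loads will keep the pair unique.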

Using PySpark to calculate Orders and Promotion in Incorta

First of all, I have an OnlineBookStore database with Orderitems and Promotion tables.

Orderitems Table:
Promotion Table:

I want to calculate the total price of each order from the Orderitems table, then compare it with the Promotion table to find the corresponding gift. So I used PySpark to calculate the order amount in Incorta.

OrderAmount Table:
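The calculation described above can be sketched as an aggregation followed by a range join. This example uses SQL on an in-memory SQLite database rather than PySpark, so it shows the logic only and is not the post's code. All column names (order_id, paid_each, quantity, gift, min_amount, max_amount) and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layouts: each promotion covers a range of order amounts.
conn.executescript("""
    CREATE TABLE Orderitems (order_id INTEGER, paid_each REAL, quantity INTEGER);
    CREATE TABLE Promotion (gift TEXT, min_amount REAL, max_amount REAL);
    INSERT INTO Orderitems VALUES (1, 10.0, 2), (1, 5.0, 1), (2, 30.0, 2);
    INSERT INTO Promotion VALUES ('Bookmark', 0, 50), ('Book cover', 50.01, 100);
""")

# Step 1: total each order. Step 2: find the promotion range containing that total.
order_amounts = conn.execute("""
    SELECT o.order_id, o.order_amount, p.gift
    FROM (
        SELECT order_id, SUM(paid_each * quantity) AS order_amount
        FROM Orderitems
        GROUP BY order_id
    ) AS o
    LEFT JOIN Promotion AS p
        ON o.order_amount BETWEEN p.min_amount AND p.max_amount
    ORDER BY o.order_id
""").fetchall()

for row in order_amounts:
    print(row)
```

The LEFT JOIN keeps orders whose total falls outside every promotion range, with the gift shown as NULL, instead of dropping them from the result.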