Using Incorta and the PySpark Linear Regression ML package to predict eCommerce customer spending
Project Overview

The dataset comes from kaggle.com. Assumption: an eCommerce company based in New York City sells clothing online and also offers in-store style and clothing advice sessions. Customers come into the store, have sessions or meetings with a personal stylist, then go home and order the clothes they want through either a mobile app or the website.

We need to predict 'Yearly Amount Spent'. Here are the features or attributes collected in the dataset:

'Avg__Session_Length'
'Time_on_App'
'Time_on_Website'
'Length_of_Membership'

Step 1: Upload the CSV file to Incorta

Upload the CSV file to Incorta, and add a file table named Ecommerce_Customer to the schema (SparkTesting in this walkthrough).

Step 2: Read the Ecommerce Customer file

The CSV file loaded into Incorta is now available as the table SparkTesting.Ecommerce_Customer, which can be read into a PySpark DataFrame using df = read("SparkTesting.Ecommerce_Customer")

Step 3: VectorAssemblerTest

Use...
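
To make the goal concrete, here is a minimal plain-Python sketch of what the feature-assembly and linear-regression steps compute for a single customer. It is not the Incorta or PySpark code itself: the helper names assemble_features and predict, the sample row, and the weights are all hypothetical values chosen for illustration, not fitted results. In PySpark, VectorAssembler does the same kind of column-to-vector packing across a whole DataFrame.

```python
# Feature columns and label named in the dataset description above.
FEATURE_COLS = [
    "Avg__Session_Length",
    "Time_on_App",
    "Time_on_Website",
    "Length_of_Membership",
]
LABEL_COL = "Yearly Amount Spent"


def assemble_features(row, cols=FEATURE_COLS):
    """Pack the named columns of one row into an ordered feature vector."""
    return [float(row[c]) for c in cols]


def predict(features, coefficients, intercept):
    """Linear model: intercept plus the weighted sum of the features."""
    return intercept + sum(w * x for w, x in zip(coefficients, features))


# Hypothetical customer row and hypothetical weights (assumptions, not fitted).
sample_row = {
    "Avg__Session_Length": 30.0,
    "Time_on_App": 10.0,
    "Time_on_Website": 40.0,
    "Length_of_Membership": 2.0,
}
sample_coefficients = [1.0, 2.0, 0.0, 3.0]
sample_intercept = 5.0

features = assemble_features(sample_row)
estimate = predict(features, sample_coefficients, sample_intercept)
print(LABEL_COL, "estimate:", estimate)
```

Training a linear regression model amounts to choosing the coefficients and intercept that make estimates like this one match the known 'Yearly Amount Spent' values as closely as possible.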