This is the second part of the fiscal calendar MV: the Spark SQL part. This part produces the day number, day name, fiscal week number, fiscal year start date, fiscal week end date, fiscal week of year, fiscal week start date, fiscal week date, fiscal day seq, fiscal week seq, fiscal year seq, fiscal day ago date, and fiscal week ago date. For the day number, divide the day of the year (doy) by 7. If the remainder is 0, the day number is 7, which represents Sunday; otherwise the day number is the remainder. Moving the current date back by (doy - 1) days gives the fiscal year's start date. The fiscal week of year is floor((doy-1)/7): subtract 1 from the day of the year, divide by 7, and drop the decimal part (this numbers weeks from 0). For the fiscal week end date, I first get the fiscal week of year, floor((doy-1)/7). Then I compute the number of days from the current date to the last day of the week with (((floor((doy-1)/7))+1)*7)-doy. Finally I return the week end date, which is that many days after the date: date_add(date, num_...
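The day-number, fiscal-year-start, week-of-year, and week-end formulas above can be checked outside Spark. The following is a minimal plain-Python sketch of that arithmetic, not the author's Spark SQL. It assumes `doy` is the 1-based fiscal day of year of the date, so the fiscal year's first day has doy = 1. The function name `fiscal_fields` is hypothetical, and `timedelta` plays the role of Spark SQL's date_add.

```python
from datetime import date, timedelta


def fiscal_fields(d: date, doy: int) -> dict:
    """Hypothetical helper: derive fiscal calendar fields for date d.

    Assumes doy is the 1-based fiscal day of year of d
    (doy == 1 on the fiscal year's first day).
    """
    # Day number: remainder of doy / 7, with a remainder of 0 mapped to 7
    remainder = doy % 7
    day_number = remainder if remainder != 0 else 7

    # Fiscal year start: move back (doy - 1) days, so doy == 1 returns d itself
    fiscal_year_start = d - timedelta(days=doy - 1)

    # Fiscal week of year: floor((doy - 1) / 7), numbered from 0
    week_of_year = (doy - 1) // 7

    # Days from d to the last day of its fiscal week: ((week + 1) * 7) - doy
    days_to_week_end = (week_of_year + 1) * 7 - doy

    # Equivalent of Spark SQL date_add(date, days_to_week_end)
    fiscal_week_end = d + timedelta(days=days_to_week_end)

    return {
        "day_number": day_number,
        "fiscal_year_start": fiscal_year_start,
        "fiscal_week_of_year": week_of_year,
        "fiscal_week_end": fiscal_week_end,
    }
```

For example, `fiscal_fields(date(2023, 1, 10), 10)` gives day number 3, fiscal year start 2023-01-01, week of year 1, and week end 2023-01-14. This gives a useful consistency check. Whenever the day number is 7, the days-to-week-end value is 0, so the week end date is the date itself.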
Project Overview
I got a dataset from kaggle.com. Assumption: an eCommerce company based in New York City sells clothing online but also offers in-store style and clothing advice sessions. Customers come into the store, have sessions or meetings with a personal stylist, and then go home and order the clothes they want on either a mobile app or the website. We need to predict 'Yearly Amount Spent'. The features (attributes) collected in the dataset are 'Avg__Session_Length', 'Time_on_App', 'Time_on_Website', and 'Length_of_Membership'.
Step 1: Upload the CSV file to Incorta. Upload the CSV file to Incorta, then add a file table named Ecommerce_Customer to the SparkTesting schema.
Step 2: Read the Ecommerce_Customer table. Use PySpark to read the table SparkTesting.Ecommerce_Customer. The CSV data loaded into Incorta can be read into PySpark with df=read("SparkTesting.Ecommerce_Customer").
Step 3: VectorAssemblerTest Use...
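Step 3 is cut off here, but its name points to Spark ML's `VectorAssembler` (in `pyspark.ml.feature`). That class combines the columns listed in `inputCols` into one vector column named by `outputCol`, for example `VectorAssembler(inputCols=[...], outputCol="features").transform(df)`. The plain-Python sketch below only illustrates that idea and is not the author's code. It reads CSV text and turns each row into a feature vector built from the four feature columns, in order, paired with the label. The helper name `assemble_features` and the label column spelling `Yearly_Amount_Spent` are assumptions.

```python
import csv
import io

# Feature columns as listed in the post
FEATURE_COLS = ["Avg__Session_Length", "Time_on_App", "Time_on_Website", "Length_of_Membership"]
# Assumed spelling of the 'Yearly Amount Spent' column (not confirmed by the post)
LABEL_COL = "Yearly_Amount_Spent"


def assemble_features(csv_text, feature_cols=FEATURE_COLS, label_col=LABEL_COL):
    """Hypothetical helper mimicking VectorAssembler: one (features, label) pair per row."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Concatenate the input columns, in list order, into a single feature vector
        features = [float(record[c]) for c in feature_cols]
        rows.append((features, float(record[label_col])))
    return rows
```

Keeping the feature columns in a fixed list matters. A downstream regression model relies on the position of each value in the vector, which is also why `VectorAssembler` takes an ordered `inputCols` list.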
The Ecommerce_Customer table has five numeric variables: Avg Session Length, Time On App, Time On Website, Length Of Membership, and Yearly Amount Spent. I want to see how these values are distributed. Incorta's Preview function lets me preview the data and shows the minimum and maximum of each column. Here are the steps I used to create a histogram in Incorta. First, I used Incorta's bin function to divide the values into levels (see the documentation at https://docs.incorta.com/4.5/r-bin). Here is the result of the bin function. I divided the average session length into 6 levels. A length below 30 is labeled 'SLV1', a length between 30 and 32 is labeled 'SLV2', and so on. I chose the boundaries from the min and max values. The minimum is close to 30 and the maximum is close to 38, so I used 2 minutes as the interval and built the formula with the bin function. I'm grouping...
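The binning step can be mirrored in a few lines of Python to sanity-check the level boundaries before building the Incorta formula. This is a plain-Python sketch, not Incorta's bin function. It assumes levels start at 30 with a 2-minute width and six levels. It also assumes that a value exactly on a boundary, for example 32, falls into the higher level. The function names `session_level` and `level_histogram` are hypothetical.

```python
def session_level(length, start=30.0, width=2.0, levels=6):
    """Hypothetical helper: label a session length as SLV1..SLV<levels>."""
    # Everything below the first boundary is level 1
    if length < start:
        return "SLV1"
    # Each further `width`-minute interval moves up one level:
    # [30, 32) -> SLV2, [32, 34) -> SLV3, ... (boundary values go to the higher level)
    level = int((length - start) // width) + 2
    # Values at or above the last boundary (38 with the defaults) are capped at the top level
    return f"SLV{min(level, levels)}"


def level_histogram(lengths, levels=6):
    """Count how many values fall into each level, keeping empty levels as 0."""
    counts = {f"SLV{i}": 0 for i in range(1, levels + 1)}
    for length in lengths:
        counts[session_level(length, levels=levels)] += 1
    return counts
```

With these defaults the boundaries are 30, 32, 34, 36, and 38, which gives exactly the 6 levels described above. Counting the labels with `level_histogram` yields the bar heights for the histogram.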