How to do data profiling in Incorta

Sometimes we need a better understanding of our data. We can get it by doing data profiling with Spark Python in Incorta.

First, add a new Materialized View in Incorta and select Spark Python.

Then, I have two methods to do data profiling.

Method 1:

Use df.describe()

This function returns the count, mean, stddev, min, and max of each column. It only covers string and numeric columns; columns of other types, such as dates, are left out of the result.

Method 2:

Calculate each metric ourselves.

Below is the syntax:

Suzie's PySpark and Incorta Notes