Using PySpark to get holidays in Incorta

The hardest part when I was working on a Date table was finding the holidays.

Here is how to use PySpark to find holidays in Incorta.

I found a package that is very useful for this: the Python holidays library.

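from pyspark.sql.types import DateType, StringType, StructField, StructType
import holidays

# Build a {date: holiday name} mapping for US holidays from 1980 through 2060.
us_holidays = holidays.UnitedStates(years=range(1980, 2061))

# Print each (date, name) pair as a quick sanity check.
for d in us_holidays.items():
    print(d)

# One row per holiday: the date and the holiday name.
schema = StructType([
    StructField('date', DateType(), True),
    StructField('holiday_name', StringType(), True)
])

# sc is the SparkContext that the Incorta notebook provides.
df = sc.parallelize(list(us_holidays.items())).toDF(schema)
df.printSchema()

# save() is the Incorta helper that stores the DataFrame as the table output.
save(df)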
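
Once the holidays are in hand, the next step for a Date table is usually to flag each calendar day. Here is a minimal sketch of that idea using only the standard library. The names `sample_holidays` and `build_date_rows` are hypothetical, made up for illustration, and the small dict stands in for the `us_holidays` object, which behaves like a dictionary keyed by date.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the holidays mapping: a plain {date: name} dict.
sample_holidays = {
    date(2024, 7, 4): "Independence Day",
    date(2024, 12, 25): "Christmas Day",
}

def build_date_rows(start, end, holiday_map):
    """Return one (date, is_holiday, holiday_name) tuple per day, start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        name = holiday_map.get(current)  # None when the day is not a holiday
        rows.append((current, name is not None, name))
        current += timedelta(days=1)
    return rows

rows = build_date_rows(date(2024, 7, 3), date(2024, 7, 5), sample_holidays)
for row in rows:
    print(row)
```

In the Incorta notebook you could probably pass `us_holidays` in place of `sample_holidays` and turn the resulting rows into a DataFrame before calling save, but I have only sketched the plain Python part here.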