Created October 9, 2018 04:22
PySpark List Column to Boolean Columns for each value
from pyspark.sql.functions import split, explode, lit, first
# split the comma-separated 'ROOF' string into an array column
df = df.withColumn('roof_list', split(df['ROOF'], ', '))
# explode the array so that each value gets its own row
ex_df = df.withColumn('ex_roof_list', explode(df['roof_list']))
# add a constant column to aggregate on later
ex_df = ex_df.withColumn('constant_val', lit(1))
# pivot on the exploded values (one column per value) and take the first constant per group
piv_df = ex_df.groupBy('NO').pivot('ex_roof_list').agg(first('constant_val'))
# fill nulls (value not present for that 'NO') with 0
piv_df = piv_df.fillna(0)
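To make the expected result easier to picture, here is a minimal plain-Python sketch of the same split, explode, pivot, and fill steps, with no Spark required. The sample rows and the helper name `list_column_to_flags` are hypothetical; only the `NO` and `ROOF` column names come from the snippet above. It assumes one input row per `NO` value and a literal `', '` separator.

```python
# Plain-Python illustration of what the Spark pipeline produces.
# The rows below are made-up sample data, not from the original gist.
rows = [
    {"NO": 1, "ROOF": "Asphalt Shingles, Pitched"},
    {"NO": 2, "ROOF": "Flat"},
    {"NO": 3, "ROOF": "Pitched, Flat"},
]


def list_column_to_flags(rows, key="NO", column="ROOF", sep=", "):
    """Turn a delimited string column into one 0/1 column per distinct value."""
    # split: turn each delimited string into a list of values
    split_rows = [(r[key], r[column].split(sep)) for r in rows]
    # explode + pivot: every distinct value becomes a column name
    values = sorted({v for _, vals in split_rows for v in vals})
    # first(constant) + fillna(0): 1 where the value is present, 0 otherwise
    return [
        {key: k, **{v: int(v in vals) for v in values}}
        for k, vals in split_rows
    ]


flags = list_column_to_flags(rows)
for row in flags:
    print(row)
```

Each output dictionary corresponds to one row of `piv_df`: the key column followed by one indicator column per distinct roof value.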