https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
concat_ws: concatenates strings using the specified separator
concat_ws("_", field1, field2)
# out: field1_field2
concat_ws("_", array(a, b, c))  # an array<string> column is also accepted
# out: a_b_c
collect_list: returns a list of objects with duplicates
df = spark.createDataFrame([(2,), (5,), (7,)], ('age',))  # column names go in the second argument (schema)