How to read a CSV file without a header and assign column names while reading in PySpark?

You can import the CSV file into a DataFrame with a predefined schema. You define a schema with the StructType and StructField objects. Assuming every column holds integer data:

from pyspark.sql.types import StructType, StructField, IntegerType

schema = StructType([
    StructField("member_srl", IntegerType(), True),
    StructField("click_day", IntegerType(), True),
    StructField("productid", IntegerType(), True)])

# spark is an existing SparkSession
df = spark.read.csv("user_click_seq.csv", header=False, schema=schema)

should work.
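
If spelling out every StructField feels verbose, the `schema` argument of `spark.read.csv` also accepts a DDL-formatted string. Below is a minimal sketch, not a definitive implementation: `build_ddl_schema` and `read_clicks` are hypothetical helper names, and the INT types are the same assumption about the data as above.

```python
def build_ddl_schema(columns):
    """Build a DDL schema string such as 'a INT, b STRING' from (name, type) pairs."""
    return ", ".join(f"{name} {sql_type}" for name, sql_type in columns)


# Column names come from the answer above; the INT types are an assumption.
ddl = build_ddl_schema([
    ("member_srl", "INT"),
    ("click_day", "INT"),
    ("productid", "INT"),
])


def read_clicks(spark, path="user_click_seq.csv"):
    """Read the headerless CSV using the DDL string as its schema.

    `spark` is an existing SparkSession supplied by the caller.
    """
    return spark.read.csv(path, header=False, schema=ddl)
```

Calling `read_clicks(spark)` should give the same column names and types as the StructType version.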

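For those who would like to do this in Scala and may not want to add types:

// File name and column names are assumed here, reused from the PySpark example above.
// With no schema given, every column is read as a string.
val df = spark.read.format("csv").option("header", "false").load("user_click_seq.csv").toDF("member_srl", "click_day", "productid")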

You can read the data with header=False and then pass the column names with toDF, as below:

data = spark.read.csv('data.csv', header=False)
data = data.toDF('name1', 'name2', 'name3')
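
When the names live in a list, they can be unpacked into toDF. A headerless CSV read without a schema gets default column names `_c0`, `_c1`, and so on, and toDF needs exactly one new name per column. The sketch below wraps that in a hypothetical helper, `rename_columns`, which checks the count first so a mismatch produces a clear error message.

```python
def rename_columns(df, names):
    """Return df with its columns renamed, in order, to the given names.

    `df` is expected to be a Spark DataFrame; only `df.columns` and
    `df.toDF` are used.
    """
    if len(names) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} column names, got {len(names)}"
        )
    # toDF takes the new names as separate positional arguments.
    return df.toDF(*names)
```

Usage, assuming `data` was read as shown above: `data = rename_columns(data, ['name1', 'name2', 'name3'])`.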