Aws Athena - Create external table skipping first row
Just tried the "skip.header.line.count"="1"
and seems to be working fine now.
On the AWS Console you can specify it as Serde parameters key-value keypair
While if you apply your infrastructure as code with terraform you can use ser_de_info parameter - "skip.header.line.count" = 1. Example bellow
resource "aws_glue_catalog_table" "banana_datalake_table" {
name = "mapping"
database_name = "banana_datalake"
table_type = "EXTERNAL_TABLE"
owner = "owner"
storage_descriptor {
location = "s3://banana_bucket/"
input_format = "org.apache.hadoop.mapred.TextInputFormat"
output_format = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
compressed = "false"
number_of_buckets = -1
ser_de_info {
name = "SerDeCsv"
serialization_library = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
parameters {
"field.delim" = ","
"skip.header.line.count" = 1 # Skip file headers
}
}
columns {
name = "column_1"
type = "string"
}
columns {
name = "column_2"
type = "string"
}
columns {
name = "column_3"
type = "string"
}
}
}