How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.
This is what works in Redshift:
You want to use table properties ('skip.header.line.count'='1')
Along with other properties if you want, e.g. 'numRows'='100'
.
Here's a sample:
create external table exreddb1.test_table
(ID BIGINT
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
This is a known deficiency.
The best method I've seen was tweeted by Eric Hammond:
...WHERE date NOT LIKE '#%'
This appears to skip header lines during a Query. I'm not sure how it works, but it might be a method for skipping NULLs.