AWS Glue Crawler Classifies JSON file as UNKNOWN
I have two JSON files (42 MB and 16 MB), partitioned on S3 under the following paths:
s3://bucket/stg/year/month/_0.json
s3://bucket/stg/year/month/_1.json
I had the same problem: the crawler classified my files as UNKNOWN.
I was able to solve it as follows:
- Create a custom JSON classifier with the JSON path "$[*]", then create a new crawler that uses this classifier.
- Run the new crawler against the data on S3; it should create the proper schema.
- Do NOT add the classifier to your existing crawler: the change won't be applied. I don't know why; it may be related to the classifier versioning that AWS mentions in its documentation. Creating a new crawler is what made it work for me.
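The steps above can be scripted with the boto3 Glue client. This is a minimal sketch, not a definitive setup. It assumes each file is a single top-level JSON array (which is what "$[*]" targets: every array element becomes one record). The classifier name, crawler name, IAM role ARN and database name are hypothetical placeholders, and the role must already exist with the permissions a Glue crawler needs. The client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
from typing import Any

# Hypothetical placeholder names: replace them with your own values.
CLASSIFIER_NAME = "json-array-classifier"
CRAWLER_NAME = "stg-json-crawler-new"  # a NEW crawler, not the existing one
ROLE_ARN = "arn:aws:iam::123456789012:role/MyGlueCrawlerRole"
DATABASE = "stg_db"
S3_PATH = "s3://bucket/stg/"


def classifier_request(name: str) -> dict[str, Any]:
    """Arguments for create_classifier. "$[*]" treats each element of a
    top-level JSON array as a separate record (assumes array-shaped files)."""
    return {"JsonClassifier": {"Name": name, "JsonPath": "$[*]"}}


def crawler_request(name: str, role: str, database: str,
                    s3_path: str, classifier: str) -> dict[str, Any]:
    """Arguments for create_crawler with the custom classifier attached."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Classifiers": [classifier],
    }


def setup_and_run(glue: Any) -> None:
    """Create the classifier, create a new crawler that uses it, then start it.
    `glue` is a boto3 Glue client (or any object exposing the same methods)."""
    glue.create_classifier(**classifier_request(CLASSIFIER_NAME))
    glue.create_crawler(**crawler_request(
        CRAWLER_NAME, ROLE_ARN, DATABASE, S3_PATH, CLASSIFIER_NAME))
    glue.start_crawler(Name=CRAWLER_NAME)
```

With valid AWS credentials you would call it as `setup_and_run(boto3.client("glue"))` after `import boto3`. Creating a fresh crawler, rather than calling `update_crawler` on the old one, mirrors the advice above.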