Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column
The problem is with the JSON file. The file "D:/playground/input.json" looks like you described:
{
"a": {
"b": 1
}
}
This will not work. By default, Spark treats each line of a JSON file as a complete JSON document; each line here fails to parse on its own, so Spark routes the raw text into the internal corrupt record column, which produces the error above.
You should keep the complete JSON on a single line, in compact form, with all whitespace and newlines removed. Like this:
{"a":{"b":1}}
If you want multiple JSON objects in a single file, keep them one per line, like this:
{"a":{"b":1}}
{"a":{"b":2}}
{"a":{"b":3}} ...
For more info, see the Spark documentation on reading JSON files.
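As a minimal sketch (e.g. in spark-shell, where spark is predefined), assuming the file at "D:/playground/input.json" has been rewritten in the line-delimited form above:
// Default (line-delimited) mode: each line must be a complete JSON document
val df = spark.read.json("D:/playground/input.json")
// Dot notation reaches the nested field
df.select("a.b").show()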
You may try either of these two ways.
Option-1: Keep the JSON on a single line, as answered above by @Avishek Bhattacharya.
Option-2: Add the multiline option when reading the JSON, as follows. The nested attribute can then be read with dot notation, as shown below.
// Treat the whole file as a single JSON document instead of one document per line
val df = spark.read.option("multiline", "true").json("C:\\data\\nested-data.json")
// Select the nested attribute with dot notation
df.select("a.b").show()
Here is the output for Option-2.
20/07/29 23:14:35 INFO DAGScheduler: Job 1 finished: show at NestedJsonReader.scala:23, took 0.181579 s
+---+
| b|
+---+
| 1|
+---+
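If you want to verify what Spark inferred before selecting nested fields, a small sketch building on the Option-2 DataFrame above:
// Print the inferred schema; the nested JSON object becomes a struct column
df.printSchema()
// root
//  |-- a: struct (nullable = true)
//  |    |-- b: long (nullable = true)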