How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?
As Doug Cutting explains:
Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON, the first are both encoded as JSON strings, while the second are both encoded as JSON numbers.
http://avro.apache.org/docs/current/spec.html#json_encoding
Thus your record must be encoded as:
{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}
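To see the difference in practice, here is a minimal Scala sketch that feeds both the tagged and the untagged form to Avro's JSON decoder. The schema is an assumption on my part (the question doesn't show it; I'm using the `User` record from the Avro getting-started guide), and `TaggedUnionDemo` and `decode` are just illustrative names.

```scala
import org.apache.avro.{AvroTypeException, Schema}
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

object TaggedUnionDemo {
  // Assumed schema, not taken from the question: two fields are unions with null.
  val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "User", "namespace": "example.avro", "fields": [
      |  {"name": "name", "type": "string"},
      |  {"name": "favorite_number", "type": ["int", "null"]},
      |  {"name": "favorite_color", "type": ["string", "null"]}
      |]}""".stripMargin)

  // Parses a JSON string with Avro's JSON decoder, which expects tagged unions.
  def decode(json: String): GenericRecord = {
    val reader = new GenericDatumReader[GenericRecord](schema)
    val decoder = DecoderFactory.get.jsonDecoder(schema, json)
    reader.read(null, decoder)
  }

  def main(args: Array[String]): Unit = {
    val tagged   = """{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}"""
    val untagged = """{"name": "Alyssa", "favorite_number": 7, "favorite_color": null}"""

    // The tagged form decodes into a record.
    println(decode(tagged))

    // The untagged form is rejected by the decoder with an AvroTypeException.
    try {
      println(decode(untagged))
    } catch {
      case e: AvroTypeException => println(s"Rejected: ${e.getMessage}")
    }
  }
}
```

The untagged input should fail with the same kind of "Expected start-union" message as in the question, since the decoder sees a bare number where it expects the union wrapper.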
There is a new JSON encoder in the works that should address this common issue:
https://issues.apache.org/jira/browse/AVRO-1582
https://github.com/zolyfarkas/avro
As @Emre-Sevinc has pointed out, the issue is with the encoding of your Avro record.
To be more specific:
Don't do this:
jsonRecord = avroGenericRecord.toString
Instead, do this:
val writer = new GenericDatumWriter[GenericRecord](avroSchema)
val baos = new ByteArrayOutputStream
val jsonEncoder = EncoderFactory.get.jsonEncoder(avroSchema, baos)
writer.write(avroGenericRecord, jsonEncoder)
jsonEncoder.flush()
val jsonRecord = baos.toString("UTF-8")
You'll also need the following imports:
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
After you do this, you'll get jsonRecord with non-null union values tagged with their intended type.
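If you want to try this end to end, here is a self-contained sketch that builds a record and prints both the toString form and the JSON-encoder form side by side. The schema and the sample values are assumptions for illustration only (swap in your own), and `JsonEncodingDemo`, `toAvroJson` and `sampleRecord` are made-up names.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object JsonEncodingDemo {
  // Assumed schema for illustration; substitute your own.
  val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "User", "namespace": "example.avro", "fields": [
      |  {"name": "name", "type": "string"},
      |  {"name": "favorite_number", "type": ["int", "null"]},
      |  {"name": "favorite_color", "type": ["string", "null"]}
      |]}""".stripMargin)

  // Serializes a record with Avro's JSON encoder, which tags non-null union values.
  def toAvroJson(record: GenericRecord): String = {
    val writer = new GenericDatumWriter[GenericRecord](record.getSchema)
    val baos = new ByteArrayOutputStream
    val encoder = EncoderFactory.get.jsonEncoder(record.getSchema, baos)
    writer.write(record, encoder)
    encoder.flush() // the encoder buffers, so flush before reading the bytes
    baos.toString("UTF-8")
  }

  // Hypothetical sample data matching the assumed schema.
  def sampleRecord(): GenericRecord = {
    val record = new GenericData.Record(schema)
    record.put("name", "Alyssa")
    record.put("favorite_number", Int.box(7))
    record.put("favorite_color", null)
    record
  }

  def main(args: Array[String]): Unit = {
    val record = sampleRecord()
    println(record.toString)    // plain JSON-like rendering, unions are not tagged
    println(toAvroJson(record)) // Avro JSON encoding, favorite_number is wrapped with its type
  }
}
```

Only the second string is suitable as input for tools that read Avro's JSON encoding; the toString output is meant for display.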
Hope this helps!