Apache Spark-SQL vs Sqoop benchmarking while transferring data from RDBMS to hdfs
You are using the wrong tools for the job.
Sqoop will launch a slew of processes (on the datanodes) that will each make a connections to your database (see num-mapper) and they will each extract a part of the dataset. I don't think you can achieve kind of read parallelism with Spark.
Get the dataset with Sqoop and then process it with Spark.
you can try the following:-
Read data from netezza without any partitions and with increased fetch_size to a million.
sqlContext.read.format("jdbc").option("url","jdbc:netezza://hostname:port/dbname").option("dbtable","POC_TEST").option("user","user").option("password","password").option("driver","org.netezza.Driver").option("fetchSize","1000000").load().registerTempTable("POC")
repartition the data before writing it to final file.
val df3 = df2.repartition(10) //to reduce the shuffle
ORC formats are more optimized than TEXT. Write the final output to parquet/ORC.
df3.write.format("ORC").save("hdfs://Hostname/test")