Strongly typed access to csv in scala?

If your content uses double quotes to enclose other double quotes, commas, and newlines, I would definitely use a library like opencsv that deals properly with special characters. Typically you end up with an Iterator[Array[String]]. Then you use Iterator.map or collect to transform each Array[String] into your tuples, dealing with type conversion errors there. If you need to process the input without loading it all into memory, keep working with the iterator; otherwise you can convert to a Vector or List and close the input stream.

So it may look like this:

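import java.io.FileReader
import scala.collection.JavaConverters._ // scala.jdk.CollectionConverters._ on Scala 2.13+
import com.opencsv.CSVReader // assuming opencsv 3+; older releases use au.com.bytecode.opencsv

val reader = new CSVReader(new FileReader(filename))
// opencsv hands back a java.util.Iterator, so convert it before using collect
val iter = reader.iterator().asScala
val typed = iter collect {
  case Array(double, int, string) => (double.toDouble, int.toInt, string)
}
// do more work with typed
// close reader in a finally block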
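
The "close reader in a finally block" step, combined with converting to a Vector before closing, could look roughly like the sketch below. The withResource helper and the LoanSketch object are hypothetical names, and a BufferedReader with naive comma splitting stands in for the CSV reader so that the example runs without opencsv; it is not a substitute for a real CSV parser.

```scala
import java.io.{BufferedReader, StringReader}

object LoanSketch {
  // Hypothetical helper: runs `body` on the resource and always closes it afterwards.
  def withResource[R <: AutoCloseable, A](resource: R)(body: R => A): A =
    try body(resource) finally resource.close()

  // Stand-in for the CSV reader: naive comma splitting, for illustration only.
  def readAll(reader: BufferedReader): Vector[(Double, Int, String)] =
    withResource(reader) { r =>
      Iterator
        .continually(r.readLine())
        .takeWhile(_ != null)
        .map(_.split(","))
        .collect { case Array(d, i, s) => (d.toDouble, i.toInt, s) }
        .toVector // force evaluation before the reader is closed
    }
}
```

The important detail is the toVector call inside the block: the iterator is lazy, so returning it unevaluated would mean reading from an already closed reader.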

Depending on how you need to deal with errors, you can return Left for errors and Right for successfully parsed tuples to separate the errors from the correct rows. Also, I sometimes wrap all of this using scala-arm for closing resources. So my data may be wrapped in the resource.ManagedResource monad so that I can use input coming from multiple files.
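
A minimal sketch of the Left/Right idea, using only the standard library. The names EitherSketch, parseRow and separate are made up for illustration, and the error is carried as a plain String; a real application might prefer a dedicated error type.

```scala
import scala.util.Try

object EitherSketch {
  type Row = (Double, Int, String)

  // Left carries a description of the bad row, Right carries the typed tuple.
  def parseRow(cells: Array[String]): Either[String, Row] = cells match {
    case Array(d, i, s) =>
      Try((d.toDouble, i.toInt, s)).toOption
        .toRight(s"bad value in row: ${cells.mkString(",")}")
    case _ =>
      Left(s"expected 3 columns but got ${cells.length}: ${cells.mkString(",")}")
  }

  // Splits parsed rows into (errors, successes), preserving order within each group.
  def separate(rows: Iterator[Array[String]]): (Vector[String], Vector[Row]) = {
    val parsed = rows.map(parseRow).toVector
    (parsed.collect { case Left(e) => e }, parsed.collect { case Right(r) => r })
  }
}
```

Unlike collect with a partial function, this keeps the rejected rows around so they can be logged or reported instead of silently dropped.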

Finally, although you want to work with tuples, I have found that it is usually clearer to have a case class that is appropriate for the problem and then write a method that creates that case class object from an Array[String].
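
For instance, a case class with a factory method in its companion might be sketched like this. Reading and fromRow are hypothetical names chosen for the example, and returning Option is just one way to signal a row that does not fit.

```scala
import scala.util.Try

// Hypothetical domain type standing in for whatever fits the problem.
final case class Reading(value: Double, count: Int, label: String)

object Reading {
  // Builds a Reading from one CSV row; None if the arity or a conversion is wrong.
  def fromRow(cells: Array[String]): Option[Reading] = cells match {
    case Array(v, c, l) => Try(Reading(v.trim.toDouble, c.trim.toInt, l)).toOption
    case _              => None
  }
}
```

With that in place, the iterator of rows can be mapped through Reading.fromRow, and the rest of the code works with named fields rather than _1, _2, _3.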

I've created a strongly-typed CSV helper for Scala, called object-csv. It is not a fully fledged framework, but it can be adjusted easily. With it you can do this:

val peopleFromCSV = readCSV[Person](fileName)

Where Person is a case class, defined like this:

case class Person(name: String, age: Int, salary: Double, isNice: Boolean = false)

Read more about it on GitHub, or in my blog post about it.

product-collections appears to be a good fit for your requirements:

scala> val data = CsvParser[String,Int,Double].parseFile("sample.csv")
data: com.github.marklister.collections.immutable.CollSeq3[String,Int,Double] = 
CollSeq((Jan,10,22.33),
        (Feb,20,44.2),
        (Mar,25,55.1))

~~product-collections uses opencsv under the hood.~~

A CollSeq3 is an IndexedSeq[Product3[T1,T2,T3]] and also a Product3[Seq[T1],Seq[T2],Seq[T3]] with a little sugar. I am the author of product-collections.

Here's a link to the io page of the scaladoc

Product3 is essentially a tuple of arity 3.

You can use kantan.csv, which is designed with precisely that purpose in mind.

Imagine you have the following input:

1,Foo,2.0
2,Bar,false

Using kantan.csv, you could write the following code to parse it:

import java.io.File
import kantan.csv.ops._

new File("path/to/csv").asUnsafeCsvRows[(Int, String, Either[Float, Boolean])](',', false)

And you'd get an iterator where each entry is of type (Int, String, Either[Float, Boolean]). Note that the last column in your CSV can hold more than one type; this is conveniently handled with Either.

This is all done in an entirely type safe way, no reflection involved, validated at compile time.
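
To give a feel for how decoding can be type safe without reflection, here is a hand-rolled sketch of the general type class technique. It is not kantan.csv's actual API or implementation; CellDecoder, decodeRow and DecoderSketch are names invented for this example.

```scala
import scala.util.Try

object DecoderSketch {
  // Hypothetical type class: how to turn one CSV cell into an A.
  trait CellDecoder[A] { def decode(cell: String): Option[A] }

  object CellDecoder {
    def instance[A](f: String => Option[A]): CellDecoder[A] =
      new CellDecoder[A] { def decode(cell: String): Option[A] = f(cell) }

    implicit val intDecoder: CellDecoder[Int] =
      instance[Int](s => Try(s.toInt).toOption)
    implicit val floatDecoder: CellDecoder[Float] =
      instance[Float](s => Try(s.toFloat).toOption)
    implicit val booleanDecoder: CellDecoder[Boolean] =
      instance[Boolean](s => Try(s.toBoolean).toOption)
    implicit val stringDecoder: CellDecoder[String] =
      instance[String](s => Some(s))

    // Try the left type first, then fall back to the right type.
    implicit def eitherDecoder[A, B](
        implicit da: CellDecoder[A], db: CellDecoder[B]): CellDecoder[Either[A, B]] =
      instance[Either[A, B]] { s =>
        da.decode(s).map(a => Left(a): Either[A, B])
          .orElse(db.decode(s).map(b => Right(b): Either[A, B]))
      }
  }

  // Decoders are resolved by the compiler; a column type without a decoder does not compile.
  def decodeRow[A, B, C](cells: Array[String])(
      implicit da: CellDecoder[A], db: CellDecoder[B], dc: CellDecoder[C]): Option[(A, B, C)] =
    cells match {
      case Array(a, b, c) =>
        for { x <- da.decode(a); y <- db.decode(b); z <- dc.decode(c) } yield (x, y, z)
      case _ => None
    }
}
```

Calling DecoderSketch.decodeRow[Int, String, Either[Float, Boolean]] on the two sample rows above would put 2.0 on the Left and false on the Right, because the Float decoder is tried first and only a failed Float parse falls through to Boolean.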

Depending on how far down the rabbit hole you're willing to go, there's also a shapeless module for automated case class and sum type derivation, as well as support for scalaz and cats types and type classes.

Full disclosure: I'm the author of kantan.csv.
