Reading CSV files in Scala – the Traversable way

I needed to import some comma-separated data in Scala, did a quick search for ready-made CSV code and opted for opencsv, which is a Java library available in the Maven central repository.

It’s easy enough to use, but I wanted to see if I could apply some Scala tricks to make it even simpler and more expressive. What you usually want to do with a CSV file is just read it line by line and do something with its values, so I was thinking of a foreach method along these lines:

val csv = new CSVFile("data.csv")
csv.foreach(values => {
  // process values
})

Then I discovered the Traversable trait, and I tried making my CSVFile class extend it.

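import java.io.{FileInputStream, InputStreamReader}
// opencsv 2.x package name; newer releases moved to com.opencsv
import au.com.bytecode.opencsv.CSVReader

class CSVFile(fileName: String, charset: String = "UTF-8", separator: Char = ',', quote: Char = '"', escape: Char = '\\') extends Traversable[Array[String]] {

  // Opens the file, passes each parsed row to f, and always closes the reader
  override def foreach[U](f: Array[String] => U): Unit = {
    val csvReader = new CSVReader(new InputStreamReader(new FileInputStream(fileName), charset), separator, quote, escape)
    try {
      var next = true
      while (next) {
        val values = csvReader.readNext() // returns null at end of file
        if (values != null) f(values)
        else next = false
      }
    } finally {
      csvReader.close()
    }
  }
}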

To my surprise (I’m still fairly new to Scala), by extending Traversable I can now use a CSVFile in a for loop:

val csv = new CSVFile("data.csv")
for (values <- csv) {
  // process values
}
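
The loop works because Scala rewrites a for loop without yield into a call to foreach, so the loop itself needs nothing more than that one method; Traversable is what supplies map, filter and the rest on top of it. A minimal sketch of the rewrite, where Rows is a hypothetical in-memory stand-in for CSVFile (not part of the original code) that extends no collection trait at all:

```scala
object ForLoopDemo {
  // Hypothetical stand-in for CSVFile that reads from memory instead of a file.
  // It does not extend any collection trait; it only defines foreach.
  class Rows(lines: Seq[String]) {
    def foreach[U](f: Array[String] => U): Unit =
      lines.foreach(line => f(line.split(",")))
  }

  // Collects the first column of every row using a plain for loop.
  def firstColumn(): List[String] = {
    val rows = new Rows(Seq("a,b,1", "c,d,2"))
    val seen = List.newBuilder[String]
    // The compiler rewrites this loop to rows.foreach(values => seen += values(0))
    for (values <- rows) seen += values(0)
    seen.result()
  }
}
```

Calling ForLoopDemo.firstColumn() walks the two sample rows once and collects their first fields.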

Not only that, but CSVFile also inherits all sorts of potentially useful methods from it, like map, filter, etc. For example, you could extract only the numbers in the third column into a list with:

val numbers = csv.map(values => values(2).toInt).toList

The only caveat is that Traversable is meant for collections. A CSV file can be seen as a sort of collection, but you should be aware that with this implementation the underlying file is reopened and parsed again for every for loop and every call to a method such as map or filter. I think this is fine in most cases, where you only want to read the values once.
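
To make the cost concrete, here is a rough sketch that counts traversals and then copies the rows into memory once so they can be reused. CountingRows and its opens counter are hypothetical stand-ins for CSVFile, reading from an in-memory list rather than a real file; with the real class, calling csv.toList once and working on the result should have the same effect.

```scala
import scala.collection.mutable.ArrayBuffer

object ReparseDemo {
  // Hypothetical stand-in for CSVFile: reads from memory and counts traversals.
  class CountingRows(lines: Seq[String]) {
    var opens = 0
    def foreach[U](f: Array[String] => U): Unit = {
      opens += 1 // CSVFile would open and parse the file at this point
      lines.foreach(line => f(line.split(",")))
    }
  }

  // Returns (opens after two direct loops, opens after caching and reuse, sum of third column)
  def run(): (Int, Int, Int) = {
    val rows = new CountingRows(Seq("a,b,1", "c,d,2"))

    // Two separate loops over the source: it is read twice
    var rowCount = 0
    for (values <- rows) rowCount += 1
    var sum = 0
    for (values <- rows) sum += values(2).toInt
    val opensDirect = rows.opens

    // One more pass to copy the rows into memory
    val cached = ArrayBuffer.empty[Array[String]]
    for (values <- rows) cached += values

    // Further work on the cached copy does not touch the source again
    val cachedSum = cached.map(_(2).toInt).sum
    (opensDirect, rows.opens, cachedSum)
  }
}
```

The trade-off is memory: the cached copy holds every row at once, which is why re-reading remains the better default for large files that are only processed once.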

The code, along with some additional examples, is now on GitHub: github.com/mirkonasato/traversable-csv.
