Parsing JSON string into record in Haskell

I'd recommend that you use the new aeson package instead of the json package, as the former performs much better. Here's how you'd convert a JSON object to a Haskell record, using aeson:

{-# LANGUAGE OverloadedStrings #-}
module Example where

import Control.Applicative
import Control.Monad
import Data.Aeson

data Tweet = Tweet {
    from_user :: String,
    to_user_id :: String,
    profile_image_url :: String,
    created_at :: String,
    id_str :: String,
    source :: String,
    to_user_id_str :: String,
    from_user_id_str :: String,
    from_user_id :: String,
    text :: String,
    metadata :: String
    }

instance FromJSON Tweet where
    parseJSON (Object v) =
        Tweet <$> v .: "from_user"
              <*> v .: "to_user_id"
              <*> v .: "profile_image_url"
              <*> v .: "created_at"
              <*> v .: "id_str"
              <*> v .: "source"
              <*> v .: "to_user_id_str"
              <*> v .: "from_user_id_str"
              <*> v .: "from_user_id"
              <*> v .: "text"
              <*> v .: "metadata"
    -- A non-Object value is of the wrong type, so use mzero to fail.
    parseJSON _          = mzero

Then use Data.Aeson.json to get an attoparsec parser that converts a ByteString into a Value. Then call fromJSON on the Value to attempt to parse it into your record. Two different parsers are involved in these two steps: a Data.Attoparsec.Parser parser for converting the ByteString into a generic JSON Value, and a Data.Aeson.Types.Parser parser for converting the JSON value into a record. Both steps can fail:

  • The first parser can fail if the ByteString isn't a valid JSON value.
  • The second parser can fail if the (valid) JSON value doesn't contain one of the fields you mentioned in your parseJSON implementation, or if a field has the wrong type.
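As a rough sketch of the two steps and their two failure modes, something like the following should work. To keep it short it uses a cut-down, hypothetical MiniTweet record instead of the full Tweet, and it uses eitherDecode from Data.Aeson for the first step rather than running the json attoparsec parser by hand; decodeMiniTweet is a name made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad (mzero)
import Data.Aeson (FromJSON (..), Result (..), Value (..), eitherDecode, fromJSON, (.:))
import qualified Data.ByteString.Lazy as BL

-- Hypothetical two-field record, cut down from Tweet to keep the sketch short.
data MiniTweet = MiniTweet
    { miniFromUser :: String
    , miniText     :: String
    } deriving (Show, Eq)

instance FromJSON MiniTweet where
    parseJSON (Object v) =
        MiniTweet <$> v .: "from_user"
                  <*> v .: "text"
    parseJSON _          = mzero

-- Step 1: ByteString -> generic Value (fails on invalid JSON).
-- Step 2: Value -> record (fails on a missing or wrongly typed field).
decodeMiniTweet :: BL.ByteString -> Either String MiniTweet
decodeMiniTweet bytes =
    case eitherDecode bytes :: Either String Value of
        Left err    -> Left ("not valid JSON: " ++ err)
        Right value ->
            case fromJSON value of
                Error err -> Left ("not a tweet: " ++ err)
                Success t -> Right t

main :: IO ()
main = do
    -- Both steps succeed.
    print (decodeMiniTweet "{\"from_user\":\"alice\",\"text\":\"hi\"}")
    -- Step 1 fails: the input is truncated, so it is not valid JSON.
    print (decodeMiniTweet "{\"from_user\":")
    -- Step 2 fails: valid JSON, but the "text" field is missing.
    print (decodeMiniTweet "{\"from_user\":\"alice\"}")
```

If you don't care which step failed, eitherDecode can go straight from the ByteString to the record.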

The aeson package prefers the newer Unicode type Text (defined in the text package) to the older String type. The Text type has a much more memory-efficient representation than String and generally performs better. I'd recommend that you change the Tweet type to use Text instead of String.

If you ever need to convert between String and Text, use the pack and unpack functions defined in Data.Text. Note that such conversions require O(n) time, so avoid them as much as possible (i.e. always use Text).
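Here is a small sketch of what that looks like in practice: convert once at the boundary and stay in Text afterwards. MiniTweet, fromStrings and summary are hypothetical names used only for illustration; pack, unpack, concat and toUpper come from Data.Text.

```haskell
module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical record, cut down from Tweet, with Text fields instead of String.
data MiniTweet = MiniTweet
    { miniFromUser :: T.Text
    , miniText     :: T.Text
    } deriving (Show, Eq)

-- Convert once at the boundary: String input becomes Text here (O(n) each).
fromStrings :: String -> String -> MiniTweet
fromStrings user body = MiniTweet (T.pack user) (T.pack body)

-- Ordinary operations work on Text directly, with no round trip through String.
summary :: MiniTweet -> T.Text
summary t = T.concat [miniFromUser t, T.pack ": ", T.toUpper (miniText t)]

main :: IO ()
main = do
    let t = fromStrings "alice" "hello"
    -- Data.Text.IO prints Text without unpacking it first.
    TIO.putStrLn (summary t)
    -- Unpack only where an API really demands a String.
    putStrLn (T.unpack (miniFromUser t))
```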


You need to write showJSON and readJSON methods for your type, which convert your Haskell values to and from the JSON format. The json package will take care of parsing the raw string into a JSValue for you.

Your tweet will be a JSObject containing a map of strings, most likely.

  • Use show to look at the JSObject, to see how the fields are laid out.
  • You can look up each field using get_field on the JSObject.
  • You can use fromJSString to get a regular Haskell String from a JSString.
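The three steps above could be sketched roughly as follows. The sample JSON string and the stringField helper are made up for illustration; decode, get_field and fromJSString are the json package functions mentioned above.

```haskell
module Main where

import Text.JSON (JSObject, JSValue (..), Result (..), decode, fromJSString)
import Text.JSON.Types (get_field)

-- Look up a field and require it to hold a JSON string.
-- Returns Left with a message instead of crashing.
stringField :: JSObject JSValue -> String -> Either String String
stringField obj name =
    case get_field obj name of
        Just (JSString s) -> Right (fromJSString s)
        Just _            -> Left ("field " ++ show name ++ " is not a string")
        Nothing           -> Left ("missing field " ++ show name)

main :: IO ()
main =
    -- Made-up sample input; a real tweet has many more fields.
    case decode "{\"from_user\":\"alice\",\"text\":\"hi\"}" :: Result (JSObject JSValue) of
        Error err -> putStrLn ("parse error: " ++ err)
        Ok obj    -> do
            -- Use show (via print) to see how the fields are laid out.
            print obj
            -- A field that is present, and one that is not.
            print (stringField obj "from_user")
            print (stringField obj "metadata")
```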

Broadly, you'll need something like,

{-# LANGUAGE RecordWildCards #-}

import Text.JSON
import Text.JSON.Types

instance JSON Tweet where

    readJSON (JSObject o) = return $ Tweet { .. }
            where from_user         = grab o "from_user"
                  to_user_id        = grab o "to_user_id"
                  profile_image_url = grab o "profile_image_url"
                  created_at        = grab o "created_at"
                  id_str            = grab o "id_str"
                  source            = grab o "source"
                  to_user_id_str    = grab o "to_user_id_str"
                  from_user_id_str  = grab o "from_user_id_str"
                  from_user_id      = grab o "from_user_id"
                  text              = grab o "text"
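                  metadata          = grab o "metadata"
    -- Any JSON value other than an object cannot be a Tweet.
    readJSON _ = Error "Tweet: expected a JSON object"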


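-- Look up a string-valued field; calls error if it is missing or not a string.
grab o s = case get_field o s of
                Nothing            -> error ("Invalid field " ++ show s)
                Just (JSString s') -> fromJSString s'
                Just _             -> error ("Field is not a string: " ++ show s)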

Note that I'm using the rather cool RecordWildCards language extension.
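If you'd rather report failures through Result than call error, and you also want the showJSON half, a minimal sketch might look like this. It uses a hypothetical two-field MiniTweet record rather than the full Tweet, and relies on valFromObj and makeObj from Text.JSON; treat it as one possible shape, not the only way to write the instance.

```haskell
module Main where

import Text.JSON (JSON (..), JSValue (..), Result (..), decode, encode, makeObj, valFromObj)

-- Hypothetical two-field record, cut down from Tweet to keep the sketch short.
data MiniTweet = MiniTweet
    { miniFromUser :: String
    , miniText     :: String
    } deriving (Show, Eq)

instance JSON MiniTweet where
    -- valFromObj returns Error for a missing or wrongly typed field,
    -- so a bad input yields an Error value instead of a runtime exception.
    readJSON (JSObject o) = do
        user <- valFromObj "from_user" o
        body <- valFromObj "text" o
        return (MiniTweet user body)
    readJSON _ = Error "MiniTweet: expected a JSON object"

    -- Build a JSON object from the record's fields.
    showJSON t = makeObj
        [ ("from_user", showJSON (miniFromUser t))
        , ("text",      showJSON (miniText t))
        ]

main :: IO ()
main = do
    let t = MiniTweet "alice" "hi"
    -- Serialize, then parse the result back.
    putStrLn (encode t)
    print (decode (encode t) :: Result MiniTweet)
    -- A missing field comes back as an Error value.
    print (decode "{\"from_user\":\"alice\"}" :: Result MiniTweet)
```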

Without an example of the JSON encoding, there's not much more I can advise.


Related

You can find example instances of the JSON class:

  • in the source, for simple types, or in other packages that depend on json.
  • There is also an instance for AUR messages, as a (low-level) example.