Hello everyone! This is the thirteenth post in the node.js modules you should know about article series.

The first post was about dnode - the freestyle rpc library for node, the second was about optimist - the lightweight options parser for node, the third was about lazy - lazy lists for node, the fourth was about request - the swiss army knife of HTTP streaming, the fifth was about hashish - a hash combinators library, the sixth was about read - easy reading from stdin, the seventh was about ntwitter - twitter api for node, the eighth was about socket.io - the module that makes websockets and realtime possible in all browsers, the ninth was about redis - the best redis client API library for node, the tenth was about express - an insanely small and fast web framework for node, the eleventh was about semver - a node module that takes care of versioning, and the twelfth was about cradle - a high-level, caching, CouchDB client for node.

This time I'll introduce you to a very awesome module called JSONStream. JSONStream is written by Dominic Tarr and it parses streaming JSON.

Here is an example. Suppose you have a CouchDB view like this:

{"total_rows":129,"offset":0,"rows":[
  { "id":"change1_0.6995461115147918"
  , "key":"change1_0.6995461115147918"
  , "value":{"rev":"1-e240bae28c7bb3667f02760f6398d508"}
  , "doc":{
      "_id":  "change1_0.6995461115147918"
    , "_rev": "1-e240bae28c7bb3667f02760f6398d508","hello":1}
  },
  { "id":"change2_0.6995461115147918"
  , "key":"change2_0.6995461115147918"
  , "value":{"rev":"1-13677d36b98c0c075145bb8975105153"}
  , "doc":{
      "_id":"change2_0.6995461115147918"
    , "_rev":"1-13677d36b98c0c075145bb8975105153"
    , "hello":2
    }
  },
  ...
]}

And you want to extract only the doc values from the rows. You can do it easily with JSONStream this way:

var JSONStream = require('JSONStream');
var parser = JSONStream.parse(['rows', /./, 'doc']);

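This creates a stream that parses out rows.*.doc, that is, the doc property of every element of rows. The regular expression /./ in the middle of the path matches any key.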
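To make the path pattern more concrete, here is a minimal sketch of how such a pattern could be compared against the position of a value inside the JSON. The matchesPath function is a hypothetical helper written for this post, not JSONStream's actual code, and it assumes that strings must equal the key exactly while regular expressions are tested against it:

```javascript
// Hypothetical helper, not part of JSONStream: checks whether a key path
// such as ['rows', 0, 'doc'] matches a pattern such as ['rows', /./, 'doc'].
function matchesPath(pattern, path) {
  // A pattern only matches paths of exactly the same depth.
  if (pattern.length !== path.length) return false;
  return pattern.every(function (expected, i) {
    var key = String(path[i]); // array indexes are compared as strings
    // Assumption: a RegExp element matches any key it tests true against,
    // a string element must be equal to the key.
    return expected instanceof RegExp ? expected.test(key) : expected === key;
  });
}

var pattern = ['rows', /./, 'doc'];

console.log(matchesPath(pattern, ['rows', 0, 'doc']));   // true
console.log(matchesPath(pattern, ['rows', 1, 'doc']));   // true
console.log(matchesPath(pattern, ['rows', 0, 'value'])); // false
console.log(matchesPath(pattern, ['total_rows']));       // false
```

Only values whose path matches the whole pattern get through, which is why total_rows, offset and the value fields never show up in the result.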

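Since it's a stream you have to feed it data and then consume what it emits. The parser emits parsed JavaScript objects rather than text, so one idiomatic way to do it in node is to pipe the response in and log each object as it arrives:

req.pipe(parser).on('data', function (doc) { console.log(doc); });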

Here is the output:

{
  _id: 'change1_0.6995461115147918',
  _rev: '1-e240bae28c7bb3667f02760f6398d508',
  hello: 1
}
{
  _id: 'change2_0.6995461115147918',
  _rev: '1-13677d36b98c0c075145bb8975105153',
  hello: 2
}

Here req is the request to the CouchDB view and parser is the JSONStream parser, and the results end up on process.stdout. The output, as you can see, contains only rows.*.doc. That was a really easy way to parse a JSON stream without reading the whole JSON into memory.
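The parser emits parsed JavaScript objects rather than text, so sending the results on to a text destination such as a file or a socket takes a serialization step. Here is a minimal sketch of that step using only node's built-in stream module. The toJsonLines helper is a hypothetical name made up for this example, and a stand-in stream of objects plays the role of the parser, so nothing here is JSONStream's own API. It also assumes a node version that has stream.Readable.from:

```javascript
var stream = require('stream');

// Stand-in for the parser's output: a readable stream of plain objects.
// The two documents are copied from the sample output above.
var docs = stream.Readable.from([
  { _id: 'change1_0.6995461115147918', _rev: '1-e240bae28c7bb3667f02760f6398d508', hello: 1 },
  { _id: 'change2_0.6995461115147918', _rev: '1-13677d36b98c0c075145bb8975105153', hello: 2 }
]);

// Hypothetical helper: turns each incoming object into one line of JSON text.
function toJsonLines() {
  return new stream.Transform({
    writableObjectMode: true, // the input side accepts objects
    transform: function (doc, encoding, callback) {
      // The output side emits text, one JSON document per line.
      callback(null, JSON.stringify(doc) + '\n');
    }
  });
}

docs.pipe(toJsonLines()).pipe(process.stdout);
```

With a real parser you would put it in place of docs, and the rest of the pipeline would stay the same.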

You can install JSONStream through npm as always:

npm install JSONStream

JSONStream on GitHub: https://github.com/dominictarr/JSONStream.