Parsing JSON Files from a Remote URL with Node, JSONStream and Hyperquest
Working with large data files can be tough and can create a bottleneck within your application. We can't simply load the whole file into memory at once and expect everything to work; we need to iterate over the data and parse it in chunks.
This article assumes you know the basics of Node.
To get started, open up your working directory in your code editor and create a new file named
parser.js in the root.
Fetching the JSON
To be able to have something to work with, we will need to fetch the data we want from a remote server. If you want to test this out with a JSON file, I recommend the Scryfall JSON endpoint for all of the Magic: The Gathering cards, which you can find at https://archive.scryfall.com/json/scryfall-default-cards.json.
Before we can start installing things you will need to set up a
package.json to install our NPM packages. You can do this with Yarn or NPM.
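Either of the following will scaffold a default package.json (these are the standard init commands, not anything specific to this project):

```shell
# generate a default package.json with npm
npm init -y

# or with Yarn
yarn init -y
```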
Next we will need to install hyperquest.
Hyperquest is a lightweight subset of
request, written to handle large payloads of data without overwhelming our server. It also works around a number of long-standing quirks in Node's HTTP client so that pesky bugs don't get in the way.
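Assuming npm (Yarn works just as well), the install looks like:

```shell
npm install hyperquest
# or: yarn add hyperquest
```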
Let's set things up. At the top of the
parser.js file, import hyperquest.
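A minimal sketch, assuming CommonJS requires (as was standard at the time):

```javascript
// parser.js
const hyperquest = require('hyperquest');
```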
Next, create a new function which will house our logic. While we are here, set a variable for the URL to the location of the JSON file.
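A sketch of the skeleton; the function name `parser` is my assumption, as the article doesn't fix a name:

```javascript
// location of the remote JSON file we want to parse
const url = 'https://archive.scryfall.com/json/scryfall-default-cards.json';

// this function will house our parsing logic
async function parser() {
  // fetching and piping are added in the next steps
}

parser();
```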
Next, let us initialise
hyperquest to fetch the data so that we can pass it through to our follow-up functions. We wrap the call in an async function and
await it so the code reads sequentially; hyperquest itself hands back a stream straight away.
Hyperquest allows you to create a pipeline, so you can pass the data received on to other functions by appending
.pipe() calls; we are going to utilise this in the next step.
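Putting that together, a sketch of the function so far (the .pipe() targets are added in the next section):

```javascript
const hyperquest = require('hyperquest');

const url = 'https://archive.scryfall.com/json/scryfall-default-cards.json';

async function parser() {
  // hyperquest returns a readable stream of the response body
  const stream = await hyperquest(url);
  // subsequent steps will append .pipe() calls to this stream
}

parser();
```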
Handling the Returned Data
We are going to lean on a few more packages here to handle the data returned and make sure it is processed correctly. These are:
- JSONStream - Which allows us to stream the parsing of the results returned.
- event-stream - Which allows us to process the parsed data.
Install them into the project.
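With npm (or the Yarn equivalent), that is:

```shell
npm install JSONStream event-stream
# or: yarn add JSONStream event-stream
```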
Import them at the top of the
parser.js file.
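The imports section now looks something like this:

```javascript
const hyperquest = require('hyperquest');
const JSONStream = require('JSONStream');
const es = require('event-stream');
```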
The first pipeline function we will add is for JSONStream. This will ensure everything is returned properly in a readable format. Update our code to the following.
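A sketch of the updated function, under the same assumptions as before:

```javascript
async function parser() {
  const stream = await hyperquest(url);

  // '*' asks JSONStream to emit every top-level row of the JSON array
  stream.pipe(JSONStream.parse('*'));
}
```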
The * passed through to the
parse function tells the JSONStream package that we want every row in the JSON file returned. If all of your records were contained inside a
data object, you would adjust the selector accordingly.
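For instance, assuming the rows sat under a top-level data key, the selector might look like this:

```javascript
// select every row nested under the top-level `data` object
stream.pipe(JSONStream.parse('data.*'));
```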
Next, add a pipeline step for processing the data with
event-stream. Update the code to add the following.
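A sketch of that step using event-stream's es.map; the console.log is purely for testing:

```javascript
stream
  .pipe(JSONStream.parse('*'))
  .pipe(
    es.map((data, callback) => {
      console.log(data); // confirm rows are arriving
      callback();        // drop this record and move on to the next one
    })
  );
```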
To explain what we have so far: every row returned by JSONStream is passed to the event-stream function, which
console.logs the data (purely for testing that this works). Finally, we call the
callback() function with no arguments, which drops the current record and loops back around for the next one.
Our full code should look like the following:
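Under the assumptions above (function name, CommonJS requires), a full sketch of parser.js:

```javascript
const hyperquest = require('hyperquest');
const JSONStream = require('JSONStream');
const es = require('event-stream');

const url = 'https://archive.scryfall.com/json/scryfall-default-cards.json';

async function parser() {
  const stream = await hyperquest(url);

  stream
    .pipe(JSONStream.parse('*'))
    .pipe(
      es.map((data, callback) => {
        console.log(data); // replace with your own processing
        callback();        // drop the record and continue with the next row
      })
    );
}

parser();
```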
We won't go into the processing of the data as this can be done in a multitude of ways, but if you run
node parser.js you should start to see the rows being logged in the console.
I've added a stripped down example of a project on GitHub.
I hope this helps you out in the future.
30 May 2020