Throttled File Loading with Node.js & Oracle Database

By March 5, 2015 Uncategorized

A little background

There have been some interesting discussions lately related to the Oracle Database Node.js Driver and support for streaming and LOBs:

As of the time of this writing Chris Jones, the Product Manager of the driver, is still looking for feedback on these topics so please add yours if you have any! As for me, I hadn’t tried to do much with LOBs and Node.js so I didn’t really know what types of features I would like to see in the driver. I needed a project that would force me to get my hands dirty…

While mulling over some options I remembered a post from Kris Rice on chunked file uploads with ORDS and I thought, I’ll create a File Load demo! I’d uploaded and downloaded files to and from the Oracle Database many times in the past. Why not do it through Node.js to see where the pain points were?

Why the database?

Before I get to the application, some of you may be wondering: Why would I upload files into the database to begin with? After all, you could just upload them to the filesystem and be done with it. That’s a great question! As usual, it really depends on what you intend to do with the files. There are, of course, cases where utilizing the file system may make perfect sense. But here’s a small lists of reasons you might want to load files into Oracle Database:

  • Better relate unstructured data with structured data – maybe you want to associate images of receipts with detail lines in an expense reporting application.
  • Ensure files are consistent and backed up with the database – perhaps you’d rather leverage your existing database backup/recovery strategies rather than invent new ones for the file system.
  • Index files and make them searchable via Oracle Textmany documents types can be made searchable with “Google like” search results.
  • Make available for multimedia processing with Oracle Multimedia – allows those with SQL, PL/SQL, and Java skills to store, manage, and retrieve multimedia files.

I could probably make this list a bit longer, but hopefully you’ll agree that there are some good use cases for loading files into the database. On to the application…

Some requirements

I started with the most basic of requirements:

  1. Users should be able to upload 1 or more files at a time.
  2. Users should be able to see, download, and delete files that have been uploaded.

But things got more fun with the following requirements:

  1. No more than 3 files at a time should be uploaded from the client to the database.
  2. File sizes should be limited (server-side) to a configurable number of bytes.
  3. Upload buffer sizes should be limited as to prevent consuming excessive memory.

Here’s what the application looked like in the end:
File Load Demo

Three at a time, please

I used the Async.js (async) library’s eachLimit method to solve the requirement that only 3 files should be uploaded at a time from the client to the database. This is one of the many things that async can make quick work of. If you haven’t already started using the async library, I suggest you start experimenting sooner rather than later; along with promises and proper code organization, async is one one of the best ways to keep your asynchronous code nice and tidy.

Check out async’s documentation on eachLimit to get an idea of how it works in general. In my case, I’m iterating over the selectedFiles array and calling uploadFile for each iteration. When async invokes uploadFile it will pass the function an element from the array along with a callback function that should be invoked when the upload is done. The maxSyncUploads variable tells async that up to three functions should be running at any time. The reset function will be invoked when all iterations have completed.

Note: Because the limit on simultaneous uploads is client-side it can’t really be trusted. Server-side logic would need to be added to ensure that this is not abused.

Streaming the file

Another part of the solution on the client-side was to use the file APIs provided in modern browsers (you pretty much need IE 10 or better). HTML5 Rocks provides great content on the file APIs. With these APIs we can get references to files as binary large objects (BLOBs).

When you combine that with XHR2’s ability to send BLOBs natively, we have a pretty simple means to stream the files to Node.js. You can find details on XHR2 at HTML5 Rocks as well. Here’s a small code sample that should give you an idea of how it works.

Receiving the file

Using Node.js I setup a standard Express web server. The server listens for POST requests on “/api/files” and invokes the postFiles function abbreviated below. The first thing we do is start the process of getting a connection to the database – that’s handled asynchronously. Next we add a listener to the data event on the request object. Request objects in Node.js implement the Readable Stream interface. By the time we are handed the request object, Node.js has parsed the request through the HTTP headers but no further.

When you add an event listener to the data event, the stream is switched to flowing mode where your handler is invoked and passed data chunks (as a Buffer objects) as soon as they are available. As data arrives, it’s queued for processing in an array. Once the database driver returns a connection object, the async module’s whilst method is used to start a loop that dequeues and processes elements in the buffer array.

Managing buffer sizes

When I first started testing the code outlined above I found that data was flowing into the buffer array much faster than it was being processed. What if someone was uploading a 1GB file (assuming that was allowed)??? We probably wouldn’t want to consume that much memory for 1 request! Solving this problem really blew my mind and made me fall in love with Node.js all over again! 😉

Turns out with just a little additional code I was able to monitor the number of bytes in the queue. If the number of bytes exceeded a configurable limit, a call to req.pause() would stop the flow of data steaming in. Later, as data was moved to the database and the queue fell below the limit, a call to req.resume() would allow the steam of data to start flowing in again.

There’s a saying in the Node.js community about “being close to the metal” that really clicked with me while doing this. Very cool!

While the sample application manages the buffer as files are streaming into the database it doesn’t do the same when users are downloading files – I’ll leave that as an exercise for the reader! Also, in production ready code, additional logic should be added to track the maximum number of connections that are able to post to the “/api/files” endpoint at any given time.

Managing file sizes

Implementing the logic to limit the max file size allowed was similar to handling the buffer size. I added another variable to track the total number of bytes that had been received. If the number exceeded a predetermined maximum then the portion that had been written to the database is rolled back, an HTTP 413 error is sent back to the client, and the connection is destroyed to prevent more data from flowing in.

In production ready code it would probably be best to take the “trust but verify” approach. Rather than only depending on counting the bytes as they arrive, one could first check the Content-Length HTTP header. This would allow the immediate rejection of files that are too large if the header was provided accurately. Along those same lines, some client-side code could be added to prevent any request from going out in the first place. Of course this server-side code should still be used to prevent abuse.

Installing the demo app

Follow these steps if you’d like to get this sample application up and running in your environment.

Prerequisites

  1. Oracle Database 11g or 12c
  2. Node.js
  3. Oracle Database Node.js Driver installed globally
  4. Bower installed globally

Checkout Up and Running with Node.js and Oracle if you are interested in getting a VM setup with the first 3 of these 4 requirements. Once complete, installing Bower in that VM is as easy as running the following lines in a terminal:

Application Installation

  • In the server with Node.js installed, open a terminal, navigate to an appropriate directory to start a new project, and enter the following:
  • Use your favorite text editor to modify the file named dbconfig.js so that it has the correct information to connect to your database.
  • Open your favorite tool to work with Oracle Database (I recommend SQL Developer) and run the script named create-objects.sql in file-load-demo-0.1.0/sql-scripts.
  • Return to the terminal in the file-load-demo-0.1.0 directory and use Node.js to start the server:

Final Notes

Remember, I started this project so that I could better understand what features for streaming and LOBs might benefit users of the Oracle Database Node.js driver. Now I have some ideas that I plan on sharing with the team that works on the driver. But this is just one specific use case based on LOBs. You may have your own ideas for other situations. Please, share your thoughts and help us build a build a better driver!

4 Comments

Leave a Reply