Thomas v James.com

Latest posts.

November 12th 2012

Loading tenant databases on RavenDB startup

Note: this applies to build 2139 and above of RavenDB

One of the ways RavenDB conserves resources is to only load tenant databases that are in use. When the RavenDB process is started or restarted, this means that each in-use database is only loaded when the first request for it is received. This can result in timeouts in the client if your database is large, like mine, and takes more than 30 seconds to load.

Thanks to Oren via the mailing list, from build 2139 there is now a way to implement global startup hooks. This allows us to start the tenant databases as soon as RavenDB is up-and-running.

Be warned: the global startup hooks run synchronously, so if what you’re loading or running takes considerable time, it will delay the startup of RavenDB. I recommend using async calls to allow normal RavenDB startup to continue. This is especially important if you have multiple startup hooks.

Anyway, on to the code. To create a startup hook, all that is required is implementing the IServerStartupTask interface (in the Raven.Database.Plugins namespace) and placing the resulting class library in the RavenDB/Plugins folder. Restart RavenDB and you’re off.

I had to do a bit of digging to figure out how to implement the load-database-on-startup hook, so this code shouldn’t be considered best practice. YMMV, but it worked for me.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Raven.Abstractions.Data;
using Raven.Abstractions.Extensions;
using Raven.Database;
using Raven.Database.Plugins;
using Raven.Database.Server;

namespace Plugins
{
    public class LoadDatabaseOnServerStartupTask : IServerStartupTask
    {
        private const string DATABASE_NAME = "MyDatabase";

        public void Execute(HttpServer server)
        {
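            // Ask for the system database first, then request the tenant database
            // once that task completes. Execute returns without waiting on either task.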
            server.GetDatabaseInternal("System")
                  .ContinueWith(_ => server.GetDatabaseInternal(DATABASE_NAME));
        }
    }
}

This is just a simple example; without much more effort it can be adapted to read the list of databases to load from a configuration document in the System database. Don’t forget logging as well… improvements welcome :)
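
As a rough sketch of the "list of databases" idea, the helper below starts one load per name and hands back a single task covering all of them. DatabaseWarmup and LoadAll are hypothetical names, not part of RavenDB, and the load delegate stands in for whatever call actually opens a database (server.GetDatabaseInternal in the task above). Reading the names from a configuration document is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DatabaseWarmup
{
    // Hypothetical helper (not a RavenDB API): invokes the supplied delegate
    // once per database name and returns a task that completes when every
    // load has finished. If any load fails, the returned task is faulted.
    public static Task LoadAll(IEnumerable<string> databaseNames, Func<string, Task> load)
    {
        // ToArray forces every load to start now rather than lazily.
        var loads = databaseNames.Select(load).ToArray();
        return Task.WhenAll(loads);
    }
}
```

Inside Execute this could be chained the same way as the single-database version, for example server.GetDatabaseInternal("System").ContinueWith(_ => DatabaseWarmup.LoadAll(names, name => server.GetDatabaseInternal(name))), where names is whatever list you have decided to preload. Task.WhenAll needs .NET 4.5.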

Combine this with a much longer idle timeout and your tenant database will be available just after RavenDB starts, ready to serve requests.

October 11th 2012

RavenDB, Bulk document import and MapReduce indexes

Build 960 was used as the basis for this post, which covers my first opportunity to use RavenDB in a non-trivial way.

A recent project afforded me the opportunity to use a NoSQL backing store, and given it was a .NET solution, RavenDB was the obvious first choice.

This project was to provide a new view over a number of years of already collected data, plus the data yet to be collected by that system. For this to work it needed to be able to ingest the existing dataset. This was approximately 3 million documents; however, it was later culled to about half that.

Given the nature of this project, providing aggregate views over the data was critical. These aggregate views provided a number of levels of granularity, from yearly, monthly through to hourly.

Based on everything I’ve read, Raven should be able to gobble this task up and spit it out. For the most part that was my experience, but I encountered a few gotchas along the way.

About the dataset I was using:

  • 3 million documents, each under 1KB
  • Total size < 700MB uncompressed text
  • About 10GB in Raven including indexes
  • 5 MapReduce indexes that map over the entire dataset using a composite key

Importing the data, which I assumed would be a simple task, turned out to be less than straightforward.

TL;DR: When importing large numbers of documents in bulk, initialise the indexes, use a large batch size and wait for the indexes to be fresh between batches.

The first attempt

This was the naive attempt: create the indexes and just bulk import the data as fast as the datastore could accept it. This initially appeared to work well, with the data being imported quite quickly. Soon after the import completed, the indexing process kicked into high gear and consumed the available resources of the machine, and after about 45 minutes it crashed with an out-of-memory exception. Close to 100% CPU usage was observed for some time after the crash.

The second attempt

Believing that the indexing process was the culprit, I attempted to split the process into loading the data and then applying the indexes. Once again the import completed quickly, and within about 45 minutes to an hour of the indexes being created the same out-of-memory issue was encountered. This occurred on both 32-bit and 64-bit systems: most initial testing was performed on a 32-bit Windows Server 2008 VM on Hyper-V, with later testing on a 64-bit Windows 7 laptop (4 cores, 6GB RAM) and a 64-bit Windows Server 2008 R2 VM on Hyper-V with 8GB RAM.

As a stop-gap measure the import process was made interactive, pausing between batches of about 150k documents. The indexing process, CPU and memory utilisation were monitored until they returned to normal, and then the next batch was started. This yielded the most success, allowing the import process to complete, indexes and all.

Getting it stable

Based on the success of processing a batch of documents and then pausing to allow the indexing process to complete, the final import process automated this. The result was a stable and repeatable import that completed in an acceptable amount of time.

And some code:

var documentStore = …;
while (iStillHaveDocumentsToImport) {
    ImportBatchOfDocumentsInto(documentStore);

    // Block until RavenDB reports no stale indexes before importing the next batch.
    while (documentStore.DatabaseCommands.GetStatistics().StaleIndexes.Length > 0) {
        Thread.Sleep(1000); // poll once per second
    }
}
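
One thing the loop above cannot do is give up: if indexing stalls, the import waits forever. A minimal sketch of the same polling idea with a timeout follows. IndexWait and WaitUntilNoneStale are hypothetical names, and the stale-index count is passed in as a delegate so the helper itself has no RavenDB dependency.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class IndexWait
{
    // Hypothetical helper (not a RavenDB API): polls getStaleCount until it
    // reports zero. Returns true when nothing is stale, or false if the
    // timeout elapses while indexes are still stale.
    public static bool WaitUntilNoneStale(Func<int> getStaleCount, TimeSpan pollInterval, TimeSpan timeout)
    {
        var stopwatch = Stopwatch.StartNew();
        while (getStaleCount() > 0)
        {
            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

With the document store from the loop above, the call would be along the lines of IndexWait.WaitUntilNoneStale(() => documentStore.DatabaseCommands.GetStatistics().StaleIndexes.Length, TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(30)). The 30 minute limit is an arbitrary placeholder, so pick one that suits your batch size.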

July 30th 2012

Getting NewRelic monitoring working on Mono .Net Web Apps

About a week ago I set out to get NewRelic’s .Net application instrumentation working for web apps running on the Mono .Net runtime. I’ve been using NewRelic’s Lite plan for monitoring a number of WordPress blogs I host for friends and the various VPSes I have around the world. I’ve always wanted to extend this monitoring to the ASP.NET MVC apps I have running, but as I host them on Linux with Mono this was completely unsupported, and a few Google searches on the topic yielded little help. With access to a standard plan, it was time to see if it was possible.

Please note that running the NewRelic monitoring on Mono is completely unsupported by NewRelic and YMMV. If this breaks your system or a NewRelic upgrade prevents this from working then you'll be on your own.

Also, I am not affiliated with NewRelic in any way.

If you haven’t tried out the NewRelic application monitoring, give it a go. It’s free on the Lite plan, which gives you pretty good metrics but only keeps the last 30 minutes of data. It allows you to monitor:

  • All the basics
  • Time taken in internal calls to external sites (this is pretty handy)
  • Process memory use
  • Real load time in the browser through JS injection
  • Database calls including SQL statements

TL;DR

  • Grab the installer for either x86 or x64
  • On a Windows box, extract the contents with msiexec using an administrative install (msiexec /a )
  • Copy the GAC dlls into your application’s bin
  • Copy the newrelic.xml to newrelic.config in your application
  • Add an httpModule reference to "NewRelic.Agent.Core.Tracer.Web.NewRelicHttpModule, NewRelic.Agent.Core" in your web.config file
  • Edit the newrelic.config file and supply your license key
  • Deploy & watch those stats roll in
  • Missing:
    • Externals tracking
    • Database tracking
    • Most other tracking
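
As a rough illustration of the httpModule step in the list above, the web.config entry might look like the fragment below. The type string is the one quoted in the list. The name attribute ("NewRelic") is an arbitrary label chosen for this sketch, and it assumes your Mono host reads the classic system.web/httpModules section.

```xml
<configuration>
  <system.web>
    <httpModules>
      <!-- name is an assumed label; type is the module named in the list above -->
      <add name="NewRelic"
           type="NewRelic.Agent.Core.Tracer.Web.NewRelicHttpModule, NewRelic.Agent.Core" />
    </httpModules>
  </system.web>
</configuration>
```

If your web.config already has an httpModules section, only the add element needs to go in.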

The guts of it

The NewRelic documentation was the first port of call on working out if this little endeavour was even possible and I must say they provide a decent amount of information about how the .net monitoring agent works. It’s pretty clever if you ask me.

The documentation explains that the .net agent uses the built-in CLR profiling hooks to kick off the instrumentation, and a quick look at the environment variables after installation shows that NewRelic.Agent.IL.dll contains the profiler bootstrap.

Reading up on how to build a CLR profiler didn’t provide any great insight into the problem of how the agent was started and hooked into the IIS/Asp.net runtime.

I started by analysing the installer. I grabbed the 64bit one, but the 32bit one should work as well. I found that along with the profiler DLL there were 4 others that get installed into the GAC:

  • NewRelic.Agent.Core.dll
  • NewRelic.ICSharpCode.SharpZipLib.dll
  • NewRelic.Json.dll
  • NewRelic.Log.dll

Using reflection over the NewRelic.Agent.Core DLL (as the others were obviously supporting modules) revealed a number of interesting classes and interfaces. These ones piqued my interest:

  • IAgent
  • Agent

The Agent class contained a single public constructor as far as I could tell. This was worth a shot. I copied the newrelic.xml file to newrelic.config alongside my app’s web.config, supplied my NewRelic license key and, using MonoDevelop, started the app on my local machine.

I waited five minutes and checked my newrelic dashboard. To my surprise my application showed up along with the hostname of my laptop listed under instances. Surely it couldn’t be this easy?

I then spent the next few minutes generating as many requests as I could. Unfortunately, none of them showed up on my dashboard.

At this point I thought I must be missing something obvious, so I reached out to the NewRelic team on Twitter to see if they could point me in the right direction. Their support was excellent: although they advised me a number of times that Mono was not supported, they asked me to raise a ticket, which I did.

While providing some information for the support ticket, I was thinking about how I would implement the request tracking. I had missed the obvious: IHttpModule.

A few minutes and a quick reflection check later, lo and behold, there was a single class implementing IHttpModule in the Core DLL. Hoping that this was it, I wired it up in the web.config and restarted the local debug session of the app. I hit it with quite a few initial requests and waited a minute to be sure that the data was on its way to the dashboard.

I checked the dashboard, and IT WORKED.

Unfortunately, a quick check revealed that the other metrics, including external calls and database monitoring, weren’t working, and I suspect they will take considerably more effort to figure out.

Thanks NewRelic for the awesome monitoring platform and for not making it too hard to unofficially get it running on mono.

For those that are interested, the LINQPad code for reflecting over the DLL to find the IHttpModule:

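// LINQPad script; needs references to System.Web and NewRelic.Agent.Core.dll.
var type = typeof(IHttpModule);

// Scan every type in the agent's Core assembly for IHttpModule implementations.
var types = typeof(NewRelic.Agent.Core.Agent).Assembly.GetTypes()
    .Where(p => type.IsAssignableFrom(p));

types.Dump(); // Dump() is LINQPad's output helper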