Mindscape Mindblog

Saving large numbers of entities in LightSpeed

tag icon Tagged as LightSpeed

We had a report in the forums recently of surprisingly slow performance when inserting large numbers of records using LightSpeed. The issue was resolved by adjusting some of LightSpeed’s performance tuning parameters, but the performance characteristics of those parameters weren’t initially obvious, so I thought it might be useful to write up a few notes on how to get the best performance out of LightSpeed if you’re doing mass inserts.

The first recommendation is to consider your IdentityMethod. This determines how a new entity gets its Id value. If you’re creating a lot of entities, you should use a method that allows IDs to be assigned efficiently. For numeric identities, avoid IdentityColumn, and prefer KeyTable or Sequence. IdentityColumn means that the ID is assigned by the database when LightSpeed inserts the record: that in turn means that LightSpeed has to insert records one at a time, and to query the database after each insert to find the ID that was assigned. That means two database round-trips for every entity. With KeyTable and Sequence, by contrast, LightSpeed grabs a block of IDs from the database, assigns them to entities client-side, and can then insert the entities in one go. That means two database round-trips total. (Well, sort of. See below.) For GUID fans, the Guid and GuidComb identity methods are also efficient because they assign IDs client-side and don’t incur any database queries for identity allocation. See Identity Generation in the LightSpeed user guide for more information.

The second recommendation is to consider your IdentityBlockSize. This determines how many IDs LightSpeed grabs in one go from a KeyTable. The default is 10. That means that for every 10 entities you create, you incur a database query for LightSpeed to grab another block of IDs. If your typical usage pattern is to save hundreds or thousands of entities at a time, you can reduce the number of queries by grabbing more IDs each time. The trade-off is that if you have to use the IDs or lose them. For example, if you specify an IdentityBlockSize of 1000 but you’re usually only inserting 5 or 10 records at a time, you’ll chew through the available range of ID numbers faster than necessary. The ideal is to use an IdentityBlockSize roughly equal to a typical insert size, so you’re keeping the number of identity block allocation queries low but making efficient use of the ID range.

The third recommendation is to consider your UpdateBatchSize. This determines how many entities LightSpeed saves (inserts or updates) in one go. If more entities than the UpdateBatchSize need saving, LightSpeed breaks up the save operation into multiple queries, each saving no more entities than the UpdateBatchSize. (The queries are all executed within a transaction, so the save operation remains atomic.) The default UpdateBatchSize is 10.

Now you might think that reducing the number of queries is good, and that bigger UpdateBatchSizes are therefore better. But that’s not always the case. The reason for this is that a bigger UpdateBatchSize means LightSpeed has to construct and send a bigger SQL statement. Up to a point, that’s no problem. But if the UpdateBatchSize is too high, the SQL statement gets really big, the number of parameters becomes huge, and the savings from reducing the number of queries are dwarfed by the cost of constructing such a vast query. Worse still, some providers can’t cope with really big batch sizes at all. For example, SQL Server has a limit of 2100 parameters per query. If you’ve got an entity with a dozen fields, and an UpdateBatchSize of 200, you’ll hit that limit.

So what should your UpdateBatchSize be? It depends on the characteristics of your entities, and more critically on your database and network characteristics. If hitting the database is fast — the database is on the local network — then the cost of multiple queries is low compared to the cost of constructing a single huge query. So an UpdateBatchSize of 10 or 20 is probably appropriate. If hitting the database is slow — for example, the database is on a WAN or VPN with low bandwidth and high latency — then the cost of multiple queries is higher, and you’ll want to get more work done in each query even if that means the queries take longer to build. So an UpdateBatchSize of 60 or 100 (or even higher) might be better.

As with any distributed system, you may need to do some experimenting to find the optimum for your entities, application usage patterns, database and network. Just remember that although reducing the number of round trips is important (hence KeyTable rather than IdentityColumn), other performance factors can come to dominate as the system scales. On the positive side, LightSpeed should help you to avoid having to optimise prematurely — the performance parameters can all be adjusted through configuration as you develop and test.

3 Responses to “Saving large numbers of entities in LightSpeed”

  1. “…That means two database round-trips for every entity.”

    Usually, this is batched with the offending INSERT so only one round-trip.

  2. Hi Ivan,
    I’m just starting with LightSpeed, but I’m really intered in the discussion about “Identity Size” using the Key Table… My question is: is it possible to change dinamically the IdentityBlockSize from code. In this way the base IdentityBlockSize could be setted to low size (maybe 1 for the base) and, then, when needed (when i have to insert a huge number of records) can be setted to a greater value.
    Thanks in advance for your suggestions. Francesco

  3. Yes, you can change the LightSpeedContext.IdentityBlockSize in code over the course of an application. When LightSpeed needs some more IDs, it will increment the KeyTable by the current IdentityBlockSize. LightSpeed will continue using IDs from the last allocated block until they run out, so if you overallocate and then step back down to 1, you won’t incur a new query until the overallocated block is exhausted. Thus:

    uow.Context.IdentityBlockSize = 1;
    uow.Add(entity1); // increments KeyTable by 1
    uow.Add(entity2); // block of 1 has run out, so increment again by 1
    uow.Context.IdentityBlockSize = 10;
    uow.Add(entity3); // block of 1 has run out, increment by 10
    uow.Add(entity4); // still enough left in block of 10, so KeyTable not updated
    uow.Context.IdentityBlockSize = 1;
    uow.Add(entity5); // still enough left in block of 10, so KeyTable not updated

    Finally, don’t get hung up on micro-optimising this stuff. There’s no real penalty for having gaps in your IDs, unless it’s going to result in you having to use an Int64 ID instead of an Int32, and even then the penalty is insignificant. Make sure you balance optimisation against keeping the code simple!

Leave a Reply