That's because we don't actually test true bulk data modification. We repeat the same operation to measure its time more precisely, and to give the ORM a chance to utilize CUD batching (the FAQ explains why this is important). That's it. So we do not try to simulate a bulk data load in this test. Our sole purpose is to measure CUD operation time in the case where there is some reasonable number of such operations per single transaction, e.g. 20, 30 or 100.
And it does not matter much whether we measure the numbers on 100, 1000, 5K or some other count of such operations: we make sure that the operations-per-second rate is nearly stable. If it decreases significantly, that shows there is something wrong with our test code (in fact, such issues exposed some problems in our initial tests for NHibernate).
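To make the stability check concrete, here is a minimal sketch in Python. The function names, the 25% tolerance and the timing values are all made up for illustration; they are not our real test code or real measurements.

```python
def ops_per_second(op_count, elapsed_seconds):
    """Throughput of one run: operations divided by wall-clock time."""
    return op_count / elapsed_seconds


def is_nearly_stable(runs, tolerance=0.25):
    """runs is a list of (op_count, elapsed_seconds) pairs.

    Returns True when every run's rate stays within `tolerance`
    (as a fraction) of the best rate seen across all runs.
    The 25% default is an arbitrary assumption.
    """
    rates = [ops_per_second(count, elapsed) for count, elapsed in runs]
    best = max(rates)
    return all(rate >= best * (1 - tolerance) for rate in rates)


# Hypothetical timings, invented for illustration only.
healthy = [(100, 0.010), (1000, 0.105), (5000, 0.540)]
suspicious = [(100, 0.010), (1000, 0.300), (5000, 6.000)]

print(is_nearly_stable(healthy))
print(is_nearly_stable(suspicious))
```

In the second data set the rate collapses as the operation count grows, which is the kind of pattern that would make us look for a problem in the test code.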
Btw, why must it be nearly stable? Because if the table size is relatively small (that's true for most of our tests: remember, we minimize database load to expose pure ORM performance), most of the time in this test is spent by the ORM itself (e.g. on dirty checking and command composition) and by the communication stack.
On the other hand, if you increase the table size to e.g. 10 million rows (you can do this with ease, since we publish all the source code), you'll find there is still a big difference between the various ORM tools. Why? Because the part of the table being modified is still located in memory (we insert increasing values into a table with a clustered index), thus index performance is still ~ O(log(N)) (this means that if you increase the table size 10 times, you'll spend some fixed additional n microseconds on each index operation).
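The O(log(N)) remark can be checked with a bit of arithmetic. The sketch below assumes an idealized B-tree with a fan-out of 100 keys per page; that number is an assumption chosen for illustration, not a property of any particular database.

```python
import math


def approx_index_depth(row_count, fanout=100):
    """Approximate depth of an idealized B-tree: log base `fanout` of N.

    The fan-out of 100 is an assumed value, not a measured one.
    """
    return math.log(row_count) / math.log(fanout)


# Growing the table 10 times adds the same amount of depth,
# no matter how large the table already is.
extra_small = approx_index_depth(10_000_000) - approx_index_depth(1_000_000)
extra_large = approx_index_depth(1_000_000_000) - approx_index_depth(100_000_000)

print(extra_small, extra_large)
```

Both differences equal log base 100 of 10, i.e. about half a tree level, so each 10x growth adds a roughly constant cost per index operation as long as the pages involved stay in memory.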
Of course, if you do absolutely random modifications on a table containing 10...100M+ rows, there won't be O(log(N)) anymore: each index modification will require ~ one or two random disk reads, which leads to almost constant time for each such operation. It will be approximately equal to the disk seek time (~ 1/200 of a sec. for fast hard drives, ~ 1/10K of a sec. for SSDs). But if you frequently face this case in a real-world application, you have a serious performance issue. Normally it is resolved by adding more RAM to the database server, so that it can cache the whole working set. Of course, this isn't the only solution. But since this article isn't about that, I won't discuss the others here.
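To put the seek-time figures above into numbers, here is a small worked example. It simply multiplies the operation count by the number of random reads and by the seek time; the function name and the count of 1000 operations are chosen for illustration.

```python
HDD_SEEK_SECONDS = 1 / 200      # fast hard drive, figure from the text above
SSD_SEEK_SECONDS = 1 / 10_000   # SSD, figure from the text above


def random_io_seconds(op_count, reads_per_op, seek_seconds):
    """Time spent on random reads alone when nothing is cached."""
    return op_count * reads_per_op * seek_seconds


for name, seek in (("HDD", HDD_SEEK_SECONDS), ("SSD", SSD_SEEK_SECONDS)):
    best = random_io_seconds(1000, 1, seek)   # one random read per operation
    worst = random_io_seconds(1000, 2, seek)  # two random reads per operation
    print(name, best, worst)
```

With these figures, 1000 uncached random modifications cost roughly 5 to 10 seconds on the hard drive and 0.1 to 0.2 seconds on the SSD, regardless of which ORM issues them.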
So, returning to executable DML queries: as you see, they can't be used here, because we don't measure bulk operations over the tables, although initially it may look like we do. Yes, we could change the tests to make it impossible to do the same with executable DML (e.g. we could add the current CPU tick count to each value instead of 1 in the update test), but since it is obvious that such a modification can be done, we don't think it's worth doing just to prove the point.
Executable DML here is an attempt to use a specialized API instead of the common one. Moreover, if we used it, we would be comparing RDBMS performance on DML queries instead of ORM performance on CUD operations, and since the database is the same, we'd get absolutely the same numbers.
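The difference between the two approaches can be sketched with plain SQL. The example below uses Python's built-in sqlite3 module rather than any ORM, and the table name, column names and helper functions are hypothetical; it only shows the shape of the two workloads, not our actual test code.

```python
import sqlite3


def make_table(row_count):
    """Create an in-memory table with `row_count` rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE simple (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO simple (id, value) VALUES (?, ?)",
        [(i, i) for i in range(row_count)],
    )
    conn.commit()
    return conn


def update_per_entity(conn):
    """The shape of a CUD test: one UPDATE per entity, all in one transaction."""
    rows = conn.execute("SELECT id, value FROM simple").fetchall()
    with conn:  # commits on success, rolls back on error
        for row_id, value in rows:
            conn.execute(
                "UPDATE simple SET value = ? WHERE id = ?", (value + 1, row_id)
            )


def update_with_dml(conn):
    """The shape of executable DML: a single set-based statement."""
    with conn:
        conn.execute("UPDATE simple SET value = value + 1")


def snapshot(conn):
    return conn.execute("SELECT id, value FROM simple ORDER BY id").fetchall()


a, b = make_table(100), make_table(100)
update_per_entity(a)
update_with_dml(b)
print(snapshot(a) == snapshot(b))
```

Both paths leave the table in the same state, but only the first one exercises the per-entity work (change tracking, command composition, batching) that an ORM test is meant to measure; the second hands the whole job to the database.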
Hopefully this explains everything.
Kind regards,
Alex Yakunin





