Christmas coding project: let's optimize Grimrock 2 load times! I'll update this thread when progress is made. 👇
Starting situation: loading the main dungeon takes 10.5 seconds. That's an awful lot, considering the installed game is only 1 GB. This is with warm file caches (I repeated the test three times), so I suspect most of the time is spent on the CPU, not on I/O.
Digging the code out of mothballs, I'm expecting some bit rot (the last update was made in 2015!). The C++ engine was originally built with Visual Studio 2010 and I don't have that installed anymore. Luckily it compiles without errors on VS2017. Nice!
Hmm... there's something wrong with RMB mouse look. It snaps weirdly. I don't remember seeing this before. Only happens on Windows 10?
Gah, it's annoying. Have to fix that. The scope of this project is already getting out of hand!
The animation binding code, i.e. the code that binds anim clips to models, looks suspect. There's a global string array of 1300+ node names and we're doing lots of searching in that array. Lots of string compares => slow. Hmm, I remember I had a much better routine for this in Druidstone.
Essentially it's scanning through that string array, which contains pointers to strings, so it's going to be really bad for CPU caches. Every string comparison is basically a cache miss. And there are a lot of animations to bind, so I wouldn't be surprised if this took 1s or so.
I inserted some timing code and yep, anim binding takes 0.65 seconds. Not huge but definitely worth investigating.
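For anyone wanting to do the same kind of measurement, here's a minimal sketch of scoped timing code. The names (`ScopedTimer`, `g_animBindingSeconds`, `bindAnimations`) are made up for illustration and are not from the Grimrock source. The timer adds its elapsed time to a running total, which helps when the code being measured is called from many places.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the Grimrock source): adds the wall-clock time
// spent inside a scope to a caller-owned counter, so that many short,
// scattered calls can be summed into a single number.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(double& totalSeconds)
        : total_(totalSeconds), start_(Clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed = Clock::now() - start_;
        assert(elapsed.count() >= 0.0);  // steady_clock never goes backwards
        total_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total_;
    Clock::time_point start_;
};

// Hypothetical accumulator and call site.
double g_animBindingSeconds = 0.0;

void bindAnimations() {
    ScopedTimer timer(g_animBindingSeconds);
    // ... the work being measured would go here ...
}
```

After loading finishes, printing `g_animBindingSeconds` gives the total time spent in that routine across all calls.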
Hmm... before I can get to optimizing the anim binding routine, it seems I do indeed have to intern all node names. That should save some memory and probably speed up some things by itself.
Changed anim binding to use string interning. Dungeon loading is now 0.6 seconds faster. Good start!
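A rough sketch of what string interning buys you here, assuming a simple pool built on `std::unordered_set`. The actual engine code isn't shown in this thread, so `StringInterner`, `Skeleton` and `findNode` are hypothetical names. Once every node name is interned, equal names share one pointer, so a lookup compares pointers in a contiguous array instead of chasing each string's characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical interning pool: each distinct string is stored exactly once.
// Pointers to unordered_set elements stay valid when the set rehashes, so the
// returned pointer can be kept for the lifetime of the pool.
class StringInterner {
public:
    const std::string* intern(const std::string& text) {
        return &*pool_.insert(text).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Hypothetical model skeleton whose node names are all interned.
struct Skeleton {
    std::vector<const std::string*> nodeNames;
};

// Returns the index of the node with the given interned name, or -1.
// Only pointers are compared; no string contents are touched.
int findNode(const Skeleton& skeleton, const std::string* internedName) {
    assert(internedName != nullptr);
    for (std::size_t i = 0; i < skeleton.nodeNames.size(); ++i) {
        if (skeleton.nodeNames[i] == internedName) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The anim clip's track names get interned once at load, and binding a clip to a model then becomes a series of pointer comparisons.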
Next suspect: render entity iterator. That's some shitty code, allocating a temp table 🤮 *Seems* to take 0.5 secs (hard to get an exact measurement because it's so scattered). I bet I could write a faster version in C++. If I could make it stateless that could be a big win.
Who wrote this code!? I have only faint recollections of any of this code I'm sifting through. Some of it is already 10 years old! 😅
Twitter does not seem to handle non-linear threads well (a couple of updates got separated). I'll continue posting progress reports here. Let's see if that works.
Anyway, we're now at 6.2 seconds, down from 10.5 secs. Pretty good! Continuing to look for stuff to optimize...
Next suspect: getModelByFilename 0.44 seconds. That function is only checking if a given 3D model is already loaded in memory, so it *should* be very fast. Probably another place with raw string comparisons & therefore lots of cache misses.
Yup, as I feared.
Funny thing is that loading *all* 3D models takes just 0.93 seconds, out of which 0.44 seconds is spent checking if a model is already loaded (the check is repeated for all models before loading). Ouch!
Optimized getModelByFilename. -0.4 seconds. We're at 5.8s now (down from 10.5s). Time to call it a day. Let's hope I didn't add too many new bugs :)
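The thread doesn't show how getModelByFilename was actually optimized. One common approach, sketched here under that assumption, is to key the loaded models by filename in a hash map so the "already loaded?" check is a single lookup instead of a scan with string compares. `ModelCache`, `Model` and `loadModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a loaded 3D model.
struct Model {
    std::string filename;
};

// Hypothetical cache of loaded models, keyed by filename.
class ModelCache {
public:
    // Returns the already-loaded model, or nullptr if it is not in memory.
    Model* getModelByFilename(const std::string& filename) const {
        const auto it = models_.find(filename);
        return it != models_.end() ? it->second.get() : nullptr;
    }

    // Loads the model unless it is already cached. The real file parsing is
    // omitted; this sketch only records the filename.
    Model* loadModel(const std::string& filename) {
        assert(!filename.empty());
        if (Model* existing = getModelByFilename(filename)) {
            return existing;
        }
        auto model = std::make_unique<Model>();
        model->filename = filename;
        Model* raw = model.get();
        models_.emplace(filename, std::move(model));
        return raw;
    }

    std::size_t loadedCount() const { return models_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Model>> models_;
};
```

With this layout the check costs one hash of the filename plus, typically, one string compare, no matter how many models are loaded.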
Continuing where we left off... A simple memoization optimization: storing the value of a frequently computed expression so that later calls are faster. Now at 5.6 seconds. Let's see if we can break the 5 second barrier today.
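For readers unfamiliar with memoization, here's a generic sketch. The thread doesn't say which expression was memoized, so `Memoizer` is a hypothetical helper. It only makes sense when the computed value depends on nothing but the key.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical helper: wraps a function and remembers its results per key.
// Only valid if the wrapped function always returns the same value for the
// same key.
template <typename Key, typename Value>
class Memoizer {
public:
    explicit Memoizer(std::function<Value(const Key&)> compute)
        : compute_(std::move(compute)) {
        assert(static_cast<bool>(compute_));  // must wrap a callable
    }

    // Computes the value on the first call for a key; later calls with the
    // same key return the stored result without calling compute_ again.
    const Value& operator()(const Key& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, compute_(key)).first;
        }
        return it->second;
    }

private:
    std::function<Value(const Key&)> compute_;
    std::unordered_map<Key, Value> cache_;
};
```

The trade-off is memory for time: every distinct key stays in the cache until the `Memoizer` is destroyed.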
Implemented string interning for scene graph node names. Saves ~2 MB of memory, but makes no measurable difference to load times. I guess I'll take it. :-)

There are some lookups by name in the game code though, so this should have a positive effect on runtime performance.
Hmm... this seems to be getting harder. Finding fewer hotspots to optimize.
Another small one. Replaced heap alloc with stack alloc for some temp data. Saves only about 40ms, but hey why not. Should also help with memory fragmentation. 3D model instantiation for *all* game objects in the game world takes only 0.18 seconds. Computers are fast these days!
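A sketch of the heap-to-stack idea, assuming the temp data is usually small. `sumOfSquares` and `kStackCapacity` are invented for illustration and the scratch buffer is contrived. The point is the pattern: use a fixed-size stack buffer in the common case and only fall back to the heap for unusually large inputs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical capacity: inputs up to this many elements never touch the heap.
constexpr std::size_t kStackCapacity = 256;

// Sums the squares of the input using a temporary scratch buffer.
float sumOfSquares(const float* values, std::size_t count) {
    assert(values != nullptr || count == 0);

    std::array<float, kStackCapacity> stackBuffer;  // lives on the stack
    std::vector<float> heapBuffer;                  // stays empty in the common case
    float* scratch = stackBuffer.data();
    if (count > kStackCapacity) {
        heapBuffer.resize(count);  // rare fallback: one heap allocation
        scratch = heapBuffer.data();
    }

    for (std::size_t i = 0; i < count; ++i) {
        scratch[i] = values[i] * values[i];
    }

    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += scratch[i];
    }
    return total;
}
```

The stack buffer costs nothing to allocate or free, which is also why it avoids adding to heap fragmentation.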
Oh man, I realized the archive file system could be made much faster if Grimrock was a 64-bit app. In 64-bit mode we'd have a bigger address space and we could keep the 1 GB game data archive in memory all the time. Now we're opening/closing/seeking the archive like crazy.
But a 64-bit port would be more work and could potentially make some people unhappy who are still running a 32-bit OS. Whether there are any left is a good question...
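To illustrate the keep-the-archive-in-memory idea, here's a rough sketch. The Grimrock archive format isn't described in this thread, so the index layout and the names `InMemoryArchive`, `Entry`, `FileView` and `readWholeFile` are all assumptions. The archive is read once and each file lookup returns a view into the buffer, with no open/seek/close per file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Non-owning view of one file's bytes inside the archive buffer.
struct FileView {
    const char* data;
    std::size_t size;
};

// Hypothetical archive kept entirely in memory. The index maps a file name to
// the byte range it occupies; how a real archive stores that index is not
// covered here.
class InMemoryArchive {
public:
    struct Entry {
        std::size_t offset;
        std::size_t size;
    };

    InMemoryArchive(std::vector<char> bytes,
                    std::unordered_map<std::string, Entry> index)
        : bytes_(std::move(bytes)), index_(std::move(index)) {}

    // Returns a view of the named file, or {nullptr, 0} if it is not present.
    FileView find(const std::string& name) const {
        const auto it = index_.find(name);
        if (it == index_.end()) {
            return FileView{nullptr, 0};
        }
        const Entry& e = it->second;
        // An entry pointing outside the buffer means the index is corrupt.
        assert(e.offset <= bytes_.size() && e.size <= bytes_.size() - e.offset);
        return FileView{bytes_.data() + e.offset, e.size};
    }

private:
    std::vector<char> bytes_;
    std::unordered_map<std::string, Entry> index_;
};

// Reads a whole file into memory; returns an empty buffer if it cannot be opened.
std::vector<char> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

In a 32-bit process a contiguous 1 GB buffer like this is hard to get, which is exactly the address space problem. A 64-bit build could also memory-map the file instead of reading it.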
5 second barrier breached! After a few misc asset processing optimizations we're down at 4.75 seconds. This is still with warm file caches though. Have to look at cold startup times at some point too.
3D models are now loaded in 0.45 seconds.😎
You can follow @petrih3.