We were having some performance issues with framedrops and slowdown so I decided to do some testing and i isolated the cause of the lag. it was the way we were calling ingame functions. (continued in what might be a long thread)
Basically our custom scripting DLL used to be a separate tool that modified the game's memory and injected custom code from another process, the same way Cheat Engine etc work. Because of this, we'd been calling kernel32.dll functions from our code
these kernel32 functions are actually very fast and there's nothing wrong with them. the slowdown comes from the way we'd been calling ingame functions. basically we'd inject the assembly bytes into the game's memory, then since this library used to be a separate thing--
--accessing the game's memory, we couldn't just call the game code directly, so we'd make the game start a new thread and execute the code for us. it turns out that telling a process to create a thread is actually very VERY fucking slow, especially if you have to do it--
--as often as we do. however, i never knew just how slow it was and so as we migrated our library to a custom DLL that runs in the game's process space (same way dsfix, modengine, etc all work), I left our code the same, using kernel32 to interact with the game. it was simpler--
--and quicker to do that since we were busy trying to get a lot of things to work all at once. so anyways, back to the thing about slowdown. I didn't realize how slow creating a thread for every function call was until I added the blended omnidirectional dodges. these dodges--
--actually have to run an update loop every frame of the dodge, correcting the distance you travel to use the hypotenuse distance for diagonals instead of the shorter sideways distances averaged. i ended up needing to call ONE ingame function every frame during dodges. after--
--pinpointing that the function call was what caused the lag, i decided to run a benchmark. Here I decided to call that same function 3000 times every frame, which no modern CPU should have trouble with, since they are multiple gigahertz, you know--
--and the instant i pressed the dodge button i was SHOCKED at how absolutely ridiculous the lag was. you can check it out here. this recording is not edited in any way, this is exactly the speed the game ran (note: Undead King Jar Eel is not in this mod, he is a test chr)
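for a rough feel of why a thread per call hurts, here's a minimal C# sketch, not our actual code. it assumes a made-up `FakeGameFunction` standing in for the ingame function, and it uses plain managed threads rather than the kernel32 remote-thread route we were on, so the absolute numbers will differ. the gap between the two loops is the point:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class CallOverheadBenchmark
{
    private static int _counter;

    // Hypothetical stand-in for the ingame function; it only bumps a counter.
    public static void FakeGameFunction() => _counter++;

    // Old style (approximated): start a fresh thread for every single call
    // and wait for it to finish before making the next one.
    public static (int calls, TimeSpan elapsed) RunThreadPerCall(int count)
    {
        _counter = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            var t = new Thread(FakeGameFunction);
            t.Start();
            t.Join();
        }
        sw.Stop();
        return (_counter, sw.Elapsed);
    }

    // New style: call the function directly on the current thread.
    public static (int calls, TimeSpan elapsed) RunDirect(int count)
    {
        _counter = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            FakeGameFunction();
        }
        sw.Stop();
        return (_counter, sw.Elapsed);
    }

    public static void Main()
    {
        var threaded = RunThreadPerCall(3000);
        var direct = RunDirect(3000);
        Console.WriteLine($"thread per call: {threaded.elapsed.TotalMilliseconds:F2} ms");
        Console.WriteLine($"direct call:     {direct.elapsed.TotalMilliseconds:F2} ms");
    }
}
```

both loops do exactly the same work, the only difference is whether each call pays for creating and joining a thread.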
so yeah over 5 seconds for 3000 calls each frame to dark souls functions was not acceptable. i decided to try taking advantage of the fact that our library now runs in the same process memory as the game.
i decided to try the Marshal.GetDelegateForFunctionPointer function built into C# and just create a function delegate to our injected code, only requiring us to change like 10 lines of code, so an easy change to make. Here's the same 3000 calls per frame, much better!
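the delegate approach looks roughly like this. it's a sketch under assumptions: the `GameFunc` signature, the cdecl calling convention, and `FakeGameFunction` are all made up for illustration. in the real thing the address would point at the injected assembly in the game's memory; here it comes from `Marshal.GetFunctionPointerForDelegate` so the example can run on its own:

```csharp
using System;
using System.Runtime.InteropServices;

public static class DelegateCallSketch
{
    // Assumed signature and calling convention; the real ingame function may differ.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int GameFunc(int arg);

    // Hypothetical stand-in for the code that lives at the target address.
    private static int FakeGameFunction(int arg) => arg * 2;

    // Held in a static field so the garbage collector never frees the delegate
    // while its function pointer is still in use.
    private static readonly GameFunc Keeper = FakeGameFunction;

    // Stand-in for "the address of our injected code".
    public static IntPtr GetFakeFunctionAddress()
    {
        return Marshal.GetFunctionPointerForDelegate(Keeper);
    }

    // Turn a raw address into a callable delegate and invoke it on this thread.
    public static int CallAt(IntPtr address, int arg)
    {
        GameFunc fn = Marshal.GetDelegateForFunctionPointer<GameFunc>(address);
        return fn(arg);
    }

    public static void Main()
    {
        IntPtr address = GetFakeFunctionAddress();
        Console.WriteLine(CallAt(address, 21));
    }
}
```

if the same function gets called every frame, it probably makes sense to create the delegate once and keep it around instead of calling `GetDelegateForFunctionPointer` each time.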
so after that i decided to increase it from 3000 to 30000 function calls per frame. it STILL runs faster than the old method. this is just incredible
so yeah, I just wanted to share the good news of this and show the absolutely ridiculous before and after comparison because wow that was just wild
...oh and the craziest thing about this is that there's even further optimization we can do like storing the assembly bytes of each function call and patching in the function call arguments to pass to it instead of recreating the entire assembly bytes every time etc
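that last idea could look something like this. big caveat: this is a guess at the shape, not our code. it assumes a 32-bit x86 cdecl stub with one integer argument, and `CallStubTemplate` plus its offsets are hypothetical. it only builds and patches the bytes; it doesn't write them into the game or execute them:

```csharp
using System;

public sealed class CallStubTemplate
{
    private const int ArgOffset = 1;     // imm32 operand of the push
    private const int TargetOffset = 6;  // imm32 operand of the mov

    // Built once and reused; only the two imm32 slots ever change.
    private readonly byte[] _bytes =
    {
        0x68, 0, 0, 0, 0,   // push <arg>
        0xB8, 0, 0, 0, 0,   // mov eax, <target address>
        0xFF, 0xD0,         // call eax
        0x83, 0xC4, 0x04,   // add esp, 4 (caller cleans up one argument)
        0xC3                // ret
    };

    public CallStubTemplate(uint targetAddress)
    {
        WriteInt32(_bytes, TargetOffset, unchecked((int)targetAddress));
    }

    // Patch the argument in place and hand back the same cached buffer.
    public byte[] WithArgument(int arg)
    {
        WriteInt32(_bytes, ArgOffset, arg);
        return _bytes;
    }

    // x86 immediates are little-endian: lowest byte first.
    private static void WriteInt32(byte[] buffer, int offset, int value)
    {
        unchecked
        {
            buffer[offset] = (byte)value;
            buffer[offset + 1] = (byte)(value >> 8);
            buffer[offset + 2] = (byte)(value >> 16);
            buffer[offset + 3] = (byte)(value >> 24);
        }
    }

    public static void Main()
    {
        var stub = new CallStubTemplate(0x00401000); // made-up address
        Console.WriteLine(BitConverter.ToString(stub.WithArgument(0x11223344)));
    }
}
```

each call then costs four byte writes instead of rebuilding the whole stub. returning the shared buffer is only safe if nothing else is patching it at the same time.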