After abandoning desmume development for a few months, I've been working on it quite a lot lately. Of course, not as much as when I started working on the project, as motivation to develop desmume is rare these days (just check the official desmume CVS and you'll see what I'm talking about). But anyway, I've made some improvements on a very specific side of the project that I've avoided working on for the last 6 months: speed.
I have to be clear: desmume's current way of handling most of the interrupts sucks. For example, timers, DMAs and others are checked PER opcode emulated. In an ideal world, you'd run a batch of instructions (the exact amount determined by some heuristics or whatever), then do the processing needed for the pending interrupt / timer handling, and then go back to running another batch of instructions. Of course, that'd be less exact than the current implementation, but it'd be WAY faster. I did something similar on my Gameboy emulator and it gave good results; we'll see how it works on desmume, whenever I find time to implement it.
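Just to make the idea concrete, here's a minimal sketch of that batched loop. The Cpu/Timers/Dma types, the batch heuristic and the cycle budget are placeholders I made up for the example, not desmume's actual code:

```cpp
// A rough sketch of the batched approach, with stubbed-out Cpu/Timers/Dma
// types and an approximate cycle budget; NOT desmume's actual scheduler.
#include <cstdint>

struct Cpu    { int64_t run(int cycles)      { return cycles; } };  // stub: execute opcodes, return cycles spent
struct Timers { void advance(int64_t cycles) { (void)cycles;  } };  // stub: catch timers up in one go
struct Dma    { void service()               {                } };  // stub: start/finish pending transfers, raise IRQs

static void emulate_frame(Cpu& cpu, Timers& timers, Dma& dma)
{
    // Rough per-frame budget for the ARM7 (33 MHz / ~60 fps); the real
    // value and the batch size heuristic would need tuning.
    const int64_t kCyclesPerFrame = 560190;
    int64_t remaining = kCyclesPerFrame;

    while (remaining > 0)
    {
        // Run a whole batch of opcodes with no per-opcode checks...
        const int batch = 256;              // placeholder heuristic
        int64_t spent = cpu.run(batch);

        // ...then do the timer/DMA/IRQ housekeeping once for the whole batch.
        timers.advance(spent);
        dma.service();

        remaining -= spent;
    }
}
```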
Anyway, I'll start with two screenshots, one before the speed-ups I implemented over the past 3 days (left), and another one with them (right):
As you can see, it's practically a 33% speed-up in this particular case. I must say that every game I've tested benefits by a considerable amount, the minimum being 14% (I'll explain later why). Just as a real-world example, I tested "Castlevania: Dawn of Sorrow" and it's 30% faster now... So let's talk about the speed-ups.
First, desmume used/uses GDI to draw the DS screens onto the main window. GDI setup is as easy as it can get, but it seems it's not as fast as I would wish (especially keeping in mind that its only task is to take an image and draw it on the screen). I considered a few options, namely DDraw, OpenGL or GDI+. I excluded OpenGL from the beginning, because it would be quite a lot of work to make the current 3D core work correctly with it. So after discussing it with some friends (more exactly, hearing some DDraw bashing and why I should stop using DirectX7 features), I started working on a GDI+ blitter. In the end it was easier than I expected: reading the documentation, implementing the blitter and a bit of adjusting was done in an hour. In fact, searching through the documentation was what took longest. The good part is that it gave a free 14% performance gain.
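For reference, a GDI+ blit really does boil down to very little code. This is just a rough sketch assuming a single 256x192 screen stored as 16-bit RGB555, not the exact blitter that went into desmume:

```cpp
// Minimal GDI+ blit sketch; framebuffer format and screen size are
// assumptions for the example, not desmume's actual blitter.
#include <windows.h>
#include <gdiplus.h>
#pragma comment(lib, "gdiplus.lib")

// Call once at startup; GDI+ does nothing before GdiplusStartup.
static ULONG_PTR InitGdiplus()
{
    Gdiplus::GdiplusStartupInput input;
    ULONG_PTR token = 0;
    Gdiplus::GdiplusStartup(&token, &input, NULL);
    return token;
}

// Draw one 256x192 DS screen stored in 'pixels' (RGB555) onto the window.
static void BlitScreen(HWND hwnd, BYTE* pixels)
{
    const int w = 256, h = 192;

    HDC hdc = GetDC(hwnd);
    {
        Gdiplus::Graphics gfx(hdc);

        // Wrap the emulator's buffer directly instead of copying it.
        Gdiplus::Bitmap frame(w, h, w * 2, PixelFormat16bppRGB555, pixels);

        // Nearest-neighbour keeps the blit cheap if the window is scaled.
        gfx.SetInterpolationMode(Gdiplus::InterpolationModeNearestNeighbor);
        gfx.DrawImage(&frame, 0, 0, w, h);
    }   // Graphics/Bitmap must be destroyed before releasing the DC.
    ReleaseDC(hwnd, hdc);
}
```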
The rest was a matter of fixing some tidbits of how memory is accessed, avoiding a lot of conversions from floating point to integer (which are way more expensive than you'd guess), changing the way the texture cache works, and more stuff I can't remember. I won't go into any more detail about this stuff, as it would take quite a few pages, which I'm not willing to write now. Maybe I'll come back to it in a future post.
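To give an idea of why float-to-int conversions hurt: on older x86 compilers a plain cast goes through a helper (_ftol on MSVC) that switches the FPU rounding mode to "truncate" and back on every single conversion. Something like the SSE truncation below is a typical workaround; this is just an illustration of the kind of change, not the exact code I touched:

```cpp
// Illustration of the float-to-int cost and one common workaround;
// an example of the technique, not the actual desmume change.
#include <xmmintrin.h>

// On the old x86 ABI this cast calls a helper that has to flip the FPU
// rounding mode on every conversion.
static inline int ftoi_cast(float f)
{
    return (int)f;
}

// SSE can truncate directly, without touching the FPU control word.
static inline int ftoi_sse(float f)
{
    return _mm_cvtt_ss2si(_mm_set_ss(f));
}
```

The other usual option is to keep values in fixed point and skip the conversion entirely, which is what a lot of the DS hardware expects anyway.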
To wrap up, it seems that my obsession with fixing every regression bug I see is giving good results, as most games that I test have perfect 3D. Here's some proof: Hotel Dusk running perfectly, even with the motion blur effect (I rotated the image with an image editor, because I don't have screen rotation implemented):
Have fun