I've spent the past few days working on 2D-related stuff and minor fixes. Quite a few hours have gone into "only" logging and debugging some demos that still don't work properly, because, for example, some basic stuff still seems to be failing. I've also worked a bit on getting more speed out of 3D, which got a bit faster, but still needs quite a lot of work.
Today I'll talk a bit about the last feature I've been working on, even if the actual implementation is more a draft of what it should be. I'm talking about pixel blending, which can be used, for example, to cross fade from one image to another (as in Zelda Gallery), or to blend multiple images (I think that's what's used in the "MP: First Hunt" menus, for example). You can get a proper reference here. I still have to make some proper improvements, because, for example, I don't yet support some effects that can be achieved with multiple source selections, but I'll fix them in a few days.
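For reference, the per-pixel math of a two-layer blend can be sketched like this. It assumes the formula GBATEK gives for the DS/GBA alpha blend, where each 5-bit channel becomes min(31, (A*EVA + B*EVB) / 16) with EVA and EVB in 0..16, on 15-bit pixels with red in the low bits; `blend555` is a hypothetical name, not desmume's code.

```c
#include <stdint.h>

/* Blends two 15-bit pixels (red bits 0-4, green 5-9, blue 10-14).
   Each channel: min(31, (a * eva + b * evb) / 16). Factors above 16 act as 16
   (assumption taken from the GBATEK description of the blend registers). */
uint16_t blend555(uint16_t a, uint16_t b, unsigned eva, unsigned evb) {
    if (eva > 16) eva = 16;
    if (evb > 16) evb = 16;
    uint16_t out = 0;
    for (unsigned shift = 0; shift <= 10; shift += 5) {
        unsigned ca = (a >> shift) & 31u;
        unsigned cb = (b >> shift) & 31u;
        unsigned c = (ca * eva + cb * evb) / 16;
        if (c > 31) c = 31;            /* saturate instead of wrapping */
        out |= (uint16_t)(c << shift);
    }
    return out;
}
```

A cross fade like the Zelda Gallery one would step EVA from 16 down to 0 while EVB goes from 0 up to 16, so the two factors always sum to 16 and the clamp never kicks in.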
Today I preferred to leave the technical details aside, as I still don't feel confident about how the special color effects work. Anyway, here's the blend support as seen in Zelda Gallery (source, half blended, and destination):
Wednesday, November 29, 2006
Wednesday, November 22, 2006
Capture unit
Today I've been working on the capture unit. From the examples (and games) I've been able to get my hands on, it's used mainly for showing 3D output on both screens of the DS, for motion blur, or for screenshots. Of course there are a lot of other uses (many more than I can think of at the moment); for example, it's used in the menus of "Metroid Prime: First Hunt" for all the moving stuff on both screens. My implementation right now is very rough (I only did it today, in a few hours) and lacks support for all the modes, but at least it gets the menus showing better in Metroid.
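As a rough sketch of where the capture modes come from, here's a decoder for the capture control register (DISPCAPCNT at 4000064h). The bit layout follows my reading of GBATEK, not anything stated in this post; `capture_cfg` and `decode_dispcapcnt` are hypothetical names, and clamping EVA/EVB to 16 is an assumption borrowed from the 2D blend registers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned eva, evb;       /* blend factors for sources A and B, 0..16 */
    unsigned vram_block;     /* 0..3 = VRAM A..D */
    uint32_t write_offset;   /* byte offset into the destination block */
    unsigned width, height;  /* capture size in pixels */
    unsigned src_a;          /* 0 = 2D/3D graphics screen, 1 = 3D only */
    unsigned src_b;          /* 0 = VRAM, 1 = main memory display FIFO */
    uint32_t read_offset;    /* byte offset for source B when it reads VRAM */
    unsigned source;         /* 0 = A, 1 = B, 2/3 = A and B blended */
    bool enabled;
} capture_cfg;

capture_cfg decode_dispcapcnt(uint32_t v) {
    static const unsigned widths[4]  = {128, 256, 256, 256};
    static const unsigned heights[4] = {128,  64, 128, 192};
    capture_cfg c;
    c.eva = v & 0x1Fu;
    c.evb = (v >> 8) & 0x1Fu;
    if (c.eva > 16) c.eva = 16;   /* assumption: larger values act as 16 */
    if (c.evb > 16) c.evb = 16;
    c.vram_block   = (v >> 16) & 3u;
    c.write_offset = ((v >> 18) & 3u) * 0x8000u;
    c.width        = widths[(v >> 20) & 3u];
    c.height       = heights[(v >> 20) & 3u];
    c.src_a        = (v >> 24) & 1u;
    c.src_b        = (v >> 25) & 1u;
    c.read_offset  = ((v >> 26) & 3u) * 0x8000u;
    c.source       = (v >> 29) & 3u;
    c.enabled      = (v >> 31) & 1u;
    return c;
}
```

Capture source values 2 and 3 would select the blended A+B path, which is where the EVA/EVB factors matter; "3D on both screens" would typically use source A set to 3D only at the full 256x192 size.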
I also did some misc bugfixing here and there, which isn't even interesting to talk about. Just one thing: one of the minor tweaks in the texture handling routines, which I thought would give crappy results, fixed a hell of a lot of stuff.
So, as the proverb says, a picture is worth a thousand words:
Friday, November 17, 2006
Texture coordinate generation and flickering
I fixed quite a large number of bugs, but the two most important ones are the ones in the title: texture coordinate generation and flickering.
The first one is quite simple: like most PC hardware, the DS supports texture coordinate generation. This is used, for example, if you want a pool of water, the water being a texture-mapped plane, and you want it to seem to wave: instead of using a large amount of geometry, you just generate varying texture coordinates to get the visual effect of waving. I had made an implementation long ago, but I always found samples that rendered wrong, and whenever I fixed one of them, another broke. The bug was rather stupid: as openGL and DS texcoord generation don't map 1:1, I pre-transformed the coordinates myself and then sent them to openGL. The problem was that I was not resetting the texture matrix, so the coordinates ended up pre-transformed by me and then transformed again by openGL. Stupid bug of the year, for sure.
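The double-transform bug can be reproduced with a toy model: the coordinate is pre-transformed on the CPU and then passed through whatever texture matrix the renderer still has loaded. `tex_mtx`, `IDENTITY_MTX` and `submit_texcoord` are hypothetical names, and the matrix is simplified to a 2D affine transform rather than the real 4x4 one.

```c
/* 2D affine texture matrix: s' = xx*s + xy*t + tx, t' = yx*s + yy*t + ty.
   A simplified stand-in, not the DS (or openGL) 4x4 texture matrix. */
typedef struct { float xx, xy, yx, yy, tx, ty; } tex_mtx;

const tex_mtx IDENTITY_MTX = {1, 0, 0, 1, 0, 0};

static void tex_apply(const tex_mtx *m, float s, float t, float *os, float *ot) {
    *os = m->xx * s + m->xy * t + m->tx;
    *ot = m->yx * s + m->yy * t + m->ty;
}

/* Pre-transforms (s, t) with the DS matrix on the CPU, then runs the result
   through the matrix the renderer still has loaded (standing in for openGL's
   texture matrix). Unless that matrix is identity, (s, t) is transformed twice. */
void submit_texcoord(const tex_mtx *ds_mtx, const tex_mtx *renderer_mtx,
                     float s, float t, float *out_s, float *out_t) {
    float ps, pt;
    tex_apply(ds_mtx, s, t, &ps, &pt);
    tex_apply(renderer_mtx, ps, pt, out_s, out_t);
}
```

In real openGL, the equivalent of passing `IDENTITY_MTX` is calling `glMatrixMode(GL_TEXTURE)` followed by `glLoadIdentity()` before drawing with pre-transformed coordinates.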
Flickering was a damn annoying bug that I never got enough motivation to fix. The fix "just" involved retrieving the openGL framebuffer, copying it to BG0, and sending it through the layering pipeline that desmume uses. It sounds simple, and in fact it is simple to implement, but the code is ugly, probably slower than it should be, and, because the image has to be converted from 24bit to 16bit, it looks a bit uglier (as it does on the DS :P)
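The 24bit to 16bit step can be sketched as below, assuming the framebuffer was read back as tightly packed RGB bytes (for example with `glReadPixels` using `GL_RGB` and `GL_UNSIGNED_BYTE`) and that BG0 wants the DS's 15-bit layout with red in the low bits. `rgb888_to_bgr555` is a hypothetical name, not desmume's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts packed RGB888 (3 bytes per pixel) to 15-bit pixels:
   red in bits 0-4, green in bits 5-9, blue in bits 10-14.
   Each channel keeps only its top 5 bits, so 3 bits of precision are lost. */
void rgb888_to_bgr555(const uint8_t *src, uint16_t *dst, size_t npixels) {
    for (size_t i = 0; i < npixels; i++) {
        uint16_t r = (uint16_t)(src[3 * i + 0] >> 3);
        uint16_t g = (uint16_t)(src[3 * i + 1] >> 3);
        uint16_t b = (uint16_t)(src[3 * i + 2] >> 3);
        dst[i] = (uint16_t)(r | (g << 5) | (b << 10));
    }
}
```

`glReadPixels` returns rows bottom-up, so a real copy into BG0 would also flip the image vertically. Dropping the low 3 bits of each channel is exactly the precision loss that makes the result look a bit worse.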
And that's more or less all for today; here's the usual screenshot:
For further reference on texcoords, you can find a nice explanation here.
Have fun :)
Wednesday, November 08, 2006
No time
I'm really busy with stuff not related to emulation right now, so not much progress has been made. I've only done some work towards making 3D processing faster, but I achieved less than a 5% improvement :P
I've really been thinking about when I should release, and I've narrowed it down to one condition: I'll release when either Super Mario DS or Metroid Prime looks right, which means fixing either compressed textures or generated texture coordinates. I'll probably make other speed improvements too, but compatibility comes first.
No interesting stuff or screenshots today; it's late and I'm too tired :P
Friday, October 20, 2006
Been busy....
I've been busy with real life and stuff... So only 2 old screenshots today, and no chit-chat:
Both of them have a few bugs: in the first one, texturing is missing due to texture coordinate generation failing, and the second one has problems with some of the block types of compressed textures.
Oh, I've disabled comment moderation; if it goes wrong again, I'll leave it enabled indefinitely.
Have fun :)
Monday, September 25, 2006
Texture compression and register hacking
Well, today I'll talk a bit about how DS texture compression works and what I've accomplished with it (not that much, really). It's actually quite simple, if I understood it correctly (not sure I did). In theory, a compressed texture works like this: we use one texture slot (even if not fully) for the actual texture data, either texture slot 0 or slot 2 (there are 4 possible slots on the DS).
For every 4x4 pixel block, we use one 32bit number from slot 0 or slot 2 that defines the actual pixel data: each row takes 8 bits (4 columns of 2 bits each), and the 4 rows make a total of 32 bits. So every pixel is defined by a 2bit number, which is later used for palette indexing. We also need some more data, a 16bit value holding the palette offset and a mode that determines how it will be used, which is located in texture slot 1.
So, the general idea is quite simple in the end. Read a 32bit value from either slot 0 or slot 2, and read a 16bit value from slot 1. Then use the 16bit value to determine the mode (there are 4 possible modes, each filling each row in a different manner) and the palette index. Go through each of the 2bit values contained in the 32bit value, and add each of them (separately, of course, for each colour) to the palette index to get the final palette index for each pixel. With this value, you can just index into the normal palette data to get the colour. Depending on the mode, colours will (or won't) be treated in different manners. Seems it's not that simple after all :P
My implementation is far from complete; I have severe bugs that just show lots of garbage. I'll try to fix it tomorrow.
I've also been doing some heavy register hacking. Some memory locations are mapped to specific DS hardware, so, for example, writing to 4000490h pushes a new 10b vertex to the render queue. I've just hacked a few of these registers to make games/demos work better, even if it's not correct emulation.
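The steps above can be sketched for a single block. This follows my reading of GBATEK's description of the 4x4 compressed format, not this post: the palette offset counts in units of two colours, mode 1 derives colour 2 as the midpoint of colours 0 and 1, mode 3 interpolates 5:3 and 3:5, and index 3 is transparent in modes 0 and 1. Locating the slot 1 word and the texture palette base is left out, and the names and the "bit 15 = opaque" output convention are hypothetical.

```c
#include <stdint.h>

/* Mixes two 15-bit colours per channel: (a*wa + b*wb) / div. */
static uint16_t mix555(uint16_t a, uint16_t b, unsigned wa, unsigned wb, unsigned div) {
    uint16_t out = 0;
    for (unsigned shift = 0; shift <= 10; shift += 5) {
        unsigned c = (((a >> shift) & 31u) * wa + ((b >> shift) & 31u) * wb) / div;
        out |= (uint16_t)(c << shift);
    }
    return out;
}

/* texels:  32-bit word from slot 0 or 2; row y is byte y, pixel x is bits 2x..2x+1.
   palinfo: 16-bit word from slot 1; bits 0-13 palette offset, bits 14-15 mode.
   pal:     texture palette (15-bit colours), already at the texture's palette base.
   out:     16 pixels, index y*4 + x; bit 15 set = opaque, 0 = transparent. */
void decode_4x4_block(uint32_t texels, uint16_t palinfo, const uint16_t *pal, uint16_t out[16]) {
    const uint16_t *p = pal + ((palinfo & 0x3FFFu) << 1); /* offset is in pairs of colours */
    unsigned mode = palinfo >> 14;
    uint16_t c[4] = { p[0], p[1], 0, 0 };
    int idx3_transparent = 0;

    switch (mode) {
    case 0:  /* 3 palette colours, index 3 transparent */
        c[2] = p[2]; idx3_transparent = 1; break;
    case 1:  /* colour 2 = midpoint of colours 0 and 1, index 3 transparent */
        c[2] = mix555(p[0], p[1], 1, 1, 2); idx3_transparent = 1; break;
    case 2:  /* 4 palette colours */
        c[2] = p[2]; c[3] = p[3]; break;
    default: /* colours 2 and 3 interpolated between colours 0 and 1 */
        c[2] = mix555(p[0], p[1], 5, 3, 8);
        c[3] = mix555(p[0], p[1], 3, 5, 8); break;
    }

    for (unsigned i = 0; i < 16; i++) {
        unsigned idx = (texels >> (2 * i)) & 3u;
        out[i] = (idx == 3 && idx3_transparent) ? 0 : (uint16_t)(c[idx] | 0x8000u);
    }
}
```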
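A sketch of the address-to-command mapping and the 10b vertex unpacking. It assumes, following GBATEK rather than this post, that the geometry command ports at 4000440h..40005FFh map to command id (address - 4000400h) / 4, which makes 4000490h command 24h (VTX_10), and that the VTX_10 argument packs three 10-bit 1.3.6 coordinates. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Maps a write address to a geometry command id; false if it's not a command port. */
bool port_to_command(uint32_t addr, uint8_t *cmd) {
    if (addr < 0x4000440u || addr > 0x40005FFu)
        return false;
    *cmd = (uint8_t)((addr - 0x4000400u) >> 2);
    return true;
}

/* Sign-extends a 10-bit two's complement field. */
static int32_t sext10(uint32_t v) {
    int32_t x = (int32_t)(v & 0x3FFu);
    return (x & 0x200) ? x - 0x400 : x;
}

/* Unpacks X (bits 0-9), Y (10-19), Z (20-29), each 1.3.6, and widens them
   to 1.3.12 by multiplying by 64 (avoids left-shifting negative values). */
void unpack_vtx10(uint32_t arg, int32_t out[3]) {
    for (int i = 0; i < 3; i++)
        out[i] = sext10(arg >> (10 * i)) * 64;
}
```

A register hack along these lines would call `port_to_command` from the memory write handler and dispatch on the returned id.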
Just a simple screenshot today:
Monday, September 18, 2006
GPU transforms
Today I'll talk about some stuff that I hate about how the DS 3D hardware works. One of the problems with DS vertex submission is that vertices are sent in fixed point, either 1.3.12 (16b per coordinate) or 1.3.6 (10b per coordinate). What that means is that you have signed numbers with an integer range of [-8, 8), and a fractional part of either 4096 steps per whole number (16b vertices) or 64 steps per whole number (10b vertices). The joy of ancient hardware, today.
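Decoding both formats to floating point shows where the [-8, 8) range and the 1/4096 vs 1/64 steps come from. The helper names are hypothetical, and the sign handling assumes two's complement, which is how I understand the sign bit works.

```c
#include <stdint.h>

/* 1.3.12: 16 bits, two's complement, 12 fractional bits.
   Range [-8, 8) in steps of 1/4096. */
double fx16_to_double(uint16_t raw) {
    int32_t v = (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
    return v / 4096.0;
}

/* 1.3.6: 10 bits, two's complement, 6 fractional bits.
   Range [-8, 8) in steps of 1/64. */
double fx10_to_double(uint16_t raw) {
    int32_t v = (int32_t)(raw & 0x3FFu);
    if (v & 0x200) v -= 0x400;
    return v / 64.0;
}
```

The largest 1.3.12 value, 7FFFh, decodes to 7.999755859375, so anything at or beyond 8 units has to come from a transform.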
So, if you want to define objects bigger than 16 units in height/width/depth, or simply an object that is not inside this 16x16x16 cube, what has to be done? Transformations come to the rescue. Before you send vertices, you just supply one or more transforms to the DS "transform unit" (it's just a matrix stack) that will, for example, double your vertex positions. Even if I find this a shitty way to work (and, more importantly, I thought this type of severely limited hardware disappeared over a decade ago), it's not the worst part.
The worst part is that these transform changes are permitted INSIDE the vertex submission list, so emulating this behaviour is quite cpu/gpu intensive. Some caching will probably help a lot here, at least for games that do not modify display lists while in game.
So, my first implementation of the DS 3D gpu didn't support changing some parameters inside a "vertex submission block". As an example, here's what supporting that fixed:
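To illustrate why changes inside the list hurt, here's a toy interpreter in which a scale command can appear between vertices: every vertex has to be transformed with whatever matrix is current at that exact point, so the list can't simply be handed over in one batch. The command encoding (`gx_cmd`, `CMD_SCALE`, `CMD_VTX`) is hypothetical, the matrix is reduced to a per-axis scale, and values use 12 fractional bits like 1.3.12.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_SCALE, CMD_VTX } cmd_kind;   /* hypothetical encoding */

typedef struct {
    cmd_kind kind;
    int32_t x, y, z;   /* scale factors or vertex position, 12 fractional bits */
} gx_cmd;

typedef struct { int32_t x, y, z; } vec3_fx;

/* Fixed-point multiply with 12 fractional bits (64-bit intermediate). */
static int32_t fx_mul(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> 12);
}

/* Walks the list in order. Scale commands accumulate into the current
   "matrix"; each vertex uses the matrix as it is at that point in the list.
   Returns the number of vertices written to out. */
size_t run_command_list(const gx_cmd *cmds, size_t n, vec3_fx *out) {
    int32_t sx = 1 << 12, sy = 1 << 12, sz = 1 << 12;   /* identity */
    size_t nv = 0;
    for (size_t i = 0; i < n; i++) {
        const gx_cmd *c = &cmds[i];
        if (c->kind == CMD_SCALE) {
            sx = fx_mul(sx, c->x);
            sy = fx_mul(sy, c->y);
            sz = fx_mul(sz, c->z);
        } else {
            out[nv].x = fx_mul(c->x, sx);
            out[nv].y = fx_mul(c->y, sy);
            out[nv].z = fx_mul(c->z, sz);
            nv++;
        }
    }
    return nv;
}
```

Caching would mean remembering the transformed output of a list like this and reusing it as long as neither the list nor the matrices it starts from have changed.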
Tuesday, September 12, 2006
Slowdown of the day
The next version of desmume will have more accurate emulation, even if some new stuff will still be kind of hacked (for example, the 3D core uses openGL, even though I think a software renderer would be more accurate). The main problem with accurate emulation (or, to put it shortly, emulating more stuff) is that it requires more processing power.
Today's post is about one of the features I added, which will make the emulator slower in certain situations. To put it simply, I just added fading. The DS hardware can fade the backgrounds/sprites in/out, and after these backgrounds and sprites are correctly layered, there's a master brightness (more fading in/out) that affects the final mix. The problem is that it involves a multiplication and an addition per pixel. And, to make things worse, there are 3 components per pixel (red, green, blue), and every pixel is a single 16bit value, so some bit shifting and masking is also needed. I'll probably make it faster in the future (SIMD instructions come to mind to make it way faster), but for now, it'll only make rendering slower.
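The per-pixel cost described above looks roughly like this: unpack three 5-bit channels, do a multiply and an add on each, and pack them back. The formulas (brightening moves each channel toward 31 by EVY/16, darkening toward 0, with EVY in 0..16) follow GBATEK's description of the brightness effect; `brightness555` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Applies a brightness fade to one 15-bit pixel.
   up:  c + (31 - c) * evy / 16   (fade toward white)
   !up: c - c * evy / 16          (fade toward black) */
uint16_t brightness555(uint16_t px, bool up, unsigned evy) {
    if (evy > 16) evy = 16;
    uint16_t out = 0;
    for (unsigned shift = 0; shift <= 10; shift += 5) {
        unsigned c = (px >> shift) & 31u;             /* mask out one channel */
        c = up ? c + (31 - c) * evy / 16 : c - c * evy / 16;
        out |= (uint16_t)(c << shift);                /* shift it back in place */
    }
    return out;
}
```

Doing this for every pixel of both screens is where the slowdown comes from; a SIMD version would process several channels or pixels per instruction instead of looping.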
Some screenshots, as usual, one at the beginning of the fade-in, and one at the end of the fade:
It looks way better at runtime; fades don't make good screenshots. More stuff coming in the next few days, if I get enough motivation to explain more.
Saturday, September 09, 2006
No news are good news
I know I've been quiet for a few days, but as some of you might have guessed, I (more or less) hate talking about commercial games working on an emulator. The reason for the lack of updates is that the most impressive improvements only show results in commercial games.
Well, there are a lot of technical details about how I got this working, but today I'll skip them. I'll talk about how I got it working another day, if anyone's interested. Just remember one thing while looking at these screenshots:
WE DON'T KNOW WHEN WE'LL RELEASE, NOR HOW GOOD THE COMPATIBILITY WILL BE: ASKING ABOUT IT WILL JUST GET THE RELEASE DELAYED
Seriously, I hate being rude, but: I love emulation, but as a coder, it's a tough job, and I don't want to be worried about release dates. And, more importantly, it's annoying to work on a tough problem and just see forum posts like: "Where can I download it?". So keep in mind, these are only Work In Progress screens.
Anyway, an image (or two) is worth a thousand words:
Tuesday, August 29, 2006
Improvements to the 2D core
After spending the last 9 days fixing, improving and rewriting most of the 3D core, I looked for some 3D stuff to test. While searching, I found a few 2D demos that still didn't work properly. They ran without crashing, but rendered incorrect stuff. The reason is quite simple: desmume doesn't support rotated/scaled sprites yet (among other bugs).
So, yesterday I decided to give the 2D core a try, as the rotation/scale stuff looked like an effect called "rotozoomer" that I had already coded in the past. It looked simple, but actually took 2 entire evenings to get working correctly. The implementation is far from finished, but all the sprites with scaling/rotation that I don't support yet get rendered as in the last released desmume, so at least I didn't break anything.
Currently, I only support rotation when the sprite is 256 colours / direct color (16bits) and the sprite is not flipped. I think I'll get 16 colour sprite support done in no time, but I haven't looked at it yet. Supporting flipped sprites will be harder, as I don't have any demos that already do that. I guess I'll have to modify some of the available examples to enable flipping in them.
Enough chit chat, this is an example of how it's working now: a rotated and scaled sprite, with double size (this last option simply doubles the clipping rectangle, so if you have a square rotated 45 degrees, its corners don't get cut):
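The rotozoomer idea behind rotated/scaled sprites can be sketched as an inverse mapping: for each pixel of the sprite's bounding box, step through the source texture with the four affine parameters, measured from the centre. The sketch assumes 8.8 fixed-point parameters (256 = 1.0) named pa..pd as in GBATEK, and that double size only doubles the box the sprite is drawn into; the function name and floor rounding are my choices, and flipping and palette lookups are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int16_t pa, pb, pc, pd; } affine_params;   /* 8.8 fixed point */

/* Floor division by 256, so negative offsets round down like an arithmetic shift. */
static int floor_div256(int v) {
    return v >= 0 ? v / 256 : -((-v + 255) / 256);
}

/* (x, y) is a pixel inside the sprite's bounding box, relative to its top-left.
   On success, (*sx, *sy) is the source texel to draw; returns false when the
   mapped position falls outside the w x h sprite (draw nothing there). */
bool affine_sample(const affine_params *p, int w, int h, bool double_size,
                   int x, int y, int *sx, int *sy) {
    int bw = double_size ? 2 * w : w;   /* box the sprite is drawn into */
    int bh = double_size ? 2 * h : h;
    int dx = x - bw / 2;                /* offset from the box centre */
    int dy = y - bh / 2;
    int tx = floor_div256(p->pa * dx + p->pb * dy) + w / 2;   /* back to texture space */
    int ty = floor_div256(p->pc * dx + p->pd * dy) + h / 2;
    if (tx < 0 || tx >= w || ty < 0 || ty >= h)
        return false;
    *sx = tx;
    *sy = ty;
    return true;
}
```

With pa = pd = 256 and pb = pc = 0 this is a plain copy; halving pa and pd to 128 zooms the sprite 2x, and a rotation puts cos/sin values into all four parameters.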
Saturday, August 26, 2006
Perfection looks good
It's no secret that I've spent the last few days trying to get the DS graphics FIFO commands (also known as display lists) working. For those who don't know what I'm talking about, it's actually quite simple. If you want to render 3D objects on the DS, you have at least two ways to do it: immediate mode and the FIFO. In immediate mode, every vertex/normal/texture/etc coordinate is sent separately; in theory, it's the slowest way. With the FIFO, you just have a queue of commands (vertex/normal/texture/etc), and the DS interprets it. Well, enough chit chat.
This morning, I finally fixed one of the bugs in my current core. The bug was quite stupid: the DS has 6 ways to send a vertex position, with different accuracy and different numbers of components. One of them is a position relative to the last coordinate sent, so, for example, if the last vertex was {1,0,0} and you want to set a new vertex at {1,2,3}, you can just send a relative vertex with coordinates {0,2,3}. That's a good idea, as relative coordinates are actually half the size of a full-accuracy vertex.
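A display list walker can be sketched as follows, assuming the packed format GBATEK describes for the FIFO: one header word carries up to four command ids, lowest byte first, followed by all of their parameter words. Only a small subset of commands is listed, with parameter counts as I recall them from GBATEK; `walk_display_list` and its return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Parameter word counts for a small subset of geometry commands.
   -1 = not handled by this sketch. */
static int param_count(uint8_t cmd) {
    switch (cmd) {
    case 0x00: return 0;  /* NOP */
    case 0x10: return 1;  /* MTX_MODE */
    case 0x11: return 0;  /* MTX_PUSH */
    case 0x15: return 0;  /* MTX_IDENTITY */
    case 0x1B: return 3;  /* MTX_SCALE */
    case 0x1C: return 3;  /* MTX_TRANS */
    case 0x20: return 1;  /* COLOR */
    case 0x23: return 2;  /* VTX_16 */
    case 0x24: return 1;  /* VTX_10 */
    case 0x40: return 1;  /* BEGIN_VTXS */
    case 0x41: return 0;  /* END_VTXS */
    default:   return -1;
    }
}

/* Stores the non-NOP command ids found in the list into cmds_out.
   Returns how many were stored, or -1 on an unknown command, a truncated
   list, or a full output buffer. */
int walk_display_list(const uint32_t *words, size_t nwords,
                      uint8_t *cmds_out, size_t max_cmds) {
    size_t pos = 0;
    int ncmds = 0;
    while (pos < nwords) {
        uint32_t header = words[pos++];
        for (int slot = 0; slot < 4; slot++) {
            uint8_t cmd = (uint8_t)(header >> (8 * slot));
            int np = param_count(cmd);
            if (np < 0 || pos + (size_t)np > nwords)
                return -1;
            pos += (size_t)np;   /* a real core would execute cmd with these words */
            if (cmd != 0x00) {
                if ((size_t)ncmds >= max_cmds)
                    return -1;
                cmds_out[ncmds++] = cmd;
            }
        }
    }
    return ncmds;
}
```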
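The relative vertex logic boils down to one piece of state: the last position sent, which every vertex command has to update so relative vertices can chain. This sketch uses plain integer units to mirror the {1,0,0} + {0,2,3} example; on real hardware the deltas are small fixed-point values whose exact scaling isn't modeled here. Names are hypothetical.

```c
#include <stdint.h>

typedef struct { int32_t x, y, z; } vtx;

/* The state a relative vertex depends on: the last position sent. */
typedef struct { vtx last; } vtx_state;

/* Absolute vertex: used as-is and remembered. */
vtx vertex_abs(vtx_state *s, vtx v) {
    s->last = v;
    return v;
}

/* Relative vertex: delta added to the last position; the result is
   remembered too, so the next relative vertex chains from it. */
vtx vertex_rel(vtx_state *s, vtx d) {
    vtx v = { s->last.x + d.x, s->last.y + d.y, s->last.z + d.z };
    s->last = v;
    return v;
}
```

Forgetting to update `last` in any one of the vertex commands is enough to make every following relative vertex land in the wrong place.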
The differences are quite obvious:
I'll let you guess which one is the fixed one :P
Have fun, I'm back to coding.
Tuesday, August 22, 2006
DS 3D core fixes and enhancements
Since yesterday, I've been focused on fixing all the 3D demos that devkitPro provides. As of now, I can say I've fixed almost 95% of the issues with them. The 3D core was very primitive; just to name a few bugs: all the blending (transparency) was broken, all the texture formats were badly handled (the alpha was always a tiny value, so it broke all the transparency), and MANY others.
One of the things that I really wanted to fix was the nehe samples, as only one of them ran. After adding all the missing 3D commands I could find, all of them run perfectly. Some screenshots of fixed things and examples of new features:
Just one of the demos I have doesn't run at all, and two others show very minor glitches (one of them is the one with the alien sitting on top). When I've finally fixed the remaining bugs in the display list rendering code, I'll focus on the cpu core again.
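Some DS texture formats carry only a few bits of alpha (A3I5 has 3, A5I3 has 5). One plausible way to end up with "always a tiny value" is passing those raw numbers to openGL as 8-bit alpha. A common fix is bit replication, sketched below; it's a generic technique, not necessarily the DS's exact rounding or what desmume does, and the function names are hypothetical.

```c
#include <stdint.h>

/* Expands a 3-bit alpha to 0..255 by repeating its bits (7 -> 255). */
uint8_t expand3_to_8(unsigned a) {
    a &= 7u;
    return (uint8_t)((a << 5) | (a << 2) | (a >> 1));
}

/* Expands a 5-bit alpha to 0..255 by repeating its top bits (31 -> 255). */
uint8_t expand5_to_8(unsigned a) {
    a &= 31u;
    return (uint8_t)((a << 3) | (a >> 2));
}
```

Without the expansion, a fully opaque A3I5 texel would reach openGL as alpha 7 out of 255, which is almost invisible.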
Monday, August 21, 2006
DS hardware crash course
Well, last Friday, after a quite stressful work day, I decided to work a bit on Desmume. As a source base, I used the latest source found in Normmatt's blog. After a bit of testing of the provided binary, I found a few tiny issues; the one that caught my attention most was that it failed a lot of the tests in ARMWrestler by Mic (of Dualis fame). So, as I like writing cpu cores, and I didn't know anything about ARM cpus, I decided to try to fix some of the failing tests to learn some ARM assembly.
After about 9h of hardcore debugging, I fixed more than half of the errors it reported. The next day, after talking with Normmatt and sending him the improved core, he sent me an updated source base that included the 3D and sound cores written by yopyop (the original author of desmume). One of the main bugs with the 3D is that the screen where the 3D is shown flickers a lot due to the way Desmume handles window updating. There were lots of other bugs, like the 3D API init failing on certain configurations, most of the libnds examples not working, etc.
So, lots of work ahead :)
To keep a long story short, I'll just list the changes:
- Fixed 3D core init.
- Fixed crash if a non mapped key was pressed.
- Preliminary flicker fix (this is VERY preliminary).
- Added lots of 3D display list command handling (needs more work).
- Fixed some of the 3D commands.
- Added sort of a "debug texture" when texture format is not yet handled.
- Fixes on 3D texturing.
- Thumb LDR and STR opcodes fixed.
- Arm LDR** and STR** opcodes fixed.
- Other fixes I cannot remember :P
And that in only 3 days :P
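The post doesn't say which LDR/STR bugs were fixed, so this only illustrates a classic detail CPU test suites tend to check (I'm not claiming it was one of the ARMWrestler failures): on ARMv4/v5 cores such as the DS's, an LDR from a misaligned address reads the aligned word and rotates it right by 8 * (address & 3). The flat little-endian memory buffer and the function names are hypothetical.

```c
#include <stdint.h>

/* Rotate right, safe for n == 0. */
static uint32_t ror32(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Little-endian 32-bit read from a flat buffer (stand-in for the memory bus). */
static uint32_t read32(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Word load as ARMv4/v5 LDR does it: aligned read, then rotate by the misalignment. */
uint32_t arm_ldr(const uint8_t *mem, uint32_t addr) {
    return ror32(read32(mem, addr & ~3u), (addr & 3u) * 8u);
}
```

An emulator that simply reads four bytes starting at the misaligned address returns a different value, which is the kind of mismatch a test ROM reports.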
Here's a screenshot of a fix in the texturing (probably the easiest fix of all), from the nehe10 example provided with libnds (most of the other nehe samples work, too):
Don't even ask about a release date; it'll be released when it's done. Nor about testing, or anything along those lines.
First post
In this blog I'll talk mainly about emulator development. Right now I'm focused on working on Desmume, improving the 3D core and the cpu cores.