Drunken Coders

Breaking new ground in console homebrew development

Been working on a back end to host coding competitions and could use some help testing it out. Come check out our first compo:

Pong

Current Entries

Posted by dovoto

Hello!

I’ve been messing around with electronics lately (after all, that’s what I’m studying :P), and this is the result! A 32×8 LED Matrix controlled by a PIC18F2550. You can click on the pictures for full size! At the end of the post there is a video of the screen running.

Everything together


Posted by AntonioND

This won’t help, but if you find it fun to watch then you are probably on the right website!

Posted by dovoto

Finally, bitcoin mining with a purpose!

Although a touch convoluted, using a Raspberry Pi to fetch bits of work and a webcam to read the result from the NES in the form of a screen color change, it’s still pretty slick.

RetroMiner

Posted by dovoto

This may be one of the more ridiculous things I have ever seen…an 8 bit serial computer built ENTIRELY of NAND gates and wire routing…

Crazy attention to detail. If you don’t follow kevtris you should.

What is a NAND computer? Well a quick refresher on digital logic:

There are a few fundamental building blocks to constructing a digital circuit. These are gates, which perform the basic boolean logic operations (OR, AND, NOR, NAND, XOR, NOT) and latches/flipflops which preserve the value of a bit. It turns out that you can build any gate from a NAND gate and so can build flip flops and latches from NAND gates as well. (for more info see: http://en.wikipedia.org/wiki/NAND_logic )
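To make the "any gate from a NAND gate" claim concrete, here is a minimal C sketch that builds NOT, AND, OR and XOR out of nothing but a NAND function. The function names are made up for illustration and have nothing to do with kevtris's actual design.

```c
#include <stdbool.h>

/* The one primitive: everything below is wired up from this alone. */
bool gate_nand(bool a, bool b) { return !(a && b); }

/* NOT: feed the same signal into both NAND inputs. */
bool gate_not(bool a) { return gate_nand(a, a); }

/* AND: a NAND followed by a NOT. */
bool gate_and(bool a, bool b) { return gate_not(gate_nand(a, b)); }

/* OR: invert both inputs, then NAND them (De Morgan). */
bool gate_or(bool a, bool b) { return gate_nand(gate_not(a), gate_not(b)); }

/* XOR: the classic four-NAND arrangement. */
bool gate_xor(bool a, bool b)
{
    bool n = gate_nand(a, b);
    return gate_nand(gate_nand(a, n), gate_nand(b, n));
}
```

Latches and flip flops follow the same idea, with the NAND outputs fed back into each other's inputs.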

The picture you see below is a whole bunch of NAND gates in discrete logic chips which are wired together to form an entire CPU. The better you understand it the more you will realize how ridiculous it is that someone thought to do it and furthermore how crazy they must have been to actually finish it.

http://blog.kevtris.org/

Posted by dovoto

Now this is just pretty slick. It seems these guys decap and photograph old chips, then trace them out using some custom tools and convert the images to netlists for use in a javascript simulator. The result? A gate level simulation of the entire NES PPU you can interact with on a web page (at about 2Hz mind you) including the ability to trace signals and nodes. It’s pretty crazy and worth checking out.

http://www.qmtpro.com/~nes/chipimages/visual2c02/

http://www.visual6502.org/faq.html

Posted by dovoto

sverx has posted some memory benchmarks in an attempt to answer that ever annoying question of which memory copy method is fastest on the DS.

Some of the more interesting results:

  • copying using DMA is both the slowest option (when copying to main RAM) and the fastest one (when copying to video RAM)
  • memcpy() gets overtaken by almost every other method, so I guess one should use it only when prototyping or when performance is not an issue
  • reading from an uncached address seems to give some % of boost when using ldmia/stmia; it just makes things worse when using memcpy()
  • Loop unrolling doesn’t give any advantage when copying to main RAM; on the contrary, it effectively speeds up a bit when copying to video RAM
  • reading from a cache unaligned source address can slow down things a bit, especially when reading from a cached address
  • using a DTCM temporary copy doesn’t help
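In case "loop unrolling" is new to you, here is roughly what it looks like in plain C. This is a generic sketch and not sverx's benchmark code; the function name is made up, and whether it beats memcpy() on any given target is exactly the kind of thing you have to measure.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `words` 32-bit words, four per loop iteration, so the loop
   counter and branch are paid for once per four words moved. */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t words)
{
    size_t i = 0;

    for (; i + 4 <= words; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* mop up whatever did not fit in a group of four */
    for (; i < words; i++)
        dst[i] = src[i];
}
```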

Read the full post here: A DS Homebrewer’s Diary

Posted by dovoto

If you have ever been interested in Nintendo Entertainment System, Apple II, or Atari programming you have probably run into the 6502 processor. Here is a great site introducing assembly on the somewhat dated CPU, complete with an interactive assembly language interpreter. I think it’s pretty awesome.

http://skilldrick.github.com/easy6502/

Posted by dovoto

Smealum has released a new teaser for his upcoming Portal recreation on the DS. It’s looking pretty impressive and worth checking out.

Posted by dovoto

For those of you doing some mobile cross platform dev, MonoGame 3.0 was released just moments ago. Some features:

What’s New?

3D (many thanks to Infinite Flight Studios for the code and Sickhead Games in taking the time to merge the code in)
New platforms: Windows 8, Windows Phone 8, OUYA, PlayStation Mobile (including Vita).
Custom Effects.
PVRTC support for iOS.
iOS supports compressed Songs.
Skinned Meshs
VS2012 templates.
New Windows Installer
New MonoDevelop Package/AddIn
A LOT of bug fixes
Closer XNA 4 compatibility

Check it out: MonoGame.net

Posted by dovoto

I made a bit of a change to the site to suppress some of the more annoying spam…but really…not sure who comes here anymore. If anyone is interested in helping out (or taking over) the admin for the site drop me a pm (email: jason @ this domain).

Posted by dovoto

For the battle scene, I have created 2 characters from Saint Seiya Anime : Pegasus seiya and Lion Aiolia.
I used GIMP to create the sprites, using gameboy advance characters as base and overwriting them.

Here are the sprites : 2 96×96 pixels lion and 3 96×96 pixels seiya. Knowing that DS will take 32×32 sprites, they have been split in 9 to fit the DS needs.

Background is coming directly from sega Megadrive’s Shining Force :

After that, some programming and here we go on the DS.
You can download and rename the following file with .nds instead of .jpg and launch it on emulator or DS :

 http://ndssstactics.drunkencoders.com/files/2012/12/DS_SaintSeiya_092012.jpg

Posted by GuiguiPanda

I have been playing around with windows metro dev and since I have been a big fan of C# for quite a few years I thought it might be fun to put together a quick C# game for the app store….but it seems C# has been left out of the Direct X loop by MS this time around.

I found that rather disappointing and went in search of options.

What I found was SharpDX and monogame.

Interestingly, these are rather easily cross platform. I did a windows, xbox, and android build with a handful of #ifdefs sprinkled about.

The result is my work in progress:

Feel free to post XNA or metro topics to the Questions thread.

Posted by dovoto

Here is a simple way using 16 random tiles and a random palette:

/*---------------------------------------------------------------------------------

	Faking static

---------------------------------------------------------------------------------*/
#include <nds.h>
#include <stdlib.h>
//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
	int tileCount = 16;

	videoSetMode(MODE_0_2D);

	//give the main background engine some video memory to work with
	vramSetBankA(VRAM_A_MAIN_BG);

	//text (tiled) backgrounds take one of the BgSize_T_* sizes
	int bg = bgInit(0, BgType_Text8bpp, BgSize_T_256x256, 0, 1);

	u16* tiles = bgGetGfxPtr(bg);
	u16* map = bgGetMapPtr(bg);

	//create 16 tiles filled with random colors
	for(int i = 0; i < tileCount * 32; i++)
		tiles[i] = rand();

	//create a map filled with those random tiles
	for(int i = 0; i < 32 * 32; i++)
		map[i] = rand() % tileCount;

	while(1) 
	{
		//create a random palette every frame
		for(int i = 0; i < 256; i++)
		{
			//a grey scaled palette...could have used colors but this looks better
			int shade = rand() % 32;
			BG_PALETTE[i] = RGB15(shade, shade, shade);
		}
		swiWaitForVBlank();
	}

}

Here is a slightly more complex way using a single 16 color tile and some hblank tricks:

/*---------------------------------------------------------------------------------

	Faking static

---------------------------------------------------------------------------------*/
#include <nds.h>
#include <stdlib.h>

void hblank(void)
{
	REG_BG0HOFS = rand() & 0xFF;
	REG_BG0VOFS = rand() & 0xFF;
}
//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
	videoSetMode(MODE_0_2D);

	//give the main background engine some video memory to work with
	vramSetBankA(VRAM_A_MAIN_BG);

	int bg = bgInit(0, BgType_Text4bpp, BgSize_T_256x256, 0, 1);

	u16* tiles = bgGetGfxPtr(bg);
	u16* map = bgGetMapPtr(bg);

	irqEnable(IRQ_HBLANK);
	irqSet(IRQ_HBLANK, hblank);

	//create 1 tile filled with random colors
	for(int i = 0; i < 16; i++)
		tiles[i] = rand();

	//create a map filled with those random tiles (only bits we are setting are the 4 bits for palette and the two for h and v flip)
	for(int i = 0; i < 32 * 32; i++)
	{
		map[i] = rand() & (0x3F << 10);
	}
	while(1) 
	{
		//create a random palette every frame
		for(int i = 0; i < 256; i++)
		{
			//a grey scaled palette...could have used colors but this looks better
			int shade = rand() % 32;
			BG_PALETTE[i] = RGB15(shade, shade, shade);
		}
		swiWaitForVBlank();
	}

}
Posted by dovoto

Using the new windowing API to enable and display a window

First, as always, let’s look at the full source for the demo. Notice that it sets up and displays our traditional drunkencoders logo, enables the window and finally lets us mess with its settings.

The DS has 3 hardware windows which can be used to mask parts of the screen. Two of them are simple rectangles: you give the DS the top, bottom, right, and left of the rectangle and it creates a mask of that size. We can control which layers are rendered inside the window and which ones are rendered outside the window.

The other window is an Object Window and it uses a sprite object as the mask (basically any pixel in the sprite that is not zero becomes “in window”).

All that is required to use a window is first to turn it on, second to define whether you want your layers or objects to appear inside the window or outside all windows, and finally to define the top, left, bottom, and right positions of the window.

This brings us to our first bit of code to enable a window and set how that window affects our background.

If we did not set our background window options the background would neither show up inside nor outside the window and therefore would not be displayed. It is perfectly acceptable to have the bg show up inside multiple windows or outside all windows or both (just OR the window options together which you pass to bgWindowEnable).
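If the inside/outside rule feels abstract, here is a tiny software model of the decision the hardware makes for each pixel of a layer when a single rectangular window is on. It is only a sketch: both function names are invented, none of this is libnds code, and treating the right and bottom edges as exclusive is an assumption of the sketch.

```c
#include <stdbool.h>

/* Is this pixel inside the window rectangle? Right and bottom are
   treated as exclusive edges in this sketch. */
bool in_window(int x, int y, int left, int top, int right, int bottom)
{
    return x >= left && x < right && y >= top && y < bottom;
}

/* A layer has two independent switches: show inside the window and
   show outside all windows. With both off it is never drawn. */
bool layer_visible(bool show_inside, bool show_outside, bool pixel_in_window)
{
    return pixel_in_window ? show_inside : show_outside;
}
```

With both switches off the layer vanishes everywhere, and with both on it shows everywhere, which is why the demo turns one off whenever it turns the other on.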

We next set the starting x, y, and size of our window box:

Next is our main loop:

Inside this loop we alter the x and y with key presses so the window moves about; we adjust the size as well so we can shrink or grow it.

We also add a little feature that allows us to alter the window by pressing the X and Y keys. Pressing Y tells the DS to display the part of the background that is outside all windows; pressing X causes it to display only inside window 0. If you were to not call bgWindowDisable to turn off the unwanted option, the background would simply end up being displayed both inside window 0 and outside all windows.

Finally, we set the bounds on our window based upon the settings we had altered above:

Posted by dovoto

What follows is a very detailed look at the sprite hardware on the DS including memory and register layout and a listing of basic libnds functions dealing with sprites. This post is a bit technical and does not need to be understood before moving on to the sprite examples (in fact, an argument can be made that it is best to skip all but the overview until you have gotten a sprite or two to display).

Overview

Keep in mind the DS has two distinct 2D graphics cores which each handle one screen. The main engine and the sub engine have few differences. We will focus on the main engine in this discussion and point out any differences we encounter along the way. Recall, that the engines do not imply a specific physical screen as either engine can be used to control either the top or bottom screen.

To begin, each core is capable of controlling 128 independent sprite objects which share a common graphics and palette memory.

Each object can have a rotation, blend, or mosaic applied (although not strictly independently as we shall see).

Each sprite can be one of 3 color formats: 256-color paletted, 16-color paletted, or 16-bit direct color bitmap.

Each sprite can be one of the following pixel sizes:

Size  Square  Wide   Tall
0     8×8     16×8   8×16
1     16×16   32×8   8×32
2     32×32   32×16  16×32
3     64×64   64×32  32×64

Sprites can contain transparent pixels allowing sprites of any size or shape to be represented as long as they can fit inside one of the above boxes. If you need even larger sprites we will learn a few tricks to stretch them and combine them to create sprites of any size.
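The size field is two bits wide, so each shape comes in four sizes. As a sketch, the whole thing boils down to one lookup table indexed by shape and size. The struct, enum and function names here are invented for the example.

```c
/* Width and height of a sprite in pixels. */
typedef struct { int width, height; } SpriteDims;

/* Shape values as stored in attribute 0. */
enum { SHAPE_SQUARE = 0, SHAPE_WIDE = 1, SHAPE_TALL = 2 };

/* Look up pixel dimensions. The caller must pass shape 0-2 and size 0-3. */
SpriteDims sprite_dims(int shape, int size)
{
    static const SpriteDims table[3][4] = {
        { { 8,  8}, {16, 16}, {32, 32}, {64, 64} },  /* square */
        { {16,  8}, {32,  8}, {32, 16}, {64, 32} },  /* wide   */
        { { 8, 16}, { 8, 32}, {16, 32}, {32, 64} },  /* tall   */
    };
    return table[shape][size];
}
```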

Memory Layout

Main Engine:
OAM Attributes (1KB): 0x07000000
Palette Memory (512 bytes): 0x05000200
Graphics Memory (256KB): 0x06400000
Extended Palette Memory: (Memory mapped so varies)

Sub Engine:
OAM Attributes (1KB): 0x07000400
Palette Memory (512 bytes): 0x05000600
Graphics Memory (128KB): 0x06600000
Extended Palette Memory: (Memory mapped so varies)

Object Attribute Memory:
Each core has what is known as Object Attribute Memory which contains the state for our 128 sprites. This state is stored in a set of 3 16-bit attributes (one set for each sprite) which control nearly every aspect of the sprite.

Attribute 0:

Bits   Description
0-7    The Y coordinate of the top of the sprite
8      The rotation scaling flag which determines if a scale/rotation is to be applied. The setting of this flag controls how several of the bits below are interpreted.
9      Size double / hide sprite flag: When the rotation and scaling flag is set this flag will allow the sprite to double in size when a scaling or rotation is applied. When the rotation scaling flag is not set this bit will hide the sprite if set (very useful).
10-11  Display mode: 0 is a normal paletted sprite, 1 is a blended sprite, 2 means the sprite is acting as a window mask, and 3 is used to specify a bitmap sprite.
12     Mosaic flag: when set the mosaic will be applied
13     Color depth: 0 = 16 color, 1 = 256 color
14-15  Object shape: 0 = Square, 1 = Wide, 2 = Tall

Attribute 1:

Bits   Description
0-8    The X coordinate of the left edge of the sprite (one more bit than Y as the screen is wider)
9-13   The 5 bit index into the 32 possible rotation/scaling attributes to apply to the sprite. When the rotation and scaling flag is not set, bits 9-11 are unused and bits 12 and 13 can be used to flip the sprite.
12     When the rotation and scaling flag is NOT set this bit is the horizontal flip flag; when set the sprite will be flipped horizontally. When the rotation scaling flag is set this bit is used as part of the rotation attribute selection.
13     When the rotation and scaling flag is NOT set this bit is the vertical flip flag; when set the sprite will be flipped vertically. When the rotation scaling flag is set this bit is used as part of the rotation attribute selection.
14-15  Selects the object size (see table above)

Attribute 2:

Bits   Description
0-9    Selects the offset of the start of the sprite’s graphics
10-11  The sprite priority (what order it is drawn with respect to other sprites and backgrounds; two sprites with the same priority will be drawn in order of OAM number)
12-15  Palette number when in 16-color mode or when using extended palettes in 256-color mode. These bits control the alpha blend when in bitmap mode.
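To see how the three tables fit together, here is a sketch that packs the fields of a plain (non rotated) sprite into its three 16-bit attributes. The helper names are hypothetical and this is not how libnds exposes OAM; it just follows the bit positions listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute 0 for a sprite with the rotation/scaling flag clear,
   so bit 9 acts as the hide flag. */
uint16_t oam_attr0(int y, bool hide, int mode, bool mosaic, bool color256, int shape)
{
    return (uint16_t)((y & 0xFF)
                    | ((hide ? 1 : 0) << 9)
                    | ((mode & 3) << 10)
                    | ((mosaic ? 1 : 0) << 12)
                    | ((color256 ? 1 : 0) << 13)
                    | ((shape & 3) << 14));
}

/* Attribute 1 for a sprite with the rotation/scaling flag clear,
   so bits 12 and 13 act as the flip flags. */
uint16_t oam_attr1(int x, bool hflip, bool vflip, int size)
{
    return (uint16_t)((x & 0x1FF)
                    | ((hflip ? 1 : 0) << 12)
                    | ((vflip ? 1 : 0) << 13)
                    | ((size & 3) << 14));
}

/* Attribute 2: graphics offset, priority and palette number. */
uint16_t oam_attr2(int offset, int priority, int palette)
{
    return (uint16_t)((offset & 0x3FF)
                    | ((priority & 3) << 10)
                    | ((palette & 0xF) << 12));
}
```

Masking X with 0x1FF is also what lets a sprite sit partly off the left edge of the screen: a small negative X simply wraps around inside the 9 bit field.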

Rotation and Scaling Attributes:
Following each set of 3 OAM attributes is a single 16-bit rotation and scale attribute. These 16-bit values are what is used to control the rotation and scale of a sprite. It actually takes 4 rotation attributes to describe the rotation and scale which means there are only 32 available rotations which can be applied (128 sets of oam attributes, 4 sets per set of rotation attributes, 128/4 = 32).

The layout in memory is as follows:

Sprite      Address     Description
Sprite 0    0x07000000  Attribute 0
            0x07000002  Attribute 1
            0x07000004  Attribute 2
            0x07000006  Rotation 0 PA
Sprite 1    0x07000008  Attribute 0
            0x0700000A  Attribute 1
            0x0700000C  Attribute 2
            0x0700000E  Rotation 0 PB
Sprite 2    0x07000010  Attribute 0
            0x07000012  Attribute 1
            0x07000014  Attribute 2
            0x07000016  Rotation 0 PC
Sprite 3    0x07000018  Attribute 0
            0x0700001A  Attribute 1
            0x0700001C  Attribute 2
            0x0700001E  Rotation 0 PD
...
Sprite 126  0x070003F0  Attribute 0
            0x070003F2  Attribute 1
            0x070003F4  Attribute 2
            0x070003F6  Rotation 31 PC
Sprite 127  0x070003F8  Attribute 0
            0x070003FA  Attribute 1
            0x070003FC  Attribute 2
            0x070003FE  Rotation 31 PD
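The interleaving is easier to digest as arithmetic: every sprite entry is 8 bytes, and the four parameters of one rotation set sit in the last two bytes of four consecutive entries. A sketch, with made up names and the main engine OAM base from the memory layout section:

```c
#include <stdint.h>

#define OAM_MAIN 0x07000000u  /* main engine OAM, per the memory layout above */

/* Address of attribute 0 of a given sprite (0-127). */
uint32_t oam_sprite_addr(uint32_t base, unsigned sprite)
{
    return base + sprite * 8u;
}

/* Address of one parameter of a rotation set.
   matrix: 0-31, param: 0 = PA, 1 = PB, 2 = PC, 3 = PD. */
uint32_t oam_rotation_addr(uint32_t base, unsigned matrix, unsigned param)
{
    return base + matrix * 32u + param * 8u + 6u;
}
```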

As you may notice, each rotation attribute contains 4 values named PA, PB, PC, PD which form an affine matrix. For a complete and rather detailed description of this matrix please read the link above. Don’t pay much attention to any code presented in it, we are not going to use it as we have our own code for such things (although it’s perfectly fine for consumption if you so choose).

Paletted Sprites:
By selecting the appropriate bits in the attributes described above you can select either 16 color or 256 color palettes. A 16 color sprite uses half the graphics memory of a 256 color sprite which uses half the memory of a direct color bitmap sprite. By trading the number of colors for memory consumption we can make some smart decisions about the usage of DS graphics resources.

As we alluded to above, the sprite palettes are shared and we can control per sprite what color format it is in (sort of).

To display a sprite we simply load some colors into our palette then load the bitmap into sprite memory…unfortunately there is one small item left to discuss and that is how the pixel data is actually stored.

Sprite pixel graphics, much like background graphics are constructed of tiles. What that means is that if you have a 16×16 sprite it is actually stored internally as 4 8×8 tiles. To translate your sprite from your graphics program to something the DS can understand we will need a tool. That tool is grit.

The tiles themselves can be stored in memory either linearly for each sprite (known as 1D mapping) or they can be stored in a 2D grid of 32×32 tiles in 16-color mode or 16×32 tiles in 256-color mode (2D mapping). There are few uses for 2D mapping and it makes it rather challenging to organize your graphics so we are pretty much going to let it go at that and stick to 1D mapping.

The offset from attribute 2 is used to specify where in memory the first tile of the sprite graphics resides. How we calculate that depends on another setting in the main video display control register. This setting determines the stride between starting tiles for sprites and can be 32 bytes, 64, 128 or 256. This setting allows you to reach all of sprite graphics memory with the single 10 bit offset stored in attribute 2 (bits 0-9). The location will be: start of graphics memory + offset * stride.
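As a quick sanity check of that formula, here is a sketch in C. The function name is invented; the base address in the examples is the main engine sprite graphics address from the memory layout section.

```c
#include <stdint.h>

/* Start of a sprite's graphics: base + offset * stride.
   offset is the 10 bit value from attribute 2 (bits 0-9),
   stride is 32, 64, 128 or 256 bytes. */
uint32_t sprite_gfx_addr(uint32_t gfx_base, unsigned offset, unsigned stride)
{
    return gfx_base + (offset & 0x3FFu) * stride;
}
```

With a 256 byte stride the largest offset (1023) lands 256 bytes short of the 256KB mark, so the whole of main sprite memory is reachable. With a 32 byte stride the same offset only reaches the first 32KB.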

Extended Palettes
You can enable extended palette memory for both backgrounds and sprites globally (for each engine). Once this is done you will have access to 16 256-color palettes for your sprite. Normal sprite palette memory will be ignored in this case. Extended palettes take a bit more work to set up and manipulate but it is enough for now to know we have to allocate a VRAM bank to them and we can only load palette data into that bank when it is unmapped.

Bitmap Sprites:
The DS also allows for direct color bitmap sprites. These are very useful not only because they allow the full range of color in your sprite but also because they allow you to specify an alpha blend value in the unused bits that normally specify the palette index to use. In this mode each pixel is a 15 bit color value with the most significant bit an alpha flag. This alpha flag is how you specify a transparent pixel (0 means transparent, 1 means draw it).
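Building one of these pixels by hand looks something like the sketch below. The helper name is made up (libnds has its own color macros), and it assumes the usual DS channel order of red in the low 5 bits, then green, then blue.

```c
#include <stdbool.h>
#include <stdint.h>

/* One direct color pixel: 5 bits each of red, green and blue
   (0-31), plus the alpha flag in the most significant bit. */
uint16_t bitmap_pixel(bool opaque, int r, int g, int b)
{
    return (uint16_t)((r & 31)
                    | ((g & 31) << 5)
                    | ((b & 31) << 10)
                    | ((opaque ? 1 : 0) << 15));
}
```

Forgetting the top bit is the classic bitmap sprite bug: every pixel comes out transparent and nothing shows up at all.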

There is no palette so the only issue is how to store the sprite graphics data in memory. In bitmap mode sprite graphics are not tiled but stored as a linear bitmap. When in 1D mapping mode each sprite graphic is stored sequentially and the offset is simply offset * stride (where stride is 128 or 256).

In 2D mode the sprite graphics memory is treated as a single large bitmap that is 128 or 256 pixels wide. The offset gets split into a 5 bit X and a 4 bit Y value (or 4 bit X and 5 bit Y depending on the chosen stride). You specify the x and y of the top left corner of your sprite in the large bitmap (divided by 16).

That is it for our sprite technical overview. For a more in-depth look at these features be sure to check out the examples.

Posted by dovoto

Obviously, a sprite is a small creature, normally winged, that tends to flit and flutter about the place and do generally spritey things.

What does that have to do with computer graphics? Not much really; however, sprites also happen to be the name given to a certain hardware feature often exploited for game development.

There are other terms for sprites but we are not going to get into a history lesson. It is enough to know that anything that tends to move or animate independently of the main background image is referred to as a sprite.

If you have hardware which can render these little features then your job as a computer game programmer is not only easier but significantly more efficient.

As it happens, the DS has one of the better sets of sprite rendering hardware available for a 2D console. The answers you will find in this section are all about how to convince the DS hardware to display and animate our beloved game characters.

Posted by dovoto

Introduction

In yesterday’s chapter we talked at length about how to compose an image on screen by manipulating each pixel until our scene was formed. While this gave us a great deal of control over the final result we quickly realized the DS is not quite a software rendering powerhouse.

To compensate for a relatively slow processor and an even more limiting amount of VRAM the DS includes several hardware features, the result of which is one of the most advanced dedicated 2D processing systems ever placed in a video game console.

Tile based rendering is the key component of this 2D technology and understanding it will allow you to squeeze enormous, detailed, and fully interactive worlds from the seemingly limited DS resources.

Tile Modes

What are tile-based graphics? Put simply, it means to describe your scene using a mapping of tile indexes to tile graphics. Instead of describing the screen as a 2D matrix of pixels we are going to describe it as a matrix of tiles where each tile represents a small bitmap. Let us look at one of the better-known tile based games and get a feel for how it was put together.

Below are all the graphics used to construct the entire overworld of the original Zelda.

You may recognize these little 16×16 chunks as pieces of the Zelda world and perhaps you could imagine that in order to describe the look of the overworld all one would have to do is store which tile goes where. For instance the following familiar scene could be represented by an array of tile numbers.

Such as this:

//tile indexes, written without leading zeros (C reads 06 as an octal literal)
short map[] = {
64,64,64,64,64,64,64, 6, 6,64,64,64,64,64,64,64,
64,64,64,64, 7,64,62, 6, 6,64,64,64,64,64,64,64,
64,64,64,62, 6, 6, 6, 6, 6,64,64,64,64,64,64,64,
64,64,62, 6, 6, 6, 6, 6, 6,64,64,64,64,64,64,64,
64,62, 6, 6, 6, 6, 6, 6, 6,63,64,64,64,64,64,64,
 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
64,67, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,64,64,
64,64, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,64,64,
64,64, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,64,64,
64,64,66,66,66,66,66,66,66,66,66,66,66,66,64,64,
64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64
};

To render the scene all you would have to do is loop through the indexes stored in the map array and use those values to blit each tile to the screen. The entire world map would only end up being a few KBs in size instead of the 10s of Megabytes it would take to store it as a big image. The trade off of course is the increase in the amount of time it takes to render because we have to do a conversion between this map and the final bitmap we want on the screen.
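For the curious, this is roughly what that software loop looks like. It is a sketch in plain C with invented names, 8×8 tiles and one byte per pixel assumed, and it is precisely the work the DS tile hardware saves us from doing by hand.

```c
#include <stdint.h>

enum { TILE_W = 8, TILE_H = 8 };

/* Draw a map of tile indexes into a pixel buffer.
   map:   map_w * map_h tile indexes
   tiles: tile graphics stored back to back, TILE_W * TILE_H bytes each
   out:   (map_w * TILE_W) by (map_h * TILE_H) pixels, one byte each */
void blit_tilemap(const uint8_t *map, int map_w, int map_h,
                  const uint8_t *tiles, uint8_t *out)
{
    int out_w = map_w * TILE_W;

    for (int my = 0; my < map_h; my++) {
        for (int mx = 0; mx < map_w; mx++) {
            /* find the graphics for the tile used at this map cell */
            const uint8_t *tile = tiles + map[my * map_w + mx] * (TILE_W * TILE_H);

            /* copy its pixels to the right spot in the output */
            for (int py = 0; py < TILE_H; py++)
                for (int px = 0; px < TILE_W; px++)
                    out[(my * TILE_H + py) * out_w + mx * TILE_W + px] =
                        tile[py * TILE_W + px];
        }
    }
}
```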

Fortunately (and hopefully obviously at this point) the NDS 2D hardware is built for just this purpose making rendering tile based worlds a snap. All we really need to do is create a map and a tile set, place them into 2D video memory and tell the DS where to find them and it will do the magic for us.

In order to meet this first goal of placing tiles and maps into memory we must know where in memory to place them and in what format the NDS expects this data. This brings us back to the seemingly ever-present task of video memory management and memory layout.

Maps and tiles are laid out in a rather straightforward way. Tiles are simply stored sequentially in video memory. Maps which are 32 tiles wide are stored as a simple linear array allowing a direct copy from your map data to map video memory.

It turns out you can place maps anywhere in the first 64KB of background memory and you can place tiles anywhere in the first 256KB. You can set the location of your map by setting the “map base” in bgInit (or directly in the background control register if that is your thing). Each base is 2KB apart. Since all backgrounds share video memory you have to make an effort to ensure they don’t overlap.

Similar to maps, tiles also have an offset you can set with “character base” in bgInit. Each base is 16KB for tiles. Often you will put your maps at the beginning of video memory and your tiles after because the maps can only be stored in the first 64k of video memory.
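Turning a base number into an address is just a multiply. A sketch, assuming main engine background memory starting at 0x06000000 and using invented names (this is not the libnds API):

```c
#include <stdint.h>

#define BG_VRAM_MAIN 0x06000000u  /* start of main engine background memory */

/* Map bases are 2KB apart (0-31 covers the first 64KB). */
uint32_t bg_map_addr(unsigned map_base)
{
    return BG_VRAM_MAIN + map_base * 0x800u;
}

/* Tile (character) bases are 16KB apart (0-15 covers the first 256KB). */
uint32_t bg_tile_addr(unsigned tile_base)
{
    return BG_VRAM_MAIN + tile_base * 0x4000u;
}
```

With map base 0 and tile base 1, a 32×32 map of 16-bit entries takes exactly 2KB at the very start of memory and the tiles begin 16KB in, so the two cannot collide.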

To load a map into memory you pick a block offset and write the map there, then tell the DS where to find it. You then do the same for your tile graphics. Finally you load a palette and call it a day (there may be a few more details).

One thing we need to figure out is the format of tile and map data. It turns out this is rather straightforward.

Map Entries

There are two forms of maps: ones with 8-bit entries and ones with 16-bit entries. The 8-bit flavor is simply an offset into character memory. For instance if you want the third entry in your map to use tile number 4 you just stick a 4 in that map entry.

8-bit indexed maps are used only for “Rotation” backgrounds. “Text” and “Extended Rotation” use the more flexible 16 bit indices.

16-bit indexes are broken up into character index and control bits. The low 10 bits represent the index of the character and allow you to address up to 1024 unique characters. The next two bits will cause the character to flip vertically or horizontally. Finally there are 4 bits which let you choose a palette.

Bits   Description
0-9    The tile index
10     Horizontal Flip
11     Vertical Flip
12-15  The palette index

To create a map you fill an array of short ints with these character indexes and you can control not only which character is rendered to the screen but what palette it uses and if it is flipped. If you ignore the flip bits and the palette bits you can treat it as a simple character index (you are still limited to 1024 tiles though).
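Packing one of these 16-bit entries is a one liner. Here is a sketch with a made up helper name, following the bit positions in the table above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a 16-bit map entry: tile index in bits 0-9, flips in
   bits 10 and 11, palette number in bits 12-15. */
uint16_t map_entry(int tile, bool hflip, bool vflip, int palette)
{
    return (uint16_t)((tile & 0x3FF)
                    | ((hflip ? 1 : 0) << 10)
                    | ((vflip ? 1 : 0) << 11)
                    | ((palette & 0xF) << 12));
}
```

Leave the flips and the palette at zero and the entry is just the tile number, which is why a plain array of small tile numbers works as a map.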

I think at this point we know just enough about maps to get into trouble so let us see if we can trick the DS into displaying one.

First Map Demo

In this demo we are going to attempt to reproduce the screen depicted above, the starting point for one of my favorite games of all time. To keep things simple we are not going to use any external graphics files or map editors but we are going to code the map and the tile graphics by hand. Our goal is not a perfect reproduction mind you…only to get the idea. Let’s take a look at the entire source code:

What do we get when we run it?

Let’s take a look at the code. Notice the first thing we do is we define our map. This map was hand coded in that I decided how to place the tiles and edited the map array directly. Because this is a bit tedious I chose to only use 3 tiles: a tan one for the ground, a green one for the rocks, and a black one for the stairwell.

Next we have an array for our tiles. 3 8×8 tiles are defined each of a solid color. Normally we will use a graphics program to draw our tiles and a map editor to build our maps but today is a day for simplicity. Notice each pixel in our tiles is simply a number which is a color index. In order to make it the color we want we will need to set the colors in the palette.

Now that our map and tile data are defined let’s take a closer look at main, starting with the video init code.

Notice we set mode 0 which gives us 4 regular background layers to play with. We also give our background engine some video memory to utilize, else it would have no room for our map or tiles.

Next we init a background layer. We chose layer 0, and set its type to 8 bit per pixel (256-color paletted) and chose the smallest background of 256×256 pixels (32×32 tiles).

We manually set some colors for our tiles to use. Recall that the black tile used color index 0 so we set the palette index 0 to a Red Green Blue of (0,0,0). The other two colors follow (I used a paint program color picker to figure out what combination of red green blue made tan and the tint of green I needed).

Copying in the tiles is pretty straightforward. Just use the hardware memory copy function of the DS to move our tile array to the background’s tile graphics memory.

Finally, the only complicated bit of code in the demo. Had I the patience to hand make a 32×32 tile map I could have just done a straight copy from my map to the hardware map…but I was lazy. The original Zelda tiles are 16×16, which makes their map about half the size we need in each direction. For every tile of my map I copy two tiles across and two tiles down into the hardware map. This is done simply by dividing the x and y index by two and remembering that the indexes are integers (anything left over from a division of integers gets truncated).
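The divide by two trick can be sketched like this. The function and its parameter names are invented for the example, and this is a stand-in for the idea rather than the demo's actual listing.

```c
#include <stdint.h>

enum { HW_MAP_W = 32 };  /* hardware map is 32 tiles wide */

/* Stretch a map of 16x16 "big tiles" onto a hardware map of 8x8
   tiles. Each source entry ends up covering a 2x2 block because
   integer division makes x = 0 and x = 1 both read source column 0,
   x = 2 and x = 3 both read column 1, and so on. */
void expand_map(const uint16_t *src, int src_w, int src_h, uint16_t *hw_map)
{
    for (int y = 0; y < src_h * 2; y++)
        for (int x = 0; x < src_w * 2; x++)
            hw_map[y * HW_MAP_W + x] = src[(y / 2) * src_w + (x / 2)];
}
```

For the 16 wide, 11 tall Zelda screen above this fills a 32×22 region of the hardware map and leaves the rest untouched.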

Hopefully that wasn’t too complicated. In the next few demos we will use a graphics editor to draw our tiles and a map editor to build our maps and take some of the tedium out of the process.

Posted by dovoto

What is a register

The first concept we must get under our belts (and the only one that really matters at the moment) is the concept of memory mapped registers.

Now, I am sure you are aware that the DS has several different chips inside responsible for creating the images and sounds that accompany most games. There is sound hardware responsible for producing annoying chip tunes, the video hardware which puts all your convoluted data together in a nice and pretty display, the memory chips that hold the data for our programs, and the CPUs which are in overall control of the whole shebang (in all honesty many of these “chips” are actually just parts of one large integrated circuit and not really separate chips).

When you are writing C code to describe the events in your game you are directly controlling the DS processors. But, the CPUs do not work alone and generally we like to have some control over what the rest of the system is doing. The method by which this is accomplished is the use of hardware registers.

Beginning at the memory address of 0x4000000 and running for quite a few bytes is the memory mapped register space. What this means is that if I write some arbitrary value to the address 0x4000000 then I will be writing to a register that will have some effect on how the system renders (or fails to render) my video game. Understanding registers is vital in understanding console development.

Using memory mapped registers requires some knowledge I hope you already possess (if you know any C at all) and that is: how to write data to any specific address.

Hopefully you remember the concept of a pointer but, if not, that’s okay because I will cover it briefly. Recall that a pointer is a type of variable that holds not data but, instead, the address of some type of data. In this example, let us say we know (which we do) that the register at 0x4000000 is 32 bits in length and controls the display and we hence call it DISPLAY_CR; we might use code like the following to write to this address:

unsigned int *DISPLAY_CR = (unsigned int *) 0x4000000;
*DISPLAY_CR = somevalue;

Now, this would work just fine but there are some issues with this code. First, because these are hardware registers it is possible (even likely) that the values stored at these addresses will be changed by the hardware directly. This is something the compiler needs to know, else it will try to optimize our code and we would miss these changes. The way we tell the compiler that variables change outside of the C code is to declare them volatile.

The other issue is we are using a variable (RAM) to store a constant. We would be much better served if we just used a #define…this also allows us to dereference the register in its declaration and makes writing to it a bit simpler. Here is the new, more proper code.

#define DISPLAY_CR  (*(volatile unsigned int *) 0x4000000)
DISPLAY_CR = somevalue;

Notice there is no longer any need to dereference the register prior to use as it is implicit in the definition. Also, this code uses no space in memory for the pointer as it is just a constant (of course, the compiler is smart enough that even had you used a variable it likely would have optimized it away and the result would have been the same).

I hope this concept is clear to you; about 30% of the following pages are nothing but descriptions and examples of how to use the hardware registers to control the many features of the DS.

Twiddling Bits

It will rapidly become apparent that controlling hardware via registers will require an understanding of how to target specific bits inside the register. That is, we must be able to set or clear some bits in a register while leaving the rest untouched. Even though this is rather simple and talked about in many other places, it is important enough that we must waste the time of the 90% of readers who already know it in order to ensure the 10% who have no clue are not left in the dust.

Numbering Systems

Most of you can begin at 0 and count all the way to 9 (a feat by anyone’s reckoning). If so, you probably realize there are in fact 10 unique digits you will come across in this endeavor. Oddly enough, the numbering system we use on a daily basis is called base 10 (in academic elitist societies you may also hear it referred to as Decimal).

First let us review an interesting detail about base 10 that you already know: the significance of the placement of the digits in any number. If the digit is on the right-hand side of a number it is generally weighted less than those digits appearing on the left to such a degree one might call it exponential. For instance consider the following examples of decimal numbers:

1) 1    = 1 * 10^0
2) 10   = 1 * 10^1
3) 100  = 1 * 10^2
4) 1000 = 1 * 10^3

This ringing any bells? You realize quite readily that the weight of the digit in question is equal to 10 to the power of the place of that digit in the number.

Unfortunately for us, computers do not use the same numbering system we do. The reason for this is a simple one. They only know 2 digits. Why is this unfortunate for us? It turns out that 90 percent of the time we can ignore the fact that computers use Base 2 because the compilers and tools we use automatically convert our base 10 numbers to binary for us. But, sometimes an understanding of the computer numbering system is crucial to coding.

The binary system works much in the same way as does our own decimal system with the exception that it has fewer digits and instead of weighting the value of a digit by 10 to the power of the place we instead weight it by 2 to the power of the place. For instance here are the same numbers as above in base 2 and their decimal equivalents.

1) 1    = 1 * 2^0 = 1
2) 10   = 1 * 2^1 = 2 
3) 100  = 1 * 2^2 = 4
4) 1000 = 1 * 2^3 = 8

You might notice writing the value 8 in binary requires 4 digits. This may not seem an issue offhand, but as numbers increase writing binary rapidly becomes cumbersome. A string of ones and zeros is prone to error and very difficult to read. To combat this, a new base was developed making manipulation of numbers on computers easier to handle. That numbering system is base 16, often referred to as Hexadecimal or Hex.

Hex numbering uses the digits 0 – 9 and A – F and you end up with numbers which look like the following:

4d6172696f 
4b69726279
4c75696769 
5a656c6461 

At first you may be wondering why the hell you would ever go through such seeming pain to write numbers in such a way. Well, it turns out the conversion between Hex and Binary is very simple. Because there are 16 digits in hex (a power of 2, mind you) you can represent each hex digit with exactly 4 binary digits (bits). All you need do is become familiar with counting in binary from 0 to 15 to convert back and forth.

A few examples might be in order:

Take the binary number 1101001001010101010101010001. To convert this number to decimal would require you to look at each bit and add 2 to the power of the place if there is a one in that position. This number in decimal becomes:

1 + 16 + 64 + 256 + 1024 + ….. and then you give up and put it in your calculator and get: 220,550,481

To do the same conversion to Hex you break the number into 4 bit nibbles and convert so it becomes:

1101 0010 0101 0101 0101 0101 0001
 D    2    5    5    5    5    1      = D255551
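
The conversion above can be double-checked with a few lines of C. This is only a sketch: `binary_to_value` is a made-up helper name, not part of libnds, and it simply doubles a running total for every binary digit it reads.

```c
#include <stdint.h>

/* Hypothetical helper (not from libnds): turn a string of '0' and '1'
   characters into a number. Every new digit shifts the running total
   one place to the left and drops the digit into the lowest bit.
   Parsing stops at the first character that is not a binary digit. */
uint32_t binary_to_value(const char *bits)
{
    uint32_t value = 0;

    while (*bits == '0' || *bits == '1') {
        value = (value << 1) | (uint32_t)(*bits - '0');
        bits++;
    }

    return value;
}
```

Feeding it the 28 digit string from the example and printing the result with the `%X` format of printf gives the hex form, while `%u` gives the decimal form.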

Now that you believe that hex to binary might be simpler than binary to decimal you may still be a bit unclear on why we don’t just write everything in decimal and let the compiler figure it out (because it will). As we said before computers are binary in nature and as such hardware is controlled by specific bits at specific addresses. Because we need to set specific bits in the binary number we must at some point think of the number in binary. This will become more apparent as we actually set those bits.

Another good reason is that bit boundaries play a significant role in memory addressing on most systems. For instance, DS video memory for one of the graphics units begins on such a boundary. The address of this memory can be written in hex as 6000000; that same address in decimal is 100,663,296. Although you could store the value as a pointer and write to video memory using either numbering system, the hex value is much easier to remember and much easier on the eyes.

As a final note in C hexadecimal numbers are denoted with an 0x at the beginning of the number.

0x3FF 
0x3ff
0x6000000

Notice case is not an issue.

Bitwise Operations

Being able to ‘and’ and ‘or’ bits together is important when attempting to enable certain features of hardware. Below is a summation of how bitwise operators work in C and how you might use them.

AND operator: ‘&’

Unlike the logical and ‘&&’, the single ampersand denotes that two operands should be ‘anded’ together bit by bit. Each bit of one number is compared against the same bit of the other, and if either number has a 0 in that bit position a 0 will be stored in the result.

AND is useful for checking the status of bits. For instance, to see if the lowest bit (bit 0) of a register is set simply AND the register with the value 1. The result can only be 1 if bit 0 of the register is also 1.

Register value = 1101 0100 1101 1101
               & 0000 0000 0000 0001
------------------------------------
        Result = 0000 0000 0000 0001 

It is also good for clearing bits. If you want the lower 8 bits of a number cleared to 0 simply AND the value with a number that has all the other bits set to 1. This is where hex comes in very handy.

Original Number            = 1101 0100 1101 1101
                           & 1111 1111 0000 0000  = 0xFF00
------------------------------------------------ 
Result with low 8 bits clear 1101 0100 0000 0000

OR operator: ‘|’

A bit by bit comparison which causes the result bits to be set if either of the bits in the arguments are set.

OR is good for setting bits. To set a bit, simply OR the register with a value that has that bit and that bit only set. Here is an example of setting bit 9 of a 16 bit number (bits are numbered right to left beginning at 0).

Old Value =              1101 0100 1101 1101
                       | 0000 0010 0000 0000
--------------------------------------------
New value with bit 9 set 1101 0110 1101 1101

XOR Operator: ‘^’

XOR is great for flipping between states. In the above example, if an XOR was used in the place of an OR then the bit would be set if it was clear and it would be cleared if it were set. This is very useful anytime you need certain bits to alternate states every frame.

NOT Operator: ‘~’

NOT inverts the bits in a number rendering all 1s to 0s and all 0s to 1s. This is very useful for clearing bits. For instance, the main graphics engine will render to the top LCD when bit 15 of the power control register is set to 1 and to the bottom when it is set to 0. We can ‘or’ the power control register with bit 15 to set it and we can ‘and’ the register with bit 15 inverted to clear it. Here is a snippet from libnds.

//!	Forces the main core to display on the top.
static inline void lcdMainOnTop(void) { POWER_CR |= POWER_SWAP_LCDS; }

//!	Forces the main core to display on the bottom.
static inline void lcdMainOnBottom(void) { POWER_CR &= ~POWER_SWAP_LCDS; }

In this case POWER_CR is defined as the dereferenced address of the power control register (in the same style as DISPLAY_CR earlier) and POWER_SWAP_LCDS is defined as bit 15.
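
To tie the four operators together, here is a small sketch that works on an ordinary 16 bit value instead of a real hardware register. The helper names are invented for this example and do not come from libnds.

```c
#include <stdint.h>

/* Hypothetical helpers for a 16 bit value; bits are numbered from the
   right starting at 0, as in the text. */

/* AND with a single-bit mask: returns 1 if the bit is set, else 0. */
int bit_is_set(uint16_t value, int bit)
{
    return (value >> bit) & 1;
}

/* OR with a single-bit mask: forces the bit to 1. */
uint16_t set_bit(uint16_t value, int bit)
{
    return (uint16_t)(value | (1u << bit));
}

/* AND with the inverted (NOT) mask: forces the bit to 0. */
uint16_t clear_bit(uint16_t value, int bit)
{
    return (uint16_t)(value & ~(1u << bit));
}

/* XOR with a single-bit mask: flips the bit. */
uint16_t toggle_bit(uint16_t value, int bit)
{
    return (uint16_t)(value ^ (1u << bit));
}
```

Using the value from the diagrams (1101 0100 1101 1101 is 0xD4DD), `set_bit(0xD4DD, 9)` produces 0xD6DD, and toggling the same bit twice returns the original value.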

The final bitwise operations to talk about are the shift operations ‘>>’ and ‘<<’. When used these operators cause the binary number to be shifted to the left or right the specified number of places. In the examples below the numbers are 7 bits wide, so any bit shifted past either end is lost:

0101011 << 1 = 1010110
0101011 << 2 = 0101100
0101011 << 3 = 1011000
0101011 >> 1 = 0010101
0101011 >> 2 = 0001010
0101011 >> 3 = 0000101

If you will recall the weight of a digit is proportional to the base raised to the power of its position in the number. When we shift numbers to the right ‘>>’ we are reducing the weight of the digits effectively dividing the number by 2^n where n is the amount we shifted. Similarly a left shift ‘<<’ will multiply by a power of 2.
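
The shift table above can be reproduced in C. The sketch below assumes the same 7 bit wide field as the table, so it masks the result with 0x7F; the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers that shift inside a 7 bit field. */

/* Left shift: multiplies by 2^places, then throws away anything that
   no longer fits in 7 bits. */
uint8_t shift_left_7bit(uint8_t value, int places)
{
    return (uint8_t)((value << places) & 0x7F);
}

/* Right shift: divides by 2^places, discarding the remainder. */
uint8_t shift_right_7bit(uint8_t value, int places)
{
    return (uint8_t)((value & 0x7F) >> places);
}
```

0101011 is 43 in decimal. Shifting it right by 2 gives 10, the same as the integer division 43 / 4, while shifting it left by 1 gives 86, the same as 43 * 2.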

Often it is beneficial to use shift operators when division and multiplication are required as they execute more quickly. Do not get carried away though, as they are less readable and the compiler will convert multiplications to shifts for you whenever possible.

The shift operator has other uses and plays a big role in fixed point arithmetic which we will cover shortly.

Talking to the keypad

It is difficult to do any interesting yet simple demo programs without understanding how to read user input. Fortunately for us, getting the state of the DS keys is exceedingly simple (if you understood the above discussion that is).

The state of each button is stored as a bit in memory mapped register space. To know if a key is pressed or released we just read the state of a specific bit. All we need to process the keys is the knowledge of where these values are stored.

Let us write our first real demo that checks for key presses and prints their state on the screen. Before we get to the code let us look at the main register used for key state on the DS.

(insert key pad register description here).

One thing you might note is the glaring absence of the X and Y keys. A bit further down we will demonstrate a more refined approach to handling input and introduce the functionality built into libnds and see if we can’t find those missing buttons. For now let us get our first real demo out of the way.

#include <nds.h>
#include <stdio.h>

int main(void)
{
	consoleDemoInit();

	while(1)
	{
		if(REG_KEYINPUT & KEY_A)
			printf("Key A is released");
		else
			printf("Key A is pressed");

		swiWaitForVBlank();

		consoleClear();
	}

	return 0;
}

Much of this code is as you have seen before. We initialize the print console so printf prints to the sub screen using default settings. The call to swiWaitForVBlank() in the main loop depends on the libnds interrupt handler and the vblank interrupt being enabled, which is a bit out of scope for this first day. We will talk at much greater length about interrupts (IRQs) on a later day.

The main loop just checks the KEY_A bit of the input register. This happens to be bit 0 and when the key is pressed that bit will be clear. This is all there is to checking for key presses on the DS.

The last two calls in the loop force the DS to wait until the screen is done drawing and then clear the screen. This prevents some nasty looking text flickering. Again, understanding this bit of code requires some knowledge of interrupts which will have to wait.

image:Demo_3_1.PNG

Now that simple reading of the key presses has been covered it is time to consider a bit more advanced needs…such as what about those X and Y buttons?

Unfortunately the designers of the DS were a bit lazy and stole the GBA input hardware; it seems our wonderful GBA input was a bit lacking in the number of buttons available to the user. The result is that we can read the A, B, Up, Down, Left, Right, Start, Select, and the Left and Right shoulder buttons from one place but the X and Y buttons are in a different register…and can’t even be read at all by the main CPU! It seems in our very first foray into DS programming we must face the complexities of a dual processor system.

The solution is to read the keypad from the ARM 7 (which can read the X and Y buttons plus the hinge “button” on the DS lid and the pen state for the touch pad) and put the results someplace readable by the ARM 9.

If you are like me then this code seems a bit awkward. It would be nice if we had some way of wrapping all these bits into a single location to simplify the reading of key presses. Libnds provides just such a wrapper. Let us see how we would do the same code using the libnds wrapper then take a closer look at what the wrapper is doing.

#include <nds.h>
#include <stdio.h>

int main(void)
{
	consoleDemoInit();

	while(1)
	{
		scanKeys();
		int held = keysHeld();

		if(held & KEY_A)
			printf("Key A is pressed\n");
		else
			printf("Key A is released\n");

		if(held & KEY_X)
			printf("Key X is pressed\n");
		else
			printf("Key X is released\n");

		if(held & KEY_TOUCH)
			printf("Touch pad is touched\n");
		else
			printf("Touch pad is not touched\n");

		swiWaitForVBlank();

		consoleClear();
	}

	return 0;
}

Two things to note in this new demo are the use of a scanKeys() call every frame and the change to positive logic: now the bits are set if the key is pressed and clear if it is released. Along with keysHeld() there is keysDown(), which will only be true if the key was pressed since the last time you checked (i.e. it will return true once, but unless the player releases the key and presses it again it will return false).

Some other useful functions are keysUp(), which returns the released keys, and keysDownRepeat(), which returns true after a certain delay (measured in number of scanKeys() calls) even if the keys have been held down. Check the documentation for input.h for more information on how to use these other functions.

Basically the way scanKeys works is to combine the bits from REG_KEYINPUT and IPC->buttons and apply a little bit of state tracking to determine which have been pressed since the last call. Here is an excerpt from the key handling code in libnds:

#define KEYS_CUR (( ((~REG_KEYINPUT)&0x3ff) | (((~IPC->buttons)&3)<<10) | (((~IPC->buttons)<<6) & (KEY_TOUCH|KEY_LID) ))^KEY_LID)

void scanKeys(void) {
	keysold = keys;
	keys = KEYS_CUR;

	///..some code for handling key repeats
}

uint32 keysHeld(void) {
	return keys;
}

uint32 keysDown(void) {
	return (keys ^ keysold) & keys;
}

The statement at the beginning does most of the work by negating the register state and masking out / recombining the two sources of key state. If you have not noticed by now you will need to have a decent understanding of bit operations to work with register controlled hardware.
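
The state tracking is easier to follow away from the hardware. The sketch below mimics the idea on a plain PC: the `raw` argument stands in for the key register (where a 0 bit means pressed) and every `demo_` name is invented for this example. It is an illustration of the technique, not the actual libnds implementation, and the "released" calculation in particular is an assumption about how such a function could be written.

```c
#include <stdint.h>

#define DEMO_KEY_A (1u << 0)        /* bit 0, as described in the text */

static uint32_t demo_keys = 0;      /* keys held at the latest scan */
static uint32_t demo_keys_old = 0;  /* keys held at the scan before that */

/* Remember the previous state, then flip the active-low register value
   to positive logic and keep only the 10 key bits. */
void demo_scan_keys(uint32_t raw)
{
    demo_keys_old = demo_keys;
    demo_keys = (~raw) & 0x3FF;
}

/* Keys that are down right now. */
uint32_t demo_keys_held(void)
{
    return demo_keys;
}

/* Keys that changed since the last scan AND are down now: new presses. */
uint32_t demo_keys_down(void)
{
    return (demo_keys ^ demo_keys_old) & demo_keys;
}

/* Keys that changed since the last scan AND are up now: new releases. */
uint32_t demo_keys_up(void)
{
    return (demo_keys ^ demo_keys_old) & ~demo_keys;
}
```

Scanning 0x3FF (nothing pressed), then 0x3FE twice (A pressed and kept down), reports A as "down" only on the first of the two scans while "held" stays set on both.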

Frame buffer…finally

It is nice to finally get to graphics programming…I don’t know about you but two full days of fluff is about all I can take.

If you have actually followed along with the subjects and code presented so far you will find doing frame buffer graphics on the DS is surprisingly simple. All we need do is put the DS into frame buffer mode and begin writing images to the screen.

If you recall from yesterday’s topic the DS supports many graphics modes with the main screen supporting a simple frame buffer. It is this mode we will turn to first as it is very easy to set up and even easier to use.

I figured I would start this chapter with some code and use that to explain the frame buffer mode.

#include <nds.h>
#include <stdio.h>

int main(void)
{
	int i;

	//initialize the DS Dos-like functionality
	consoleDemoInit();

	//set frame buffer mode 0
	videoSetMode(MODE_FB0);

	//enable VRAM A for writing by the cpu and use 
	//as a framebuffer by video hardware
	vramSetBankA(VRAM_A_LCD);

	while(1)
	{
		u16 color = RGB15(31,0,0); //red

		scanKeys();
		int held = keysHeld();

		if(held & KEY_A)
			color = RGB15(0,31,0); //green

		if(held & KEY_X)
			color = RGB15(0,0,31); //blue

		swiWaitForVBlank();

		//fill video memory with the chosen color
		for(i = 0; i < 256*192; i++)
			VRAM_A[i] = color;
	}

	return 0;
}

This code is very similar to the code for the day 1 introduction demo. As before we initialize the console. Next we set the video mode to a frame buffer mode which uses the first video ram bank. We then set the first VRAM bank to be writable by the CPU and to act as a buffer for the LCD (recall the first VRAM bank is VRAM_A).

The main loop creates a color as a 16 bit unsigned short integer and sets its value to red. If A or X are pressed then the color is altered to be green or blue respectively. Finally we wait for the screen draw to finish and fill VRAM_A with the selected color.

Now that we have a base understanding of the code we need to get to the details, namely:

  • How do videoSetMode and vramSetBankx work?
  • How does the DS treat color?
  • What the hell is a frame buffer and how do I write pixels to it?

These are the questions we will now explore.

Display Control
(todo: add description of display control register here)

VRAM Control
(todo: add some guidance on VRAM control...or delete this section and move it to the next chapter)

DS Color Formats

The DS has several ways in which it represents color. These ways generally fall into two categories: Paletted and Direct.

Direct color means the value directly controls the intensity of red, green, and blue that is fed to the pixel. There are technically two direct color formats used by the DS but you will see the variance between the two is minimal.

Direct color uses 5 bits to represent how bright each color component can be (red, green, and blue). We refer to this format as 555 or sometimes 15 bit color. If you recall from our discussion on binary numbers, 5 bits amounts to 32 levels of intensity for each of the three colors.

To describe a color in this format we need to combine our desired values of red, green, and blue to form one 15-bit number. The color components are stored as follows:

xBBBBBGGGGGRRRRR

This can be translated as: the least significant 5 bits hold the red component, the next 5 bits hold the green and the remaining 5 hold the blue. We refer to this as BGR format. This would be a good time to flex our bit twiddling muscles and see if we can’t define some colors.

To do this we specify an intensity for each of the 3 components and then shift the values into the correct place. Finally we OR them all together to get our 15 bit value.

int red = 31;
int blue = 0;
int green = 0;

unsigned short int color_red = (blue << 10) | (green << 5) | red;

Simple enough? Since we don’t normally want to concern ourselves with this detail we use the macro provided by libnds (or write our own) which is depicted below:

#define RGB15(r,g,b)  ((r)|((g)<<5)|((b)<<10))

//to use:
unsigned short int color = RGB15(red, green, blue);

Taking a short break from theory you should now glance up to the demo we just wrote. You will notice I used this macro to paint the screen red. It should be apparent at this point that the frame buffer utilizes 15 bit Direct color format.
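
Going the other way, from a packed color back to its components, uses the same shifts in reverse plus an AND mask. The sketch below uses function names made up for this example (libnds itself provides the RGB15 macro shown above) and follows the xBBBBBGGGGGRRRRR layout described in the text.

```c
#include <stdint.h>

/* Hypothetical helpers for the 15 bit BGR layout xBBBBBGGGGGRRRRR. */

/* Pack three 5 bit intensities (0 to 31) into one 16 bit value.
   Masking with 31 keeps an oversized input from spilling into the
   neighbouring component. */
uint16_t pack_rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31) | ((g & 31) << 5) | ((b & 31) << 10));
}

/* Shift the wanted component down to bit 0, then mask off the rest. */
unsigned red_of(uint16_t color)   { return color & 31; }
unsigned green_of(uint16_t color) { return (color >> 5) & 31; }
unsigned blue_of(uint16_t color)  { return (color >> 10) & 31; }
```

Pure red packs to 0x001F, pure green to 0x03E0, pure blue to 0x7C00, and white (all three at 31) to 0x7FFF.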

The other Direct color format is nearly identical to the one just discussed. The only difference is in the most significant bit (depicted as the ‘x’ in xBBBBBGGGGGRRRRR above). When utilizing this format this bit is known as the ‘alpha’ bit and when set to 0 will prevent the color from appearing onscreen. Most 16 bit graphics operations on the DS utilize this "alpha" bit to determine transparency of the rendered pixel.

When we move on to Direct color bitmap modes you will quickly discover not setting this alpha bit will result in nothing on screen. Recall to set this bit requires some more bit operations:

//set alpha
color = color | (1<<15);

//clear alpha
color = color & ~(1<<15);

Although direct color formats provide a wide range of colors they have a serious drawback: They take up 16 bits for each pixel. Although the DS has an abundant amount of memory and CPU power compared to early 2D systems it still pales in comparison to most modern machines. You will be surprised how rapidly you will fill memory with a 16 bit image or how much stress you will place on the hardware if you attempt to blit large 16 bit images.

To alleviate this the DS utilizes many space saving tricks. The most prevalent is the use of paletted colors. Instead of specifying color components directly we instead build a table of colors and specify an index into this table.

Let us say we have a 256 color table (we will refer to this table as the “palette”) which contains 256 direct color values. We can then set pixels onscreen to these values by specifying an index. Because the table is small we would only need an 8 bit index to describe the pixel color…a savings of 50%!
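
To put numbers on that saving, here is a quick back-of-the-envelope sketch for a full 256 by 192 screen. The function names are made up, and the paletted figure assumes 8 bit indices plus a table of 16 bit color entries.

```c
#include <stddef.h>

/* Bytes needed for a direct color image: 2 bytes (16 bits) per pixel. */
size_t direct_color_bytes(size_t width, size_t height)
{
    return width * height * 2;
}

/* Bytes needed for an 8 bit paletted image: 1 byte per pixel for the
   index, plus 2 bytes for every entry in the palette itself. */
size_t paletted_bytes(size_t width, size_t height, size_t palette_entries)
{
    return width * height + palette_entries * 2;
}
```

For one screen this works out to 98,304 bytes in direct color against 49,664 bytes paletted (49,152 for the pixels and 512 for the 256 entry palette), so the saving is a shade under 50% once the table is counted.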

The DS supports 8 bit indexed palettes as well as 4 bit and we will figure out the mechanics of their use as we proceed.

Frame buffer 101

A frame buffer can be described as a direct map of memory values to onscreen colors. By simply writing the correct color value into memory we can set a pixel as we see fit.

Framebuffer memory can be completely described by three things: The address at which it begins, the color format of the pixels, and the number of pixels per horizontal line.

The memory for a frame buffer is a single linear map such that the first W entries correspond to the top row of pixels on the screen (W in this case is the “width” of the buffer). The next row of pixels follows and occupies entries W to 2*W – 1.

Image:Framebuffer_linear.png (image of linear memory) Image:Framebuffer_2D.png (image of memory as it represents 2D space)

Pixels and things

To accurately place pixels onscreen we must have some idea how to specify location. This is normally done using a modified Cartesian coordinate system where we specify how many pixels from the left and how many pixels from the top we wish our value to be placed.

Image:Coordinate sys.png

The distance from the left hand side of the screen is usually referred to as the X coordinate of the pixel and the distance from the top is the Y coordinate. (Those of you who are math whizzes might see the disparity between Cartesian coordinates as Y usually is measured from the bottom up…live with it).

To figure out the offset into framebuffer memory we need to perform a simple calculation based on the X and Y coordinates we wish to affect. Because memory is arranged linearly in the buffer to get to the correct horizontal line we simply multiply the number of pixels on a line by the value of the Y coordinate. We then add the value of X and we have our offset.

Image:Pixel offseting

unsigned short* frame_buffer = address_of_the_buffer;

frame_buffer[y * width_in_pixels + x] = color;
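
Wrapped up as a function, the offset calculation might look like the sketch below. The name `plot_pixel` and the bounds check are additions for this example; the check is there because writing outside the buffer would silently scribble over whatever memory follows it.

```c
#include <stdint.h>

/* Hypothetical helper: write one pixel into a linear frame buffer.
   Returns 1 if the pixel was written, 0 if (x, y) is off screen. */
int plot_pixel(uint16_t *frame_buffer, int width, int height,
               int x, int y, uint16_t color)
{
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;

    /* Skip y complete rows of 'width' pixels, then move x pixels in. */
    frame_buffer[y * width + x] = color;
    return 1;
}
```

On the DS frame buffer used in this chapter the call would be along the lines of `plot_pixel(VRAM_A, 256, 192, x, y, color)`; on a PC any array of 16 bit values can stand in for the buffer when testing.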

Let us translate this new knowledge of pixel plotting and color formats and see if we can produce an interesting (sort of) demo.

For our first pixel demo we will do a starfield with little floating dots. Each dot will make its way across the screen at a random speed. When it reaches the end we will move it back to the beginning and give it a new random height and new random speed. This should give us a nice star-trekie feeling demo of a moving star field.

Here is the source in its entirety which we will pick apart below; you can cut and paste this code into your main.c (or template.c if it is so named) from our first few demos. You can then build and run the demo.

#include <nds.h>
#include <stdlib.h>

#define NUM_STARS 40

typedef struct 
{
	int x;
	int y;
	int speed;
	unsigned short color;
}Star;

Star stars[NUM_STARS];

void MoveStar(Star* star)
{
	star->x += star->speed;

	if(star->x >= SCREEN_WIDTH)
	{
		star->color = RGB15(31,31,31);
		star->x = 0;
		star->y = rand() % 192;
		star->speed = rand() % 4 + 1;
	}
}

void ClearScreen(void)
{
	int i;

	for(i = 0; i < 256 * 192; i++)
		VRAM_A[i] = RGB15(0,0,0);
}

void InitStars(void)
{
	int i;

	for(i = 0; i < NUM_STARS; i++)
	{
		stars[i].color = RGB15(31,31,31);
		stars[i].x = rand() % 256;
		stars[i].y = rand() % 192;
		stars[i].speed = rand() % 4 + 1;
	}
}

void DrawStar(Star* star)
{
	VRAM_A[star->x + star->y * SCREEN_WIDTH] = star->color;
}

void EraseStar(Star* star)
{
	VRAM_A[star->x + star->y * SCREEN_WIDTH] = RGB15(0,0,0);
}

int main(void) 
{
	int i;

	irqInit();
	irqEnable(IRQ_VBLANK);

	videoSetMode(MODE_FB0);
	vramSetBankA(VRAM_A_LCD);

	ClearScreen();
	InitStars();

	//we like infinite loops in console dev!
	while(1)
	{
		swiWaitForVBlank();

		for(i = 0; i < NUM_STARS; i++)
		{
			EraseStar(&stars[i]);

			MoveStar(&stars[i]);

			DrawStar(&stars[i]);
		}
	}

	return 0;
}

We begin with a structure to define our star. It needs a location in the form of an X and Y coordinate, it needs speed, and finally it needs color.

typedef struct 
{
	int x;
	int y;
	int speed;
	unsigned short color;
}Star;

We then need an array of stars we can track across the screen:

#define NUM_STARS 40

Star stars[NUM_STARS];

Before we start the demo, we need to clear the pixels of any color information they currently have. In other words, we are making sure we start with a black screen.

void ClearScreen(void)
{
     int i;
     
     for(i = 0; i < 256 * 192; i++)
           VRAM_A[i] = RGB15(0,0,0);
}  

To start the demo off it would be nice if we could arrange our stars randomly about the screen. We do this with an initialize function.

void InitStars(void)
{
	int i;

	for(i = 0; i < NUM_STARS; i++)
	{
		stars[i].color = RGB15(31,31,31);
		stars[i].x = rand() % 256;
		stars[i].y = rand() % 192;
		stars[i].speed = rand() % 4 + 1;
	}
}  

This function loops through all stars and sets the color to white, the speed to a random value between 1 and 4 and the X and Y to some random location on screen. If the ‘%’ is unfamiliar to you I will give a brief explanation.

‘%’ performs a division and returns the remainder of that division.

rand() returns a random non-negative integer, so taking that value modulo a number results in a value between zero and one less than that number. Generating a random number from MIN up to (but not including) MAX is simply:

Num = rand() % (MAX-MIN) + MIN;
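
As a function, with the range limits spelled out, this could be sketched as follows. `random_in_range` is a name invented for this example, and it assumes max is larger than min.

```c
#include <stdlib.h>

/* Hypothetical helper: random integer from min up to max - 1.
   rand() % (max - min) yields 0 .. (max - min - 1); adding min slides
   that window so it starts at min. Assumes max > min. */
int random_in_range(int min, int max)
{
    return rand() % (max - min) + min;
}
```

The star speed in the demo, `rand() % 4 + 1`, is the same thing as `random_in_range(1, 5)`: the results run from 1 to 4 and never reach 5.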

Next we need some function to move, draw, and erase the star. Let us begin with erase.

void EraseStar(Star* star)
{
	VRAM_A[star->x + star->y * SCREEN_WIDTH] = RGB15(0,0,0);
}  

To erase just set the location of the star in the framebuffer to the background color (black in our case).

Similarly to draw the star we set the location of the star in the frame buffer to the color of the star:

void DrawStar(Star* star)
{
	VRAM_A[star->x + star->y * SCREEN_WIDTH] = star->color;
}  

The final step is to move the star to its new location:

void MoveStar(Star* star)
{
	star->x += star->speed;

	if(star->x >= SCREEN_WIDTH)
	{
		star->color = RGB15(31,31,31);
		star->x = 0;
		star->y = rand() % 192;
		star->speed = rand() % 4 + 1;	
	}
}  

Moving a star is simple: we just add its speed to its current x position. The caveat is we must then check if the star has gone off screen. To do that we compare its x location to the width of the screen. If it is greater than or equal to the width we know we are off the screen and we can take appropriate action.

When a star goes off screen we move it back to the left by setting its X value to 0. We then give it another random speed and random Y value, making it look like a new star has come on screen.

The main loop which controls the demo consists of looping through each star and first erasing it from its old position, then moving it to its new position, and finally redrawing it at its new location:

	//we like infinite loops in console dev!
	while(1)
	{
		swiWaitForVBlank();

		for(i = 0; i < NUM_STARS; i++)
		{
			EraseStar(&stars[i]);

			MoveStar(&stars[i]);

			DrawStar(&stars[i]);
		}
	}

And so ends our pixel plotting demo. Below are a few more demos explained in the same excruciating manner as above.

Color Bar Demo

Touching things

The touch pad is an amazing addition to a handheld video game system: it not only makes for an interesting gameplay experience but also adds a new level of fun to game programming.

This section will introduce the touch pad, show you how it works, and give a quick demo on its use.

The DS touchpad utilizes a resistive coating which changes conduction depending on the area of the contacting object. This change is measured by some analogue to digital converters on a special chip inside the DS and translated to an X and Y location. These measurements can also be used to determine the area of the contact point which, to some degree, can be translated into pressure.

To get to this raw data we must communicate with this chip via a serial interface which is only accessible via the ARM 7. Currently I do not have the stomach to go into serial comms in this tutorial but if you have a mind to explore such things the source code is in the arm7 code base of libnds.

For now I am just going to do a bit of hand waving and tell you there is code running on the arm7 in the default arm7 template which reads out the state of the touch pad. This data is unformed and does not correlate exactly to the dimensions of the screen meaning some processing must be done to convert this raw location data to useful pixel data.

The libnds arm7 stub does the appropriate conversion and communicates the result to the ARM9. These values are then read using the touchRead function which writes the raw coordinates and the transformed pixel coordinates into the pointer parameter.

You can also get a go/no-go test of the pen by using scanKeys() and keysHeld() as discussed above: the KEY_TOUCH bit tells you if the pen is up or down so you know when to read the touch pad.

Now for a simple demo. We will make the simplest of art programs possible. It will render random colored dots wherever you touch the screen.

#include <nds.h>
#include <stdlib.h>

int main(void)
{
	touchPosition touch;

	videoSetMode(MODE_FB0);
	vramSetBankA(VRAM_A_LCD);

	//notice we make sure the main graphics engine renders
	//to the lower lcd screen as it would be hard to draw if the 
	//pixels did not show up directly beneath the pen
	lcdMainOnBottom();

	while(1)
	{
		scanKeys();

		if(keysHeld() & KEY_TOUCH)
		{
			// write the touchscreen coordinates in the touch variable
			touchRead(&touch);

			VRAM_A[touch.px + touch.py * 256] = rand();
		}
	}

	return 0;
}

There is not too much to say about this demo…but of course I am going to say it anyway.

You should notice I skipped the interrupt setup for this demo. This is for two reasons: there is no real animation cycle and we only render one pixel at a time. This means even if we draw while the screen is rendering our pixels there won’t be anything to tear.

Notice also the use of scanKeys(); we use keysHeld() instead of keysDown() because keysDown() would only return true the first time the pen touches while keysHeld() returns true until the pen is lifted up again. This allows us to draw our dots without lifting the pen.

When using the touchRead function, we need to pass in the pointer to the variable touch rather than the variable itself. Putting an ampersand, &, in front of a variable replaces it with the address to that data.

Color is selected at random using rand(). Each frame buffer entry is 16 bits wide, so only the low 16 bits of the value rand() returns are stored, which is convenient as color is also 16 bit.

When drawing rapidly you probably noticed big gaps between your dots as opposed to nice smooth curves. This is because the touch coordinates are only updated once per frame and you can move the pen a lot faster than that. We will make this demo a little prettier when we learn how to draw lines a few pages hence.

Bitmap Graphics Modes

I talked a bit before about the DS graphics modes and alluded to being able to compose a scene from layers of graphics. Normally all rendering is done to one of these layers and this section will be the first real use of the 2D engine. Before we talk about the specifics let us look again at the possible graphics modes and what each layer can do in these modes.

Graphics Modes

Main 2D Engine

Mode          BG0      BG1   BG2           BG3
Mode 0        Text/3D  Text  Text          Text
Mode 1        Text/3D  Text  Text          Rotation
Mode 2        Text/3D  Text  Rotation      Rotation
Mode 3        Text/3D  Text  Text          Extended
Mode 4        Text/3D  Text  Rotation      Extended
Mode 5        Text/3D  Text  Extended      Extended
Mode 6        3D       -     Large Bitmap  -
Frame Buffer  Direct VRAM display as a bitmap

Sub 2D Engine

Mode          BG0      BG1   BG2           BG3
Mode 0        Text     Text  Text          Text
Mode 1        Text     Text  Text          Rotation
Mode 2        Text     Text  Rotation      Rotation
Mode 3        Text     Text  Text          Extended
Mode 4        Text     Text  Rotation      Extended
Mode 5        Text     Text  Extended      Extended

You will notice each engine has several modes of operation, with each mode having a different background layer configuration. The configurations we are interested in here are the layers marked “Extended”.

Extended rotation background layers can be configured as linear frame buffers, much like we have been using in the examples above.

When we talk tomorrow about tile based graphics we will cover in detail the capabilities of the “extended” backgrounds as well as the text and rotation backgrounds. For now we need only to understand a few things. First is how to put the display into a mode which supports an extended background, next is how to turn on the correct background, and finally we need to know how to initialize the background properly.

The first thing to remember when using the 2D engine is the lack of memory available. In fact, the DS has no memory assigned to its 2D units by default other than Sprite attributes and base palettes.

In order for us to do anything we must map video memory somewhere the 2D engine can find it. To do this we must know where the engine is going to expect memory to be and we need to know what video memory can be mapped to these regions. What follows is the layout of 2D graphics memory.

Image:nds_2D_memory.png

The areas we are concerned with now are the background graphics memories. To use background layers we must map video memory to at least one of these regions.

Let us start with a small example which paints the screen red and see how using a background layer differs from direct frame buffer access.

#include <nds.h>

//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
	int i;

	//set video mode to mode 5
	videoSetMode(MODE_5_2D);

	//map vram a to start of background graphics memory
	vramSetBankA(VRAM_A_MAIN_BG);

	//initialize the background
	bgInit(3, BgType_Bmp16, BgSize_B16_256x256, 0,0);

	//write the color red into the background
	//(RGB15 takes red, green, blue, each 0 to 31)
	for(i=0; i<256*256; i++){
		BG_GFX[i]=RGB15(31,0,0) | BIT(15);
	}

	while(1) {
		swiWaitForVBlank();
	}
	return 0;
}

To better understand the memory layout for 2d backgrounds, we must first look at the image below. It is logically divided into 32 blocks of memory for graphics (also 32 blocks for map data but that is a topic for tomorrow). We will revisit this organization of graphics memory tomorrow but for today it is enough to know that the background will pull data starting at one of these blocks and which block it pulls from is controlled via the background control register.

Image: nds_2D_background_memory.png

First, we set the video mode. If you look back at the video mode table, you will realize mode 5 allows for background layers 2 and 3 to be extended rotation backgrounds. We chose background 3 for this demo although background 2 would have worked just as well.

//set video mode to mode 5
videoSetMode(MODE_5_2D);  

Next we map vram bank A to main background memory. The vram table shows we could have mapped other vram banks to this region as well. Picking which vram bank to map takes a bit of planning, but for our simple demos it will not be too difficult.

vramSetBankA(VRAM_A_MAIN_BG);  

Next, we initialize the background. bgInit follows the prototype in nds/arm9/background.h: bgInit(int layer, BgType type, BgSize size, int mapBase, int tileBase). In this demo we want a bitmap mode, and since we have not really discussed palettes yet we are going to stick with 16 bit color. We choose a size of 256×256 to cover the entire screen. Because we passed a base of 0, the bitmap for the layer starts at BG_GFX, the start of main background memory; in this case, it is layer 3.

//initialize the background 
bgInit(3, BgType_Bmp16, BgSize_B16_256x256, 0,0);  

The final step is to paint the screen red. Take a look at the following lines and see if you can spot the difference from the straight framebuffer code.

	//paint the screen red
	for(i=0; i<256*256; i++){
		BG_GFX[i]=RGB15(31,0,0) | BIT(15);
	}

Hopefully you noticed the setting of bit 15 of the color value. Remembering back to a previous talk about color formats on the DS you might recall there is a 16-bit color mode with the normal 5 bits of red green and blue along with one bit for alpha. 16-bit bitmap backgrounds use this color format.

The one bit of alpha tells the DS to render that pixel. If you leave this bit clear the pixel will not be drawn and anything behind it will show through. This is a very useful feature when you have two layers of background and want part of the top layer to be transparent.
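To make the bit layout concrete, here is a minimal sketch of packing a color by hand. PackColor is a hypothetical helper, not a libnds function; it simply mirrors what RGB15(r,g,b) | BIT(15) does in the demo, with red in the low five bits.

```c
#include <stdint.h>

/* Pack 5-bit red, green and blue (0..31) into the DS 16-bit color layout:
   red in bits 0-4, green in bits 5-9, blue in bits 10-14, alpha in bit 15. */
uint16_t PackColor(unsigned r, unsigned g, unsigned b, int opaque)
{
	uint16_t color = (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));

	/* Without this bit a 16-bit bitmap background treats the pixel as transparent. */
	if (opaque)
		color |= (uint16_t)(1u << 15);

	return color;
}
```

Calling it with opaque set to 0 gives the same color value with bit 15 clear, which is what you would write for the see-through parts of a top layer.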

Next we will look at paletted bitmaps through the somewhat practical exercise of decoding graphics files.

Bitmap on the sub display

For completeness' sake, here is an example which puts both the main and sub engines in mode 5 and renders to bitmap backgrounds. Notice the alternative register access using the background struct instead of the bgInit function.

#include <nds.h>

//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
	int i;

	//point our video buffer to the start of bitmap background video
	u16* video_buffer_main = (u16*)BG_BMP_RAM(0);
	u16* video_buffer_sub =  (u16*)BG_BMP_RAM_SUB(0);

	//set video mode to mode 5 with background 3 enabled
	videoSetMode(MODE_5_2D | DISPLAY_BG3_ACTIVE);
	videoSetModeSub(MODE_5_2D | DISPLAY_BG3_ACTIVE);

	//map vram a to start of main background graphics memory
	vramSetBankA(VRAM_A_MAIN_BG_0x06000000);
	vramSetBankC(VRAM_C_SUB_BG_0x06200000);

	//initialize the background
	BACKGROUND.control[3] = BG_BMP16_256x256 | BG_BMP_BASE(0);

	BACKGROUND.bg3_rotation.hdy = 0;
	BACKGROUND.bg3_rotation.hdx = 1 << 8;
	BACKGROUND.bg3_rotation.vdx = 0;
	BACKGROUND.bg3_rotation.vdy = 1 << 8;

	//initialize the sub background
	BACKGROUND_SUB.control[3] = BG_BMP16_256x256 | BG_BMP_BASE(0);

	BACKGROUND_SUB.bg3_rotation.hdy = 0;
	BACKGROUND_SUB.bg3_rotation.hdx = 1 << 8;
	BACKGROUND_SUB.bg3_rotation.vdx = 0;
	BACKGROUND_SUB.bg3_rotation.vdy = 1 << 8;

	//paint the main screen red
	for(i = 0; i < 256 * 256; i++)
		video_buffer_main[i] = RGB15(31,0,0) | BIT(15);

	//paint the sub screen blue
	for(i = 0; i < 256 * 256; i++)
		video_buffer_sub[i] = RGB15(0,0,31) | BIT(15);

	while(1) {
		swiWaitForVBlank();
	}
	return 0;
}
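The 1 << 8 values in the listing are the number 1.0 written in 8.8 fixed point (8 integer bits, 8 fractional bits), which is the format the rotation registers use. The helpers below are hypothetical and only meant as a sketch of how such values are built.

```c
#include <stdint.h>

/* Convert a whole number to 8.8 fixed point: 1 becomes 1 << 8 (256). */
int16_t IntToFixed8(int value)
{
	return (int16_t)(value * 256);
}

/* Build an 8.8 value from a fraction, e.g. 1/2 becomes 128. */
int16_t FractionToFixed8(int numerator, int denominator)
{
	return (int16_t)((numerator * 256) / denominator);
}
```

Setting hdx and vdy to 1.0 and hdy and vdx to 0 gives the identity transform, so the bitmap shows up unscaled and unrotated.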

Background struct

libnds defines a struct for easy access of the background registers.

typedef struct {
	u16 x;
	u16 y;
} bg_scroll;

typedef struct {
	u16 hdx;
	u16 hdy;
	u16 vdx;
	u16 vdy;
	u32 dx;
	u32 dy;
} bg_transform;

typedef struct {
	u16 control[4];
	bg_scroll scroll[4];
	bg_transform bg2_rotation;
	bg_transform bg3_rotation;
} bg_attribute;

#define BACKGROUND           (*((bg_attribute *)0x04000008))
#define BACKGROUND_SUB       (*((bg_attribute *)0x04001008))

Working With Graphics Files

Being able to decode graphics files is a useful skill and although this is usually done on the PC we are going to do it directly on the DS just for kicks. There are a lot of different graphics files out there, and each file has advantages and disadvantages. For our purposes we need one that is easy to decode and is supported by many graphics applications. Some good choices would be: GIF, BMP and PCX.

I am going to tackle the BMP for this example. It is simple and supported by just about every graphics application on Earth.

To decode a file format you must first seek out its spec. A quick search on Google gives me the following information for bitmap files.

There is a short bitmap header followed by a variable length image header (this variable length header turns out to be 40 bytes in length almost always). Next comes the palette (if there is one) and finally the pixel data. Here is a bit more detail:

Bitmap File

Bitmap Header

Offset  Size in bytes  Description
0       2              The characters “BM”
2       4              Filesize in bytes
6       4              Reserved (usually set to 0)
10      4              Offset to data

Image Header

Offset  Size in bytes  Description
0       4              Size of image header (normally 40)
4       4              Width of the image
8       4              Height of the image
12      2              Number of planes (normally 1)
14      2              Color depth (bits per pixel)
16      24             The rest is not interesting and hardly ever used

Palette Data

The color palette stored as Blue, Green, Red bytes (with an extra byte of padding)

Graphics Data

The pixel data. This will either be indexes into the palette or raw Red, Green, Blue color data.

BMPs support a number of bitmap types, and although this example will assume 8 bit, 256 color bitmaps, it could very easily be extended for other color depths (an excellent exercise for the reader if you are of a mind for those sorts of endeavors).

For a first step let us write a short demo which looks at the header, checks if it is a bitmap file by reading the signature, and prints out the bit depth, height, and width.

Image:Nds day3 bmp show.png Bitmap Header Decode

We are going to use a trick called overlay to read the header. Instead of parsing each byte in and figuring it out we are going to define a structure which is the same size as the header. We can then pretend the start of the bitmap is the start of this structure and access all the attributes like normal structure members. Let’s start with the bitmap header struct.

typedef struct
{
	char signature[2];
	unsigned int fileSize;
	unsigned int reserved;
	unsigned int offset;
}__attribute__ ((packed)) BmpHeader;  

The idea is to look at the layout of the header and design a struct to match it. The first two characters are the signature so we add a character array of length 2 to line up with this. We do this for each element in the header.

Unfortunately for us, the C language leaves it up to the compiler how the members of a structure are laid out in memory, and a compiler will often insert empty padding bytes so that each member sits on its natural boundary (a 32-bit integer on a multiple of 4 bytes, for example). It does this because accessing aligned data is generally more efficient. In our header that would put two bytes of padding right after the two-character signature. We need the structures packed so we can lay them on top of our bitmap data with no padding throwing us off. To ensure this, we add the packed attribute to the structure definition. How you do this varies between compilers, but for gcc this works like a charm.
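If you want to see the padding for yourself, a small sketch like the following compares the same fields with and without the attribute. UnpackedHeader, PackedHeader and PaddingBytes are hypothetical names used only here, and the fixed-width types from stdint.h stand in for unsigned int.

```c
#include <stdint.h>

/* Same fields as the bitmap header, laid out by the compiler's default rules. */
typedef struct
{
	char signature[2];
	uint32_t fileSize;
	uint32_t reserved;
	uint32_t offset;
} UnpackedHeader;

/* Same fields with gcc's packed attribute: no padding anywhere. */
typedef struct
{
	char signature[2];
	uint32_t fileSize;
	uint32_t reserved;
	uint32_t offset;
} __attribute__ ((packed)) PackedHeader;

/* Number of padding bytes the compiler added to the unpacked version. */
unsigned PaddingBytes(void)
{
	return (unsigned)(sizeof(UnpackedHeader) - sizeof(PackedHeader));
}
```

The packed version is 14 bytes, matching the file layout. The unpacked size depends on the compiler and target, but on the usual 32-bit setups you would expect it to come out larger.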

Next we need a struct to hold the image header which we will just assume is 40 bytes, even though it could technically be variable.

typedef struct
{
	unsigned int headerSize;
	unsigned int width;
	unsigned int height;
	unsigned short planeCount;
	unsigned short bitDepth;
	unsigned int compression;
	unsigned int compressedImageSize;
	unsigned int horizontalResolution;
	unsigned int verticalResolution;
	unsigned int numColors;
	unsigned int importantColors;
	
}__attribute__ ((packed)) BmpImageInfo;  

Notice again the use of the packed attribute.

Finally we define a structure for a single palette entry and a structure to hold the entire bitmap file: both headers, the palette, and the image data.

typedef struct
{
	unsigned char blue;
	unsigned char green;
	unsigned char red;
	unsigned char reserved;
}__attribute__ ((packed)) Rgb;

typedef struct
{
	BmpHeader header;
	BmpImageInfo info;
	Rgb colors[256];
	unsigned short image[1];
}__attribute__ ((packed)) BmpFile;

A bitmap file just consists of the two headers back to back followed by palette data and finally the image. This is where forgetting the packed attribute would byte you in the ass as gcc would stick in a few bytes of padding in-between the structs and things would not align (go ahead…try it).

The palette colors are stored as blue, green, red bytes followed by one empty byte. Since we are only going to decode 256 color bitmaps we are going to hardcode this into our bitmap structure. If you want to extend this you will need to do a small amount of parsing and first read in the bits per pixel and the number of colors before you do anything with the palette or image data.

The final entry in the bitmap file might seem a bit odd as it is an array of length one. Since we don’t know in advance how big the image array needs to be, we give it a length of one. Because it is an overlay, we can just keep reading as far as we like.

Finally the code to decode the bitmap header:

#include <nds.h>
#include <stdio.h>

//header generated by the makefile for the file in the data folder
#include "beerguy_bin.h"

//(the BmpHeader, BmpImageInfo, Rgb and BmpFile structs from above go here)

//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------

	BmpFile* bmp = (BmpFile*)beerguy_bin;

	consoleDemoInit();

	printf("%c%c\n", bmp->header.signature[0], bmp->header.signature[1]);
	printf("bit depth: %i\n", bmp->info.bitDepth);
	printf("width:     %i\n", bmp->info.width);
	printf("height:    %i\n", bmp->info.height);

	return 0;
}

This demo (if it were complete) would overlay our bitmap structure onto a bitmap file and print out the signature, bit depth and dimensions of the file. Hopefully, after all that discussion, there should only be one question remaining: How did I get a bitmap file into my DS application?
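Before trusting a header it is worth checking it. The following is a minimal sketch of such a check, not part of the demo: BmpHeaderSketch, BmpInfoSketch and BmpLooksUsable are hypothetical names, the info struct is trimmed to the fields we test, and the headers are copied out with memcpy instead of overlaid (a swapped-in technique that sidesteps alignment questions). It assumes a little-endian machine, which the DS is.

```c
#include <stdint.h>
#include <string.h>

typedef struct
{
	char signature[2];
	uint32_t fileSize;
	uint32_t reserved;
	uint32_t offset;
} __attribute__ ((packed)) BmpHeaderSketch;

typedef struct
{
	uint32_t headerSize;
	uint32_t width;
	uint32_t height;
	uint16_t planeCount;
	uint16_t bitDepth;
} __attribute__ ((packed)) BmpInfoSketch;

/* Returns 1 if data looks like an 8-bit bitmap with a 40-byte image header. */
int BmpLooksUsable(const uint8_t *data, uint32_t size)
{
	BmpHeaderSketch header;
	BmpInfoSketch info;

	if (size < sizeof(header) + sizeof(info))
		return 0;

	/* Copy instead of casting so alignment never matters. */
	memcpy(&header, data, sizeof(header));
	memcpy(&info, data + sizeof(header), sizeof(info));

	if (header.signature[0] != 'B' || header.signature[1] != 'M')
		return 0;

	return info.headerSize == 40 && info.bitDepth == 8;
}
```

A decoder that only handles 256 color images could call this first and bail out early on anything else.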

It turns out there are a lot of ways to get data into your application. You can use a file system and read it in from your compact flash or SD card. You can read it in from a wifi source like the internet or a file share. You can run it through a converter, output the data as a big C array, and compile it in, or you can use object copy to create an object file you can link in.

The simplest way (thanks to wintermute's wonderful makefile) is the object copy method. To include data in our project we just create a folder called “data” in the project folder (right next to source and include) and drop in a file. If we add a “.bin” to the end of the file name the makefile will pick it up, run it through object copy, and create a header file with some ease of use variables declared.

For instance in the above demo I dropped in a bmp file called beerguy.bmp into the data folder. I then renamed it to beerguy.bin and typed make. A header file was created called beerguy_bin.h which contained the following:

extern const u8 beerguy_bin_end[];
extern const u8 beerguy_bin[];
extern const u32 beerguy_bin_size;  

To access the data I just include the header file. I recommend you use this method as it works on any media and is simple. If you need file access because the 4MB limit is too restrictive, then libfat, which ships with devkitPro, is a great option.

I think we are finally ready to display this bitmap. All we have to do is copy the palette to the background's palette memory and copy the image data to the background's bitmap memory. Each of these steps has a small quirk.

The palette entries from the bitmap file are in 8 bits per color and we need 5. To fix this we need to chop off the lower 3 bits of each element. If you remember our conversation on bit operations this is a simple matter of shifting the number to the right by 3. Copying in the palette is done as follows:

//copy the palette	
for(i = 0; i < 256; i++)
{
Background_palette[i] = RGB15(bmp->colors[i].red >> 3, bmp->colors[i].green >> 3, bmp->colors[i].blue >> 3);
}  
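The same conversion, pulled out into a function you can try on any machine. PaletteEntryTo15 is a hypothetical helper, and it assumes the RGB15 layout with red in the low five bits.

```c
#include <stdint.h>

/* Convert one 8-bit-per-channel palette entry to a 15-bit DS color by
   dropping the low 3 bits of each channel (red ends up in the low bits). */
uint16_t PaletteEntryTo15(uint8_t red, uint8_t green, uint8_t blue)
{
	return (uint16_t)((red >> 3) | ((green >> 3) << 5) | ((blue >> 3) << 10));
}
```

Note that any channel value below 8 becomes 0 after the shift, so very dark colors collapse to black.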

The quirk for the image data is that the image is (for some strange reason) stored upside down. We have to flip the image by reading the bottom of the bitmap into the top of the video buffer.

//copy the image
for(iy = 0; iy < bmp->info.height; iy++)
{
	for(ix = 0; ix < bmp->info.width / 2; ix++)
		video_buffer[iy * 128 + ix] = bmp->image[(bmp->info.height - 1 - iy) * ((bmp->info.width + 3) & ~3) / 2 + ix];
}  

We do this by starting at line height minus 1 and subtracting the y index. Because video memory only accepts 16 or 32 bit writes we copy two bytes at a time (this is why you see the width divided by 2 in several locations and why video memory and bitmap image memory are declared as pointers to short instead of char).
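Here is the flip on its own as a sketch that works on plain memory. BmpStride and FlipRows are hypothetical names; the stride calculation is the same (width + 3) & ~3 rounding used above, since each stored row is padded to a multiple of 4 bytes.

```c
#include <stdint.h>

/* Bytes per stored row: 8-bit pixels padded up to a multiple of 4. */
uint32_t BmpStride(uint32_t width)
{
	return (width + 3u) & ~3u;
}

/* Copy a bottom-up 8-bit image into a top-down buffer of width * height bytes.
   Byte copies are fine in main memory; DS video memory would need 16-bit writes. */
void FlipRows(uint8_t *dest, const uint8_t *src, uint32_t width, uint32_t height)
{
	uint32_t stride = BmpStride(width);
	uint32_t ix, iy;

	for (iy = 0; iy < height; iy++)
	{
		/* Row iy on screen is row (height - 1 - iy) in the file. */
		const uint8_t *row = src + (height - 1 - iy) * stride;

		for (ix = 0; ix < width; ix++)
			dest[iy * width + ix] = row[ix];
	}
}
```

For a 3 pixel wide image each stored row is 4 bytes long, so the padding byte at the end of every row is skipped on the way out.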

Demo Source

Image:Nds day3 bmp decode.png Displaying the Bitmap

The Double Buffer

Double buffering is a common technique used to address a common problem with raster graphics. Usually you do not want the user to see what you are rendering until you are done rendering it. To ensure this happens you have one of two options.

First, you can restrict your rendering to the time the display is not being drawn: vblank and hblank. This method works well and is in fact often used on the DS, as the hardware does much of the rendering work for us. Often, though, you just need more time to get your scene in order.

This brings us to method number two: The double buffer. For this we simply render to an off screen buffer. When we are done we make the off screen buffer visible and whatever was visible before becomes our new off screen buffer. This gives us as much time as we need to compose the scene with the slight drawback of needing a copy of visible video memory. Fortunately the DS is more than roomy enough to accommodate.

There are actually several ways to go about implementing a double buffer but the easiest is to use a single background’s video memory and allocate one half to the visible buffer and one half to the non visible buffer (the “back buffer”).

To demonstrate the use of double buffers let us extend our bitmap demo and animate a series of 10 frames. A friend was kind enough to render me up some full screen graphics and for this first go we will do it without a double buffer.

#include <nds.h>

#include "beerguy0010_bmp.h"
#include "beerguy0020_bmp.h"
#include "beerguy0030_bmp.h"
#include "beerguy0040_bmp.h"
#include "beerguy0050_bmp.h"
#include "beerguy0060_bmp.h"
#include "beerguy0070_bmp.h"
#include "beerguy0080_bmp.h"
#include "beerguy0090_bmp.h"
#include "beerguy0001_bmp.h"

extern void DisplayBmp(unsigned short* video_buffer, unsigned short* pal, unsigned char* bmp_data);

u8* bmps[10] = 
{
	(u8*)beerguy0010_bmp,
	(u8*)beerguy0020_bmp,
	(u8*)beerguy0030_bmp,
	(u8*)beerguy0040_bmp,
	(u8*)beerguy0050_bmp,
	(u8*)beerguy0060_bmp,
	(u8*)beerguy0070_bmp,
	(u8*)beerguy0080_bmp,
	(u8*)beerguy0090_bmp,
	(u8*)beerguy0001_bmp
};

//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
	int frame = 0;
	
	BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(0);
	u16* video_buffer = (u16*)BG_BMP_RAM(0);

	irqInit();
	irqSet(IRQ_VBLANK, 0);
	
	videoSetMode(MODE_5_2D | DISPLAY_BG3_ACTIVE);	
	//map vram a to start of background graphics memory
	vramSetBankA(VRAM_A_MAIN_BG_0x06000000);
	
	//initialize the background
	
	BG3_XDY = 0;
	BG3_XDX = 1 << 8;
	BG3_YDX = 0;
	BG3_YDY = 1 << 8;

	while(1)
	{
		DisplayBmp(video_buffer, BG_PALETTE, bmps[frame]);
		
		if(++frame > 9)
			frame = 0;
		
		swiWaitForVBlank();
	}
	
	return 0;
}  

Compile and run this demo and you will see a bit of tearing and a lot of ugliness. The reason for this is my bitmap decode function is very slow and cannot complete in the vblank time frame.

The only thing different between this demo and the one before is the inclusion of 10 bitmap files which we cycle through once per vblank.

Let us use a double buffer and fix the problem.

#include <nds.h>

#include "beerguy0010_bmp.h"
#include "beerguy0020_bmp.h"
#include "beerguy0030_bmp.h"
#include "beerguy0040_bmp.h"
#include "beerguy0050_bmp.h"
#include "beerguy0060_bmp.h"
#include "beerguy0070_bmp.h"
#include "beerguy0080_bmp.h"
#include "beerguy0090_bmp.h"
#include "beerguy0001_bmp.h"

extern void DecodeBmp(unsigned short* video_buffer, unsigned short* pal, unsigned char* bmp_data);

u8* bmps[10] = 
{
	(u8*)beerguy0010_bmp,
	(u8*)beerguy0020_bmp,
	(u8*)beerguy0030_bmp,
	(u8*)beerguy0040_bmp,
	(u8*)beerguy0050_bmp,
	(u8*)beerguy0060_bmp,
	(u8*)beerguy0070_bmp,
	(u8*)beerguy0080_bmp,
	(u8*)beerguy0090_bmp,
	(u8*)beerguy0001_bmp
};

//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
	int frame = 0;
	int swaped = 1;
	int i = 0;
	
	unsigned short old_palette[256];
	
	BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(3); //<--changed to (3)
	u16* back_buffer = (u16*)BG_BMP_RAM(0);

	irqInit();
	irqSet(IRQ_VBLANK, 0);
	
	videoSetMode(MODE_5_2D | DISPLAY_BG3_ACTIVE);	
	
	vramSetBankA(VRAM_A_MAIN_BG_0x06000000);
	
	BG3_XDY = 0;
	BG3_XDX = 1 << 8;
	BG3_YDX = 0;
	BG3_YDY = 1 << 8;
		
	
		
	while(1)
	{
		
		swiWaitForVBlank();
			
		if(swaped)
		{
			swaped = 0;
			BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(0);	
			back_buffer = (u16*)BG_BMP_RAM(3);
		}
		else
		{
			swaped = 1;
			BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(3);	
			back_buffer = (u16*)BG_BMP_RAM(0);
		}
		
		for(i = 0; i < 256; i++)
			BG_PALETTE[i] = old_palette[i];
			
		//decodes a bmp to the backbuffer
		DecodeBmp(back_buffer, old_palette, bmps[frame]);
		
		if(++frame > 9)
			frame = 0;
		
	}
	
	return 0;
}  

In this demo we set up much as before, only this time we set the starting point of background graphics to base 3. If you recall, each base represents 16 KB of memory, and if you are quick with math you will notice that an 8-bit 256×192 image (49,152 bytes) takes up exactly 3 blocks. This means blocks 0 to 2 hold one buffer and blocks 3 to 5 hold the other; we start out displaying the second and rendering into the first.

BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(3);
u16* back_buffer = (u16*)BG_BMP_RAM(0);  
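The block arithmetic can be written down as a tiny sketch. BMP_BLOCK_SIZE and BlocksNeeded are hypothetical names; the 16 KB figure is the base size quoted above.

```c
#include <stdint.h>

#define BMP_BLOCK_SIZE (16u * 1024u)  /* one bitmap base step, per the text */

/* Number of 16 KB blocks needed for a buffer of the given size in bytes. */
uint32_t BlocksNeeded(uint32_t bytes)
{
	return (bytes + BMP_BLOCK_SIZE - 1u) / BMP_BLOCK_SIZE;
}
```

One visible 8-bit screen is 256 * 192 bytes, which gives 3 blocks, so two of them fit in 6 blocks with room to spare in a 128 KB bank.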

We create a Boolean value which tracks which buffer is visible and alternate each frame. To swap the buffers we simply tell background 3 to render from one base and set the back buffer to the other base. This ensures we are always rendering to the off screen buffer and displaying the on screen buffer.

		if(swaped)
		{
			swaped = 0;
			BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(0);	
			back_buffer = (u16*)BG_BMP_RAM(3);
		}
		else
		{
			swaped = 1;
			BG3_CR = BG_BMP8_256x256 | BG_BMP_BASE(3);	
			back_buffer = (u16*)BG_BMP_RAM(0);
		}  
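If the flag and the two hard-coded branches feel clumsy, the same bookkeeping can be kept in a small struct. This is only a sketch of an alternative; BufferPair and SwapBuffers are hypothetical names and nothing here touches the hardware.

```c
/* Which 16 KB base is on screen and which one we draw into. */
typedef struct
{
	int front_base;
	int back_base;
} BufferPair;

/* Exchange the roles of the two buffers. */
void SwapBuffers(BufferPair *pair)
{
	int old_front = pair->front_base;

	pair->front_base = pair->back_base;
	pair->back_base = old_front;
}
```

After each swap you would write BG_BMP_BASE(pair.front_base) into the control register and point back_buffer at BG_BMP_RAM(pair.back_base), just as the two branches in the listing do.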

The only other difference to note might seem a bit odd. It turns out that all my bitmaps use a separate palette so I must double buffer the palette as well! For this I use an in memory buffer.

DecodeBmp copies the palette into my local palette array (old_palette), and when I swap the visible buffer I also swap in this palette so the right palette is loaded with the right buffer.

	for(i = 0; i < 256; i++)
			BG_PALETTE[i] = old_palette[i];
			
	//decodes a bmp to the backbuffer
	DecodeBmp(back_buffer, old_palette, bmps[frame]);  

That is about all I am going to say on double buffers. They are useful and fairly simple to implement and can greatly enhance the look and feel of your program.

Raster 101

Raster graphics are the means by which most early games were rendered. It simply means to draw to a display on a per-pixel basis. We are going to let the DS hardware do most of our rendering for us, but there is something to be said for doing things the hard way every once in a while. We will cover line, circle, and polygon raster graphics and throw in a bit of optimization discussion along the way.

Bresenham Lines

We will begin our raster discussion with line drawing. The deceptively simple task of connecting two points on a 2D display by a series of pixels has been the subject of much research and countless papers. Perhaps we should start by defining a line.

In the majority of mathematical realms a line is an infinite, straight, one dimensional projection through space. To define a line all you need is the location of one point on that line and some indication of its direction.

Because one dimensionality is tough to achieve on a computer display and an infinite line might take too long to render we will have to restrict this definition a bit. For us all lines will lie on the plane of the screen, the length will definitely be finite, and that one dimensional thing will be very loosely applied. Let us take a close look at what a line on a computer display looks like to get a feel for what we need to accomplish.

<image of line here>

As you can see the line can only move in discrete steps of pixels. To render a line we just iterate through one dimension and plug the other into the equation for a line. Let us remember way back to geometry class and recall that a line can be defined as follows:

Y = mX + b;

Where Y is the axis we are trying to calculate, X is the axis we are iterating through, b is the value of Y where the line crosses the Y axis (that is, where X is 0) and m is the slope of the line defined as change in Y divided by change in X (rise over run). I don’t know about you but when I want to render a line I plan on just picking the two end points and the color and having the algorithm do the rest. We can certainly calculate these values from two points but there is a slightly less common equation for a line that will be easier to work with:

Y – y = m (X – x)

In this case X Y and m are as before but x and y are the coordinates of any point on the line. As you might notice we don’t have to worry about the intercept (previously ‘b’) in this form which simplifies things a bit.

So let us see if we can translate this equation into a line on the screen. First we will need to calculate the slope. Let us define our function as DrawLine(x1, y1, x2, y2, color) where the x’s and y’s are coordinates of the two points we wish to connect.

float m = (y1 - y2) / (x1 - x2);

One thing you should note right away is that slope will most likely be fractional in nature requiring us to use floating point math. You may also remember the DS has no floating point hardware and if you are particularly astute you will very shortly realize all this discussion is going to lead to some more interesting way of drawing a line.

Some pseudo code for drawing the line using this point slope equation would be as follows:

void DrawLine(float x1, float y1, float x2, float y2, short int color)
{
	float m = (y1 - y2) / (x1 - x2);
	float x;

	for(x = x1; x < x2; x++)
	{
		float y = m * (x - x2) + y2;

		Buffer[(int)x + (int)y * BUFFER_WIDTH] = color;
	}
}

I don’t recommend trying to compile this as there are several flaws in the algorithm. First we are kind of assuming the line changes more in x than it does in y. If that is not the case we are going to be jumping more than 1 pixel in y each time we iterate through the loop and leave large gaps in the line. Second we are assuming that x1 is smaller than x2 when really we could have specified any two points for the function. You could of course modify the above and use this method to draw lines on the DS but it is certainly not the best way to go about it.

This brings us to a more realistic way of rendering lines. It involves only integer math, no division, and no multiplication.

Bresenham lines:

Let us again look at how a line ends up looking on screen and see if we can't find a way to iterate through X and change Y accordingly without calculating the slope. First look at a line that changes more in the X direction than it does in the Y:

<image of a small slope closup line>

From inspection you can probably note the slope on this line is 1 / 3. If we zoom in a bit we see this 1 / 3 slope holds as the line drops down 1 pixel in the Y direction every 3 pixels in the X.

To calculate the slope we would normally do something like this:

m = (y1 - y2) / (x1 - x2)

But instead let us treat the difference in y and the difference in x separately and call them ydiff and xdiff respectively.

For this case:

xdiff = x1 - x2;
ydiff = y1 - y2;

It might be helpful at this point to consider how we would draw the above line if we had infinite resolution.

We would first plot a pixel at x1, y1 then move one X to the right. If this were an infinite display we would also move 1/3 of a pixel down and plot the pixel. Because our real display is finite the best I can do is move one pixel in X and 0 in the Y. If I were to continue I would move another pixel in X and another 1/3 of the way down in Y. Again, because we have a finite display and 2/3 of a pixel is pretty meaningless we settle for no change in Y. Finally, as I move for the third time in X I reach a point where I should be a full pixel in Y further down and I can render at the correct location.

This process is the basis of Bresenham’s algorithm.

We keep track of this difference between the line we SHOULD draw and the line we CAN draw in an error term; when that error reaches a threshold we know we can correct by adjusting our Y value by one pixel up or down (down in this case). But what is that threshold and how much do we adjust the error term each time? Well, the easy answer would be to use 1.0 as the threshold and add the slope to the error each time. If we did this things would move along smoothly…unfortunately calculating slope requires a floating point division and tracking the error term would require even more floating point operations. Fortunately there is a simpler way.

We store the difference between x1 and x2 (xdiff) as well as the difference between y1 and y2 (ydiff). These two values will be proportional to the numerator and denominator of the slope. For instance, in the line depicted above the coordinates are (0,10) and (30, 0). This amounts to an xdiff of 30 and a ydiff of 10 (and a slope of 1/3 in case you have forgotten). To render the line we iterate in the X direction (because it changes more than the Y direction) and keep track of how far the line we are drawing is from the line we would like to draw.

We can use a threshold of xdiff and increment the error term by the ydiff (or the other way around for a line that changes more in Y than it does in X). In this case we add 10 to our error term each time we move in X and when it passes 30 we move one step down in Y. We then reset the error term by subtracting 30 and continue on (notice we don’t set it to zero as in most cases the ydiff and xdiff will not align so well). If we continue we will draw a line at the correct slope between the two points.

threshold = xdiff;
error = 0;
y = y1;

for(x = x1; x < x2; x++)
{
	//increment the error term
	error += ydiff;

	//if the error gets big enough we correct by moving down one
	//y increment
	if(error > threshold)
	{
		y++;
		error = error - threshold;
	}

	buffer[x + y * BUFFER_WIDTH] = color;
}
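To watch the error term at work, here is a sketch of the same loop that records the Y step reached at every X instead of drawing. TraceErrorTerm is a hypothetical helper and the recording array is my addition.

```c
/* Walk an X-major line and record the Y step reached at each X.
   out_y must have room for xdiff entries. Returns the final Y step. */
int TraceErrorTerm(int xdiff, int ydiff, int *out_y)
{
	int error = 0;
	int y = 0;
	int x;

	for (x = 0; x < xdiff; x++)
	{
		/* build up the error by the change in Y */
		error += ydiff;

		/* once it passes the threshold, step in Y and keep the remainder */
		if (error > xdiff)
		{
			y++;
			error -= xdiff;
		}

		out_y[x] = y;
	}

	return y;
}
```

Run with an xdiff of 30 and a ydiff of 10, the first Y step happens at the fourth pixel and then every third pixel after that. The walk finishes on step 9 rather than 10 because the error starts at zero and the comparison is strictly greater than; a common refinement is to start the error at half the threshold so the rounding is centered.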

That is the basis of the Bresenham algorithm. We just keep going one direction building up an error term until the error term reaches a certain point, then we correct by incrementing the other direction and reset the error term by the threshold.

Unfortunately there is a bit more to it. As written, the loop only handles the case where the second point is below and to the right of the first point and the line changes more in X than it does in Y. Fortunately handling the other cases turns out to be pretty easy.

Here is the final line algorithm. Following it is a short explanation of the changes needed to make the code handle the other cases.

void DrawLine(int x1, int y1, int x2, int y2, unsigned short color)
{
    int yStep = SCREEN_WIDTH;
    int xStep = 1;
    int xDiff = x2 - x1;
    int yDiff = y2 - y1;
    int errorTerm = 0;
    int offset = y1 * SCREEN_WIDTH + x1;
    int i;

    //need to adjust if y1 > y2
    if (yDiff < 0)
    {
       yDiff = -yDiff;   //absolute value
       yStep = -yStep;   //step up instead of down
    }

    //same for x
    if (xDiff < 0)
    {
       xDiff = -xDiff;
       xStep = -xStep;
    }

    //case for changes more in X than in Y
    if (xDiff > yDiff)
    {
       for (i = 0; i < xDiff + 1; i++)
       {
          VRAM_A[offset] = color;
          offset += xStep;
          errorTerm += yDiff;
          if (errorTerm > xDiff)
          {
             errorTerm -= xDiff;
             offset += yStep;
          }
       }
    }//end if xdiff > ydiff
    //case for changes more in Y than in X
    else
    {
       for (i = 0; i < yDiff + 1; i++)
       {
          VRAM_A[offset] = color;
          offset += yStep;
          errorTerm += xDiff;
          if (errorTerm > yDiff)
          {
             errorTerm -= yDiff;
             offset += xStep;
          }
       }
    }
}

Notice we split the cases where the change in Y is greater and where the change in X is greater so we could loop through either X or Y accordingly. Also, if the x and y values for the points were opposite from expected we just take the absolute value of the difference and flip the sign of the x or y step so we step the other way through the line.

Now that we have an okay understanding of line drawing let us modify our drawing demo from before to connect the pixels we were drawing with lines. This should fill the gaps and make a nice smooth line as we trace around.

int main(void)
{
	touchPosition touch;
	
	int oldX = 0;
	int oldY = 0;
	
	videoSetMode(MODE_FB0);
	vramSetBankA(VRAM_A_LCD);
	
	lcdMainOnBottom();
	
	while(1)
	{
		scanKeys();
		
		touchRead(&touch);
		
		if(!(keysDown() & KEY_TOUCH) && (keysHeld() & KEY_TOUCH))
		{
			DrawLine(oldX, oldY, touch.px, touch.py, rand());				
		}
		
		oldX = touch.px;
		oldY = touch.py;	
		
		swiWaitForVBlank();
	}
	
	return 0;
}  

We make an old X and Y value to hold the previous position, grab the new position, and draw a line between them. Finally we update the old x and y and repeat. This code only draws a line once the pen has been down for at least two frames because we need two points to draw a line. This is done by not drawing the line on the frame where the keysDown() touch bit is set.
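The two-frame rule is easier to see with the hardware taken out of the picture. The sketch below is only a model of the test in the main loop: it assumes keysHeld() reports the current key state and keysDown() reports keys that are pressed now but were not pressed on the previous frame. TOUCH_BIT and should_draw are names invented for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the libnds KEY_TOUCH bit. */
#define TOUCH_BIT 0x1000u

/* Model of the test in the main loop: given the raw key state of the
   previous and current frame, decide whether a line should be drawn. */
int should_draw(uint32_t previous, uint32_t current)
{
    uint32_t held = current;               /* assumed behaviour of keysHeld() */
    uint32_t down = current & ~previous;   /* assumed behaviour of keysDown() */

    return !(down & TOUCH_BIT) && (held & TOUCH_BIT);
}
```

On the first frame of a touch the bit is both held and newly down, so nothing is drawn and only the old position is recorded. From the second frame on the bit is held but no longer new, so the line gets drawn.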

When putting the code together, do not forget to #include <nds.h> (and <stdlib.h> for rand()) and either copy and paste the DrawLine function above the main function or #include a header file which declares it.

Blitting Things
(todo: add a section on software blitting...not that important all things considered)

Drawing Pictures

Polygons
(todo: some software 3D)

Circles

This code is in Pascal, but with a few changes it can easily be turned into C++:

procedure DrawPixel(X,Y:Integer;Color:Integer);
var
i : Integer;
begin
  i:=X + Y * SCREEN_WIDTH;
 if Color=0 then
     VRAM_A[i] := RGB15(0,0,0)
 else if Color=1 then
      	 VRAM_A[i] := RGB15(31,31,31);
end;  

To draw a Circle

procedure DrawCircle (Rayon,X_Centre,Y_Centre : Integer; Color : Integer);
        var
        x, y, m : Integer;
        begin
	x :=0 ;
        y := rayon ;        // Place on the top of the circle 
        m := 5-4*rayon ;             // initialisation
        while x <= y  do begin    // while we are in the second half
                DrawPixel( x+x_centre, y+y_centre, Color ) ;
                DrawPixel( y+x_centre, x+y_centre, Color ) ;
                DrawPixel( -x+x_centre, y+y_centre, Color ) ;
                DrawPixel( -y+x_centre, x+y_centre, Color ) ;
                DrawPixel( x+x_centre, -y+y_centre, Color ) ;
                DrawPixel( y+x_centre, -x+y_centre, Color ) ;
                DrawPixel( -x+x_centre, -y+y_centre, Color ) ;
                DrawPixel( -y+x_centre, -x+y_centre, Color ) ;  

To fill the circle, include the following four lines; leave them out if you want an empty circle.

LineBresenham( x+x_centre, y+y_centre, -x+x_centre, -y+y_centre, Color ) ;
LineBresenham( y+x_centre, x+y_centre, -y+x_centre, -x+y_centre, Color ) ;
LineBresenham( -x+x_centre, y+y_centre, x+x_centre, -y+y_centre, Color ) ;
LineBresenham( -y+x_centre, x+y_centre, y+x_centre, -x+y_centre, Color ) ;  

Don’t miss the end of the code

                if m > 0 then begin      // choose point F
                        y := y - 1 ;
                        m := m-8*y ;
                end ;
                x := x+1 ;
                m := m + 8*x+4 ;
        end ;
end ;  
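For those who would rather not translate the Pascal by hand, here is one possible C port of the outline-only version. It is a sketch, not the original author's code: it writes into whatever 16-bit buffer you pass in (on the DS that could be VRAM_A), the names plot, draw_circle, CIRCLE_WIDTH and CIRCLE_HEIGHT are made up, and the bounds check in plot is an addition the Pascal DrawPixel does not have.

```c
#include <stdint.h>

/* Hypothetical buffer size matching the DS screen (256x192). */
#define CIRCLE_WIDTH  256
#define CIRCLE_HEIGHT 192

/* Plot one pixel, skipping anything off screen (the bounds check is an
   addition; the Pascal DrawPixel does not have one). */
static void plot(uint16_t *buffer, int x, int y, uint16_t color)
{
    if (x >= 0 && x < CIRCLE_WIDTH && y >= 0 && y < CIRCLE_HEIGHT)
        buffer[x + y * CIRCLE_WIDTH] = color;
}

/* Outline-only port of the Pascal DrawCircle. */
void draw_circle(uint16_t *buffer, int radius, int cx, int cy, uint16_t color)
{
    int x = 0;
    int y = radius;           /* start at the top of the circle */
    int m = 5 - 4 * radius;   /* decision variable, same start value */

    while (x <= y)            /* walk one octant, mirror the other seven */
    {
        plot(buffer, cx + x, cy + y, color);
        plot(buffer, cx + y, cy + x, color);
        plot(buffer, cx - x, cy + y, color);
        plot(buffer, cx - y, cy + x, color);
        plot(buffer, cx + x, cy - y, color);
        plot(buffer, cx + y, cy - x, color);
        plot(buffer, cx - x, cy - y, color);
        plot(buffer, cx - y, cy - x, color);

        if (m > 0)            /* step y inward, as in the Pascal code */
        {
            y--;
            m -= 8 * y;
        }
        x++;
        m += 8 * x + 4;
    }
}
```

A call such as draw_circle(VRAM_A, 10, 50, 50, RGB15(31,31,31)) would be the DS-side equivalent of the Pascal call, assuming frame buffer mode is set up as in the earlier demos.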
Posted by dovoto

Introduction

This day’s content is meant to be read, or even skimmed, in a single sitting. You are not expected to understand or be able to use much of this information at this point; it serves only as an introduction to what is inside the DS.

Hardware Overview

The Nintendo DS is rich in features. It possesses one of the most advanced 2D rendering systems ever seen on a console system, abundant memory resources (many, many times that of the SNES), dual processors capable of outperforming the Nintendo 64 (floating point operations aside), integrated wireless networking, a modest 3D system with an easy to understand interface, a microphone, and a touch screen.

What follows is a brief description of these features and a foreshadowing of the things you might accomplish with the knowledge gained in this guide.

Memory Layout

The memory footprint of the Nintendo DS is one of its more intimidating features for newly introduced console programmers. Understanding where memory is and what its uses are is key to getting the most from your applications and in many cases it is key to doing anything at all. Often a picture can be helpful in understanding how memory is arranged.

(Unless otherwise stated the data width for each bus is 16 bits.)

Below is an excerpt from gbatek showing a bit more detail in the memory mapping.

NDS9 Memory Map
  00000000h  Instruction TCM (32KB) (not moveable) (mirror-able to 1000000h)
  0xxxx000h  Data TCM        (16KB) (moveable)
  02000000h  Main Memory     (4MB)
  03000000h  Shared WRAM     (0KB, 16KB, or 32KB can be allocated to ARM9)
  04000000h  ARM9-I/O Ports
  05000000h  Standard Palettes (2KB) (Engine A BG/OBJ, Engine B BG/OBJ)
  06000000h  VRAM - Engine A, BG VRAM  (max 512KB)
  06200000h  VRAM - Engine B, BG VRAM  (max 128KB)
  06400000h  VRAM - Engine A, OBJ VRAM (max 256KB)
  06600000h  VRAM - Engine B, OBJ VRAM (max 128KB)
  06800000h  VRAM - "LCDC"-allocated (max 656KB)
  07000000h  OAM (2KB) (Engine A, Engine B)
  08000000h  GBA Slot ROM (max. 32MB)
  0A000000h  GBA Slot RAM (max. 64KB)
  FFFF0000h  ARM9-BIOS (32KB) (only 3K used)
The ARM9 Exception Vectors are located at FFFF0000h. The IRQ handler redirects to [DTCM+3FFCh].

NDS7 Memory Map
  00000000h  ARM7-BIOS (16KB)
  02000000h  Main Memory (4MB)
  03000000h  Shared WRAM (0KB, 16KB, or 32KB can be allocated to ARM7)
  03800000h  ARM7-WRAM (64KB)
  04000000h  ARM7-I/O Ports
  04800000h  Wireless Communications Wait State 0 (8KB RAM at 4804000h)
  04808000h  Wireless Communications Wait State 1 (I/O Ports at 4808000h)
  06000000h  VRAM allocated as Work RAM to ARM7 (max. 256K)
  08000000h  GBA Slot ROM (max. 32MB)
  0A000000h  GBA Slot RAM (max. 64KB)
The ARM7 Exception Vectors are located at 00000000h. The IRQ handler redirects to [3FFFFFCh aka 380FFFCh].

Further Memory (not mapped to ARM9/ARM7 bus)
  3D Engine Polygon RAM (52KBx2)
  3D Engine Vertex RAM (72KBx2)
  Firmware (256KB) (built-in serial flash memory)
  GBA-BIOS (16KB) (not used in NDS mode)
  NDS Slot ROM (serial 8bit-bus, max. 4GB with default protocol)
  NDS Slot FLASH/EEPROM/FRAM (serial 1bit-bus)

Shared-RAM
Even though Shared WRAM begins at 3000000h, programs are commonly using mirrors at 37F8000h (both ARM9 and ARM7). At the ARM7-side, this allows to use 32K Shared WRAM and 64K ARM7-WRAM as a continous 96K RAM block.

Undefined I/O Ports
On the NDS (at the ARM9-side at least) undefined I/O ports are always zero.

Undefined Memory Regions
16MB blocks that do not contain any defined memory regions (or that contain only mapped TCM regions) are typically completely undefined.
16MB blocks that do contain valid memory regions are typically containing mirrors of that memory in the unused upper part of the 16MB area (only exceptions are TCM and BIOS which are not mirrored).

A few of these memory regions warrant further explanation.

Main Memory is the 4 megabyte block of RAM which generally holds your ARM9 executable as well as the vast majority of all game data.

Both the ARM7 and the ARM9 can access this memory at any time. Any bus conflict is resolved in favor of the processor which has priority (the ARM7 by default, but changeable via a control register), causing the other processor to wait until the first has finished its operation.

Although it is possible to execute both ARM7 and ARM9 code from main RAM at the same time, devkitPro defaults to placing the ARM7 code into the 64K of fast iwram for performance reasons. Official games generally place both ARM7 and ARM9 executables into Main Memory, after which the ARM7 copies the majority of its own code to iwram.

ARM 7 Fast Ram (IWRAM)
Start Address : 0x03800000
End Address :  0x0380FFFF  

The ARM7 has exclusive access to 64KB of fast 32 bit wide memory. It is this region that contains the ARM7 executable and data. When designing ARM7 code it will be in your interest to keep the binary small.

ARM 9 Caches

The ARM9 contains both a data cache and an instruction cache. Although the operation of these caches is a bit complex and really out of scope for this document a few things are worth noting.

Main memory is cacheable by default. This means all data and code being accessed from main memory will be stored temporarily in the cache. Because the DMA circuitry and the ARM7 do not have access to the cache, you will often get unexpected results if you attempt to DMA from main memory or share data between the ARM7 and ARM9 via main memory.

To help utilize the cache effectively, the mirror of main memory that begins at 0x02400000 is not cacheable. There are also several functions provided by the library which allow you to flush the data cache and ensure main memory is in sync.
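Taking the description above at face value, getting from a cached main memory address to its uncached twin is just a matter of setting one address bit. The sketch below shows the arithmetic only; uncached_mirror and the two constants are names invented here, and the real layout depends on how the memory protection unit has been configured.

```c
#include <stdint.h>

#define MAIN_RAM_BASE  0x02000000u   /* start of main memory */
#define MIRROR_BIT     0x00400000u   /* 4MB: distance to the mirror */

/* Hypothetical helper: maps a main-memory address to the same location
   in the mirror at 0x02400000, which the text describes as uncached. */
uint32_t uncached_mirror(uint32_t address)
{
    return address | MIRROR_BIT;
}
```

Reading a shared variable through the mirrored address is one way to avoid stale cache contents; the other is to flush the cache before handing data to the ARM7 or to DMA.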

Although the cache adds a certain level of complexity its boost to performance is well worth this small inconvenience.

Fast Shared Ram

There are two small 16KB banks of fast 32 bit ram that can be assigned to the ARM7 or ARM9. Access to either block by both CPUs at the same time is prohibited. Commonly, both banks will be mapped to the ARM7 as they form a continuous block with ARM7 IWRAM effectively giving the ARM7 96KB of ram.
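The "continuous block" claim is easy to check with a little address arithmetic, using the mirror address quoted in the gbatek excerpt. The constant and function names below exist only for this sketch.

```c
#include <stdint.h>

#define SHARED_WRAM_MIRROR 0x037F8000u        /* mirror quoted by gbatek */
#define SHARED_WRAM_SIZE   (32u * 1024u)      /* both 16KB banks */
#define ARM7_WRAM_BASE     0x03800000u        /* ARM7 IWRAM */
#define ARM7_WRAM_SIZE     (64u * 1024u)

/* First address past the end of the shared WRAM mirror. */
uint32_t shared_wram_end(void)
{
    return SHARED_WRAM_MIRROR + SHARED_WRAM_SIZE;
}

/* Size in bytes of the combined shared WRAM + IWRAM block. */
uint32_t arm7_block_size(void)
{
    return (ARM7_WRAM_BASE + ARM7_WRAM_SIZE) - SHARED_WRAM_MIRROR;
}
```

The mirror ends exactly where IWRAM begins (0x03800000), so the two regions sit back to back and add up to 96KB.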

Video Ram

The Nintendo DS has nine banks of video memory which may be put to a variety of uses. They can hold the graphics for your sprites, the textures for your 3D space ships, the tiles for your 2D platformer, or a direct map of pixels to render to the screen. Figuring out how to effectively utilize this flexible but limited amount of memory will be one of the most challenging endeavors you will face in your first few days of homebrew.

Below is a table of the banks along with a description of the uses to which they can be put. You should not worry about understanding this at the moment but it might be handy to bookmark or print out for later use.

View large intimidating table

Virtual Video Ram

In order for the 2D systems to function they need RAM. One of the major differences between the 2D graphics engine on the Game Boy Advance and those on the DS is that the DS has almost no memory dedicated to the 2D system. Instead of setting aside a given amount of video memory for the 2D system it allows you to map the video RAM banks into 2D engine memory space.

This might be a bit difficult to grasp at first. An example might be helpful.

Scenario: You want to render a tile based map to the screen using the main 2D graphics engine.

Because you are an uber Nintendo DS programmer you already know two things:

  1. Where the 2D graphics engine expects the map and tile data to be
  2. What video RAM banks can be mapped to this “virtual” 2D graphics memory to hold your tiles and map.

Solution: Tell the Nintendo DS to map a video RAM bank to the right place…in this case we might map video RAM bank A (VRAM_A) to 0x6000000 for use as 2D background memory but we could have chosen another bank (turns out almost all vram banks can be mapped to main background memory).

vramSetBankA(VRAM_A_MAIN_BG_0x6000000);

We will revisit this topic when we create our first few 2D demos.

This might seem intimidating and difficult at first, but it does offer you a fair bit of flexibility and power over where everything is.

Sound Hardware

What would a game be without its complement of blips, bleeps, and chip tunes?

Sound and music are an important piece of any good game and to ensure your next graphical adventure is accompanied by an equally astounding audio experience the DS comes equipped with some impressive hardware.

16 independent audio channels can pump digitized music in 8 bit, 16 bit, or ADPCM format. Each channel has its own frequency, volume, panning, and looping controls allowing for virtually CPU free MOD quality playback.

Wifi

What is there to say but that it supports communication with 802.11 standard access points. A full socket library has been implemented which allows porting of PC network code to the DS.

Input

User input is where the DS excels and is the basis for its much lauded inventive game play. 8 Buttons, 4 direction D-Pad, Touch screen, and microphone make for an interesting mix of possibilities.

Touch Screen

As I am sure you have already noticed the DS has a touch screen. It is very standard in operation and communicates to the DS via a serial interface to the ARM7. Using the default ARM7 binary from libnds causes the touch screen values to be read once per frame and reported to an area you can reach with the ARM9.

In the next chapter we will cover how to access the touch screen values in code.

Buttons

Button presses are detected by reading registers on the ARM7 and ARM9. Some buttons are only detectable by the ARM7: the door open-close, the X and Y buttons, and the pen down “button” are all detected on the ARM7 and recorded in shared Main Memory for access by the ARM9.

Microphone

Perhaps one of the most interesting additions to the Nintendo DS was the inclusion of a microphone. I have not played with it much to be honest but many interesting ideas come to mind and we will definitely do a demo or two which captures input from the microphone.

Real-Time Clock

Being able to know what time it is to the second is pretty handy. Your game can respond differently based on the time of day, you can tell how long it has been since the player last played the game, or it can be used as a simple in-game clock. And best of all, reading the date and time is a snap.
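As a taste of what "respond differently based on the time of day" might look like, here is a sketch written against the standard C time functions only. It assumes your toolchain routes time() to the real-time clock, which you should verify for your setup; mood_for_hour and current_mood are names invented for the example.

```c
#include <time.h>

/* Hypothetical helper: pick a game mood from the hour of the day (0-23). */
const char *mood_for_hour(int hour)
{
    if (hour >= 6 && hour < 18)
        return "day";
    return "night";
}

/* Reads the current time through the standard C library and maps it
   to a mood; falls back to hour 0 if the time cannot be converted. */
const char *current_mood(void)
{
    time_t now = time(NULL);
    struct tm *local = localtime(&now);

    return mood_for_hour(local ? local->tm_hour : 0);
}
```

Keeping the decision in a plain function of the hour makes it easy to test on a PC before it ever touches the clock hardware.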

Upgradeable Firmware

The firmware on the Nintendo DS is stored in flash memory. It can be overwritten with custom firmware. For more information on how to achieve this check here

Upgrading the firmware is useful to developers because it allows you to bypass the RSA check when downloading wifi demos. This means we can send our own .nds files to our DS via Wifi instead of just officially signed ones. Also, the hacked firmware will check the GBA slot prior to booting. If it finds an .nds file signature it will execute the code automatically, eliminating the need to use a passthrough based device each time you wish to run code from your GBA cart-based flash cart.

If you currently use a passthrough device to boot your .nds files from the GBA slot, upgrading the firmware is an easy and relatively safe process.

Graphics Overview

Believe it or not, the Nintendo DS is a very capable, very advanced graphics powerhouse. It has an interesting combination of 2D and 3D rendering hardware.

2D

The Super Nintendo is considered by many to be the best 2D console ever made (by many I really mean me…nobody else counts). SNES possessed a 16-bit 3.58MHz processor, 128KB of 8 bit ram and 64KB of video ram. By comparison the Nintendo DS has a 32-bit 66MHz processor, 4MB of main ram, and 656KB of video ram and that’s not counting all its little caches of fast 32 bit ram nor its second 33MHz processor. This is a very capable machine.

There are two separate graphics cores on the DS. They are referred to as Main and Sub graphics cores. Each core has similar features which vary depending on their mode of operation. The major differences between the cores are as follows:

  • The main core has two extra modes which are capable of rendering large bitmaps.
  • The main core can give up one of its background layers to the 3D engine.
  • The main core can bypass the 2D engine and render from memory to the screen directly in what is often referred to as frame buffer mode.

As alluded to above, each core operates in one of several modes. Below is a table of these modes.

Graphics Modes

Main 2D Engine

Insert video mode table and description here

Text backgrounds are general purpose tiled backgrounds; rotation backgrounds are also tiled and can be rotated and scaled. Extended rotation backgrounds support a larger set of tiles (at the expense of a larger map), support more palettes, and can operate in a bitmap mode as well as tiled modes.

These modes and background types will be explored as we go along.

3D

It may not possess the poly pushing, texture blending, hardware pixel shading capabilities of the current generation GPUs, but what the Nintendo DS lacks in performance and eye candy it makes up for in features.

Limited to 6144 vertexes per frame (about 2048 triangles or 1536 quads) the 3D system might seem a bit sparse. But given the small screen size a lot can be done with even this small number of available points.

Hardware fog, lighting, and transformation along with non blending texture mapping, toon-shading, and edge anti-aliasing make up a rather impressive set of features for an otherwise lackluster 3D machine.

The 3D core operates as a very OpenGL-like state machine, allowing much of its functionality to be wrapped in GL-compliant code. One major difference between OpenGL and the DS core is the absence of floating point number support. All operations on the DS are carried out in fixed point precision.
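If fixed point is new to you, the idea is simply to store a number multiplied by a power of two and do integer math on it. The sketch below assumes a 32-bit value with 12 fractional bits; that layout and every name in it (fx32, fx_from_int, fx_mul and so on) are choices made for this example rather than anything taken from the library.

```c
#include <stdint.h>

/* Assumed layout: signed 32-bit value with 12 fractional bits. */
#define FX_SHIFT 12
#define FX_ONE   (1 << FX_SHIFT)   /* the value 1.0 */

typedef int32_t fx32;

/* Hypothetical helpers, named for this sketch only. */
fx32 fx_from_int(int n)
{
    return (fx32)(n * FX_ONE);
}

int fx_to_int(fx32 v)
{
    return (int)(v / FX_ONE);       /* truncates toward zero */
}

fx32 fx_mul(fx32 a, fx32 b)
{
    /* widen first so the intermediate product cannot overflow */
    return (fx32)(((int64_t)a * b) / FX_ONE);
}

fx32 fx_div(fx32 a, fx32 b)
{
    /* pre-scale the numerator to keep the fractional bits */
    return (fx32)(((int64_t)a * FX_ONE) / b);
}
```

With this layout 1.5 is stored as 6144 and 0.25 as 1024. Addition and subtraction work on the raw integers unchanged; only multiplication and division need the extra scaling step.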

If you want to get a jump on 3D look at the 3D examples included with libnds and the NeHe tutorials the source code originated from.

Toolchain Explained

Understanding how your code goes from being a text file to being executed on the Nintendo DS will become very important as your projects progress in complexity. To aid in that understanding we are going to recreate Demo 1 from scratch and build it step by step from the command line. This section is not for the faint of heart and can safely be skipped until such time as you find yourself wondering just how your tools are building something which runs on the DS.

Before we begin there are a few terms you are likely familiar with but which I feel it necessary to go over anyway.

Compiler

The compiler is the first tool you pass your C source through. It is responsible for interpreting that code and translating it into machine based assembly language. From there the assembly language is further reduced into its binary machine code equivalent by another tool known as the assembler (which the compiler will call for you).

The output of the compiler is generally not executable but is instead in what is known as an object file format. Although the instructions have been translated to machine code binary, the decisions about where that code and associated data is to be physically placed in memory have been left undecided.

Linker

The tool used to combine the object files and determine physical addressing such that functions from a multitude of object files can operate in a coherent fashion is called the linker. By passing your object files to the linker you can produce an executable binary file.

Because the linker is responsible for determining where things should be placed physically within the NDS system it must be told a fair amount of information about the memory layout of the DS. This description is located in a linkscript file which describes both the memory layout of the DS and the way in which we want the different regions of our code to map to it. Here is a small piece of the devkitARM default linkscript for the arm9 (yes there is a separate one for the arm7 since it has a different memory layout).

OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_start)

MEMORY {
	rom	: ORIGIN = 0x08000000, LENGTH = 32M
	ewram	: ORIGIN = 0x02000000, LENGTH = 4M - 4k
	dtcm	: ORIGIN = 0x0b000000, LENGTH = 16K
	itcm	: ORIGIN = 0x01000000, LENGTH = 32K
}

__itcm_start	=	ORIGIN(itcm);
__ewram_end	=	ORIGIN(ewram) + LENGTH(ewram);
__eheap_end	=	ORIGIN(ewram) + LENGTH(ewram);
__dtcm_start	=	ORIGIN(dtcm);
__dtcm_top	=	ORIGIN(dtcm) + LENGTH(dtcm);
__irq_flags	=	__dtcm_top - 0x08;
__irq_vector	=	__dtcm_top - 0x04;

__sp_svc	=	__dtcm_top - 0x100;
__sp_irq	=	__sp_svc - 0x100;
__sp_usr	=	__sp_irq - 0x100;

There is much more to this script; most of which is utterly incomprehensible and any of which can have extremely difficult to understand consequences if meddled with. It is good to understand these scripts do exist and their general purpose, but actually editing them is well beyond the scope of this document.

The snippet above was chosen because it is somewhat comprehensible; it describes the 4 primary memory regions of the DS that will be used to contain code and data.

  • rom is the GBA cartridge space; it is 32MB in size and begins at an absolute address of 0x8000000
  • ewram is external working ram and is the slow 4MB of main memory for the DS.
  • dtcm stands for data tightly coupled memory and is a special area of memory on the ARM9 intended for use as fast data memory. The standard link script places the stack in this area. dtcm is a mere 16k so be careful with those local variables.
  • itcm stands for instruction tightly coupled memory and is another special area intended for use as fast instruction memory. This area is 32k and may be used for small functions which need to be fast. libnds uses this area for the interrupt dispatcher.
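The symbol definitions in the linkscript are nothing more than address arithmetic on the dtcm region, and it can help to work them out once. The sketch below redoes the sums in C using the ORIGIN and LENGTH values from the snippet; the function names mirror the linker symbols but are otherwise made up for this example.

```c
#include <stdint.h>

/* Values copied from the dtcm line of the MEMORY block. */
#define DTCM_ORIGIN 0x0b000000u
#define DTCM_LENGTH (16u * 1024u)

/* Each function repeats one assignment from the linkscript. */
uint32_t dtcm_top(void)   { return DTCM_ORIGIN + DTCM_LENGTH; }
uint32_t irq_flags(void)  { return dtcm_top() - 0x08u; }
uint32_t irq_vector(void) { return dtcm_top() - 0x04u; }
uint32_t sp_svc(void)     { return dtcm_top() - 0x100u; }
uint32_t sp_irq(void)     { return sp_svc() - 0x100u; }
uint32_t sp_usr(void)     { return sp_irq() - 0x100u; }
```

The interrupt vector lands at DTCM + 0x3FFC, which matches the gbatek note that the IRQ handler redirects to [DTCM+3FFCh]. The three stack pointers sit 0x100 bytes apart just below the top of dtcm, with the user stack lowest; assuming it grows downward from there, it gets whatever part of the 16k is left.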

The rest of the script file deals with mapping of your code to these regions (read only data, code, variables, stack, global data, constructors for C++ stuff, etc). It does this using an obscure and rather intimidating expression type language that I will not even pretend to understand completely. Fortunately a few people like Jason Wilkins, Jeff Frohwein, and most recently (and most successfully) Dave Murphy have done the grunt work; what was once something which caused countless "interesting" issues can now be relied upon confidently to just work.

Generally speaking there is no need to worry about the linkscript unless you have some pressing need to change where things go in memory. Everything except the stack goes in main memory by default so all you have to worry about is fitting everything into 4MB.

Build A Demo The Hard Way

To sum things up, you first compile your source files into object files and then link them into a binary executable. Normally this would be the end of the process but, alas, our little DS is a bit more complex as it has not one but two processors and each needs its own separate binary executable.

What are we to do? Create both of course. The process is identical with the only difference being we use the linkscript for ARM7 when linking the arm7 object files.

The final step in the process is packaging the binaries into a single file that we can then load onto the DS. Fortunately a tool is included in the devkitARM package which does just this. Official NDS game carts happen to use a file format which suits our needs rather well. This format includes a small header with an embedded logo and a short description of the .nds file in several languages, followed eventually by the executable binaries for the arm7 and arm9. The included logo and description text show up when you boot a game card from the firmware or start to download one over wireless multiboot. (After a bit of investigation it turns out the logo is not actually embedded in the header but exists separately…it is however pointed to by the header)

Now that we have a feel for the process let us create a full .nds file from the command line (so we can confidently never do it again).

Okay…one quick thing to mention. When we created demo1 you may have noticed we had no arm7 code in the project. The reason we are able to get away with this is that there is an arm7 executable binary already present in the devkitPro package which is included by default. This binary performs some very basic things such as reading the touch screen and real time clock as well as some very simple sound playback. For anything more advanced you will be providing your own arm7 code or at least modifying the supplied code.

Now…on to the demo. Follow the instructions from Day 1 on building your first demo with the following exception: Instead of grabbing the arm9 template grab the template labeled combined. Within you will find an arm9 folder and an arm7 folder. Replace the code in the template.c inside the arm9 folder with the demo1 code (or leave it the same as it does not really matter what we compile for this exercise).

You will notice a makefile inside the combined directory. If you were to navigate to that directory in a Dos/terminal window (try this now in fact), you could simply type make and the scripts contained inside the makefile would do all the steps we are about to do by hand.

Here are the commands needed to build the nds file from the command line.

Before we explain what is going on it is best to take a moment and absorb just how much compiling from the command line sucks…

Good, now that we know let us figure out what we did.

What is not shown is me setting my path so that the devkitARM tools could be found. On windows this is simply:

PATH=c:/devkitpro/devkitarm/bin

First we invoked the compiler on the arm9 template.c file. This translated it into an object file (template.o). We passed it the file as an argument as well as the include directory for the libnds header files we are using. Because libnds does different things depending on if you are constructing an arm7 or an arm9 binary we must define ARM9 with the -D option.
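If you have not met -D before, all it does is define a preprocessor symbol before the first line of your source is read, and code can then branch on it. The sketch below imitates that with a #define at the top of the file; target_cpu is a made-up function and this is not how libnds itself is organised internally.

```c
/* Stands in for passing -DARM9 on the compiler command line. */
#define ARM9

/* Hypothetical function: reports which processor this file was built for. */
const char *target_cpu(void)
{
#if defined(ARM9)
    return "arm9";
#elif defined(ARM7)
    return "arm7";
#else
    return "unknown";
#endif
}
```

Only one of the three return statements survives preprocessing, so the same source file can produce different code for each processor depending on which symbol the build defines.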

Next we link the file into an executable (.elf file). We pass the object file as an argument, we tell it we would like to include libnds (-lnds) and then we tell it where to look for the linkscript and what default libraries to use (-specs=ds_arm9.specs).

Because our loader does not handle the .elf format very easily we strip away all that extra info using objcopy. This leaves us with a nice flat binary for execution. (The .elf file contains debug information and other things which are useful; you will need the .elf file to use the remote debugger or the source level debugger in no$gba if you can afford such luxuries.)

This entire process is repeated for the arm7 leaving us with an arm7.bin and an arm9.bin. We next combine these binaries into an .nds file using ndstool.

If there is anything you should take from all this it is the convenience of the template makefiles. All you do is drop .c/.cpp/.s files into the source directories for the processors, .h files into the include directories and .bin data files into the data directory (more on this when we talk about getting your data into your program) and type make. The script will automate this entire process in an efficient and easy to use fashion which reduces the entire painful process described above to a single command: make.

Conclusion

Today we took a short peek at the capabilities of the DS and learned a bit more detail on the process of creating an executable from code. Many of the hardware descriptions were intentionally vague as the real detail will come in the following chapters.

Tomorrow we will begin looking at the hardware in detail when we explore raster graphics.

Posted by dovoto