vitorvilela



The throughput problem

The Super Nintendo master clock is 21.48 MHz (more precisely, 21.477272727272... MHz). Normally, every speed you deal with on the SNES derives from this clock. Super FX speed is 21.48 MHz. SA-1 speed is 10.74 MHz, which is 21.48 MHz / 2. FastROM speed is 3.58 MHz, which is 21.48 MHz / 6, etc.

Let's say I would like to make my own 3D game or a special 2D rendered game, via a frame buffer. In other words, instead of using the SNES PPU core, I'm gonna render the game using the SNES, SA-1 or Super FX CPU.

Disregarding the V-Blanking limitations, the internal SNES screen is 256x224. Normally, what we can do for a frame buffer is 192x192.

If we divide the 21.477272 MHz master clock by the number of pixels to draw per second (192 x 192 pixels at 60 FPS), we have around 9.71 cycles per pixel. Alternatively, we have 19.42 cycles per pixel in case we want 30 FPS.
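The division above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes every pixel of the 192x192 frame buffer is redrawn on every frame.

```python
MASTER_CLOCK_HZ = 21_477_272  # SNES master clock, as quoted in the text


def cycles_per_pixel(width, height, fps, clock_hz=MASTER_CLOCK_HZ):
    """Master clock cycles available per pixel when redrawing the full frame."""
    return clock_hz / (width * height * fps)


budget_60 = cycles_per_pixel(192, 192, 60)
budget_30 = cycles_per_pixel(192, 192, 30)
print(f"{budget_60:.2f} cycles per pixel at 60 FPS")
print(f"{budget_30:.2f} cycles per pixel at 30 FPS")
```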

Having only about 20 master clock cycles per pixel to achieve stable 30 FPS gameplay is way too little.

For the SA-1 chip, this means we can only do about 5 operations per pixel, since the CPU runs at 10.74 MHz and the fastest 65c816 opcode requires at least 2 cycles. The fastest store opcode requires at least 3 cycles. The same situation applies to the Super FX, with the difference that you can use the cache (a 512-byte acceleration memory cache), which lets you execute an operation in a single 21.48 MHz cycle.
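To see where the "about 5 operations" figure comes from, here is a rough sketch of the same budget expressed in opcodes instead of cycles. The helper name is hypothetical, and the model is optimistic on purpose: it assumes every opcode runs at its minimum cycle count with no memory wait states.

```python
MASTER_CLOCK_HZ = 21_477_272
SA1_CLOCK_HZ = MASTER_CLOCK_HZ / 2  # 10.74 MHz, as stated in the text


def ops_per_pixel(cpu_hz, cycles_per_op, width=192, height=192, fps=30):
    """Best-case opcodes per pixel if every opcode costs cycles_per_op cycles."""
    cpu_cycles_per_pixel = cpu_hz / (width * height * fps)
    return cpu_cycles_per_pixel / cycles_per_op


fastest_ops = ops_per_pixel(SA1_CLOCK_HZ, 2)  # only 2-cycle opcodes
store_ops = ops_per_pixel(SA1_CLOCK_HZ, 3)    # only 3-cycle stores
print(f"2-cycle opcodes per pixel: {fastest_ops:.1f}")
print(f"3-cycle stores per pixel:  {store_ops:.1f}")
```

Since at least one of those opcodes has to be the store that writes the pixel, the real budget is even tighter than the first number suggests.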

With so little time available, it's pretty much impossible to use techniques such as z-buffering, texture mapping or matrix transformations. The latter two usually require arithmetic operations (add, subtract, multiply and divide), which are slower than simple load and store operations.

It's not impossible, of course. Games that rely on polygons, usually without textures, such as Star Fox, can spend around 100 operations calculating the first pixel of a polygon line and then reuse part of those calculations for the rest of the line, so that each remaining pixel only takes 5-10 operations. That's the path for optimizing the game for better speed, and it's what makes the dream of a 30 FPS patch for Star Fox not impossible.
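The idea of paying a setup cost once per line and then doing very little per pixel can be sketched like this. It is an illustration of the general technique (a linearly shaded span in 8.8 fixed point), not Star Fox's actual routine; the function name, buffer layout and color range are assumptions.

```python
WIDTH, HEIGHT = 192, 192


def fill_span(fb, y, x0, x1, c0, c1):
    """Fill pixels x0..x1 (inclusive) of row y, shading linearly from c0 to c1.

    fb is a linear WIDTH * HEIGHT byte buffer; c0 and c1 are in 0..255.
    """
    # Per-line setup: the expensive part (a multiply and a divide), done once.
    length = x1 - x0
    # Truncate toward zero so the running value never overshoots c1.
    step = int(((c1 - c0) * 256) / length) if length else 0  # 8.8 fixed point
    color = c0 << 8
    offset = y * WIDTH + x0
    # Per-pixel work: a shift, a store and two additions.
    for _ in range(length + 1):
        fb[offset] = color >> 8
        offset += 1
        color += step


fb = bytearray(WIDTH * HEIGHT)
fill_span(fb, 10, 20, 28, 0, 16)
print(list(fb[10 * WIDTH + 20:10 * WIDTH + 29]))
```

The division only happens once per line, so its cost is spread over every pixel of the span.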

Yet, the throughput problem is real for the SNES. The next generation of consoles, with CPUs able to scale to nearly 100 MHz and a 32 or 64-bit data bus, made it much easier to deal with pixels than on the 16-bit consoles.

To clarify once again, the issue is not that the consoles are unable to do 3D calculations or projections. The issue is simply that there are too many pixels to process per second for satisfactory performance. We could of course make 3D games that run at 5-10 FPS or use an extremely small resolution (like 128x64), which reduces the throughput pressure drastically, but that harms the final user experience.

We are lucky that Moore's law made CPU capabilities expand so drastically in the 2000s that the throughput problem is not an issue anymore. Still, for anyone who wants an extra challenge, making real-time 3D games for the 16-bit consoles is definitely the way to push the console limits to the extreme.

Comments

The tile-based nature is somewhat simple to resolve: set up a tilemap where each block is mapped to the next entry of the character map. That way the mapping is 1 -> 1 and the only thing you have to worry about is editing the character map. The issue with the character map is that it's bit-planed. Instead of being a bitmap, where you can set a pixel by calculating its x and y position (usually by accessing the array via the formula y * WIDTH + x), you have to divide y and x into 8x8 blocks, then update N bytes (where N is the bit depth), setting/resetting a particular bit inside the 8x8 block in each of them. It ends up being much more expensive. SA-1 solves this issue by providing a DMA that does the work for you at 2.68 MHz speed. Super FX solves this issue with the PLOT instruction, which does the necessary conversions for you after you set up the X and Y position. And other SNES games usually use Mode 7, which can be hacked to act as a regular bitmap.
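To make the cost difference concrete, here is a small sketch comparing a plain bitmap plot with a bit-planed one. It assumes the SNES 4bpp character format (32 bytes per 8x8 tile, planes 0/1 interleaved in the first 16 bytes, planes 2/3 in the last 16, bit 7 being the leftmost pixel) and a 1 -> 1 tilemap with tiles stored row by row. The function names and the 192x192 size are only for illustration.

```python
WIDTH, HEIGHT = 192, 192
TILES_PER_ROW = WIDTH // 8
BYTES_PER_TILE = 32  # 8x8 pixels at 4 bits per pixel


def plot_linear(fb, x, y, color):
    """Plain bitmap: one address calculation and one store."""
    fb[y * WIDTH + x] = color


def plot_planar_4bpp(chars, x, y, color):
    """Bit-planed tiles: find the tile, then patch one bit in four bytes."""
    tile = (y >> 3) * TILES_PER_ROW + (x >> 3)
    row = y & 7
    mask = 0x80 >> (x & 7)  # bit 7 is the leftmost pixel of the row
    base = tile * BYTES_PER_TILE + row * 2
    for plane in range(4):
        # Planes 0/1 share the first 16 bytes, planes 2/3 the last 16.
        addr = base + (plane & 1) + (plane >> 1) * 16
        if color & (1 << plane):
            chars[addr] |= mask
        else:
            chars[addr] &= ~mask & 0xFF


chars = bytearray(TILES_PER_ROW * (HEIGHT // 8) * BYTES_PER_TILE)
plot_planar_4bpp(chars, 9, 2, 0b0101)
```

Each planar plot is four read-modify-write accesses plus the address math, against a single store for the linear case, which is the work the SA-1 DMA or the Super FX PLOT takes off your hands.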

Vitor

Thanks for writing such a clear explanation about this. Tbh I'd never considered a framebuffer approach could even be possible due to the tile-based nature of most 16-bit and earlier consoles. I did know that Sega's SVP chip basically works as a framebuffer but then again MD Virtua Racing isn't exactly the best example of pretty graphics or high framerate. I've always just assumed the concept would be too impractical and slow to be useful in most cases (which, if I understand your post correctly, is the case).

Muriel Melvin

Oh, an addition about the impressive demos available around the internet, usually focused on Amiga/Atari/Sega/etc. consoles: they overcome the throughput problem by using two techniques: lookup tables and extra RAM. 1- The lookup tables let you preprocess the work you would usually do in real time, so that you only need 5 operations to finish what's left for outputting the pixel. This comes at the cost of not being able to change the camera angle, zoom in/out or change an object's position without having to 'preprocess' everything again. 2- The RAM buffer lets you store dozens of processed frames, with the processing done while the demo is loading or while it does trivial things such as displaying a static logo or a simple particle-like effect. It's a wise strategy for doing more complex (and impressive) scenes that would not be possible in real time: they make sure to process 10-20 seconds of frames ahead, so they don't have to worry at all if a frame takes, for example, 4-5x more time than normal, because there are still a lot of frames queued to be displayed. Of course you can't do that in real-time games because that would add tons of input lag.
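As an illustration of the lookup table trick, here is a sketch of a classic "tunnel" effect. It is not taken from any specific demo, and the sizes and names are made up. All the trigonometry and division is done once while building the table, so the per-frame loop is just a lookup, an add and a store per pixel.

```python
import math

W, H, TEX = 64, 64, 32  # hypothetical screen and texture sizes


def build_tunnel_table():
    """Slow preprocessing: texture coordinates for every screen pixel."""
    table = []
    for y in range(H):
        for x in range(W):
            # The half-pixel offset keeps the distance from ever being zero.
            dx, dy = x - W / 2 + 0.5, y - H / 2 + 0.5
            dist = math.hypot(dx, dy)
            u = int(TEX * math.atan2(dy, dx) / (2 * math.pi)) % TEX
            v = int(TEX * 8 / dist) % TEX
            table.append((u, v))
    return table


def render_frame(table, texture, frame, out):
    """Fast per-frame loop: the only animation is a scroll along v."""
    for i, (u, v) in enumerate(table):
        out[i] = texture[((v + frame) % TEX) * TEX + u]


table = build_tunnel_table()
texture = bytearray((u ^ v) for v in range(TEX) for u in range(TEX))
out = bytearray(W * H)
render_frame(table, texture, 0, out)
```

The limitation from the comment shows up directly: the table bakes in one camera position, so the only thing that can change per frame is the scroll offset. Moving the camera means rebuilding the whole table.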

Vitor

