The way I designed it for a fixed screen (no scrolling) was to only clear the parts of the screen which were written to, leaving the rest alone. This does limit the number of sprites you can show without flickering, and the fastest I could get was 8 fully masked 16x16 sprites per frame. For The Order, while plotting the sprites I build up a mini back buffer holding only the pixels overwritten by the sprites. On the next frame it works backwards through this buffer putting everything back as it was; this is very fast and finishes before the scanline even touches the top of the screen. This also helps with overlapping sprites, as it is just reverse plotting, done very quickly. There are quicker ways using XOR, but I wanted masked sprites.
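To make the save-and-restore idea concrete, here is a minimal sketch in Python rather than Z80. It is only a model: the flat `screen` bytearray, the `(offset, mask, data)` sprite layout and the function names are all assumptions for illustration, not the actual routine from The Order.

```python
# Hypothetical model: the screen is a flat bytearray and a sprite is a list
# of (offset, mask, data) triples. None of these names come from real code.
SCREEN_SIZE = 6144  # bytes of pixel data on a ZX Spectrum screen

def plot_masked(screen, sprite, undo):
    """Plot one masked sprite, logging every overwritten byte in `undo` first."""
    for offset, mask, data in sprite:
        undo.append((offset, screen[offset]))            # remember the old byte
        screen[offset] = (screen[offset] & mask) | data  # AND mask, OR data

def restore(screen, undo):
    """Walk the log backwards so overlapping sprites unwind in the right order."""
    while undo:
        offset, old = undo.pop()
        screen[offset] = old

screen = bytearray(b"\xAA" * SCREEN_SIZE)
before = bytes(screen)
undo = []

sprite_a = [(100, 0x0F, 0x60), (101, 0xF0, 0x06)]
sprite_b = [(101, 0x00, 0xFF), (102, 0x00, 0xFF)]  # overlaps sprite_a at 101

plot_masked(screen, sprite_a, undo)
plot_masked(screen, sprite_b, undo)
restore(screen, undo)  # next frame: put everything back before redrawing
```

Because the log is replayed last-in first-out, the byte at offset 101 is first restored to what sprite_a left there and then to the original background, which is why overlaps need no special handling.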
Yes, reverse plotting is a nice way to resolve the overlapping sprites problem.
When I was creating a scrolling game I wrote a routine that basically had a full back buffer, but designed so that the back buffer was a long run of ld de,0000 / push de statements. This was the quickest I could get it, and it could plot a 24x16 char display per frame with no flickering at all. The biggest pain with this approach was loading all the ld de,0000 statements with data; I found it tough to do this quickly, so I ended up with a game running at 16fps. It also took up loads of memory, around 7KB for the back buffer alone. I can post the code extract if you are interested.
LD DE, nn   [3 bytes] (10 T-states)
PUSH DE     [1 byte]  (11 T-states)
This makes a total of 21 T-States for every 2 bytes of screen content, and doubles the amount of RAM you actually need for the buffer.
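As a rough sanity check of those numbers, the arithmetic can be worked through as below. This is only a back-of-envelope sketch: it assumes the 24x16 char window mentioned earlier, a 48K machine frame of 69888 T-states, and it ignores memory contention and the cost of repointing SP for each screen line.

```python
# Instruction timings quoted above.
LD_DE_NN_T = 10
PUSH_DE_T = 11
PAIR_T = LD_DE_NN_T + PUSH_DE_T      # 21 T-states per 2 screen bytes

# Assumed window size: 24 x 16 character cells, 8 pixel rows each.
CHARS_WIDE, CHARS_HIGH = 24, 16
screen_bytes = CHARS_WIDE * CHARS_HIGH * 8   # 3072 bytes of pixel data
pairs = screen_bytes // 2                    # 1536 LD/PUSH pairs

dump_t_states = pairs * PAIR_T               # 32256 T-states to dump the window
code_bytes = pairs * 4                       # 6144 bytes: 4 code bytes per 2 data bytes

FRAME_T = 69888                              # assumed: one 48K Spectrum frame
frame_share = dump_t_states / FRAME_T        # a bit under half a frame

print(dump_t_states, code_bytes, round(frame_share, 2))
```

So the dump alone fits in a frame with room to spare, and the 6144 bytes of code (before any per-line overhead) is in line with the "around 7KB" figure; it is the filling of the buffer that eats the rest of the time.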
That will work quite fast for dumping or plotting the buffer. But as you say, filling the buffer will be kind of awkward, since you have 2 bytes of data every 4 bytes, and these 2 bytes are swapped relative to the register pair because of the way the Z80 stores words or 16-bit registers (low byte first). And since the buffer is linear, but the ZX display buffer (#4000..#5800) is not, you will have the same row address oddities.
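To show where the awkwardness comes from, here is a small Python model of one screen row built as LD DE,nn / PUSH DE code, plus a toy interpreter for just those two opcodes. The opcodes (#11 and #D5) are the real Z80 encodings; the function names, the 24-byte row and the memory layout are assumptions for illustration only.

```python
LD_DE_NN = 0x11   # LD DE,nn : opcode, low byte (E), high byte (D)
PUSH_DE = 0xD5    # PUSH DE  : writes D to SP-1, then E to SP-2

def build_row_code(row_bytes):
    """Build code that pushes row_bytes when run with SP just past the row end."""
    assert len(row_bytes) % 2 == 0
    code = bytearray()
    # The stack grows downwards, so the rightmost pair must be pushed first.
    for i in range(len(row_bytes) - 2, -1, -2):
        left, right = row_bytes[i], row_bytes[i + 1]
        # E (low byte) ends up at the lower address, so it holds the left byte.
        code += bytes([LD_DE_NN, left, right, PUSH_DE])
    return code

def run(code, memory, sp):
    """Toy interpreter for the two opcodes above; returns the final SP."""
    pc, d, e = 0, 0, 0
    while pc < len(code):
        op = code[pc]
        if op == LD_DE_NN:
            e, d = code[pc + 1], code[pc + 2]
            pc += 3
        elif op == PUSH_DE:
            sp -= 1
            memory[sp] = d
            sp -= 1
            memory[sp] = e
            pc += 1
        else:
            raise ValueError("unsupported opcode")
    return sp

row = bytes(range(1, 25))          # 24 bytes: one row of a 24-char window
memory = bytearray(64)
code = build_row_code(row)
final_sp = run(code, memory, sp=8 + len(row))
```

In this model the data bytes sit at offsets 1 and 2 of every 4-byte group, and the groups run right to left along the row, so any routine that patches the buffer has to step through it with that stride and direction in mind.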
I think your approach is probably the best way to keep the fps as high as you can, plus you want to utilise the entire screen and not just a smaller window. Treat scenery differently to the player, with maybe a couple of exceptions like the electricity.
TomDD
I am trying to avoid making exceptions (except for the ball), because that will probably force me to allocate and manage dynamic amounts of memory, which can be a pain to do in ASM, and will require some more cycles and memory for control structures.
Optimizations
I have been thinking, and there are some more optimizations that I can do, but they will start to limit some options that I currently have but do not make use of right now.
Example 1: So far, all my level sprites are placed aligned with the first row of a char, although currently I can move them, and it will work.
If I limit myself to this condition (aligned to first char row) I can infer something from this, and reduce my code somewhat.
Example 2: I have all level sprite data structures saved in an ordered continuous array (ordered list if you will), with their location [X (2 bytes), Y (1 byte)], which although flexible, implies that I have to calc its screen address every time I blit/plot them.
But since I only scroll horizontally, I can skip the pixel address calc, or keep Y already pre-shifted, since scrolling (except for clipping) is just a matter of an INC or DEC on that address: the screen width is limited to 256 pixels (32 bytes), so the column fits in 8 bits.
This can save a lot of grunt calc work, when redrawing the level.
I'll probably try a simple hack to measure how much can be gained with an approach like this.
NOTE: This can be tested by simply replacing the calc function with one that returns a fixed position: no calc burden, but still a valid output, so the blit/plot routines can do their work. Even though everything will be overwritten in the same place, it will work and will allow me to measure the time improvement without changing any other conditions.
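A rough Python model of the Example 2 idea is sketched below. The `screen_addr` function follows the standard ZX Spectrum display layout; `scroll_columns` and the sample coordinates are hypothetical, and the sketch deliberately ignores clipping, so it is only valid while the sprite stays within the 32 byte columns.

```python
def screen_addr(x, y, base=0x4000):
    """Address of the display byte holding pixel (x, y); 0<=x<256, 0<=y<192."""
    return (base
            | ((y & 0xC0) << 5)   # which third of the screen
            | ((y & 0x07) << 8)   # pixel row inside the character cell
            | ((y & 0x38) << 2)   # character row inside the third
            | (x >> 3))           # byte column, 0..31

def scroll_columns(addr, delta):
    """Move a cached address left/right by whole bytes: only the low byte changes.

    Assumption: the column stays in 0..31 (no clipping handled here).
    """
    low = (addr + delta) & 0xFF
    return (addr & 0xFF00) | low

# Compute the address once when the sprite enters the screen...
cached = screen_addr(80, 100)
# ...then scrolling by one character to the left is a single decrement.
cached_left = scroll_columns(cached, -1)
cached_right = scroll_columns(cached, +3)
```

On the Z80 this would amount to a DEC L or INC L on the stored address, with the full calculation only needed when a sprite first becomes visible.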
Example 3: I'm using a trick when blitting, which consists of incrementing the address when moving to the next row. Every 8 rows there is a special case, but most of the time (7 rows) it can be a simple INC. So I detect whether I'm in the special condition and only process the special increment when needed. This detection is fast (4 instructions), however there could be a faster way (for every case) using some kind of reference table (I still have to explore this further).
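For reference, the next-row trick and the table alternative can be modelled like this in Python. This is a sketch of the commonly used approach on the standard Spectrum screen layout, not necessarily the exact routine described above; `next_row` and `ROW_TABLE` are illustrative names.

```python
def screen_addr(x, y, base=0x4000):
    """Address of the display byte holding pixel (x, y)."""
    return (base | ((y & 0xC0) << 5) | ((y & 0x07) << 8)
            | ((y & 0x38) << 2) | (x >> 3))

def next_row(addr):
    """Address of the byte one pixel row below addr."""
    addr += 0x0100                  # INC H: next pixel row in the same cell
    if addr & 0x0700:               # common case, 7 rows out of 8
        return addr
    low = (addr & 0xFF) + 0x20      # special case: step to the next char row
    if low > 0xFF:                  # carried into the next screen third,
        return (addr & 0xFF00) | (low & 0xFF)   # high byte is already correct
    return ((addr & 0xFF00) - 0x0800) | low     # same third: undo the overflow

# Table alternative (an assumption about what such a table could look like):
# one precomputed address per pixel row, indexed by y.
ROW_TABLE = [screen_addr(0, y) for y in range(192)]

def next_row_by_table(x, y):
    """Look up the row below (x, y) instead of computing it."""
    return ROW_TABLE[y + 1] | (x >> 3)
```

The table costs 384 bytes (192 two-byte entries) and needs Y to be known at blit time, so whether it beats the detect-and-fix approach depends on how the blit loop keeps track of the row.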
Example 4: Currently, on every frame, I loop over my active sprite list (a linear subset of my level list) to check whether each sprite requires animation.
At least for now, I don't foresee having more than 3 or 4 animations active on screen, but I will probably have more than 8 sprites on screen. So having an animated sprite list, with just animated sprites on it, will allow me to process all of them without further testing.
But I will have to add or remove elements from this list, which requires that I implement some kind of circular array so that I can scroll left and right, but it also limits the total animated sprites on screen, to some pre-defined maximum (array size).
I could also have several distinct lists, since animations can have different speeds, allowing me to avoid resetting an animation frame to the same previous value, when it doesn't change in that particular screen frame. Although a more stable processing load could be preferable.
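One possible shape for that animated-sprite list is a fixed-size ring that can grow and shrink at both ends, sketched here in Python. Everything in it (the `AnimRing` name, the capacity of 4, the sprite labels) is hypothetical; on the Z80 a power-of-two capacity would let the modulo become a simple AND mask.

```python
class AnimRing:
    """Fixed-capacity ring of animated sprites, open at both ends."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0      # index of the leftmost entry
        self.count = 0

    def push_right(self, sprite):
        """Sprite scrolls in from the right edge; False if the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[(self.head + self.count) % len(self.slots)] = sprite
        self.count += 1
        return True

    def push_left(self, sprite):
        """Sprite scrolls in from the left edge; False if the ring is full."""
        if self.count == len(self.slots):
            return False
        self.head = (self.head - 1) % len(self.slots)
        self.slots[self.head] = sprite
        self.count += 1
        return True

    def pop_left(self):
        """Leftmost sprite scrolls off screen; None if the ring is empty."""
        if self.count == 0:
            return None
        sprite = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return sprite

    def pop_right(self):
        """Rightmost sprite scrolls off screen; None if the ring is empty."""
        if self.count == 0:
            return None
        self.count -= 1
        return self.slots[(self.head + self.count) % len(self.slots)]

    def __iter__(self):
        for i in range(self.count):
            yield self.slots[(self.head + i) % len(self.slots)]

ring = AnimRing(4)
ring.push_right("torch")
ring.push_right("spark")
ring.pop_left()            # scrolling right: "torch" leaves on the left
ring.push_right("flag")    # and "flag" enters on the right
visible = list(ring)
```

The per-frame animation pass then just iterates the ring, with no "is this sprite animated?" test, at the price of the hard cap on simultaneous animations mentioned above.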
In summary, there are quite a few tweaks that can still be done, but I'm unsure if they will save me from a 25Hz frame rate game.
NOTE: In terms of gameplay it is irrelevant, since 25Hz is fast enough that the user will not notice the difference, but it will wreak havoc on my current code base, because it will probably force me to use a back buffer, which my code is not really prepared for.
One possibility is to place this buffer in a memory aligned position, relative to ZX Display buffer, so that everything works, just by changing the most significant address bits.
For example, changing the base address from #4000 (display) to #8000 (buffer), with the added bonus of having no contention (#8000 is outside the contended #4000..#7FFF range), and eventually allowing me to use the same code to write to both locations just by changing a base address.
If needed, I will probably try this first, since it probably only requires minor changes in the screen address calc function.
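The aligned-buffer idea can be sketched as follows, again as a Python model. The address formula is the standard Spectrum layout; passing the base as a high byte and the `DISPLAY_HIGH` / `BUFFER_HIGH` / `to_display` names are assumptions about how such a change might look, not the actual calc function.

```python
DISPLAY_HIGH = 0x40   # real display file starts at #4000
BUFFER_HIGH = 0x80    # assumed back buffer at #8000, same internal layout

def screen_addr(x, y, base_high=DISPLAY_HIGH):
    """Address of the byte for pixel (x, y), relative to a given base high byte."""
    high = base_high | ((y & 0xC0) >> 3) | (y & 0x07)
    low = ((y & 0x38) << 2) | (x >> 3)
    return (high << 8) | low

def to_display(buffer_addr):
    """Map a back-buffer address to the matching display address.

    Flipping the two top address bits turns #8000..#97FF into #4000..#57FF.
    """
    return buffer_addr ^ 0xC000

# The same blit code can target either area by changing one parameter.
on_screen = screen_addr(80, 100, DISPLAY_HIGH)
in_buffer = screen_addr(80, 100, BUFFER_HIGH)
```

Because the buffer keeps the display's odd row ordering, copying it to the screen is a straight linear block copy, and all the row-stepping tricks keep working unchanged inside the buffer.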
In my opinion, one big flaw (limitation) in the ZX Spectrum design was not allowing software control of display buffer address bit 13 by writing to the ULA port, using just one of the unused bits. This would have provided a dual display buffer, with a single OUT instruction to flip between them.
It would allow us to swap the base display buffer address between #4000 and #6000, synchronously (using interrupt or HALT instruction) and still be within the ULA enforced contended memory area (lower 16K).