Maximizing Compression of Apple II Hi-res Images
I have the somewhat niche problem of trying to fit as many Apple II
hi-res images as possible into RAM (and onto disk). Uncompressed, each image
is 8k (well, you can get away with 8184 bytes without any trouble,
for reasons we'll discuss later).
Background
I won't go too much into the wacky world of Apple II hi-res graphics;
you can read more on that here.
For our purposes the important part
is that you can think of it as a 280x192 monochrome image with 7 pixels
per byte, typically used to generate NTSC artifact color
(the top bit of each byte shifts its pixels slightly to essentially choose
another palette, blue/orange vs purple/green).
You might think 280x192 at 7 pixels per byte should result in a 40-byte by
192-row image fitting nicely in a linear 7.5k. Alas, no, and you can
probably blame Woz for this.
The graphics avoid crossing page boundaries, so there are "holes"
in the memory map: after every three rows (120 bytes), 8 bytes are left
unused to pad things out to a nice power of 2.
The final issue is that complex interleaving goes on so rows are not
contiguous in memory. This is for various reasons, possibly to save
a few chips on the motherboard. (In addition the addresses are all over
the place on the actual RAM chips to make for "free" DRAM refreshes but
you can only see that if you're logic-probing the address lines).
A brief summary of the interlacing: in PAGE1 of hi-res memory, starting at
$2000 (on the 6502 you use $ to indicate hexadecimal),
you have something like this:
$2000: Row 0, Row 64, Row 128, 8 bytes padding
$2080: Row 8, Row 72, Row 136, 8 bytes padding
...
and after 1k of this, you then start over with
$2400: Row 1, Row 65, Row 129, 8 bytes padding
...
$3F80: Row 63, Row 127, Row 191, 8 bytes padding
This leads to the "Venetian blind" effect often seen when loading HGR
graphics linearly.
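The interleaved layout above collapses into a single address formula. Here is a sketch in Python (my own illustration; the function name is hypothetical, not something from the article's code):

```python
def hgr_address(row):
    """Base address of hi-res row 0-191 on PAGE1 ($2000-$3FFF)."""
    return (0x2000
            + (row & 7) * 0x400        # low 3 bits pick the 1k chunk
            + ((row >> 3) & 7) * 0x80  # next 3 bits pick the 128-byte block
            + (row >> 6) * 0x28)       # top 2 bits pick the 40-byte slot

print(hex(hgr_address(0)))   # 0x2000
print(hex(hgr_address(64)))  # 0x2028
print(hex(hgr_address(8)))   # 0x2080
print(hex(hgr_address(1)))   # 0x2400
print(hex(hgr_address(63)))  # 0x3f80
```

These match the layout above: rows 0, 64, and 128 share the 128-byte block at $2000, while row 1 starts a full 1k later at $2400.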
Compression
I won't go into too many details here; there are a lot of 6502 compression
algorithms with various tradeoffs between code size, compression
ratio, and speed. I've settled on
ZX02
for now, which is a nice compromise
with a small code size, something I value because I often do size coding.
Extra-Compression
It turns out though that while compressing the interleaved graphics works
pretty well, you can get a bit more compression if you de-interlace first.
You can see a video of this in action here:
Sample Video on Youtube
| Image | zx02 compressed | zx02 + de-interlace |
|---|---|---|
| Kerrek 1 (video game) | 951 bytes, 12% of original 8k | 808 bytes (-143), 10% of original 8k |
| Christmas (fancy text) | 2572 bytes, 31% | 2402 bytes (-170), 29% |
| Riven Maglev (hand-converted bitmap) | 3423 bytes, 42% | 3263 bytes (-160), 40% |
| Ice Warrior (iipix auto-converted bitmap) | 5176 bytes, 63% | 5094 bytes (-82), 62% |
So you can save roughly 80 to 170 bytes per image, depending on the entropy
of the original image. Does this matter? I have definitely had projects
where every byte counts, and if you have more than 10 or so images it can
add up.
The Algorithm
The nice thing about this algorithm is that you can do it in place, so you
don't have to waste 8k on a temporary buffer.
It's two steps:
- Go through the compact data and re-insert the
8-byte memory holes (one in every 128-byte block)
- Sort the rows to their proper screen locations (via what is essentially
a selection sort)
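To make the two steps concrete, here is a rough Python sketch of the reconstruction (my own illustration, using extra bookkeeping dictionaries instead of the in-place 6502 lookup-table version; all names are hypothetical):

```python
def hgr_offset(row):
    # Offset of hi-res row 0-191 within the 8k page (the Woz interleave).
    return (row & 7) * 0x400 + ((row >> 3) & 7) * 0x80 + (row >> 6) * 0x28

def add_holes(compact):
    # Step 1: expand 7680 compact bytes to 8192 by re-inserting the
    # 8-byte screen hole after every 120 bytes (three 40-byte rows).
    out = bytearray()
    for i in range(0, len(compact), 120):
        out += compact[i:i + 120] + bytes(8)
    return out

def sort_rows(buf):
    # Step 2: move each row from its linear slot to its screen slot
    # via selection-sort-style 40-byte swaps (rows start in order 0..191).
    pos = {r: (r // 3) * 128 + (r % 3) * 40 for r in range(192)}  # row -> offset
    loc = {off: r for r, off in pos.items()}                      # offset -> row
    for r in range(192):
        dst, src = hgr_offset(r), pos[r]
        if src != dst:
            buf[dst:dst + 40], buf[src:src + 40] = buf[src:src + 40], buf[dst:dst + 40]
            q = loc[dst]                      # the row we just displaced
            pos[q], loc[src] = src, q
            pos[r], loc[dst] = dst, r
    return buf

# linear rows 0..191 -> full interleaved screen image
screen = sort_rows(add_holes(b"".join(bytes([r]) * 40 for r in range(192))))
```

After this runs, the 40 bytes at `screen[hgr_offset(r)]` hold row r, exactly as the hardware expects.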
The Cost
Code Size
So this isn't free. How expensive is it to do this?
- The zx02 ("optimized" by qkumba) decompression code is 142 bytes
- On top of this, the de-interlace code (which I have not optimized at all)
is 355 bytes (188 bytes of that are a lookup table)
- So, at roughly 100 bytes saved per image against 355 bytes of extra code,
you'd currently need 4 or more images before it is a net win.
Note: This assumes the 384 bytes of hi/lo hires lookup tables are already
in memory for other reasons. Usually they are if you're doing hires
work with any sort of speed.
Time Overhead
These measurements are for the ice3 case, which is a bit of a worst case for
zx02 (the de-interlace takes the same time no matter what the image is).
| Step | Cycles | Time | Frames |
|---|---|---|---|
| zx02 decompression | 426538 | ~417ms | ~25 |
| add-back-holes | 121031 | ~118ms | ~7 |
| de-interlace | 237414 | ~232ms | ~14 |
As a reminder, an NTSC Apple II updates the screen at 60Hz, so one frame
takes approximately 16.7ms.
Also note the Apple II runs the 6502 at approximately 1.023 MHz
(it's complicated).
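The milliseconds and frame counts in the table are just the cycle counts divided through by the clock rate; a quick sketch (cycle counts taken from the table above):

```python
CPU_HZ = 1_023_000        # NTSC Apple II 6502 clock, approximately
FRAME_MS = 1000 / 60      # one NTSC frame, ~16.7ms

for name, cycles in [("zx02 decompression", 426538),
                     ("add-back-holes", 121031),
                     ("de-interlace", 237414)]:
    ms = cycles * 1000 / CPU_HZ
    print(f"{name}: ~{ms:.0f}ms, ~{ms / FRAME_MS:.0f} frames")
```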
Other Uses
I originally thought of doing this when doing my double-hires
Monstersplash demo for Demosplash 2025.
Double-hires has its own issues
that make compression harder (the graphics are spread across two memory banks
and the pixels alternate between them in complex ways) so the deinterlace
is more of a win.
I would like to see if this would help much on lo-res or double-lores.
There are some scenes from the Rewind2
and Second Reality that are space-constrained.
I also think it might be of use in several of my games, like
the Myst demake, the
Riven Demake, or Peasant's Quest.
Code
The code currently lives here, though I might eventually move it
to a better location:
hgr compressed on github
Questions
Q. Could you just modify ZX02 to be apple-ii hires aware?
A. Maybe? The problem is that compression algorithms like this reach back
into the already-decompressed output data looking for matches, so if you
are scattering that data around memory it makes life more difficult.