Maximizing Compression of Apple II Hi-res Images
I have the somewhat niche problem of trying to fit as many Apple II
hi-res images as possible into RAM (and onto disk). Uncompressed, each image
is 8k (well, you can get away with 8184 bytes without any trouble,
for reasons we'll discuss later).
Background
I won't go too much into the wacky world of Apple II hi-res graphics;
you can read more on that here.
For our purposes the important part
is that you can think of it as a 280x192 monochrome image with 7 pixels
per byte, typically used to generate NTSC artifact color
(the top bit of each byte shifts its pixels slightly to essentially choose
another palette, blue/orange vs purple/green).
You might think 280x192 at 7 pixels per byte should result in a 40-byte by
192-row image fitting nicely in a linear 7.5k. Alas, no, and you can
probably blame Woz for this.
The graphics avoid crossing page boundaries, so there are "holes"
in the memory map: after every three rows (120 bytes), 8 bytes are left
unused to pad things out to a nice power of 2.
The final issue is that complex interleaving goes on so rows are not
contiguous in memory. This is for various reasons, possibly to save
a few chips on the motherboard. (In addition the addresses are all over
the place on the actual RAM chips to make for "free" DRAM refreshes but
you can only see that if you're logic-probing the address lines).
A brief summary of the interlacing: in PAGE1 of hi-res memory, starting at
$2000 (on the 6502 you use $ to indicate hexadecimal),
you have something like this:
$2000: Row 0, Row 64, Row 128, 8 bytes padding
$2080: Row 8, Row 72, Row 136, 8 bytes padding
...
and after 1k of this, you then start over with
$2400: Row 1, Row 65, Row 129, 8 bytes padding
...
$3F80: Row 63, Row 127, Row 191, 8 bytes padding
This leads to the "Venetian blind" effect often seen when loading HGR
graphics linearly.
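The interleaved layout above collapses into a single address formula. Here is a sketch in Python (my own illustration; the function name is hypothetical, not something from the article's code):

```python
def hgr_address(row):
    """Base address of hi-res row 0-191 on PAGE1 ($2000-$3FFF)."""
    return (0x2000
            + (row & 7) * 0x400        # low 3 bits pick the 1k chunk
            + ((row >> 3) & 7) * 0x80  # next 3 bits pick the 128-byte block
            + (row >> 6) * 0x28)       # top 2 bits pick the 40-byte slot

print(hex(hgr_address(0)))   # 0x2000
print(hex(hgr_address(64)))  # 0x2028
print(hex(hgr_address(8)))   # 0x2080
print(hex(hgr_address(1)))   # 0x2400
print(hex(hgr_address(63)))  # 0x3f80
```

These match the layout above: rows 0, 64, and 128 share the 128-byte block at $2000, while row 1 starts a full 1k later at $2400.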
Compression
I won't go into too many details here; there are a lot of 6502 compression
algorithms with various tradeoffs between code size, compression
ratio, and speed. I've settled on
ZX02
for now, which is a nice compromise
with a small code size, something I value because I often do size coding.
Extra-Compression
It turns out though that while compressing the interleaved graphics works
pretty well, you can get a bit more compression if you de-interlace first.
You can see a video of this in action here:
Sample Video on Youtube
| Image | zx02 compressed | zx02 + de-interlace |
|---|---|---|
| Kerrek 1 (video game) | 951 bytes, 12% of original 8k | 808 bytes (-143), 10% of original 8k |
| Christmas (fancy text) | 2572 bytes, 31% | 2402 bytes (-170), 29% |
| Riven Maglev (hand-converted bitmap) | 3423 bytes, 42% | 3263 bytes (-160), 40% |
| Ice Warrior (iipix auto-converted bitmap) | 5176 bytes, 63% | 5094 bytes (-82), 62% |
So you can save roughly 80 to 170 bytes per image, depending on the entropy
of the original image. Does this matter? I have definitely had projects
where every byte counts, and if you have more than 10 or so images it can
add up.
The Algorithm
The nice thing about this algorithm is that you can do it in place, so you
don't have to waste 8k on a temporary buffer.
It's two steps:
- Go through the compact data and re-insert the
8-byte memory holes (one in every 128-byte block)
- Sort the rows to their proper screen locations (via what is essentially
a selection sort)
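To make the two steps concrete, here is a rough Python sketch of the reconstruction (my own illustration, using extra bookkeeping dictionaries instead of the in-place 6502 lookup-table version; all names are hypothetical):

```python
def hgr_offset(row):
    # Offset of hi-res row 0-191 within the 8k page (the Woz interleave).
    return (row & 7) * 0x400 + ((row >> 3) & 7) * 0x80 + (row >> 6) * 0x28

def add_holes(compact):
    # Step 1: expand 7680 compact bytes to 8192 by re-inserting the
    # 8-byte screen hole after every 120 bytes (three 40-byte rows).
    out = bytearray()
    for i in range(0, len(compact), 120):
        out += compact[i:i + 120] + bytes(8)
    return out

def sort_rows(buf):
    # Step 2: move each row from its linear slot to its screen slot
    # via selection-sort-style 40-byte swaps (rows start in order 0..191).
    pos = {r: (r // 3) * 128 + (r % 3) * 40 for r in range(192)}  # row -> offset
    loc = {off: r for r, off in pos.items()}                      # offset -> row
    for r in range(192):
        dst, src = hgr_offset(r), pos[r]
        if src != dst:
            buf[dst:dst + 40], buf[src:src + 40] = buf[src:src + 40], buf[dst:dst + 40]
            q = loc[dst]                      # the row we just displaced
            pos[q], loc[src] = src, q
            pos[r], loc[dst] = dst, r
    return buf

# linear rows 0..191 -> full interleaved screen image
screen = sort_rows(add_holes(b"".join(bytes([r]) * 40 for r in range(192))))
```

After this runs, the 40 bytes at `screen[hgr_offset(r)]` hold row r, exactly as the hardware expects.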
The Cost
Code Size
So this isn't free. How expensive is it to do this?
- The zx02 ("optimized" by qkumba) decompression code is 142 bytes
- On top of this, the de-interlace code (which I have not optimized at all)
is 355 bytes (188 bytes of that are a lookup table)
- So, at roughly 100 bytes saved per image against 355 bytes of extra code,
you'd currently need 4 or more images before it is a net win.
Note: This assumes the 384 bytes of hi/lo hires lookup tables are already
in memory for other reasons. Usually they are if you're doing hires
work with any sort of speed.
Time Overhead
These measurements are for the ice3 case, which is a bit of a worst case for
zx02 (the de-interlace takes the same time no matter what the image is).
| Step | Cycles | Time | Frames |
|---|---|---|---|
| zx02 decompression | 426538 | ~417ms | ~25 |
| add-back-holes | 121031 | ~118ms | ~7 |
| de-interlace | 237414 | ~232ms | ~14 |
As a reminder, an NTSC Apple II updates the screen at 60Hz, so one frame
takes approximately 16.7ms.
Also note the Apple II runs the 6502 at approximately 1.023 MHz
(it's complicated).
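The milliseconds and frame counts in the table are just the cycle counts divided through by the clock rate; a quick sketch (cycle counts taken from the table above):

```python
CPU_HZ = 1_023_000        # NTSC Apple II 6502 clock, approximately
FRAME_MS = 1000 / 60      # one NTSC frame, ~16.7ms

for name, cycles in [("zx02 decompression", 426538),
                     ("add-back-holes", 121031),
                     ("de-interlace", 237414)]:
    ms = cycles * 1000 / CPU_HZ
    print(f"{name}: ~{ms:.0f}ms, ~{ms / FRAME_MS:.0f} frames")
```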
Other Uses
I originally thought of doing this when doing my double-hires
Monstersplash demo for Demosplash 2025.
Double-hires has its own issues
that make compression harder (the graphics are spread across two memory banks
and the pixels alternate between them in complex ways) so the deinterlace
is more of a win.
I would like to see if this would help much on lo-res or double-lores.
There are some scenes from the Rewind2
and Second Reality that are space-constrained.
I also think it might be of use in several of my games, like
the Myst demake, the
Riven Demake, or Peasant's Quest.
Code
The code currently lives here, though I might eventually move it
to a better location:
hgr compressed on github
Questions
Q. Could you just modify ZX02 to be apple-ii hires aware?
A. Maybe? The problem is that compression algorithms like this reach back
into the already-decompressed output data looking for matches, so if you
are scattering that data around memory it makes life more difficult.