Extreme Size-coding the Applesoft BASIC version of Hellmood's Starpath Intro
data:image/s3,"s3://crabby-images/3e095/3e095b5c65183383ee50988fd503b4081fba9ae5" alt=""
Details
Hellmood released an amazing
64-byte DOS intro "Starpath" at the
Lovebyte 2025
size-coding demoparty. It makes an animated ray-marched path against
a gradient sky with stars. All in 64 bytes.
I wanted to see if I could port this to Applesoft BASIC
(Apple II), and what's more, have it fit in a Tweet (roughly 280 bytes)
in honor of the sadly no-longer-existent
AppleIIBasicBot.
Note this takes roughly 32 minutes to draw a frame
on a stock Apple II (1MHz 6502).
I do have faster assembly language versions that take roughly
8 seconds per frame but I'm not releasing those yet.
The Optimization Progression
Version 1 -- Direct Conversion of Hellmood's Code
This one is 630 bytes. Some notes:
0 GOTO 8:SPEED=$)%DEL$LRUSR:SPEED=$)%DEL$LRUSR
8 POKE 2068,9:DIM CL(32):FOR I=0 TO 31:READ CL(I):NEXT:GR:POKE 49234,0
30 FOR Y=0 TO 47:FOR X=0 TO 39:D=14
40 YP=Y*4*D:T=(X*6)-D:IF T>=0 THEN 70
50 C=11:A=(X*6)+YP
55 IF A>256 THEN A=A-256:GOTO 55
57 IF A<6 THEN 95
60 C=(C*16)+((Y*4)/16):C=C-160:GOTO 95
70 XP=T*D:POKE 2067,XP/256:POKE 2069,YP/256:CALL 2066:T=PEEK(36)
75 POKE2057,T:POKE2059,D+F:CALL 2056:C=PEEK(36):D=D+1
80 IF C<16 THEN 40
90 C=C-16
95 COLOR=CL(C):PLOT X,Y:NEXT X,Y:F=F+1:GOTO 30
97 DATA 0,5,0,5,5,5,10,10,5,5,10,10,7,7,15,15,1,2,1,2,3,9,3,9,13,12,13,12,4,4,4,4
100 REM APPLE II VERSION OF HELLMOOD'S 64B STARPATH
Version 2 -- First Size Optimization Pass
This next pass did some typical BASIC size optimization tricks.
Note that optimizing for overall filesize isn't the same as the on-disk
filesize as that is tokenized (with keywords generally taking 1 byte).
For this project I am going for overall filesize. This version
is 444 bytes.
Things done here:
- Make line numbers as short as possible.
- Have as few lines as possible. This is more of a problem
with the tokenized version as each line adds 5 bytes.
It also helps the text version, as carriage-return/line
number is longer than just a ':' separator.
- Remove all spaces. Applesoft (unlike some other BASICs) ignores
all spaces so you can leave them all out.
- Reduce all variable names to one letter.
- Remove extraneous comments.
- If any constants are used more then 4 times or so it's often more
compact to assign the value to a variable and use that instead.
(P is an example here, which is the offset to the assembly code)
- Remove extraneous parenthesis. Orders of operation are often
your friend.
- Inline any variables assigned to but only used once.
- On Applesoft "IF EXPR THEN GOTO 5" can be reduced
to "IF EXPR THEN 5" for a nice savings.
- Not a generic change, but we also reduced the color lookup table by half.
Other tricks not used here:
- Sometimes you can save space by
using scientific notation, e.g. 1E3 instead of 1000
- If using values less than 0 you can leave off the leading
0, .5 instead of 0.5
- Use ? to substitute for PRINT
- All variables init to 0 on start so you
don't have to explicitly set them.
- Could we use integer variables like X% to do anything good?
Applesoft generally does everything in floating point
even with integer variables so this doesn't usually buy
us anything, but it might save a call to INT().
- You can do comparisons like "IF EXPR" rather than
"IF EXPR<>0" sort of like in C,
(<> means not equal) which might save a byte.
Similarly maybe you can
do "IF X-Q" instead of "IF X<>Q"
0GOTO1:SPEED=$)%DEL$LRUSR:SPEED=$)%DEL$LRUSR
1P=2060:POKEP+8,9:DIML(16):FORI=0TO15:READL(I):NEXT:GR:POKE49234,0
2FORY=0TO47:FORX=0TO39:D=14
3Z=Y*4*D:T=(X*6)-D:IFT<0THENC=31:A=(X*6)+Z:GOTO7
4POKEP+7,(T*D)/256:POKEP+9,Z/256:CALLP+6:T=PEEK(36)
5POKEP-3,T:POKEP-1,D+F:CALLP-4:C=PEEK(36):D=D+1:IFC<16THEN3
6GOTO9:DATA0,5,10,5,10,7,15,15,2,1,3,9,13,12,4,4
7IFA>256THENA=A-256:GOTO7
8IFA>6THENC=(Y/4)+32
9COLOR=L((C-16)/2):PLOTX,Y:NEXTX,Y:F=F+1:GOTO2
Version 3 -- Optimization Pass 2
This version is 366 bytes. Changes:
- Changed up the assembly language a bit to be more compact and load
the first value from zero page rather than a constant.
We chose "CH" ($24) which is the textmode cursor
horizontal position. This value is
not used while BASIC is running.
- Moved the color lookup table to also be raw binary data after
the assembly language. At firsgt we are manually subtracting off 65
to convert from ASCII. In the next iteration we
take advantage of the fact
that in Applesoft BASIC the COLOR= operator doesn't bounds check
that the color is 0..15 but helpfully masks off the top 4-bits.
This allows us to use @=0, A=1, etc, with no subtraction.
0GOTO1:ONERR$)%)%DEL$LRUSRAFKFKHPPCBDJNM
1P=2056:POKEP+2,9:DIML(16):FORI=0TO14:L(I)=PEEK(P+I+11)-65:NEXT:GR:POKE49234,0
2FORY=0TO47:FORX=0TO39:D=I
3Z=Y*4*D:T=X*6-D:IFT<0THENC=31:A=X*6+Z:GOTO7
4POKE36,T*D/256:POKEP+3,Z/256:POKEP+5,D+F:CALLP:C=PEEK(36):D=D+1:IFC<16THEN3
6GOTO9
7IFA>256THENA=A-256:GOTO7
8IFA>6THENC=Y/4+32
9COLOR=L(C/2-8):PLOTX,Y:NEXTX,Y:F=F+1:GOTO2
Version 3 -- More extreme Optimization
This version is 306 bytes. Changes:
- Make the 256 constant a variable V. You'll note one use of this is to
get the top byte of the multiplication result (emulating a shift
right by 8). I tried so hard
to do this in assembly language because it's essentially free
if you can get the 16-bit value in RAM. No luck though.
Also you might think some of the BASIC routines could do this for
you, but Applesoft is actually pretty good at catching out-of-bounds
and giving you an error if you try to get it to sneakily
truncate things for you.
- Moved the POKE to set full graphics mode into the assembly language routine.
It does it each time the routine is called, which adds some
overhead but it's harmless and saves a lot of bytes.
- Moved the assembly language to the end of the program.
This avoids wasting a GOTO to skip the code,
but it means we now have to constantly re-adjust
the P pointer to the code any time we add/remove bytes to the file.
- Realized we can skip building the lookup table of colors completely and just
PEEK it directly during the COLOR= command
- Noticed that the obscure POS() routine (which gets the current textmode
cursor X position) handily reads zero page value $24 out.
That's where
our result was anyway and avoids the longer PEEK(36) call.
0P=2282:V=256:POKEP+5,9:GR
1FORY=0TO47:FORX=0TO39:D=14
2Z=Y*4*D:T=X*6-D:IFT<0THENC=31:A=X*6+Z:GOTO6
3POKE36,T*D/V:POKEP+6,Z/V:POKEP+8,D+F:CALLP:C=POS(0):D=D+1:IFC<16THEN2
4GOTO8
5A=A-V
6IFA>VTHEN5
7IFA>6THENC=Y/4+32
8COLOR=PEEK(C/2+P+6):PLOTX,Y:NEXTX,Y:F=F+1:GOTO1
9,RTAB(ONERR$)%)%DEL$LRUSR@EJEJGOOBACIML
Almost Final Version -- Fits in a Tweet
This is actually 281 bytes, but due to how twitter handles whitespace it
actually will fit in a tweet. Do note (to be pedantic)
the AppleIIbot had directives to specify how long to run before
recording a movie, and these take up room we don't have.
Also I think it maxed out at 99second delay and
we'd need much longer that that to draw a full frame.
Also note that while this looks nice for a single frame, it's not an
exact repo of the intro (due to the stars, as I'll explain later).
- Moved the assembly language to span lines 8/9 (which is now
line 2344 ($928) for machine language reasons).
Note that this can immediately follow the GOTO, not even needing
a colon separator. BASIC ignores everything after a GOTO on a line.
On Applesoft when tokenized the end of a line/
start of next is 0, AddrL, AddrH, LineNumL, LineNumH.
AddrL/H is a pointer to the next line in the program.
By dropping a "," (which is BIT) at the end of a line
we essentially comment out the terminating 0 and AddrL values
and can treat AddrH and the 16-bits of line number as
machine code.
This allows us to get a "9" for ORA without having to POKE it in
place.
One problem is we finally shrunk things enough that the Tokenized
Program is less than 256 bytes. This annoyingly meant that
AddrH dropped from 9 (which could be an ORA opcode) to 8
(which is PHP). So we have to waste a byte on a PLP
opcode "(" to keep the stack in sync.
Thanks to Tom Greene for suggesting this optimization.
- While doing this, I also noticed that our JMP to the RTS instruction
in ROM was LRUSR = JMP $D552. It turns out that since our
color lookup starts with colors $00 and $05, and
these get the top bits masked off, we can try to merge it with
the preceding JMP instruction.
If JMP $D550 (remember, little endian) works
to get us an RTS instead of the existing $D552
we can treat the jump target
as two bytes of the color lookup. Luckily it works, the
earlier address does an extraneous LDY that doesn't hurt anything.
- This is where I got stuck, until I decided to diverge a bit from
the origina x86 implementation.
I wasted a lot of time trying to optimize away the code
doing the MOD operation to see if we should draw a star.
Then I realized I could cheat. The original code faked random numbers,
why not use the actual BASIC random number generator (which is
famously bad on Apple II).
We can use RND(1)*256 to give a random number from 0 to 255.
To keep
the stars mostly consistent we have to seed it with the same
seed each frame, the RND(-1) at the start does that. The
downside here is that RND() is only run on sky pixels, and as
the demo progresses the amount of sky changes slightly and the
stars shift slightly. I am willing to make this sacrifice because
at 30 minutes per frame that's probably the least of your worries.
This also freed up a bunch of space as we didn't need to keep Z
(Yprime) around anymore and we could optimize the GOTOs a bit.
0P=2250:V=256:GR
1R=RND(-1):FORY=0TO47:FORX=0TO39:D=14
2T=X*6-D:IFT<0THEN7
3POKE36,T*D/V:POKEP+8,Y*4*D/V:POKEP+10,D+F:CALLP:C=POS(0):D=D+1:IFC<16THEN2
4GOTO8
7C=31:IFRND(1)*V>6THENC=Y/4+32
8COLOR=PEEK(C/2+P+9):PLOTX,Y:NEXTX,Y:F=F+1:GOTO1ONERR$,
2344))%DEL$,RTAB(LPUSRJEJGOOBACIML
Final Version -- Fits in a Tweet + Stable Stars
Final version, 277 bytes (234 bytes tokenized):
- I wanted stable stars, but how do I make a random number generator
only dependent on X/Y co-ordinates?
Then I remembered a trick I've used before when size-coding: treat the code
in ROM like a source of entropy!
This updated version indexes into ROM. I tried a few
values until 57000 ($DEA8) seemed to generate reasonable looking
stars. This is in the middle of the FAC floating point handling
code.
- I might try for a better pattern at some point. The nice part of this
is 57000 can be shortened by a byte to 57E3.
0P=2247:V=256:GR
1FORY=0TO47:FORX=0TO39:D=14
2T=X*6-D:IFT<0THEN7
3POKE36,T*D/V:POKEP+8,Y*4*D/V:POKEP+10,D+F:CALLP:C=POS(0):D=D+1:IFC<16THEN2
4GOTO8
7C=31:IFPEEK(X*Y+57E3)>2THENC=Y/4+32
8COLOR=PEEK(C/2+P+9):PLOTX,Y:NEXTX,Y:F=F+1:GOTO1ONERR$,
2344))%DEL$,RTAB(LPUSRJEJGOOBACIML
Failed Attempts
While doing this I tried a bunch of other things that were excessively
clever but failed to improve the size:
- Putting more code into assembly. BASIC is actually much denser
than assembly so you have to be selective of what you put into it,
especially if you have to encode it with ASCII/Tokens.
- Doing the whole thing in assembly?
During the heyday of the Apple II twitter bot
the largest all-text binary decoder payload we could
come up with could load 140 bytes of machine code.
I don't have an assembly version
of the demo that's that small yet (the big hold up on loader size,
as with this exercise, was mostly due to
the lack of bitwise AND/OR/XOR/MOD/shift instructions in Applesoft)
- The end of the HTAB routine at $F7FA tantalizingly ends in
STA CH / RTS which would be ideal for our assembly snippet,
but I could not figure out a way to get to $F7FA in a compact way.
Same with the POS/SNGFLT routines at $E2FF/$E301
which can convert CH or a value in the Y register
to FAC, but getting $FF or $01 into the data
stream to jump to these is difficult.
- Ampsersand/USR routines. Applesoft has three ways to call assembly
language:
- CALL ADDR -- which essentially does a "JSR" to the address
specified, but with no setup
- R=USR(E) -- this will essentially "JSR" to zero page address
$0A. (The assumption is you'll have some sort of jmp
there in $A/$B/$C). By default Applesoft sets this
to a "JMP" to the illegal operation handler,
but if you poke the top 2 bytes you can have it go anywhere.
This will take the expression E and convert to floating
point in the floating-point accumulator (FAC) values.
On exit, whatever is in the FAC will be returned in R.
This seems ideal for our use, but POKEing the address takes
up a lot of room, and to manipulate the results you have
to have a lot of assembly code that converts values to/from
the FAC (5-byte floating point) values. There are ROM
routines to do things like convert FP to 8 or 16-bit ints,
etc, but they are verbose and also hard to encode in ASCII.
So I was never able to find a way to do this more compact
than the eventual solution.
- Ampersand -- the ampersand operation, (just '&')
will "JSR" to location $3F5 in memory which Applesoft
sets up as a JMP to an RTS instruction. The accumulator
has the next token in it which can be used to multiplex,
this is why you'll see a lot of Applesoft extensions where
they define commands like &PLOT or &GR, etc.
With two pokes you can point this anywhere as well.
This is potentially very compact, the trouble is
getting values in and out. You can use ROM routines to
parse the BASIC lines afterward to get values into the FAC
and out of it. To return a value though you have to
manually look up the address of a variable and dereference
it, which takes a lot of code.
I had a working prototype that could
do this all but it was way bigger than the code being replaced.
The setup POKEs alone practically blew our byte
budget right from the start.
- Trying to use HTAB (horizontal position set) to set the initial machine
language value into CH in the
zero page. The trouble is HTAB mods it with 40 (because
of the 40-column screen, oops)
- Trying to have one large unified loop (instead of nested X/Y) and using
HTAB to 40-mod the value to get X position. In the end
this didn't save any bytes.
- Not really failed attempts, but here are some things
removed when something better came along:
- The table lookup init code only really needs 14 colors, so
we could re-use the loop initializer after the init to
set the D depth value in one fewer byte.
Disk image: starpath.dsk (140k)
Other VMW Software Demos
Other Apple2 Projects
Back to the VMW Software Productions Page