Making sense of the messExtracting the game dataSo where I am heading here with the reverse engineering is to start from a ROM dump. I’ve dumped the cartridge myself. Those who know me from some retro gaming forums may know I have an obsession for old Sega 8-bit games. That’s the sort of thing I do to my cartridges:
(picture added for the dramatic effect)
I have different setup for different use cases. Pictured above, there’s a prototype of R-Type, it comes on a bulky development board that the developers (Compile) used in 1987. I plugged the board into a homemade adapter. On the back of the console there’s an extension port that was meant to accessories but has never been used. We put a gender adapter here and then a cartridge modified to carry a simple software I wrote for the Master System, in Z80 assembly. The software run on the console, copies itself to RAM then execute from RAM. The software in RAM now reads the game cartridge data (here R-Type prototype, but same works with Wonder Boy III The Dragon’s Trap cartridge) and copies it back on the custom writable cartridge that’s plugged on the back. The writable cartridge is battery backed. I unplug everything and read can read the writable cartridge back on the PC using a custom made cart reader (I didn’t make the hardware myself, I would probably hurt myself with a soldering iron).
So I’ve extracted the data from the original game. It is your typical Master System cartridge. What’s we’ve got essentially is a 256 Kilobyte file. I’ll call it wb3.sms. You can find this file on the internet frankly, although for the sake of it I’m showing the full process here.
The file in an hex editor. A thousand pages of number. That’s our game!
Toward making this a little more readable..Here are simplified specs of the Master System just to understand what we are talking about.
CPU: Z80 at 3.57 Mhz
RAM: 8 KB
VRAM: 16 KB
ROM: 256 KB (for this game)
You can find more detailed documentation
here but you don’t really need to for the purpose of following this blog. Unless you are curious or at ease with the hardware info. The Zilog Z80 was a popular CPU first made in 1976 but used through the eighties in variety of game and computer systems: many arcade games, Amstrad CPC, Game Boy, Master System etc.
(A Zilog Z80 CPU)
The way is works is the cartridge contains both the code and the data interleaved. There’s no file-system as on a modern OS, but the game knows where its data are. There’s no easy way to tell which byte are data and which bytes are code. Being able to differentiate code from data would be useful though!
Code starts at offset 0000 which is the first byte of the ROM here. Using a Z80 disassembler or an emulator you can follow on this code. What a disassembler would typically do here: it will start from the first byte at 0000 and decode it into a readable instruction, then on to the next. Here’s a reference to the Z80 instruction set:
http://clrhome.org/tableA disassembler knows about all the branching instructions and follows every possible branch to map all possible code paths it finds and decode them into a readable instruction format. The reason you have to do that and can’t just decode every byte of the ROM sequentially is that Z80 opcodes are of variable size. Some instructions are encoded in 1 byte, some in 2 bytes, some in 3 or 4 bytes. Based on the 1st byte you can tell if you need a second byte, etc. So you need to be at a correct position to decode the following bytes as code. If you start disassembling in the middle of a 3 bytes instruction, they will be interpreted as a completely different stream of instructions. On other architectures such as the ARM family of CPU widely used in mobile devices nowadays, all instructions are encoded in 4 bytes so this issue is made easier.
Here an example of disassembling a few instructions. Onward I will sometimes use the form FFh or $FF to denote an hexadecimal value because this is how they are typically expressed in old assemblers and disassemblers. Modern languages such as C or C++ would use 0xFF. Sometimes I will omit the prefix/suffix to make text more legible.
From ROM address 028Fh, the following 13 bytes values are DB BF CB 7F 28 FA 10 F8 3E FF D3 3F C9
This disassemble into:
Address -- Bytes -------- Decoded instruction -- My comments
028F: DB BF ina (BFh) ; read from hardware input port BFh into A register
0291: CB 7F bit 7,a ; test if bit 7 of A register is set
0293: 28 FA jr z,-06h (028Fh) ; if bit of 0, jump back to 28F
The first instruction, ina (BFh) read from hardware input port BFh (which happen to be reading the status byte of the video chip, according to this documentation
http://www.smspower.org/uploads/Development/smstech-20021112.txt ). ina (BFh) is encoded in 2 bytes as DB BF. If instead we started decoding start from the following byte 0x290 (0x28F+1 = 0x290) we would get:
0290: BF cp a ; compare A register with A register
0291: CB 7F bit 7,a ; test if bit 7 of A register is set
0293: 28 FA jr z,-06h (028Fh) ; if bit of 0, jump back to 28F
When taken in isolation, the second byte of what was our ina (BFh) instruction became a cp a instruction!
None are “incorrect”, but we can easy tell that the game is meant to execute from 028Fh and not from 0290h because:
There’s an jump back to 028Fh, so that’s pretty explicit!
Doing cp a followed by bit 7,a wouldn’t make much sense, since both are test instructions that affect the Flags register, and the result of cp a instead tested for.
In context it become usually easy to tell what is a correct / meaningful sequence of instruction, something which a disassembler may have a hard time to do in some situation.
Now let’s see that comes after… (don’t worry about understanding the details of this code)
029C: 21 AC 02 ld hl,02ACh ; set HL register (16 bits) to 02ACh
029F: 11 6C CF ld de,CF6Ch ; set DE register (16 bits) to CF6Ch
02A2: 06 14 ld b,14h ; set B register (8 bits) to 14h
02A4: 7E ld a,(hl) ; load from the address pointed to by HL into A
02A5: EF rst 28h ; call the function at 28H
02A6: 12 ld (de),a ; load A register into the address pointer to by DE
02A7: 23 inc hl ; increment HL
02A8: 13 inc de ; increment DE
02A9: 10 F9 djnz -07h (02A4h) ; decrement B, if B is not zero, jump back to 2A4H
02AB: C9 ret ; return;
02AC: 26 80 ld h,80h ; <-- from there the code doesn’t make much sense
02AE: A2 and d ; also, we are right after a “ret” instruction
02AF: 81 add c ; which is an unconditional “return from call” instruction
02B0: FF rst 38h ; so we can easily guess that those bytes are not code
02B1: 82 add d ; they are data!
02B2: FF rst 38h ;
Here this disassembler choose to continue disassembly after the RET instruction at 02ABh in spite of the instruction being an unconditional return. The code after RET that won’t be executed, unless some other parts of the program jumps directly into it. Now, by looking at the “instructions” coming afterward we can guess that they aren’t meant to be executed by the CPU, but are rather data bytes used for another purpose.
Now, most Z80 disassemblers have a problem. Some of the branch are dynamic, created by the game based on state variables manipulations, and the disassembler can’t really tell about them. A static branch would be “if this register is zero, then jump execution to this address”. here we know about the address in question. A dynamic branch would be “take this address and add 2 times to the content of register HL”, then jump to that address. Because the register content can be computed in any number of way and come from any source it’s hard for a disassembler to find and follow every path of code. The code that I’ve looked at for The Dragon’s Trap seemed to use lots of dynamic branches and jump tables and most disassemblers would miss a lot of the code.
So we have two solutions here: use an emulator and inspect the game as it is running, or find a better disassembler.
Using an emulatorI’ve made this Sega Master System emulator called MEKA a while ago. So i know how the hardware works which is saving me lots of time here. In fact the emulator was called MEKA in homage to this very game! (Meka being the name of the first dragon that curses you into lizard form). So my love for this game runs from long ago! The emulator is old clunky software but it provides a bunch of very useful debugging and hacking tools which are perfect to assist in reverse engineering this game.
Using the emulator it shows me a short disassembled view based on the current CPU instruction. I can step through the code and set breakpoints. It’s still extremely raw and abstract, there’s literally tens of thousands of instructions and it’s hard to make sense out of them out right now. However we will still use the emulator and its tools extensively and we’ll see how we can make this mess clearer later.
Finding a better disassemblerSo I mentioned dynamic branching that most Z80 disassemblers can’t follow. There are a lot of disassembler available (a basic one is fairly easy to write) and it became clear that if I gave them the compiled code for The Dragon’s Trap they output a poor result (not separating code/data, or missing out on code).
As it turns out, someone going by the nickname of Calindro has been working on a project called Emulicious (
http://emulicious.net ). It is an emulator but it also aims to provide great debugging/hacking features and include a powerful disassembler. The idea is that his disassembler is leveraging emulation techniques to provide a more accurate disassembly of the game. It also knows about the machine (its memory map, its I/O port map) and uses lots of trick to be able to map where all the code is located.
When I first tried Emulicious it didn’t really work out perfectly and missed a lot of the code, but going back and forth with Calindro he implemented a few fixes and improvements to his disassembler and it largely improved the situation. Even though, it doesn’t manage to find and isolate all bits of code vs data automatically. But through a configuration file it can be provided a few code “entry points” that we’ve discovered by studying the partial disassembly and the behavior in emulator. So e.g. I tell the disassembler that I know for a fact there is code at address 696Ah and it will disassemble from this location in addition to the other locations it found itself. Using this data Emulicious can provide me with a fairly accurate disassembly that separates code from data.
Here’s the same chunk of code output by Emulicious.
-: ; <-- this is a local anonymous label
in a, ($BF)
bit 7, a
jr z, - ; ..so when you read the jump execution you know where it is going
djnz -
+++++:
ld a, $FF
out ($3F), a
ret
_LABEL_029C_: ; <-- it is a named label because it has detected that
ld hl, _DATA_2AC_ ; other code is “calling” into that address
ld de, _RAM_CF6C_ ; <-- every location in the RAM range also gets a label
ld b, $14
-:
ld a, (hl)
rst $28 ; _LABEL_0028_
ld (de), a
inc hl
inc de
djnz -
ret
; Data from 2AC to 2BF (20 bytes) ; <-- It has figured out that those bytes are data
_DATA_2AC_: ; because no code is ever executed here
.db $26 $80 $A2 $81 $FF $82 $FF $83 $FF $84 $FF $85 $FB $86 $00 $88
.db $00 $89 $00 $8A
Note that the syntax is a little different. Here our “ina (BFh)” read as “in a, ($BF)”. They are totally the same instructions but different assemblers/disassembler uses slightly different syntax.
It is much better! The reason the disassembler is turning addresses (such as 029C or CF6C) into a named label (such as _LABEL_029C_ or _RAM_CF6C_) is to give the user a chance to rename them easily in a text editor. In fact, a lot of the reverse engineering work will consists in giving legible names to all those obscure labels to lift some of all that fog. Once we learn that RAM address CF41h contains the amount of boomerang carried by the players, we can rename the _RAM_CF41_ label into something like NumberOfBoomerang all across the generated disassembly. Then all reference to this variable will make the code more readable.
We can even use a compatible assembler to turn back the disassembly listing into code. If we haven’t made modification to any of the instructions, the output will be identical to what we started with. The label themselves are just references for the programmers and the assembler to convert into addresses, but their name aren’t part of the compiled code. If the disassembler has managed to turn all addresses and references into named labels (which it seems to have done successfully), then we can also move or add instructions into the disassembly and rebuild the game with modification. The chances are high that will still run without bugs. If the game code does some funky address computations expecting known/hardcoded addresses that the disassembler couldn’t catch, moving a single byte of the code will likely break the game until we manage to patch the offending code.
Attached is the disassembly output by Emulicious. It is 28000 lines of instructions + short data blocks for the main file + remaining 181 KB of unknown binary data left untouched. We are going to study those instructions and the data, and figure out if we can turn it into a modern game!
disassembly.zip (180 KB)
Let me know if there’s something I can get into details, if it’s unclear, or just insane!