Decompiler Instructions

2018-05-07

The decompiler is written in Lisp and is driven through a text-based command-line interface (the Lisp command prompt).

It has a GUI extension for Windows, which is launched with the command '(gui)'.

The GUI has 3 parts:

  1. RevDasm Disassembler
  2. RevEngE Decompiler
  3. Append C Files

See Part 4 for Lookup Data Files.

Part 5 explains the types of variables.

Part 6 details goto's.

Below are instructions for all 6 parts.

Part 1: RevDasm Disassembler

Details: The Disassembler takes a binary (which begins at memory address 0), typically a 64KB binary from a 6502 machine. It also takes a start address. Its purpose is to create a series of text files which typically look like this:

Disassembly of Function 1CB7

1CB7: jsr 1CAE
1CBA: lda #20
1CBC: sta 5
1CBE: lda #0
1CC0: sta 4
1CC2: ldx #1F
1CC4: ldy #0
1CC6: sta (4):#y
1CC8: dey 
1CC9: bne 1CC6
1CCB: inc 5
1CCD: dex 
1CCE: bne 1CC4
1CD0: ldy #3F
1CD2: sta (4):#y
1CD4: dey 
1CD5: bpl 1CD2
1CD7: rts 

The 'Recurse' button says that, when we get to 'jsr 1CAE', we create a new text file which is the disassembly dump of the function starting at 1CAE.

Just a note: A disassembly dump ends when it gets an 'rts' instruction, AND the code is in the 'root path'.

The highlighted button above brings up this dialog box:

Clicking on 'Look in: assembly' at the top brings up this:

Click on 'jobs':

Now click 'zol.bin' and 'Open'.

Get this:

We now have the bin file needed, and we have to determine where to disassemble from in the bin file. In this case, we're going with a known entity: Function 6776 (hexadecimal).

Note the .bin file is typically a block of 6502 memory starting at address 0. So a 64KB file covering 0000 to FFFF is a typical .bin file. If there are any extra bytes at the start (WinVice puts 2 bytes there to indicate the start address), the .bin will not be aligned properly to 0.
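The alignment point above can be sketched in C. This is only an illustration, not part of RevDasm: the function names are made up, and it assumes the image is exactly 64KB, so a file of 64KB plus 2 bytes is taken to carry the 2 extra bytes at the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_SIZE 0x10000u   /* 64KB: addresses 0000 to FFFF */

/* Returns how many leading bytes to skip so that offset 0 of the rest
   is address 0, or -1 if the file size fits neither layout. */
int leading_bytes_to_skip(size_t file_size)
{
    if (file_size == IMAGE_SIZE)
        return 0;                 /* already aligned to 0 */
    if (file_size == IMAGE_SIZE + 2u)
        return 2;                 /* 2 extra bytes at the start */
    return -1;
}

/* Reads the byte stored at a 6502 address from a raw file buffer. */
uint8_t byte_at_address(const uint8_t *file, size_t file_size, uint16_t address)
{
    int skip = leading_bytes_to_skip(file_size);
    assert(file != NULL && skip >= 0);
    return file[(size_t)skip + address];
}
```

With an aligned file, address 6776 is simply byte 6776 (hex) of the file; with the 2 extra bytes it is byte 6778.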

All your disassembled ASM files go in the 'assembly' folder under your program name (say jobs/zol/assembly).

Finally, 'Recurse' means that if, while disassembling, we come across a new function call, it disassembles THAT function too.
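As a rough illustration of the Recurse idea, the C sketch below walks the bytes of one function, notes the target of every 'jsr', and stops at 'rts'. It is not RevDasm's code: the names scan_function and insn_length are made up for this example, the opcode table only covers the instructions that appear in the 1CB7 listing, and it treats the first 'rts' it reaches as the end instead of applying the 'root path' rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction lengths for the few 6502 opcodes used in the 1CB7 listing.
   A real disassembler needs the full opcode table. */
static int insn_length(uint8_t opcode)
{
    switch (opcode) {
    case 0x20:                          /* jsr abs */
        return 3;
    case 0xA9: case 0xA2: case 0xA0:    /* lda/ldx/ldy immediate */
    case 0x85: case 0xE6:               /* sta zp, inc zp */
    case 0x91:                          /* sta (zp),y */
    case 0xD0: case 0x10:               /* bne, bpl */
        return 2;
    case 0x88: case 0xCA: case 0x60:    /* dey, dex, rts */
        return 1;
    default:
        return 0;                       /* unknown to this sketch */
    }
}

/* Walks straight through the code at `start`, storing the target of each
   jsr in `targets` (at most `max_targets`) and the count in *n_targets.
   Returns the address of the closing rts, or -1 if it meets an opcode it
   does not know or runs off the end of the image. */
long scan_function(const uint8_t *mem, size_t mem_size, uint16_t start,
                   uint16_t *targets, size_t max_targets, size_t *n_targets)
{
    size_t pc = start;

    assert(mem != NULL && targets != NULL && n_targets != NULL);
    *n_targets = 0;
    while (pc < mem_size) {
        uint8_t op = mem[pc];
        int len = insn_length(op);

        if (len == 0 || pc + (size_t)len > mem_size)
            return -1;
        if (op == 0x60)
            return (long)pc;            /* rts: end of this dump */
        if (op == 0x20 && *n_targets < max_targets) {
            /* operand is a little-endian 16-bit address */
            targets[(*n_targets)++] =
                (uint16_t)(mem[pc + 1] | (mem[pc + 2] << 8));
        }
        pc += (size_t)len;
    }
    return -1;
}
```

Each address collected in targets would then be disassembled in turn, which is what 'Recurse' does with 'jsr 1CAE'.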

Part 2: RevEngE Decompiler

The decompiler basically takes its code from a disassembly text file (see 1CB7 above). It decompiles thus:

void ClearScreens() //usub_1CB7
{
int LLocal1;
int LLocal2;
int LLocal3;
ClearColourMap();
GUI_Draw_Pointer_hi = 32;
GUI_Draw_Pointer_lo = 0;
LLocal2 = 31;
do {
        LLocal1 = 0;
        do {
                *(GUI_Draw_Pointer)[LLocal1] = 0;
                LLocal1 = (LLocal1 - 1);
        } while (LLocal1 != 0);//LoopEndWh 1CC9
        GUI_Draw_Pointer_hi = (GUI_Draw_Pointer_hi + 1);
        LLocal2 = (LLocal2 - 1);
} while (LLocal2 != 0);//LoopEndWh 1CCE
LLocal3 = 63;
do {
        *(GUI_Draw_Pointer)[LLocal3] = 0;
        LLocal3 = (LLocal3 - 1);
} while (LLocal3 >= 0);//LoopEndWh 1CD5
return;// 1CD7
};

Click on the button 'Browse for ASM File'.

This button brings up this dialog box:

Clicking on 'Look in: assembly' at the top brings up this:

See how you navigate from the root of the drive down to where RevEngE6502 stores its assembly text files?

Finally, in the RevEngE Launcher dialogue, 'ASM File' is now in the box. Click 'Launch Revenge' to decompile the chosen assembly text file.

Options

Option: What it Does

Recurse: When a function is called from your decompiled ASM file, it decompiles the called ASM file as well. Basically, if 1004 calls 2000, the file zol.usub_2000.txt is then decompiled.

Interleave: Every line normally displayed is displayed after the address the line is on. This lets you load the interleaved data into your disassembler (like SuperSAM) so that it shows the decompiled code alongside the assembly code. For example:

  • void MainMenu() {
  • 19585
  • Global_90 = 1;
  • 19587
  • usub_1CFB();
  • 19605
  • usub_1D9A();
  • 19608
  • usub_439A();
  • 19611
  • usub_49E4();
  • 19614
  • DrawMenuLines();

Code to File: Write the code to a .code file instead of the main (black console) window. The .code files are stored in the program folder (for example 'jobs/zol'), alongside the 'assembly' folder.

CC65: Output C code in CC65-friendly format. CC65 is a popular C compiler for 6502 architectures.

Hex Globals: Convert the names of all global variables from base 10 (say Global_812) to hexadecimal (Global_32C).
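The Hex Globals renaming is a plain base conversion of the address in the name. A minimal C sketch follows, assuming a made-up helper name (hex_global_name) and upper-case hex digits as in the Global_32C example; it is not taken from RevEngE itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "Global_" followed by the address in upper-case hexadecimal.
   Returns the number of characters the full name needs, as snprintf does. */
int hex_global_name(unsigned address, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "Global_%X", address);
}
```

Calling hex_global_name(812, buf, sizeof buf) leaves "Global_32C" in buf, since 812 is 32C in hex.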

Part 3: Append C Files

Basically, this is easy.

In the text box 'Program Name (eg, zol)' put the text 'zol' then click 'Append C Files'.

It will concatenate all .c files in (for example) jobs/zol to a single 'zol.allcode.c' file.

Part 4: Lookup Data Files

Look in jobs/zol and find 'zol.lookup.txt'.

There are 3 sections - Functions, Globals and Locals.

Functions are names given to assembly functions, such as the one at 1CB7 above. Each entry maps a function address to a name:

[Functions]
19920 = DrawObjectTile

Globals are names given to global variables - which are any addresses within the first 64KB of a file:

[Globals]
155 = CurrentlySelectedMenuItem

Locals are names given to LLocals, BLocals or FLocals.

[Locals]
19296.FLocal19 = GetType
20393.BLocal1 = ThisVar
#x4b60.LLocal1 = Counter
#x4b60.LLocal2 = Counter2

If you want to identify a function address in hexadecimal, put '#x4b60' where you would put '19296' (which is 4b60 in hex).

If you find out what a given function, global, or local variable does, put it in the lookup.txt definitions file. It will auto-refresh every new item in the file when you decompile something, so you don't have to reload the decompiler program.
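The two address notations accepted in the lookup file (plain decimal, or hex with a '#x' prefix) can be read with one small routine. The C sketch below is an illustration only; parse_lookup_address is a made-up name and this is not how the Lisp code necessarily does it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parses the address at the start of a lookup key, for example
   "19296.FLocal19" (decimal) or "#x4b60.LLocal1" (hex).
   Parsing stops at the first character that is not a digit of the base,
   such as the '.' before the local's name.
   Returns the address, or -1 if no valid address is found. */
long parse_lookup_address(const char *key)
{
    char *end;
    long value;
    int base = 10;

    assert(key != NULL);
    if (strncmp(key, "#x", 2) == 0) {
        key += 2;                 /* skip the hex marker */
        base = 16;
    }
    value = strtol(key, &end, base);
    if (end == key || value < 0)
        return -1;
    return value;
}
```

Both parse_lookup_address("19296.FLocal19") and parse_lookup_address("#x4b60.LLocal1") give 19296, so the two spellings name the same function.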

Part 5: Variable Types

There are 5 types of variable in RevEngE:

  1. BLocals
  2. LLocals
  3. FLocals
  4. Globals
  5. Arithmetic Tuples

Our research indicates that these cover EVERY use in a standard C program.

The purpose of these Local variables is to filter out ALL register assignments.

BLocals

A BLocal - short for 'Branch Local' - is where a variable is initialised before a block of code (an 'If' block). See here:

	BLocal1 = Global_37;
	if ((Global_37 >= 97)) {
		BLocal1 = (Global_37 C:- 96);
	}//EndIF; 2ADE
	Global_44 = BLocal1;

  1. The BLocal is initialised
  2. The BLocal is used in an If block
  3. The BLocal is referenced AFTER the If block AND its value remains unchanged after the If block.

The BLocal sometimes doesn't need initialising before the If block, because it is assigned in every branch of the If block.

	if ((*(CreatureUnderCursor-HealthEtcAttributes)[2] < 144)) {
		BLocal1 = 1;
	} else {
		if ((*(CreatureUnderCursor-HealthEtcAttributes)[4] == 255)) {
			BLocal1 = 2;
		} else {
			BLocal1 = 3;
		}//EndIF; 67E2
	}//EndIF; 67E2
	DrawBorderTile(BLocal1);
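For comparison, here is a hand-written C function with the same shape: one local is assigned on every path through the If block and used once afterwards. This is only a guess at what such source might look like; pick_border_tile, attr2 and attr4 are made-up names standing in for the bytes at offsets 2 and 4.

```c
#include <assert.h>

/* One local receives a value in every branch, so it needs no
   initialisation before the If block. */
int pick_border_tile(int attr2, int attr4)
{
    int tile;                       /* plays the role of BLocal1 */

    assert(attr2 >= 0 && attr2 <= 255 && attr4 >= 0 && attr4 <= 255);
    if (attr2 < 144) {
        tile = 1;
    } else if (attr4 == 255) {
        tile = 2;
    } else {
        tile = 3;
    }
    return tile;                    /* the single use after the If block */
}
```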

FLocals

FLocal means 'Function Local'.

If a variable is assigned the return value of a called function (say "1CB7: jsr 1CAE") and that value is used more than once, the value is held in an FLocal.

Here is an example of a FLocal:

	void FLocal19 = usub_2BEC();
	if ((*(LocationDescriptionPrint)[(FLocal19 + 1)] < 0)) {
		Global_35 = (FLocal19 + 1);

See how it is used twice even though the function is called only once?

As a note, above, FLocal19 has a ref count of 3.

If we wrote 'usub_2BEC()' every time the return value from that function is used, we would call the function more than once, which would be bad.
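Why this matters is easy to show with a function that has a side effect. The C sketch below is a made-up example (next_value and both callers are hypothetical, not decompiler output): substituting the call at each use runs it twice and changes the result, while the FLocal style runs it once.

```c
#include <assert.h>

static const int table[] = { 10, 20, 30, 40 };
static int cursor = 0;            /* side effect: advances on every call */

/* Hypothetical stand-in for a called routine such as usub_2BEC. */
static int next_value(void)
{
    assert(cursor < 4);
    return table[cursor++];
}

/* Wrong expansion: the call is repeated at each use, so it runs twice. */
int use_by_recalling(void)
{
    int a = next_value() + 1;
    int b = next_value() + 1;
    return a + b;
}

/* FLocal style: call once, then reuse the stored value. */
int use_with_flocal(void)
{
    int FLocal1 = next_value();
    int a = FLocal1 + 1;
    int b = FLocal1 + 1;
    return a + b;
}
```

Starting from cursor 0, use_with_flocal() returns 22 and advances the cursor once; use_by_recalling() returns 32 and advances it twice.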

Here is a case where the return value is never used:

	void usub_7e92() {
	Global_96 = 24;
	Global_97 = 234;
	Global_98 = 0;
	Global_99 = 192;
	usub_82A7(Global_32455[Global_170]);
	if ((Global_32476[Global_170] != 0)) {
		usub_7F58(Global_32476[Global_170]);
		usub_82A7(Global_32455[Global_170]);
	}//EndIF; 7EB9
	Global_31815 = 0;
	Global_31816 = 216;
	return;// 7EC6
	};

And finally, a case with a single assignment, where no FLocal needs to be created:

	BLocal3 = (Global_40 + 1);
	if (((Global_40 + 1) == 40)) {
		if (Global_78 == 0)
			goto Label_2abc_0;
		BLocal2 = (Global_41 + 1);
		if (((Global_41 + 1) == 21)) {
			Global_25 = 13;
			if (13 != 0)
				goto Label_2abc_1;
			Label_2abc_0:;
			if (((Global_41 + 1) == 25)) {
				Label_2abc_1:;
				BLocal3 = usub_2B49();
				BLocal2 = Global_41;
			}//EndIF; 2B1A
		}//EndIF; 2B1A
		Global_41 = BLocal2;
		BLocal3 = 0;
	}//EndIF; 2B1E
	Global_40 = BLocal3;

Finally, to see the function ref counts from the last decompiled source file, type '*function-ref-counts*' into the Lisp command prompt.

We are using the 'zol.usub_6776.txt' assembly file for this example.

Each pair such as (4 . 1) means 4 is the function number and 1 is the ref count:

	#S(HASH-TABLE :TEST FASTHASH-EQUAL (1 . 1) (2 . 1) (3 . 1) (4 . 1))

As a side note, to see each function's name, its arguments and return type, and the address of the call (for instance, 'jsr E0A6' is at 26486 or hex 6776), type '*function-tuples*' into the Lisp command prompt:

	#S(HASH-TABLE :TEST FASTHASH-EQUAL
	(1 . ("usub_E0A6" ("") "void" NIL 26486))
	(2 . ("usub_4635" ("") "void" NIL 26505))
	(3 . ("usub_E5B0" ("") "void" NIL 26553))
	(4 . ("DrawBorderTile" ("BLocal1") "void" ("void") 26596)))

Finally, see those function tuples in action in 6776:

void usub_6776() {
int BLocal1;
usub_E0A6();
if ((*(TileBeingDrawn)[0] >= 140)) {
        SelectedCreatureYPos = CurrentDrawingTileY;
        SelectedCreatureXPos = CurrentDrawingTileX;
        usub_4635();
        CreatureUnderCursor-HealthEtcAttributes_lo = CreatureUnderCursorMapInfo_lo;
        CreatureUnderCursor-HealthEtcAttributes_hi = CreatureUnderCursorMapInfo_hi;
        if ((Global_206 != 0)) {
                if ((*(CreatureUnderCursor-HealthEtcAttributes)[1] == 128) || (*(CreatureUnderCursor-HealthEtcAttributes)[4] != Global_2307)) {
                        if ((*(Global_46)[1] & 128)) {
                                CreatureUnderCursor-HealthEtcAttributes_lo = Global_46;
                                CreatureUnderCursor-HealthEtcAttributes_hi = Global_47;
                        }//EndIF; 67B7
                }//EndIF; 67B7
        }//EndIF; 67B7
        usub_E5B0();
        BorderDrawParamColour = Global_26600[(*(CurrentlySelectedCreatureHealthEtcAttributes)[4] & 7)];
        if ((*(CreatureUnderCursor-HealthEtcAttributes)[2] < 144)) {
                BLocal1 = 1;
        } else {
                if ((*(CreatureUnderCursor-HealthEtcAttributes)[4] == 255)) {
                        BLocal1 = 2;
                } else {
                        BLocal1 = 3;
                }//EndIF; 67E2
        }//EndIF; 67E2
        DrawBorderTile(BLocal1);
}//EndIF; 67E4
return;// 67E7
};

LLocals

An LLocal is a loop variable.

Explanation soon.

Part 6: Goto's

Java doesn't support goto's in source code.

However, C and C++ do, and so does .NET.

We use goto's not because of a failure to handle flow control, but because our flow control can identify places where only goto's will work.

For instance, say you have 2 nested loops.

The only construct for leaving a loop early is 'break'. If the given condition is true, it jumps to the line directly after the end-loop branch.

But if there are 2 nested loops, the inner loop cannot jump out of both loops by using 'break'. For this case, we create a goto to the line after the end-loop branch of the outer loop.
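In hand-written C the same situation looks like the sketch below. The function and label names are made up for illustration; the point is only that 'break' would leave the inner loop alone, while the goto lands on the line after the outer loop.

```c
#include <assert.h>
#include <stddef.h>

/* Searches a rows x cols grid stored row by row.
   Returns the index (row * cols + col) of the first match, or -1. */
int find_cell(const int *cells, int rows, int cols, int wanted)
{
    int row, col;
    int found = -1;

    assert(cells != NULL);
    for (row = 0; row < rows; row++) {
        for (col = 0; col < cols; col++) {
            if (cells[row * cols + col] == wanted) {
                found = row * cols + col;
                /* 'break' here would only leave the inner loop */
                goto after_outer_loop;
            }
        }
    }
after_outer_loop:
    return found;
}
```

If 'break' were used instead, the outer loop would carry on and a later match would overwrite the first one.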


Next up is where people have used || and &&.

The compiler, instead of using conditional expressions and a single branch, puts all kinds of crazy branches in the code to deal with || and &&. (Note that 6502 goto-style branches written by a human, not generated by a compiler, are also correctly identified.)

We know when it's been doing this, because the crazy branches jump from within 1 conditional block into another.

We fix this by spotting the branches and turning them into goto's.
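To make the idea concrete, here is a small C sketch of a single || test written both ways. It is a hand-made illustration with made-up names, not actual decompiler output: the second function mimics the branch shape, where the first test jumps straight into the block and the second test jumps past it.

```c
#include <assert.h>

/* Source-level form: one condition built with ||. */
int either_structured(int a, int b)
{
    int result = 0;
    if (a != 0 || b != 0) {
        result = 1;
    }
    return result;
}

/* Branch-level form: the same logic expressed with goto's. */
int either_with_gotos(int a, int b)
{
    int result = 0;
    if (a != 0)
        goto body;          /* first test true: jump into the block */
    if (b == 0)
        goto skip;          /* both tests false: jump past the block */
body:
    result = 1;
skip:
    return result;
}

/* Checks that both forms agree on all four truth combinations. */
void check_forms_agree(void)
{
    int a, b;
    for (a = 0; a <= 1; a++)
        for (b = 0; b <= 1; b++)
            assert(either_structured(a, b) == either_with_gotos(a, b));
}
```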

The final note is, it's possible to read these goto's and turn them back into an || and && conditional block. We haven't done this yet, though 1 simple case (of ||) works pretty well at the moment.

Finally, about Java: it's possible to turn goto's into conditional blocks with flags stating what code to skip. We haven't done this yet.
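A rough idea of what the flag approach could look like is sketched here in C, with no goto at all. This is an assumption about the eventual output, not something the decompiler produces today; the names find_cell_with_flag and skip_rest are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Searches a rows x cols grid stored row by row, leaving both loops
   through a flag instead of a goto.
   Returns the index of the first match, or -1. */
int find_cell_with_flag(const int *cells, int rows, int cols, int wanted)
{
    int row, col;
    int found = -1;
    int skip_rest = 0;      /* flag: set when remaining code must be skipped */

    assert(cells != NULL);
    for (row = 0; row < rows && !skip_rest; row++) {
        for (col = 0; col < cols && !skip_rest; col++) {
            if (cells[row * cols + col] == wanted) {
                found = row * cols + col;
                skip_rest = 1;
            }
        }
    }
    return found;
}
```

The cost is an extra test of the flag in each loop condition.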