Snestistics is a tool that helps the user reverse engineer games for the Super Nintendo. In general, snestistics needs a ROM file (.sfc/.smc) and a custom trace file. The second entry in the tutorial series shows how to create a trace file and how to make snestistics generate an assembly listing.
Snestistics is an “emulator-guided” disassembler, which gives it an advantage over disassemblers that rely on static analysis alone. Because of this, one or more sessions must first be “recorded” in an emulator. Each such session yields a .trace file that represents a particular run of the game.
Command Line Reference
Each feature of snestistics has a few command line options of its own. These are shown in a table in the relevant section. A typical command line invocation looks like this:
snestistics -romfile myrom.sfc -autoannotate true -nmifirst 12 -nmilast 24 -asmoutfile output.asm
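When the same invocation is repeated with small variations, it can help to build the argument list programmatically. The following is only a sketch: the helper name `build_command` is hypothetical, and it assumes that every switch is passed as `-name value`, as in the example above. It builds the argument list without running anything.

```python
def build_command(switches, executable="snestistics"):
    """Return an argument list of the form: executable -name value ..."""
    args = [executable]
    for name, value in switches.items():
        # Booleans are written as the words true/false, as in "-autoannotate true".
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["-" + name, str(value)]
    return args


command = build_command({
    "romfile": "myrom.sfc",
    "autoannotate": True,
    "nmifirst": 12,
    "nmilast": 24,
    "asmoutfile": "output.asm",
})
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` if snestistics is on the path.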
ROM Support
Almost all commands require a ROM file to be specified. Most of the time it is enough to supply the name of the ROM file; the rest is inferred from other files (such as the trace file) or by auto-detection based on the content of the ROM file:
Switch Name | Short | Type | Description |
---|---|---|---|
romfile | r | input file name | ROM file. Currently only LoROM ROMs are supported. |
romsize | rs | integer | Size of ROM cartridge (without header). 0 means auto-detect. default: 0 |
Trace
A trace file describes what happened during a session in an emulator with a particular ROM file. See the second entry in the tutorial series to see how it is created. Here are the command line options:
Switch Name | Short | Type | Description |
---|---|---|---|
tracefile | t | input file name | Trace file from an emulation session. Multiple allowed for assembly source listing. |
Assembly Listing
If you supply a ROM file and a trace file (written by snes9x-snestistics) you can generate an assembly listing of the program. See the table below for the relevant switches. Annotations can then be added to beautify the assembly listing. The idea is to work with the assembly listing and the annotations in an iterative way, progressively building up an understanding of the inner workings of the game.
Switch Name | Short | Type | Description |
---|---|---|---|
reportoutfile | rp | output file name | Generate assembly report. Companion file to asmoutfile. |
asmoutfile | a | output file name | Generate assembly listing. |
asmheaderfile | ah | input file name | File content will be included in assembly listing. |
asmprintpc | apc | boolean | Print program counter in assembly source listing. default: true |
asmprintbytes | ab | boolean | Print opcode bytes in assembly source listing. default: true |
asmprintregistersizes | ars | boolean | Print registers sizes in assembly source listing. default: true |
asmprintdb | adb | boolean | Print data bank in assembly source listing. default: true |
asmprintdp | adp | boolean | Print direct page in assembly source listing. default: true |
asmlowercaseop | | boolean | Print lower-case opcode in assembly source listing. default: true |
asmcorrectwla | | boolean | Make sure the generated source compiles in WLA DX. default: false |
Annotations
Snestistics uses labels files to let the user add information about instructions. This has multiple purposes. The first is to let the user annotate and beautify the assembly listing to make it more comprehensible. The second is to guide the predict logic and the auto-annotate logic, as well as to help the trace log perform better. In this section we will show some examples. Multiple labels files are allowed, which can help you organize your reverse engineering effort.
Auto-annotations
In most games there are thousands of unknown pieces of code. In order to use the trace log successfully we need to give these ranges names, even if the names are anonymous and meaningless. For this there is a feature to create auto-annotations. A special labels file is specified that will be regenerated if it is missing (or if -autoannotate true is specified). The auto-annotate feature merges ranges of code that branch to each other. Anything between the range where the branch happened and the range where the branch ends up is merged together. It does not follow long branches (BRL) or jumps, unless a hint is given (see Hints).
Switch Name | Short | Type | Description |
---|---|---|---|
labelsfile | l | input file name | A file containing annotations. Custom file format. |
autolabelsfile | al | input/output file name | A file where automatically generated annotations are stored. It will be regenerated if missing or if autoannotate is specified. |
autoannotate | aa | boolean | Automatically generate labels in free space (space not used by symbols from regular labels files) and save them to autolabelsfile. This will also happen if the file specified by autolabelsfile is missing. default: false |
symbolfmaoutfile | sf | output file name | Generate symbols file in FMA format compatible with bsnes-plus. |
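The merge rule described above can be sketched as follows. This is an illustration of the idea only, not the code snestistics uses: the function name, the inclusive `(start, end)` tuples and the `(source, target)` branch pairs are all assumptions made for the example.

```python
from bisect import bisect_right


def merge_ranges_by_branches(ranges, branches):
    """ranges: sorted, non-overlapping (start, end) tuples, end inclusive.
    branches: (source_address, target_address) pairs."""
    starts = [start for start, _ in ranges]

    def index_of(address):
        # Index of the range containing address, or None if there is none.
        i = bisect_right(starts, address) - 1
        if i >= 0 and ranges[i][0] <= address <= ranges[i][1]:
            return i
        return None

    # reach[i] is the furthest range index that range i is merged with.
    reach = list(range(len(ranges)))
    for source, target in branches:
        a, b = index_of(source), index_of(target)
        if a is None or b is None:
            continue  # branch touches unknown code; ignored in this sketch
        low, high = min(a, b), max(a, b)
        reach[low] = max(reach[low], high)

    merged = []
    i = 0
    while i < len(ranges):
        end_index = reach[i]
        j = i
        # Everything between the two ends of a branch joins the same group.
        while j <= end_index:
            end_index = max(end_index, reach[j])
            j += 1
        merged.append((ranges[i][0], ranges[end_index][1]))
        i = end_index + 1
    return merged


ranges = [(0x808000, 0x80800F), (0x808020, 0x80802F),
          (0x808040, 0x80804F), (0x808100, 0x80810F)]
# One branch from the first range into the third merges the first three ranges.
print(merge_ranges_by_branches(ranges, [(0x808005, 0x808045)]))
```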
Labels Markup
Functions
A function is a range with a start address and an end address. These are easy to find from the assembly listing. Comments can be added with ; on lines before the function keyword. A line starting with # specifies a special comment, the use comment. It serves as a summary that is written wherever the function is referenced (say, at a jump). That way you get the summary at the site of the jump.
# Important function
; This function seems very important
; It does many things
function 801000 802000 MyFunction
Functions are not allowed to overlap.
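When a labels file grows large, overlaps are easy to introduce by accident. A check along these lines can catch them before running snestistics. It is a minimal sketch: the function name is hypothetical and it assumes the end address is inclusive, which the text above does not state.

```python
def find_overlaps(functions):
    """functions: list of (start, end, name) with end inclusive.
    Returns (earlier_name, later_name) for each range that overlaps an earlier one."""
    overlaps = []
    furthest = None  # (end, name) of the range reaching furthest so far
    for start, end, name in sorted(functions):
        if furthest is not None and start <= furthest[0]:
            overlaps.append((furthest[1], name))
        if furthest is None or end > furthest[0]:
            furthest = (end, name)
    return overlaps


functions = [
    (0x801000, 0x802000, "MyFunction"),
    (0x801800, 0x802800, "Other"),       # starts inside MyFunction
    (0x803000, 0x803010, "Third"),
]
print(find_overlaps(functions))
```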
Data
Currently data ranges act almost like functions. They are, however, allowed to be inside a function range (but may not overlap the function start/end).
# Big table of data
; Data seems to be compressed
; TODO: needs more investigation!
data 803000 804000 Table4
Labels
Labels are similar to functions but they do not specify a range. They can be used if there is no logical range to assign to a function. They provide all the features of functions apart from that:
# Important function
; This function seems very important
; It does many things
label 801000 MyFunction
Labels can exist within functions and inside data blocks but they can’t start at the same line as a function/data block starts/ends.
Comments
comment 801000 "Wow this really is an interesting function"
comment 801000 "I should write a book about this line!"
If you want to create multi-line comments the line keyword is also useful:
; Comment 1
; Comment 2
line 807000
Ignored lines
Lines starting with @ are ignored. They are handy when writing notes to yourself that should not be part of the assembly listing, or if you want to put some text at the top of the file for people to read:
@ TODO: Re-organize the labels file
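The markup shown so far can be read with a small line-based parser. The sketch below is not the parser snestistics uses; it only mirrors the examples above and assumes that addresses are hexadecimal, that fields are separated by spaces, and that # and ; lines attach to the next keyword line. The function name and the dictionary keys are made up for the example.

```python
def parse_labels(text):
    entries = []
    use_comment = None   # from the most recent "#" line
    comments = []        # from the most recent ";" lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("@"):
            continue  # blank and ignored lines
        if line.startswith("#"):
            use_comment = line[1:].strip()
            continue
        if line.startswith(";"):
            comments.append(line[1:].strip())
            continue
        keyword, _, rest = line.partition(" ")
        fields = rest.split()
        entry = {"kind": keyword, "use_comment": use_comment, "comments": comments}
        if keyword in ("function", "data"):
            entry.update(start=int(fields[0], 16), end=int(fields[1], 16), name=fields[2])
        elif keyword == "label":
            entry.update(start=int(fields[0], 16), name=fields[1])
        elif keyword == "line":
            entry.update(start=int(fields[0], 16))
        elif keyword == "comment":
            address, _, quoted = rest.partition(" ")
            entry.update(start=int(address, 16), text=quoted.strip().strip('"'))
        else:
            raise ValueError("unknown keyword: " + keyword)
        entries.append(entry)
        use_comment, comments = None, []  # comments belong to one entry only
    return entries


sample = """@ TODO: Re-organize the labels file
# Important function
; This function seems very important
; It does many things
function 801000 802000 MyFunction
comment 801000 "Wow this really is an interesting function"
"""
entries = parse_labels(sample)
print(entries[0]["name"], hex(entries[0]["start"]), hex(entries[0]["end"]))
```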
Hints
Hints are very similar to comments, but they are structured. Because they are structured, snestistics can read and understand them and use the information to make better choices when predicting instructions that were not part of a trace, when working out relationships between functions in the trace log, and when auto-annotating labels.
hint 081234 jump_is_jsr
hint 081234 jump_is_jsr_ish
hint 081234 jsr_is_jmp
hint 081234 branch_always
hint 081234 branch_never
hint 081234 predict_jump_merge
Hint | Description |
---|---|
jump_is_jsr | While the instruction is a jump (or a branch) in reality the code being called will return using RTS/RTL. The return address is prepared by the calling function in some non-standard way. |
jump_is_jsr_ish | Same as jump_is_jsr but the return address does not have to be the instruction after the jump instruction. This is quite common: the calling function wants to call a subroutine using JSR and then wants to continue in another part of itself. Which part depends on some state. Instead of remembering that state (on the stack or in a register) it figures out where to go next before the JSR, and can then forget the state that dictates where to go next. |
branch_always | This hint says that while the instruction is a branch (such as BNE) it will always do the jump. That is, the NE-“test” will always succeed. This keeps predict from trying to follow the fall-through case. |
branch_never | This hint says that while the instruction is a branch (such as BNE) it will always fall through. That is, the NE-“test” will always fail. This keeps predict from trying to follow the jump case. |
jsr_is_jmp | While this instruction is a JSR the called function will consume the return address and never return to it. The most common case here is that following the JSR there is a data table. By using JSR the pointer to the data table will be pushed to the stack, ready to be consumed by the function we are jumping to. |
annotate_merge | When annotating code, the jump at this address should be used to merge the function we are jumping from and the function we are jumping to into one function. By default BRL and JMP instructions are not used to merge ranges together, but this hint overrides the default. |
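The effect of branch_always and branch_never on prediction can be pictured like this. The sketch is an assumption about how such hints narrow the set of addresses worth following; the function name and parameters are made up and do not come from snestistics.

```python
def successors(pc, instruction_length, branch_target, hint=None):
    """Addresses a predictor could follow after a conditional branch at pc."""
    fall_through = pc + instruction_length
    if hint == "branch_always":
        return [branch_target]      # never follow the fall-through case
    if hint == "branch_never":
        return [fall_through]       # never follow the jump case
    return [fall_through, branch_target]


# A two-byte BNE at 808010 that targets 808020.
print([hex(a) for a in successors(0x808010, 2, 0x808020)])
print([hex(a) for a in successors(0x808010, 2, 0x808020, "branch_always")])
```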
Code Prediction (Predict)
Not all code is executed during a typical game session. It is very common to see a branch that jumps a few instructions forward, with code just after it for a case that wasn’t triggered by the emulation session. To help fill in these gaps there is the predict feature. It explores a bit further to see what would have happened if the branch hadn’t been taken, and so on. It sometimes becomes confused and needs some help from hints, but in general it is very useful for getting cleaner source code; it makes it very easy to spot data inside a function.
Currently this affects the assembly listing and the auto-annotation feature. The latter is because when more code is available more and larger functions can be identified.
NOTE: By default prediction only works within annotated functions.
Switch Name | Short | Type | Description |
---|---|---|---|
predict | p | enumeration | Specifies where snestistics is allowed to predict code. This is currently only used for the assembly listing. Values: never (no prediction); functions (only predict within annotated functions, default); everywhere (predict as much as possible). |
Scripting
Some features of snestistics can only be reached from scripts. Currently snestistics only supports scripts written in the scripting language Squirrel. See Trace Log and Rewind for how to enable scripts there. See the Scripting Reference to find out what functions the different objects support.
Switch Name | Short | Type | Description |
---|---|---|---|
scriptfile | s | input file name | A squirrel script. See user guide for scripting reference. |
Trace Log
The trace log shows what functions are being called. An NMI (basically a frame) range can be supplied to limit the log. This feature supports adding a script to enhance the log.
Switch Name | Short | Type | Description |
---|---|---|---|
nmifirst | n0 | integer | First NMI to consider for trace log. default: 0 |
nmilast | n1 | integer | Last NMI to consider for trace log. default: 0 |
tracelogoutfile | tl | output file name | Generate trace log. The NMI range can be controlled using nmifirst and nmilast. Custom printing can be done using scripting. |
When the trace log feature is enabled on the command line and a script is given, snestistics expects the script to be a Squirrel script with the following functions:
trace_log_init(replay)
Replay replay: a replay object
returns: nothing
This function is used for setup. Breakpoints can be set on the replay object, and global Squirrel state can be constructed if the user wants that.
trace_log_parameter_printer(replay, report)
Replay replay: a replay object
ReportWriter report: a report writer object
returns: nothing
This function is called whenever the trace log hits a program counter that it has a breakpoint set for. The trace log system itself will print the name of the function and determine indentation, but this is a chance to do additional printing on some functions that are under investigation.
Rewind
This feature allows generation of a visual report depicting the flow of data through the processor. This can sometimes be very helpful for tracking where the values passed to a function are coming from. This feature requires scripting in order to run.
NOTE: This feature is currently being redeveloped since it does not understand DMA to (or writes to) $2180. It also needs scripting support to be controllable on the command line.
Switch Name | Short | Type | Description |
---|---|---|---|
rewindoutfile | rw | output file name | Generate rewind report in dot file format. Use graphviz to generate PDF/PNG report. |
Scripting Reference
Currently only two objects exist that the script can interact with. We expect this to increase in the future as more parts of snestistics are exposed to scripting.
Replay
These are the operations that can be performed on an instance of the Replay class.
replay.set_breakpoint(pc)
integer pc: the program counter to set a breakpoint at
returns: nothing
replay.set_breakpoint_range(pc_start, pc_end)
integer pc_start: the first program counter to set a breakpoint at
integer pc_end: the last program counter to set a breakpoint at
returns: nothing
replay.read_byte(address)
integer address: 24-bit address specifying where to read a byte (8-bit)
returns: integer
replay.read_word(address)
integer address: 24-bit address specifying where to read a word (16-bit)
returns: integer
replay.read_long(address)
integer address: 24-bit address specifying where to read a long (24-bit)
returns: integer
replay.pc()
returns: current program counter
replay.a()
returns: current value of register a (16-bit)
replay.al()
returns: current low byte of register a (8-bit)
replay.ah()
returns: current high byte of register a (8-bit)
replay.x()
returns: current value of register x (16-bit)
replay.xl()
returns: current low byte of register x (8-bit)
replay.xh()
returns: current high byte of register x (8-bit)
replay.y()
returns: current value of register y (16-bit)
replay.yl()
returns: current low byte of register y (8-bit)
replay.yh()
returns: current high byte of register y (8-bit)
replay.p()
returns: current value of status register (16-bit)
replay.s()
returns: current value of stack register (16-bit)
replay.dp()
returns: current value of direct page register (16-bit)
replay.db()
returns: current value of data bank register (8-bit)
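The relationship between the byte, word and long reads, and between a 16-bit register and its two halves, can be modelled outside of snestistics. The sketch below is written in Python purely for illustration. It assumes the little-endian byte order of the 65816 and a simple wrap at 24 bits; how the real Replay object treats bank boundaries is not covered here. All names are hypothetical stand-ins, not the scripting API.

```python
def read_word(read_byte, address):
    """Compose a 16-bit value from two consecutive bytes, low byte first."""
    low = read_byte(address)
    high = read_byte((address + 1) & 0xFFFFFF)
    return low | (high << 8)


def read_long(read_byte, address):
    """Compose a 24-bit value: a 16-bit word followed by a bank byte."""
    bank = read_byte((address + 2) & 0xFFFFFF)
    return read_word(read_byte, address) | (bank << 16)


def split_register(value):
    """Return (low byte, high byte) of a 16-bit register value."""
    return value & 0xFF, (value >> 8) & 0xFF


# A tiny stand-in for memory: a 24-bit pointer to 801234 stored at 7E0010.
memory = {0x7E0010: 0x34, 0x7E0011: 0x12, 0x7E0012: 0x80}


def read_byte(address):
    return memory.get(address, 0)


print(hex(read_word(read_byte, 0x7E0010)))
print(hex(read_long(read_byte, 0x7E0010)))
print([hex(b) for b in split_register(0x1234)])
```

In these terms, al and ah correspond to the first and second value returned by `split_register` for a, and likewise for x and y.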
ReportWriter
These are the operations that can be performed on an instance of the ReportWriter class:
report_writer.print(str)
str: string to write in the report
returns: nothing
Will write the string str to the report at the current indentation level. Adds a newline automatically.