Bytecode

From Boktai Hacking Wiki
Revision as of 06:26, 15 October 2024 by Raphi

Script structure

  • Bytecode scripts are stored in the script directory.
  • A script must start with a block instruction. The script terminates either when that block ends, or with an explicit return instruction.
  • Blocks may only contain the following instructions as direct descendants: end, expression, control, and call.
  • If the bytecode interpreter expects a value, almost all instructions can be used except end, control, and call. To use a call or control instruction as a value, wrap it inside a block.
  • With the exception of pointers, all values the bytecode interpreter deals with are 32-bit words. The instructions u8 0x44, u16 0x44, and i32 0x44 are therefore all equivalent.

Instruction encoding

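The bytecode uses a variable-length instruction encoding. If the first byte is <= 0x0f, the first byte is the opcode itself. Otherwise, the top 4 bits of the first byte indicate the opcode, and the bottom 4 bits are part of the instruction's parameters. There are two exceptions: for operators (0xa0-0xbf) the whole byte selects the operator, and for compressed integers (0xc0-0xff) the bottom 6 bits encode the value.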
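As a rough illustration of this rule, the following Python sketch classifies an instruction by its first byte. The function name `instruction_kind` is hypothetical, the family names simply mirror the opcode sections on this page, and the sketch does not reject undefined opcodes such as 0x05 or 0xb8.

```python
def instruction_kind(first: int) -> str:
    """Classify an instruction by its first byte (hypothetical helper)."""
    if not 0x00 <= first <= 0xFF:
        raise ValueError("expected a single byte")
    if first <= 0x0F:
        return f"opcode 0x{first:02x}"  # the whole byte is the opcode
    if first >= 0xC0:
        return "compressed i32"         # bottom 6 bits hold value + 1
    if first >= 0xA0:
        return "operator"               # the whole byte selects the operator
    # 0x10-0x9f: the top 4 bits select the opcode family,
    # the bottom 4 bits belong to the instruction's parameters.
    families = {
        0x1: "pointer", 0x2: "indexed pointer", 0x3: "expression",
        0x4: "parameter", 0x5: "keyword", 0x6: "control",
        0x7: "call", 0x8: "block", 0x9: "variable",
    }
    return families[first >> 4]
```

For example, 0x74 (the first byte of "call 0xdad8") is classified as a call, and 0xc3 as a compressed i32.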

Opcode 0x00 (end)

Terminates control, call, and block instructions.
Parameters: None.
Example:

74d8da       call 0xdad8
c3               i32 0x2
00           end

Opcode 0x01 (i16)

Immediate signed 16-bit integer.
Parameters: Value in little-endian byte order (2 bytes).
Example:

018002       i16 0x280

Opcode 0x02 (u8)

Immediate unsigned 8-bit integer.
Parameters: Value (1 byte).
Example:

0246         u8 0x46

Opcode 0x03 (u8)

Unused alias of opcode 0x02.

Opcode 0x04 (u8)

Unused alias of opcode 0x02.

Opcode 0x06 (u16)

Immediate unsigned 16-bit integer.
Parameters: Value in little-endian byte order (2 bytes).
Example:

06d40c       u16 0xcd4

Opcode 0x07 (immediate string)

String/byte array directly embedded in the script data.
Parameters:

  1. Size of array in bytes (1 byte; the maximum size of an immediate string is therefore 255 bytes)
  2. N bytes of data

Example:

070967616d656f76657200    string "gameover\x00"

Opcode 0x08 (u16)

Alias of opcode 0x06. Unlike the other aliases, it is used in scripts.

Opcode 0x09 (i32)

Immediate signed 32-bit integer.
Parameters: Value in little-endian byte order (4 bytes).
Example:

09c0d40100   i32 0x1d4c0

Opcode 0x0a (i32)

Unused alias of 0x09.

Opcode 0x0d (i32)

Unused alias of 0x09.

Opcode 0x0e (string reference)

References a string in the Script directory.
Parameters: ID of the string in little-endian byte order (2 bytes).
Example:

0e850a       string-ref 0xa85
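The leaf instructions 0x01-0x0e can be decoded with a small table of payload formats. The Python sketch below is not taken from the game or from Bokasm; `decode_immediate` is a hypothetical name, and the alias opcodes are assumed to decode exactly like the opcodes they alias. It returns the value kind, the value, and the total instruction size in bytes.

```python
import struct

# Opcode -> (kind, struct format of the little-endian payload).
# Aliases (0x03, 0x04, 0x08, 0x0a, 0x0d) are assumed to behave like their base opcodes.
FIXED = {
    0x01: ("i16", "<h"),
    0x02: ("u8", "<B"), 0x03: ("u8", "<B"), 0x04: ("u8", "<B"),
    0x06: ("u16", "<H"), 0x08: ("u16", "<H"),
    0x09: ("i32", "<i"), 0x0A: ("i32", "<i"), 0x0D: ("i32", "<i"),
    0x0E: ("string-ref", "<H"),
}

def decode_immediate(code: bytes, pos: int = 0):
    """Return (kind, value, size_in_bytes) for the instruction at code[pos]."""
    op = code[pos]
    if op == 0x07:
        # Immediate string: 1 size byte followed by that many data bytes.
        size = code[pos + 1]
        return "string", bytes(code[pos + 2:pos + 2 + size]), 2 + size
    if op not in FIXED:
        raise ValueError(f"not an immediate opcode: 0x{op:02x}")
    kind, fmt = FIXED[op]
    (value,) = struct.unpack_from(fmt, code, pos + 1)
    return kind, value, 1 + struct.calcsize(fmt)
```

Decoding u8 0x44 (02 44), u16 0x44 (06 44 00), and i32 0x44 (09 44 00 00 00) yields the same value 0x44, which matches the equivalence described under Script structure.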

Opcode 0x10 (pointer)

Pointer to a value in GBA memory. The address is computed as address = base_address + offset.
Parameters:

  1. Data type of referenced value (bottom 4 bits of opcode):
    • 0x1: i16
    • 0x2: u8
    • 0x3: u8 (same as 0x2)
    • 0x4: bool (Warning: very slow in Boktai 1, because the bytecode interpreter uses two BIOS division calls to compute the address.)
    • 0x6: i16 (same as 0x1)
    • 0x8: u16 (Note: for indexed pointers, the game assumes that the element size is 4 bytes instead of 2 bytes. This pointer type is unused in Boktai 1 scripts.)
    • 0x9: i32
  2. Base address (1 byte):
    • Top 4 bits: See table below.
    • Bottom 4 bits: if type == bool, then the bottom bits indicate the bit number to access.
  3. Offset in big-endian byte order (2 bytes)

Base address:

Base byte                                 0x00                    0x10                    0x80
Contents                                  World struct            Scratch struct          Stat struct
Boktai 1                                  0x0203e800              0x0203f000              0x0203d800
Boktai 2                                  *(void**) 0x03004698    *(void**) 0x03004690    *(void**) 0x030046a0
Boktai 3                                  *(void**) 0x0203db08    *(void**) 0x0203e308    *(void**) 0x0203c508
Boktai 2/3 (valid only until soft reset)  0x0203da00              0x0203e200              0x0203c400

See each game's RAM map page for the structure of the world/scratch/stat structs.

Example:

1110018c     ptr i16, 0x203f18c
1412010c     ptr bool, 0x203f10c, bit=0x2
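The address computation for plain pointers can be sketched in Python for Boktai 1, whose base addresses in the table above are fixed. Boktai 2 and 3 would instead need to read the base pointers from RAM at run time, which this sketch does not attempt. `decode_pointer`, `BOKTAI1_BASES`, and `TYPE_NAMES` are hypothetical names.

```python
# Boktai 1 base addresses, keyed by the top 4 bits of the base byte.
BOKTAI1_BASES = {0x0: 0x0203E800, 0x1: 0x0203F000, 0x8: 0x0203D800}
# Data types, keyed by the bottom 4 bits of the opcode.
TYPE_NAMES = {0x1: "i16", 0x2: "u8", 0x3: "u8", 0x4: "bool",
              0x6: "i16", 0x8: "u16", 0x9: "i32"}

def decode_pointer(code: bytes, pos: int = 0):
    """Return (type_name, address, bit) for a Boktai 1 pointer instruction."""
    op, base_byte, hi, lo = code[pos:pos + 4]
    if op >> 4 != 0x1 or (op & 0x0F) not in TYPE_NAMES:
        raise ValueError("not a pointer instruction")
    if base_byte >> 4 not in BOKTAI1_BASES:
        raise ValueError("unknown base address")
    type_name = TYPE_NAMES[op & 0x0F]
    address = BOKTAI1_BASES[base_byte >> 4] + ((hi << 8) | lo)  # offset is big-endian
    bit = base_byte & 0x0F if type_name == "bool" else None    # bit number for bool only
    return type_name, address, bit
```

Applied to the two examples above, it yields ("i16", 0x203f18c, None) and ("bool", 0x203f10c, 2).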

Opcode 0x20 (indexed pointer)

Similar to pointer, but with an additional dynamic offset. The address is computed as address = base_address + offset + index*sizeof(data_type).
Note: For type == bool: bitnum = base_bit + index; address = base_address + offset + bitnum/8; bit = bitnum%8.
Parameters:

  1. Data type of referenced value (see pointer opcode for details)
  2. Base address (see pointer opcode for details)
  3. Offset (see pointer opcode for details)
  4. Array size (not used by the game)
  5. Index (instruction)

Example:

22100129     indexed-ptr u8, 0x203f129
c5           i32 0x4  ; array size
32           expr     ; index
42               param 0x2
a0           end-expr
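The two address formulas for indexed pointers can be written out as a small Python function. This is a sketch: `indexed_address` is a hypothetical name, the caller is assumed to have already resolved the base address and evaluated the index, and the index is assumed to be non-negative (Python's floor division would differ from truncating division for negative values). The 4-byte element size for u16 follows the note in the pointer section.

```python
# Element sizes used for indexed pointers. For u16 (type 0x8) the game assumes
# 4 bytes per element instead of 2, as noted in the pointer section.
ELEMENT_SIZE = {"i16": 2, "u8": 1, "u16": 4, "i32": 4}

def indexed_address(base_address: int, offset: int, type_name: str,
                    index: int, base_bit: int = 0):
    """Return (address, bit) for an indexed pointer; bit is None for non-bool types."""
    if index < 0:
        raise ValueError("sketch assumes a non-negative index")
    if type_name == "bool":
        bitnum = base_bit + index
        return base_address + offset + bitnum // 8, bitnum % 8
    return base_address + offset + index * ELEMENT_SIZE[type_name], None
```

For the example above, with param 0x2 equal to 3, the u8 element would be read from 0x0203f000 + 0x129 + 3 = 0x203f12c.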

Opcode 0x30 (expression)

Marks the start of an expression. Expressions are encoded in reverse polish notation (e.g. "4 * 8 + 1" would be encoded as "4 8 * 1 +"). Most instructions inside of an expression "push" values onto an operand stack, while operators "pop" arguments and may "push" a result back.
Parameters: Length of expression in bytes (container length).
The following general rules should be followed; it is unknown what happens if you violate them:

  • Expressions must be terminated with an end expression operator.
  • At the end of an expression, the operand stack should contain either 0 values (e.g. if the expression is a statement like a = b+c;), or exactly 1 value (e.g. if the expression is the condition of an if control structure).
  • call instructions should be wrapped inside of a block.
  • The operand stack should not underflow or overflow.
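To make the reverse polish notation concrete, here is a minimal Python sketch that assembles "4 * 8 + 1" (that is, "4 8 * 1 +") into an expression instruction. It is deliberately limited: `assemble_expression` is a hypothetical helper, it only emits compressed i32 literals (range -1 to 62) and four operators, and it only handles container lengths up to 0xc bytes.

```python
# Operator opcodes taken from the operator table on this page.
OPERATORS = {"+": 0xA4, "-": 0xA5, "*": 0xA6, "==": 0xAB}

def assemble_expression(tokens) -> bytes:
    """Assemble RPN tokens into an expr ... end-expr instruction (sketch)."""
    body = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            if not -1 <= tok <= 62:
                raise ValueError("sketch only emits compressed i32 literals")
            body.append(0xC0 | (tok + 1))  # compressed i32: value + 1 in the low 6 bits
        else:
            body.append(OPERATORS[tok])
    body.append(0xA0)                      # end expression
    if len(body) > 0x0C:
        raise ValueError("sketch only supports the 4-bit container length form")
    return bytes([0x30 | len(body)]) + bytes(body)
```

For the tokens [4, 8, "*", 1, "+"] it produces 36 c5 c9 a6 c2 a4 a0: the expr opcode with length 6, the literals 4 and 8, mul, the literal 1, add, and end-expr.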

Examples

Simple "statement"-type expression:

38           expr          ; v3 = (v2-v1) * -1
93               var 0x3
92               var 0x2
91               var 0x1
a5               sub 
c0               i32 -0x1
a6               mul 
b6               store 
a0           end-expr

Unary operators like "not" take a dummy first operand (the i32 0x0):

37           expr          ; !BIT(0x0203e954, 3)
c1               i32 0x0
14030154         ptr bool, 0x203e954, bit=0x3
a2               not 
a0           end-expr

Calling a function within an expression requires wrapping the call with a block:

39           expr          ; FUN_7644() == 0
85               block 
734476               call 0x7644
00                   end 
00               end 
c1               i32 0x0
ab               eq 
a0           end-expr

Opcode 0x40 (parameter)

Accesses a parameter of the current script.

  • For parameter numbers < 0xf: Parameter number is the bottom 4 bits of the opcode.
  • For parameter numbers >= 0xf: Bottom 4 bits of opcode are set to 0xf, parameter number is 0xf + following byte.
  • Parameter 0 might hold the return value of the last control/call instruction (unconfirmed).
  • Maximum number of parameters is unknown. The highest parameter Boktai 1 uses is param 0x10.

Examples:

4d           param 0xd
4f01         param 0x10
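The parameter number encoding can be sketched as a pair of Python helpers with hypothetical names. The encoder assumes that the byte after 0x4f is an unsigned 8-bit value, so it supports parameter numbers up to 0xf + 0xff; the real maximum is unknown.

```python
def decode_param(code: bytes, pos: int = 0):
    """Return (parameter_number, size_in_bytes) for a parameter instruction."""
    op = code[pos]
    if op >> 4 != 0x4:
        raise ValueError("not a parameter instruction")
    low = op & 0x0F
    if low < 0x0F:
        return low, 1                   # number stored in the opcode byte
    return 0x0F + code[pos + 1], 2      # 0x4f followed by (number - 0xf)

def encode_param(number: int) -> bytes:
    """Encode a parameter instruction (assumes 0 <= number <= 0xf + 0xff)."""
    if not 0 <= number <= 0x0F + 0xFF:
        raise ValueError("parameter number out of range for this sketch")
    if number < 0x0F:
        return bytes([0x40 | number])
    return bytes([0x4F, number - 0x0F])
```

Encoding 0x10 gives 4f 01, matching the second example above.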

Opcode 0x50 (keyword)

Keywords are used inside of control structures to define their behavior. Keywords contain child instructions, but are not terminated with an explicit "end" instruction.
Parameters:

  1. Length of keyword (container length)
  2. Keyword type (1 byte; usually a printable ASCII character hinting at the meaning, e.g. 0x63 = "c" = "case")

Example (see the control structure reference for details and more examples):

5274         keyword 0x74

Opcode 0x60 (control)

Marks the start of a control structure. Control structures must be terminated with an end instruction.
Parameters:

  1. Length of control structure in bytes (container length)
  2. Control structure type in little-endian byte order (2 bytes)
  3. Byte count until the next keyword instruction or the end instruction (whichever comes first)
    • Value <= 0x7f: 1 byte
    • Value > 0x7f: 2 bytes; top bit of 1st byte is set to 1; value = ((first & 0x7f)<<8) | second

Examples (see the control structure reference for details and more examples):

6d12ff220e       control 0x22ff, next_keyword=0xe
6e6d05860d82c5   control 0xd86, next_keyword=0x2c5
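The header of a control instruction combines a container length (see Container lengths), the 2-byte type, and the variable-length next-keyword count. The following Python sketch decodes all three; `decode_control_header` is a hypothetical name, and the function does not validate the rest of the structure.

```python
def decode_control_header(code: bytes, pos: int = 0):
    """Return (container_length, control_type, next_keyword, header_size)."""
    op = code[pos]
    if op >> 4 != 0x6:
        raise ValueError("not a control instruction")
    i = pos + 1
    low = op & 0x0F
    if low <= 0x0C:
        length = low                                  # length in the opcode byte
    else:
        size = low - 0x0C                             # 0xd/0xe/0xf -> 1/2/3 bytes
        length = int.from_bytes(code[i:i + size], "little")
        i += size
    control_type = int.from_bytes(code[i:i + 2], "little")
    i += 2
    first = code[i]
    if first <= 0x7F:                                 # short form: 1 byte
        next_keyword = first
        i += 1
    else:                                             # long form: 2 bytes, top bit set
        next_keyword = ((first & 0x7F) << 8) | code[i + 1]
        i += 2
    return length, control_type, next_keyword, i - pos
```

In the if/else example further down (6d28860d0c), the next-keyword count 0x0c covers the 12 bytes of condition expression and block that follow the header, up to the else-if keyword.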

Opcode 0x70 (call)

Calls another script. Supports passing arguments, and the return value can be used in expressions. Calls must be terminated with an end instruction.
Parameters:

  1. Length of call in bytes (container length)
  2. Script ID in little-endian byte order (2 bytes)
  3. 0-N arguments (instructions)

Example:

7d120aa5     call 0xa50a    ; FUN_a50a(7, *0x203f11c - 0x100, *0x203f11e)
c8               i32 0x7
39               expr 
1110011c             ptr i16, 0x203f11c
010001               i16 0x100
a5                   sub 
a0               end-expr 
1110011e         ptr i16, 0x203f11e
00           end

Opcode 0x80 (block)

Starts a block. Every script must be wrapped inside of a block. Blocks are also used to delimit the branches of if and switch control structures. Blocks must be terminated with an end instruction.
Parameters: Length of block in bytes (container length)
Example:

5963         case      ; case = keyword 'c'
c6           i32 0x5
86           block
34               expr
95                   var 0x5
c1                   i32 0x0
b6                   store 
a0               end-expr 
00           end

Opcode 0x90 (variable)

Accesses a variable of the current script. Maximum number of variables is unknown. The highest variable number Boktai 1 uses is var 0xb.
Parameters: Variable number (bottom 4 bits of opcode).
Examples:

97           var 0x7

Opcode 0xa0-0xbf (operator)

Operators perform computations and effects inside of an expression. All operators (except for "end expression") take two operands; unary operators take a dummy first operand which is popped from the stack but otherwise ignored. For examples, see the expression opcode. Opcode 0xb7 is defined but unused in Boktai 1 scripts. Opcodes 0xb8-0xbf are undefined and should not be used.

Opcode  Name                 Stack transition
0xa0    end expression
0xa1    negate               ..., dummy, value → ..., result
0xa2    logical not          ..., dummy, value → ..., result
0xa3    bitwise not          ..., dummy, value → ..., result
0xa4    add                  ..., value1, value2 → ..., result
0xa5    subtract             ..., minuend, subtrahend → ..., result
0xa6    multiply             ..., value1, value2 → ..., result
0xa7    divide               ..., dividend, divisor → ..., result
0xa8    modulo               ..., dividend, divisor → ..., result
0xa9    shift left           ..., value, shift → ..., result
0xaa    logical shift right  ..., value, shift → ..., result
0xab    equal                ..., value1, value2 → ..., result
0xac    not equal            ..., value1, value2 → ..., result
0xad    less than            ..., value1, value2 → ..., result
0xae    less or equal        ..., value1, value2 → ..., result
0xaf    greater than         ..., value1, value2 → ..., result
0xb0    greater or equal     ..., value1, value2 → ..., result
0xb1    bitwise or           ..., value1, value2 → ..., result
0xb2    bitwise and          ..., value1, value2 → ..., result
0xb3    bitwise xor          ..., value1, value2 → ..., result
0xb4    logical or           ..., value1, value2 → ..., result
0xb5    logical and          ..., value1, value2 → ..., result
0xb6    store                ..., destination, source → ...
0xb7    unused               ..., dummy, value → ..., value
0xb8+   undefined            ..., dummy, dummy → ..., zero
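The stack transitions in this table can be modeled with a small Python evaluator. This is a sketch under several assumptions: `eval_expression` is a hypothetical name; it only understands compressed i32 literals and the operators listed in `UNARY` and `BINARY`; results are wrapped to signed 32-bit words to match the 32-bit value model; and divide, modulo, and store are left out because their exact behavior (rounding, negative operands, side effects) is not documented here. Shifts by 32 or more may also differ from the game.

```python
def to_word(x: int) -> int:
    """Wrap to a signed 32-bit word (assumption about the interpreter's arithmetic)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

# Binary operators: ..., left, right -> ..., result
BINARY = {
    0xA4: lambda a, b: a + b,
    0xA5: lambda a, b: a - b,
    0xA6: lambda a, b: a * b,
    0xA9: lambda a, b: a << b,
    0xAA: lambda a, b: (a & 0xFFFFFFFF) >> b,   # logical shift right
    0xAB: lambda a, b: int(a == b),
    0xAC: lambda a, b: int(a != b),
    0xAD: lambda a, b: int(a < b),
    0xAE: lambda a, b: int(a <= b),
    0xAF: lambda a, b: int(a > b),
    0xB0: lambda a, b: int(a >= b),
    0xB1: lambda a, b: a | b,
    0xB2: lambda a, b: a & b,
    0xB3: lambda a, b: a ^ b,
    0xB4: lambda a, b: int(bool(a) or bool(b)),
    0xB5: lambda a, b: int(bool(a) and bool(b)),
}
# Unary operators: ..., dummy, value -> ..., result
UNARY = {0xA1: lambda v: -v, 0xA2: lambda v: int(not v), 0xA3: lambda v: ~v}

def eval_expression(code: bytes) -> list:
    """Evaluate an expression built only from compressed i32 literals and operators."""
    op = code[0]
    if op >> 4 != 0x3 or (op & 0x0F) > 0x0C:
        raise ValueError("sketch only handles expressions with a 4-bit length")
    stack = []
    for b in code[1:1 + (op & 0x0F)]:
        if b >= 0xC0:
            stack.append((b & 0x3F) - 1)        # compressed i32 literal
        elif b == 0xA0:
            break                               # end expression
        elif b in UNARY:
            value = stack.pop()
            stack.pop()                         # discard the dummy first operand
            stack.append(to_word(UNARY[b](value)))
        elif b in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(to_word(BINARY[b](left, right)))
        else:
            raise ValueError(f"unsupported byte 0x{b:02x}")
    return stack
```

Evaluating 36 c5 c9 a6 c2 a4 a0 ("4 8 * 1 +") leaves a single value, 33, on the operand stack.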

Opcode 0xc0-0xff (i32)

Immediate signed 32-bit integer, "compressed" encoding for integers in the range [-1; 62].
Formula: value = (opcode & 0x3f) - 1
Parameters: None.
Example:

d7      i32 0x16
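Because several opcodes can encode the same 32-bit value, an assembler has to pick one. The Python sketch below (a hypothetical `encode_int`, not Bokasm's actual logic) picks the shortest encoding listed on this page, assuming that the interpreter zero-extends u8 and u16 and sign-extends i16.

```python
import struct

def decode_compressed(opcode: int) -> int:
    """Value of a compressed i32 instruction (opcode 0xc0-0xff)."""
    return (opcode & 0x3F) - 1

def encode_int(value: int) -> bytes:
    """Encode an integer using the shortest form documented on this page."""
    if -1 <= value <= 62:
        return bytes([0xC0 | (value + 1)])                # compressed i32
    if 0 <= value <= 0xFF:
        return bytes([0x02, value])                       # u8
    if -0x8000 <= value <= 0x7FFF:
        return bytes([0x01]) + struct.pack("<h", value)   # i16
    if 0 <= value <= 0xFFFF:
        return bytes([0x06]) + struct.pack("<H", value)   # u16
    if -0x80000000 <= value <= 0x7FFFFFFF:
        return bytes([0x09]) + struct.pack("<i", value)   # i32
    raise ValueError("value does not fit in 32 bits")
```

For instance, 0x16 becomes d7 and 0x1d4c0 becomes 09 c0 d4 01 00, matching the examples on this page.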

Undefined opcodes

The following opcodes are undefined and should not be used: 0x05, 0x0b, 0x0c, 0x0f.

Container lengths

Instructions that start a "container"-like structure (expression, keyword, control, call, and block) include the length of the container in bytes as a parameter. This length covers the instruction's remaining parameters (e.g. the script ID of a call, the type and next-keyword count of a control structure, or the type of a keyword), all child instructions, and the terminating end or end expression instruction, if there is one. It does not include the opcode byte of the current instruction or the length bytes themselves. As an example, the following call instruction has a length of 0x10 bytes:

7d10dd56     call 0x56dd       //  2 bytes of call parameters (script id 0x56dd)
06bccc           u16 0xccbc    //  \
06e890           u16 0x90e8    //  |
c1               i32 0x0       //  |  13 bytes of child instructions
0842f1           u16 0xf142    //  |
08eb74           u16 0x74eb    //  /
00           end               //  1 byte of end instruction

The length is encoded like so:

  • Length <= 0xc bytes: Length is stored in the bottom 4 bits of the opcode byte.
  • Length <= 0xff bytes: Bottom 4 bits of opcode byte are set to 0xd, after the opcode byte is 1 byte containing the length.
  • Length <= 0xffff bytes: Bottom 4 bits of opcode byte are set to 0xe, after the opcode byte are 2 bytes containing the length in little-endian byte order.
  • Length <= 0xffffff bytes: Bottom 4 bits of opcode byte are set to 0xf, after the opcode byte are 3 bytes containing the length in little-endian byte order.
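These length rules translate directly into a pair of Python helpers. `encode_container` and `decode_container` are hypothetical names; the family argument is the top nibble of the opcode (0x30, 0x50, 0x60, 0x70, or 0x80), and the caller is responsible for computing the length as described above.

```python
FAMILIES = (0x30, 0x50, 0x60, 0x70, 0x80)  # expression, keyword, control, call, block

def encode_container(family: int, length: int) -> bytes:
    """Opcode byte plus length bytes for a container instruction."""
    if family not in FAMILIES:
        raise ValueError("not a container opcode family")
    if length < 0:
        raise ValueError("negative length")
    if length <= 0x0C:
        return bytes([family | length])                # length in the opcode byte
    for marker, size in ((0x0D, 1), (0x0E, 2), (0x0F, 3)):
        if length < 1 << (8 * size):
            return bytes([family | marker]) + length.to_bytes(size, "little")
    raise ValueError("length does not fit in 3 bytes")

def decode_container(code: bytes, pos: int = 0):
    """Return (family, length, header_size) for the container at code[pos]."""
    op = code[pos]
    low = op & 0x0F
    if low <= 0x0C:
        return op & 0xF0, low, 1
    size = low - 0x0C                                  # 0xd/0xe/0xf -> 1/2/3 bytes
    return op & 0xF0, int.from_bytes(code[pos + 1:pos + 1 + size], "little"), 1 + size
```

For the call example above, the 0x10-byte body does not fit in the 4-bit form, so encoding it with family 0x70 gives 7d 10.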

Control structures

This section documents the control structures supported by the control and keyword instructions. Where the grammar of a control structure is known, it is described in EBNF.

Control 0x0d86 (if/else if/else)

Conditional execution. Supports "else if" and "else" blocks, both optional.
Grammar:

if = control 0x0d86, value, block, { else if }, [ else ], end ;
else if = keyword 0x69, value, block ;
else = keyword 0x65, block ;

Example:

6d28860d0c   if 
34               expr 
42                   param 0x2
c1                   i32 0x0
ab                   eq 
a0               end-expr 
86               block 
745d9f               call 0x9f5d
c3                       i32 0x2
00                   end 
00               end 
5d0d69           else-if 
34               expr 
42                   param 0x2
c5                   i32 0x4
ab                   eq 
a0               end-expr 
86               block 
745d9f               call 0x9f5d
c2                       i32 0x1
00                   end 
00               end 
5865             else 
86               block 
745d9f               call 0x9f5d
c1                       i32 0x0
00                   end 
00               end 
00           end

Control 0x121f (call indirect)

Calls a script using a dynamic (not hardcoded) script ID.
Grammar:

call-indirect = control 0x121f, script_id, { arg }, end ;

Example:

691f1205         control 0x121f
1900021c             ptr i32, 0x203ea1c  ; script id
c1                   i32 0x0             ; param 1
00               end

Control 0x22ff (TODO)

Unknown.
Grammar:

unknown = control 0x22ff, { value }, end ;

Example:

6d0dff2209       control 0x22ff
0675d8               u16 0xd875
060000               u16 0x0
06f773               u16 0x73f7
00               end

Control 0x4a6f (switch/case/default)

Conditional execution. Supports a "default" case that runs if no explicit case matches. There is no explicit "break" statement as in other programming languages; once a case has matched and its code has been executed, no further cases are interpreted.
Grammar:

switch = control 0x4a6f, value, { case }, [ default ], end ;
case = keyword 0x63, value, block ;
default = keyword 0x64, block ;

Example:

6d2a6f4a06   switch 
35               expr 
198002dc             ptr i32, 0x203dadc
a0               end-expr 
5a63             case 
c5               i32 0x4
87               block 
653acd01             return 
c7                       i32 0x6
00                   end 
00               end 
5a63             case 
c7               i32 0x6
87               block 
653acd01             return 
c7                       i32 0x6
00                   end 
00               end 
5964             default 
87               block 
653acd01             return 
c1                       i32 0x0
00                   end 
00               end 
00           end

Control 0x9906 (engine call)

Similar to control 0xb745 but with a different calling convention. The 1st parameter is the ID of the engine function to call. The 2nd parameter will be passed to the engine function in r0. Both control 0x9906 and 0xb745 use the same dispatch table.

See each game's engine call page for a list of engine function IDs.

Control 0xb745 (engine call)

Generic "call engine" or "call native code" instruction. The 1st parameter is the ID of the engine function to call. The called function is then responsible for interpreting the remaining parameters and keywords.

See each game's engine call page for a list of engine function IDs.

Control 0xb96e (TODO)

Unknown. Possibly printf() to a debugger? Only used in Boktai 2 and 3; it might exist (but be unreferenced) in Boktai 1.

Example:

control 0xb96e
  string "quality\x00"
  param 0x5
end

Control 0xc8bb (load map)

Switches to another map.
Grammar:

load-map = control 0xc8bb, init_script_id, [ keyword 0x6e ], end ;

Usually, when a map is loaded, the stat & world structs are copied to another area of memory, and this copy is reloaded when using the fool's card, on death, or when saving the game. If keyword 0x6e (0x6e == 'n') is present, creation of this copy is suppressed. This is used, for example, in boss rooms, where the player should respawn outside of the boss room rather than inside it.

Example:

control 0xc8bb
  u16 0xb980
  keyword 'n'
end

Control 0xcd3a (return)

Returns from the current script, optionally with a return value. The return value can be almost anything, including an expression. Usage of a return instruction is optional; the script will implicitly return when the top-level block ends. If no return value is specified, 0 is implicitly returned.
Grammar:

return = control 0xcd3a, [ value ], end ;

Example:

653acd01     return
96               var 0x6
00           end

Control 0xd4cb (set zone callback)

Sets a callback that runs when something touches a zone. Used for many things, for example to create loading zones that trigger when Django touches an exit.

Grammar:

set-zone = control 0xd4cb, zone_id, unknown_1, kw_m, [ kw_w ], [ kw_s ], [ kw_b ], callback, end ;

kw_m = keyword 0x6d, unknown_2 ;
kw_w = keyword 0x77, { value } ; // likely parameters passed to the callback?
kw_s = keyword 0x73, { value } ; // unknown
kw_b = keyword 0x62, { value } ; // unknown

callback = keyword 0x65, block ;

zone_id refers to the zone ID defined in the map file. One of unknown_1 or unknown_2 is probably an "actor ID" that selects which type of actor can trigger the callback. The callback block always receives some parameters, but it is unknown which ones.

Example:

control 0xd4cb
  u16 0xa67b
  u16 0xf5eb
  keyword 'm'
  u16 0xdd2
  keyword 'e'
  block
    call 0xc108
    end
  end
end

Control 0xe43c (TODO)

Unknown.

Example:

control 0xe43c
 keyword 's'
end

Unused control codes

These have handlers in the bytecode interpreter, but are not used in any script:

  • Boktai 1: TODO
  • Boktai 2: 0x0bb3, 0x64c0, 0xc091
  • Boktai 3: TODO

Credits

  • Prof9 for documenting most opcodes in SolDec
  • Anonymous for reverse engineering the script index addresses in each game

Tools

  • Bokasm: Bytecode assembler and disassembler
  • SolDec: Bytecode decompiler