Synergistic Processing Unit (SPU)

From PS3 Developer wiki

Revision as of 11:13, 12 April 2011

SPU Application Binary Interface Specification

All of the information below is taken from this PDF.

Function Calling Sequence

The standard calling sequence requirements apply only to global functions. Local functions that are not reachable from other compilation units may use different conventions; however, using non-standard calling sequences is not recommended.

Register Usage

Register | Status | Usage
R0 (LR) | Dedicated | Return Address / Link Register. This register contains the address to which a called function normally returns. It is volatile across function calls and must be saved by a non-leaf function.
R1 (SP) | Dedicated | Stack pointer information. Word element 0 of the SP register contains the current stack pointer. The stack pointer is always 16-byte aligned, and it must always point to the lowest allocated valid stack frame and grow towards low addresses. The contents of the word at the stack-frame address always point to the previous allocated stack frame. Word element 1 of the SP register contains the number of bytes of Available Stack Space.
R2 | Volatile | Environment pointer. This register is used as an environment pointer for languages that require one.
R3-R74 | Volatile | First 72 quadwords of a function’s argument list and its return value.
R75-R79 | Volatile | Scratch Registers.
R80-R127 | Non-volatile | Local variable registers. These must be preserved across function calls.
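
As a quick way to check the register classes above, the table can be expressed as a small lookup. This is only an illustrative sketch in Python; the function name register_role is made up for this page and is not part of the ABI or any SDK.

```python
def register_role(n):
    """Return (status, usage) for SPU register number n, following the table above.

    Hypothetical helper for illustration only.
    """
    if not 0 <= n <= 127:
        raise ValueError("SPU registers are numbered 0 to 127")
    if n == 0:
        return ("Dedicated", "Link register (LR)")
    if n == 1:
        return ("Dedicated", "Stack pointer (SP)")
    if n == 2:
        return ("Volatile", "Environment pointer")
    if n <= 74:
        return ("Volatile", "Argument / return value")
    if n <= 79:
        return ("Volatile", "Scratch")
    return ("Non-volatile", "Local variable")


# Registers a callee must preserve across calls (R80-R127).
callee_saved = [n for n in range(128) if register_role(n)[0] == "Non-volatile"]
print(len(callee_saved))  # 48
```

The count of 48 non-volatile registers matches the "max. 48 * 16 bytes" limit given for the General Register Save Area in the stack frame layout.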

Stack Frame Layout

In addition to using registers, each function call may have a stack frame on the runtime stack. The runtime stack grows downward from high addresses.

     +-----------------------------+ High Address
+--->| Back Chain                  |
|    +-----------------------------+
|    | Register Argument Save Area |
|    +-----------------------------+
|    | General Register Save Area  |
|    | (max. 48 * 16 bytes)        |
|    +-----------------------------+
|    | Local Variable Space        |
|    +-----------------------------+
|    | Parameter List Area         |
|    +-----------------------------+
|    | Link Register Save Area     |
|    +-----------------------------+
+----| Back Chain                  |
     +-----------------------------+ Low Address <---- Stack Pointer (SP/R1)
     <--------- 128 bits ---------->

In the above figure, SP denotes the stack pointer (word element 0 of the general-purpose register R1) of the called function after it has executed the code that establishes its stack frame.
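
The back-chain rule can be modelled in a few lines of Python. This is a rough sketch, not SPU code: local storage is represented as a dict mapping addresses to words, and the names push_frame and pop_frame are invented for this example. Treating word element 1 of SP as a running counter that shrinks with each frame is an assumption based on the register table, not something stated for the frame layout itself.

```python
QUADWORD = 16  # bytes; the stack pointer must stay 16-byte aligned


def push_frame(memory, sp, available, frame_size):
    """Allocate a frame below sp and return the new (sp, available).

    memory is a dict of address -> word (a stand-in for local storage).
    """
    if frame_size <= 0 or frame_size % QUADWORD != 0:
        raise ValueError("frame size must be a positive multiple of 16")
    if frame_size > available:
        raise MemoryError("not enough available stack space")
    new_sp = sp - frame_size      # the stack grows towards low addresses
    memory[new_sp] = sp           # Back Chain: points to the previous frame
    return new_sp, available - frame_size


def pop_frame(memory, sp, available):
    """Release the current frame by following its Back Chain word."""
    previous = memory[sp]
    return previous, available + (previous - sp)


mem = {}
sp, avail = push_frame(mem, 0x3FFD0, 0x1000, 48)
print(hex(sp), hex(mem[sp]))  # 0x3ffa0 0x3ffd0
```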

Argument Passing

For the SPU, up to 72 quadwords are passed in general-purpose registers, loaded sequentially into registers R3 through R74. If fewer than 72 argument registers are needed, the unneeded registers are not loaded, and any values that they contain when entering the called function are undefined. When arguments passed to a callee function will not fit into these 72 registers, the caller function must allocate additional space for these arguments in its Parameter List Area.
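
The split between register arguments and the Parameter List Area can be sketched as follows. This is an illustrative Python model only; assign_arguments is a hypothetical name, and it assumes every argument occupies exactly one quadword.

```python
FIRST_ARG_REG = 3
LAST_ARG_REG = 74


def assign_arguments(num_quadwords):
    """Return (registers used, quadwords left for the Parameter List Area)."""
    if num_quadwords < 0:
        raise ValueError("argument count cannot be negative")
    max_in_regs = LAST_ARG_REG - FIRST_ARG_REG + 1  # 72 registers
    in_regs = min(num_quadwords, max_in_regs)
    registers = [f"R{FIRST_ARG_REG + i}" for i in range(in_regs)]
    return registers, num_quadwords - in_regs


print(assign_arguments(2))  # (['R3', 'R4'], 0)
```

With 75 quadwords, the first 72 land in R3 through R74 and the remaining 3 must be placed by the caller in its Parameter List Area.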

Program Initialization

When an SPU program is first entered, the contents of register R1 (SP) are initialized to the top of the stack. Generally, the top of the stack is a minimal stack located at the largest quadword address. A system with 256 KB of local storage initializes the stack pointer to 0x3FFD0. This address contains a Back Chain pointer to 0x3FFF0. The Back Chain pointer at 0x3FFF0 contains a NULL (0) pointer. Space is allocated for the entry function to save the Link Register (address 0x3FFE0). The contents of all other registers are unspecified. Thus, if a program requires registers to have specified values, it must explicitly set them.

     +----------------------------+
+--->| Back Chain Pointer 0x0     | 0x3FFF0
|    +----------------------------+
|    | Link Register Save Area    | 0x3FFE0
|    +----------------------------+
+----| Back Chain Pointer 0x3FFF0 | 0x3FFD0
     +----------------------------+ <------- Initial Stack Pointer
     |                            |
     ~                            ~
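
The initial stack described above can be reproduced with a small Python model that also walks the back chain until it reaches the NULL pointer. The helper names and the dict-based memory are assumptions made for illustration; deriving the addresses from the local storage size for sizes other than 256 KB is likewise an extrapolation from the single example given here.

```python
def initial_stack(local_store_size=0x40000):
    """Build the minimal initial stack for the given local storage size.

    Returns (initial stack pointer, memory), where memory is a dict of
    address -> Back Chain word. The default of 0x40000 is 256 KB.
    """
    top = local_store_size - 16   # largest quadword address, e.g. 0x3FFF0
    sp = top - 32                 # leaves room for the LR save area at sp + 16
    memory = {top: 0, sp: top}    # NULL back chain at the top, then a link to it
    return sp, memory


def walk_back_chain(memory, sp):
    """Follow Back Chain pointers from sp until the NULL (0) pointer."""
    frames = []
    while sp != 0:
        frames.append(sp)
        sp = memory[sp]
    return frames


sp, mem = initial_stack()
print([hex(f) for f in walk_back_chain(mem, sp)])  # ['0x3ffd0', '0x3fff0']
```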

SPU Assembly Language Specification

All of the information below is taken from this PDF.

Notation and Conventions

Notation/Convention | Meaning
ch | Channel number. Channels are specified as either $ch followed by a channel number (for example, $ch3) or a specific channel mnemonic.
ra, rb, rc | Source register. Registers are specified as a dollar symbol ($) followed by a register number from 0 to 127. For example, $38 refers to register 38.
rt | Target register. Registers are specified as a dollar symbol ($) followed by a register number from 0 to 127. For example, $38 refers to register 38.
s3, s6 | 3-bit or 6-bit signed value, respectively. Encoded as a 7-bit signed immediate in which only a subset of the bits is used.
s7 | 7-bit sign-extended value.
s10 | 10-bit sign-extended value.
s11 | 11-bit sign-extended value.
s14 | 14-bit sign-extended value.
s16 | 16-bit sign-extended value.
s18 | Signed 18-bit value used in relative address computations.
scale7 | 7-bit scale exponent. Values range from 0 to 127.
spr | Special purpose register.
u3, u5, u6 | 3-bit, 5-bit, or 6-bit unsigned value, respectively. Encoded as a 7-bit unsigned immediate in which only a subset of the bits is used.
u7 | Unsigned 7-bit value.
u14 | Unsigned 14-bit value.
u16 | Unsigned 16-bit value.
u18 | Unsigned 18-bit value.
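
To make "sign-extended" concrete, here is a short Python sketch that interprets an n-bit field as a signed value and reports the range each field width can hold. Both function names are invented for this example and assume ordinary two's complement encoding.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1      # keep only the encoded field
    sign = 1 << (bits - 1)        # weight of the sign bit
    return (value ^ sign) - sign


def signed_range(bits):
    """Return (min, max) representable in a signed field of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


print(sign_extend(0x3FF, 10))  # -1
print(signed_range(10))        # (-512, 511)
```

So an s10 operand, as used by ai or ahi, can encode values from -512 to 511.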

Instruction Set

Instruction/Usage | Description
a rt, ra, rb | Add word. Each word element of register ra is added to the corresponding word element of register rb, and the results are placed in the corresponding word elements of register rt.
absdb rt, ra, rb | Absolute difference of bytes. Each byte element of register ra is subtracted from the corresponding byte element of register rb. The absolute values of the results are placed in the corresponding elements of register rt.
addx rt, ra, rb | Add word extended. Each word element of register ra, the corresponding word element of register rb, and the least significant bit of the corresponding word element of register rt are added, and the results are placed in the corresponding word elements of register rt.
ah rt, ra, rb | Add halfword. Each halfword element of register ra is added to the corresponding halfword element of register rb, and the results are placed in the corresponding halfword elements of register rt.
ahi rt, ra, s10 | Add halfword immediate. The sign-extended immediate value s10 is added to each halfword element of register ra, and the results are placed in the corresponding halfword elements of register rt.
ai rt, ra, s10 | Add word immediate. The sign-extended immediate value s10 is added to each word element of register ra, and the results are placed in the corresponding word elements of register rt.
and rt, ra, rb | And. The value of register ra is logically ANDed with register rb, and the result is placed in register rt.
andbi rt, ra, s10 | And byte immediate. The 8 least significant bits of s10 are logically ANDed with each byte element of register ra, and the results are placed in the corresponding elements of register rt.
andc rt, ra, rb | And with complement. The value of register ra is logically ANDed with the complement of register rb, and the result is placed in register rt.
andhi rt, ra, s10 | And halfword immediate. The sign-extended immediate value s10 is logically ANDed with each halfword element of register ra, and the results are placed in the corresponding elements of register rt.
andi rt, ra, s10 | And word immediate. The sign-extended immediate value s10 is logically ANDed with each word element of register ra, and the results are placed in the corresponding elements of register rt.
avgb rt, ra, rb | Average bytes. The corresponding byte elements of registers ra and rb are averaged ((a+b+1) >> 1), and the results are placed in the corresponding byte elements of register rt.
bg rt, ra, rb | Borrow generate word. Each unsigned word element of register ra is compared to the corresponding unsigned word element of rb. If the value of ra is greater than that of rb, a 0 is placed in the corresponding element of rt; otherwise, a 1 is placed there.
Please help to fill out!
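
A few of the instructions listed above can be emulated in Python to show the element-wise behaviour. This is a sketch under stated assumptions, not a reference emulator: a 128-bit register is modelled as a list of four unsigned 32-bit words (or sixteen bytes for the byte instructions), word sums are assumed to wrap modulo 2^32, and the s10 operand is passed as an ordinary Python integer rather than an encoded field.

```python
MASK32 = 0xFFFFFFFF  # one word element is 32 bits


def a(ra, rb):
    """a rt, ra, rb: add word, element by element (assumed to wrap)."""
    return [(x + y) & MASK32 for x, y in zip(ra, rb)]


def ai(ra, s10):
    """ai rt, ra, s10: add a signed 10-bit immediate to each word element."""
    if not -512 <= s10 <= 511:
        raise ValueError("s10 must fit in 10 signed bits")
    return [(x + s10) & MASK32 for x in ra]


def addx(rt, ra, rb):
    """addx rt, ra, rb: ra + rb + least significant bit of the old rt."""
    return [(x + y + (t & 1)) & MASK32 for t, x, y in zip(rt, ra, rb)]


def bg(ra, rb):
    """bg rt, ra, rb: 0 where ra > rb (unsigned), otherwise 1."""
    return [0 if x > y else 1 for x, y in zip(ra, rb)]


def avgb(ra, rb):
    """avgb rt, ra, rb: rounded byte average, (a + b + 1) >> 1."""
    return [(x + y + 1) >> 1 for x, y in zip(ra, rb)]


def absdb(ra, rb):
    """absdb rt, ra, rb: absolute difference of each byte element."""
    return [abs(y - x) for x, y in zip(ra, rb)]


print(a([0xFFFFFFFF, 1, 2, 3], [1, 1, 1, 1]))  # [0, 2, 3, 4]
print(bg([5, 1, 2, 0], [3, 1, 7, 0]))          # [0, 1, 1, 1]
```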