Mips Jump Tutorial [Archive]

SonniE

05-09-2008, 10:53 PM

This Was Not Made By Me

Conditional and Unconditional Jumps

Introduction

So far, we've only considered instructions that do computations, or data movement (i.e., load and store). You can do straight-line code with these instructions. However, high level languages must be able to handle conditional statements and loops.
ISAs do not have support for block programming, nor support for traditional loops. Instead, they have support for conditional and unconditional jumps, which are essentially goto statements.
What does it mean to jump?

Perhaps you've never given much thought to it, but have you thought about what it means for the control flow of a program to "jump". Do you recall what your teacher said when you learned how an if-else statement worked? Perhaps the teacher pointed out that if a condition was true, then the control flow went into the if-body, and if the condition was false, the control flow went into the else-body. This was probably demonstrated by pointing at some code on a board or a power point slide, or perhaps by using a debugger.
However, those jumps have to be implemented, and "jumping" is too abstract a concept.
Here's what happens. At any point in time, you are executing some instruction. This instruction appears in memory at some address. The hidden register PC (the program counter) stores the address of the instruction that's currently being executed.
When a conditional jump occurs, a condition is checked. If the condition is true, then the jump occurs. A jump means updating the PC with the instruction to execute, which in turn causes that instruction to be fetched and run. If the condition is false, then the instruction at PC + 4 is executed. PC + 4 is the address of the next instruction in memory. We add 4 instead of 1 because each address in memory stores one byte and each MIPS instruction requires four bytes of memory.
An unconditional jump always occurs. There is no conditions to check.
A conditional jump is called a branch in MIPS.
Jump instructions

We'll consider several jump instructions, and also talk about slt, which is used to implement certain branches that don't exist as instructions MIPS. There are the list of instructions we'll look at.
beq Branches if the quantities of two registers are equal.
bne Branches if the quantities of two registers are NOT equal.
bgtz Branches if a quantity in a register is greater than zero (quantity is 32 bit, 2C).
bgez Branches if a quantity in a register is greater than or equal to zero (quantity is 32 bit, 2C).
bltz Branches if a quantity in a register is less than zero (quantity is 32 bit, 2C).
blez Branches if a quantity in a register is less than or equal to zero (quantity is 32 bit, 2C).
j Jump to an address
jr Jump to an address stored in a register
jal Jump to an address, and store the return address in a register.
jalr Jump to an address stored in a register, and store the return address in another register.Here's how they would be written beq $rs, $rt, offset
bne $rs, $rt, offset
bgez $rs, offset
bgtz $rs, offset
blez $rs, offset
bltz $rs, offset
j offset
jal offset
jr $rs
jalr $rs

The offset is a 16-bit 2C value, except for j and jal, where they are 26 bits UB. j and jal are J-type instructions, while all the branch instructions are I-type. Notice that the instruction format (e.g., R-type, I-type) are formats. Just because an instruction is used for jumping (e.g., beq) does not mean it's a J-type instruction. It's the format that matters.
Semantics of Branch instructions

The branch instructions (i.e., beq, bne, bgtz, bgez, bltz, blez) all have 16 bit immediate values. How are these values used to compute the address to jump (should the condition be true)?
Addr = PC + (IR15)14 IR15-0 00
if ( condition )
PC <- Addr
else
PC <- PC + 4
You don't want to use the 16 bit immediate value as an address to jump to because that limits where you can jump. MIPS has 32 bits for an address. If you only use 16 bits, you are using a very small fraction of the possible addresses.
To think about a reasonable solution, ask yourself when a branch occurs in a program? They occur in loops and conditional statements. Most loops and conditional statements occur within a few statements in the code. Even if you expect that each C statement may translate into 20 or more MIPS instructions, jumps are expected to jump within several thousand instructions of the jump instruction.
Therefore, it makes sense to jump relative to the current instruction. This kind of jumping is called PC relative addressing because it adds an offset to the program counter.
If we do this naively, we can jump anywhere from PC - 215 to PC + 215 - 1, which is the range of values for a 16 bit 2C.
However, that is naive. Instructions are 32 bits long. Thus, they are 4-byte quantities. In MIPS, 4-byte quantities must be stored at word-aligned addresses, which means those addresses must be divisible by 4. This means that we will never jump to an address that is not divisible by 4. Thus, three-fourths of all immediate values would be useless (only one-fourth of the immediate values end in 00, the other three-fourths end in 01, 10, or 11).
The better idea is to realize that we want to jump to an address divisible by 4, and not store the 00 in the immediate value. This is a similar idea to the "hidden 1" used in floating point. Just think of this as the "hidden 00".
In effect, the immediate value tells you how many instructions to jump forward or backwards.
To compute the address of the branch instruction, take the 16 bit immediate value, and sign-extend it. Then shift it left by 2 bits (which fills in 0's for the low 2 bits).
The result allows you to jump four times as far. You can jump anywhere from PC - 217 to PC + 217 - 4. Notice we subtract 4, because we multiplied the limits of the 16-bit 2C value by 4 (the limits of a 16-bit 2C values are -215 up to 215 - 1).
Semantics of Jump instructions

Both j and jal are J-type instructions. J-type instructions use 6 bits for the opcode, and 26 bits for the immediate value (called the target). This is the semantics of the j instruction. PC <- PC31-28 IR25-0 00
The new address is computed by taking the upper 4 bits of the PC, concatenated to the 26 bit immediate value, and the lower two bits are 00, so the address created remains word-aligned.
This is called pseudo-direct addressing. Direct addressing would specify are 32 bits. It's called pseudo-direct because some bits of the PC are used to compute the address.
j allows you to access 1/16 of all possible word-aligned addresses.
jal and subroutines

The semantics of jal are similar, except the return address is stored. R[31] <- PC + 4
PC <- PC31-28::IR25-0::00
Notice register 31 is assigned PC + 4. What is PC + 4? If PC is the current instruction, then PC + 4 is the next instruction. jal is used for subroutine calls. In a subroutine call, you jump to a subroutine (basically, it's a function call). When you are done, the subroutine jumps back to the instruction just after the subroutine call. In MIPS, jal are used for subroutine calls.
How do you get back after the subroutine call? You need to store the return address. This return address is PC + 4, and is stored in $r31.
When the subroutine call is finished, you run the instruction jr $r31, which causes the PC to be assigned the address in $r31. That causes the control flow to return to the instruction just after the return call.
CISC processors often have very few registers. Instead of storing the return address in a register, the return address was often stored on the stack, in memory. Storing it in a register has the advantage that it's quick. Memory is much slower by comparison.
Of course, you may wonder what happens if a jal call is made while in a subroutine. Won't the return address be overwritten with a new return address? That's true. In this case, you must place the return address onto the stack.
Using the register helps delay the use of the stack, somewhat. If you have many leaf procedures (functions/subroutines that do not call any other subroutines), then saving the return address in a register offers speed improvements.
It's Really PC + 8

It turns out that the return address is PC + 8, not PC + 4. This is most likely due to the fact that jump instructions have to be followed by a branch delay slot instruction. Such instructions are often placed after jumps and branches for the purposes of avoiding stalling in pipelines. Since pipelining is an advanced topic, we'll assume the return address is PC + 4.. Jumping to Arbitrary Word-Aligned Addresses

Even though j and jal are both useful, they still don't allow you to access all possible word-aligned addresses. You need instructions that let you do this. So there is jr and jalr. Instead of specifying the address to jump to in the instruction, you refer to a register, which specifies the entire 32 bit instruction. Since registers have to be specified, jr and jalr are not J-type instructions. They are R-type. This is unusual because they could be I-type. However, by making them R-type, you can use more of the function bits.
The semantics of jr $rs is: PC <- R[s]

The semantics of jalr $rs is: R[31] <- PC + 4
PC <- R[s]
Branch Instructions and Labels

Consider the following set of instructions: beq $r1, $r2, L1 # (0) If R[1] == R[2] goto L1
addi $r1, $r1, 1 # (1) R[1]++
addi $r2, $r2, 1 # (2) R[2]++
L1: add $r1, $r1, $r2 # (3) R[1] = R[1] + R[2] L1 is a label. Labels refer to addresses. These addresses are computed by the assembler. An assembler is a program that converts an assembly language code to machine code. Labels make it easier for the ISA programmer to write programs. That way, they don't have to compute the offsets.
The assembler computes the offset in the following way. Instead of using the current instruction, it computes target instruction - (branch instruction + 1).
For example, let branch instruction be instruction 0 (it's in parentheses in the comments above). The target instruction is where L1 appears. It's at instruction 3. So, subtract 3 - 1 = 2. The offset is +2.
The same rule applies even if you branch backwards. L1: add $r1, $r1, $r2 # (0) R[1] = R[1] + R[2]
addi $r1, $r1, 1 # (1) R[1]++
addi $r2, $r2, 1 # (2) R[2]++
beq $r1, $r2, L1 # (3) If R[1] == R[2] goto L1 In this case, the branch instruction is instruction 3. The target instruction (where L1 is) is instruction 0. So (0 - (3 + 1)) = -4. The offset is -4. This would be converted to 16 bit 2C, and stored in the immediate part of the instruction.
You might wonder why compute branch instruction + 1. Why not just the branch instruction? Once the instruction is fetched, PC is often incremented to PC + 4, the next instruction in memory, in anticipation of the next instruction to be fetched. So, it's easier to do the computation with PC + 4 (which is the same as branch instruction + 1).
You may also wonder how I numbered the instructions. How did I decide a certain instruction was instruction 0? It turns out it doesn't matter. Pick any number you want: L1: add $r1, $r1, $r2 # (21) R[1] = R[1] + R[2]
addi $r1, $r1, 1 # (22) R[1]++
addi $r2, $r2, 1 # (23) R[2]++
beq $r1, $r2, L1 # (24) If R[1] == R[2] goto L1 Instead of making the first instruction, instruction 0, I made it instruction 21. If you do the computations, you get (21 - (24 + 1)) = -4. Recall these computations compute differences, and so as long as the instructions are numbered in ascending order, you get the same result, regardless of what instruction number you start with.
This kind of address computation is called PC-relative since it is computed based on the value of PC, the program counter.
Jump Instructions and Labels

Jump instructions also jump to labels (at least, j and jal). The assembler also figures out the appropriate addresses. However, it does so using pseudo-direct addressing. Pseudo-direct addressing has one disadvantage compared to PC-relative addressing. It's harder to relocate the code. Relocating the code means to place code at a different address in memory. Typically, a loader program would have to rewrite instructions that use direct addressing because the instruction it jumps to might not be in a particular location in memory (since the program has been relocated).
This is similar to informing various institutions that you have moved. You tell them the new address so they don't mail you stuff to your old address. Similarly, if a program is loaded at a different address than the assembler expected, then instructions like j and jal, which are pseudo-direct may need to be updated. PC-relative instructions like beq do not need this, since relative addressing works at any location in memory.
These days, with virtual memory, it's not so necessary to relocate code. Virtual memory will be covered at some future set of notes.
Machine Code Representation

Branch instructions are I-type.
http://i286.photobucket.com/albums/ll97/codinghs/1stchart.jpg

The dashes are 5-bit encoding of the register number in UB. For example, $r7 is encoded as 00111. The offset is represented in 16-bit 2C.
If you look at bgtz, bgez, bltz, and blez, they're all unusual. For example, bgtz and blez are I-type instructions, where B20-16, which are the bits normally used by $rt, are all 0's (otherwise the instruction is invalid).
bltz and bgez use a special 000 001 opcode for register immediate. For this unsual use of I-type instructions, bits B20-16, which are the bits normally used by $rt, are used like the function bits in R-type instructions.
jr and jalr are R-type instructions.

http://i286.photobucket.com/albums/ll97/codinghs/2ndchart.jpg

You will notice that bits B15-11 are set to all 1's, which refers to register 31. As it turns out, jalr can take an optional second argument, where you specify the return address register. When it's not specified, the assembler fills in 11111 into those bits. This option is not available for jr (since no return addresses are used).
Finally, both j and jal are J-type instructions.

http://i286.photobucket.com/albums/ll97/codinghs/3rdchart.jpg

The dashes are the 26-bit target, which uses pseudo-direct addressing.

WhoIsYou

07-04-2008, 07:21 AM

Don't know what I was reading, but I just read it all =]

MAD Industries

07-28-2008, 01:20 AM

lol, that actually made sense

without a doubt the BEST mips tutorial i have ever seen

WooZie

09-18-2008, 10:50 AM

need a little help with nite's decoder. ive got a 'j' instruction and next to that is an address so how do i jump to that address? :help:

Scruffy120

09-18-2008, 02:02 PM

need a little help with nite's decoder. ive got a 'j' instruction and next to that is an address so how do i jump to that address? :help:

i THINK, but im not 100% sure but j means it is a offset, so the value is the address?

WooZie

09-18-2008, 02:41 PM

i THINK, but im not 100% sure but j means it is a offset, so the value is the address?

i need something more definitive... what im looking at in the decoder is written text. when i change the text, someother instruction is changing it back as is was before and i think the 'j' is pointing that instrution. still, how do i access it?

WhoIsYou

09-18-2008, 04:25 PM

Noo, what you wanna do is look at the right side of the decoder... It looks something like

---Address-----------Hex--------------Opcode------------Args
0xADDRESS------ 0xVALUE -------------j--------- $ADDRESS

The red will be numbers, that's what I believe you're asking for.. just follow that.

WooZie

09-18-2008, 04:56 PM

the address under 'Args' yes thats what im looking at. what to do to access that address?

WhoIsYou

09-18-2008, 05:25 PM

Well that's an address... So just go to it in nitePR..

Or do you want to change it or something? If you change it, just modify the value... 1=4

xxsnipexx

09-18-2008, 08:04 PM

Long story short: J stands for Jump. Subtract 880000 to get what you really need from the "Jump" address.

Scruffy120

09-19-2008, 02:13 PM

WIY, thats what i was tryin to say lol, the little diagram made it EZ xD

WhoIsYou

09-19-2008, 02:19 PM

Diagrams and color always help :]

Scruffy120

09-19-2008, 04:31 PM

Diagrams and color always help :]

FTW lol

SonniE

09-23-2008, 10:49 PM

Pretty Much To follow a jump it will just have j $address + 08800000 then all you have todo is turn on real addressing in nitepr and enter that exact address or -08800000 and just go right there. Questions?

WooZie

09-25-2008, 10:37 AM

yes, ive got this address: 03BC0000
subtracting 08800000 from 03BC0000 doesnt give me a valid address result in nite's browser. can you explain this?

Pain_ELementaL

09-25-2008, 11:04 AM

yes, ive got this address: 03BC0000
subtracting 08800000 from 03BC0000 doesnt give me a valid address result in nite's browser. can you explain this?

Can't you see by looking at that what your problem is? haha.

You're trying to subtract 08800000 from a smaller number.

Where the f0ck did you get an address like that anyways? I don't think games even use that area.

Scruffy120

09-26-2008, 04:08 PM

cant u just switch on real addressing in browser copy it over there then real addressing in decoder re-copy it over to the decoder then turn it all off? and you have the address?

WhoIsYou

09-26-2008, 04:29 PM

cant u just switch on real addressing in browser copy it over there then real addressing in decoder re-copy it over to the decoder then turn it all off? and you have the address?

Why would you copy the j? He wants to go to the address it jumps to, so yes for that you can use real in the decoder..

And chances are he's already looking at in the the decoder, not the browser :P