Assembly Language Instructions and Addressing Modes
This article assumes an understanding of the purpose of assembly language and its relationship to machine code.
The set of assembly language instructions available (the instruction set) differs between types of CPU, but the instructions tend to fall into a small number of functional groups that are common across many CPUs. These groups include:
- Arithmetic and logical
- Instructions involved in the transfer of data
- Instructions for performing comparisons
- Instructions for branching and conditional instructions
- Input and output
Arithmetic and logical instructions might perform addition, subtraction, multiplication and division, but bitwise operations are also common. Bitwise operations include shifting a number to the left or right by one binary place. For an unsigned number, shifting to the left is equivalent to multiplying by 2 (provided no 1 bit is shifted out of the top of the register) and shifting to the right is equivalent to dividing by 2 and discarding the remainder. Instructions for performing a logical AND are also common. This operation is similar to placing an AND logic gate between each pair of corresponding bits in two registers: a 1 is placed in a given position of the output only if both registers hold a 1 in that same position.
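As an illustration only, the shift and AND operations described above can be sketched in Python, whose << , >> and & operators behave like the corresponding instructions on unsigned values. The 8-bit register width and the helper name shift_left_8bit are assumptions made for this example.

```python
value = 0b00010110            # 22 in decimal

doubled = value << 1          # shift left by one binary place
halved = value >> 1           # shift right by one binary place

def shift_left_8bit(x):
    """Shift left inside a hypothetical 8-bit register, discarding any bit shifted out."""
    return (x << 1) & 0xFF

a = 0b11001010
b = 0b10101100
both = a & b                  # 1 only where both inputs have a 1

print(doubled, halved)                  # 44 11
print(format(both, "08b"))              # 10001000
print(shift_left_8bit(0b10000001))      # 2, because the top bit was lost
```

The last line shows why a left shift only doubles the number while the result still fits in the register.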
Data transfer instructions typically move data into, out of or between CPU registers. In order to work on numbers, for example with the arithmetic operations, at least one of the values must usually be in a register, and these instructions move the data between memory and the registers.
Comparison instructions might test whether one number is greater than or equal to another number. Comparison instructions may be combined with branching instructions to perform an action based on the outcome of the comparison.
Instructions for branching and conditionals include instructions that change the current position of the program counter. The program counter is the register that marks the current position in the machine code program being executed. These instructions might change the program counter to point to a different location in memory (a jump), or perform the jump only if a certain condition is met. For example, they may jump only if a particular register contains zero. These kinds of instructions enable "if" statements, loops and subroutines to be implemented.
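To make the role of the program counter concrete, here is a minimal sketch of a fetch-and-execute loop in Python. The mnemonics DEC, JNZ, JMP and HALT and the function run are hypothetical and belong to no real instruction set; the program is a simple countdown loop built from a conditional jump.

```python
def run(program, acc):
    """Execute a list of (mnemonic, operand) pairs.

    Returns the final accumulator value and the number of instructions executed.
    """
    pc = 0                      # program counter: index of the next instruction
    steps = 0
    while True:
        op, arg = program[pc]   # fetch
        pc += 1                 # by default, move on to the following instruction
        steps += 1
        if op == "DEC":         # subtract 1 from the accumulator
            acc -= 1
        elif op == "JMP":       # unconditional jump: overwrite the program counter
            pc = arg
        elif op == "JNZ":       # conditional jump: only if the accumulator is not zero
            if acc != 0:
                pc = arg
        elif op == "HALT":
            return acc, steps
        else:
            raise ValueError(f"unknown instruction: {op}")

countdown = [
    ("DEC", None),   # position 0: acc = acc - 1
    ("JNZ", 0),      # position 1: loop back to position 0 while acc != 0
    ("HALT", None),  # position 2
]

print(run(countdown, 3))   # (0, 7)
```

The loop body runs three times because the JNZ instruction keeps resetting the program counter to 0 until the accumulator reaches zero.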
Input and output instructions are instructions dedicated to controlling hardware. These instructions might fetch a character typed at the keyboard or check the mouse. Many CPUs do not have instructions for this purpose and control hardware in other ways but some do supply them. Instructions concerned with interrupts may also be placed into this category.
Instructions on some CPUs also have a number of modes of operation; that is, there are several varieties of the same instruction. Most typically the difference between the varieties is the way in which they obtain the operand, that is, the value to be worked on. These varieties are known as addressing modes. If we consider an instruction that obtains a number from memory and places it into a register, which we will call the accumulator register, the varieties of instruction available might include:
Immediate addressing: in this mode the instruction finds its data in memory directly after the opcode, as part of the instruction itself. For example, this instruction loads the number 5 straight into a register. Here we use the mnemonic LDM for the immediate addressing mode of a register load instruction in a fictional instruction set. The instruction is told the exact number it is going to load into the accumulator register explicitly, immediately following the mnemonic.
LDM 5
Absolute addressing (also called direct addressing): the instruction is told where in memory it will find its data. In this example the instruction is told to look in the memory address with the hexadecimal value FF01. There it will find the number it needs to load into the register. We will give this instruction the mnemonic LDD.
LDD FF01
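The difference between LDM and LDD can be sketched in Python by modelling memory as a dictionary from addresses to values. This is a simplified model, and the value 42 stored at FF01 is made up for the example.

```python
# Hypothetical memory contents: address -> value
memory = {0xFF01: 42}

def ldm(operand):
    """Immediate load: the operand is itself the value to load."""
    return operand

def ldd(memory, address):
    """Absolute load: the operand is the address where the value is stored."""
    return memory[address]

acc = ldm(5)
print(acc)                  # 5

acc = ldd(memory, 0xFF01)
print(acc)                  # 42
```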
Indexed addressing: as in absolute addressing, the instruction is going to get the value it will be working with from memory, but the memory location is the sum of the address provided and the value of a register. In this example the instruction, which we will give the mnemonic LDX, will find the data item to be loaded into the accumulator at the address FF01 plus whatever number is in the "X" register.
LDX FF01
This addressing mode is useful for implementing the arrays of a high-level language. For example, it is common to write something like this in a high-level language to retrieve a value from an array by index:
result = my_array[8]
The language needs to look up the location of "my_array" and then add 8 to it (assuming it is an array of bytes) to find the data item in the array the programmer is looking for. This might be done by loading 8 into the X register. Then we would execute LDX with the memory location at which "my_array" begins. The instruction adds the 8 held in the X register to the memory address provided, yielding the location of the item in the array. Then it retrieves the value found there.
Indirect addressing: the instruction contains the memory address of a pointer to the actual memory address. That is, the instruction will look up the supplied memory address, retrieve another memory address from there, and then get the value to be worked with from that second address. In this example the address FF01 contains the address where the data item is to be found:
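The array lookup described above can be sketched in Python, again modelling memory as a dictionary. The base address FF01 for "my_array" and the array contents are assumptions chosen for the example, and each element is taken to occupy one address.

```python
MY_ARRAY_BASE = 0xFF01   # hypothetical address where my_array begins

# Place ten one-byte elements in consecutive memory locations.
memory = {MY_ARRAY_BASE + i: v
          for i, v in enumerate([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])}

def ldx(memory, base_address, x_register):
    """Indexed load: read from the address formed by base_address + x_register."""
    return memory[base_address + x_register]

x = 8                                    # load the index into the X register
result = ldx(memory, MY_ARRAY_BASE, x)   # equivalent of result = my_array[8]
print(result)                            # 90
```

If the elements were wider than one byte, the index would first have to be multiplied by the element size before being added to the base address.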
LDI FF01
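Indirect addressing amounts to two memory reads, which the following Python sketch makes explicit. It assumes, for simplicity, that a single memory location can hold a complete address; the addresses and the value 99 are made up for the example.

```python
# Hypothetical memory contents: FF01 holds a pointer to 2000, which holds the data.
memory = {
    0xFF01: 0x2000,   # the pointer
    0x2000: 99,       # the actual data item
}

def ldi(memory, pointer_address):
    """Indirect load: follow the pointer stored at pointer_address."""
    target_address = memory[pointer_address]   # first read: fetch the real address
    return memory[target_address]              # second read: fetch the value

acc = ldi(memory, 0xFF01)
print(acc)   # 99
```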
Relative addressing: the address is given relative to the program counter or one of the other registers. For example, we may say we want the memory location 8 bytes ahead of or behind the location that the program counter currently points to. This kind of addressing mode is commonly used in combination with a jump or branching instruction.
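A relative address is just an offset added to a register, as this Python sketch shows. It assumes a 16-bit address space and an offset stored as a signed 8-bit value, an encoding used by a number of 8-bit CPUs; the function names are hypothetical. Exactly which value the program counter holds when the offset is applied varies between CPUs and is not modelled here.

```python
def to_signed8(byte):
    """Interpret a value in the range 0-255 as a signed 8-bit offset (-128 to 127)."""
    return byte - 256 if byte >= 128 else byte

def relative_target(pc, offset):
    """Add a signed offset to the program counter, wrapping within 16 bits."""
    return (pc + offset) & 0xFFFF

pc = 0x1000
print(hex(relative_target(pc, 8)))                  # 0x1008, 8 bytes ahead
print(hex(relative_target(pc, to_signed8(0xF8))))   # 0xff8, 8 bytes behind
```

Because the target is expressed as a distance rather than a fixed address, code that uses relative jumps keeps working if it is loaded at a different place in memory.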