Wednesday, September 11, 2013

Assembly Language

.386                ; Tells MASM to use Intel 80386 instruction set.
.MODEL FLAT ; Flat memory model
option casemap:none ; Treat labels as case-sensitive

.CONST ; Constant data segment

.STACK 100h ; 256-byte stack (default is 1-kilobyte stack)

.DATA ; Begin initialized data segment

.CODE ; Begin code segment
_main PROC ; Beginning of code


_main ENDP
END _main ; Marks the end of the module and sets the program entry point label

Assembly and C Code Compared

Some simple high-level language instructions can be expressed by a single assembly instruction:

Assembly Code       C Language Code
----------------    ---------------------------------
inc result          ++result;       // Increment value
mov size, 1024      size = 1024;    // Assign value
and var, 128        var &= 128;     // Apply AND bitmask
add value, 10       value += 10;    // Addition


More Assembly and C Code

Most high-level language instructions need more than one assembly instruction:

Assembly Code       C Language Code
----------------    ---------------------------------
mov AX, value       size = value;       // Assign variable
mov size, AX

mov AX, sum         sum += x + y + z;   // Arithmetic computation
add AX, x
add AX, y
add AX, z
mov sum, AX
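
The five-instruction sequence above can be mirrored one step at a time in C. This is only a sketch: the local variable ax stands in for the 16-bit AX register, and the function and check names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models the sequence for "sum += x + y + z".
   ax plays the role of the 16-bit AX register. */
uint16_t add_three(uint16_t sum, uint16_t x, uint16_t y, uint16_t z) {
    uint16_t ax;
    ax = sum;                   /* mov AX, sum */
    ax = (uint16_t)(ax + x);    /* add AX, x   */
    ax = (uint16_t)(ax + y);    /* add AX, y   */
    ax = (uint16_t)(ax + z);    /* add AX, z   */
    return ax;                  /* mov sum, AX */
}

void check_add_three(void) {
    assert(add_three(10, 1, 2, 3) == 16);
    assert(add_three(0xFFFF, 1, 0, 0) == 0);   /* wraps at 16 bits, like AX */
}
```

Each C statement touches one operand at a time, which is why the single C expression needs several instructions.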

Assembly vs. Machine Language

Assembly language uses mnemonics, decimal numbers, comments, etc.
Machine language instructions are just sequences of 1s and 0s.

Assembly language instructions are much more readable than machine language instructions:
Assembly Language    Machine Language (in Hex)
-----------------    --------------------------
inc result           FF060A00
mov size, 45         C7060C002D00
and var, 128         80260E0080
add value, 10        83060F000A
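
To see what the hex digits mean, the six bytes of "mov size, 45" can be split into fields. The sketch below assumes the 16-bit encoding shown in the table (opcode C7h, a ModRM byte of 06h for a direct 16-bit address, then a little-endian address and immediate). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the 6-byte encoding C7 06 0C 00 2D 00 ("mov size, 45"). */
typedef struct {
    uint8_t  opcode;    /* C7h: MOV memory/register, 16-bit immediate */
    uint8_t  modrm;     /* 06h: a direct 16-bit memory address follows */
    uint16_t address;   /* offset of the variable, stored low byte first */
    uint16_t imm;       /* immediate value, stored low byte first */
} MovMemImm16;

MovMemImm16 decode_mov_mem_imm16(const uint8_t bytes[6]) {
    MovMemImm16 d;
    d.opcode  = bytes[0];
    d.modrm   = bytes[1];
    d.address = (uint16_t)(bytes[2] | (bytes[3] << 8));
    d.imm     = (uint16_t)(bytes[4] | (bytes[5] << 8));
    return d;
}

void check_decode(void) {
    const uint8_t code[6] = { 0xC7, 0x06, 0x0C, 0x00, 0x2D, 0x00 };
    MovMemImm16 d = decode_mov_mem_imm16(code);
    assert(d.opcode == 0xC7);
    assert(d.modrm == 0x06);
    assert(d.address == 0x000C);   /* where "size" lives */
    assert(d.imm == 45);           /* 2Dh = 45 */
}
```

The immediate 45 appears as 2D00 because x86 stores multi-byte values with the least significant byte first.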

Controlling Program Flow

Just as in high-level language, you want to control program flow.

The JMP instruction transfers control unconditionally to another instruction.

JMP corresponds to goto statements in high-level languages:
        ; Handle one case
label1: .
        jmp    done

        ; Handle second case
label2: .
        jmp    done

Conditional Jumps

A conditional jump is taken only if its condition is met.

Condition testing is separate from branching.

The flags register conveys the result of the condition test.

For example:
        cmp    ax, bx
        je     done

General-Purpose Registers

  • The EAX, EDX, ECX, EBX, EBP, EDI, and ESI registers are 32-bit general-purpose registers, used for temporary data storage and memory access.
  • The AX, DX, CX, BX, BP, DI, and SI registers are the 16-bit equivalents of the above; they represent the low-order 16 bits of the 32-bit registers.
  • The AH, DH, CH, and BH registers represent the high-order 8 bits of the corresponding 16-bit registers (AX, DX, CX, and BX).

Similarly, AL, DL, CL, and BL represent the low-order 8 bits of the registers.
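
The way the smaller registers overlay the larger one can be modeled in C with masks and shifts. This is a sketch only: the function names are invented for illustration, and a plain 32-bit integer stands in for EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Views of a 32-bit register value, using EAX as the example. */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* AX */
uint8_t  high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* AH */
uint8_t  low8(uint32_t eax)  { return (uint8_t)(eax & 0xFFu); }         /* AL */

/* Model of "mov al, value": only the low-order 8 bits of EAX change. */
uint32_t write_low8(uint32_t eax, uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}

void check_subregisters(void) {
    uint32_t eax = 0x12345678u;
    assert(low16(eax) == 0x5678);                    /* AX */
    assert(high8(eax) == 0x56);                      /* AH */
    assert(low8(eax)  == 0x78);                      /* AL */
    assert(write_low8(eax, 0xFF) == 0x123456FFu);    /* upper 24 bits kept */
}
```

Note that AH is the high byte of AX, not of EAX; the upper 16 bits of EAX have no name of their own.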

16-bit general-purpose registers

Since the processor accesses registers more quickly than it accesses memory, you can make your programs run faster by keeping the most-frequently used data in registers. 

Typical Uses of General-Purpose Registers

Register  Size    Typical Uses
EAX       32-bit  Accumulator for operands and results
EBX       32-bit  Base pointer to data in the data segment
ECX       32-bit  Counter for loop operations
EDX       32-bit  Data pointer and I/O pointer
EBP       32-bit  Frame Pointer - useful for stack frames
ESP       32-bit  Stack Pointer - hardcoded into PUSH and POP operations
ESI       32-bit  Source Index - required for some array operations
EDI       32-bit  Destination Index - required for some array operations
EIP       32-bit  Instruction Pointer
EFLAGS    32-bit  Result Flags - hardcoded into conditional operations


 x86 Registers

The four 32-bit data registers can be used as
  • Four 32-bit registers EAX, EBX, ECX, EDX.
  • Four 16-bit registers AX, BX, CX, DX.
  • Eight 8-bit registers AH, AL, BH, BL, CH, CL, DH, DL.

Some registers have special uses...
  • ...ECX holds the count for LOOP and repeatable (REP-prefixed) instructions
  x86 registers

  x86 Registers, Cont

  • Two index registers ESI (source index) and EDI (destination index) can be used as
    • 16-bit or 32-bit registers
    • Also in string processing instructions
    • In addition, ESI and EDI can be used as general-purpose data registers
  • Two pointer registers ESP (stack pointer) and EBP (base pointer)
    • 16-bit or 32-bit registers
    • Used primarily to maintain the stack.

Index and pointer x86 registers

  x86 Control Registers

EIP Program counter (Instruction Pointer)

EFLAGS is a set of bit flags:
  • Status flags record status information about the result of the last arithmetic/logical instruction.
  • Direction flag stores forward/backward direction for data copying.
  • System flags store
    • IF interrupt-enable mode
    • TF Trap flag used in single-step debugging.

MOV, Data Transfer Instructions

The MOV instruction copies the source operand to the destination operand without affecting the source.

Five types of operand combinations are allowed with MOV:
Instruction type             Example
--------------------------   ------------------
mov register, register       mov DX, CX
mov register, immediate      mov BL, 100
mov register, memory         mov EBX, [count]
mov memory, register         mov [count], ESI
mov memory, immediate        mov [count], 23

Note: the above operand combinations are valid for most instructions that take two operands, such as ADD, SUB, AND, and CMP.

Ambiguous MOVes: PTR and OFFSET

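For the following data definitions
    table1  DW  20 DUP (?)
    status  DB  7 DUP (0)
the assembler rejects these instructions:
    mov    EBX, table1   ; "instruction operands must be the same size"
    mov    ESI, status   ; "instruction operands must be the same size"
    mov    [EBX], 100    ; "invalid instruction operands"
    mov    [ESI], 100    ; "invalid instruction operands"

The first two fail because table1 and status name a 16-bit and an 8-bit variable, neither of which matches a 32-bit register. OFFSET loads the address of the variable instead.

The last two are ambiguous: it is not clear whether the assembler should store the byte or the word equivalent of 100. PTR states the operand size explicitly.

    mov    EBX, OFFSET table1
    mov    ESI, OFFSET status
    mov    WORD PTR [EBX], 100
    mov    BYTE PTR [ESI], 100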
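
A rough C analogy may help here: in C the pointer type plays the role of WORD PTR or BYTE PTR and decides how many bytes a store writes. The sketch below assumes the two data definitions above; the check function and the 0xFF fill pattern are additions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

uint16_t table1[20];   /* table1 DW 20 DUP (?) */
uint8_t  status[7];    /* status DB 7 DUP (0)  */

void check_store_sizes(void) {
    /* Fill both arrays with FFh bytes so untouched bytes are visible. */
    memset(table1, 0xFF, sizeof table1);
    memset(status, 0xFF, sizeof status);

    uint16_t *ebx = table1;   /* mov EBX, OFFSET table1 */
    uint8_t  *esi = status;   /* mov ESI, OFFSET status */

    *ebx = 100;   /* mov WORD PTR [EBX], 100 : writes two bytes */
    *esi = 100;   /* mov BYTE PTR [ESI], 100 : writes one byte  */

    assert(table1[0] == 100 && table1[1] == 0xFFFF);
    assert(status[0] == 100 && status[1] == 0xFF);
}
```

In assembly a register holding an address carries no type, so the size has to be spelled out with PTR.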

INC and DEC Arithmetic Instructions

    inc  destination
    dec  destination

    destination = destination +/- 1

The destination can be an 8-bit, 16-bit, or 32-bit operand, in memory or in a register.

No immediate operand is allowed.

    inc    BX        ; BX = BX + 1
    dec    [value]   ; value = value - 1

  ADD Arithmetic Instruction

    add  destination, source

    destination = (destination) + (source)

    add    ebx, eax
    add    [value], 10h


Note that
    inc    eax
is more compact than
    add    eax, 1

INC takes less space. Unlike ADD, INC leaves the carry flag unchanged.

Both INC and ADD execute at about the same speed.

SUB Arithmetic Instruction

    sub     destination, source

    destination = (destination) - (source)

    sub     ebx, eax
    sub     [value], 10h


Note that
    dec     eax
is more compact than
    sub     eax, 1

DEC takes less space. Unlike SUB, DEC leaves the carry flag unchanged.

Both DEC and SUB execute at about the same speed.

CMP instruction

    cmp  destination, source

    (destination) - (source)

The subtraction is performed only to set the flags; the destination and source are not altered.

Useful for testing relationships such as <, >, or = between the two operands.

Used in conjunction with conditional jump instructions for decision making.

        cmp    ebx, eax
        je     done      ; jump if equal
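
The effect of CMP on the main status flags can be sketched in C. This is a simplified model for 32-bit operands covering only ZF, CF, SF, and OF; the Flags struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int zf, cf, sf, of; } Flags;

/* Model of "cmp dest, src": compute dest - src, keep only the flags. */
Flags cmp32(uint32_t dest, uint32_t src) {
    uint32_t diff = dest - src;   /* the result itself is thrown away */
    Flags f;
    f.zf = (diff == 0);           /* zero flag: operands are equal */
    f.cf = (dest < src);          /* carry flag: unsigned borrow */
    f.sf = (int)(diff >> 31);     /* sign flag: top bit of the result */
    /* overflow flag: operand signs differ and the result sign differs from dest */
    f.of = (int)(((dest ^ src) & (dest ^ diff)) >> 31);
    return f;
}

void check_cmp32(void) {
    assert(cmp32(5, 5).zf == 1);                 /* je would jump */
    assert(cmp32(3, 5).cf == 1);                 /* 3 < 5 as unsigned */
    assert(cmp32(3, 5).sf == 1);                 /* 3 - 5 is negative */
    assert(cmp32(0x80000000u, 1).of == 1);       /* most negative value minus 1 overflows */
}
```

The conditional jumps then look only at these flags, never at the operands themselves.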

 Unconditional Jumps

    jmp  label

  • Execution is transferred to the instruction identified by the label.
  • Infinite loop example:
            mov    eax, 1
inc_again:
            inc    eax
            jmp    inc_again
            mov    ebx, eax    ; this will never execute...

Conditional Jumps

    jcondition  label

  • Execution is transferred to the instruction identified by label only if the condition is met.
  • Testing for carriage return example:
            ; Assume that AL contains input character.
            cmp    al, 0dh        ; 0dh = ASCII carriage return
            je     CR_received
            inc    cl

Conditional Jumps, Cont

Some conditional jump instructions treat the operands of the CMP instruction as signed numbers:
    je     jump if equal
    jg     jump if greater
    jl     jump if less
    jge    jump if greater or equal
    jle    jump if less or equal
    jne    jump if not equal

je and jne give the same result for signed and unsigned operands.
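
Why signedness matters can be sketched in C by comparing the same bit patterns both ways. The function names are made up; ja (jump if above) is the unsigned counterpart of jg and is not otherwise covered in these notes. The cast to int32_t assumes a two's-complement compiler, which is the usual case.

```c
#include <assert.h>
#include <stdint.h>

/* Would "cmp a, b" followed by jg (signed greater) jump? */
int jg_taken(uint32_t a, uint32_t b) { return (int32_t)a > (int32_t)b; }

/* Would "cmp a, b" followed by ja (unsigned above) jump? */
int ja_taken(uint32_t a, uint32_t b) { return a > b; }

/* je only checks equality, so signedness does not matter. */
int je_taken(uint32_t a, uint32_t b) { return a == b; }

void check_signed_vs_unsigned(void) {
    uint32_t a = 0xFFFFFFFFu;   /* -1 as signed, 4294967295 as unsigned */
    uint32_t b = 1;
    assert(!jg_taken(a, b));    /* -1 > 1 is false */
    assert(ja_taken(a, b));     /* 4294967295 > 1 is true */
    assert(!je_taken(a, b));
}
```

The CMP instruction is identical in both cases; only the choice of jump decides how the result is interpreted.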

Conditional Jumps, Cont

Some conditional jump instructions can also test values of the individual CPU flags:
    jz     jump if zero         (ZF = 1)
    jnz    jump if not zero     (ZF = 0)
    jc     jump if carry        (CF = 1)
    jnc    jump if not carry    (CF = 0)

jz is a synonym for je
jnz is a synonym for jne

 LOOP Instruction

    loop  target

  • Decrements ECX and jumps to target if ECX is not zero after the decrement.
  • ECX should be loaded with a loop count value before the loop begins.

  • Loop 50 times example:
            mov    ecx, 50
    repeat:
            ; loop body
            loop   repeat
  • Equivalent to (except that DEC modifies the flags and LOOP does not):
            mov    ecx, 50
    repeat:
            ; loop body
            dec    ecx
            jnz    repeat
  • Surprisingly, on many processors
            dec    ecx
            jnz    repeat
  • executes faster than
            loop   repeat
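
The control flow of LOOP corresponds to a C do-while loop, which also shows why ECX should not start at zero. The sketch below counts how often the body runs; the limit parameter is an addition for illustration so that the ECX = 0 case does not run for 2^32 iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many times a LOOP-terminated body runs for a starting ECX.
   Stops early once n reaches limit (a safety cap, not part of LOOP). */
uint64_t loop_iterations(uint32_t ecx, uint64_t limit) {
    uint64_t n = 0;
    do {
        n++;        /* loop body runs before the count is tested */
        ecx--;      /* LOOP decrements ECX ... */
    } while (ecx != 0 && n < limit);   /* ... and jumps while it is nonzero */
    return n;
}

void check_loop(void) {
    assert(loop_iterations(50, 1000) == 50);
    assert(loop_iterations(1, 1000) == 1);
    /* ECX = 0 wraps to FFFFFFFFh on the first decrement, so the loop keeps going. */
    assert(loop_iterations(0, 1000) == 1000);
}
```

Because the body executes before the test, a LOOP-based loop always runs at least once.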


  Logical Instructions

    and  destination, source
    or   destination, source
    xor  destination, source
    not  destination

  • Perform the standard bitwise logical operations.
  • Result goes to the destination.

TEST is a non-destructive AND instruction:
    test  destination, source

TEST performs a logical AND but does not store the result in the destination (similar to the CMP instruction).

Logical Instructions, Cont.

Example of testing whether the value in AL is odd or even:
            test   al, 01h       ; test the least significant bit
            je     even_number   ; ZF = 1 means bit 0 is clear
            ; process odd number
            jmp    next
even_number:
            ; process even number
next:
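
The same odd/even check can be written in C, where the & operator plays the role of TEST. A minimal sketch with a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when "test al, 01h" would set ZF, i.e. when je would be taken. */
int is_even(uint8_t al) {
    return (al & 0x01) == 0;   /* AND with the mask; al itself is not changed */
}

void check_is_even(void) {
    assert(is_even(0));
    assert(is_even(4));
    assert(!is_even(7));
    assert(!is_even(255));
}
```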

Shift Instructions

Shift left format:
        shl  destination, count
        shl  destination, cl

Shift right format:
        shr  destination, count
        shr  destination, cl
where count is an immediate value.

  • Performs a left/right bit-shift of destination by the value in count or in the CL register.
  • The contents of the CL register are not altered.

  SHL and SHR Shift Instructions

The last bit shifted out goes into the carry flag CF.

A zero bit is shifted in at the other end:

  SHL Instruction

  SHR Instruction

Shift Instructions Examples

Count is an immediate value:
    shl    eax, 5

A count greater than 31 is not meaningful for a 32-bit operand.

If a greater count is given, only its least significant 5 bits are actually used.

The CL version of the shift is useful if the shift count is known only at run time,
  • e.g. when the shift count is a parameter in a procedure call.

Only the CL register can be used.

The shift count value should be loaded into CL:
    mov    cl, 5
    shl    ax, cl
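
The behavior of SHL and SHR, including the carry flag and the 5-bit limit on the count, can be modeled in C one bit at a time. This is a sketch for 32-bit operands only; the struct and function names are hypothetical, and cf_in is the value CF had before the shift.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t value; int cf; } ShiftResult;

/* Model of "shl value, count" on a 32-bit operand. */
ShiftResult shl32(uint32_t value, uint8_t count, int cf_in) {
    ShiftResult r = { value, cf_in };
    count &= 31;                        /* only the low 5 bits of the count are used */
    for (uint8_t i = 0; i < count; i++) {
        r.cf = (int)(r.value >> 31);    /* bit leaving at the top goes into CF */
        r.value <<= 1;                  /* a zero bit enters at the bottom */
    }
    return r;
}

/* Model of "shr value, count" on a 32-bit operand. */
ShiftResult shr32(uint32_t value, uint8_t count, int cf_in) {
    ShiftResult r = { value, cf_in };
    count &= 31;
    for (uint8_t i = 0; i < count; i++) {
        r.cf = (int)(r.value & 1u);     /* bit leaving at the bottom goes into CF */
        r.value >>= 1;                  /* a zero bit enters at the top */
    }
    return r;
}

void check_shifts(void) {
    assert(shl32(0x80000001u, 1, 0).value == 0x00000002u);
    assert(shl32(0x80000001u, 1, 0).cf == 1);
    assert(shr32(0x00000003u, 1, 0).value == 1);
    assert(shr32(0x00000003u, 1, 0).cf == 1);
    assert(shl32(1, 5, 0).value == 32);    /* shl eax, 5 multiplies by 32 */
    assert(shl32(1, 33, 0).value == 2);    /* 33 is masked to 1 */
}
```

A masked count of zero leaves both the value and CF as they were.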

Rotate Instructions

Two types of rotate instructions:
  1. Rotate without carry:
    • ROL (ROtate Left)
    • ROR (ROtate Right)
  2. Rotate with carry:
    • RCL (Rotate through Carry Left)
    • RCR (Rotate through Carry Right)

The operand format of the rotate instructions is similar to that of the shift instructions and supports two versions:
  • Immediate count value
  • Count value is in CL register

ROL and ROR, Rotate Without Carry

  ROL Instruction

  ROR Instruction

RCL and RCR, Rotate With Carry

  RCL Instruction

  RCR Instruction
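
The difference between the two rotate families can be sketched in C for 32-bit operands. ROL and ROR rotate the 32 bits among themselves, while RCL and RCR treat CF as a 33rd bit in the ring. The names below are made up, and the model returns CF only for the through-carry versions.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate without carry: bits leaving one end re-enter at the other. */
uint32_t rol32(uint32_t v, unsigned count) {
    count &= 31;
    return count ? (v << count) | (v >> (32 - count)) : v;
}

uint32_t ror32(uint32_t v, unsigned count) {
    count &= 31;
    return count ? (v >> count) | (v << (32 - count)) : v;
}

typedef struct { uint32_t value; int cf; } RotResult;

/* Rotate through carry left: CF enters at bit 0, bit 31 leaves into CF. */
RotResult rcl32(uint32_t v, int cf, unsigned count) {
    RotResult r = { v, cf };
    count &= 31;
    for (unsigned i = 0; i < count; i++) {
        int out = (int)(r.value >> 31);
        r.value = (r.value << 1) | (uint32_t)r.cf;
        r.cf = out;
    }
    return r;
}

/* Rotate through carry right: CF enters at bit 31, bit 0 leaves into CF. */
RotResult rcr32(uint32_t v, int cf, unsigned count) {
    RotResult r = { v, cf };
    count &= 31;
    for (unsigned i = 0; i < count; i++) {
        int out = (int)(r.value & 1u);
        r.value = (r.value >> 1) | ((uint32_t)r.cf << 31);
        r.cf = out;
    }
    return r;
}

void check_rotates(void) {
    assert(rol32(0x80000001u, 1) == 0x00000003u);
    assert(ror32(0x00000001u, 1) == 0x80000000u);
    assert(rcl32(0x80000000u, 0, 1).value == 0 && rcl32(0x80000000u, 0, 1).cf == 1);
    assert(rcr32(0, 1, 1).value == 0x80000000u && rcr32(0, 1, 1).cf == 0);
}
```

Unlike the shifts, no bit is lost: rotating left and then right by the same count restores the original value.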

  EQU directive

EQU directive eliminates hardcoding:

No reassignment is allowed.

Only numeric constants are allowed.

Defining constants has two main advantages:
  1. Improves program readability
  2. Helps in software maintenance.
    mov    ecx, 90 ; HARDCODING is less readable and harder to maintain

Multiple occurrences can be changed from a single place.

The convention is to use all UPPER-CASE LETTERS for names of constants.

EQU Directive Syntax

    name   EQU  expression

Assigns the result of expression to name.

The expression is evaluated at assembly time.

More examples:
    NUM_OF_ROWS   EQU   50
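
For readers coming from C, the closest analogue of a numeric EQU is an enum constant: it cannot be reassigned, and its expression is evaluated at compile time. In the sketch below, NUM_OF_COLS and ARRAY_SIZE are hypothetical constants added for illustration; only NUM_OF_ROWS comes from the example above.

```c
#include <assert.h>

enum {
    NUM_OF_ROWS = 50,                         /* NUM_OF_ROWS EQU 50 */
    NUM_OF_COLS = 10,                         /* hypothetical second constant */
    ARRAY_SIZE  = NUM_OF_ROWS * NUM_OF_COLS   /* expression evaluated at compile time */
};

/* Visits every cell once; the loop bounds come from the named constants. */
int count_cells(void) {
    int cells = 0;
    for (int row = 0; row < NUM_OF_ROWS; row++)
        for (int col = 0; col < NUM_OF_COLS; col++)
            cells++;
    return cells;
}

void check_constants(void) {
    assert(ARRAY_SIZE == 500);
    assert(count_cells() == ARRAY_SIZE);
}
```

Changing NUM_OF_ROWS in one place updates both the loop bound and the derived size.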