arm ho


1. Required Knowledge to Write in Assembly

1. Application Binary Interface (ABI): function/OS interoperation

(a) Argument passing

(b) Stack handling

(c) Register conventions

2. Instruction Set Architecture (ISA): the ISA is really a set of binary (hex) instruction formats, but most assemblers use the suggested mnemonics

→ These are the instructions that you must build programs out of

3. Registers/flags
4. Assembler used (gas):

(a) Assembler directives (prefixed by .)

(b) Operand order (dest, src1, src2)

(c) Constant identifier (#)

5. For ARM, what MODE you are in

• Has Thumb mode, where instructions are 16 bits (not covered)

• Has mixed mode, where instructions are 16 or 32 bits (not covered)

• Has 32-bit ARM mode (everything we do)

2. Further Resources

• ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition

– http://www.ecs.syr.edu/faculty/yin/teaching/CIS700-sp11/arm_architecture_reference_manu

• ATLAS assembly page (from [links] on class homepage):

– http://math-atlas.sourceforge.net/devel/assembly/

• ATLAS architecture page (from [links] on class homepage):

– http://math-atlas.sourceforge.net/devel/arch/

3. Linux/ARM Calling Sequence and Stack Frame

• Stack grows downward in memory

• Caller puts the callee's overflow (stack) args at the bottom of its own frame

• Frame is 8-byte (64-bit) aligned (can be 4-byte aligned for a leaf routine)

• If callee needs no scratch space, it can leave SP unmodified

• Otherwise, subtract the frame size from SP, keeping SP aligned as above

Stack frame passed to callee (higher addresses at top):

   Caller's frame
   last overflow arg
   ...
   1st overflow arg      <- SP

• Float/dbl args passed in iregs (r0-r3), then overflow to stack

• Doubles never partially in iregs; kept 8-byte aligned on stack
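As a rough check of the rules above, the following C sketch assigns a location to each argument in a list of 4-byte (int, pointer, float) and 8-byte (double) scalars. It is a simplified model of the soft-float ARM procedure call standard written for illustration: `assign_args` is a hypothetical name, doubles are assumed to start in an even-numbered register (so they fit r0:r1 or r2:r3 or go to the stack), and structs, varargs and hard-float (VFP register) passing are not modelled.

```c
#include <assert.h>

/* Hypothetical helper: simplified soft-float argument assignment.
 * sizes[i] is 4 (int, pointer, float) or 8 (double).
 * On return, reg[i] is the first core register used (0-3), or -1 if the
 * argument went to the stack, and stk[i] is its byte offset above the
 * callee's incoming SP, or -1 if it went in registers. */
static void assign_args(const int *sizes, int nargs, int *reg, int *stk)
{
    int ncrn = 0;   /* next core register number (r0-r3) */
    int nsaa = 0;   /* next stacked-argument offset from SP */
    int i;

    for (i = 0; i < nargs; i++) {
        int words = sizes[i] / 4;
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8)
            ncrn = (ncrn + 1) & ~1;          /* double starts in an even reg */
        if (ncrn + words <= 4) {             /* fits entirely in r0-r3 */
            reg[i] = ncrn;
            stk[i] = -1;
            ncrn += words;
        } else {                             /* never split across regs/stack */
            ncrn = 4;                        /* later args also go to stack */
            if (sizes[i] == 8)
                nsaa = (nsaa + 7) & ~7;      /* keep doubles 8-byte aligned */
            reg[i] = -1;
            stk[i] = nsaa;
            nsaa += sizes[i];
        }
    }
}
```

For example, for f(int, double, int) the int goes in r0, the double skips r1 and lands in r2:r3, and the last int becomes the 1st overflow arg at SP.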

4. ARM has 16 (14) Integer Registers

REGISTER   USAGE                         CALLEE SAVE
r0-r1      para0/1, return value         NO
r2-r3      para2/3                       NO
r4-r11     General                       YES
r12        (IP) scratch reg              NO
r13        (SP) stack ptr                YES
r14        (LR) Link register (ret @)    NO
r15        (PC) program counter          NO
CPSR       status register               NO

• IP may be clobbered by linker-inserted code between caller and callee, but is free scratch within a routine
• Return by jumping back to LR at end of func


5. ARM Floating Point Registers

• FPU is optional, diff versions have different # of regs

– VFP-v2 has 32 floats (s0-s31) and 16 doubles (d0-d15)
– VFP-v3 has 32 of each (s0-s31 / d0-d31)
– SIMD uses q0-q15

• s0-s15 (d0-d7, q0-q3) are caller-saved (scratch)
• s16-s31 (d8-d15, q4-q7) are callee-saved
• d16-d31 (q8-q15) are caller-saved (scratch)
• FPSCR : bits 28-31 are condition bits, 8-12 are exception trap-enable bits, 22-23 are rounding mode bits, 24 controls flush-to-zero, 16-18 are vector length bits, 20-21 are vector stride bits

– All except the condition bits and the cumulative exception flags (bits 0-4) are callee-saved.

• Float/dbl args passed in iregs, then overflow to stack
• Doubles never partially in iregs; kept 8-byte aligned on stack
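Because the s, d and q names overlap, the three save rules above describe the same registers in different views. A small C sketch of that mapping, assuming the VFPv3 layout where only d0-d15 have single-precision halves (the function names are illustrative, not an API):

```c
#include <assert.h>

/* d<n> overlays s<2n> and s<2n+1> (d0-d15 only); q<n> overlays d<2n>, d<2n+1>. */
static int d_of_s(int s)
{
    assert(s >= 0 && s <= 31);
    return s / 2;
}

static int q_of_d(int d)
{
    assert(d >= 0 && d <= 31);
    return d / 2;
}

/* Lower single-precision half of d<n>, or -1 for d16-d31 (no s alias). */
static int s_lo_of_d(int d)
{
    assert(d >= 0 && d <= 31);
    return d < 16 ? 2 * d : -1;
}

/* Callee-saved per the bullets above: s16-s31 == d8-d15 == q4-q7. */
static int d_is_callee_saved(int d)
{
    return d >= 8 && d <= 15;
}
```

So a routine that writes s17 must save d8 (its containing double), while d16-d31 can be used freely.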

6. GNU/Linux/ARM Integer Overview

• Three-operand assembler: op<pred>[s] <dest>, <src1>, <shft>

– pred: EQ, NE, GE, LT, GT, LE, LS, AL, CS, CC, MI, PL, VS, VC, HI
– s suffix means update cond codes

• dest, src1 must be registers (shft is also a register for fops)

All iops take shft, which can be:

• An 8-bit constant rotated right by 2*immediate (immediate = 0-15)

• A register, with or without a shift/rotation (by any # of bits, given as an immediate or a register)

• ADD R8, R5, R4, LSL #2

– R8 = R5 + 4*R4

• ADD R8, R5, R4, LSR #3

– R8 = R5 + R4/8

• add r0, r0, r1, LSL r2

– R0 += R1 << r2

• add r0, r2, r3, LSL r4

– R0 = R2 + (R3 << R4)

Shift meanings

• LSL : Logical Shift Left, 0s filled in vacated bits

• LSR : Logical Shift Right, 0s filled in vacated bits

• ASR : Arithmetic Shift Right, vacated bits filled with the unchanged sign bit

• ROR : ROtate Right, bits shifted off one end come back in at the other

• RRX : 33-bit rotate right through the carry flag (we're not covering it)
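The shifter-operand rules above can be modelled on 32-bit words in C. This is an illustrative sketch (the helper names are made up), covering immediate shift amounts 0-31 and showing how an immediate shft is formed from an 8-bit constant and a 4-bit rotation:

```c
#include <assert.h>
#include <stdint.h>

/* Shifts on 32-bit register values; amt must be 0..31 (the immediate
 * shift range). Register-specified amounts >= 32 are not modelled. */
static uint32_t lsl(uint32_t x, unsigned amt) { assert(amt < 32); return x << amt; }
static uint32_t lsr(uint32_t x, unsigned amt) { assert(amt < 32); return x >> amt; }

static uint32_t asr(uint32_t x, unsigned amt)
{
    uint32_t r;
    assert(amt < 32);
    r = x >> amt;
    if (x & 0x80000000u)                 /* negative: fill with sign bit */
        r |= ~(0xFFFFFFFFu >> amt);
    return r;
}

static uint32_t ror(uint32_t x, unsigned amt)
{
    amt &= 31;
    return amt ? (x >> amt) | (x << (32 - amt)) : x;
}

/* Immediate operand: 8-bit constant rotated right by 2*rot, rot = 0..15. */
static uint32_t imm_operand(uint8_t imm8, unsigned rot)
{
    assert(rot < 16);
    return ror(imm8, 2 * rot);
}
```

For example, `imm_operand(0xFF, 4)` gives 0xFF000000, which is why a constant like 0xFF000000 is encodable as an immediate while one like 0x101 (set bits 8 apart) is not.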

7. ARM Integer Load Operations

Mnm   Operands                Action

Simple loads
ldr   rd, [rs]                rd = *rs
ldr   rd, [rs, #±imm12]       rd = *(rs ± imm12)
ldr   rd, [rs1, ±rs2]         rd = *(rs1 ± rs2)
ldr   rd, [rs1, ±shft]        rd = *(rs1 ± shft)

Pre-increment loads
ldr   rd, [rs, #±imm12]!      rs = rs ± imm12; rd = *rs
ldr   rd, [rs1, ±rs2]!        rs1 = rs1 ± rs2; rd = *rs1
ldr   rd, [rs1, ±shft]!       rs1 = rs1 ± shft; rd = *rs1

Post-increment loads
ldr   rd, [rs], #±imm12       rd = *rs; rs = rs ± imm12
ldr   rd, [rs1], ±rs2         rd = *rs1; rs1 = rs1 ± rs2
ldr   rd, [rs1], ±shft        rd = *rs1; rs1 = rs1 ± shft

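• ldr : LoaD Register (32 bits)

• ldrb : same for loading a single byte (8 bits, zero-extended)

• suffix with pred for predicated operation

• If predicate not true, value not loaded and rs1 not updated

• imm12 : 0 - 4095

• shft is a register shifted by an immediate amount, using the shift types on slide 6 (register-specified shift amounts are not allowed in addresses)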
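To make the difference between the offset, pre-increment (!) and post-increment forms concrete, here is a small C model that uses a byte array as memory. `mem`, `load32` and the `ldr_*` helpers are hypothetical names for this sketch; predication and ldrb are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy memory image; an "address" is a byte index into mem (hypothetical). */
static uint8_t mem[64];

static uint32_t load32(uint32_t addr)
{
    uint32_t v;
    assert(addr <= sizeof mem - 4);
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}

/* ldr rd, [rs, #off]  : base register unchanged */
static uint32_t ldr_offset(uint32_t rs, int32_t off)
{
    return load32(rs + (uint32_t)off);
}

/* ldr rd, [rs, #off]! : update rs first, then load from the new rs */
static uint32_t ldr_pre(uint32_t *rs, int32_t off)
{
    *rs += (uint32_t)off;
    return load32(*rs);
}

/* ldr rd, [rs], #off  : load from the old rs, then update rs */
static uint32_t ldr_post(uint32_t *rs, int32_t off)
{
    uint32_t v = load32(*rs);
    *rs += (uint32_t)off;
    return v;
}
```

Walking an array with `ldr rd, [rs], #4` corresponds to repeated `ldr_post(&rs, 4)`: each call returns the current element and leaves rs pointing at the next one.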

8. ARM Integer Store Operations

Mnm   Operands                Action

Simple stores
str   rs, [ra]                *ra = rs
str   rs, [ra, #±imm12]       *(ra ± imm12) = rs
str   rs, [ra1, ±ra2]         *(ra1 ± ra2) = rs
str   rs, [ra1, ±shft]        *(ra1 ± shft) = rs

Pre-increment stores
str   rs, [ra, #±imm12]!      ra = ra ± imm12; *ra = rs
str   rs, [ra1, ±ra2]!        ra1 = ra1 ± ra2; *ra1 = rs
str   rs, [ra1, ±shft]!       ra1 = ra1 ± shft; *ra1 = rs

Post-increment stores
str   rs, [ra], #±imm12       *ra = rs; ra = ra ± imm12
str   rs, [ra1], ±ra2         *ra1 = rs; ra1 = ra1 ± ra2
str   rs, [ra1], ±shft        *ra1 = rs; ra1 = ra1 ± shft


9. ARM LD/ST multiple

ARM has the ability to load/store any subset of the registers (or all of them) at once:

• stm[IB,IA,DB,DA] ra[!], {reg list}
• ldm[IB,IA,DB,DA] ra[!], {reg list}

The register list is an increasing set of registers, specified individually or by ranges:

• {r2,r5-r11,r14} // ld/st r2,r5,r6,r7,r8,r9,r10,r11,r14
• size(RL) = 4*nreg, where nreg is the number of registers in the list
• Lowest-numbered reg is stored to the lowest memory address

The suffixes indicate how to form the address and what to store if ! is set. EA will be the (lowest) address accessed, while UA will be the address that is written to ra if it has the ! suffix:

suff   Meaning            Addresses
IB     Increment Before   EA = ra+4; UA = ra+size(RL)
IA     Increment After    EA = ra; UA = ra+size(RL)
DB     Decrement Before   EA = UA = ra-size(RL)
DA     Decrement After    EA = ra-size(RL)+4; UA = ra-size(RL)
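The address table above translates directly into C. In this sketch (`lsm_addrs` and the `MODE_*` names are illustrative), a push with `stmDB SP!` followed by a pop with `ldmIA SP!` of the same list returns SP to its starting value, which is why that pair is used for prologues and epilogues:

```c
#include <assert.h>
#include <stdint.h>

enum lsm_mode { MODE_IB, MODE_IA, MODE_DB, MODE_DA };

/* Hypothetical helper: EA (lowest address accessed) and UA (value written
 * back to ra when ! is given) for ldm/stm<mode> ra[!], {nreg registers}.
 * Registers are transferred lowest-numbered first, at EA, EA+4, ... */
static void lsm_addrs(enum lsm_mode m, uint32_t ra, unsigned nreg,
                      uint32_t *ea, uint32_t *ua)
{
    uint32_t size = 4u * nreg;                     /* size(RL) */
    assert(nreg >= 1 && nreg <= 16);
    switch (m) {
    case MODE_IB: *ea = ra + 4;        *ua = ra + size; break;
    case MODE_IA: *ea = ra;            *ua = ra + size; break;
    case MODE_DB: *ea = ra - size;     *ua = ra - size; break;
    case MODE_DA: *ea = ra - size + 4; *ua = ra - size; break;
    }
}
```

With SP = 0x1000 and eight registers, DB gives EA = UA = 0x0FE0, and IA from 0x0FE0 reads the same 32 bytes and writes back 0x1000.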

10. LD/ST Examples: Saving and restoring Integer Registers

Saving all callee-saved registers and restoring with 1 inst:

PROLOGUE:
   stmDB SP!, {r4-r11}     // SP -= 8*4; save all callee-saved iregs (SP itself is restored by the ! writeback below)
   ....
DONE:
   ldmIA SP!, {r4-r11}     // restore all callee-saved registers; SP += 8*4
   bx lr                   // jump to link reg, restoring PC (R15)

Example of mixed operation:

PROLOGUE:
   str r6, [SP, #-4]!      // SP -= 4; *SP = r6
   str r5, [SP, #-4]!      // SP -= 4; *SP = r5
   sub SP, SP, #4          // SP -= 4
   str r4, [SP]            // *SP = r4
DONE:
   ldmIA SP, {r4,r5}       // restore r4 & r5, leave SP unchanged
   ldr r6, [SP, #8]        // restore r6, SP unchanged
   add SP, SP, #12         // restore SP
   bx lr                   // jump to link reg, restoring PC (R15)

11. Common Integer Arithmetic Operations

Mnm     Operands               Action
add     rd, rs, shft           rd = rs + shft
sub     rd, rs, shft           rd = rs - shft
rsb     rd, rs, shft           rd = shft - rs
mul     rd, rs1, rs2           rd = rs1 * rs2
mla     rd, rs1, rs2, rs3      rd = rs1*rs2 + rs3
umull   rdlo, rdhi, rs1, rs2   (rdhi,rdlo) = rs1*rs2 (unsigned)
smull   rdlo, rdhi, rs1, rs2   (rdhi,rdlo) = rs1*rs2 (signed)
umlal   rdlo, rdhi, rs1, rs2   (rdhi,rdlo) += rs1*rs2 (unsigned)
smlal   rdlo, rdhi, rs1, rs2   (rdhi,rdlo) += rs1*rs2 (signed)

• No integer divide instruction in base ARMv7-A (SDIV/UDIV exist only as an optional extension on some cores)!
• can suffix for predication
• suffixing with 'S' makes them update the condition codes
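The long multiplies produce a 64-bit result split across two registers. A C sketch of that split (function names mirror the mnemonics but are only illustrative) also shows that the accumulating form adds into the full 64-bit pair, so a carry out of rdlo propagates into rdhi:

```c
#include <assert.h>
#include <stdint.h>

/* umull: unsigned 32x32 -> 64, low word to rdlo, high word to rdhi */
static void umull(uint32_t *rdlo, uint32_t *rdhi, uint32_t rs1, uint32_t rs2)
{
    uint64_t p = (uint64_t)rs1 * rs2;
    *rdlo = (uint32_t)p;
    *rdhi = (uint32_t)(p >> 32);
}

/* smull: signed 32x32 -> 64; the pair holds the two's-complement bits */
static void smull(uint32_t *rdlo, uint32_t *rdhi, int32_t rs1, int32_t rs2)
{
    uint64_t p = (uint64_t)((int64_t)rs1 * rs2);
    *rdlo = (uint32_t)p;
    *rdhi = (uint32_t)(p >> 32);
}

/* umlal: (rdhi,rdlo) += rs1*rs2, accumulating into the whole 64-bit pair */
static void umlal(uint32_t *rdlo, uint32_t *rdhi, uint32_t rs1, uint32_t rs2)
{
    uint64_t acc = ((uint64_t)*rdhi << 32) | *rdlo;
    acc += (uint64_t)rs1 * rs2;
    *rdlo = (uint32_t)acc;
    *rdhi = (uint32_t)(acc >> 32);
}
```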

12. Common Bit-Level Operations

Mnemonic   Operands       Action
mov        rd, shft       rd = shft
mvn        rd, shft       rd = ~(shft)
and        rd, rs, shft   rd = rs & (shft)
orr        rd, rs, shft   rd = rs | (shft)
eor        rd, rs, shft   rd = rs ^ (shft), (if shft=rs, zero!)
bic        rd, rs, shft   rd = rs & ~(shft)
clz        rd, rs         rd = # of leading zeros (most sig bits) in rs

• can suffix for predication
• suffixing with 'S' makes them update the condition codes (inc mov)
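A few of these have no single C operator, so here is a C sketch of bic, mvn and clz on 32-bit values (the helper names are illustrative). On ARM, clz of 0 returns 32, which the loop below reproduces:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t bic(uint32_t rs, uint32_t shft) { return rs & ~shft; }
static uint32_t mvn(uint32_t shft) { return ~shft; }

/* clz: number of leading zero bits; 32 when the input is 0 */
static unsigned clz(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {   /* shift until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
}
```

`mvn(0xF)` is the 0xFFFFFFF0 mask that the NRM2 example builds with `mvn N0, #0xF`.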


13. ARM Integer Condition Codes

• Condition codes are held in the 4 most significant bits of the current program status register (CPSR)

• Can be set by most integer ops with 'S' suffix

Condition flag bits explanation:

• N (bit 31): set to sign bit of result
• Z (bit 30): result of op is zero
• C (bit 29): Carry bit; has several cases:

1. For ADD or CMN, 1 if add produces a carry (unsigned overflow)
2. For SUB or CMP, C is set to 0 if sub produces a borrow (unsigned underflow), else 1.
3. For most other inst, set to the last bit shifted out of the value by the shifter

• V (bit 28): set to 1 if signed overflow occurred in add or sub
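The carry and overflow rules above are easy to mix up, especially the inverted carry for subtraction. This C sketch (the struct and function names are made up for illustration) computes N, Z, C and V the way ADDS and SUBS/CMP set them:

```c
#include <assert.h>
#include <stdint.h>

struct nzcv { int n, z, c, v; };

/* Flags from ADDS rd, a, b */
static struct nzcv adds_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    struct nzcv f;
    f.n = (int)(r >> 31);
    f.z = (r == 0);
    f.c = (r < a);                                  /* unsigned carry out */
    f.v = (int)((((a ^ r) & (b ^ r)) >> 31) & 1u);  /* a,b same sign, r differs */
    return f;
}

/* Flags from SUBS/CMP a, b (computes a - b); C is NOT borrow */
static struct nzcv subs_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct nzcv f;
    f.n = (int)(r >> 31);
    f.z = (r == 0);
    f.c = (a >= b);                                 /* 1 when no borrow */
    f.v = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);  /* a,b differ in sign, r != a */
    return f;
}
```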

14. Condition code/predicate mnemonic

pred     Meaning                              Flag test
EQ       equal                                Z=1
NE       not equal                            Z=0
CS/HS    Carry set/unsigned higher or same    C=1
CC/LO    Carry clear/unsigned lower           C=0
MI       MInus/negative                       N=1
PL       PLus/positive or zero                N=0
VS       Overflow (V Set)                     V=1
VC       no overflow (V Clear)                V=0
HI       Unsigned higher                      C=1 and Z=0
LS       Unsigned lower or same               C=0 or Z=1
GE       Signed greater than or equal         N==V
LT       Signed less than                     N!=V
GT       Signed greater than                  Z=0 and N==V
LE       Signed less than or equal            Z=1 or N!=V
AL       always                               ignored
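The table can be read as a set of Boolean tests on the four flags. A small C sketch of that evaluation (`cond_passes` is a hypothetical name), which decides whether an instruction with a given predicate suffix executes:

```c
#include <assert.h>
#include <string.h>

/* n, z, c, v are 0 or 1; returns 1 if the predicated instruction executes */
static int cond_passes(const char *pred, int n, int z, int c, int v)
{
    if (!strcmp(pred, "EQ")) return z;
    if (!strcmp(pred, "NE")) return !z;
    if (!strcmp(pred, "CS") || !strcmp(pred, "HS")) return c;
    if (!strcmp(pred, "CC") || !strcmp(pred, "LO")) return !c;
    if (!strcmp(pred, "MI")) return n;
    if (!strcmp(pred, "PL")) return !n;
    if (!strcmp(pred, "VS")) return v;
    if (!strcmp(pred, "VC")) return !v;
    if (!strcmp(pred, "HI")) return c && !z;
    if (!strcmp(pred, "LS")) return !c || z;
    if (!strcmp(pred, "GE")) return n == v;
    if (!strcmp(pred, "LT")) return n != v;
    if (!strcmp(pred, "GT")) return !z && n == v;
    if (!strcmp(pred, "LE")) return z || n != v;
    if (!strcmp(pred, "AL")) return 1;
    assert(!"unknown predicate");
    return 0;
}
```

For example, after `cmp r0, r1` with r0=3 and r1=5 the flags are N=1, Z=0, C=0, V=0, so an LT- or LO-suffixed instruction executes and a GE- or HI-suffixed one does not.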

15. Common ARM Comparison and Branch Instructions

Mnemonic   Operands   Action
cmp        rs, shft   Set CC as if rs - shft
cmn        rs, shft   Set CC as if rs + shft
tst        rs, shft   Set CC as if rs & shft
teq        rs, shft   Set CC as if rs ^ shft

Mnemonic   Operands   Action
B          label      jump to label
BL         label      R14 (link reg) = next inst; jump to label
BX         rs         jump to (rs & 0xFFFFFFFE); if low bit is 0 ARM mode, else THUMB mode
BLX        addr       Not covered (call, possibly switching to a Thumb func)

• Can do comparison early, branch later (as long as no intervening iop sets the CC)
• All branches are predicated like every other inst, using the suffixes of slide 14
• Can return from a func called with BL by MOV PC,R14
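The BX rule above amounts to splitting the target register into an address and a mode bit. A minimal C sketch (`bx` and `struct bx_target` are hypothetical names); it also checks that ARM-mode targets are word aligned, since ARM instructions are 4 bytes:

```c
#include <assert.h>
#include <stdint.h>

struct bx_target {
    uint32_t pc;    /* address jumped to */
    int thumb;      /* 1 = continue in Thumb state, 0 = ARM state */
};

/* BX rs: bit 0 of rs selects the instruction set and is cleared from the
 * branch address. */
static struct bx_target bx(uint32_t rs)
{
    struct bx_target t;
    t.pc = rs & 0xFFFFFFFEu;
    t.thumb = (int)(rs & 1u);
    if (!t.thumb)
        assert((t.pc & 3u) == 0);   /* ARM code must be word aligned */
    return t;
}
```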

16. ARM Floating Point Introduction (VFP)

• Almost completely IEEE compliant
• Has a short-vector mode using register banks, which we won't cover
• floats in registers s0-s31
• doubles in regs d0-d15 (overlapped with s0-s31), and on some versions d16-d31
• Inst suffixed with 's' do floats, 'd' handle doubles
• After the precision suffix, FP inst take the usual predicate suffixes, which test the integer CC (iCC)
• Has three new system registers:

FPSCR : status (comparison results & exception flags) and control (set vector length/stride, rounding mode, traps, etc.) bits

FPSID : read-only register IDing the VFP architecture
FPEXC : contains a few bits for system-level status & control


17. FPSCR Information

Status bits:

31 : N: 1 if comparison produced a less than result
30 : Z: 1 if comparison produced an equal result
29 : C: 1 if cmp is equal, greater than or unordered
28 : V: 1 if comparison produced an unordered result (NaN)

Control bits:

• 24 : FZ: 0: IEEE compliant, 1: flush-to-zero enabled
• 23:22 : Set IEEE rounding mode
• 18:16 : Set vector length: set to 000 for scalar operations
• 12:8 : trap-enable bits for the fp exceptions given below (bit n+8 enables exception n)

Cumulative (sticky) exception flags:

4 : IXC - Inexact Exception (non-zero rounding occurred)
3 : UFC - Underflow Exception (result too small in magnitude)
2 : OFC - Overflow Exception (result too large in magnitude)
1 : DZC - Division by Zero
0 : IOC - Invalid Operation (NaN result)

⇒ Use FMRX & FMXR to manipulate (next slide)
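Once FMRX has copied FPSCR into an integer register, the fields above are ordinary bit fields. A C sketch of the extraction (helper names are illustrative), plus the mask computed by the NRM2 prologue later in these notes, which clears flags 3:0 and the FZ bit. The rounding-mode values assumed here are 0 = to nearest, 1 = toward +inf, 2 = toward -inf, 3 = toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative exception flags, bits 4:0 (trap enables use the same
 * layout shifted left by 8). */
enum { IOC = 1 << 0, DZC = 1 << 1, OFC = 1 << 2, UFC = 1 << 3, IXC = 1 << 4 };

static unsigned fpscr_nzcv(uint32_t f)  { return f >> 28; }           /* 31:28 */
static unsigned fpscr_fz(uint32_t f)    { return (f >> 24) & 1u; }    /* 24    */
static unsigned fpscr_rmode(uint32_t f) { return (f >> 22) & 3u; }    /* 23:22 */
static unsigned fpscr_len(uint32_t f)   { return (f >> 16) & 7u; }    /* 18:16 */
static unsigned fpscr_traps(uint32_t f) { return (f >> 8) & 0x1Fu; }  /* 12:8  */
static unsigned fpscr_flags(uint32_t f) { return f & 0x1Fu; }         /* 4:0   */

/* Same mask as the NRM2 prologue: mvn #0xF / and / bic #(1<<24). */
static uint32_t nrm2_fpscr(uint32_t saved)
{
    uint32_t n0 = ~(uint32_t)0xF;      /* 0xFFFFFFF0 */
    n0 &= saved;                       /* clear flags 3:0 (IXC kept) */
    n0 &= ~((uint32_t)1 << 24);        /* flush-to-zero off */
    return n0;
}
```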

18. ARM FP ld/st/move

Mnem        Operands            Action
fld         fd, [ra]            fd = *(ra)
fld         fd, [ra, ±imm8*4]   fd = *(ra ± imm8*4)
fst         fs, [ra]            *(ra) = fs
fst         fs, [ra, ±imm8*4]   *(ra ± imm8*4) = fs
fcpy        fd, fs              fd = fs
fabs        fd, fs              fd = abs(fs)
fneg        fd, fs              fd = -fs
fmrs        rd, fs              rd = fs (bit transfer, no fp-to-int conversion)
fmsr        fd, rs              fd = rs (bit transfer, no int-to-fp conversion)
fmd[l,h]r   dd, rs              xfer ireg to low or high half of double reg
fmrd[l,h]   rd, ds              xfer low or high half of double reg to ireg
fcvtds      dd, fs              convert float to double
fcvtsd      fd, ds              convert double to float
fmstat      none                move FPSCR's N,Z,C,V flags to integer CC of same name
fmrx        rd, sysreg          move FPSID, FPSCR, or FPEXC to ireg rd
fmxr        sysreg, rs          move ireg rs to FPSCR or FPEXC (FPSID is read-only)

• imm8*4 is written #N, where N is a multiple of 4 (0-1020): fldd dd, [PTR, #16]
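The "bit transfer, no conversion" behaviour of fmrs/fmsr and the half-register moves can be mimicked in C with memcpy. A sketch assuming an IEEE 754 host; the function names follow the mnemonics for readability only:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fmrs rd, sN: copy the raw 32 bits of a float, no conversion. */
static uint32_t fmrs(float s)
{
    uint32_t r;
    memcpy(&r, &s, sizeof r);
    return r;
}

/* fmsr sN, rd: the reverse bit copy. */
static float fmsr(uint32_t r)
{
    float s;
    memcpy(&s, &r, sizeof s);
    return s;
}

/* fmrdl/fmrdh: low and high words of a double's bit pattern. With the
 * soft-float ABI on little-endian ARM, a double result is returned in
 * r0 (low word) and r1 (high word), as the NRM2 exit code does. */
static void fmrd_lh(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *lo = (uint32_t)bits;
    *hi = (uint32_t)(bits >> 32);
}
```

So fmrs of 1.0f yields 0x3F800000 rather than the integer 1; converting between int and fp values needs separate conversion instructions.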

19. ARM FP LD/ST multiple

FP LD/ST multiple can load contiguous registers only

• fldm[IA,DB]<s|d> ra[!], {reg list}
• fstm[IA,DB]<s|d> ra[!], {reg list}

The register list is a contiguous increasing set of registers, specified by ranges:

• {s5-s11} // ld/st s5,s6,s7,s8,s9,s10,s11
• For sregs in the list, size(RL) = 4*nreg
• For dregs in the list, size(RL) = 8*nreg
• Low reg # is stored to low part of memory

The suffixes indicate how to form the address and what to store if ! is set. EA will be the address accessed, while UA will be the address that is written to ra if it has the ! suffix:

suff   Meaning            Addresses
IA     Increment After    EA = ra; UA = ra+size(RL)
DB     Decrement Before   EA = UA = ra-size(RL)

20. ARM FP LD/ST multiple w/o type

FP LD/ST multiple X can load contiguous registers regardless of type (useful for prologue/epilogue):

• fldm[IA,DB]x ra[!], {reg list}
• fstm[IA,DB]x ra[!], {reg list}

The register list is a contiguous increasing set of double registers, specified by ranges:

• {d3-d6} // ld/st d3-d6 regardless of int/float/double contents
• For each dreg in the list, size(RL) = 8*nreg
• Low reg # is stored to low part of memory

The suffixes indicate how to form the address and what to store if ! is set. EA will be the address accessed, while UA will be the address that is written to ra if it has the ! suffix:

suff   Meaning            Addresses
IA     Increment After    EA = ra; UA = ra+size(RL)
DB     Decrement Before   EA = UA = ra-size(RL)


21. Common ARM Floating Point Computation Instructions

Mnemonic   Operands       Action
fmac       fd, fs1, fs2   fd += fs1*fs2
fnmac      fd, fs1, fs2   fd -= fs1*fs2
fmsc       fd, fs1, fs2   fd = fs1*fs2 - fd
fnmsc      fd, fs1, fs2   fd = -fs1*fs2 - fd
fmul       fd, fs1, fs2   fd = fs1*fs2
fnmul      fd, fs1, fs2   fd = -(fs1*fs2)
fdiv       fd, fs1, fs2   fd = fs1/fs2
fadd       fd, fs1, fs2   fd = fs1 + fs2
fsub       fd, fs1, fs2   fd = fs1 - fs2
fsqrt      fd, fs         fd = sqrt(fs)
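The four multiply-accumulate forms differ only in signs. A C sketch of their results, with fd as both input and output as in the table (illustrative names, written for doubles; the 's' forms behave the same on floats). On VFP these round the product before the add, so they are not fused multiply-adds; in C, a compiler allowed to contract a*b+c into an FMA could differ in the last bit for inexact inputs.

```c
#include <assert.h>

static double fmac (double fd, double a, double b) { return fd + a * b; }    /* fd += a*b      */
static double fnmac(double fd, double a, double b) { return fd - a * b; }    /* fd -= a*b      */
static double fmsc (double fd, double a, double b) { return a * b - fd; }    /* fd = a*b - fd  */
static double fnmsc(double fd, double a, double b) { return -(a * b) - fd; } /* fd = -a*b - fd */
```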

22. ARM Floating Point Comparison Instructions

Mnem     Ops      Explanation
fcmp     rd, rs   compare rd and rs
fcmpe    rd, rs   fcmp; raise invalid op exception on NaN
fcmpz    rd       compare against 0 (rs=0)
fcmpez   rd       fcmpz; raise exc on NaN

FPSCR bits set by fcmp:

FPSCR bit       31   30   29   28
fcmp            N    Z    C    V
rd > rs         0    0    1    0
rd < rs         1    0    0    0
rd = rs         0    1    1    0
is(NaN)         0    0    1    1

• Use fmstat to move from FPSCR to iCC for predication
• Can fmrx rd, FPSCR (and fmxr FPSCR, rd) for bit-level operations
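The comparison table is a mapping from the ordering of two values to the 4-bit N,Z,C,V field. A C sketch of that mapping (`fcmp_nzcv` is a hypothetical name; flags packed as N=8, Z=4, C=2, V=1):

```c
#include <assert.h>
#include <math.h>

/* FPSCR N,Z,C,V after fcmp a, b, packed as N=8, Z=4, C=2, V=1. */
static unsigned fcmp_nzcv(double a, double b)
{
    if (isnan(a) || isnan(b))
        return 0x3;          /* unordered: C=1, V=1 */
    if (a == b)
        return 0x6;          /* equal: Z=1, C=1 */
    if (a < b)
        return 0x8;          /* less than: N=1 */
    return 0x2;              /* greater than: C=1 */
}
```

Since fcmp sets N=1 only for an ordered less-than, the MI predicate after fmstat means "rd < rs", which is the test the ZIAMAX loop relies on; a NaN sum leaves N=0 and is skipped.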

23. Simple ZIAMAX in ARM Assembly

#define x0      d0
#define x1      d1
#define maxval  d2
#define sum     d3
#define N       r0
#define X       r1
#define maxX    r2
#define XX      r3
/*
 *                  r0          r1
 * int ATL_UIAMAX(int N, const TYPE *X,
 *                const int incX)
 */
#include "atlas_asm.h"
.text
.code 32
.globl ATL_UIAMAX
ATL_UIAMAX:
   mov maxX, X
   mov XX, X
   fldmIAd X!, {x0,x1}      /* load real&imag, update X ptr */
   fabsd x0, x0
   fabsd x1, x1
   faddd maxval, x0, x1
   subs N, N, #1
   bEQ DONE
/*
 * for (iret=0, maxval=0.0, i=0; i < N; i++)
 * {
 *    sum = abs(x[2*i]) + abs(x[2*i+1]);
 *    if (sum > maxval) { maxval=sum; iret=i; }
 * }
 * return(iret);
 */
LOOP:
   fldmIAd X!, {x0,x1}
   fabsd x0, x0
   fabsd x1, x1
   faddd sum, x0, x1
   pld [X,#168]             /* prefetch */
   fcmpd maxval, sum        /* N=1 iff maxval < sum */
   fmstat                   /* set iCC */
   fcpydMI maxval, sum
   subMI maxX, X, #16
   subs N, N, #1
   bNE LOOP
DONE:
   sub r0, maxX, XX
   mov r0, r0, LSR #4
   bx lr
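For testing the routine above it helps to have a plain C reference. The sketch below follows the pseudocode in the comment block; `uiamax_ref` and `dabs` are hypothetical names, X is assumed to hold N double-complex (re, im) pairs with incX = 1 (the assembly never reads incX), and N >= 1, which the assembly also assumes since it loads the first element before checking N.

```c
#include <assert.h>

static double dabs(double x) { return x < 0.0 ? -x : x; }

/* 0-based index of the first element maximizing |re| + |im| */
static int uiamax_ref(int N, const double *X)
{
    int i, iret = 0;
    double maxval;
    assert(N >= 1);
    maxval = dabs(X[0]) + dabs(X[1]);
    for (i = 1; i < N; i++) {
        double sum = dabs(X[2*i]) + dabs(X[2*i+1]);
        if (sum > maxval) {               /* strict, like MI: keeps the first max */
            maxval = sum;
            iret = i;
        }
    }
    return iret;
}
```

The result matches what the assembly computes as (maxX - XX) >> 4, since each complex double occupies 16 bytes.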

24. Safe NRM2 in ARM assembly pt I

#define N       r0
#define pX      r1
#define fpsav   r2
#define N0      r3
#define XX      r12
#ifdef SREAL
#define sx0     s0
#define ssum0   s1
#define zero    s2
#define scal    s3
#define SO 4
#else
#define sx0     d0
#define ssum0   d1
#define scal    d2
#define zero    d3
#define fdivs   fdivd
#define flds    fldd
#define fmacs   fmacd
#define fabss   fabsd
#define fcmps   fcmpd
#define fcpysMI fcpydMI
#define fldsNE  flddNE
#define fsqrts  fsqrtd
#define fmuls   fmuld
#define fcpys   fcpyd
#define SO 8
#endif
/*
 *                   r0          r1
 * TYPE ATL_UNRM2(int N, const TYPE *X,
 *                const int incX)
 *                          r2
 */
.text
.code 32
.globl ATL_UNRM2
ATL_UNRM2:
   fmrx fpsav, FPSCR          /* save original FPSCR */
   mvn N0, #0xF               /* N0 = 0xFFFFFFF0 */
   and N0, N0, fpsav          /* zero exception bits */
   bic N0, N0, #(1<<24)       /* turn off flush-to-zero mode */
   fmxr FPSCR, N0
   mov N0, #0
#ifdef DREAL
   fmdlr zero, N0
   fmdhr zero, N0
#else
   fmsr zero, N0
#endif
   fcpys ssum0, zero
   fcpys scal, zero
   mov N0, N
   mov XX, pX


25. Safe NRM2 in ARM assembly pt II

LOOP1:
   flds sx0, [pX]
   fmacs ssum0, sx0, sx0
   fabss sx0, sx0
   fcmps scal, sx0
   pld [pX,#96]
   fmstat
   fcpysMI scal, sx0
   subs N, N, #1
   add pX, pX, #SO
   bNE LOOP1
DONE:
/*
 * If over/underflow happened, redo with scaling
 */
   fmrx r1, FPSCR
   tst r1, #0xF
   fmxr FPSCR, fpsav          /* restore FPSCR */
   bNE SSQ
   fsqrts ssum0, ssum0
#ifdef DREAL
   fmrdl r0, ssum0
   fmrdh r1, ssum0
#else
   fmrs r0, ssum0
#endif
   bx lr

SSQ:
   mov pX, XX
   mov N, N0
   fcpys ssum0, zero
SSQLOOP:
   flds sx0, [pX]
   fabss sx0, sx0
   fdivs sx0, sx0, scal
   pld [pX,#96]
   fmacs ssum0, sx0, sx0
   subs N, N, #1
   add pX, pX, #SO
   bNE SSQLOOP
   fsqrts ssum0, ssum0
   fmuls ssum0, ssum0, scal
#ifdef DREAL
   fmrdl r0, ssum0
   fmrdh r1, ssum0
#else
   fmrs r0, ssum0
#endif
   bx lr

