Cortex-M4 core and STM32F4 introduction

STM32 product series
4 product series
STM32 leading Cortex-M portfolio
STM32 F4 series
High-performance Cortex™-M4 MCU
STM32F4xx Block Diagram
• Cortex-M4 w/ FPU, MPU and ETM
• Memory
  • Up to 1MB Flash memory
  • 192KB RAM (including 64KB CCM data RAM)
  • FSMC up to 60MHz
• New application specific peripherals
  • USB OTG HS w/ ULPI interface
  • Camera interface
  • HW Encryption**: DES, 3DES, AES 256-bit, SHA-1 hash, RNG
• Enhanced peripherals
  • USB OTG Full speed
  • ADC: 0.416µs conversion/2.4Msps, up to 7.2Msps in interleaved triple mode
  • ADC/DAC working down to 1.8V
  • Dedicated PLL for I2S precision
  • Ethernet w/ HW IEEE1588 v2.0
  • 32-bit RTC with calendar
  • 4KB backup SRAM in VBAT domain
  • 2 x 32bit and 8 x 16bit Timers
  • High speed USART up to 10.5Mb/s
  • High speed SPI up to 37.5Mb/s
  • RDP (JTAG fuse)
  • More I/Os in UFBGA 176 package
HS requires an external PHY connected to the ULPI interface
** Encryption is only available on STM32F415 and STM32F417
[Block diagram. Labels recoverable from the figure:
 Core: Cortex-M4 CPU + FPU + MPU at 168 MHz, with I-bus, D-bus and S-bus; JTAG/SW debug; ETM; nested vectored interrupt controller; 1 x SysTick timer
 Buses: ARM® 32-bit multi-AHB bus matrix with arbiter; AHB1 (max 168MHz); AHB2 (max 168MHz); bridges to APB1 (max 42MHz) and APB2 (max 84MHz)
 Memories: 512kB-1MB Flash memory with Flash I/F; 128KB SRAM; 64KB CCM data RAM; 4KB backup RAM; external memory interface
 System: power supply (1.2V regulator, POR/PDR/PVD); XTAL oscillators 32KHz + 8~25MHz; internal RC oscillators 32KHz + 16MHz; PLL; clock control; RTC / AWU; 2x watchdog (independent & window); DMA with 16 channels; 51/82/114/140 I/Os; up to 16 external interrupts
 Peripherals: camera interface; encryption**; USB 2.0 OTG FS; USB 2.0 OTG FS/HS; Ethernet MAC 10/100 with IEEE1588; 2x 32-bit timer; 5x 16-bit timer; 3x 16-bit timer; 2x6x 16-bit PWM synchronized AC timer; 2x DAC + 2 timers; 3x 12-bit ADC (24 channels / 2Msps); temperature sensor; 2x CAN 2.0B; 1x SDIO; 2x SPI / I2S; 1x SPI; 4x USART/LIN; 2x USART/LIN; 3x I2C]
STM32F4 Series highlights 1/3
• Based on the Cortex-M4 core, with new DSP and FPU instructions, combined with operation at up to 168MHz
• Over 30 new part numbers, pin-to-pin and software compatible with the existing STM32 F2 Series
Advanced technology and process from ST:
• Memory accelerator: ART Accelerator™
• Multi AHB Bus Matrix
• 90nm process
Outstanding results:
• 210DMIPS at 168MHz
• Execution from Flash equivalent to 0-wait state performance up to 168MHz thanks to the ST ART Accelerator
STM32F4 Series highlights 2/3
More Memory
• Up to 1MB Flash with an option for permanent readout protection (JTAG fuse)
• 192kB SRAM: 128kB on the bus matrix + 64kB CCM (Core Coupled Memory) on a data bus dedicated to the CPU
Advanced peripherals
• USB OTG High speed 480Mbit/s
• Ethernet MAC 10/100 with IEEE1588
• PWM High speed timers: 168MHz max frequency
• Crypto/Hash processor, 32-bit random number generator (RNG)
• 32-bit true RTC with calendar and sub-second accuracy
STM32F4 Series highlights 3/3
Further improvements
• Low voltage: 1.8V to 3.6V VDD, down to 1.7V* on most packages
• Full duplex I2S peripherals
• Three 12-bit ADCs: 0.41µs conversion/2.4Msps (7.2Msps in interleaved mode)
• Up to 6 high speed USARTs up to 10.5Mbit/s
• Up to 3 high speed SPIs up to 37.5Mbit/s
• Camera interface up to 54MBytes/s
* External reset circuitry required to support 1.7V
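The ADC figures quoted in these slides are consistent with each other, which can be checked with a little arithmetic. The following is only a sketch: the function names are made up for illustration, the conversion time is the 0.416µs value from the block diagram slide, and ideal interleaving (each ADC contributing its full rate) is assumed.

```c
/* Samples per second of one ADC, given its conversion time in seconds. */
double adc_sample_rate(double conversion_time_s)
{
    return 1.0 / conversion_time_s;
}

/* Aggregate rate when n_adcs identical ADCs sample the same input in turn.
 * Assumes ideal interleaving with no overhead (an assumption of this sketch). */
double interleaved_rate(double single_rate, unsigned n_adcs)
{
    return single_rate * (double)n_adcs;
}
```

With a conversion time of 0.416e-6 s, adc_sample_rate gives roughly 2.4 million samples per second, and three interleaved ADCs give roughly 7.2 million.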
STM32F4 portfolio
Extensive tools and SW
• Evaluation board for full product feature evaluation (STM3240G-EVAL, $349)
  • Hardware evaluation platform for all interfaces
  • Possible connection to all I/Os and all peripherals
• Discovery kit for cost-effective evaluation and prototyping (STM32F4DISCOVERY, $14.90)
• Starter kits from 3rd parties available soon
• Large choice of development IDE solutions from the STM32 and ARM ecosystem
Tools for development SW (examples)
• Commercial ones:
  • IAR: eval 32kB/30 days for test
  • Keil (ARM): eval 32kB for test
• Based on GCC, commercial:
  • Atollic Lite (no hex/bin, limited debug)
  • Raisonance: debug limited to 32kB
  • Rowley Crossworks: 30 days for test
• Free
  • STVP: FLASH prog.
  • STLink utility: FLASH prog. (+cmd line)
  • ST FlashLoader: FLASH prog.
• Libraries (free)
  • Standard peripherals library with CMSIS
  • USB device library
ARM Cortex-M4 in a few words
Introduction
Cortex-M processors binary compatible
Cortex-M feature set comparison
Feature                        Cortex-M0               Cortex-M3         Cortex-M4
Architecture version           v6M                     v7M               v7ME
Instruction set architecture   Thumb, Thumb-2 system   Thumb + Thumb-2   Thumb + Thumb-2,
                               instructions                              DSP, SIMD, FP
DMIPS/MHz                      0.9                     1.25              1.25
Bus interfaces                 1                       3                 3
Integrated NVIC                Yes                     Yes               Yes
Number of interrupts           1-32 + NMI              1-240 + NMI       1-240 + NMI
Interrupt priorities           4                       8-256             8-256
Breakpoints, watchpoints       4/2/0, 2/1/0            8/4/0, 2/1/0      8/4/0, 2/1/0
Memory Protection Unit (MPU)   No                      Yes (option)      Yes (option)
Integrated trace option (ETM)  No                      Yes (option)      Yes (option)
Fault Robust Interface         No                      Yes (option)      No
Single-cycle multiply          Yes (option)            Yes               Yes
Hardware divide                No                      Yes               Yes
WIC support                    Yes                     Yes               Yes
Bit-banding support            No                      Yes               Yes
Single-cycle DSP/SIMD          No                      No                Yes
Floating-point hardware        No                      No                Yes
Bus protocol                   AHB Lite                AHB Lite, APB     AHB Lite, APB
CMSIS support                  Yes                     Yes               Yes
Cortex-M4
DSP features
Cortex-M4 processor architecture
• ARMv7ME Architecture
  • Thumb-2 Technology
  • DSP and SIMD extensions
  • Single cycle MAC (up to 32 x 32 + 64 -> 64)
  • Optional single precision FPU
  • Integrated configurable NVIC
  • Compatible with Cortex-M3
• Microarchitecture
  • 3-stage pipeline with branch speculation
  • 3x AHB-Lite Bus Interfaces
• Configurable for ultra low power
  • Deep Sleep Mode, Wakeup Interrupt Controller
  • Power down features for Floating Point Unit
• Flexible configurations for wider applicability
  • Configurable Interrupt Controller (1-240 Interrupts and Priorities)
  • Optional Memory Protection Unit
  • Optional Debug & Trace
Cortex-M4 overview
• Main Cortex-M4 processor features
  • ARMv7-ME architecture revision
    • Fully compatible with Cortex-M3 instruction set
  • Single-cycle multiply-accumulate (MAC) unit
  • Optimized single instruction multiple data (SIMD) instructions
  • Saturating arithmetic instructions
  • Optional single precision Floating-Point Unit (FPU)
  • Hardware Divide (2-12 cycles), same as Cortex-M3
  • Barrel shifter (same as Cortex-M3)
  • Hardware multiply (same as Cortex-M3)
Single-cycle multiply-accumulate unit
• The multiplier unit allows any MUL or MAC instruction to be executed in a single cycle
  • Signed/Unsigned Multiply
  • Signed/Unsigned Multiply-Accumulate
  • Signed/Unsigned Multiply-Accumulate Long (64-bit)
• Benefits: speed improvement vs. Cortex-M3
  • 4x for 16-bit MAC (dual 16-bit MAC)
  • 2x for 32-bit MAC
  • Up to 7x for 64-bit MAC
Cortex-M4 extended single cycle MAC
Operation                         Instructions                         CM3   CM4
16 x 16 = 32                      SMULBB, SMULBT, SMULTB, SMULTT       n/a   1
16 x 16 + 32 = 32                 SMLABB, SMLABT, SMLATB, SMLATT       n/a   1
16 x 16 + 64 = 64                 SMLALBB, SMLALBT, SMLALTB, SMLALTT   n/a   1
16 x 32 = 32                      SMULWB, SMULWT                       n/a   1
(16 x 32) + 32 = 32               SMLAWB, SMLAWT                       n/a   1
(16 x 16) ± (16 x 16) = 32        SMUAD, SMUADX, SMUSD, SMUSDX         n/a   1
(16 x 16) ± (16 x 16) + 32 = 32   SMLAD, SMLADX, SMLSD, SMLSDX         n/a   1
(16 x 16) ± (16 x 16) + 64 = 64   SMLALD, SMLALDX, SMLSLD, SMLSLDX     n/a   1
32 x 32 = 32                      MUL                                  1     1
32 ± (32 x 32) = 32               MLA, MLS                             2     1
32 x 32 = 64                      SMULL, UMULL                         5-7   1
(32 x 32) + 64 = 64               SMLAL, UMLAL                         5-7   1
(32 x 32) + 32 + 32 = 64          UMAAL                                n/a   1
32 ± (32 x 32) = 32 (upper)       SMMLA, SMMLAR, SMMLS, SMMLSR         n/a   1
(32 x 32) = 32 (upper)            SMMUL, SMMULR                        n/a   1
All the above operations are single cycle on the Cortex-M4 processor
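As an illustration of the "(32 x 32) + 64 = 64" row, here is a minimal C sketch of a dot product with a 64-bit accumulator. The function name is hypothetical, and whether a given compiler actually emits SMLAL for the loop body depends on the toolchain and optimization settings; the code itself is portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 32-bit signed vectors using a 64-bit accumulator.
 * Each iteration is one (32 x 32) + 64 = 64 multiply-accumulate, the
 * operation that SMLAL performs in a single instruction. */
int64_t dot_product_s32(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t)a[i] * (int64_t)b[i];
    }
    return acc;
}
```

Widening the operands to 64 bits before multiplying keeps the full product. A 32-bit multiply would silently drop the upper half.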
Saturated arithmetic
• Intrinsically prevents overflow of a variable by clipping to the min/max boundaries, and removes the CPU burden of software range checks
• Benefits
  • Audio applications
    [Figure: two waveform plots, "Without saturation" and "With saturation"; vertical axis from -1.5 to 1.5]
  • Control applications
    • The integral term of a PID controller is continuously accumulated over time. Saturation automatically limits its value and saves several CPU cycles per regulator
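The clipping behaviour described above can be sketched in portable C. This is only an emulation for illustration, with hypothetical function names. On a Cortex-M4 the same result is available in one instruction (QADD, exposed by CMSIS as the `__QADD` intrinsic), which is where the cycle saving comes from.

```c
#include <stdint.h>

/* Adds two 32-bit signed values and clips the result to
 * [INT32_MIN, INT32_MAX] instead of letting it wrap around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}

/* PID integral term: accumulate the error, limited by saturation. */
int32_t integrate_error(int32_t integral, int32_t error)
{
    return sat_add32(integral, error);
}
```

The software version needs a widening add and two comparisons per update, which is the range-checking burden that the saturating instructions remove.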
Single-cycle SIMD instructions
• Stands for Single Instruction Multiple Data
• It operates on packed data
• Allows several operations on 8-bit or 16-bit data to be done simultaneously
  • e.g. dual 16-bit MAC (Result = 16x16 + 16x16 + 32)
• Benefits
  • Parallelizes operations (2x to 4x speed gain)
  • Minimizes the number of Load/Store instructions for exchanges between memory and the register file (2 or 4 data items transferred at once), when 32-bit data is not necessary
  • Maximizes register file use (1 register holds 2 or 4 values)
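To make the dual 16-bit MAC concrete, the sketch below emulates it in portable C: two signed 16-bit values are packed in each 32-bit operand, and both products are added to a 32-bit accumulator. The function name is hypothetical. On a Cortex-M4 this corresponds to the SMLAD instruction (CMSIS intrinsic `__SMLAD`), which does the whole computation in one cycle.

```c
#include <stdint.h>

/* Emulates a dual 16-bit multiply-accumulate on packed operands:
 * result = acc + lo(x) * lo(y) + hi(x) * hi(y)
 * where lo/hi are the signed 16-bit halves of each 32-bit word.
 * The narrowing casts assume two's complement, as on ARM targets. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);

    /* Sum in unsigned arithmetic so that overflow wraps instead of
     * being undefined behaviour. */
    uint32_t result = (uint32_t)acc
                    + (uint32_t)((int32_t)x_lo * y_lo)
                    + (uint32_t)((int32_t)x_hi * y_hi);
    return (int32_t)result;
}
```

For example, with x holding (3, 2), y holding (5, 4) and an accumulator of 10, the result is 10 + 2*4 + 3*5 = 33.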
Packed data types
• Byte or halfword quantities packed into words
• Allows more efficient access to packed structure types
• SIMD instructions can act on packed data
• Instructions to extract and pack data
[Figure: a word holding halfwords A and B is extracted into two zero-extended words (00......00 A and 00......00 B), which are then packed back into a single word A B]
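The extract and pack steps of the figure can be written in plain C as follows. The type and function names are hypothetical, chosen for this sketch. On a Cortex-M4 the equivalent work is done by dedicated instructions such as UXTH (zero-extend a halfword) and PKHBT/PKHTB (pack halfwords).

```c
#include <stdint.h>

/* Two halfwords, each zero-extended to a full 32-bit word. */
typedef struct {
    uint32_t a;  /* upper halfword of the packed word */
    uint32_t b;  /* lower halfword of the packed word */
} halfword_pair;

/* Extract: split one packed word "A B" into 00..00 A and 00..00 B. */
halfword_pair extract_halfwords(uint32_t packed)
{
    halfword_pair pair;
    pair.a = packed >> 16;
    pair.b = packed & 0xFFFFu;
    return pair;
}

/* Pack: combine the low 16 bits of a and b into one word "A B". */
uint32_t pack_halfwords(uint32_t a, uint32_t b)
{
    return (a << 16) | (b & 0xFFFFu);
}
```

Packing the two extracted halves gives back the original word, so the two functions are inverses for 16-bit inputs.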
Cortex-M4
Floating Point Unit
Overview
• FPU: Floating Point Unit
  • Handles real number computation
  • Standardized by IEEE 754-2008
    • Number format
    • Arithmetic operations
    • Number conversion
    • Special values
    • 4 rounding modes
    • 5 exceptions and their handling
• ARM Cortex-M FPU ISA
  • Supports
    • Add, subtract, multiply, divide
    • Multiply and accumulate
    • Square root operations
Rounding issues
• The precision has some limits
  • Rounding errors can accumulate over successive operations and may produce inaccurate results (do not do financial computations with floating-point numbers)
• A few examples
  • If you are working on two numbers with different exponents, the hardware automatically denormalizes one of the two numbers so that the calculation is done with the same exponent
  • If you subtract two numbers that are very close to each other, you lose relative precision (also called cancellation error)
  • If you reorder the operations, you may not obtain the same result, because of the rounding errors
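The effect of reordering operations is easy to reproduce. The two functions below (hypothetical names, written for this illustration) combine the same operands in a different order. The `volatile` intermediate is there to force the value to be rounded to single precision, as it would be in an FPU register on the Cortex-M4F.

```c
/* Computes (big + small) - big.
 * If small is below the spacing of floats around big, it is absorbed
 * by the addition and the result is 0. */
float add_then_subtract(float big, float small)
{
    volatile float sum = big + small;  /* rounded to single precision */
    return sum - big;
}

/* Computes (big - big) + small: same operands, different order. */
float subtract_then_add(float big, float small)
{
    volatile float diff = big - big;   /* exactly 0 */
    return diff + small;
}
```

With big = 1.0e8f and small = 1.0f, the first function returns 0.0f and the second returns 1.0f: adjacent single precision values near 1.0e8 are 8 apart, so adding 1 to 1.0e8 rounds back to 1.0e8.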
IEEE 754
Number format
• 3 fields
  • Sign
  • Biased exponent (sum of an exponent plus a constant bias)
  • Fraction (or mantissa)
• Single precision: 32-bit coding
  | 1-bit Sign | 8-bit Exponent | 23-bit Mantissa |   (32 bits)
• Double precision: 64-bit coding
  | 1-bit Sign | 11-bit Exponent | 52-bit Mantissa |   (64 bits)
Number format
• Half precision: 16-bit coding
  | 1-bit Sign | 5-bit Exponent | 10-bit Mantissa |   (16 bits)
  • Can also be used for storage in higher precision FPU
  • ARM has an alternative coding for Half precision
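When half precision is used as a storage format, values are widened to single precision before being computed on. The sketch below shows that widening in portable C for the standard IEEE 754 half precision coding (not the ARM alternative coding). The function name is hypothetical, and on a Cortex-M4F the FPU conversion instructions do this in hardware, so software like this is only needed on a host or for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Converts an IEEE 754 half precision bit pattern to a float.
 * Fields: 1-bit sign, 5-bit exponent (bias 15), 10-bit fraction. */
float half_to_float(uint16_t half)
{
    uint32_t sign = (uint32_t)(half >> 15) << 31;
    uint32_t exponent = (half >> 10) & 0x1Fu;
    uint32_t fraction = half & 0x3FFu;
    uint32_t bits;

    if (exponent == 0u) {
        if (fraction == 0u) {
            bits = sign;  /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit 1 appears. */
            int shift = 0;
            while ((fraction & 0x400u) == 0u) {
                fraction <<= 1;
                shift++;
            }
            fraction &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - shift + 1) << 23) | (fraction << 13);
        }
    } else if (exponent == 31u) {
        bits = sign | 0x7F800000u | (fraction << 13);  /* infinity or NaN */
    } else {
        /* Normal number: re-bias the exponent from 15 to 127. */
        bits = sign | ((exponent - 15u + 127u) << 23) | (fraction << 13);
    }

    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Every half precision value is exactly representable in single precision, so the conversion in this direction never rounds.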
Normalized number value
• Normalized number
  • Codes a number as: a sign + a fixed-point number between 1.0 and 2.0, multiplied by $2^N$
• Sign field (1-bit)
  • 0: positive
  • 1: negative
• Single precision exponent field (8-bit)
  • Exponent range: 1 to 254 (0 and 255 reserved)
  • Bias: 127
  • Exponent - bias range: -126 to +127
• Single precision fraction (or mantissa) (23-bit)
  • Fraction: value between 0 and 1: $\sum_{i=1}^{23} N_i \cdot 2^{-i}$
  • The 23 $N_i$ values are stored in the fraction field
$$(-1)^s \times \left(1 + \sum_{i=1}^{23} N_i \cdot 2^{-i}\right) \times 2^{\mathrm{exp} - \mathrm{bias}}$$
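The formula above can be checked by splitting a float into its three fields and rebuilding the value from them. The following is a minimal sketch with hypothetical type and function names. It handles normalized numbers only (exponent field 1 to 254), as on this slide.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;      /* 1 bit: 0 positive, 1 negative */
    unsigned exponent;  /* 8-bit biased exponent */
    uint32_t fraction;  /* 23-bit fraction field */
} float_fields;

/* Splits a single precision value into sign, exponent and fraction. */
float_fields split_float(float value)
{
    uint32_t bits;
    float_fields fields;
    memcpy(&bits, &value, sizeof bits);
    fields.sign = (unsigned)(bits >> 31);
    fields.exponent = (unsigned)((bits >> 23) & 0xFFu);
    fields.fraction = bits & 0x7FFFFFu;
    return fields;
}

/* Rebuilds (-1)^s x (1 + sum of N_i * 2^-i) x 2^(exp - bias)
 * for a normalized number, with bias = 127. */
double normalized_value(float_fields fields)
{
    double mantissa = 1.0;
    double weight = 0.5;                 /* 2^-1 for i = 1 */
    for (int i = 1; i <= 23; i++) {
        if (fields.fraction & (1u << (23 - i))) {
            mantissa += weight;          /* N_i = 1 */
        }
        weight *= 0.5;
    }

    double scale = 1.0;
    int power = (int)fields.exponent - 127;
    for (; power > 0; power--) {
        scale *= 2.0;
    }
    for (; power < 0; power++) {
        scale *= 0.5;
    }
    return (fields.sign ? -1.0 : 1.0) * mantissa * scale;
}
```

For -6.5f, which is -1.625 x 2^2, the fields are sign 1, exponent 129 and fraction 0x500000, and normalized_value gives back -6.5.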
ARM Cortex-M FPU
Introduction
• Single precision FPU
• Conversion between
  • Integer numbers
  • Single precision floating point numbers
  • Half precision floating point numbers
• Handling floating point exceptions (Untrapped)
• Dedicated registers
  • 32 single precision registers (S0-S31), which can be viewed as 16 doubleword registers (D0-D15) for load/store operations
  • FPSCR for status & configuration
Modifications vs IEEE 754
• Full Compliance mode
  • Processes all operations according to IEEE 754
• Alternative Half-Precision format
  • $(-1)^s \times \left(1 + \sum_{i=1}^{10} N_i \cdot 2^{-i}\right) \times 2^{16}$ and no de-normalized number support
• Flush-to-zero mode
  • De-normalized numbers are treated as zero
  • Associated flags for input and output flush
• Default NaN mode
  • Any operation with a NaN as an input, or that generates a NaN, returns the default NaN
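What flush-to-zero does to an input can be emulated in portable C. The function below is a sketch with a hypothetical name. On the Cortex-M4F the behaviour is selected through a control bit in FPSCR and applied by the hardware, and the associated status flags mentioned above are not modelled here.

```c
#include <stdint.h>
#include <string.h>

/* Returns the value unchanged unless it is de-normalized
 * (exponent field 0, fraction non-zero), in which case a zero
 * with the same sign is returned. */
float flush_to_zero(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);

    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0u && fraction != 0u) {
        bits &= 0x80000000u;  /* keep only the sign bit */
        memcpy(&value, &bits, sizeof value);
    }
    return value;
}
```

For instance 1.0e-40f is below the smallest normalized single precision value (about 1.18e-38), so it is flushed to zero, while 1.0f passes through unchanged.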
Complete implementation
• Cortex-M4F does NOT support all operations of IEEE 754-2008
• Full implementation is done by software
• Unsupported operations
  • Remainder (% operator)
  • Round FP number to integer-valued FP number
  • Binary to decimal conversions
  • Decimal to binary conversions
  • Direct comparison of Single Precision (SP) and Double Precision (DP) values
FPU instructions
FPU arithmetic instructions
Operation          Description                         Assembler    Cycles
Absolute value     of float                            VABS.F32     1
Negate             float                               VNEG.F32     1
Negate             and multiply float                  VNMUL.F32    1
Addition           floating point                      VADD.F32     1
Subtract           float                               VSUB.F32     1
Multiply           float                               VMUL.F32     1
Multiply           then accumulate float               VMLA.F32     3
Multiply           then subtract float                 VMLS.F32     3
Multiply           then accumulate then negate float   VNMLA.F32    3
Multiply           then subtract then negate float     VNMLS.F32    3
Multiply (fused)   then accumulate float               VFMA.F32     3
Multiply (fused)   then subtract float                 VFMS.F32     3
Multiply (fused)   then accumulate then negate float   VFNMA.F32    3
Multiply (fused)   then subtract then negate float     VFNMS.F32    3
Divide             float                               VDIV.F32     14
Square-root        of float                            VSQRT.F32    14
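One practical consequence of these cycle counts is that a division costs as much as 14 multiplications. When many values are divided by the same number, a common approach is to divide once and multiply afterwards. The sketch below illustrates this; the function names are hypothetical, and the cycle estimates cover arithmetic only (no loads, stores or loop overhead).

```c
#include <stddef.h>

/* Divides every element by the same divisor using one division
 * followed by n multiplications. The result can differ from a true
 * division in the last bit, because the reciprocal is itself rounded. */
void scale_by_reciprocal(float *data, size_t n, float divisor)
{
    const float reciprocal = 1.0f / divisor;  /* one divide (14 cycles) */
    for (size_t i = 0; i < n; i++) {
        data[i] *= reciprocal;                /* one multiply (1 cycle) each */
    }
}

/* Arithmetic cycles for n divisions, from the table: 14 per divide. */
unsigned cycles_with_division(unsigned n)
{
    return 14u * n;
}

/* Arithmetic cycles for one divide plus n multiplies. */
unsigned cycles_with_reciprocal(unsigned n)
{
    return 14u + n;
}
```

For 100 elements the estimate is 1400 cycles with divisions and 114 cycles with the reciprocal.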
FPU compare & convert instructions
Operation   Description                                              Assembler    Cycles
Compare     float with register or zero                              VCMP.F32     1
Compare     float with register or zero                              VCMPE.F32    1
Convert     between integer, fixed-point, half precision and float   VCVT.F32     1
FPU Load/Store Instructions
Operation   Description                                       Assembler   Cycles
Load        multiple doubles (N doubles)                      VLDM.64     1+2*N
Load        multiple floats (N floats)                        VLDM.32     1+N
Load        single double                                     VLDR.64     3
Load        single float                                      VLDR.32     2
Store       multiple double registers (N doubles)             VSTM.64     1+2*N
Store       multiple float registers (N floats)               VSTM.32     1+N
Store       single double register                            VSTR.64     3
Store       single float register                             VSTR.32     2
Move        top/bottom half of double to/from core register   VMOV        1
Move        immediate/float to float-register                 VMOV        1
Move        two floats/one double to/from core registers      VMOV        2
Move        one float to/from core register                   VMOV        1
Move        floating-point control/status to core register    VMRS        1
Move        core register to floating-point control/status    VMSR        1
Pop         double registers from stack                       VPOP.64     1+2*N
Pop         float registers from stack                        VPOP.32     1+N
Push        double registers to stack                         VPUSH.64    1+2*N
Push        float registers to stack                          VPUSH.32    1+N
