Xenon System Architecture Specification

Doc #: H02174
Project: Xenon
Author: Nick Baker
Revision: 0.43
Date: 06/24/04


Version: 1.3

Xenon System Architecture

Proprietary Notice

The information contained herein is confidential, is submitted in confidence, and is proprietary information of
Microsoft Corporation, and shall only be used in the furtherance of the contract of which this document forms
a part, and shall not, without Microsoft Corporation’s prior written approval, be reproduced or in any way
used in whole or in part in connection with services or equipment offered for sale or furnished to others. The
information contained herein may not be disclosed to a third party without consent of Microsoft Corporation,
and then, only pursuant to a Microsoft approved non-disclosure agreement. Microsoft assumes no liability for
incidental or consequential damages arising from the use of this specification contained herein, and reserves
the right to update, revise, or change any information in this document without notice.


Published by
X-box Console Group
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
Telephone (425) 882-8080

© 2002 Microsoft Corporation. All rights reserved. Printed in the USA.
Microsoft, and MS are registered trademarks and Windows is a trademark of Microsoft Corporation.

Microsoft Proprietary and Confidential. Page 2 of 46


Revision History

Revision | Changes                                                                                | Date     | By                           | Status
0.1      | Original                                                                               | 2/26/03  | Nick Baker                   | Draft
0.1a     | Added Structure                                                                        | 3/15/03  | Nick Baker                   | Draft
0.2      | Updated Southbridge description. Added Big Endian notes as they apply to Southbridge   | 6/5/03   | Greg Williams                | Draft
0.25     | Updated top-level block diagram. Updated Southbridge. Updated memory.                  | 8/27/03  | Greg Williams                | Draft
0.30     | Updated Ana relevant sections                                                          | 9/21/03  | John Tardif                  | Draft
0.35     | Updated Southbridge Endianness section                                                 | 12/15/03 | Greg Williams, Stephen Au    | Draft
0.36     | Rolled in Andy’s WMA Endianness document                                               | 12/16/03 | Greg Williams, Andy Walters  | Draft
0.37     | General clean-up                                                                       | 1/7/04   | Nick Baker                   | Draft
0.4      | Added BackSide Bus Spec subdocument                                                    | 2/13/04  | Greg Williams                | Draft
0.41     | Corrected and updated system information                                               | 3/3/04   | L. Del Castillo              | Draft
0.42     | Updated memory, memory interface, system boot, reset, and time sections                | 3/9/04   | Harjit Singh                 | Draft
0.43     | Southbridge update – USB overhaul                                                      | 6/24/04  | Greg Williams                | Draft


Conventions Used

Description | Represents | Examples

Documents Referenced

Title                       | Document Number                                        | Author
Xenon Product Specification | \\xenon\specs\Xenon Product Spec.doc                   | Todd Roshak (toddro)
Xenon Design Specification  | \\xenon\specs\Hardware\Xenon Design Specification.doc  | Harjit Singh (harjits)


Table of Contents

1 Introduction
2 System Block Diagram
3 Architecture Overview
  3.1 Introduction
  3.2 Core Digital Components
  3.3 Architecture Justification
  3.4 System Components
  3.5 Distributed Components
  3.6 Key Architectural Mechanisms
  3.7 Low Level Software Architecture (?)
  3.8 Alternate SKU Considerations
  3.9 Technical Specifications
  3.10 Performance Overview (Sue)
4 Core Digital Components
  4.1 CPU
    4.1.1 Further Reading
  4.2 GPU
  4.3 Memory
  4.4 Southbridge
    4.4.1 Notes
  4.5 Ana
  4.6 Front Side Bus (Art)
    4.6.1 Link Layer
  4.7 Back Side Bus
  4.8 Memory Bus
  4.9 SMBus
5 System Components
  5.1 Questions
  5.2 DVD
  5.3 HDD
  5.4 MU
  5.5 Game Controllers
  5.6 Network
  5.7 Expansion
6 Distributed Components
  6.1 Audio
  6.2 Video
7 Key Architectural Mechanisms
  7.1 System Dataflow
  7.2 System Memory Map
  7.3 System Coherence
  7.4 System Interrupt Mechanism
  7.5 System Ordering
  7.6 System Security
  7.7 System Endianess
  7.8 System Clocking
  7.9 System Boot
  7.10 System Reset
  7.11 System Time
  7.12 Power States & Power Management
    7.12.1 Off
    7.12.2 Standby
    7.12.3 Quiet
    7.12.4 Full Power
    7.12.5 Power Control Events
  7.13 CPU/GPU Synchronization (Nick)
  7.14 CPU – GPU Procedural Geometry Communication (Nick)
    7.14.1 Vertex commands
    7.14.2 GPU Memory mapped registers
    7.14.3 Current Implementation
    7.14.4 Requirements
  7.15 System Debug Facilities (?)
    7.15.1 Low Level Debug
    7.15.2 Development Systems
  7.16 System Bandwidth / Latency Roll-Up (Nick)
8 Low Level Software Architecture (MarcW?)
  8.1 Flash Resident Drivers
  8.2 BIOS
  8.3 Low Level Drivers
  8.4 Network Stack
  8.5 Procedural Geometry
  8.6 Video Mode Selection
9 Alternate SKU Considerations (Nick)
10 Other
  10.1 Not Covered


1 Introduction

The purpose of this document is to capture the specification for the Xenon system architecture.

It serves as a high level description of, and a set of requirements for, interoperability among the core components,
other IO devices and peripherals, and software. It is not a detailed architecture description of individual
components. The intended audience is hardware and software architects and designers who want a high level overview
of the system.

2 System Block Diagram

Figure 1: Xenon System Block Diagram (Rev 2.6, 06/18/04)

[Block diagram not reproduced here. The annotations that can be recovered from it are listed below.]

Interfaces:
- CPU to GPU: 16 + 16. Bitrate: 5.4 Gbps. Raw B/W: 21.6 GB/sec. Signaling: Custom Differential.
- GPU to memory: 128 wide, 512 Mb x32 parts. Total Size: 256 MB. Bitrate: 1.4 - 1.6 Gbps.
  Raw B/W: 22.4 - 25.6 GB/sec. Signaling: GDDR.
- GPU to SouthBridge: 2 + 2. Bitrate: 2.5 Gbps. Raw B/W: 1 GB/sec. Signaling: PCI-Express.
- GPU to Ana: XDVO, 135MHz DDR.

CPU (three cores, each with 32 kB L1 I$ and 32 kB L1 D$; 1 MB Shared L2$):
- Launch Process: 90nm enhanced SOI (10KE0)
- Launch Die Size: 168 mm^2
- Frequency: 3.0-3.5 GHz
- Core VDD: APS 1.075V – 1.275V (@ Ball)
- Power: 85W
- Power/Ground Bumps: 2113
- Signal I/O: 219
- Power/Ground Balls: 680
- Package: Flip-Chip 899 Ball Plastic/Organic BGA

GPU (NorthBridge, 3D Core (Gfx), 10MB EDRAM):
- Launch Process: 90 nm bulk (TSMC 90GT)
- Launch Die Size (main): 177 mm^2
- Launch Die Size (EDRAM): 71 mm^2
- Core Frequency: 500 MHz
- Core VDD: 1.1V
- Power: 38W (29W + 9W eDRAM)
- Signal I/O: 443
- Power I/O: 582 (282 I/O, 300 core)
- Package: Flip-Chip 35x35 1025 ball BGA
- Target ThetaJc = 0.2-0.5 degreesC/W

SB (SouthBridge):
- Launch Process: 0.15u
- Launch Die Size: 34.7 mm^2
- Frequency: 125 MHz
- Core VDD: 1.8V
- Power: 3.2W
- Signal I/O: 183
- Package: 23x23 382 TEBGA

Ana (Clk Gen, Thermal Sensor, Fan Driver, DENC, Video DACs; clocks to CPU, NB, SB, EPHY):
- Launch Process: 0.18u
- Launch Die Size: 13.4 mm^2
- Frequency: 170 MHz
- Core VDD: 1.8V
- Power: 1.3W
- Signal I/O: 81
- Power I/O: 61
- Package: LQFP144

16 MB System Flash contents: SMC: 12 kB. Kernel: 256 kB. Drivers: 1 MB. Config: 256B. Dash etc ~11 MB.
A 64 kB Boot ROM is also shown.

SouthBridge connections and other labeled blocks:
- Ethernet PHY (MII) to RJ-45; Audio DACs (I2S); S/PDIF; AVIP
- DVD Drive (Serial ATA); HDD Connector (Serial ATA)
- Memory Card Interface (x2)
- Front-Panel (x2): e.g. 1 gamepad (or hub) + 1 camera, or 2 gamepads
- Rear-Panel (x1): e.g. Omni WiFi (802.11x), or general expansion
- USB: EHCI and OHCI + XUSB, 2 indpt. controllers. BGC: 2 of 4 ports unused; BGC: 1 of 5 ports unused.
  Helium: 4 expansion; Helium: Keyboard/Mouse.
- Wireless gamepads (Radon) charged via front-panel USB
- Argon Wireless Module: Baseband, 2.4 GHz Radio
- SMC; SMBus; IIC; EEPROM; IR; PWM x2; Fan(s); GPIOs (Front-Panel, PSU, DVD Tray)
- Kernel Debug Port UART; SMC Debug Port UART; Serial Debug; Parallel Debug; JTAG
- Thermal diode info (2 add’l from board)
- PC SKU (Helium) Extras: RTC, HDD Drive (SATA)
[Width labels 2, 4, 8, 15 and 64, and the annotation "Core VDD: 1.8 V, Power: 6-10W", also appear in the
diagram; the blocks they belong to are not identifiable in this copy.]

Future HDTV Support (HDMI/HDCP, XDVO to TMDS):
- GPIO’s in SB/Ana for DDC, UPD, CEC
- HDMI or DVI w/HDCP separate chip or integrated into Ana
- Separate connector to reduce cost on non-Pro SKU
- HDMI connector smaller form factor than DVI
- Only analog outputs or digital outputs enabled at any one time
- TMDS - 4 high speed differential pairs

The diagram shows the main system components. These are described in more detail in the next section. Note that
the latest version of the diagram may be obtained from:

\\xenon\specs\Architecture\Xenon System Block Diagram.vsd


3 Architecture Overview

3.1 Introduction

The following sections give a high level view of the system architecture. For further reading see the
corresponding main chapters later in this document.

3.2 Core Digital Components

The system consists of the following main components:

• CPU: The CPU is a custom 4GHz PowerPC CPU designed specifically for Xenon. It
  consists of 3 CPU cores running in an SMP model. Each core supports 2-way Simultaneous
  Multi-Threading (SMT), allowing a total of six simultaneous hardware threads. The
  architecture is scalable to accommodate a late binding decision on the exact number of
  cores depending on the final cost model. All cores are identical and have specially
  designed vector floating point acceleration (VMX2) and a shared 1MB L2 cache.
  See \\xenon\specs\CPU\CPU_One_Pager(IBM).doc for an overview.

• GPU: This is the main system controller hub containing the CPU’s Bus Interface Unit (BIU),
  the memory controller, a DX9/10 3D rendering core, system coherency controller and IO
  interface. It is broken into two main sections: the Northbridge (BIU, Memory, IO) and the
  3DCore. The 3DCore is a 500MHz unified shader architecture based on the R500 and
  uses 10MB of embedded DRAM for the render targets and z-buffer to provide more
  consistent rendering performance.
  See \\xenon\specs\Graphics\GPU Preliminary Specification (1 pager).doc for an overview.

• SouthBridge: This is the IO controller chip that contains interfaces to all the peripherals,
  including audio output and decompression, Serial-ATA for DVD and HDD, and USB1.1/2.0 for
  peripherals and memory units. It also contains the System Management Controller (SMC)
  and system FLASH interface.
  See \\xenon\specs\Southbridge\Southbridge_One_pager.doc for an overview.

• Ana: Ana is the ANAlog chip that contains the system clock reference, video DACs, and
  thermal sensors, as well as the digital encoder for analog video standards
  (NTSC/PAL/HDTV/VGA).
  See \\xenon\specs\Ana\Ana_ One_Pager.doc for an overview.

• Memory: The system has a unified memory architecture consisting of GDDR memory.
  128MB, 256MB, 512MB and 1GB memory configurations are supported, although a 256MB
  console and 512MB development systems are the POR.
  See \\xenon\specs\memory\specs\ati_spec_16mx32_8b_v11.pdf for a sample part spec.

The components communicate over the following interfaces:

• Front Side Bus (FSB): Interface between the CPU and GPU. This is a 5.4Gbps differential
  link custom to the CPU vendor. It is symmetrical, with 10.8GB/sec peak in the write
  direction and 10.8GB/sec peak in the read direction. The high bandwidth (in the write
  direction at least) is to support procedural data generation (XPS) on the CPU, which is
  pushed in a tightly coupled fashion to the GPU.
  See \\xenon\specs\CPU\FSB\FSB_BUSSPEC.pdf for the FSB documentation.

• Back Side Bus (BSB): This connects the GPU to the SouthBridge. This is a PCI-Express
  2x bus with a peak of 500 MB/sec in each direction.
  See \\xenon\HWDev\Electrical\Southbridge\Interface Specs\PCI\pciexpress_base_10a.pdf
  for the PCI-Express base specification.

• Memory Bus: The interface between the GPU and main memory. This is GDDR, running at
  1.4-1.6Gbps. At 128 bits wide, this provides a peak of 22.4GB/sec (@1.4Gbps). The exact
  frequency will be determined later based on the availability of parts.
  The sample part spec (\\xenon\specs\memory\specs\ati_spec_16mx32_8b_v11.pdf) also
  serves as the interface definition.

• Xenon Digital Video Output (XDVO): This is the pixel output bus that interfaces the GPU to
  the video encoder portion of Ana. It is a 15bit 135MPix/sec DDR (2 cycles to transfer one
  pixel) bus that supports most HDTV and HD Monitor standards.
  See \\xenon\Specs\Ana\XDVO.doc for the XDVO bus documentation.

• System Management Bus (SMBus): This is a low pin count serial interface (similar to IIC)
  that the various chips use to communicate with one another for reset and power
  management purposes. Most likely there is no direct connection to the GPU/CPU, other
  than indirectly, say through resets.
  See \\xenon\HWDev\Electrical\Industry Standards\SMBus Version 2.0.pdf for the base
  specification.
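
The peak bandwidth figures quoted for the FSB, BSB and Memory Bus follow from bitrate times width. The sketch below reproduces that arithmetic. Two inputs are assumptions rather than statements from this section: the 16-bit width per FSB direction is read from the "16" labels in the block diagram, and the 8b/10b line-coding factor used for the BSB is the standard encoding of PCI-Express 1.0. The function and variable names are illustrative only.

```python
def peak_gbytes_per_sec(bitrate_gbps, width_bits, coding_efficiency=1.0):
    """Peak bandwidth in GB/sec for a link carrying `width_bits` bits in parallel.

    coding_efficiency is the payload fraction left after line coding
    (1.0 means no coding overhead is modeled).
    """
    return bitrate_gbps * width_bits * coding_efficiency / 8.0


# FSB: 5.4 Gbps per signal, assumed 16 bits wide in each direction.
fsb_per_direction = peak_gbytes_per_sec(5.4, 16)

# Memory Bus: 128 bits wide at the low and high ends of the 1.4-1.6 Gbps range.
mem_low = peak_gbytes_per_sec(1.4, 128)
mem_high = peak_gbytes_per_sec(1.6, 128)

# BSB: two PCI-Express lanes at 2.5 Gbps, assumed 8b/10b coding (80% payload).
bsb_per_direction = peak_gbytes_per_sec(2.5, 2, coding_efficiency=0.8)

print(f"FSB per direction: {fsb_per_direction:.1f} GB/sec")
print(f"Memory Bus:        {mem_low:.1f} to {mem_high:.1f} GB/sec")
print(f"BSB per direction: {bsb_per_direction * 1000:.0f} MB/sec")
```

Under these assumptions the results match the 10.8GB/sec, 22.4GB/sec and 500 MB/sec figures in the text, and the upper memory figure matches the 25.6 GB/sec shown in the block diagram.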

3.3 Architecture Justification

The full reasoning behind the architecture is beyond the scope of this document (please read the
“Think Week White Papers” for a more thorough analysis).

At the highest level the system looks like a multi-processor PC with integrated graphics. This was
not necessarily the intention, and actually the distance of the CPU from main memory was longer
than hoped. A split memory architecture with local processor memory was desired, but cost
constraints dictated a unified memory architecture. Once that decision was made, placing the
memory next to the highest bandwidth customer (the GPU) was the next logical step. This does
present a memory latency problem to the CPU, so large caches and CPU pre-fetching are required
to compensate.

The next level is the exact intent of the multi-processing and multi-threading. Again this was driven
by cost efficiency reasons. For developers, the easiest architecture is to present a single high
performance processor. However, the CPU industry has reached a limit to how far instruction level
parallelism can be taken with increasing cost and complexity for doing so. Forcing parallel
programming with several simpler (cheaper) processors especially in a closed environment such as
a game console is a logical step, for which there is also prior art. Furthermore, several areas of a
game (especially at the low levels of physics and rendering) are parallelizable.

To aid in the parallel programming within the rendering pipeline in particular, the CPU and GPU
have been closely coupled to allow procedural generation of data. This also helps in a cost
constrained environment where the amount of system memory a developer would need to store all
the offline generated art they want can never be achieved. Even if a developer does not want to
tackle a multi-processing problem, by using Microsoft supplied APIs he can effectively take
advantage of the extra processing power to perform parallel number crunching as well as
decompression of geometry and to some extent textures.

An interesting note here is that because parallel programming is hard, we do want to get significant
performance out of a single CPU core, so the processor (read single core) chosen still has
competitive SpecINT.

The use of Embedded DRAM also requires some discussion here. Graphics processors are
extremely bandwidth hungry, and this is typically solved in PC graphics by using very wide memory
interfaces, render target and z-buffer compression, and on-chip caching. EDRAM was chosen
because going wider than 128bits was not a cost option, and because compression and caching
typically behave unpredictably.


The choice of a DirectX compatible 3DCore should be self explanatory, and the most up to date
version of the standard was chosen (DX10). However, given schedule / cost constraints, not all of
the DX10 spec made it into the hardware. The main rendering features of interest are:

• Unified Shader Core: This allows effective load balancing of vertex and pixel shaders, so

achieving better efficiency of the compute resources.

• Multi-Render Target: This allows deferred lighting passes, where per pixel computations

can be performed in a geometry independent fashion.

• High Dynamic Range: Floating point and high precision fixed point formats are supported

to allow HDR effects.

3.4 System Components

The core digital components (CPU, GPU, SB, Memory, Ana) comprise the minimum architecture
components required to be able to boot and run the OS. In addition there are several storage and
IO devices supported, all of which may or may not be present in a given product configuration.

- DVD: Used for game content delivery.

- HDD: Optional component used for the alternate SKUs to enhance certain capabilities
  such as ripped audio and saved games.

- Ethernet (10/100BaseT): For Live and sharing content with PCs. Will also be used for
  development systems.
  See \\xenon\HWDev\Electrical\Ethernet PHY\VIA-DS6103110.pdf for an example external
  PHY specification.

- Audio DACs: Separate component for audio output.
  See \\xenon\HWDev\Electrical\Audio\Wolfson\WM8726.pdf for an example spec.

- Memory Cards: As in Xbox1, saved game data can be stored on USB based memory
  cards. Without a HDD in all configurations, this will be required for all game saving on
  Xenon.

- Wired Controllers: As with Xbox1, the wired controllers use a modified version of USB
  (XUSB).

- Expansion Devices: A couple of standard USB ports will be available for other expansion
  devices such as USB HDDs, Cameras, etc.

- IR Input: IR is supported directly by the SB.
  See \\xenon\HWDev\Electrical\IR Receiver\Xenon Infrared Receiver Spec.doc for the spec.

- Wireless Controllers: Wireless controllers are supported via additional circuitry that
  interfaces to one of the SB USB ports.

- AV Packs: Again similar to Xbox1, Xenon supports an Audio Video Interface Port (AVIP) to
  break out the audio and video signals depending on the AV components the end customer
  has. Planned AVPacks would be: Standard (Analog Stereo Audio + Composite Video), RF
  (one each for North America, Japan, Europe), Enhanced (Digital Audio + S-Video added),
  SCART (RGB component along with Composite video), Component, and VGA.
  See \\xenon\Specs\Peripherals\Xenon_AV_Pack_Design_Spec.doc for the design spec.

- HDMI/HDCP: The base architecture also supports routing the XDVO bus to an optional
  HDMI chip.

Documentation for the different IO busses can be referenced as follows:

- Serial ATA: \\xenon\HWDev\Electrical\Southbridge\Interface Specs\ATA\Serial ATA 1.0 gold.pdf

- USB 2.0: \\xenon\HWDev\Electrical\Southbridge\Interface Specs\USB\USB 2_0 Spec.pdf

- USB 1.1: \\xenon\HWDev\Electrical\Industry Standards\usb11.pdf

- MII: \\xenon\HWDev\Electrical\Southbridge\Interface Specs\EMAC\MII.pdf

- I2S: \\xenon\HWDev\Electrical\Industry Standards\i2sbus.pdf

As shown in the diagram, it is possible to extend the system further by using IIC (SMBus) or
additional USB Hubs. These will be used to add additional USB ports and other components such as
a Real-Time Clock for the PC SKU.

3.5 Distributed Components

There are a few features for which there is no dedicated processing, rather processing is shared
amongst the different chips and emulated. These are called out below:

- Digital Video Processing: There is no dedicated MPEG decoder in the system. The
  processing (decode) is expected to be performed completely on the CPU, with maybe
  some help from the 3DCore’s shader array if needed.

- Audio: The audio support on the hardware is limited to WMAPro decode and audio out
  DMA. All voice generation, mixing, and effects processing is done on the CPU.

3.6 Key Architectural Mechanisms

The salient points about how certain architectural features are implemented are listed below. More
detailed descriptions appear later in this document:

- Endianness: The CPU is Big Endian (byte ordering). All devices on the system are big
  endian as well, except for the SouthBridge IO components which are Little Endian (due to
  their PC heritage).

- CPU Coherence: Coherence between the CPU cores is maintained by hardware. The
  coherence point is the L2 cache. To aid in this the L2 is inclusive of the L1 caches, and the
  L1 caches are write-through.

- DMA Coherence: Only IO coherence via snooping is implemented. High bandwidth
  devices, such as rendering, should avoid using this mechanism and software should use
  non-cached write combining, or software managed coherence when synchronizing data in
  these cases.

- Instruction Ordering: PowerPC is loosely ordered. This requires the use of barrier
  operations to force ordering when required, e.g. when accessing hardware registers.

- DMA Ordering: As always, ordering rules are required to guard against race conditions
  between DMA transfers, interrupts and Memory Mapped IO (MMIO) operations. At a high
  level, the hardware will use interrupts to guarantee that ordering is maintained. There is no
  support for simultaneous fine grain (e.g. within a CPU cache line) access to any memory
  location or register by different devices.

- Interrupts: Interrupts are message based (there is no interrupt pin on the CPU). Messages
  flow from the Southbridge to the main collector in the Northbridge which then forwards the
  messages to the interrupt processor on the CPU chip. Emulation for edge triggered and
  level sensitive interrupts is supported.

- Memory Map: The CPU can address a 42bit memory range. Only 32bits are available
  outside of the CPU (providing a 4GB main system memory map). This 32bit space is
  broken down into a 1GB main memory window (not all of which may be present), 2GB of
  reserved, and 1GB of MMIO and configuration space. Internal to the CPU, the additional
  memory range is used to implement a secure boot environment to guard against certain
  security attacks, i.e. there are certain structures on the CPU that only it can address, and
  only in a super-privileged (HyperVisor) mode.

- Security: Secure boot, piracy prevention and DRM are implemented via a security scheme
  that relies on a boot sequence that starts on the main CPU die itself, as well as on a
  security engine that allows blocks of main memory to be protected.

- Boot Procedure: The OS kernel is booted via a several stage boot process. Initially the
  CPU’s internal bootloader (BL1) starts up. This fetches and decrypts the stage 2
  bootloader (BL2) from external FLASH. The BL2 enables main memory and copies the
  kernel from FLASH to main memory before entering the kernel.

- Xenon Procedural Synthesis: There is a collection of features implemented in the CPU and
  GPU that allow transient geometry data to be generated by the CPU and absorbed directly
  by the GPU without hitting main memory. Briefly, a 128kB set of the CPU’s L2 can
  optionally be locked down for several geometry FIFOs. These FIFOs can be read directly
  by the GPU so as to fetch vertex data. A low latency (non-interrupt based) synchronization
  scheme is achieved by allowing the GPU to write command processor tail-pointer updates
  directly to the CPU. The CPU also allows streaming data past the L2 for reads, and past
  the L1 for writes to assist in avoiding cache pollution. Intrinsic VMX2 data pack instructions
  are also an important feature.
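
The Memory Map item above can be illustrated with a small address classifier. This is only a sketch: the text gives the sizes of the three regions in the 32bit space but not their base addresses, so the layout below simply assumes they are placed in the order listed (main memory at the bottom, MMIO and configuration space at the top). The region names and the `classify` function are hypothetical.

```python
GB = 1 << 30

# Assumed placement: regions laid out in the order the text lists them.
# The actual base addresses are not given in this section.
REGIONS = [
    ("main_memory", 0 * GB, 1 * GB),   # 1GB main memory window
    ("reserved",    1 * GB, 3 * GB),   # 2GB reserved
    ("mmio_config", 3 * GB, 4 * GB),   # 1GB MMIO and configuration space
]

CPU_ADDRESS_BITS = 42       # range addressable by the CPU
EXTERNAL_ADDRESS_BITS = 32  # range visible outside the CPU


def classify(address):
    """Return the name of the region a physical address falls in."""
    if not 0 <= address < (1 << CPU_ADDRESS_BITS):
        raise ValueError("address is outside the 42bit CPU range")
    if address >= (1 << EXTERNAL_ADDRESS_BITS):
        # Above 4GB: only addressable inside the CPU itself.
        return "cpu_internal"
    for name, start, end in REGIONS:
        if start <= address < end:
            return name
    raise AssertionError("regions should cover the whole 32bit space")


print(classify(0x0000_0000))   # main_memory
print(classify(0x4000_0000))   # reserved
print(classify(0xC000_0000))   # mmio_config
print(classify(1 << 32))       # cpu_internal
```

The three external regions sum to 4GB, which is exactly the 32bit space; everything between 4GB and the 42bit limit is treated here as the CPU-internal range the text associates with the secure boot environment.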

3.7 Low Level Software Architecture

3.7.1 Overview

3.7.2 Hardware Resident APIs/Code

This section defines what drivers and/or code should live in FLASH vs. in the title library (and
therefore game media). This is so that hardware can be rev’d over time and still provide backwards
compatibility.

1. Power Management

2. DVE
   a. HDMI support may/will require additional code space
   b. Closed captioning
   c. Wide-screen signalling

3. Video Resize

4. Video Colorspace conversion

5. Video Gamma

6. Temp sensor
   a. Calibration parameters need to be stored on a per box basis

7. Ana (Clocks, etc.)

8. Ethernet Phy

9. Audio codec

10. Flash

11. USB

12. SATA

13. FSB settings

14. Memory settings

15. PCIe settings

16. Basic IR decode (power and eject buttons)

17. SMC code and kernel i/f

18. Fan algorithm

19. Front panel

20. Argon interface

21. CPU
   a. Init sequence
   b. For some CPU cost reduction items, we need some kernel support: 32bit mode
      SLB, use of less than the full launch TLB (still figuring out how that will work), etc.
   c. Certain code sequences may cause bugs. These need to be caught at Cert.
   d. Supervisor 42bit MMIO drivers (Interrupt handlers).

3.8 Alternate SKU Considerations

The core architecture has the requirement to support a few different SKUs.

This list can be boiled down to allowing for 2x memory, future use of HDMI/HDCP. I think other
expansion issues such as RTC and USB for Helium are out of place in this document, as they are
more implementation-specific.

• Development Systems: The main constraint imposed by the DevKits is the requirement to
  double the amount of main memory (512MB) and still maintain the same memory
  performance. The challenge here will be the electricals when doubling the memory parts.
  The DevKits will also use the serial debugger interface on the SouthBridge for kernel
  debugging.

• PRO SKU: This is identical to the game console, except it will come with different
  peripherals (including HDD and wireless gamepads) as standard.

• PC SKU: This is the most challenging and to some extent the least well understood.
  Nominally Windows (XP or Longhorn) will be run using the Connectix emulation layer.
  Additional peripherals including a Real Time Clock (RTC) and USB Hub will be added to
  the motherboard. A maximum display resolution of 1280x1024@75Hz has been chosen,
  which drives the minimum amount of Embedded DRAM and the maximum pixel display
  rate (135MPix/sec).

• HDTV DVD: Not currently POR, but we are planning hooks for future HDTV DVD playback.
  Other than the correct choice of media and compression scheme, the main issue for the
  core architecture is the copy protection required on the display output. It looks like this will
  be HDMI/HDCP which is not supported in the current Ana. A couple of options for design
  updates at a later date are possible and discussed later.
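
The PC SKU item ties the 1280x1024@75Hz display mode to both the Embedded DRAM size and the 135MPix/sec display rate. The sketch below shows one way those numbers line up. It rests on assumptions this document does not state: 4 bytes of color plus 4 bytes of depth/stencil per pixel with no multisampling, and the total (active plus blanking) frame size of 1688x1066 that the VESA monitor timing for this mode uses.

```python
WIDTH, HEIGHT, REFRESH_HZ = 1280, 1024, 75

# Assumption: 32-bit color and 32-bit depth/stencil per pixel, no multisampling.
BYTES_COLOR = 4
BYTES_DEPTH = 4
edram_bytes = WIDTH * HEIGHT * (BYTES_COLOR + BYTES_DEPTH)
edram_mib = edram_bytes / (1024 * 1024)

# Visible pixels only, ignoring blanking.
active_mpix_per_sec = WIDTH * HEIGHT * REFRESH_HZ / 1e6

# Assumption: VESA timing totals for this mode, including blanking intervals.
H_TOTAL, V_TOTAL = 1688, 1066
PIXEL_CLOCK_HZ = 135_000_000
refresh_at_135mhz = PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)

print(f"Render target + z-buffer: {edram_mib:.1f} MiB")
print(f"Active pixel rate:        {active_mpix_per_sec:.1f} MPix/sec")
print(f"Refresh at 135 MHz clock: {refresh_at_135mhz:.2f} Hz")
```

With these assumptions the color and depth buffers for the mode come to exactly 10 MiB, and a 135 MHz pixel clock over the full frame, blanking included, yields a refresh rate of about 75 Hz.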

3.9 Technical Specifications

Spec                           | Value
Number of processor cores      | 3
XPS                            |
CPU Frequency                  | 4GHz
SpecINT                        | 1461 (single thread)
RAM                            | 256MB Console, 512MB Dev (128MB-1GB supported)
L1 Cache (data)                | 32KB, 4 way associative
L1 Cache (instruction)         | 32KB, 2 way associative
L2 Cache                       | 1 MB shared (8-way)
GPU                            | ATI R500
HDD Storage bus                | Serial ATA
DVD Storage bus                | Serial ATA
Front side bus speed           | 10.8GB/sec read + 10.8GB/sec write
Gamepad interface              | XUSB (modified USB 1.1)
Memory units                   | Xenon 64MB MUs (USB 1.1)
HDD                            | TBD
DVD Drive                      | Serial ATA
CPU Core Hardware Threads      | 2 (per core)
CPU Core Instruction Issue     | 2 per cycle
CPU Core VMX2 Datapath         | 128bits
CPU Core VMX2 Registers        | 128 (x2 threads)
CPU Core VMX2 Double Precision | Scalar only
System Memory BW               | 22.4GB/sec
GPU RT+ZB Memory BW            | 256GB/sec
GPU RT+ZB Memory Capacity      | 10MB
GPU Frequency                  | 500MHz
GPU Geometry Rate              | 500MVtx/sec
GPU Shader Rate                | 24GInstr/sec (shared vertex and pixel)
GPU Shader Datapath            | 128bits
GPU Texture Rate               | 8Gtex/sec (filtered)
GPU Pixel Rate                 | 4GPix/sec (AA, alpha blend, z-test)
Display Pixel Rate             | 135MPix/sec

3.10 Performance Overview (Sue)

2-6-04 Asymmetric Latencies
At the performance meeting today Jim confirmed that different cores will definitely have different latencies to L2. They
still do not have information about just what those latencies will be.


4 Core Digital Components

4.1 CPU

[Figure: CPU block diagram. Labels on the CPU side (Waternoose SOC, IBM): three cores, each
labeled HFC+, MMU and VMX2; 1 MB Shared L2 (L2C, CIU, NCU, BIU, 1MB Bank 1); MPi Bus;
Security Engine; Sec. fuses; Boot ROM; Boot RAM; Interrupt C.; FSB Link Layer and Physical
Layer. Labels on the GPU side: FSB Link Layer and Physical Layer; Coherency Block ("MPi Bus"
like); Memory Controller; Graphic Processors; IO Controller.]

The CPU is a multi-core SOC arranged in an SMP fashion. All cores are identical and are
optimized for vector floating point, as is common in 3D graphics applications. We believe that
certain portions of a game are parallelizable, so we provide parallelism both at the core level (SMP)
and thread level (SMT). To ensure that this system is programmable by a wide range of
developers, the cores are all coherent with each other, and Microsoft intends to provide middleware
libraries to developers so that the parallelism is hidden from those developers who do not want to
take on this programming challenge.

To help visualize how this system may be programmed in this fashion, one possible example is
discussed. One core is allocated to the game engine; this is in fact the only CPU core that the
developers program. All threads associated with that game run on this “Host” core. A set of API
functions implement accelerated and optimized routines for physics, animation, collision detection,
audio, etc. that run on a second core. A third core is dedicated to procedural synthesis. This last
one is an important subset as the CPU and GPU have dedicated hardware to allow procedural
synthesis on the CPU to be efficiently pushed in a tightly coupled fashion to the GPU, without
spilling to memory. For the API functions mentioned, including the procedural geometry, the
developer can provide their own routines that the XOS schedules appropriately.

Important features of the CPU:

• ISA: 64bit PowerPC, derived from Power4 architecture.
• SMP: All cores are identical and coherent with one another.
• SMT: All cores support 2 simultaneous threads.
• Vector Floating Point (VMX2, 128bit): The cores each have a dedicated vector floating
point unit that is capable of performing the equivalent of a DP4 each cycle at sustained
throughput. There are also special instructions for data swizzling and compaction so as to
be compatible with common 3D datatypes and operations.

• Advanced Cache Management: The caching control supports many of the issues common
with multi-media systems with optimizations for data-streaming, set locking etc.

• System Coherency: The CPU supports a coherency protocol that allows the L2 and L1
caches to be coherent (if so desired) with any DMA hardware (3D or otherwise). Use of this
is via snoops, so this should only be used for low-bandwidth operation.

• System Security: The CPU implements a confidential scheme that allows the system to be
protected for copy protection, DRM and privacy purposes.
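For readers unfamiliar with the term, a DP4 is a four-component dot product, the basic operation behind transforming a vertex by a 4x4 matrix. The sketch below only illustrates the arithmetic; it assumes that a 128-bit vector holds four 32-bit components, and the function names are ours rather than VMX2 instruction names.

```python
def dp4(a, b):
    """Four-component dot product: a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dp4 expects two 4-component vectors")
    return sum(x * y for x, y in zip(a, b))


def transform(matrix_rows, vector):
    """Transform a 4-component vector by a 4x4 matrix using one DP4 per row."""
    return [dp4(row, vector) for row in matrix_rows]


IDENTITY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

print(dp4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
print(transform(IDENTITY, [1.0, 2.0, 3.0, 1.0]))
```

A full vertex transform therefore costs four DP4 operations, which is why sustained DP4 throughput is a convenient yardstick for the vector unit.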

MMU

Cached write combining and the RC machines

Instruction throughput / latency

Streaming support

Locking support

XPS support

Scalar FP

4.1.1 Further Reading

The following documentation is recommended for a better understanding of PowerPC and this
processor in particular.

Standard PowerPC documentation:

\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_I.pdf

\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_II.pdf

\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_III.pdf

Xenon specific documentation:

\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PPC_WN_Book4.pdf

\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\vmx128-isa-000.pdf


4.2 GPU

NorthBridge Description

3D core features

Control / synchronization

Driver implementation issues and architecture

Procedural synthesis

The GPU is the main system controller hub, connecting the FSB, memory interface and BSB. It
also contains the 3D rendering core.

The following diagram shows conceptually how the major components within the GPU are
connected.


[Figure: GPU block diagram (ATI/MS Confidential).
Blocks: FSB PHY Tx, FSB PHY Rx, FSB Link Tx, FSB Link Rx, FSB Link (MCLK), Bus Interface Unit
(MCLK), IO Controller (IOC), PCI-E PHY, PCI-E Link, BSB Master, BSB Slave, Bus Interface (BIF),
RBBM, Command Processor (CP), 3D Core, EDRAM (RBCLK), Memory Hub, Memory Controller 0,
Memory Controller 1, XPS Buffer, Display Controller, GFX BIU (SCLK), XDVO PHY.
External interfaces: FSB Tx 16b/5.4GHz, FSB Rx 16b/5.4GHz, PCI-E Tx 4b/2.5GHz, PCI-E Rx
4b/2.5GHz, two memory interfaces at 64b/1.6GHz, XDVO 16b/270MHz.
Internal paths: CPU Read/Write 32b, SB Read/Write 32b, BSB Master Read/Write 32b, BSB Slave
Read/Write 32b, Read/Snoop 32b, Write 32b, 3D core clients (bc/rb, tc, vc, vgt) 32b, and
128b and 256b data paths.
Clocks: FSB CLK 675MHz, MEM CLK 800MHz, Core CLK 500MHz, PIX CLK 135MHz, PCI-E CLK 2.5GHz,
EDRAM CLK 500MHz.]

Note in the diagram that there are no DACs but there is an independent Video output bus to Ana
which contains a digital encoder for analog video as well as the video DACs.

TODO: Add RBClk domain.

GPU Memory Configurations

Configuration   Total System   Memory        # of memory      Ranks   Banks      Row Bits     Column Bits
Name            Memory         Device Size   devices per MC           per rank   (per bank)   (per bank)
Xenos_128_4     128MB          8Mx32         4                1       4          12           9
Xenos_256_8     256MB          8Mx32         8                2       4          12           9
Xenos_256_4     256MB          16Mx32        4                1       8          12           9
Xenos_512_8     512MB          16Mx32        8                2       8          12           9
Xenos_512_4     512MB          32Mx32        4                1       8          13           9
Xenos_1024_8    1GB            32Mx32        8                2       8          13           9
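The rows of the GPU memory configuration table can be checked against each other. The sketch below recomputes each device's word count from its bank, row and column bits, and the total system memory from the device count. It treats the device-count column as the total number of devices in the system, which is an assumption on our part (it is the reading that reproduces the Total System Memory column); the tuple layout and function names are hypothetical.

```python
# name: (total_mb, device_mwords, devices, ranks, banks, row_bits, col_bits)
CONFIGS = {
    "Xenos_128_4": (128, 8, 4, 1, 4, 12, 9),
    "Xenos_256_8": (256, 8, 8, 2, 4, 12, 9),
    "Xenos_256_4": (256, 16, 4, 1, 8, 12, 9),
    "Xenos_512_8": (512, 16, 8, 2, 8, 12, 9),
    "Xenos_512_4": (512, 32, 4, 1, 8, 13, 9),
    "Xenos_1024_8": (1024, 32, 8, 2, 8, 13, 9),
}

DEVICE_WIDTH_BITS = 32  # every listed device is an "x32" part
MEG = 1 << 20


def device_words(banks, row_bits, col_bits):
    """Addressable 32-bit words in one device, from its geometry."""
    return banks * (1 << row_bits) * (1 << col_bits)


def total_megabytes(device_mwords, devices):
    """System capacity if `devices` is the total device count (assumption)."""
    device_bytes = device_mwords * MEG * DEVICE_WIDTH_BITS // 8
    return devices * device_bytes // MEG


def is_consistent(row):
    total_mb, mwords, devices, _ranks, banks, row_bits, col_bits = row
    geometry_ok = device_words(banks, row_bits, col_bits) == mwords * MEG
    capacity_ok = total_megabytes(mwords, devices) == total_mb
    return geometry_ok and capacity_ok


for name, row in CONFIGS.items():
    print(name, "consistent" if is_consistent(row) else "MISMATCH")
```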

4.3 Memory

The memory devices shall conform to the GDDR3 memory specification. The key features of the
devices are:

• 512Mb device density configured as 16Mx32 devices
• Two data accesses per clock cycle with a 4n prefetch
• Differential clock. Clock frequency of 800 MHz
• Single ended, per byte read and write strobes
• Pseudo open drain I/O with calibrated output drive
• On die termination
• Packaging supports mirror function to allow a clamshell memory design
• Packaging supports 1.6Gbps signaling
• Need to add: banks, cycle time

Important features of the memory are:

• Given our best estimate of memory pricing and our overall cost target, a total capacity of
256 MB is targeted for the product. This would be based on (4) 512 Mb, x32 parts.

• Given the need for extra memory to accommodate debug and pre-optimized games, a total
capacity of 512 MB is targeted for the development systems.
This will be based on either (8) 512 Mb, x32 parts (2 DRAM loads on the data wires), or (8)
512 Mb, x16 parts. The first of these cases might require special operating conditions as
we do not need to guarantee operation in high volume and over all conditions. The latter
requires special part development by a DRAM vendor, though some vendors have
indicated the possibility of a single design supporting both x16 and x32.

• Because memory pricing is volatile and difficult to forecast, these targets could change and
the device availability over the life of the product must support a range from 128 MB to 1
GB of memory.

• Support for a path to (2) 1 Gb parts is necessary in the controller, though it may never be
cost effective to make this transition. This must be completely seamless.

• The memory devices shall support a boundary scan capability to allow verification of the
connection between the GPU memory controller and the memory devices. The test
coverage shall include shorts between signals, opens on the GPU, the signal trace or
memory device. It may support coverage that allows isolation of the problem to a particular
end of the signal; for example, the open is on the GPU side and not the signal trace or
memory device.

The memory configurations supported by the GPU and memory interface are documented in the
GPU section. See chapter 4.2.

Eventually, we’ll need specifications for the exact timing parameters (on the order of 1 or more
clock cycles) used by the bus.

4.4 Southbridge

[Figure: Southbridge block diagram. Labels: BackSide Bus to NB (PCI-Expr x2); PCI-Express
Ctrl (PCIECB); Bridge (PCIE2ECB BDG) for CPU-init config; Bridge (ECB2HDEDB BDG) for LB
device DMA; On-Chip Control (OCBCB); ECB(PCI) and HBEDB buses (32 x2 dir each); System Flash
Interface (NAND Flash); DVD Drive Interface (SATA, PHY, DVD Drive); HDD Interface (SATA, PHY,
HDD); Ethernet MAC (MII to Ethernet PHY); Group A and Group B USB, each with EHCI, OHCI and
PHY (USB2.0/1.1/XUSB) to Group A and Group B Devices; WMA Pro Decoder; Audio Out (I2S to
Audio DACs, S/PDIF to AVIP); SMC (8051) with Timer, Interrupt Collector, RTC, SMM, SPI,
PWM x2, SMC GPIO, SMC Debug UART; Kernel Debug UART; IR; SMBus (x2) to Ana and AVIP;
Front-Panel; Power Supply Unit; DVD Tray; GPIOs; JTAG TAP; DFT; Dead Box Xface. Legend:
SiS, Microsoft.
Note: Bus Arb/Muxing module for both busses not shown. ECB: 15 units, HBEDB: 6 units.]

The SouthBridge chip is akin to those found in traditional PC architectures, but is customized for
the game console, primarily for cost reasons. Some of the more standard (or soon to be standard)
functions and interfaces in the SouthBridge are:


• PCI-Express x2 link as the bus interface to the NB.
• Serial ATA ports for both the DVD drive and an optional HDD.
• 10/100 Ethernet MAC.
• USB 2.0/1.1, which in this chip takes the form of separate EHCI and OHCI host controllers
for two sets of ports, group A (4 ports), and group B (5 ports). All USB ports support the
custom XUSB protocol which reduces EMI, as outlined in the “notes” below. All USB ports
also support remote wake via direct connection of transceiver inputs to the SMC.

The more custom aspects of the chip include:

• WMA Pro decode hardware for a portion of the audio processing; the rest is performed on
one of the CPU cores.

• System Management Controller (SMC) hardware, which is basically an 8051 core that
handles, among other things, power, reset, and thermal management. It is powered
separately in the Southbridge. It includes a cheap form of RTC (Real Time Clock), which
functions only when the unit is plugged in, as well as a programmable interval timer (cheap
version of 8254) which provides an interrupt rate of 1 ms for the OS scheduler. Finally, it
provides UART ports for both kernel and SMC firmware debug.

• System Flash interface. This is custom in the sense that (1) it allows for NAND Flash parts
to be used to reduce cost, and (2) has the necessary control interfaces to allow SMC code
to be DMA’d into an internal Southbridge SRAM, as well as the kernel to be DMA’d into
main memory.

• IR interface. A small hardware block samples and decodes the output of an external
demodulator, which then allows the SMC to communicate the commands received to the
main CPU.

• Wireless game controller interface. The baseband controller for wireless will live in an
external chip and communicate to the Southbridge over a USB 1.1 interface.

All of the devices within Southbridge communicate over an on-chip bus interface which is custom to
Microsoft, but based on an internal version of the PCI bus.

4.4.1 Notes

XUSB is basically USB 1.1 run in LS mode (1.5 Mbps) but with two key modifications:

• Traffic is not broadcast to all ports as in standard USB 1.1; instead, there are additional
bits the driver controls to direct it appropriately. This reduces EMI.

• The payload is 32B instead of the maximum 8B allowed in standard LS mode. This allows
for enough bandwidth in our worst-case scenarios (4 controllers, headsets, etc).

4.5 Ana

This section needs to focus on the only real architecturally-significant functional blocks in the ANA
chip: the DVE and the clock architecture. The thermal sensor, fan control, etc are design-specific
details that are not relevant to the Xenon console.

The Ana chip consists of four main functional blocks: A Thermal Sensor (TS) block which is used to
monitor system temperatures, a Clock Synthesizer (CS) which is used to generate the required
system clocks for Xenon, a Digital Video Encoder (DVE) which is required to convert a digital pixel
stream from the Northbridge into analog outputs suitable for connection to a television or monitor,
and a System Management Bus Interface which provides the host interface for Ana. A simplified
block diagram of the Ana chip is shown in Figure 4.1.


[Figure: Ana block diagram. Blocks: Thermal Sensor, Clock Synthesizer, Control Interface
(SMBus), Digital Video Encoder, DAC A, DAC B, DAC C, DAC D, Power-on reset, Fan driver
op-amps. Internal connections: CSR Interface to each block, Video Clock, Digital Timing,
Power Down Signals, Bypass Signals / Clocks. External connections: System Crystal (27 MHz);
To External Sensors (Diodes); SMBus To/From SMC; Reset From SMC; power-on reset output To
SMC; Stdby power; Pixel Data and Digital Timing (Syncs) From Pixel Generator (GPU);
1 X Pixel Clock In; 2 X Pixel Clock Out; System Clocks To System Clock Destinations; Analog
outputs To Filters & Video Interface Port; Analog (For Test); PWM from SB; Fan feedback;
Fan drive output.]

Figure 4.1. Top Level Ana Block Diagram.

The configuration and control of the blocks within Ana is provided by the Control Interface
(CI). The System Management Bus (SMBus) is a 2 line serial clock and data bi-directional
interface. An external host connected to the SMBus is required to correctly configure the chip. The
CI acts as a bridge through which the external host can read and write registers that are local to
each of the other internal blocks (DVE, TS, and CS).

Ana also includes the ability to write DVE control registers using parallel transfers via the pixel data
bus when enabled. These control packet transfers are initiated by software using the GPU and
should only be performed when the video encoder is not using that bus for pixel data (i.e. only
when the video encoder is either not enabled or is blanked). Writes when the encoder is using
the pixel bus for pixel data or any reads must be performed over the SMBus.
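The access rules in the two paragraphs above can be summarized as a small decision function. This is a sketch of the policy as described, not driver code: the enum, the function name and the flag names are hypothetical, and it simply prefers the pixel bus whenever the text permits it.

```python
from enum import Enum


class AccessPath(Enum):
    SMBUS = "smbus"
    PIXEL_BUS = "pixel_bus"


def dve_register_path(is_write, encoder_enabled, encoder_blanked,
                      pixel_bus_writes_enabled=True):
    """Pick the path for a DVE register access.

    Reads always use the SMBus. Writes may use the pixel data bus only when
    that feature is enabled and the encoder is not using the bus for pixel
    data (encoder disabled, or enabled but blanked).
    """
    pixel_bus_idle = (not encoder_enabled) or encoder_blanked
    if is_write and pixel_bus_writes_enabled and pixel_bus_idle:
        return AccessPath.PIXEL_BUS
    return AccessPath.SMBUS


# A write during active video has to fall back to the SMBus.
print(dve_register_path(is_write=True, encoder_enabled=True, encoder_blanked=False))
```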

System clocks are generated by the Clock Synthesizer (CS) and driven off chip. These clocks are:


• 25 MHz Ethernet clock output to Ethernet PHY
• 25 MHz Serial ATA reference clock to Southbridge
• 100 MHz (down spread spectrum frequency modulated to -0.5%) for Serial ATA in
Southbridge
• 100 MHz (down spread spectrum frequency modulated to -1.5%) for PCI Express in
Southbridge, Northbridge system clock and CPU system clock

• 48 MHz standby clock for Southbridge (for USB and System Management Controller)
• 24.576 MHz audio clock to Southbridge
• Programmable video clock for DVE (only driven off chip for test)
• Programmable 2x pixel clock for Northbridge

The system crystal oscillator (27 MHz) is the default reference clock for all the PLLs. In order to
provide for A/V sync, the audio, video and 2x pixel clocks must be generated from the same source
and with deterministic ratios such that there is 0 ppm drift between these various clocks. The CS
also has the ability to select an external AV oscillator path to drive the audio, video and 2x pixel
clock PLL reference clocks.
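The 0 ppm requirement amounts to saying that each derived clock is an exact rational multiple of the shared reference. The sketch below illustrates this with the 27 MHz crystal and the 24.576 MHz audio clock from this section; the 270 MHz value for the 2x pixel clock is only an example (twice the 135 MPix/sec maximum), and the actual PLL multiplier and divider values are not given here, so the reduced fractions are illustrative.

```python
from fractions import Fraction

CRYSTAL_HZ = 27_000_000
AUDIO_HZ = 24_576_000
EXAMPLE_2X_PIXEL_HZ = 2 * 135_000_000  # example value, not a required setting


def pll_ratio(target_hz, reference_hz):
    """Exact multiply/divide ratio needed to derive target from reference."""
    return Fraction(target_hz, reference_hz)


def drift_ppm(ratio, reference_hz, target_hz):
    """Frequency error in parts per million when `ratio` is applied exactly."""
    produced = ratio * reference_hz
    return (produced - target_hz) / target_hz * 1_000_000


audio_ratio = pll_ratio(AUDIO_HZ, CRYSTAL_HZ)
pixel_ratio = pll_ratio(EXAMPLE_2X_PIXEL_HZ, CRYSTAL_HZ)
print("audio ratio:", audio_ratio)
print("2x pixel ratio:", pixel_ratio)
print("audio drift (ppm):", drift_ppm(audio_ratio, CRYSTAL_HZ, AUDIO_HZ))
```

Because both ratios are exact fractions of the same source, the audio and pixel clocks cannot drift relative to each other, whatever the absolute accuracy of the crystal.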

The video clock is programmable to accommodate the numerous video standards that are
supported. The video PLL can be bypassed and the clock tree driven by an external clock
chip. The 2x pixel clock, which is driven from a separate PLL, is used by the Northbridge for
generating a 1x pixel clock that is sent back to Ana to clock in pixel data on the Xenon Digital
Video Output (XDVO) bus. The video clock and 1x pixel clock maintain integer ratios determined by
the level of over-sampling the DVE is using. The programming of these clocks is done via the CI.

The clock synthesizer block includes power down inputs that can be used to turn off some of the
clock drivers (via software) during standby mode.

The Digital Video Encoder (DVE) is used to convert a digital pixel stream from an external device
into analog video suitable for output to a TV or monitor. The DVE supports a variety of analog
video standards including NTSC, PAL, component standard definition and high definition as well as
VGA video standards. The DVE does not support resizing of video data so the Northbridge must
supply video data with the resolution required by the video encoding format. The bulk of the video
processing required in the system is performed by the Northbridge. There is limited video
processing functionality in the video encoder block itself.

The DVE receives a video clock from the CS that is used to generate the output video timing
signals and clock the digital processing pipelines in the DVE. This video clock is different for the
various supported output video standards and it is therefore programmable in the CS. A 2x pixel
clock is output to the Northbridge and a 1x pixel clock returned. The input 1x pixel clock is used to
clock in the digital pixel data received from the Northbridge’s pixel source data.

The Thermal Sensor module (TS) receives input from 5 remote diode channels. Four of these
remote diode channels will be connected to thermal diodes on system components (CPU,
Northbridge, EDRAM and a voltage regulator temperature sense diode). One channel is devoted
for calibrating the thermal sensor during manufacturing board functional test. The TS is comprised
of an analog switch, ADC and current source generator along with digital logic for delivering
temperature data to a register interface. The TS registers are accessible through the CI. The TS
uses a delta VBE method for measuring temperature.
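As background on the delta VBE method: a diode-connected transistor is biased at two currents with a known ratio N, and the difference in base-emitter voltage is proportional to absolute temperature, ΔVBE = n·(kT/q)·ln(N). The sketch below inverts that relation. The current ratio and the ideality factor n are assumptions chosen for illustration; the values used by the Ana thermal sensor are not stated in this document.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19
KELVIN_OFFSET = 273.15


def delta_vbe_volts(temp_celsius, current_ratio, ideality=1.0):
    """Expected VBE difference between the two bias currents."""
    kelvin = temp_celsius + KELVIN_OFFSET
    thermal_voltage = BOLTZMANN_J_PER_K * kelvin / ELEMENTARY_CHARGE_C
    return ideality * thermal_voltage * math.log(current_ratio)


def temperature_celsius(delta_vbe, current_ratio, ideality=1.0):
    """Recover the diode temperature from a measured VBE difference."""
    kelvin = (delta_vbe * ELEMENTARY_CHARGE_C
              / (ideality * BOLTZMANN_J_PER_K * math.log(current_ratio)))
    return kelvin - KELVIN_OFFSET


# Illustrative round trip with an assumed 10:1 current ratio.
measured = delta_vbe_volts(100.0, current_ratio=10)
print(f"delta VBE at 100 C: {measured * 1000:.2f} mV")
print(f"recovered temperature: {temperature_celsius(measured, 10):.2f} C")
```

The signal is small (tens of millivolts over the whole operating range), which is one reason the block needs a dedicated calibration channel and band gap trimming.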

Ana contains a power-on reset cell which detects 3.3V standby, 1.8V standby and generates an ok
signal once both voltages cross appropriate thresholds. This ok signal is fed to logic which
generates internal resets for the Ana PLLs as well as the SMC reset for the Southbridge.

The power-on reset cell also has a separate comparator to detect thresholds on an external sense
input which is divided down from the system board’s 12V supply.

Ana contains two fan driver op-amps which each take as inputs:

• A pulse width modulated signal converted to a fixed voltage from the Southbridge
• A feedback signal from external fan driver circuitry

The outputs of the op-amps drive the external fan drive circuitry.

Ana contains a JTAG interface for boundary scan of the XDVO bus and for controlling an internal
tap controller which can be used to access analog IP test structures. Parallel scan chains will be
used for ATPG coverage of digital logic.

Interrupt functionality is provided via an interrupt pin (VID_INT). This pin is attached to the closed
caption logic in the DVE and will signify when the hardware is ready to accept more closed caption
data for a specific field.

There are also miscellaneous pins on Ana devoted to various contingency/visibility options (e.g.
bypasses for PLLs, power on reset cell and crystal oscillator, bringing out video clock, oscillator
output and output of power-on reset cell to pins).

The following pertains to the system clock generation:

• All component outputs are observable on Ana pins for debug and measurement purposes.
Similarly, all inputs to components can be directly supplied from pins (either during
standard operation or via bypass inputs for debug, measurement, and contingency
purposes).

• Power down capability for every PLL (but not the oscillator).

• Single 27 MHz oscillator source with bypass allowing external clock source. Output of
oscillator available on external pin for debug, measurement purposes as well as ability to
slave external clock generation device to internally generated clocks (though phase locking
not supported). Always running as long as box is plugged into wall.

• Programmable Video (and 2xPixel) clock generation (up to 170MHz video DAC frequency
or 135 Mpix/sec pixel rate) for video encoder and pixel interface. With bypass capability
allowing video clock to be supplied from external clock source. Locked to audio clock.
2xPixel clock is output externally via differential outputs. Output enabled only when box is
powered on.

• 24.576MHz Audio clock generation. Locked to video clock. Output enabled only when box
is powered on.

• Input to audio and video PLLs from either the on-chip oscillator or from external clock
source. Allows for off-chip pullable oscillator.

• 25 MHz clock generation for Ethernet clock and Southbridge. Outputs enabled only when
box is powered on.

• 48 MHz standby clock generation for internal SMBus clock (bypassable with external pin)
and for standby clock to Southbridge. Always on as long as box is plugged into wall.

• 100 MHz clock generation for Serial ATA components and interfaces. Selectable spread
spectrum with -0.5% down-spread triangle modulation. Output via differential outputs.
Output enabled only when box is powered on.

• 100 MHz clock generation for Northbridge, PCI Express and CPU (separate differential
outputs for each). Selectable spread spectrum with up to -1.5% down spread spectrum
modulation. Outputs enabled only when box is powered on.

The following pertains to the thermal measurement block:

• Remote temperature sensing channels to monitor CPU, GPU, EDRAM and board
temperature diodes in addition to calibration channel

• Metal fuse window planned for trimming band gap

• Programmable Resolution (<= 1°C)

• +/- 2°C accuracy in 80–140°C range, +/- 4°C accuracy in 0–79°C range

• < 500 uA operation current

The following pertains to the video encoder:

• Support SDTV formats NTSC-M/J, PAL 60 (640x480I/60Hz, 720x480I/60Hz), PAL-
B/B1/D/G/H/I (640x576I/50Hz, 720x576I/50Hz).
• Support EDTV formats (640x480P/60Hz, 720x480P/60Hz, 640x576P/50Hz,
720x576P/50Hz)
• Support HDTV formats (1280x720P/50Hz, 1280x720P/60Hz, 1920x1080I/50Hz,
1920x1080I/60Hz)
• Support VGA formats (programmable up to 135MHz pixel rate).
• Component Output Support (R/G/B, Y/Pb/Pr), CVBS/Composite Support (CVBS/Y/C), or
SCART Output Support (CVBS/R/G/B) (only one at a time)

• Programmable Color Space Conversion (for SCART RGB input for composite output).
• Support 10:10:10 bit YUV or RGB input data.
• Up to 12x over-sampling supported to reduce reconstruction filter requirements.
• Programmable filters for components, composite luma, composite baseband chroma, and

composite bandpass chroma.

• Programmable sync slew rates.
• Support closed captioning for SDTV formats.
• Support Macrovision 7.1L1, EDTV (525p/625p).
• Support WSS encoded in VBI interval for SDTV, EDTV formats.
• Support sideband WSS signaling for Japan s-video and SCART
• Support sync-on green, digital CSYNC or digital HSYNC/VSYNC for VGA
• Support CGMS-A
• Slave mode timing.

The following pertains to the video DAC:

• Four 10-bit DACs.
• Video Signal to Noise > 75 dB (noise relative to flat DC input)
• Differential nonlinearity < +/- 1.0 LSBs
• Integral nonlinearity < +/- 2.0 LSBs
• 35 mA drive capability per DAC

The following pertains to the Power Supply:

• 3.3V standby power supplies for analog IP and digital/analog I/Os
• 1.8V standby power supply for digital IP
• 3.3V power supply for DAC


4.6 Front Side Bus (Art)

The FSB is divided into six sections consisting of three layers, each of which is further broken into
Transmit and Receive sections. The Transport layer communicates with the rest of the CPU/GPU.
The Link layer is responsible for CRC generation/checking, error handling and packet
retransmission (as well as data alignment in the receive section). The Phy layer converts
from a wide, slower interface to the 5.4 Gb/sec PCB lanes. The Phy receive layer is responsible
for bit-aligning the data to the forwarding clock transmitted with the data.
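The headline FSB numbers elsewhere in this document follow from the lane rate. The sketch below assumes 16 lanes per direction at 5.4 Gb/sec (the "16b/5.4GHz" labels on the GPU block diagram) and an 8:1 serialization ratio in the Phy (the "Parallel to Serial (8:1)" entry in the physical specification table); the function names are ours, and the result is raw bandwidth before any framing or protocol overhead.

```python
LANES_PER_DIRECTION = 16
LANE_RATE_MBPS = 5400          # 5.4 Gb/sec per lane
SERIALIZATION_RATIO = 8        # assumed 8:1 parallel-to-serial in the Phy


def raw_bandwidth_mbytes_per_sec(lanes, lane_rate_mbps):
    """Raw one-direction bandwidth in MB/sec, before protocol overhead."""
    return lanes * lane_rate_mbps // 8


def parallel_clock_mhz(lane_rate_mbps, ratio):
    """Clock of the wide, slower interface feeding each serial lane."""
    return lane_rate_mbps / ratio


print(raw_bandwidth_mbytes_per_sec(LANES_PER_DIRECTION, LANE_RATE_MBPS), "MB/sec per direction")
print(parallel_clock_mhz(LANE_RATE_MBPS, SERIALIZATION_RATIO), "MHz parallel-side clock")
```

Under these assumptions the result matches the 10.8GB/sec read plus 10.8GB/sec write quoted in the technical specifications and the 675MHz FSB CLK shown on the GPU block diagram.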

The CPU FSB transport layer also communicates with the Security Unit for encryption and
decryption of some data. It also has some other logic not needed in the GPU version: an ability to
map the 10-bit MPI tags used inside the CPU to the 5-bit Transaction IDs (TIDs) used by the FSB
(and the GPU). Other than these, the primary differences between the CPU and GPU
versions of the FSB are that they are implemented in different silicon processes, and the units have
different primary datapath widths: 8 bytes in the CPU and 16 bytes in the GPU.

Needs update …

The Design of the FSB Link and PHY will be provided by the CPU company.

The FSB is a high performance bus that connects the GPU and the CPU. Like many high speed
busses, it is comprised of a link and a physical layer. The general physical specifications of the
bus are in the following table.

Topology               Unidirectional Point-to-Point
Signaling              Low voltage Differential CML, DDR
Clocking               Clock forwarding, One Clock signal per byte
Frequency              3GHz Clock for 4Gb/s data rate
Number of Lanes        CPU-GPU: 32 GPU-CPU: 16
Byte Transfer          Parallel to Serial (8:1)
Framing                Frame signal per 16 bits
Transfer efficiency    89%

4.6.1 Link Layer

Defines Packets for transactions

Maintains Credit-based flow-control

Detects some Errors, Retries if possible and Reports hard errors.

The Link layer protocol defines the packets sent over the link as well as link initialization, flow
control and error handling. Link packets carry transactions (i.e. Read, Write) from the CPU/GPU
while also taking part in maintaining flow control for each of 4 virtual channels that are supported
over the link. Link Packets are specifically designed for the coherent system in that the fields and
command types support the requests and responses required by the coherent environment.
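Credit-based flow control in general works as follows: the sender starts with a number of credits per virtual channel, spends one for each packet it sends, and stalls on a channel when its credits run out until the receiver returns some. The sketch below models only that bookkeeping for 4 virtual channels. The class, its method names and the credit counts are hypothetical; the actual FSB credit sizes and the encoding of credits in Control packets are not described here.

```python
class CreditedLink:
    """Sender-side credit bookkeeping for a link with 4 virtual channels."""

    NUM_VIRTUAL_CHANNELS = 4

    def __init__(self, credits_per_vc):
        self.max_credits = credits_per_vc
        self.credits = [credits_per_vc] * self.NUM_VIRTUAL_CHANNELS

    def try_send(self, vc):
        """Consume one credit on `vc`; return False if the channel must stall."""
        if self.credits[vc] == 0:
            return False
        self.credits[vc] -= 1
        return True

    def return_credits(self, vc, count):
        """Called when the receiver frees buffer space and returns credits."""
        if self.credits[vc] + count > self.max_credits:
            raise ValueError("receiver returned more credits than were consumed")
        self.credits[vc] += count


link = CreditedLink(credits_per_vc=2)
print([link.try_send(0) for _ in range(3)])  # third send on VC 0 stalls
print(link.try_send(1))                      # other channels are unaffected
link.return_credits(0, 1)
print(link.try_send(0))
```

Keeping the credits per channel is what lets one stalled virtual channel leave the others free to make progress.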

4.6.1.1 Packet Types:

Packet Type                 Usage
Control 0 Packet            Xmit Acks and Credits for VC 0,1 – used for link
                            training and link retries in event of sequence error
Control 1 Packet            Xmit Acks and Credits for VC 2, 3
Basic Command Packet        8 Byte Payload packets min.
Extended Command Packet     8 Byte Payload Packets with mask
Response Packet             Completion with or without data


4.6.1.2 Command Types and Packets Used

Command Type                      Packet Type Used   Description
Write (Length spec’ed)            Basic command      Write of various lengths up to 128 bytes
                                                     (count indicated with a field). All bytes
                                                     valid and to be written
Write                             Extended           Includes a byte mask
Read                              Basic command      Different Flavors for Read Line, Read
                                                     Dword, Read Intent to modify, Read with
                                                     no intent to cache, Read W/O Claim
Synchronization                   Basic              Used to enforce ordering requirements
Coherence Commands                Basic              Castout, Clean, Deallocate Directory tag,
                                                     Dkill, Flush, IKill
Rerun                             Basic
Interrupt                         Basic
IORead                            Basic
IOWrite                           Basic
Data Response                     Response
Completion Response (non-data)    Response

For each of the FSB commands, the packets include the following informational fields:

Field                # Bits   Usage
Command Type         5        Defines type – i.e. Read, Write, Interrupt
Virtual Channel      2        Channel # used – VC’s allow flexible ordering models
Sequence Count       4        Per VC – used to track lost packets – up to 16 outstanding
Transaction ID       4        Effectively a Tag to identify responses
Address              32       Memory address for transaction
Byte Mask            8        Used for partial updates
Command Modifiers    3        Defines more specific action for a command – i.e.
                              Synchronization types for Sync command or
                              Castout vs Kill for Coherence commands
CRC                  16       Like parity only better
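To make the field widths concrete, the sketch below packs and unpacks a header using the widths from the table. The bit ordering, the concatenation of all fields into a single word, and the helper names are assumptions for illustration only; this document does not define the on-wire layout, and not every packet type necessarily carries every field (the byte mask, for example, is associated with Extended packets).

```python
# (field name, width in bits), widths taken from the table above.
# The ordering, most significant field first, is an assumption.
FIELDS = [
    ("command_type", 5),
    ("virtual_channel", 2),
    ("sequence_count", 4),
    ("transaction_id", 4),
    ("address", 32),
    ("byte_mask", 8),
    ("command_modifiers", 3),
    ("crc", 16),
]

TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(values):
    """Concatenate the fields into one integer, first field most significant."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split an integer produced by pack() back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


def next_sequence(count):
    """A 4-bit per-VC sequence count wraps after 16 packets."""
    return (count + 1) % 16


header = {
    "command_type": 3, "virtual_channel": 2, "sequence_count": 15,
    "transaction_id": 9, "address": 0x8000_1000, "byte_mask": 0xFF,
    "command_modifiers": 1, "crc": 0xBEEF,
}
print(TOTAL_BITS, "bits of header fields")
print(unpack(pack(header)) == header)
```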

<Chip to add data to packet>

4.7 Back Side Bus

The Back Side Bus is a 2 lane PCI-Express link connecting the SouthBridge to the GPU.

\\xenon\specs\Architecture\Xenon BackSide Bus Spec.doc

4.8 Memory Bus

The memory bus shall conform to the GDDR3 specification. The key features of the GDDR3 spec.
are:

• Unified memory architecture utilizing high-speed graphics memory devices.
• Total data bus width of 128-bits
• Data signaling at 1.6 Gbps, implying peak bandwidth 25.6 GB/sec.

• Because memory pricing is volatile and difficult to forecast, these targets could change and
the interface must support a range from 128 MB to 1 GB of memory.

• To provide better utilization of the memory bus, two independent memory controllers are
likely. The memory interleaving between these controllers, as well as between the internal
banks within the DRAM, is to be determined.

• Support for a path to (2) 1 Gb parts is necessary in the controller, though it may never be
cost effective to make this transition. This must be completely seamless.

• Note that identical performance between XDKs and consoles is only met when running in
the lower half of the XDK memory. When the upper half of the XDK memory is used, there
is an extra cycle penalty when doing back to back reads from alternating ranks.

• The memory interface shall support a boundary scan capability to allow verification of the
connection between the GPU memory controller and the memory devices. The test
coverage shall include shorts between signals, opens on the GPU, the signal trace or
memory device. It may support coverage that allows isolation of the problem to a particular
end of the signal; for example, the open is on the GPU side and not the signal trace or
memory device.

The memory configurations supported by the GPU and memory interface are documented in the
GPU section. See chapter 4.2.
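The peak bandwidth figure in this section is straightforward to reproduce from the bus width and signaling rate. The sketch below does so using the 800 MHz clock and two data accesses per clock from section 4.3; the function name is ours, and the result is a theoretical peak that ignores refresh, bank conflicts and bus turnaround.

```python
def peak_bandwidth_mbytes_per_sec(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak in MB/sec: width x clock x transfers per clock / 8."""
    return bus_width_bits * clock_mhz * transfers_per_clock // 8


total = peak_bandwidth_mbytes_per_sec(bus_width_bits=128, clock_mhz=800)
per_controller = peak_bandwidth_mbytes_per_sec(bus_width_bits=64, clock_mhz=800)
print(total, "MB/sec across the 128-bit bus")
print(per_controller, "MB/sec per 64-bit memory controller")
```

This reproduces the 25.6 GB/sec quoted above. Note that the 22.4GB/sec System Memory BW entry in the technical specifications table is lower; on the same 128-bit width that figure would correspond to 1.4 Gbps signaling, so the two numbers appear to reflect different speed assumptions.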

4.9 SMBus

I think only one is required: the one that must communicate with the AVIP. The mechanism by which
the DVE is configured is, I believe, implementation-specific.

The system shall support two SMBus V2.0 compliant interfaces. These interfaces may be
contained in the Southbridge and shall be accessible by the CPU and System Management
Controller.

One SMBus interface shall be used for communication internal to the system. In the existing
system architecture, the interface is used to connect the Southbridge to the Ana IC. If other SMBus
devices are added to the system, they shall use this interface.

The second SMBus interface shall be used for communication with external devices over the Audio
Video Interface Port (AVIP). The two uses identified are:

• Video Electronics Standards Association Display Data Channel (VESA DDC) found on
Video Graphics Adapter (VGA) monitors
• Controlling the console in a Kiosk for point of sale demonstration purposes. The controls
include power, DVD and system settings. This application may be extended to include
development and factory test.

The SMBus interfaces may be time multiplexed, thus the Southbridge is the only supported master.

5 System Components

5.1 Questions

Are we going to encrypt data? Linked on a per-box basis? What level of performance extraction?
Critical levels of performance? Be explicit on this.

How do we bundle things together for online. Harddrive be some of internal. Assume user level
expansion.

Some of these should be pluggable by the user. Have Xbox LOGO program.

5.2 DVD

The DVD drive is a custom form factor drive built specifically for the Xenon console.

General Specifications:

Form-Factor: Sub-half-height, custom form factor for Xenon console

Interface: SATA 1.0 + sideband tray control and status

Speed: CAV, 12x DVD at outer diameter

Media formats (read-only): CD, CD-R, DVD-5, DVD-9, DVD-X2 (Xbox 1.0), DVD-X3 (Xbox 2.0)

Access time: 115ms average

5.3 HDD

The HDD is based on a cost-optimized small form factor drive.

General Specifications:

Form-Factor: Standard 2.5”

Capacity: 20GB or more

Interface: SATA 1.0

Speed: Comparable to industry metrics of 10ms average seek time, 20MB/s transfer rate

5.4 MU

The console has two MU-specific slots on the front panel.

General Specifications:

Capacity: 64MB to 1GB

Interface: USB 2.0 logical interface, custom slot connector

Durability: 300k write/erase cycles

Random Access Time: 25ms

Read Speed: 8MB/s

Write speed: 1MB/s

5.5 Game Controllers

The console architecture supports connectivity to both wired and wireless game pads designed
specifically for Xenon.

Wired Gamepad:

Interface: X-USB, low-speed USB signaling with expanded payload capability

Power: 4 units or less of USB power

Connector: Standard USB connector

Wireless Gamepad:

This section should focus on how the wireless radio interconnects to the system via USB. There
should be reference to the remote resume requirement.

Wireless Interface: Custom 2.4GHz spread spectrum, half-duplex transceiver

Wired Interface: Used during recharging, X-USB

5.6 Network

The console includes an Ethernet network port:

Connector: RJ-45 with integrated LED indicators for link and activity status.

Connection: 10Mbit and 100Mbit Ethernet

Peer-to-peer connectivity: Auto-MDIX supports peer-to-peer connectivity without hub or
crossover cable.

5.7 Peripheral Expansion

Expansion of the base console and connection to peripheral devices is via three USB 2.0 ports, two
mounted on the front panel, one mounted on the rear panel.

6 Distributed Components

6.1 Audio

A description of the distributed Audio processing can be found in the following document.

\\xenon\specs\Architecture\Xenon Audio.doc

6.2 Video

A description of the distributed Video processing can be found in the following document.

\\xenon\specs\Architecture\Xenon Video.doc

7 Key Architectural Mechanisms

7.1 System Dataflow

The following document discusses the main producer / consumer models of the system.

\\xenon\specs\Architecture\Xenon System Dataflow.doc

7.2 System Memory Map

The System Memory Map is maintained in 2 separate documents:

\\xenon\specs\Architecture\System Memory Map (32 bit).doc

\\xenon\specs\Architecture\System Memory Map (42 bit).doc

7.3 System Coherence

\\xenon\specs\Architecture\Xenon System Coherence Model.doc

7.4 System Interrupt Mechanism

\\xenon\specs\Architecture\Xenon Interrupt Specification.doc

7.5 System Ordering

\\xenon\specs\Architecture\Xenon System Ordering.doc


2-6-04 Ordering
I’ve extracted the relevant part of Hartog’s response on this (long and tortuous) thread:

I think that the implication of the behavior we're discussing here is that the GPU and the CPU cannot reliably use a flag
in memory to indicate the arrival of some chunk of data from the CPU, since if the GPU polls such a flag, it can see it
set before the data is actually delivered. This occurs because the in-order property for writes is not preserved for
locations that are in different banks (channels), and because there is no concept of a "conflicting read" for reads issued
by the GPU. I think the question for us is whether the GPU ever needs to do such polling. Right?
[SH] Exactly.
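
The hazard described in the thread above can be illustrated with a toy model. This is a sketch only:
the class, the two-bank layout and the "data"/"flag" addresses are assumptions made for illustration
and do not describe the actual memory controller.

```python
from collections import deque


class BankedMemory:
    """Toy model: each bank (channel) drains its own write queue independently."""

    def __init__(self, num_banks=2):
        self.cells = {}
        self.queues = [deque() for _ in range(num_banks)]

    def post_write(self, bank, addr, value):
        # Writes stay ordered within one bank, but not across banks.
        self.queues[bank].append((addr, value))

    def drain_one(self, bank):
        # Commit the oldest pending write of the given bank, if any.
        if self.queues[bank]:
            addr, value = self.queues[bank].popleft()
            self.cells[addr] = value

    def read(self, addr):
        # Reads see committed writes only; there is no "conflicting read"
        # check against writes still queued in another bank.
        return self.cells.get(addr, 0)


mem = BankedMemory()
mem.post_write(bank=0, addr="data", value=42)  # CPU writes the payload first...
mem.post_write(bank=1, addr="flag", value=1)   # ...then the "data ready" flag.

mem.drain_one(bank=1)                          # The flag's bank happens to drain first.
flag_seen, data_seen = mem.read("flag"), mem.read("data")
print(flag_seen, data_seen)                    # Poller sees the flag before the payload.

mem.drain_one(bank=0)
print(mem.read("data"))                        # Payload arrives only now.
```

In this model a poller that trusts the flag would consume stale data, which is the behavior the thread
is concerned about.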

7.6 System Security

\\xenon\specs\Architecture\Xenon System Security Scheme.doc

7.7 System Endianness

\\xenon\specs\Architecture\Xenon Southbridge Endianness.doc

7.8 System Clocking

\\xenon\specs\Architecture\Xenon System Clocking.doc

7.9 System Boot

<SMC Boot Sequence + FLASH Swizzle>

How does the kernel boot? What is the boot sequence? What systems come up in what order? What
resets get released? How does hardware get initialized with hardware drive strengths? Everything
up to high.

Not using standard BIOS.

Need to let CPU execute out of Flash, i.e. a path must exist that allows the CPU to access code
from the system flash connected to the SouthBridge. This implies that out of reset, the hardware
must be in a default state that a) enables this path and b) allows this path to be reliable. This is
required before any internal registers within the CPU, GPU or SouthBridge can be set.

1. SMC boots then waits for a power up event.

a. When power is applied to the console, the SMC comes out of reset and the SMC
boot loader, which is hard coded in a ROM inside the Southbridge, copies the
SMC code from the system flash into a code RAM located inside the Southbridge.
The System Flash Controller is responsible for correcting errors that occur during
the copy from the system flash to the internal code RAM. Any unrecoverable errors
shall be handled by the SMC boot loader.

b. The SMC boot loader restores SMC state to a reset state and then jumps to the code
RAM.

2. SMC detects power up event and boots system.

a. The SMC enables the system clocks and voltage regulators.

b. The SMC releases the SB, and then the GPU, from reset. It monitors the BSB link
training and, on completion of that, releases the CPU from reset.

3. The CPU trains the FSB link and starts executing the 1BL (first boot loader) code from the
internal ROM. The code in the internal ROM shall have the minimum device dependent
settings and information to allow loading and execution of the 2BL from the system flash.

The CPU, GPU and SB shall support a mechanism to allow fetching and execution of the
1BL from the system flash. This mechanism shall be functional without any configuration or
setup.

4. The 2BL sets up the hardware to get memory going and anything needed to make ROM XIP more
performant. Once the memory is up and running, the 2BL copies the 3BL into memory and
jumps to it.

5. The 3BL uses the System Flash controller’s DMA interface and copies the kernel from the
system flash into memory. It patches and verifies the memory image. Again, the system flash
interface in the SB deals with the idiosyncrasies of NAND flash. Incidentally, some value slightly
less than 8 MB is actually available to software due to parts shipping with bad blocks.

6. The CPU jumps to the kernel now in memory.

7. Main system boot phase. Note that none of what was described dealt with encryption or
security in the Southbridge or the system flash itself. Believe this is consistent with
Dinarte’s preferred scheme but need to parse latest mail and verify this.
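
The ordering of the steps above can be summarized in a small sketch. The stage names, the handler
dictionary and the run_boot function are hypothetical and exist only to make the sequence explicit;
they do not describe actual SMC or boot loader code.

```python
from enum import IntEnum


class BootStage(IntEnum):
    SMC_BOOT = 1      # SMC boot loader copies SMC code from flash into code RAM
    POWER_UP = 2      # SMC enables clocks and regulators, releases SB, GPU, then CPU
    FIRST_BL = 3      # CPU trains the FSB link and runs the 1BL from internal ROM
    SECOND_BL = 4     # 2BL brings up memory and copies the 3BL into memory
    THIRD_BL = 5      # 3BL copies the kernel from flash via DMA, patches and verifies it
    KERNEL_ENTRY = 6  # CPU jumps to the kernel now in memory
    MAIN_BOOT = 7     # main system boot phase


# Reset release order from step 2b; the CPU is released only after BSB link training.
RESET_RELEASE_ORDER = ["SB", "GPU", "CPU"]


def run_boot(stage_handlers):
    """Run one handler per stage in order; return the first failing stage, or None.

    stage_handlers maps a BootStage to a callable returning True on success.
    Stages without a handler are treated as succeeding (an assumption of this sketch).
    """
    for stage in BootStage:
        handler = stage_handlers.get(stage, lambda: True)
        if not handler():
            return stage
    return None
```

For example, run_boot({BootStage.SECOND_BL: lambda: False}) stops at the 2BL stage and never reaches
the 3BL or the kernel.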

7.10 System Reset

This section should focus on the aspects of reset sequence that are relevant to architecture. It is
my opinion that the details of reset sequence are entirely design-specific.

System Reset is controlled by the System Management Controller. The various reset signals are
configured in a star topology, with the SMC as the hub. The reset sequence follows the
power-supply sequence, and consists of sequentially releasing reset to various components,
verifying an acknowledgement, and proceeding to the next stage.

The detailed power-on reset diagram is located in

\\xenon\Specs\Hardware\Xenon Power on Reset.vsd

7.11 System Time

The system time shall be maintained in a forty bit counter.

The time base for the counter shall be one millisecond.

The tolerance shall be below +/- 50 ppm over the life of the system.

The counter is reset when the system is in OFF state. The time shall be maintained when in
Standby power state.
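
To give a feel for these numbers, the following sketch works out how long a forty bit millisecond
counter runs before wrapping, and the worst-case drift implied by a 50 ppm tolerance. The constant
and function names are illustrative; the 365.25-day year is an assumption of the sketch.

```python
COUNTER_BITS = 40      # counter width from the text above
TICK_SECONDS = 1e-3    # one millisecond time base
TOLERANCE_PPM = 50     # +/- 50 ppm


def rollover_years(bits=COUNTER_BITS, tick_s=TICK_SECONDS):
    """Years until the counter wraps, using a 365.25-day year."""
    seconds = (2 ** bits) * tick_s
    return seconds / (365.25 * 24 * 3600)


def worst_case_drift_seconds_per_day(ppm=TOLERANCE_PPM):
    """Maximum clock error accumulated over 24 hours at the given tolerance."""
    return 24 * 3600 * ppm / 1_000_000


print(round(rollover_years(), 1))            # roughly 34.8 years
print(worst_case_drift_seconds_per_day())    # about 4.32 seconds per day
```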

7.12 Power States & Power Management

The three power states for the system are: “Off”, “Standby”, and “Operational”.

To reduce overall system power consumption, power minimization techniques may be employed in
all states.

This section should describe the aspects in which the architecture needs flexibility in order to
support quiet mode operation:

1. CPU cores must be capable of being individually enabled/disabled
2. The GPU must have capability of disabling shaders
3. The CPU shall have selectable clock frequency between “Fast” and “Slow”

SW will need to go through a rigid sequence of operations to gracefully change power modes
(there is no automatic hardware power sequencing control).

7.12.1 Off

This is the state of the system when the power supply is not plugged into the wall and/or the power
supply is not plugged into the system.

When the system power supply is plugged in and the power supply is connected to the system, the
system shall transition to the Standby state.

Regardless of how briefly the system is in this state, on application of power it shall transition
to the Standby state without operator intervention, subject to certain ESD susceptibility
limitations.

7.12.2 Standby

This state shall be entered from either Off, Quiet or Full Power states.

In this state, the functions that shall be powered include: The clock generator for the SMC, the
SMC, the SMC firmware store, the front panel button circuitry, the expansion ports, the power
circuit in the AVIP, the IR receiver and demodulation block, the wired and wireless controller ports.

On detection of any power up event on the front panel buttons circuitry, the AVIP, the IR
demodulator, the wired and wireless controller ports, and the SMC power cycle timer, the system
shall transition to the Full Power state.

For details on how the system handles power events via the expansion port, see section 7.12.4.2.

In the event of a power interruption that is less than 16 milliseconds in duration, the system shall
remain in this state and continue to operate as if no power interruption occurred. If the power
interruption is longer than 16 milliseconds, the system may transition to the Off state.

7.12.3 Operating States

This section should take the place of the Quiet and Full Power states below, and focus on the
configurations of the system to implement various power states. The degrees of freedom are:

CPU Cores enabled {1, 2, 3}

GPU Shaders enabled {1, 4, 12}

ODD spin max speed: {slow, med, fast}

7.12.3.1 Quiet

This state shall be entered from Full Power state only.

This state is designed for A/V playback and wireless game pad charging. The goal is to minimize
system acoustic level. This shall be accomplished by reducing the system power consumption to a
minimum. The CPU and NB shall include circuitry that allows portions of the chips to be turned off
or slowed down, either under program control or by internal circuit usage determination functions.

The SB may include circuitry that allows portions of the chip to be turned off or slowed down, either
under program control or by internal circuit usage determination functions.

On detection of a power down event, the system shall transition to the Standby state.

On detection of loss of power, the system software shall put the system in a safe state and
immediately transition to the Standby state.

The system software may transition the system to Full Power state based on user input.

The system software shall monitor transitions between this state and Full Power state and ensure
that they occur at a frequency that is not annoying to the user.

7.12.3.2 Full Power

This state shall be entered from Standby or Quiet state.

In this state, the system shall operate at its maximum capabilities.

On detection of a power down event, the system shall transition to Standby state.

On detection of loss of power, the system software shall put the system in a safe state and
immediately transition to the Standby state.

The system software may transition the system to Quiet state based on user input.

The system software shall monitor transitions between this state and Quiet state and ensure that
they occur at a frequency that is not annoying to the user.
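
The state transitions named in sections 7.12.1 through 7.12.3.2 can be collected into a small table.
This is a sketch of the text above, not a normative state machine; the enum and function names are
hypothetical, and the Standby to Off entry reflects the "may transition" wording for power
interruptions longer than 16 milliseconds.

```python
from enum import Enum


class PowerState(Enum):
    OFF = "Off"
    STANDBY = "Standby"
    QUIET = "Quiet"
    FULL_POWER = "Full Power"


# Destination states reachable from each state, as described in the text.
ALLOWED_TRANSITIONS = {
    PowerState.OFF: {PowerState.STANDBY},                           # power applied
    PowerState.STANDBY: {PowerState.FULL_POWER, PowerState.OFF},    # power up event / long interruption
    PowerState.QUIET: {PowerState.FULL_POWER, PowerState.STANDBY},  # user input / power down event
    PowerState.FULL_POWER: {PowerState.QUIET, PowerState.STANDBY},  # user input / power down event
}


def can_transition(src, dst):
    """Return True if the text above names a direct transition from src to dst."""
    return dst in ALLOWED_TRANSITIONS[src]
```

Note that Quiet is reachable only from Full Power, so a system in Standby has to pass through Full
Power before it can enter Quiet.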

7.12.4 Power Control Events

This section documents the power control events that can be generated via the front panel button
circuitry, AVIP, the IR receiver and demodulator, the USB expansion port, and the wireless
controller ports. The event detection is done by a different subsystem depending on the
current state of the system.

Since there is the possibility of multiple power control events occurring simultaneously, the SMC
shall implement the following behavior:

• On detection of the first power up event, the SMC shall ignore all power control events for
a period of 500 milliseconds.
• The SMC shall report the first power up event.
• Need to work with usability to create a chart that has priorities, multiple sources, etc. For
example, box is in standby, user presses and holds remote control on button and then
presses the power button on the console. Should the console go to full power (remote),
then transition to standby (front panel button), and then go back to full power (remote)?
• I have put verbiage below on power transitions; however, keep in mind that this verbiage
will be modified once we have the full chart from usability.
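
The first two bullets above can be sketched as a simple lockout filter. The class and its fields are
hypothetical. What happens to events that arrive after the 500 millisecond window is not defined in
this section (it depends on the usability chart), so accepting them as new events is an assumption
of the sketch.

```python
LOCKOUT_MS = 500  # ignore window from the first bullet above


class PowerEventFilter:
    """Accept the first power up event, then ignore events for LOCKOUT_MS."""

    def __init__(self):
        self.first_event_source = None
        self.lockout_until_ms = None

    def on_event(self, source, now_ms):
        """Return True if the event is accepted, False if it is ignored."""
        if self.lockout_until_ms is not None and now_ms < self.lockout_until_ms:
            return False
        # Assumption: an event outside the window starts a new window.
        self.first_event_source = source
        self.lockout_until_ms = now_ms + LOCKOUT_MS
        return True


f = PowerEventFilter()
print(f.on_event("ir_remote", now_ms=0))       # accepted, becomes the reported source
print(f.on_event("front_panel", now_ms=100))   # ignored, inside the 500 ms window
print(f.first_event_source)
```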

7.12.4.1 Front Panel Buttons

The front panel consists of two momentary push buttons: Power Control Button, DVD Tray Control
Button.

7.12.4.1.1 Power Control Button

This button shall be used to toggle the power of the system from Standby to Full Power and from
either Full Power or Quiet to Standby.

When this button is pressed and the system is in Standby mode, the SMC shall note the source of
the power up event is the power control button, transition the system to Full Power state and when
requested by the system software, provide the power event source.

When this button is pressed and the system is in Quiet or Full Power state, the SMC shall note the
source of the power down event is the power control button, send a message to the system
software and wait for a response. If the system software doesn’t respond within 5 seconds, the
SMC shall transition the system to the Standby state.
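
The power down handshake described above (notify the system software, then fall back to a forced
transition after 5 seconds) can be sketched as follows. The three callables are hypothetical hooks
standing in for SMC facilities that this document does not define.

```python
RESPONSE_TIMEOUT_S = 5.0  # timeout from the text above


def handle_power_down_event(source, send_message, wait_for_response, enter_standby):
    """SMC-side handling of a power down event in Quiet or Full Power state.

    send_message(source)          notifies the system software of the event source.
    wait_for_response(timeout_s)  returns True if the software answered in time.
    enter_standby()               forces the transition to the Standby state.
    """
    send_message(source)
    if wait_for_response(RESPONSE_TIMEOUT_S):
        return "software-controlled"
    enter_standby()
    return "forced-standby"
```

The same pattern appears for the AVIP and IR power down events later in this section.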

7.12.4.1.2 DVD Tray Control Button

Insert DVD tray control info. here.

7.12.4.2 Expansion Port

A device plugged into any of the front or rear USB expansion ports may signal for a power up event
by signaling USB remote wakeup. The SMC shall monitor the state of the DP/DN signals on each
port, and look for either a J to K or a K to J state transition, which indicates a remote wakeup has
been signaled.
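
A minimal sketch of this detection, assuming the SMC samples the DP/DN levels as 0 or 1: the helper
names are hypothetical, and the J/K polarity follows the usual USB convention (J is differential 1
at full speed and differential 0 at low speed). Debouncing and minimum resume duration are not
modeled.

```python
def line_state(dp, dn, low_speed=False):
    """Decode sampled DP/DN levels (0 or 1) into a USB bus state name."""
    if dp == 0 and dn == 0:
        return "SE0"
    if dp == 1 and dn == 1:
        return "SE1"
    differential_one = (dp == 1 and dn == 0)
    # Full speed: J is differential 1. Low speed: the polarity is inverted.
    is_j = differential_one != low_speed
    return "J" if is_j else "K"


def remote_wakeup_signaled(samples, low_speed=False):
    """Return True if consecutive samples show a J to K or K to J transition."""
    states = [line_state(dp, dn, low_speed) for dp, dn in samples]
    for prev, cur in zip(states, states[1:]):
        if {prev, cur} == {"J", "K"}:
            return True
    return False
```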

Standard USB devices only signal remote wakeup if they have been enumerated, the remote wakeup
feature has been enabled, and they have subsequently been placed in standby mode. The wired game
pad shall be designed such that remote wakeup may be signaled at any time following connection;
i.e. the wired game controller does not need to be preconfigured to allow remote wakeup signaling.

While the system architecture supports transitions from Quiet state or Full Power state to Standby
state, the implementation is heavily device, device protocol and system software dependent. The
implementation of this is beyond the scope of this document and is not covered here.

7.12.4.3 AVIP Power Control

There is a dedicated signal from the AV interface port that is used to enable external devices to
issue power control requests as well as issue events to the SMC that require further action. The
PWRON signal is pulled up to 3.3V standby power on the motherboard and is an input to the SMC
and output to the AVIP. When the system is in Standby state, the SMC shall monitor the PWRON
signal for a low value. A power on sequence is a low value for >40 msec. Upon detection of the
power up sequence, the SMC shall note the source of the power up event is the AVIP power up
command, transition the system to Full Power state and when requested by the system software,
provide the power event source.

In Quiet or Full power states, the SMC shall continue to monitor the PWRON signal. A low value
sequence that is ~40msec in length is used to indicate that the SMC needs to issue a transaction
on the DDC interface on AVIP to go read further control/status information. One of the control
modes returned could be a power down request. Also, a low value sequence that is ~200msec or
more is used to indicate a forcible power down state. Upon detection of either of the power down
sequences, the SMC shall note the source of the power down event is the AVIP power down
command, send a message to the system software and wait for a response. If the system software
doesn’t respond within 5 seconds, the SMC shall transition the system to the Standby state.
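
The PWRON pulse widths described above can be sketched as a classifier. The text gives only
approximate values (~40 msec and ~200 msec or more), so the lower bound used here to recognize the
short pulse (DDC_QUERY_MIN_MS) is an assumption, as are the function and return names; the actual
protocol is documented in the Xenon Design Specification.

```python
POWER_ON_MIN_MS = 40        # Standby: low for more than 40 msec means power on
FORCED_DOWN_MIN_MS = 200    # Quiet/Full Power: ~200 msec or more forces power down
DDC_QUERY_MIN_MS = 30       # ASSUMED tolerance for the "~40 msec" status pulse


def classify_pwron_low(duration_ms, in_standby):
    """Classify a low pulse on PWRON by its duration and the current system state."""
    if in_standby:
        return "power_up" if duration_ms > POWER_ON_MIN_MS else "ignore"
    if duration_ms >= FORCED_DOWN_MIN_MS:
        return "forced_power_down"
    if duration_ms >= DDC_QUERY_MIN_MS:
        return "read_ddc_status"   # SMC reads control/status over DDC on the AVIP
    return "ignore"
```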

If auto power on is required, a controller connected to the AVIP interface could use a relay (such
that when power is not applied, the signal is shorted to ground) to ground the signal which the SMC
samples to indicate power on. +5V then becomes available to the controller, which energizes the
relay to isolate the signal from ground and allow the controller to drive the signal. If no auto
power on is needed, one could hook a momentary switch to PWRON to pull the signal to ground for
power up. The SMC maintains the power on state so that when the machine is commanded to shut down
(and the +5V goes away, which causes the relay to short PWRON to ground), the SMC will then
ignore the fact that PWRON is pulled low (so you don't get endless repetition). Similarly,
because the enabling of +5V may take some time, the SMC will wait to see PWRON go high on a
transition from standby to power on. If that event doesn’t happen within 5 seconds, the SMC will
log an error event and power down the machine until it sees the PWRON signal asserted.

The AVIP Power Control protocol is documented in the Xenon Design Specification.

7.12.4.4 IR Receiver and Demodulator

This block is different from other blocks in that the output of this block has to be processed to
determine whether a power control event has occurred.

When the system is in Standby state, this block shall decode the signal from the IR receiver and
output it to the SMC. The SMC shall interpret the command and compare it to the IR power up
command. If the received command matches the IR power up command, the SMC shall note the
source of the power up event is the IR power up command, transition the system to Full Power
state and when requested by the system software, provide the power event source.

In Quiet or Full power states, this block shall continue to decode the signal from the IR receiver and
output it to the SMC. The SMC shall interpret the command and compare it to the IR power down
command. If the received command matches the IR power down command, the SMC shall note the
source of the power down event is the IR receiver, send a message to the system software and
wait for a response. If the system software doesn’t respond within 5 seconds, the SMC shall
transition the system to the Standby state.

7.12.4.5 Wired Controller Ports

In a typical wired system topology there are one or more game pads that are plugged into the wired
controller ports on the console – one game pad per controller port.

Please note that throughout this section the term game pad refers to the device that the user
manipulates and the term controller port refers to the hardware in the console that the game pad
connects to.

The wired game pads, controller ports, the SMC and system software shall be designed to allow
control of the system power states via the XUSB protocol. It shall not require any additional signal
wires or methods. The XUSB protocol is a derivative of the USB protocol where the data bit time
has been increased and intelligent downstream traffic routing is implemented. In the USB protocol,
downstream traffic is broadcast to all leaf nodes of the USB tree; in intelligent downstream traffic
routing it is only sent to the specific leaf that the traffic is intended for. Both of these changes
result in lowered emissions, as well as increased cable lengths.

When the system is in Standby state, the controller port shall be powered and be able to detect
XUSB protocol based connect, disconnect and resume signaling from each wired controller port
and output it to the SMC; the state of the output to the system software is undefined since the
system software is not running in this state.

The SMC shall interpret the command using XUSB protocol connect, disconnect and resume
timing to determine whether the controller is present and, if so, whether it is requesting that the
system be powered up. On determination that a power up event has occurred via the controller port,
the SMC shall note the particular controller port that is the source of the power up event,
transition the system to Full Power state and, when requested by the system software, provide the
power event source.

When the system enters Standby state:

• If there is no game pad plugged into a controller port, the XUSB signal lines shall be at a
single-ended 0 (SE0) state and the output to the SMC shall indicate the controller port
state is disconnected.
• If the user plugs in a game pad to the controller port and the power up button sequence
has not been activated, the game pad shall drive XUSB signal lines to the J state. This
behavior is the same when the controller is plugged into a system that is in Quiet or Full
power states. Furthermore, once the game pad detects that the bus is idle (J state with no
traffic) for the XUSB suspend time duration, it shall transition to a low power state. In this
low power state, the game pad shall be able to monitor its buttons for the power up button
sequence and perform XUSB resume signaling.
• If there is a game pad plugged into a controller port and the power up button sequence has
not been activated, the controller shall drive the XUSB signal lines to the suspend state (J
state) and the output to the SMC shall indicate the controller port state is connected.
• If there is a game pad plugged into a controller port and the power up button sequence is
activated, the game pad shall generate XUSB resume signaling by transitioning the XUSB
signal lines to the resume state (K state) for the XUSB resume time duration. The controller
port shall detect the resume state and the output to the SMC shall indicate the game
controller port state is resume.
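
The mapping from XUSB line state to the controller port state reported to the SMC, as listed above,
can be sketched as a lookup. The names are hypothetical, and the XUSB suspend and resume time
durations are not modeled because this section does not give their values.

```python
# Line state on the XUSB signal lines -> controller port state output to the SMC.
PORT_STATE_BY_LINE_STATE = {
    "SE0": "disconnected",  # no game pad plugged in
    "J": "connected",       # game pad present, power up sequence not activated
    "K": "resume",          # game pad is signaling resume (power up request)
}


def controller_port_state(xusb_line_state):
    return PORT_STATE_BY_LINE_STATE[xusb_line_state]


def power_up_source(port_line_states, in_standby):
    """Return the index of the first port signaling resume, or None.

    Only meaningful in Standby; in Quiet or Full Power the SMC ignores this output.
    """
    if not in_standby:
        return None
    for port, state in enumerate(port_line_states):
        if controller_port_state(state) == "resume":
            return port
    return None
```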

When the system is in Quiet or Full power states, this block shall continue to detect the connect
and disconnect signaling and output it to the system software. While it may continue to output
these states to the SMC, the SMC shall ignore this output.

In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the
controller shall send the button sequence to the system software. The system software shall
process the sequence and this processing may ask the user for confirmation of the power state
change and after confirmation, initiate the process of changing the power state to the Standby
state.

In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.
Since the button sequence is sent as part of a XUSB data transfer, the controller port shall not
incorporate any logic to detect this.

7.12.4.6 Wireless Controller

In a typical system topology, there are one or more wireless game pad devices that connect via a
bidirectional wireless link to the wireless transceiver in the console. The wireless transceiver
connects to a wireless controller port via a USB interface. One notable difference between wired
and wireless controllers is that in the wired system, there is one controller port per game pad. In
the wireless system, all the game pads share the same wireless link and connect to a single
transceiver which connects to a wireless controller port in the console.

Please note that throughout this section the term game pad refers to the wireless game pad device
that the user manipulates and the term controller port refers to the wireless controller port in the
console.

There are two ways to implement power control for the wireless controller: one is to use USB
suspend and resume, like the wired controller; the other is to use a separate wakeup signal from
the transceiver.

7.12.4.6.1 USB Power Control

The game pads, link, transceiver, controller port, the SMC and system software shall be designed
to allow control of the system power states via the USB protocol. It shall not require any additional
signal wires or methods.

When the system is in Standby state, the controller port shall be powered and be able to detect
USB protocol based resume signaling from the transceiver and output it to the SMC; the state of
the output to the system software is undefined since the system software is not running in this
state.

The SMC shall interpret the command using USB protocol resume timing to determine if the
transceiver is requesting that the system be powered up. On determination that a power up event
has occurred via the controller port, the SMC shall note that the wireless controller port is the
source of the power up event, transition the system to Full Power state and when requested by the
system software, provide the power event source.

When the system is in Standby state:

• The transceiver shall attempt to establish a link with game pad(s).
• While the transceiver hasn’t established a link with a game pad, it shall drive the USB
signal lines to the J state. Once the transceiver detects that the bus is idle (J state with no
traffic) for the USB suspend time duration, it shall transition to a low power state. In this low
power state, the transceiver shall be able to establish a link with game pad(s) and perform
USB resume signaling.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has not been detected, it shall continue to drive the USB signal lines to the J
state. Once the transceiver detects that the bus is idle (J state with no traffic) for the USB
suspend time duration, it shall transition to a low power state. In this low power state, the
transceiver shall be able to maintain the link with game pad and establish links with other
game pad(s) and perform USB resume signaling.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has been activated, the transceiver shall generate USB resume signaling by
transitioning the USB signal lines to the resume state (K state) for the USB resume time
duration. The controller port shall detect the resume state and the output to the SMC shall
indicate the controller port state is resume.

In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the game
pad shall send the button sequence to the system software. The system software shall process the
sequence and this processing may ask the user for confirmation of the power state change and
after confirmation, initiate the process of changing the power state to the Standby state.

In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.

7.12.4.6.2 Dedicated Power Control

The game pads, link, transceiver, controller port, the SMC and system software shall be designed
to allow control of the system power states via a separate set of signals between the transceiver
and the SMC instead of using the USB signaling between the transceiver and the controller port.

When the system is in Standby state, the controller port may be powered down and does not need
to detect USB protocol based resume signaling. If the controller port is powered down, the
transceiver connection to the controller port shall also be powered down.

The transceiver and SMC power control signals consist of a console standby power state signal
(TRAN_STANDBY) that goes from the SMC to the transceiver and a system resume signal
(TRAN_RESUME) from the transceiver to the SMC. The TRAN_STANDBY signal indicates to the
transceiver whether it should monitor the links for power up button sequences and relay the status
to the SMC via the TRAN_RESUME signal. On determination that a power up event has occurred
via the TRAN_RESUME signal, the SMC shall note that the wireless controller port is the source of
the power up event, transition the system to Full Power state and when requested by the system
software, provide the power event source.

When the system is in Standby state:

• The SMC shall assert TRAN_STANDBY.
• On detection of TRAN_STANDBY, the transceiver shall transition to a low power state. In
this low power state, the transceiver shall be able to establish a link with game pad(s) and
assert TRAN_RESUME as appropriate.
• While the transceiver hasn’t established a link with a game pad, it shall negate
TRAN_RESUME.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has not been detected, it shall negate TRAN_RESUME. In this low power state,
the transceiver shall be able to maintain the link with game pad and establish links with
other game pad(s) and perform USB resume signaling.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has been activated, the transceiver shall pulse TRAN_RESUME for T_TRANRESUME.
The SMC shall detect the TRAN_RESUME pulse and initiate the transition of the system to
Full Power state.

In the Quiet and Full power states, the SMC shall negate the TRAN_STANDBY signal and ignore
the state of the TRAN_RESUME signal.
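
The TRAN_STANDBY / TRAN_RESUME handshake described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class names, method names and state strings are hypothetical, signals are modeled as booleans, and the T_TRANRESUME pulse width is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class Transceiver:
    """Toy model of the wireless transceiver (names are illustrative only)."""
    standby: bool = False       # level of TRAN_STANDBY as driven by the SMC
    linked: bool = False        # a link with at least one game pad exists
    resume_pulse: bool = False  # True while TRAN_RESUME is being pulsed

    def on_power_up_sequence(self) -> None:
        # TRAN_RESUME is pulsed only while monitoring is requested
        # (TRAN_STANDBY asserted) and a game pad link has been established.
        self.resume_pulse = self.standby and self.linked


class Smc:
    """Toy model of the SMC side of the handshake."""

    def __init__(self) -> None:
        self.state = "STANDBY"
        self.power_event_source = None

    def tran_standby(self) -> bool:
        # TRAN_STANDBY is asserted in Standby and negated in Quiet / Full Power.
        return self.state == "STANDBY"

    def sample(self, tran_resume: bool) -> None:
        # TRAN_RESUME is ignored outside the Standby state.
        if self.state == "STANDBY" and tran_resume:
            self.power_event_source = "wireless controller port"
            self.state = "FULL_POWER"
```

In this model a power up button sequence from an unlinked game pad leaves the SMC in Standby, while the same sequence over an established link moves it to Full Power and records the wireless controller port as the power event source.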

In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the game
pad shall send the button sequence to the system software. The system software shall process the
sequence and this processing may ask the user for confirmation of the power state change and
after confirmation, initiate the process of changing the power state to the Standby state.

In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.

7.13 CPU/GPU Synchronization (Nick)

This section describes, at a high level, how the CPU and GPU are expected to communicate and
synchronize with one another.

Most of the synchronization mechanisms are pretty standard w.r.t. PC graphics, with the exception
of the procedural geometry scheme discussed in the next section.

7.14 CPU – GPU Procedural Geometry Communication (Nick)

This section describes what the GPU hardware must do to implement the procedural geometry
algorithm. For further details of this algorithm, refer to the document “Xenon Procedural Geometry”.

Conceptually, the GPU’s main command list is stored in main memory. The CPU kicks off the
command processor in the GPU through a register write that points the command processor to a
memory address. The command processor starts fetching commands and data from this address
until a pre-programmed stop address is reached.

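[Figure: conceptual data flow between the CPU, the GPU and main memory. Labels: CPU register
writes to the GPU command processor; vertex data writes to main memory; vertex data reads by the
GPU; write back data; rest of the pipeline; Start_addr and Curr_addr pointers.]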

How the current implementation departs from this concept is discussed later in this section.

First we will discuss the command processor commands.

7.14.1 Vertex commands

In addition to the traditional commands we need the following commands for the procedural
geometry algorithm.

CALL <addr>: GPU will store the current address and begin processing data at <addr>.

RETURN: GPU will begin executing at the most recently stored CALL address. Multiple levels
of CALL/RETURN might be needed.

JUMP <addr>: GPU will begin executing commands from the <addr>.

WRITEBACK <addr> <data>: GPU writes <data> back to <addr>.
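
The four commands above can be illustrated with a minimal interpreter sketch. Several things here are assumptions made only for illustration: commands are modeled as tuples in a dictionary keyed by address, each command occupies one address unit, RETURN resumes at the command following the CALL, and every other command (the hypothetical "DRAW") is treated as opaque.

```python
def run_commands(memory, start_addr, stop_addr):
    """Execute a command list held in `memory` (dict: address -> tuple).

    Returns the addresses of the executed commands in order, plus the
    values written by WRITEBACK commands.
    """
    pc = start_addr
    call_stack = []   # stored CALL addresses; a stack allows nested CALL/RETURN
    writebacks = {}   # addr -> data written by WRITEBACK
    trace = []
    while pc != stop_addr:
        cmd = memory[pc]
        trace.append(pc)
        op = cmd[0]
        if op == "CALL":
            call_stack.append(pc + 1)  # assumption: resume after the CALL
            pc = cmd[1]
        elif op == "RETURN":
            pc = call_stack.pop()
        elif op == "JUMP":
            pc = cmd[1]
        elif op == "WRITEBACK":
            writebacks[cmd[1]] = cmd[2]
            pc += 1
        else:
            pc += 1  # any traditional command, opaque in this sketch
    return trace, writebacks


example_memory = {
    0: ("DRAW",),
    1: ("CALL", 10),
    2: ("WRITEBACK", 0x100, 7),
    3: ("JUMP", 20),
    10: ("DRAW",),
    11: ("RETURN",),
    20: ("DRAW",),
}
```

Using a stack for the stored addresses is one way to satisfy the note that multiple levels of CALL/RETURN might be needed; the depth the hardware would support is not stated here.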

7.14.2 GPU Memory mapped registers

Current_addr[31:0] : Vertex Unit is currently reading this address.

Stop_addr[31:0] : Vertex unit will stop processing the data when Current_addr == Stop_addr.

Start_vertex_proc[0:0] : This kicks off the vertex unit to start from Current_addr.

At the start of the frame, the CPU will set both Current_addr and Stop_addr to the start of the GPU
push buffer. Then the CPU sets Start_vertex_proc. The vertex unit is now ready for data and waits
until the CPU changes the Stop_addr register. Once it changes, the vertex unit fetches from
Current_addr until Current_addr reaches Stop_addr.

In the meantime, the CPU continuously updates Stop_addr as more data is written into
memory.

Note that if the GPU reaches the Stop_addr it just waits there until the CPU updates the Stop_addr.
There is no need for the Start_vertex_proc to be set again.
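
The register protocol above behaves like a producer/consumer pair in which Stop_addr acts as the producer's write pointer. The following is a minimal sketch of that behavior; the class and method names are hypothetical, and addresses advance by one unit per fetch purely for illustration.

```python
class VertexUnit:
    """Toy model of the Current_addr / Stop_addr protocol."""

    def __init__(self):
        self.current_addr = 0
        self.stop_addr = 0
        self.started = False
        self.fetched = []  # addresses fetched so far

    def start(self, push_buffer_start):
        # Start of frame: Current_addr = Stop_addr = start of the push buffer,
        # then Start_vertex_proc is set once.
        self.current_addr = push_buffer_start
        self.stop_addr = push_buffer_start
        self.started = True

    def step(self):
        """Fetch one unit if available; return False while waiting on the CPU."""
        if not self.started or self.current_addr == self.stop_addr:
            return False
        self.fetched.append(self.current_addr)
        self.current_addr += 1
        return True


def drain(unit):
    """Step the unit until it has caught up with Stop_addr."""
    count = 0
    while unit.step():
        count += 1
    return count
```

After `start(100)` the unit idles; writing `stop_addr = 103` lets it fetch three units and idle again, and a later update to `stop_addr` resumes fetching without another kick-off.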

7.14.3 Current Implementation

The current implementation algorithm has been described in “Xenon Procedural Geometry”. This
section describes how the GPU might implement it.

[Figure: data flow between the CPU, the GPU and DRAM in the current implementation. Blocks
shown: CPU Core, L2 Cache, Cpuc Fifo 0, Cpuc Fifo 1, Cpuc Fifo 2, Writeback registers, FSB (shown
twice), Coherency Block, Command Processor (Stop_addr, Curr_addr), Mem Cntl, The Rest of the
GPU Pipeline, GPU push buffer, DRAM. Numbered actions: 1) Write GPU register; 2) GPU writeback
to CPU reg; 3) GPU Data Request to Coherency; 4) GPU request to CPU L2 Cache; 5) CPU Cache line
castout; 6) GPU request to DRAM; 7) DRAM returns data.]

The seven different actions that are going on between CPU, GPU and memory are indicated in the
diagram above. They are:

1) CPU core writes to a memory mapped register in GPU.

2) GPU writeback to a memory mapped CPU register. This is caused by the WRITEBACK
command described in an earlier section. This would be used for updating CPUC fifo tail
pointer and GPU push buffer tail pointers. For the tail-pointer writebacks the CPU implements a
section of cacheable memory directly on its die. There are memory mapped registers in the
CPU.

3) GPU read data request to the coherency block. The coherency block decides whether the data
is to be read from main memory or CPU’s L2 and directs the request accordingly, which is
shown in 4) and 6).

4) Coherency block determined that the data is in CPU’s L2 and makes a request for cache line
castout.

5) CPU returns the cache line castout data, which is routed to the vertex unit as a response to 3).

6) Coherency block determined that the data requested by Vertex Unit is in Main memory and
makes a request to main memory.

7) Main memory gives the data back to vertex unit in response to 3).

This section does not attempt to describe the coherency algorithm, which is described elsewhere.
The coherency block is included only to show how the data might flow, depending on the
architecture of the GPU, and how the GPU will be modified to include the coherency block.
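
The routing decision made in actions 3) through 7) can be sketched as follows. This is not the coherency algorithm itself, only a toy illustration of the two possible data paths. The class name, the dictionary-based model of the L2 and DRAM, and the 128-byte line size constant are assumptions, as is the choice to drop the line from the L2 model after a castout.

```python
LINE = 128  # assumed line size in bytes, used only for address alignment


class CoherencyBlock:
    """Toy model: route a GPU read either to the CPU's L2 or to DRAM."""

    def __init__(self, l2_lines, dram):
        self.l2_lines = l2_lines  # line address -> data held in the CPU's L2
        self.dram = dram          # line address -> data in main memory

    def read(self, addr):
        line = addr - addr % LINE
        if line in self.l2_lines:
            # Actions 4) and 5): request a castout; the CPU returns the line.
            # Assumption: the line is invalidated in the L2 afterwards.
            return "L2", self.l2_lines.pop(line)
        # Actions 6) and 7): the data is in main memory.
        return "DRAM", self.dram[line]
```

For example, with `CoherencyBlock({0: b"v0"}, {128: b"m1"})`, a read of address 5 is served from the L2 and a read of address 130 is served from DRAM.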

Note that the only GPU initiated transaction to which the CPU responds is a coherency
transaction. In 3) above the coherency block handles the procedural geometry FIFO data
differently. For the procedural geometry FIFOs, the CPU write-no-allocates these in a physical
address range that doesn't exist. When the GPU comes to read these vertices, the coherency
block in the GPU sees that the CPU has these addresses as dirty and issues a castout/invalidate.
The GPU now needs to know not to let that data go out to memory, but to take the return data
directly from the CPU.

For the writebacks described in 2) above, when the GPU determines that the CPU owns the
writeback address (owning meaning that the tag is valid in the CPU's L2), it generates a
castout/reload command that causes the writeback data to update this memory on the
CPU die. The castout/reload also causes the CPU's L1/L2 to refetch this data, which is
now read from this cacheable memory block rather than from main memory.

The reason for the GPU fetching vertex data from the CPU cache is that there is a very high
bandwidth of data (on the order of 16 GB/s) that would damage system performance if it were stored
to memory by the CPU and subsequently read by the vertex processor.

The reason for the GPU to write values of the FIFO tail pointers into memory mapped in the CPU is
that the algorithm requires the CPUC to know very quickly that the GPU is done processing a
certain block of data in the L2 and that the CPUC can reuse that block. The CPUC thread will be
spin waiting on the tail pointer, and if that data were in main memory there would be a latency issue
and an FSB bandwidth issue.
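
The block reuse rule described above can be sketched as a ring of fixed-size blocks with a CPU-owned head counter and a GPU-written tail counter. Everything below is an assumption made for illustration: the class name, the use of monotonically increasing block counters rather than addresses, and the method that stands in for the GPU's WRITEBACK of the tail pointer.

```python
class CpucFifo:
    """Toy model of one CPUC FIFO held in the L2."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.head = 0  # blocks written so far by the CPUC thread
        self.tail = 0  # blocks the GPU has finished with (set by writeback)

    def free_blocks(self):
        return self.num_blocks - (self.head - self.tail)

    def try_write_block(self):
        """Return the slot written, or None if the CPUC must keep spinning."""
        if self.free_blocks() == 0:
            return None
        slot = self.head % self.num_blocks
        self.head += 1
        return slot

    def gpu_writeback_tail(self, new_tail):
        # Stands in for the GPU's WRITEBACK of the FIFO tail pointer.
        self.tail = new_tail
```

With two blocks, the third write attempt returns None until the tail pointer advances, which is the condition the spin-waiting CPUC thread polls for.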

7.14.4 Requirements

• The output of the procedural geometry would be mostly inline tristrip flexible vertex format
data types or any format that is supported by streaming.

• The following could be a possible proposal for the format of inline vertex data.

All inline vertex data would be split into blocks, with each block beginning with a
single 32-bit DWORD header. The header would encode four possible instructions:

0 – This is the last vertex in the mesh.
1 – n inline vertices follow (where n is encoded in the instruction).
2 – Use the i'th previous vertex from the post-transform vertex cache (where i is
encoded in the instruction). The header for the next vertex immediately follows.
3 – NOP. The header for the next vertex immediately follows.

So, for example, if we had 4 inline 32-byte vertices followed by one re-used vertex, the
total size would be 4+4*32+4 = 136 bytes.
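
The size arithmetic in the example above can be checked with a small encoder and decoder. The bit layout is not defined by the proposal, so the one used here is purely an assumption: the instruction in bits 31:30 and the operand (n or i) in bits 29:0, stored big-endian. The function and constant names are likewise hypothetical.

```python
import struct

OP_LAST, OP_INLINE, OP_REUSE, OP_NOP = 0, 1, 2, 3
OPERAND_MASK = 0x3FFFFFFF  # assumed: low 30 bits carry n or i


def header(op, operand=0):
    """Pack one 32-bit DWORD header (assumed layout, big-endian)."""
    return struct.pack(">I", (op << 30) | (operand & OPERAND_MASK))


def encode_inline(vertices):
    """Header 1 followed by n inline vertices."""
    return header(OP_INLINE, len(vertices)) + b"".join(vertices)


def encode_reuse(i):
    """Header 2: reuse the i'th previous post-transform vertex."""
    return header(OP_REUSE, i)


def decode(stream, vertex_size):
    """Walk the headers in `stream`, skipping over inline vertex payloads."""
    names = {OP_LAST: "last", OP_INLINE: "inline", OP_REUSE: "reuse", OP_NOP: "nop"}
    pos, out = 0, []
    while pos < len(stream):
        (word,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        op, operand = word >> 30, word & OPERAND_MASK
        out.append((names[op], operand))
        if op == OP_INLINE:
            pos += operand * vertex_size
    return out


stream = encode_inline([bytes(32)] * 4) + encode_reuse(1)
```

Here `stream` is one header, four 32-byte vertices and one reuse header, matching the 136-byte total worked out above.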

7.14.5 Notes

GPU XPS Reads / Writes 1-13-04
PG reads as requested by the Memory Hub (MH) are processed by the BIU by issuing a 128 byte (aligned) read to the
FSB/CPU. This occurs as a request command on the FSB "transmit" interface. It is the only Read operation issued to
the FSB/CPU from the BIU.

The FSB/CPU responds to the BIU with a response command and 128 bytes of data on the FSB "receive" interface.

The BIU forwards the response data to the MH.

Request/responses are linked by a tag as the CPU does not guarantee ordering if multiple reads are outstanding.

No coherency operation (flush) is issued for the PG reads.
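
Because responses may return out of order, the BIU has to match each response to its request by tag. A minimal sketch of that bookkeeping follows; the class name, the tag allocation scheme (a simple incrementing counter) and the method names are assumptions, and only the 128-byte aligned read size comes from the note above.

```python
class PgReadTracker:
    """Toy model of tag-based matching for outstanding PG reads."""

    def __init__(self):
        self.next_tag = 0
        self.outstanding = {}  # tag -> 128-byte aligned address

    def issue(self, addr):
        if addr % 128 != 0:
            raise ValueError("PG reads are 128-byte aligned")
        tag = self.next_tag
        self.next_tag += 1
        self.outstanding[tag] = addr
        return tag

    def complete(self, tag, data):
        """Match a response to its request; return (address, data) for the MH."""
        if len(data) != 128:
            raise ValueError("responses carry 128 bytes of data")
        addr = self.outstanding.pop(tag)
        return addr, data
```

Two reads issued for addresses 0 and 128 can complete in either order; the tag recovers the address each response belongs to.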

7.15 System Debug Facilities (?)

7.15.1 Deadbox Recovery

7.15.2 Low Level Debug

Low level hardware interfaces for figuring out really really hard bugs, or for developing embedded ROM code, or
programming registers which are not accessible for security reasons (and how we close those holes later).
CPU
GPU
SMC
JTAG

7.15.3 Development Systems

How we program this thing

7.16 System Bandwidth / Latency Roll-Up (Nick)

7.17 Error Conditions

7.17.1 BSB

2-6-04 BSB Completions
Unsupported Request (UR) is generated when the GPU receives a request that it does not recognize. Badly formed or
corrupted packets always generate UR (bad CRC, header, etc), and any request received while the GPU’s
BUS_MASTER_ENABLE bit is cleared always generates UR. Other than that, UR is the ‘else’ in a big if;else if;else
if;else statement. It’s therefore easier to tell you what won’t generate a UR than what will.

In production mode, with GPU’s BUS_MASTER_ENABLE bit set:
- Memory writes to the interrupt register won’t generate a UR
- Memory reads and memory writes below top_of_memory won’t generate a UR

In prototype mode, with GPU’s BUS_MASTER_ENABLE bit set:
- Memory writes to the interrupt register won’t generate a UR
- Memory reads and memory writes below top_of_memory won’t generate a UR
- 4 byte memory reads and 4 byte memory writes to nb or gc MMIO won’t generate a UR
- non 4 byte memory reads and non-4 byte memory writes to nb or gc MMIO generate CA (completer abort)
- 4 byte type 0 configuration reads and 4 byte type 0 configuration writes to device 1 or 2, function 0, won’t
generate a UR
- non 4 byte type 0 configuration reads and non 4 byte type 0 configuration writes to device 1 or 2, function 0,
generate CA


I used the tables on page 365, 366 of PCI Express System Architecture by Mindshare, Inc to decide between UR and
CA for invalid requests. Specifically, requests that do not reference address space mapped within the device are UR, and
requests that violate programming rules for a device are CA. CA will only happen for the specific cases listed above in
prototype mode.
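
The "big if;else if;else" described above can be written out as a small decision function. This is a sketch of the listed rules only: the function name, the string values used for request kinds and targets, and the "OK" result (meaning neither UR nor CA is generated) are all hypothetical labels, and targets are modeled as symbolic names rather than real addresses.

```python
def bsb_completion(mode, bus_master_enable, well_formed, kind, target, size):
    """Return "UR", "CA" or "OK" for one request, following the rules above.

    mode:   "production" or "prototype"
    kind:   "mem_read", "mem_write", "cfg0_read" or "cfg0_write"
    target: "interrupt_register", "below_top_of_memory", "mmio"
            (nb or gc MMIO) or "config_dev1_2_fn0"
    size:   request size in bytes
    """
    if not well_formed or not bus_master_enable:
        return "UR"
    is_mem = kind in ("mem_read", "mem_write")
    if kind == "mem_write" and target == "interrupt_register":
        return "OK"
    if is_mem and target == "below_top_of_memory":
        return "OK"
    if mode == "prototype":
        if is_mem and target == "mmio":
            return "OK" if size == 4 else "CA"
        if kind in ("cfg0_read", "cfg0_write") and target == "config_dev1_2_fn0":
            return "OK" if size == 4 else "CA"
    return "UR"  # the final 'else'
```

For instance, a 4 byte MMIO read is "OK" in prototype mode but UR in production mode, and a 2 byte MMIO read in prototype mode is CA.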

7.18 Reliability

2-6-04 Memory Reliability
The quick answer is that a 256 Mbyte DRAM memory system will experience a single-bit soft error between 2 and 4 times
a year, which is why computer servers still implement ECC. I am sure GDDR manufacturers characterize their chips for
susceptibility to soft errors, so the data for new technologies should come from them (Michael?).

In general the sources for errors are: radioactive isotopes (in package and PCB materials), cosmic rays and UFOs.

8 Low Level Software Architecture (MarcW?)

8.1 Flash Resident Drivers

8.2 BIOS

8.3 Low Level Drivers

Memory

Audio

8.4 Network Stack

8.5 Procedural Geometry

8.6 Video Mode Selection

9 Other

9.1 Not Covered

This section states what is not covered in this document, but should be covered elsewhere:

• APIs

• Better-together

• Remoting devices

• Video-on PC

• Media device.

• Network security. All done in application layer. Require system to be secure.

• Peripherals (cameras, etc.)

• Performance abstraction for HDD.

• Where do I put my stuff without having to buy an MU.

• Mass-storage performance abstraction: say what should be included in software specification.
