ECE 259 / CPS 221

Advanced Computer Architecture II

Spring 2010
Professor Daniel J. Sorin


The objective of this course is to provide students with an understanding of parallel computer architectures.  Students will read research papers, lead in-class discussions of papers, perform a research project, and present their research projects in both written and oral formats.

The course focuses on both the design and evaluation of multiprocessor systems.  The main design themes of this course are: parallel programming, system organizations, shared memory multiprocessors, memory consistency models, interconnection networks, high-availability systems, interactions with current microprocessor and I/O technology, novel architectures, and emerging technologies.  The evaluation portion of this course will focus on metrics, modeling, simulation, and workloads for benchmarking.

Prerequisites: ECE 252, CPS 220, or consent of instructor.

Class Location and Hours


Class meets Monday/Wednesday/Friday from 10:20am to 11:10am.

Location: 212 Hudson Hall



Professor Daniel J. Sorin

Office: 209C Hudson Hall

Office Hours: Tuesday 12:30-1:30

This course has an OPTIONAL textbook for background material and reference; it is NOT required.  The emphasis of the class will be on discussions of research papers.

Optional Textbook: Parallel Computer Architecture: A Hardware/Software Approach, by David Culler and J.P. Singh

 Assignments and Grading

This is a graduate-level class that will not require "busy work."  This class will, however, require that students learn the reading material and learn how to present research in both written and oral formats (see Hill and Patterson for useful advice for presentations).  Communication is very important in this class.  Students who struggle with reading and writing are encouraged to take this course but should expect to work hard and to improve their communication skills in the process.

Students are responsible for:

The project is a semester-long assignment that should reflect the goal of being no more than "a stone's throw" away from a research paper.  As such, the project will require:

Deadlines will be enforced except under extreme circumstances.  I would prefer that you turn in something not quite done on the due date rather than wait until after the deadline to try to finish it.  Any project that is less than 24 hours late will lose 50% of its credit.  Any project that is more than 24 hours late will receive a zero.

Academic Misconduct: I will not tolerate academically dishonest work.  This includes cheating on the final exam and plagiarism on the project.  Be careful on the project to cite prior work and to give proper credit to others' research.

 Lecture Notes

I will post lecture notes (in PowerPoint format) shortly before I cover them in class.  

Segment 1: Introduction   (part 1  part 2)

Segment 2: Parallel Programming (part 1  part 2  part 3)

Segment 3: Shared Memory and Cache Coherence
               3.1: Snooping (part 1 part 2 part 3)
               3.2: Directories (part 1 part 2)
               3.3: Advanced topics: Token Coherence, COMA, etc. (part 1 of 1)

Segment 4: Memory Consistency and Synchronization Optimizations (part 1 part 2 part 3)

Segment 5: Interconnection Networks (part 1 part 2)

Segment 6: Evaluation (part 1 of 1)

Segment 7: Availability (part 1 of 1)

No slides for material past this point.

 Paper Presentation Notes

Amdahl's Law - Dan Sorin

Map/Reduce - JP Cafaro

Starfire - Blake Hechtman

Multicast Snooping - Dan Gaultney

DASH - Meng Zhang

AlphaServer GS320 - Zach Drillings

Token Coherence - Bryan Fleming

R-NUMA - Ralph Nathan

Wildfire - Alex Edelsburg

Virtual Hierarchies - Mohammed Mottaghi (note: new version of slides)

SC+ILP=RC - Jake Harer

SLE - Matthew Fulmer

LogTM - Roman Zhang

Alpha 21364 ICN - John Ingalls

Flattened Butterfly - Jun Pang

Cost-Effective Computing - Eric Wheeler

Simics - Mohammad Mottaghi

RAMP - Andrew First

Commercial Workloads - David Eitel

Simulating 2M Server - Deepak Srinivasan

IBM S/390 - Adam Jacobvitz

DVSC - Tom Marmaduke

SafetyNet - Akin Olugbade

Cray-1 - Eric Wheeler

Tarantula - Matthew Fullmer

Cell - John Ingalls

Larrabee - Vali Pistol

Dataflow - Blake Hechtman

Raw - Dan Gaultney

Anton - Alex Edelsburg

 Topics and Readings

Readings in italics are optional material.





Why Study Multiprocessors

parallelism, limits, Amdahl's Law

"Amdahl's Law in the Multicore Era"

"The Landscape of Parallel Computing Research: A View From Berkeley"

Parallel Programming

Programming Models

message passing, shared memory, performance and scaling

"Evaluating MapReduce for Multi-core and Multiprocessor Systems"

"The SPLASH-2 Programs: Characterization and Methodological Considerations"

"The PARSEC Benchmark Suite: Characterization and Architectural Implications"

Synchronization Basics

atomic operations, locks, barriers


Machine Organizations

System Organizations

SIMD: MMX, vectors, DSP



Cache-Coherent Shared Memory Multiprocessors

Shared Memory & Cache Coherence


Snooping Cache Coherence

"Starfire: Extending the SMP Envelope"

"Multicast Snooping: A New Coherence Method Using a Multicast Address Network"

"Timestamp Snooping: An Approach for Extending SMPs"

Directory Cache Coherence

"The Stanford DASH Multiprocessor"

"Architecture and Design of AlphaServer GS320"

"An Evaluation of Directory Schemes for Cache Coherence"

"Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing"

Advanced Topics in Coherence

token coherence, COMA, coherence domains

"Token Coherence: Decoupling Performance and Correctness"

"DDM--A Cache-Only Memory Architecture"

"Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA"  

"WildFire: A Scalable Path for SMPs"

"Virtual Hierarchies to Support Server Consolidation"

Memory Consistency Models

Memory Consistency Basics

"Shared Memory Consistency Models: A Tutorial"

"Specifying and Dynamically Verifying Address Translation-Aware Memory Consistency"

Consistency Optimizations

speculation, Scheurich's optimization

"Two Techniques to Enhance the Performance of Memory Consistency Models"

"Is SC + ILP = RC?"

Synchronization Optimizations

"Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution"

"Efficient Synchronization: Let Them Eat QOLB"

Transactional Memory

Hardware TM, TM Software

"LogTM: Log-based Transactional Memory"

"STAMP: Stanford Transactional Applications for Multi-Processing"

Interconnection Networks

Interconnection Network Basics

topology, routing, flow control

"The Alpha 21364 Network Architecture"

"Flattened Butterfly Topology for On-Chip Networks"

Deadlock Avoidance

virtual channels, turn model, hot-potato routing

"Virtual Channel Flow Control"

"A Survey of Wormhole Routing Techniques in Direct Networks" [includes "Turn Model" concept]

Evaluation Tools and Methodology

Evaluation: Metrics & Modeling

scalability, throughput, why not IPC?

mathematical modeling of performance

"Cost-Effective Parallel Computing"

"Analytic Evaluation of Shared-Memory Parallel Systems with ILP Processors"

Evaluation: Simulation

precision vs. performance

full-system, parallel host

"Simics: A Full System Simulation Platform"

"RAMP: A Research Accelerator for Multiple Processors"

"The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers"

Evaluation: Workloads

scientific vs. commercial, TLP, importance of benchmark selection

"Memory System Characterization of Commercial Workloads"

"Simulating a $2M Commercial Server on a $2K PC"

Reliability and Availability

Available Computers



"IBM S/390 Parallel Enterprise Server G5 Fault Tolerance: A Historical Perspective"

"Dynamic Verification of Sequential Consistency"

"Fault-Tolerant Systems in Commercial Applications" [survey of classic FT systems]

"SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery" 

Other Architectures




Vector Machines

"The Cray-1 Computer System"

"Tarantula: A Vector Extension to the Alpha Architecture"

"Introduction to the Cell Multiprocessor"


"Larrabee: A Many-Core x86 Architecture for Visual Computing"

Scalable, Non-Coherent Multiprocessors

message passing: Paragon, CM5, active messages

shared physical memory: Cray T3E

"The Network Architecture of the Connection Machine CM-5"

"Synchronization and Communication in the Cray T3E Multiprocessor"

"Active Messages: A Mechanism for Integrated Communication and Computation"


"Executing a Program on the MIT Tagged-Token Dataflow Architecture"

Tiled Architectures

"Baring It All to Software: Raw Machines"

Tilera Tile64 [website only, not a technical paper]


"Anton, A Special-Purpose Machine for Molecular Dynamics Simulation"

"Blue Gene: A Vision for Protein Science Using a Petaflop Supercomputer"

Interactions with Processors and I/O

Microarchitectural Effects

parallelism: ILP, MLP, TLP

"An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors"

I/O

"Making Network Interfaces Less Peripheral"


Quantum Computing

"A Practical Architecture for Reliable Quantum Computers"

Nanocomputing

"Circuit and System Architecture for DNA-Guided Self-Assembly of Nanoelectronics"