ECE 552 / CPS 550

Advanced Computer Architecture I

Fall 2015
Professor Daniel J. Sorin


The objective of this course is to learn the fundamental aspects of computer architecture design and analysis.
The course focuses on processor design, pipelining, superscalar and out-of-order execution, caches (memory hierarchies), virtual memory,
storage systems, and simulation techniques. Advanced topics include a survey of parallel architectures and future directions in computer architecture.
Prerequisites: ECE/CS 250 or consent of instructor
Class Location and Hours


Class meets Monday/Wednesday/Friday from 10:20am - 11:10am.

Location: Teer 203 (Monday and Friday); CIEMAS A-1464 (Wednesday)

 Instructor, Teaching Assistants, and News Group


Professor Daniel J. Sorin

Office: 209C Hudson Hall

Office Hours: TBD

Email: sorin AT ee DOT duke DOT edu 


Graduate Teaching Assistants:

Chaofan Chen: cfchen AT cs DOT duke DOT edu

Yijie Zhuang: yijie.zhuang AT duke DOT edu


Required Textbooks
Computer Architecture: A Quantitative Approach, 5th edition, by Hennessy and Patterson
A Primer on Memory Consistency and Cache Coherence, by Sorin, Hill, and Wood.  (free PDF download from Duke IP addresses)
 Assignments and Grading
This course requires readings from the textbooks and from selected research papers.  You will not be quizzed on the readings, but you
should still read them before class so that you can get the most out of each lecture.  And, to appeal to your practical side, all readings are
fair game for the exams.  Added bonus: you will be better at reading research papers at the end of this class than at the beginning.

Students are responsible for:

Note to Computer Science students: Qualifying grade is based only on the midterm and final.

Late policy for homework and project (except for dean's excuses):
        Homework: <1 day late = take earned score and divide by 2 -- this applies to the entire assignment (not per question)
                  >1 day late = 0
        Project: No late projects will be accepted!
Academic Misconduct: I will not tolerate academically dishonest work.  This includes cheating on the exams and plagiarism on the project.  
Be careful on the project to cite prior work and to give proper credit to others' research. 
Refer to the Duke Undergraduate Honor Code or to the instructor if you have any questions about misconduct.
 Topics, Lecture Notes, and Reading Assignments (still in flux!!)

I will post lecture notes (in PDF format) on Sakai shortly before I cover them in class.  Click on topic title for link to notes.

Readings in blue will be provided by the instructor (click on links below for PS or PDF).

Topic / Reading Assignments

Course Introduction & Computer Performance
        H/P Chapter 1
        "Instruction Sets and Beyond: Computers, Complexity, and Controversy"

Pipelined Processor Cores
        H/P Appendix C
        "The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays"

Superscalar (wide) Processor Cores
        H/P Chapter 3

Software/Static Exploitation of Instruction Level Parallelism
        H/P Chapter 3
        "EPIC: Explicitly Parallel Instruction Computing"

Hardware/Dynamic Exploitation of Instruction Level Parallelism
        H/P Chapter 3
        "The Microarchitecture of the Pentium 4 Processor"
        "Complexity-Effective Superscalar Processors"
        "Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors"

Exploiting Data-level Parallelism: SIMD, Vectors, and GPUs
        H/P Chapter 4
        "NVIDIA Tesla: A Unified Graphics and Computing Architecture"

Advanced Memory System Design
        H/P Chapter 2 (remedial material in Appendix B)
        "An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches"
        "The ZCache: Decoupling Ways and Associativity"
        "Exceeding the Dataflow Limit via Value Prediction"

Exploiting Thread-Level Parallelism: Multithreading, Multicore, and Multiprocessors
        H/P Sections 5.1 and 3.12
        "Power: A First Class Design Constraint"
        "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor"
        "Multiscalar Processors"
        "Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance"
        "Amdahl's Law in the Multicore Era"

Shared Memory, Memory Consistency, and Cache Coherence
        Primer on Consistency and Coherence: Chapters 1-8

Interconnection Networks
        TBD

Homework policy: Homework must be done individually. 

Homework assignments will be posted on Sakai.


The course project will be performed either individually or in groups of 2 or 3. 

Typical projects involve implementing and exploring a microarchitectural idea using a simulator such as SimpleScalar.  See Prof. Sorin for project guidelines and ideas.

Project proposals (2 pages max!!): Hardcopy due Weds, October 21 in class.  Proposals must contain the following information:

Project reports (15 pages max!!): Hardcopy due Friday, Dec 4 in class.  No exceptions!

 Schedule (tentative)

This is a tentative schedule which may change depending on time constraints and which days the instructor will be out of town.





Aug 24     Review Goals
Aug 31
Sept 7     Pipelining, Superscalar, Superscalar
Sept 14    Static ILP, Static ILP, Static ILP
Sept 21    Static ILP, Dynamic ILP, Dynamic ILP
Sept 28    Dynamic ILP, Dynamic ILP, Dynamic ILP
Oct 5      Dynamic ILP, Dynamic ILP, Dynamic ILP
Oct 12
Oct 19     Project Proposals Due
Oct 26     Memory Systems, Memory Systems, Memory Systems
Nov 2      Memory Systems, Memory Systems, TLP/Multithreading
Nov 9      TLP/Multithreading, Shared Memory, Consistency
Nov 16     Consistency, Consistency, Coherence
Nov 23     Coherence
Nov 30     Project Reports Due
Dec 7      --------  EXAM WEEK  --------