







ECE 152

# Still More Uses of Virtual Memory

- Inter-process communication
  - Map VPs in different processes to same PPs
- Direct memory access I/O
  - Think of I/O device as another process
  - Will talk more about I/O in a few lectures
- Protection
  - Piggy-back mechanism to implement page-level protection
  - Map VP to PP ... and RWX protection bits
  - Attempt to execute data, or attempt to write insn/read-only data?
    - Exception  $\rightarrow$  OS terminates program
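As a minimal sketch of page-level protection, the following Python models a PTE that carries RWX bits alongside the physical page number. The names `PTE`, `ProtectionFault`, and `translate`, and the two-page layout, are hypothetical; raising a Python exception only stands in for the hardware exception that hands control to the OS.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # 4KB pages (assumed, matching the lecture's example machine)


class ProtectionFault(Exception):
    """Stands in for the exception that the OS would handle."""


@dataclass
class PTE:
    ppn: int                  # physical page number
    readable: bool = True
    writable: bool = False
    executable: bool = False


def translate(page_table, va, access):
    """Translate va for an access of kind 'r', 'w', or 'x', checking the RWX bits."""
    vpn, offset = divmod(va, PAGE_SIZE)
    pte = page_table[vpn]
    allowed = {"r": pte.readable, "w": pte.writable, "x": pte.executable}[access]
    if not allowed:
        raise ProtectionFault(f"{access!r} access denied on VP {vpn}")
    return pte.ppn * PAGE_SIZE + offset


# Hypothetical layout: VP 0 holds instructions, VP 1 holds writable data.
page_table = {
    0: PTE(ppn=7, executable=True),
    1: PTE(ppn=3, writable=True),
}
```

With this layout, fetching from VP 0 and storing to VP 1 both translate normally, while a store to VP 0 or an instruction fetch from VP 1 raises `ProtectionFault`.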

© 2012 Daniel J. Sorin from Roth






- How big is a page table on the following machine?
  - 4B page table entries (PTEs)
  - 32-bit machine
  - 4KB pages
- Solution
  - 32-bit machine $\rightarrow$ 32-bit VA $\rightarrow$ 4GB virtual memory
  - 4GB virtual memory / 4KB page size $\rightarrow$ 1M VPs
  - 1M VPs \* 4B PTE $\rightarrow$ 4MB page table


- Page tables can get enormous
  - There are ways of making them smaller
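The arithmetic for the example machine (32-bit VA, 4KB pages, 4B PTEs) can be checked with a short sketch. The function name `flat_page_table_bytes` is hypothetical, and the model assumes a single-level table with one PTE for every virtual page.

```python
def flat_page_table_bytes(va_bits, page_bytes, pte_bytes):
    """Size of a single-level page table that covers the whole virtual address space."""
    num_virtual_pages = (1 << va_bits) // page_bytes
    return num_virtual_pages * pte_bytes


KB, MB = 1 << 10, 1 << 20
size = flat_page_table_bytes(va_bits=32, page_bytes=4 * KB, pte_bytes=4)
print(size // MB, "MB")  # prints "4 MB" for the machine above
```

Note that this cost is per process, since each process has its own page table.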


## Multi-Level Page Table

- One way: multi-level page tables
  - Tree of page tables
  - Lowest-level tables hold PTEs
  - Upper-level tables hold pointers to lower-level tables
  - Different parts of VPN used to index different levels
- Example: two-level page table for machine on last slide
  - Compute number of pages needed for lowest-level (PTEs)
    - 4KB pages / 4B PTEs  $\rightarrow$  1K PTEs fit on a single page
    - 1M PTEs / (1K PTEs/page)  $\rightarrow$  1K pages to hold PTEs
  - Compute number of pages needed for upper-level (pointers)
    - 1K lowest-level pages  $\rightarrow$  1K pointers
    - 1K pointers \* 4B per pointer (32-bit addresses)  $\rightarrow$  4KB  $\rightarrow$  1 upper-level page
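The two-level example can be sketched in Python as follows. The 10/10/12 bit split is an inference from the numbers above (1K entries per table means 10 index bits per level, and 4KB pages mean a 12-bit offset). The nested-dict representation and the names `split_va` and `walk` are hypothetical, chosen only to show how different parts of the VPN index different levels.

```python
PAGE_BYTES = 4096      # 4KB pages
PTE_BYTES = 4          # 4B PTEs
POINTER_BYTES = 4      # 32-bit pointers in the upper-level table
VA_BITS = 32

ptes_per_page = PAGE_BYTES // PTE_BYTES                   # 1K PTEs per page
total_ptes = (1 << VA_BITS) // PAGE_BYTES                 # 1M PTEs
lower_level_pages = total_ptes // ptes_per_page           # 1K pages of PTEs
upper_level_bytes = lower_level_pages * POINTER_BYTES     # 4KB of pointers
upper_level_pages = upper_level_bytes // PAGE_BYTES       # 1 page


def split_va(va):
    """Split a 32-bit VA into (upper index, lower index, page offset): 10 + 10 + 12 bits."""
    offset = va & 0xFFF
    lower_index = (va >> 12) & 0x3FF
    upper_index = (va >> 22) & 0x3FF
    return upper_index, lower_index, offset


def walk(root, va):
    """Look up va in a two-level table stored as nested dicts; None means unmapped."""
    upper_index, lower_index, offset = split_va(va)
    lower_table = root.get(upper_index)
    if lower_table is None:
        return None          # whole 4MB region unmapped, so no lower-level table exists
    ppn = lower_table.get(lower_index)
    if ppn is None:
        return None
    return (ppn << 12) | offset
```

The space saving comes from the `lower_table is None` case: a lowest-level table only needs to exist for regions of the address space that are actually in use.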






## Address Translation Mechanics

- The six questions
  - What? address translation
  - Why? compatibility, multi-programming, protection
  - How? page table
  - Who performs it?
  - When?
  - Where does page table reside?
- Option I: process (program) translates its own addresses
  - Page table resides in process-visible virtual address space
  - Bad idea: implies that program (and programmer)
    - must know about physical addresses
      - Isn't that what virtual memory is designed to avoid?
    - can forge physical addresses and mess with other programs
  - Translation on L2 miss or always? How would program know?

# Who? Where? When? Take II

- Option II: operating system (OS) translates for process
  - Page table resides in OS virtual address space
  - + User-level processes cannot view/modify their own tables
  - + User-level processes need not know about physical addresses
  - Translation on L2 miss
    - Otherwise, OS would need a SYSCALL before any fetch, load, or store
- L2 miss: interrupt transfers control to OS handler
  - Handler translates VA by accessing process's page table
  - Accesses memory using PA
  - Returns to user process when L2 fill completes
  - Still slow: added interrupt handler and PT lookup to memory access
  - What if PT lookup itself requires memory access? Head spinning...




## TB Misses

- TB miss: requested PTE not in TB, but in PT
  - Two ways of handling
- 1) OS routine: reads PT, loads entry into TB (e.g., Alpha)
  - Privileged instructions in ISA for accessing TB directly
  - Latency: one or two memory accesses + OS call
- 2) Hardware FSM: does same thing (e.g., IA-32)
  - Store PT root pointer in hardware register
  - Make PT root and 1st-level table pointers physical addresses
    - So FSM doesn't have to translate them
  - + Latency: saves cost of OS call
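Whether the refill is done by an OS routine or a hardware FSM, the sequence is the same: look in the TB, and on a miss read the PTE from the page table and load it into the TB. A minimal sketch, assuming a small fully-associative TB with LRU replacement and a flat VPN-to-PPN page table (the class `TranslationBuffer`, its capacity, and the sample mappings are all hypothetical):

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4KB pages (assumed)


class TranslationBuffer:
    """Tiny fully-associative TB with LRU replacement (capacity is an assumption)."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table      # maps VPN -> PPN; models the in-memory PT
        self.capacity = capacity
        self.entries = OrderedDict()      # cached VPN -> PPN translations
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)         # mark as most recently used
        else:
            self.misses += 1
            self._refill(vpn)
        return self.entries[vpn] * PAGE_SIZE + offset

    def _refill(self, vpn):
        """TB miss: read the PTE from the page table and load it into the TB."""
        ppn = self.page_table[vpn]                # a KeyError here would be a page fault
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict least recently used entry
        self.entries[vpn] = ppn


tb = TranslationBuffer(page_table={0: 9, 1: 4, 2: 6})
pa_first = tb.translate(0x1008)    # miss: refill from PT
pa_second = tb.translate(0x1010)   # hit: same page
```

The second access hits because it falls on the same virtual page, so only the first one pays for the page table lookup.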


## Nested TB Misses

- Nested TB miss: when OS handler itself has a TB miss
  - TB miss on handler instructions
  - TB miss on page table VAs
  - Not a problem for hardware FSM: no instructions, PAs in page table
- Handling is tricky for SW handler, but possible
  - First, save current TB miss info before accessing page table
    - So that nested TB miss info doesn't overwrite it
  - Second, lock nested miss entries into TB
    - Prevent TB conflicts that result in infinite loop
    - Another good reason to have a highly-associative TB


## Page Faults

- Page fault: PTE not in TB or in PT
  - Page is simply not in memory
  - Starts out as a TB miss, detected by OS handler/hardware FSM

- OS routine
  - OS software chooses a physical page to replace
    - "Working set": more refined software version of LRU
      - Tries to see which pages are actively being used
      - Balances needs of all current running applications
  - If dirty, write to disk (like dirty cache block with writeback \$)
  - Read missing page from disk (done by OS)
    - Takes so long (10ms), OS schedules another task
  - Treat like a normal TB miss from here
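The page fault steps can be sketched as a toy model in Python. Plain LRU stands in for the working-set policy here, a dict stands in for the disk, and the class `PagedMemory` and its counters are hypothetical; the sketch also ignores the task switch that a real OS would perform while waiting on the disk.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model: a fixed number of physical frames backed by a 'disk' dict."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                    # VPN -> page contents
        self.resident = OrderedDict()       # VPN -> [contents, dirty], in LRU order
        self.page_faults = 0
        self.writebacks = 0

    def _touch(self, vpn):
        if vpn not in self.resident:
            self._page_fault(vpn)
        self.resident.move_to_end(vpn)      # most recently used goes last
        return self.resident[vpn]

    def _page_fault(self, vpn):
        self.page_faults += 1
        if len(self.resident) >= self.num_frames:
            # Choose a physical page to replace (LRU victim is first in order).
            victim, (contents, dirty) = self.resident.popitem(last=False)
            if dirty:                       # write back only dirty pages
                self.disk[victim] = contents
                self.writebacks += 1
        self.resident[vpn] = [self.disk[vpn], False]   # read missing page from disk

    def read(self, vpn):
        return self._touch(vpn)[0]

    def write(self, vpn, contents):
        entry = self._touch(vpn)
        entry[0], entry[1] = contents, True


mem = PagedMemory(num_frames=2, disk={0: "a", 1: "b", 2: "c"})
mem.write(0, "A")   # fault on page 0, then dirty it
mem.read(1)         # fault on page 1
mem.read(2)         # fault on page 2: evicts page 0 (LRU) and writes it back
```

After these three accesses the model has taken three page faults and one writeback, and the disk copy of page 0 holds the modified contents.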














# Summary

- DRAM
  - Two-level addressing
  - Refresh, access time, cycle time
- Building a memory system
  - DRAM/bus bandwidth matching
- Memory organization
- Virtual memory
  - Page tables and address translation
  - Page faults and handling
  - Virtual, physical, and virtual-physical caches and TLBs
- Next part of course: I/O
