Width-Adaptive and Non-Uniform Access Asynchronous Register Files.


David Fang.
Cornell University.
January 2004. (completed Fall 2003)
[pdf 2MB] double-spaced, accepted by grad school
[pdf 2MB] single-spaced, bonus content!

Abstract

  At the heart of practically every modern microprocessor core sits some form of register file, whose purpose is to hold and supply intermediate results of computations to other computation units. As register files grow in size and in the number of ports to support increasing instruction-level parallelism (ILP), it becomes extremely difficult to meet timing requirements in clocked designs, and the energy consumed by accesses increases significantly. Asynchronous microprocessors share many of the same design issues; however, we have at our disposal a different family of techniques due to the robust and modular nature of self-timed design.
   Starting with a sequential specification of a typical asynchronous register file, we decompose the specification into fine-grain parallel processes for the core, bypass and control that implement the specified register file. To improve the throughput of the core, we vertically pipeline the read and write ports into smaller blocks of data, and we describe the locking mechanism that maintains pipelined mutual exclusion among reads and writes. Using standard handshaking expansion templates, we synthesize quasi-delay insensitive production rules that describe the circuits for the pipelined core ports. This initial design serves as the basis for comparison for the transformations presented in the remainder of the thesis.
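As a rough illustration of the starting point described above, here is a minimal behavioral sketch in Python of a sequential register file specification, plus a split of each word into smaller blocks. It is not the thesis's notation or circuits: the class and function names, the 32-register by 32-bit size, the 8-bit block width, and the reading of "smaller blocks of data" as bit slices of a word are all assumptions made for illustration.

```python
# Hypothetical behavioral model. Names and sizes are illustrative
# assumptions, not taken from the thesis.

class RegisterFile:
    """Sequential specification: serve one step's reads, then its write."""

    def __init__(self, num_regs=32, width=32):
        self.width = width
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def step(self, read_idx, write_idx=None, write_val=0):
        """Serve all read ports, then commit at most one write."""
        values = [self.regs[i] for i in read_idx]
        if write_idx is not None:
            self.regs[write_idx] = write_val & self.mask
        return values


def split_blocks(value, width=32, block_bits=8):
    """Split a word into block_bits-wide slices, least significant first."""
    block_mask = (1 << block_bits) - 1
    return [(value >> s) & block_mask for s in range(0, width, block_bits)]


def join_blocks(blocks, block_bits=8):
    """Reassemble a word from slices produced by split_blocks."""
    value = 0
    for i, b in enumerate(blocks):
        value |= b << (i * block_bits)
    return value


rf = RegisterFile()
rf.step(read_idx=[], write_idx=3, write_val=0xDEADBEEF)
(word,) = rf.step(read_idx=[3])
blocks = split_blocks(word)
```

In this sketch a read and a write to the same register within one step return the old value, which is one simple way to model reads and writes excluding each other; the actual locking mechanism in the thesis operates on pipelined blocks and is not modeled here.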


Log of Thesis Writing

alternate title: "Asynchronous Register Files for Dummies"
or "Everything You Wanted To Know About Asynchronous Register Files
... But Were Afraid To Ask"
or "188 Ways to Build An Asynchronous Register File"

Running time: 19 months (revised from 16, then 18)
Number of hours logged: (order-of-magnitude) 3e3

5/28/03 -- advisor requests first draft by D-Day 6/6
6/14/03 -- 90% draft handed to advisor
7/30/03 -- almost forgot this was on my web page...
8/15/03 -- so much for the August deadline *frown*
8/31/03 -- first batch of simulations completed (after 9 days)
9/1/03 -- data is piped straight into thesis via awk scripts
9/9/03 -- second batch of simulations completed (after 9 days)
9/12/03 -- confirmed scheduling of exam: Wed. 10/22, 4:30 pm, 310 Rhodes Hall
9/15/03 -- handed committee copy *thud* of near-final draft
9/18/03 -- started writing conference paper version
10/10/03 -- mostly done with conference paper (1st rev.)
10/17/03 -- scrambled to start making a g-zillion slides for defense talk
10/22/03 -- defended thesis before committee and CSL!
10/24/03 -- passed Q exam, *whew*
10/24/03 -- considering last minute architecture simulations for ISCA paper (4th miracle)
10/28/03 -- finished modifying cycle-accurate simulator to satisfaction to report register results with renaming
10/30/03 -- started to attempt to reproduce results from an old paper
11/1/03 -- aborted desperate attempt to crank out another paper
11/3/03 -- started to play around with ATOM, considered reverse-engineering
11/16/03 -- gave up reverse engineering, returned to thesis revisions
12/1/03 -- final copy approved by thesis advisor
12/3/03 -- copy center (funds -= $85), back-ache from carrying copies
12/4/03 -- bindery (funds -= $120), shoulder-ache from carrying bound copies
12/9/03 -- officially submitted to gradschool
12/10/03 -- got off my lazy behind and made this page

Grunt work:
In summary, production rules are finished (for the 20th time, formerly the 19th),
26 (versions) x 8 (optimization permutations) x 2 (reset conventions) are FINALIZED, I swear...
layout: 100 percent laid out (36 versions total), 
	and automatic array-and-place scripts have been written for them,
	LVSd block by block
Had much frustration wrestling with and working around tool funkinesses.  
To first order, completing all of them by hand would require 9.6307 years.
Want to graduate in fewer than 9 years, if that's not too unreasonable to ask.  

Writing:
Introduction: complete and revised (final)
Chapter 2: complete and revised (final)
Chapter 3: complete and revised (final)
Chapter 4: complete and revised, including results
Chapter 5: complete and revised, including results
Chapter 6: complete and revised (final)
Chapter 7: complete and revised (final)
Chapter 8: complete and revised, including results
Chapter 9: complete and revised, including results
Conclusion: completed
Appendices: done appending, includes automatic tables in App. I (eye)