OEMPCWorld.com recognized by BizRate Circle of Excellence 2009



RAM - Technical Perspective

 

The RAM Guide

When you pick up a SIMM you'll see a number of chips on either side of a flat board. These chips are called DRAM (Dynamic Random Access Memory). On non-parity SIMM's you'll find eight DRAM chips on one or both sides of the SIMM depending on the size:

A 4MB SIMM has 8 DRAM chips on one side.
An 8MB SIMM has 16 DRAM chips with 8 DRAM chips on each side.
A 16MB SIMM has 8 DRAM chips on one side.
A 32MB SIMM has 16 DRAM chips with 8 DRAM chips on each side.

Both 4MB and 16MB SIMM's look alike because they have 8 chips on one side. Likewise 8MB and 32MB SIMM's look alike because they have 8 chips on each side for a total of 16 chips. You can tell them apart by looking at the DRAM chips themselves.

Each DRAM chip is either 512KB (a 1Mx4), 2MB (a 4Mx4) or 8MB (a 16Mx4). 1Mx4's are used on 4MB and 8MB SIMM's. 4Mx4's are used on 16MB and 32MB SIMM's. 16Mx4's are used in 64MB and 128MB modules. 1Mx4 chips have four sets of five pins and 4Mx4 chips have four sets of six pins. So, for example, you can tell an 8MB SIMM from a 32MB SIMM by counting the pins: each chip on the 8MB SIMM will have four sets of 5 pins, while each chip on the 32MB SIMM will have four sets of 6 pins.

16Mx4's are available in a variety of configurations; the most common is a 54-pin TSOP package. 16Mx4's are used in 64MB and 128MB modules. Later this year 32MB (256Mb) chips will be available, allowing for 256MB and 512MB modules. By 2001 128MB (1Gb) chips will be available, allowing for 1GB and 2GB modules.

Types of EDO and FPM Chips
Size (MB) Size (Mb) Configuration Banks Pins
512KB 4Mb 1Mx4 1 4 sets of 5 pins SOJ
2MB 16Mb 4Mx4, 2Mx8, 1Mx16 1 Generally 4 sets of 6 or 7 pins SOJ
8MB 64Mb 16Mx4, 8Mx8, 4Mx16 1 4 sets of 8 pins SOJ or TSOP II


Types of SDRAM Chips
Size (MB) Size (Mb) Configuration Banks Config per Bank Pins
2MB 16Mb 4Mx4, 2Mx8 2 2Mx4, 1Mx8 44-pin TSOP
8MB 64Mb 16Mx4, 8Mx8, 4Mx16 4 4Mx4, 2Mx8, 1Mx16 54-pin TSOP
16MB 128Mb 32Mx4, 16Mx8, 8Mx16 4 8Mx4, 4Mx8, 2Mx16 54-pin TSOP
32MB 256Mb N/A N/A N/A N/A
128MB 1Gb N/A N/A N/A N/A

 

Bits and Bytes 

Remember that one bit represents one switch that is either on or off, and there are eight bits to one byte. That means that one byte can store two to the eighth power, or 256, different numbers, i.e. a binary number with eight digits can range from 0 to 255 and thus can store any one of 256 values. (Each of these values can be assigned a character such as a letter, a number or a punctuation mark. In this way one byte can store any one of 256 different characters. On PCs the first 128 of these follow the ASCII character set and the remainder are filled by extended character sets. So when you think of a byte, think of one letter.) Also remember that a set of 1024 bytes is called a kilobyte (KB) and 1024 kilobytes is called a megabyte (MB). Likewise, 1024 bits is called a kilobit (Kb) and 1024 kilobits is called a megabit (Mb).

Calculate the Size in MB's 

Now you can understand that a 1Mx4 is a DRAM chip that has one megabit (1Mb) times 4, or 4Mb. To calculate how many bytes this is, divide 4Mb by 8 bits per byte to get 512KB. That's why a 1Mx4 is 512KB, and the same logic can be used to calculate that a 4Mx4 is 2MB. So a 4MB SIMM has eight 1Mx4's, because 8 (chips) times 512KB (per chip) equals 4MB. Further, you can tell it's a 4MB SIMM because each chip has its pins in sets of five. Now you know how to tell the size of a non-parity SIMM. It is interesting to note that 4Mx4's cost the same to produce as 1Mx4's, with the caveat that you lose more of your final yield with 4Mx4's.
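If you want to check this arithmetic for other chips, the calculation can be sketched in a few lines of Python. The function names here are our own and purely illustrative; the only inputs are the depth (in mega-locations) and the width (in bits) from a chip label such as 1Mx4.

```python
def chip_capacity(depth_mega, width_bits):
    """Return (megabits, kilobytes) for a chip labelled <depth>Mx<width>."""
    megabits = depth_mega * width_bits
    kilobytes = megabits * 1024 // 8  # 1 Mb = 1024 Kb, 8 bits per byte
    return megabits, kilobytes


def simm_size_mb(chips, depth_mega, width_bits):
    """Size in MB of a non-parity SIMM built from identical chips."""
    _, kilobytes = chip_capacity(depth_mega, width_bits)
    return chips * kilobytes / 1024


print(chip_capacity(1, 4))      # a 1Mx4 chip
print(simm_size_mb(8, 1, 4))    # eight 1Mx4 chips
print(simm_size_mb(16, 4, 4))   # sixteen 4Mx4 chips
```

The three calls correspond to a single 1Mx4 chip, a 4MB SIMM and a 32MB SIMM as described above.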

True Parity, Logical Parity and ECC

But what about parity SIMM's? Parity SIMM's have one extra bit per byte. That's right, they have 9 bits per byte. There also used to be ECC SIMM's which had 10 bits per byte, but the new PC chip sets like the Triton-II (430HX) and the Natoma (440FX) will generate ECC with normal 9-bit parity SIMM's, so you don't have to worry about buying ECC memory anymore. The 9th bit of a parity SIMM stores what's called the XOR of all the other bits. This just means that if all eight bits added together equal an even number, the parity bit will store a zero. If all eight bits added together equal an odd number, the parity bit will store a one. This is done for every byte of memory. So say you stored the eight-bit number 01010101. The sum would be four (0+1+0+1+0+1+0+1), which is even, so the XOR value would be zero and zero would be stored in the parity bit. If bit number four then failed we'd have 0+1+0+?+0+1+0+1 with a stored XOR of zero. We can then add the remaining seven bits to get three, which gives an XOR of one, compare that to our stored XOR of zero, and see that the missing digit must be a one. In this way missing data can be reconstructed, provided we know which bit failed. On its own, a parity mismatch only tells us that some bit in the byte is wrong.

ECC actually adds one more bit of security. It allows the computer to correct one-bit errors and detect two-bit errors. How useful is this? Well, you'll get random lost bits from gamma rays and other radiation about once per 100 years. So this will correct errors once per one hundred years for the average user, which means it is really only useful for important servers. You'll get many more errors from software than hardware, but a good program should catch such errors with software traps.
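The parity arithmetic can be sketched in Python using the byte 01010101 from the example. This is only an illustration of the XOR idea, not how a memory controller is actually built; the function names are hypothetical, and the recovery step assumes the position of the failed bit is already known.

```python
def parity_bit(bits):
    """Even-parity bit: XOR of all bits (0 when the number of ones is even)."""
    p = 0
    for b in bits:
        p ^= b
    return p


def recover_missing(bits, missing_index, stored_parity):
    """Rebuild one bit whose position is known to have failed."""
    known = [b for i, b in enumerate(bits) if i != missing_index]
    # The lost bit is whatever value makes the overall parity match again.
    return parity_bit(known) ^ stored_parity


data = [0, 1, 0, 1, 0, 1, 0, 1]
stored = parity_bit(data)                  # parity written alongside the byte
print(stored)
print(recover_missing(data, 3, stored))    # index 3 is "bit number four"
```

Note that `recover_missing` never reads the failed bit itself; it only uses the seven surviving bits and the stored parity.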

A parity SIMM must store one more bit per byte. That means that for every eight 1Mx4 (512KB) DRAM chips you'll need one extra 1Mx4 DRAM chip on a parity SIMM. Likewise for every eight 4Mx4 (2MB) DRAM chips you'll need one extra 4Mx4 DRAM chip or four extra 1Mx4 DRAM chips. So on a parity SIMM you'll either see nine DRAM chips per side (all 1Mx4's or all 4Mx4's) or you'll see eight 4Mx4's per side and four 1Mx4's per side. That's how a parity SIMM is built. This is "True Parity". "Logical Parity" was invented by MA Labs and just adds logic that fools the computer into thinking it has parity SIMM's installed, but actually no parity checking is done at all. However, it allows you to boot a computer that requires parity with less expensive non-parity SIMM's.

Identifying part numbers 

Each SOJ or TSOP has a part number and Manufacturer marking on the top. You can use these data to look up information about your SIMM. The first step is identifying the manufacturer from the manufacturer code. We have listed several DRAM manufacturers and their codes in the table below. Once you have identified the manufacturer you'll need to visit their WWW site or call their fax back service to get a catalog that lists chip ID's. There are too many part numbers for us to collate an authoritative guide. So we'll leave you to do the leg work on chip ID. 

Identifying part numbers 
ID Marker  Manufacturer 
GM Goldstar 
HM Hitachi 
HY  Hyundai 
HYB  Siemens 
IT  Micron USA 
LGS Goldstar 
M5M Mitsubishi
MB Fujitsu 
MT  Micron Technology
NEC  NEC 
NPNx  NPNx 
OKI  Okidata 
SEC  Samsung Electronics Corp. 
TI  Texas Instruments 
USA  Micron USA 

TSOP's and SOJ's 

DRAM chips are packaged in two different formats called SOJ's (Small Outline J-Lead) or TSOP's (Thin Small Outline Package). By packaging we mean that the DRAM chips are fabricated into different shapes and sizes. Both SOJ's and TSOP's are surface mounted on one or both sides of a circuit board. The TSOP packages have a lower profile than SOJ packages, but SOJ packages cover less area.

For all practical purposes it doesn't matter to the end user if you have SOJ's or TSOP's. However, SOJ's are easier to use in the manufacturing process because they are three times thicker than TSOP's (3.0mm vs. 1.0mm).

There are two types of TSOP's. A type I TSOP is a low profile package (1.0mm high) with fine pitch leads which project from the narrower ends at a 0.5mm pitch. A type II TSOP is a low profile package (1.0mm high) with leads which project from the wider sides at a 1.27mm pitch. The main advantage of TSOP's over SOJ's is that TSOP's are thinner and can be used in low profile designs such as credit card memory.

There are also two other older packages called ZIP's and DIP's. You won't see these anymore but they are through hole technology and can only be mounted on one side of a circuit board. DIP's are like TSOP's in that they are lower profile but cover a greater area than ZIP's. 

Composite and Non-Composite ("True") 

Many larger 64MB and 128MB DIMM's and SIMM's are available in composite (C) and non-composite (T for "true") configurations. Non-composite SIMM's work with all systems that support 64MB or 128MB SIMM's or DIMM's. They are generally composed of 8 or 16 8MB chips. Composite SIMM's or DIMM's are made up of 32 or more less expensive 4Mx4 SOJ's or TSOP's. In effect composite SIMM's are like SIMM trees. Occasionally you'll even find composite 16MB and 32MB SIMM's that use 1Mx4 SOJ's or TSOP's on the used market. In general most SIMM's and DIMM's are non-composite unless otherwise noted.

The problem with composites is that they have more rows to access than normal non-composite SIMM's or DIMM's. This increases the capacitance and may decrease the signal speed (See the Buffered RAM and Voltage section). Many systems like Macintoshes do not support composite SIMM's because they do not have a strong enough load driver. Some Intel and AIR motherboards that use the AMI BIOS do support composite SIMM's and DIMM's. Composite SIMM's and DIMM's are cheaper alternatives if you're on a budget and you have a motherboard that supports them. But be careful buying composite SIMM's if you're not sure whether your motherboard will support them.

Buffered RAM and Voltage 

Recently larger 64MB and 128MB SIMM's and DIMM's have been introduced. These modules frequently stack multiple chips (loads) in what is known as a composite architecture (See the Composite section of this page). This increases the total load on the memory bus in the form of increased capacitance. This is a bad thing. If you recall, a capacitor in a DC circuit will charge like a battery when a voltage is applied across it and it will discharge a current when the voltage is removed. If a pulse of voltage is sent through a circuit with a capacitance, its signal will rise slowly and decay slowly. Capacitance in a circuit adds a delay in response. Signals, such as memory address lines, are sent encoded as a series of voltage pulses. Therefore, an increased capacitance has the effect of slowing the overall rate at which signals may be conducted.

The ability of the motherboard to overcome this effect is known as the load driving capacity of the board. This is generally not sufficient to drive larger modules with higher capacitance. Rather than redesign the motherboard and increase its load driving capacity, the industry has opted to modify the memory module by adding a line driver (buffer) such as a 74F244. This is a single chip on the memory module which is much smaller than a normal RAM chip. You can identify buffered modules by looking for this chip. The line driver handles all loading of the RAM chips and redirects all incoming and outgoing signals. It introduces some delay, but this is more than offset by the shorter signal rise and fall times. So the net effect is to increase the speed of memory access. In practical terms, using a line driver (buffer) is more important when you install more than one bank of very high density SIMM's or DIMM's.

An alternate method of dealing with the issue of increased capacitance is to merely lower the voltage of the signal sent to the memory module. It takes less time for a signal to rise and fall if the voltage is lower. If the voltage has to be increased from 0 to 5v to send a signal across a capacitor it will take a long time for the capacitor to charge and decay. Likewise if the signal is sent to a memory module with a higher capacitance it will take a long time for the signal to rise and fall. If the pulse is merely from 0 to 3.3v then it will take much less time for the signal to rise and fall. Therefore lowering the voltage of the memory module from 5v to 3.3v has the effect of speeding up the system, just like the case of adding a memory buffer. You can even combine lower voltage and a buffer to augment signal speed.
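To get a feel for how capacitance and voltage swing both affect signal timing, here is a rough Python sketch. It assumes an idealized driver that supplies a constant current, so the time to charge a load is simply t = C x V / I. Real drivers are not ideal current sources, and the capacitance and current figures below are made-up illustrative values, not measurements of any module.

```python
def swing_time_ns(capacitance_pf, swing_volts, drive_ma):
    """Time for a constant-current driver to move a capacitive load
    through a voltage swing: t = C * V / I.
    With C in pF, V in volts and I in mA, the result is in nanoseconds.
    """
    return capacitance_pf * swing_volts / drive_ma


# Hypothetical loads: a lightly loaded line vs. a heavily loaded composite module.
for label, cap_pf, volts in [
    ("light load, 5v", 50, 5.0),
    ("heavy load, 5v", 100, 5.0),
    ("heavy load, 3.3v", 100, 3.3),
]:
    print(label, swing_time_ns(cap_pf, volts, drive_ma=10))
```

Under this simple model, doubling the capacitance doubles the rise time, while dropping the swing from 5v to 3.3v claws back about a third of it.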

Older SIMM's use 5 volt technology. Newer DIMM's are available in both 3.3v and 5v formats. Some "mixed-mode" modules are actually 3.3v modules that can run at 5v. Many 3.3v DIMM's can actually run at 5v and therefore can be mixed with 5v DIMM's, but in practice it is not a good idea to mix 3.3v DIMM's with 5v SIMM's or DIMM's unless you're absolutely sure about the manufacturer's IC specifications. On newer motherboards that use both 5v SIMM's and DIMM's it is a good idea to use 5v DIMM's if you plan to use the SIMM socket. If you don't plan to use SIMM's at all you can use all 3.3v DIMM's. The choice of voltage you use is not dependent on the motherboard chip set.

SPD EPROM's on SDRAM

Some of the Pentium II motherboards using the 440LX PCIset require an EPROM on the SDRAM DIMM module. For example, motherboards from Intel and Super Micro require this EPROM. Boards from AIR and ASUS do not require an EPROM. However, all boards will accept DIMM's with EPROM's. If an EPROM is present and it is not required, it will be ignored by the motherboard BIOS. Pentium motherboards using the 430TX and 430VX chip sets do not require an EPROM. If it is present it will be ignored. It is always safe to buy DIMM's with EPROM's.

The EPROM is a small SPD (Serial Presence Detect) chip made by National Semiconductor. When present and supported, the EPROM will pass the specifications of the module to the motherboard BIOS so the BIOS can set up the timings properly. If your motherboard does not support the use of EPROM's you will have to set SDRAM settings by hand. This requires knowledge of the motherboard and SDRAM DIMM. You can start by setting the RAS to CAS and Prefetch timings both to 3T and MA to slow. On boards that support the EPROM you can leave settings to auto-detect.

The meaning of SIMM, DIMM and the number of pins 

Your processor communicates with the memory and add-on cards through a set of parallel channels in a bus. The number of bits in the bus is merely the number of channels present. For example, motherboards using the Triton-II (430HX) and the Natoma (440FX) have a 64-bit memory bus (connecting the processor to the cache and the cache to the memory) and a 32-bit PCI bus (connecting to the add-on cards). The wider the bus (i.e. the more bits), the more data can travel per cycle and thus the faster the data transfer. The peak data transfer rate is the width of the bus in bytes times the number of transfers per second. However, if you have a 64-bit bus you must access your memory 64 bits at a time. Each SIMM is only 32 bits wide, so you must install them in pairs to get the full 64 bits. The two SIMM's are then accessed as one logical bank of RAM. Older boards that accept a single SIMM have only a 32-bit memory bus, and this reduces speed by half. There is also a small advantage to using two banks, or four SIMM's, of the same size.
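The relationship between bus width, transfer rate and the number of modules per bank can be sketched as follows. The bus widths come from the text; the 66MHz and 33MHz clock figures are our assumptions for a typical board of this era, and the function names are illustrative only. Here MB/s means millions of bytes per second.

```python
def peak_bandwidth_mb_s(bus_width_bits, transfers_mhz):
    """Peak rate in millions of bytes per second: bytes per transfer
    times millions of transfers per second."""
    return bus_width_bits // 8 * transfers_mhz


def modules_per_bank(bus_width_bits, module_width_bits):
    """How many modules must be installed together to fill the bus."""
    return bus_width_bits // module_width_bits


print(peak_bandwidth_mb_s(64, 66))   # 64-bit memory bus, assumed 66MHz
print(peak_bandwidth_mb_s(32, 33))   # 32-bit PCI bus, assumed 33MHz
print(modules_per_bank(64, 32))      # 32-bit SIMM's on a 64-bit bus
print(modules_per_bank(64, 64))      # 64-bit DIMM's on a 64-bit bus
```

These are theoretical peaks; real throughput is lower because not every clock cycle carries data.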

SIMM's connect to the motherboard bus via 72 Gold or Tin (Palladium Nickel) leads. Indeed SIMM's are called Single In-line Memory Modules because they have 72 pins in a single row of in-line leads. Older SIMM's had 30 pins. Other memory modules such as DIMM's (Dual In-line Memory Modules) have all 64 bits (on 168 pins) in one set of leads, so you only need to install one DIMM per bank.

Nearly 99% of motherboards use Tin (Palladium Nickel), which can plate onto the noble Gold of SIMM's in a cathode-anode reaction. According to semiconductor corrosion specialists, the amount of corrosion possible cannot affect the proper function of the SIMM's because of the small contact surface. If it were a real problem, the finest SIMM makers like NEC would stop production of Gold SIMM's immediately. Indeed, Tin is a little harder to work with because it tarnishes faster. Most engineers who design SIMM's agree that gold is the best. Only Intel has claimed otherwise. Indeed, no such cases of corrosion have been reported.

2K and 4K Refresh, Columns and Rows

The Joint Electron Device Engineering Council (JEDEC) has two approved refresh types for 16Mb ICs, called 2K and 4K. In rare cases modules with 1K refresh are encountered. The performance of 1K, 2K and 4K ICs is similar in most cases, but the newer 4K 16Mb ICs use less power. 4K ICs draw about 20mA less current under worst-case conditions because they require fewer columns. However, 4K designs have half the page depth of 2K devices.

Common Types of EDO and FPM Chips
Size (MB) Size (Mb) Configuration Refresh tREF Banks Pins
512KB 4Mb 1Mx4 1K 16ms 1 4 sets of 5 pins SOJ
2MB 16Mb 4Mx4 2K or 4K 16ms 1 4 sets of 6 pins SOJ or TSOP
2MB 16Mb 2Mx8 2K 32ms 1 4 sets of 7 pins SOJ or TSOP
2MB 16Mb 1Mx16 1K 64ms 1 50-pin SOJ or TSOP
8MB 64Mb 16Mx4 4K or 8K 64ms 1 4 sets of 8 pins SOJ or TSOP II
8MB 64Mb 8Mx8 4K or 8K 64ms 1 4 sets of 8 pins SOJ or TSOP II
8MB 64Mb 4Mx16 4K 64ms 1 50-pin TSOP II

16Mb EDO ICs typically implement 2K technology:
Traditional 2K technology is implemented on most older 16Mb EDO RAM chips. In the 4Mx4 scheme bits of data are stored in a large array with 2,048 (2K) columns and 2,048 (2K) rows. Each bit in this array is uniquely identified by a pair of values called the column address and the row address. To address 2,048 columns there must be 11 column-address bits (2^11 = 2,048). Likewise there must be 11 row-address bits. Thus each bit is identified by a unique pair of coordinates (column_address, row_address). This allows for addressing of all 4,194,304 bits (2,048^2), or 4Mb. The address bits are generally denoted as A0-AXX where XX is the last bit.

A term called the page depth is used to denote the number of columns in a chip. Thus a chip with a 2K refresh has a 2K page depth. The cycle refresh is a term used to refer to the number of rows. Thus a chip with a 2K refresh has a 2K cycle refresh.

Our description of this array has only two dimensions so far: columns and rows. In the final IC four such arrays are stacked in a 3 dimensional data structure. A set of arrays is stacked like cards in a deck. The number of arrays in the stack is generally 4, 8 or 16. For this example four arrays will be used. We say that this data structure is four bits wide. The total number of bits in a chip then is 16,777,216 (4,194,304 bits per 2D array x 4 arrays). Thus each chip can store 16Mb or 2MB of data. Eight such chips are used in a 16MB module (8 x 16Mb chips = 16MB module) and 16 are used in 32MB modules (16 x 16Mb chips = 32MB module).

Periodically the charge stored in each bit must be refreshed or the charge will decay and the value of the bit of data will be lost. DRAM (Dynamic Random Access Memory) is really just a bunch of capacitors that can store energy in an array of bits. The array of bits can be accessed randomly. However, each capacitor can only store this energy for a short time before it discharges. Therefore DRAM must be refreshed (re-energizing the capacitors) every 15.6µs (a microsecond equals 10^-6 seconds) per row. Each time the capacitors are refreshed the memory is re-written. For this reason DRAM is also called volatile memory.

Using the RAS-ONLY refresh (ROR) method, the refresh is done in a systematic manner: each column is refreshed row by row in sequence. In a typical EDO module each row takes 15.6µs to refresh. Therefore in a 2K module the refresh time per column would be 15.6µs x 2,048 rows = 32ms (1 millisecond equals 10^-3 seconds). This value is called the tREF. It refers to the refresh interval of the entire array.
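The tREF arithmetic is easy to reproduce. The sketch below simply multiplies the number of rows by the per-row refresh interval of 15.6 microseconds used in this section; the function name is our own.

```python
def refresh_interval_ms(rows, per_row_us=15.6):
    """Refresh interval of the whole array (tREF) in milliseconds."""
    return rows * per_row_us / 1000  # microseconds to milliseconds


for rows in (2048, 4096):
    print(rows, "rows:", round(refresh_interval_ms(rows), 1), "ms")
```

The exact products come out just under 32ms and 64ms, which is why those round figures are the ones quoted for 2K and 4K parts.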

In some 16Mb ICs a 4K refresh scheme is utilized where the number of columns has been decreased. To decrease the number of columns and keep the total number of bits the same in the array, the number of rows must be increased. In this scheme bits of data are stored in a large array with 1Kb (1,024) columns and 4Kb (4,096) rows. To address 1,024 columns there must be 10 column-address bits ( 2^10 = 1,024). Likewise there must be 12 row-address bits ( 2^12 = 4,096).

Thus a chip with a 4K refresh has a page depth (number of columns) of 1K and a 4K cycle refresh (number of rows).
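The 2K and 4K layouts can be compared side by side with a short Python sketch. It derives the number of address bits from the row and column counts given above and confirms that both layouts hold the same number of bits. The function names and the dictionary keys are illustrative, not taken from any data sheet.

```python
def address_bits(count):
    """Address bits needed to select one of `count` rows or columns.
    `count` must be a power of two."""
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a positive power of two")
    return count.bit_length() - 1


def describe(rows, columns, width_bits=4):
    """Summarize a DRAM layout of `width_bits` stacked row-by-column arrays."""
    return {
        "row_bits": address_bits(rows),
        "column_bits": address_bits(columns),
        "bits_per_array": rows * columns,
        "total_bits": rows * columns * width_bits,
    }


print("2K refresh:", describe(2048, 2048))
print("4K refresh:", describe(4096, 1024))
```

Both layouts use 22 address bits in total; the 4K scheme just moves one bit from the column address to the row address.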

64Mb EDO ICs typically implement 4K technology:
In 64MB and 128MB modules 64Mb chips are used. Like 16Mb EDO ICs, 16MX4 64Mb chips are based on a three dimensional array. However in 16MX4 chips a common configuration features 4,096 columns by 4,096 rows by 4 bits. This yields 67,108,864 bits or 64Mbs or 8MBs per chip.

64Mb EDO ICs use CBR:
In 64Mb ICs with 8K refresh, the ROR method would require a 7.8µs per row refresh interval. Instead, in 64Mb EDO ICs an internal double-row refresh is implemented with CAS-BEFORE-RAS (CBR) cycles. Using this method a 15.6µs per row refresh interval can be maintained with 8K rows. Future 256Mb chips promise 7.8µs or even 3.9µs per row refresh rates.

Types of SDRAM Chips
Size (MB) Size (Mb) Configuration Banks Config per Bank Pins
2MB 16Mb 4Mx4, 2Mx8 2 2Mx4, 1Mx8 44-pin TSOP
8MB 64Mb 16Mx4, 8Mx8, 4Mx16 4 4Mx4, 2Mx8, 1Mx16 54-pin TSOP
16MB 128Mb 32Mx4, 16Mx8, 8Mx16 4 8Mx4, 4Mx8, 2Mx16 54-pin TSOP
32MB 256Mb N/A N/A N/A N/A
128MB 1Gb N/A N/A N/A N/A

64Mb and 128Mb SDRAM ICs are typical implementations of 4K technology using four banks:
In 64Mb ICs four 3D arrays may be organized into four separate banks (quad-bank). Each bank is addressed by the values of a pair of bank bits. These are usually denoted BS0 and BS1. Banks may be interleaved as discussed in the next section. A bank typically stores 16,777,216 bits. The total number of bits in a chip then is 67,108,864 (16,777,216 bits per bank x 4 banks). Thus each chip can store 64Mbs or 8MBs of data. Eight such chips are used in a 64MB module and 16 are used in 128MB modules.

A 64Mb 4K chip may be organized in a variety of ways. A typical chip will have four banks. Each bank in turn may include a 4Mb array that is 4 bits wide. The 4Mb array will have 4K rows and 1K columns. Such a chip would be called a 16Mx4. Note that the 16M refers to the 4Mb array times four banks, while the x4 refers to the number of bits per bank.

Alternately a bank may have a 2Mb array that is 8 bits wide. The 2Mb array will have 4K rows and 512 columns. Such a chip would be called an 8Mx8. The 8M refers to the 2Mb array times four banks. The x8 refers to the number of bits per bank.

Finally a bank may have a 1Mb array that is 16 bits wide. The 1Mb array will have 4K rows and 256 columns. Such a chip would be called a 4Mx16. The 4M refers to the 1Mb array times four banks. The x16 refers to the number of bits per bank.
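The three organizations above all describe the same 64Mb of storage. A small Python sketch makes this explicit by building the chip name from the rows, columns, width and bank count. The function name is hypothetical and the code only restates the arithmetic in the text.

```python
MEGA = 1024 * 1024


def sdram_organisation(rows, columns, width_bits, banks=4):
    """Return (chip name, total megabits) for a multi-bank SDRAM chip."""
    depth_per_bank = rows * columns        # addressable locations in one bank
    depth_total = depth_per_bank * banks   # the "16M" in 16Mx4
    total_megabits = depth_total * width_bits // MEGA
    return f"{depth_total // MEGA}Mx{width_bits}", total_megabits


for columns, width in [(1024, 4), (512, 8), (256, 16)]:
    print(sdram_organisation(4096, columns, width))
```

Each time the width doubles the number of columns halves, so the total stays at 64Mb.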

In a typical SDRAM module each row takes 15.6µs (1 microsecond equals 10^-6 seconds) to refresh. Therefore in a 4K module the refresh time per column would be 15.6µs x 4,096 rows = 64ms (1 millisecond equals 10^-3 seconds). Thus a 4K module typically has a tREF of 64ms.

Refresh periods are alternated with periods in which the data can be accessed, in a cycle called the RAS Cycle (tRC). The RAS Cycle has two periods. The first period is called the tRAS. During tRAS a bank may be marked as ACTIVE. After being marked as ACTIVE the data in this bank can be extracted or written with a READ or WRITE command. After the data, if any, is out, the bank then enters a RAS Precharge Cycle (tRP). During tRP the bits are refreshed. The refresh can be executed automatically after a READ or WRITE command, saving time, if AUTO REFRESH or AUTO PRECHARGE is enabled. The latency of tRP is frequently masked by memory interleaving of banks as described in the next section. In this case one bank is in active mode (tRAS) while an alternate bank is in refresh mode (tRP). Two banks may alternate, or several banks may take turns, so that the effect of tRP is insignificant to the total data output on the bus. tRP has the least significant effect when long reads or writes are conducted. It can have a critical impact with short reads or writes where row changes are common, such as in the case of matrix math. In this case SDRAM tRP timings become critical.

Refresh Terminology
Symbol Name Meaning
tRP RAS Precharge PRECHARGE (RAS) to ACTIVATE (RAS) Command Period (analogous to the CAS# before RAS# (tCBR) for EDO)
tRC RAS Cycle Refresh/ACTIVATE to Refresh/ACTIVATE Command Period (tRC = tRAS + tRP)
tRAS RAS Clock ACTIVATE (RAS) to PRECHARGE (RAS) Command Period 
tREF Refresh  Refresh Period

Memory Interleaving 

The demands of new higher speed processors and parallel processing have made memory throughput (bandwidth) a bottleneck in modern computer systems. Memory interleaving is a technique implemented to increase the maximal throughput of data a memory system can provide per unit time. However, memory interleaving does not affect memory latency as discussed in the cache section.

Memory interleaving is implemented by dividing the memory system into a number of independent banks which can answer read or write requests independently, in parallel. For example the Intel Orion 450GX chip set (discontinued) for the Intel Pentium Pro processor used four-way interleaved memory by dividing its memory into four banks. The most extreme interleaved design is used in current SMP vector supercomputers (Cray) which may have up to 256-way interleaved memory banks!

In a typical four-way interleaved memory system the SIMM's are divided logically into four banks. When lines of data are written to the memory, four lines of data may be written simultaneously because each line can be written to a separate bank, in parallel. In contrast, in a non-interleaved system only one line may be written to memory in the same amount of time. Therefore four-way interleaved memory can read and write data four times faster than non-interleaved memory at its maximal rate. Imagine what a 256-way interleaved memory supercomputer can do! You may want to note that this technique of interleaving is analogous to disk striping in a RAID system (see our hard drive technical section) and increases throughput in the same manner.

To obtain the maximal throughput of interleaved memory the data must be prefetched. Prefetch techniques are automatically utilized by pipelined and superscalar CPU's. The use of prefetching is particularly important for iterative loops like "for (i=1; i<n; i++) { ... a[i] ... }", where the CPU must prefetch elements of a[i] from memory before they are actually called for. Problems are also encountered in matrix and vector math when an operation requires access to data in a sequence that is some multiple of the interleaving. In these cases all the data needed is in the same memory bank. However, modern compilers, especially Fortran compilers, will address such issues for the programmer.
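The stride problem is easiest to see with a toy model. The Python sketch below assumes the simplest possible mapping, in which consecutive lines of memory go to consecutive banks (line number modulo the number of banks). Real chip sets may map addresses differently, so treat the mapping and the function names as assumptions made for illustration.

```python
BANKS = 4


def bank_of(line_index, banks=BANKS):
    """Bank that holds a given line under simple low-order interleaving."""
    return line_index % banks


def banks_touched(start, stride, count, banks=BANKS):
    """Distinct banks hit by `count` accesses spaced `stride` lines apart."""
    return sorted({bank_of(start + i * stride, banks) for i in range(count)})


print(banks_touched(0, 1, 8))   # sequential access
print(banks_touched(0, 2, 8))   # every second line
print(banks_touched(0, 4, 8))   # stride equal to the interleave factor
```

Sequential access spreads the load over all four banks, while a stride of four lands every access on the same bank and loses the benefit of interleaving.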

Most PC chip sets like the Intel Triton-II (430HX and 430VX) and the Intel Natoma (440FX) do not support memory interleaving because Intel does not believe PC users saturate their memory bus. However, many number crunching applications do saturate the memory bus. Increasingly Multi-processor PC's are being used for such applications and we hope to see future Intel Chip sets support memory Interleaving. We hope you send your thoughts on this to Intel. (Hint). 

PC100 SDRAM Latency 

The Clock Cycle Time:

A computer sequences its tasks based on an oscillator which is basically a clock. The faster the clock runs, the faster the tasks get done. The speed of the clock is measured by its frequency in MHz (millions of cycles per second). Computers based on the 440BX, 440GX and 440NX PCIsets have a bus speed of 100MHz. This means the bus oscillates 100,000,000 times per second. The duration of a single oscillation is known as the Clock Cycle Time (tCK). Clock cycle times are measured in nanoseconds. 1 nanosecond (ns) = 10^-9 or 0.000000001 seconds, or one billionth of a second. tCK is given by the inverse of the frequency (tCK = 1/frequency). A 100MHz oscillator gives a 10ns clock cycle time.

Therefore, the Intel PC100 Specification requires a maximum value of 10ns for tCK. A Clock Cycle Time of 8ns is currently the fastest available. 8ns modules do not necessarily perform faster than 10ns modules on a 100MHz bus. Performance is based on a number of factors that will be described below. However, 8ns modules are capable of running on a 125MHz bus and are therefore of interest to overclockers.
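The conversion between bus frequency and clock cycle time works in both directions, as this short Python sketch shows. The function names are our own; the only formula used is tCK = 1/frequency.

```python
def clock_cycle_ns(frequency_mhz):
    """Clock cycle time (tCK) in nanoseconds for a frequency in MHz."""
    return 1000.0 / frequency_mhz


def max_frequency_mhz(tck_ns):
    """Highest bus frequency in MHz that a module with a given tCK can follow."""
    return 1000.0 / tck_ns


print(clock_cycle_ns(100))     # 100MHz bus
print(max_frequency_mhz(10))   # 10ns module
print(max_frequency_mhz(8))    # 8ns module
```

The factor of 1000 appears because one cycle at 1MHz lasts 1000ns.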

Programming SDRAM modules with SPD EPROM's, BIOS settings and 3-2-2 notation:

SDRAM ICs are programmable. Values for programmable parameters are stored in the IC's Mode Register at boot time. The register is set at power-up and remains set until the system is powered off. Programmable parameters include the Burst Mode, the CAS Latency (CL), the RAS Precharge Latency (tRP) and the RAS to CAS delay (tRCD). Values for these attributes are read from the SPD EPROM on the module (See the SPD EPROM section) by the BIOS. The BIOS may use the SPD values, or the values may be overridden depending on the settings in your CMOS setup program. The final values are recorded in the 440BX registers. In a section below we document how to read these values from the 440BX registers.

In most cases these values must be overridden because the values in the SPD EPROM are incorrect. This is because the value of many parameters is dependent on bus speed. Since PC100 SDRAM may be used on boards with a 66MHz bus as well as boards with a 100MHz bus, the values stored in the SPD will frequently be set for the wrong bus speed.

In the Award BIOS you can find these settings in the "Chipset Features Setup." Change the "SDRAM Configuration" value from "By SPD" to "Disable." You can then set the values of "SDRAM CAS Latency," "SDRAM RAS to CAS Delay" and "SDRAM RAS Precharge" to the values of CL, tRCD and tRP respectively. In the CMOS setup these values are recorded as numbers of clock ticks, denoted by 2T or 3T.

These three values, CL, tRCD and tRP, are often quoted with the notation 3-2-2, referring to CL-tRCD-tRP. This should not be confused with burst timing notation as described elsewhere on this page. A value of 3-2-2 is optimal for current PC100 SDRAM on the market. 2-2-2 PC100 SDRAM has only been produced in low yields, as described below. Module tolerances of 3-2-2 and 2-2-2 are most common. However, the Intel PC100 specification allows for a variety of tolerances ranging from 3-3-3 to 2-2-2.

Frequently a fourth parameter is referred to as the "DRAM Idle timer" in the BIOS. This is also known as the RAS cycle time (tRC). In the PC100 SDRAM specification this has a maximal value of 8. In an alternate notation 3-2-2 modules with a tRC of 8 are referred to as 3/2/2/8. The minimum standard for PC100 SDRAM is 3/3/3/8.

The operation of SDRAM in relation to latency:

We have already covered topics relating to refresh and latency above. We have defined the RAS Cycle (tRC), the RAS Precharge Cycle (tRP), tRAS, AUTO REFRESH and AUTO PRECHARGE as well as the impact of continuous reads or writes versus short reads and writes on these parameters. Now we can look more closely at the process of activating a bank and reading or writing data to that bank.

As discussed previously, commands and data addresses control access to banks of data. Commands are issued by holding the voltage low or high across several leads. Four very important leads are CS#, RAS#, CAS# and WE#. When CS# is high the other three leads are ignored. When CS# is held low the voltages across RAS#, CAS# and WE# determine a set of commands used to manipulate the bank in question. 

Commands issued for values of RAS#, CAS# and WE# when CS# is Low
RAS# CAS# WE# Meaning
LOW HIGH HIGH Bank Activate
LOW HIGH LOW Bank Precharge
HIGH LOW HIGH READ
HIGH LOW LOW WRITE
HIGH HIGH X Power Down
LOW LOW HIGH Self Refresh
HIGH HIGH LOW BURST STOP
HIGH HIGH HIGH NOP (No Operation) similar to CS# High
LOW LOW LOW Mode Register Set

To access data an address must be set up before the clock by about 1 clock cycle (tAS). Then the ACTIVATE (RAS) command must be executed on an idle bank to put it in an active mode. A READ or WRITE (CAS) command is issued after the ACTIVATE (RAS) command. The period between the ACTIVATE command and the subsequent READ or WRITE command is the RAS to CAS delay (tRCD).

Data is read out sequentially, synchronized with the positive edge of CLK, in bursts according to the burst mode set in the Mode Register. The burst mode may be set to 2, 4 or 8 words or even to a full page (the entire column). The initial data from the READ command is available (data out) after a period equal to the CAS Latency (CL) as set in the Mode Register.

Therefore the total access time to the first data word is generally equal to tAS + tRCD + CL. This may be something like 1+2+3 = 6. Subsequent data is bursted out synchronous with the clock at one clock per word for the duration of the burst mode.
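The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and default values are our own assumptions, with the defaults taken from the 1+2+3 example.

```python
def first_word_latency(t_as=1, t_rcd=2, cl=3):
    """Clocks from address setup to the first data word (tAS + tRCD + CL)."""
    return t_as + t_rcd + cl


def burst_read_clocks(burst_length, t_as=1, t_rcd=2, cl=3):
    """Clocks to read a whole burst: the first word pays the full latency,
    and each later word arrives one clock after the previous one."""
    return first_word_latency(t_as, t_rcd, cl) + (burst_length - 1)


print("first word:", first_word_latency(), "clocks")
print("burst of 4:", burst_read_clocks(4), "clocks")
```

With the default values the first word takes 6 clocks, and a burst of four words completes three clocks later.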

As noted above the PC100 specification requires a minimum standard of 3/3/3/8. The specification also requires a 6ns or less Access Time from Clock (tAC 6ns). tAC is basically the time in nanoseconds for data to be read.

Most PC100 modules have a CL of 3. The Samsung -GH and Micron -8D and -8E have a CL of 2. Modules with a CL of 2 are difficult to produce in ample yields and are therefore much more expensive than modules with a CL of 3. The lower CL increases total SDRAM performance only marginally (~5%), so these modules are considered to offer too little bang for the buck. However, CL 2 modules are of some interest to those who overclock their systems to bus speeds of 112MHz. We do not suggest overclocking the bus speed. However, we do carry the faster modules for those who are interested in them.

The values listed in specifications may be confusing because they are alternately listed in clock ticks or nanoseconds. For example, the RAS to CAS delay, tRCD, is the number of clock cycles allowed between the ACTIVATE (RAS) command and the READ (CAS) command. The value of tRCD is generally 20ns or 30ns (2 or 3 clocks at 100MHz). To convert, divide the latency in nanoseconds by the clock period in nanoseconds (10ns at 100MHz) and round up to the nearest whole clock. For example a tRCD of 20ns yields a latency of 2 clock cycles (20ns / 10ns). The RAS Precharge (tRP) is typically 20ns or 30ns. (tRP is analogous to the CAS# before RAS# (tCBR) described above for EDO RAM).

We have found that the values set in the SPD are frequently incorrect. Indeed, it is not possible to set these values correctly for all bus speeds, so they should be overridden. Let's look at CAS latency (CL) as an example. Recall that CL is the time from the READ command to the first data out. If a module is rated as 3/2/2/8 (which most are), to what value should the SPD CL be set? Suppose the module is used on a 440BX board with a 333MHz CPU. The bus speed would be 66MHz, which has a 15ns tCK. A 3/2/2/8 module has a CL of 30ns. Therefore on a 66MHz bus the CL should be set to 2 (30ns/15ns) in the SPD. Suppose the same module is used on the same 440BX board with a 400MHz CPU. The bus speed would be 100MHz, which has a 10ns tCK. The module's CL is still 30ns, so on a 100MHz bus the CL should be set to 3 (30ns/10ns). To which value should this module's SPD be programmed, 2 or 3? There is no correct answer. It doesn't appear that Intel took this into account when writing the specifications of the SPD. In actual practice you'll find SPD set to very odd values indeed. Therefore, override these values in your BIOS and set them up by hand the old fashioned way. Generally you'll set these values to 3/2/2/8.
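The nanosecond-to-clock conversion behind this example can be sketched as follows. The helper name is our own, and the clock periods are the rounded 15ns and 10ns figures quoted above.

```python
import math


def ns_to_clocks(latency_ns, clock_period_ns):
    """Round a latency in nanoseconds up to a whole number of clocks."""
    return math.ceil(latency_ns / clock_period_ns)


# Clock periods quoted in the text: 15ns at 66MHz and 10ns at 100MHz.
for bus_mhz, t_ck_ns in ((66, 15), (100, 10)):
    cl = ns_to_clocks(30, t_ck_ns)  # the 30ns CAS latency from the example
    print(f"{bus_mhz}MHz bus: CL = {cl} clocks")
```

The same 30ns part comes out as CL 2 on one bus and CL 3 on the other, which is why a single SPD value cannot be right for both.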

What happens if you don't set the module parameters correctly? If you set the parameters too conservatively your system may perform about 20% too slow, but no other harm will be done. If you set your module parameters too fast, you will not damage the module or the motherboard. However, you will end up with data corruption. This corruption may be insidious and difficult to detect at first, but eventually it will catch up to you. To test for data corruption try unpacking large software packages like the entire X Windows system and running checksums. This seems to be a fairly good test for this type of data corruption.
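One simple way to run such a checksum test is to hash the same large file several times and see whether the result ever changes. The sketch below is an assumption on our part: the text only says "checksums", and the use of SHA-256, the function names and the number of passes are all illustrative.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests_are_stable(path, passes=3):
    """Hash the same file several times; differing digests suggest corruption."""
    return len({file_digest(path) for _ in range(passes)}) == 1
```

Call digests_are_stable() on any large file you have unpacked (the path is yours to choose). A result of False means the same data hashed differently between passes, which points to a memory or timing problem. A result of True does not prove the settings are safe.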

Latency and Timing Terminology
Symbol Name Meaning
tCK Clock Cycle Time The Period of one Clock Cycle (CLK)
tCH CLK High Level Width The Pulse Width of CLK measured from the positive (leading) edge to the negative (trailing) edge of the wave form. (tCH = tCK - tCMS)
tCL CLK Low Level Width The Pulse Width of CLK measured from the negative edge to the positive edge of the wave form. (tCL = tCK + tCMH)
tAC Access Time DATA Access Time from Clock 
CL CAS Read Latency READ/WRITE (CAS) Command to Data Out Period
tRCD RAS to CAS Delay ACTIVATE (RAS) to READ/WRITE (CAS) Command Period
tRP RAS Precharge PRECHARGE (RAS) to ACTIVATE (RAS) Command Period (analogous to the CAS# before RAS# (tCBR) for EDO)
tRC RAS Cycle Refresh/ACTIVATE to Refresh/ACTIVATE Command Period (tRC = tRAS + tRP)
tRAS RAS Clock ACTIVATE (RAS) to PRECHARGE (RAS) Command Period 
tCCD CAS to CAS Delay READ/WRITE(CAS a) to READ/WRITE(CAS b) Command Period
tRRD RAS to RAS Delay When interleaving two banks this is the period between the ACTIVATE command on bank A and the subsequent ACTIVATE command on bank B. tRP is thus hidden.
tWR WRITE WRITE (CAS) Recovery Period 
tOH Output Data Hold A holding period waited while valid data is output.
tHZ Output Data High Impedance tOH plus a residual hold time required due to impedance of the circuit.
tLZ Output Data Low Impedance Output Data Low Impedance Period
tDS Data-in Setup Data-in Setup Period 
tDH Data-in Hold Data-in Hold Period 
tAS Address Setup Address Setup Period 
tAH Address Hold Address Hold Period 
tCMS Command Setup Command Setup Period 
tCMH Command Hold Command Hold Period 
tCKS Clock Setup Clock Setup Period 
tCKH Clock Hold Clock Hold Period 
tREF Refresh  Refresh Period 
tRSC Mode Register Set Cycle  Mode Register Set Cycle Period issued during startup to program CAS Latency, Addressing Mode and Burst Length

Reading your register settings under Linux:

Register values for the 440BX PCIset are listed in the 440BX Specification in section 3.3. Section 3.3.24 lists the values for tCL, tRCD and tRP. A 16-bit register of Device 0, at address offset 76h-77h, defines these values. The default value is 00h. The three least significant bits are defined such that:

Bit 0:
SDRAM RAS# Precharge (tRP). This bit controls the number of DCLKs for RAS# Precharge.
0 = 3 clocks of RAS# Precharge.
1 = 2 clocks of RAS# Precharge.

Bit 1:
SDRAM RAS# to CAS# Delay (tRCD). This bit controls the number of DCLKs from a Row Activate command to a read or write command.
0 = 3 clocks will be inserted between a row activate command and either a read or write command.
1 = 2 clocks will be inserted between a row activate and either a read or write command.

Bit 2:
CAS# Latency (CL). This bit controls the number of CLKs between when a read command is sampled by the SDRAMs and when the 82443BX samples read data from the SDRAMs. If a given row is populated with a registered SDRAM DIMM, an extra clock is inserted between the read command and when the 82443BX samples read data. For a registered DIMM with CL=2, this bit should be set to 1.
0 = 3 DCLK CAS# latency.
1 = 2 DCLK CAS# latency.

If you wish to read the values of these bits you can download and run 440bx.c. Compile 440bx.c with "gcc -O -o timing timing.c". The program will dump the bit values of address offset 76h of device 0. For example, a board set to 3-2-2 would read something like:

*** 440BX SDRAM Memory Timing ***

CAS# Latency: 3 DCLKs
RAS# to CAS# Delay: 2 DCLKs
RAS# Precharge: 2 DCLKs
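
The bit layout described above can be decoded with a short sketch like the one below. It does not read the chip set. It only interprets a register value you have already obtained, and the function name and the sample value are hypothetical.

```python
def decode_sdram_timing(register_value):
    """Decode the three least significant bits described above.

    For each of the three bits, 0 means 3 clocks and 1 means 2 clocks.
    Higher bits are ignored here.
    """
    def clocks(bit):
        return 2 if (register_value >> bit) & 1 else 3

    return {
        "CAS# Latency": clocks(2),
        "RAS# to CAS# Delay": clocks(1),
        "RAS# Precharge": clocks(0),
    }


# Hypothetical raw value: bits 0 and 1 set, bit 2 clear, i.e. a 3-2-2 board.
for name, value in decode_sdram_timing(0b011).items():
    print(f"{name}: {value} DCLKs")
```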

Comparing Various ICs of PC100 SDRAM:

Typically the suffix on the chip part number can be used to compare various modules. However, with the recent hype of PC100 SDRAM, misleading markings have appeared that have confused end users. The following tables compare various models currently marketed as PC100. Note also that according to our tests, modules with identical specifications may actually perform quite differently. See data on the 440BX Page.

SAMSUNG
 

Part# Suffix Performance Clock Cycle Time (tCK) Access Time from Clock (tAC) CAS Latency (tCL)
-10 NOT PC100 10 ns. 7 ns. 3
-GL PC100 10 ns. 6 ns. 3
-GH PC100 10 ns. 6 ns. 2
-8 PC100 8 ns. 6 ns. 3

TOSHIBA
 

Part# Performance Clock Cycle Time (tCK) Access Time from Clock (tAC) CAS Latency (tCL)
TC59S6408BFT-10 NOT PC100 10 ns. 7 ns. 3
TC59S6408BFT-80 PC100 8 ns. 6 ns. 2

LGS
 

Part# Suffix Performance Clock Cycle Time (tCK) Access Time from Clock (tAC) CAS Latency (tCL)
-10K NOT PC100 10 ns. 8 ns. 3
-8 PC100 8 ns. 6 ns. 3
-7J* PC100 10 ns. 6 ns. 3

* Note that the LG RAM -7J has a 10ns cycle time

MICRON
 

Part# Suffix Performance Clock Cycle Time (tCK) Access Time from Clock (tAC) CAS Latency (tCL)
-10 NOT PC100 10 ns. 7 ns. 3
-8C PC100 8 ns. 6 ns. 3
-8E PC100 8 ns. 6 ns.  2

 
The Quality of the Module:

The timing specifications of various ICs should not be the only factor that you use to choose a module. The quality of the module can frequently have a greater impact on the actual performance of the module than the ICs used and their respective specifications. Variation in the length, width and depth of traces as well as the placement of traces can profoundly impact performance. Moreover, source resistor packs of the proper quality must be used to cancel reflected signals. The type of PCB can even affect performance. In sum, many factors can influence the quality and timing of signals sent from the IC to the bus, especially with the tight tolerances required for PC100 compliant modules. Selecting a good quality module in many cases is more important than the choice of chip.

Learn more about PC100 SDRAM performance on 440BX Page

EDO Speed 

The speed of the RAM is measured in nanoseconds. 1 nanosecond (ns) = 10^-9 or 0.000000001 seconds, or one billionth of a second. RAM is typically 60 or 70ns. However, the actual speed is generally a little better than the rating. If you look at your SIMM there is a serial number on each DRAM chip. The last digit is either a 7 or a 6, which gives the speed of the SIMM as 70ns or 60ns respectively. Now you know how to read the speed of the SIMM.

The important thing to remember is that if you don't want wait states you need to buy 60ns RAM if memory bus speed is 66MHz and you can use either 60ns or 70ns RAM if your memory bus speed is 60MHz. The bus speeds for several processors are listed in the table below.

A computer sequences its tasks based on an oscillator, which is basically a clock. The faster the clock runs, the faster the tasks get done. The speed of the clock is measured by its frequency in MHz. Time = 1/frequency: for example, a 1000MHz oscillator gives a 1ns clock cycle and a 100MHz oscillator gives a 10ns clock cycle. However, PC's divide the clock frequency by two, so a 66MHz PC uses a frequency of 33MHz, which gives you a 30ns clock cycle. The clock cycle is called the T state and everything happens as a multiple of one T state. A single read or write to memory takes 2 T states or 60ns. If the read or write can not be accomplished in 2 T states, an extra T state (a wait state) is added. Adding a single wait state to a 2 T state process makes each access take 50% longer!
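
The clock arithmetic in this paragraph can be checked with a small sketch. The function names are ours and purely illustrative.

```python
def clock_period_ns(frequency_mhz):
    """Period of one clock cycle in nanoseconds (time = 1 / frequency)."""
    return 1000.0 / frequency_mhz


def wait_state_penalty(base_t_states=2, wait_states=1):
    """Fraction by which added wait states lengthen a memory access."""
    return wait_states / base_t_states


for mhz in (1000, 100, 33):
    print(f"{mhz}MHz -> {clock_period_ns(mhz):.1f}ns per clock")
print(f"one wait state on a 2T access: {wait_state_penalty():.0%} longer")
```

Note that 33MHz works out to slightly more than 30ns. The guide rounds to 30ns for simplicity.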

Comparison CPU Bus Speeds 

CPU Bus Speed Clock Multiplier
Intel P75 50 1.5x
Intel P90 60 1.5x
Intel P100 66 1.5x
Intel P120 60 2.0x
Intel P133 66 2.0x
Intel P150 60 2.5x
Intel P166 66 2.5x
Intel P200 66 3.0x
Intel PP180 60 3.0x
Intel PP200 66 3.0x

Cache 

DRAM memory has very high latency compared to present day CPU speeds. The latency is the time for a memory module to respond to a read or write request from the CPU. For example a Pentium Pro CPU operating at 200MHz can issue transactions every 5ns (1/200,000,000). In comparison DRAM memory can only respond every 60ns. For this reason, SRAM (Static Random Access Memory) with speeds typically ranging from 7 to 15ns is used between the CPU and RAM as a buffer or so called cache.

SRAM, like DRAM, is volatile memory, but unlike DRAM it retains its data without refresh as long as electrical power is supplied. It is generally much faster than DRAM and generates zero wait states. DRAM is cheaper because each bit needs only one transistor and one capacitor, while SRAM requires 6 or 8 transistors per bit.

The Cache Hit Ratio 

Cache SRAM is used to hold data the processor needs right away. Statistical analysis of modern computers shows that generally 90% of the data needed by the CPU is in the cache - this is called the hit ratio. If the CPU finds the data it needs in the cache it is called a hit; otherwise it must then go to the DRAM, and this is called a miss.

Because cache SRAM memory is faster than DRAM memory, the hit ratio largely determines the effective latency of the memory architecture: the higher the hit ratio, the lower the latency. If we call the cache hit ratio H, and the latency of the cache and DRAM Tc and Tm respectively, then the effective (average) latency of a memory architecture with cache is given by:

Teff= H * Tc + (1-H)*Tm

The increase in speed due to cache is the ratio of Tm to Teff, which rises steeply as H approaches 1:

S=Tm/Teff = 1/(1-H(1-Tc/Tm))
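
To get a feel for these two formulas, here is a small sketch. The 10ns cache and 60ns DRAM figures are assumed sample values within the ranges quoted earlier, and the function names are ours.

```python
def effective_latency(hit_ratio, t_cache_ns, t_memory_ns):
    """Teff = H * Tc + (1 - H) * Tm"""
    return hit_ratio * t_cache_ns + (1 - hit_ratio) * t_memory_ns


def speedup(hit_ratio, t_cache_ns, t_memory_ns):
    """S = Tm / Teff = 1 / (1 - H * (1 - Tc / Tm))"""
    return 1 / (1 - hit_ratio * (1 - t_cache_ns / t_memory_ns))


# Assumed sample values: 10ns SRAM cache in front of 60ns DRAM.
for h in (0.0, 0.5, 0.9, 0.99):
    teff = effective_latency(h, 10, 60)
    s = speedup(h, 10, 60)
    print(f"H = {h:.2f}: Teff = {teff:.1f}ns, speedup = {s:.2f}x")
```

With a 90% hit ratio these sample values give an effective latency of about 15ns, roughly four times faster than DRAM alone.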

Direct Mapping Cache 

In order to achieve such a high efficiency or hit rate the cache must transfer only the most crucial data from the main memory to itself. This is accomplished by cache mapping techniques. The most important cache mapping technique is set associative cache mapping. However, this is very complex and in order to understand it you must first understand direct mapping and fully associative mapping.

The simplest form of mapping is direct mapping. All the data in the memory and cache is organized into logical blocks called cache lines. Data is transferred from the memory to the cache in cache lines. A cache line generally contains four words. A word is generally 32 or 64 bits of information. However, the cache can only hold a fraction of the cache lines held in the main memory. If we express the number of words in the cache memory as 2^a and the number of words in the main memory as 2^b then the fraction of data held by the cache is 2^a divided by 2^b or 2^(a-b) where a < b.

The data in the memory are broken into what are called equivalence classes. Each equivalence class is composed of 2^(b-a) elements or cache lines. For each equivalence class in the main memory there is an index plus a cache line in the cache. When the CPU searches the cache for data it looks for the proper equivalence class and index in the cache. If this equivalence class holds the proper cache line there is a hit. If not, it increments the index and pulls the proper cache line from the main memory, and this cache line plus its index replaces the existing equivalence class and index in the cache memory.

This is conceptually the simplest form of mapping and it yields very fast access. For example, because each cache line is composed of four words, if several consecutive words are sought by the CPU then a cache miss will only occur once every fourth word. However, if any iterative process seeks words in multiples of four it will cause continuous cache misses just as in memory interleaving. This will also cause much unnecessary traffic. Hence in such cases the hit ratio is poor and the cache is not very efficient.
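
A toy simulation makes the two access patterns above concrete. This is a minimal sketch of a direct mapped cache under assumed sizes (8 lines of four words each), not a model of any particular chip set.

```python
def simulate_direct_mapped(addresses, num_lines=8, words_per_line=4):
    """Return (hits, misses) for a sequence of word addresses."""
    tags = [None] * num_lines                  # one stored tag per cache slot
    hits = misses = 0
    for addr in addresses:
        line_number = addr // words_per_line   # memory line holding this word
        index = line_number % num_lines        # the only slot that line may use
        tag = line_number // num_lines         # tells apart lines sharing a slot
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag                  # replace whatever was there
    return hits, misses


print("consecutive words:", simulate_direct_mapped(range(64)))
print("every fourth word:", simulate_direct_mapped(range(0, 256, 4)))
```

Reading 64 consecutive words misses only on the first word of each line, while stepping through memory four words at a time misses on every access.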

Fully associative cache 

The fully associative cache improves on direct mapping by allowing cache lines and tags to be placed anywhere in the cache from main memory as needed. The fully associative cache mapping technique often uses a least recently used (LRU) strategy. When a cache miss occurs cache lines are brought from main memory and they replace the cache lines that have been least recently used. Thereby any data that has not been accessed recently (and therefore presumably will not be accessed in the near future) will be replaced by neighbors of recently sought data. It therefore significantly improves the hit ratio.

The main memory contains 2^a words total and these words are broken into cache lines of 2^d words each. This means that there are 2^a divided by 2^d cache lines or 2^(a-d) cache lines. The cache contains 2^b total words with cache lines of 2^d words each also. Each cache line in the cache has a tag of (a-d) bits to specify its memory address.

However, this makes searches very expensive. In a fully associative cache the tags of all 2^(b-d) cache lines must be compared to determine whether an address is cached. On a cache hit only the last d bits of the address are needed to select the word within the cache line; on a miss the whole address is used to fetch the line from main memory.
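
The LRU replacement strategy described in this section can be sketched with an ordered dictionary. The cache size (8 lines of four words) and the function name are assumptions for illustration.

```python
from collections import OrderedDict


def simulate_fully_associative(addresses, num_lines=8, words_per_line=4):
    """Return (hits, misses) for a fully associative cache with LRU replacement."""
    lines = OrderedDict()                      # line number -> None, oldest first
    hits = misses = 0
    for addr in addresses:
        line_number = addr // words_per_line
        if line_number in lines:
            hits += 1
            lines.move_to_end(line_number)     # now the most recently used line
        else:
            misses += 1
            if len(lines) >= num_lines:
                lines.popitem(last=False)      # evict the least recently used line
            lines[line_number] = None
    return hits, misses


# Two addresses that would compete for one slot in a direct mapped cache
# of this size; here they simply occupy two different lines.
print(simulate_fully_associative([0, 32] * 8))
```

The dictionary lookup here hides the cost the text warns about: real hardware has to compare the tag of every line in the cache.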

Set Associative Cache 

In a direct mapped cache we have a well indexed and easy to search data set in the cache. The data in the cache corresponds nicely with the data set in the RAM. However, most of this cached data is useless, yielding a high miss rate. In contrast, the fully associative cache has a lot of data that we need because the data is filled in using LRU. However, it is disorganized and searching its tags is a huge undertaking.

Enter the set associative cache! It combines the organization of the direct mapped cache with the high hit rate of the fully associative cache. A new parameter, k, is defined such that the 2^(b-d) cache lines are grouped into sets of 2^k lines each; 2^k is called the associativity of the cache. Each set is found by index, just as in the direct mapped cache, so there are 2^(b-d-k) sets. Within a set associative addressing is used, so only 2^k comparisons are necessary per addressing operation.

So a set associative cache is just a collection of small fully associative caches indexed within the larger, more organized infrastructure of the direct mapped cache. Put another way, if k=0 (each set has one cache line) we have a direct mapped cache. If the number of sets equals one we have a fully associative cache.
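
The relationship between the three schemes can be sketched in one small simulator, where the number of lines per set (called ways below) is a parameter. The sizes and names are assumptions for illustration only.

```python
def simulate_set_associative(addresses, num_lines=8, ways=2, words_per_line=4):
    """Return (hits, misses); ways is the number of cache lines per set."""
    num_sets = num_lines // ways
    sets = [[] for _ in range(num_sets)]       # tags per set, least recent first
    hits = misses = 0
    for addr in addresses:
        line_number = addr // words_per_line
        index = line_number % num_sets         # direct mapped step: pick the set
        tag = line_number // num_sets
        tags = sets[index]
        if tag in tags:                        # associative step: search the set
            hits += 1
            tags.remove(tag)
        else:
            misses += 1
            if len(tags) >= ways:
                tags.pop(0)                    # evict the least recently used tag
        tags.append(tag)                       # most recently used goes last
    return hits, misses


pattern = [0, 32] * 8                          # two lines competing for one slot
for ways in (1, 2, 8):
    print(f"{ways} line(s) per set:", simulate_set_associative(pattern, ways=ways))
```

With one line per set the simulator behaves as a direct mapped cache and the pattern misses every time. With two or more lines per set both addresses stay cached after their first miss.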

Hence modern computer systems are able to achieve extremely high hit rates of approximately 90% using the set associative cache. 

Burst timing (5-2-2-2 notation) 

Burst timing expresses the number of clock cycles it takes to read or write a cache line of data from a source to a destination. Generally a cache line is composed of four words of data, called dwords. So the burst timing expresses the number of clock cycles it takes to read or write each of these four words from a source to a destination. The source and destination can be the processor, the RAM, or the L2 cache.

The first word always takes more clock cycles to read or write. This is because the entire memory address of the first word must be sent, while the next three words are automatically bursted out as the memory logic increments the internal address pointer.

Burst timings are recorded in notation like 5-2-2-2. In this example the first word of the cache line takes five clock cycles and remaining three words take two clock cycles each. The transfer of all four words takes eleven clock cycles.
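
Adding up a burst timing is simple enough to sketch in one function. The function name is ours, and the sample timings are ones quoted on this page.

```python
def burst_total_clocks(timing):
    """Total clock cycles for a burst written in 5-2-2-2 style notation."""
    return sum(int(part) for part in timing.split("-"))


for timing in ("5-3-3-3", "5-2-2-2", "3-1-1-1", "2-1-1-1"):
    print(timing, "->", burst_total_clocks(timing), "clocks per cache line")
```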

Different types of RAM have distinct maximal burst rates. I say maximal because the motherboard's chip set must support these burst rates for them to be utilized. Not all chip sets support higher burst rates. When you buy a new mother board it is important to look at the burst timings the chip set can support and pick RAM and cache that fully uses the maximal burst rate. Here are some characteristic maximum burst rates for different types of RAM and the CPU. Note that at higher bus frequencies burst rates slow down: 

Burst Timings 
DRAM Type Fastest Burst Rate
FPM  5-3-3-3
EDO  5-2-2-2 
BEDO  5-1-1-1
SDRAM 5-1-1-1
SRAM Type Fastest Burst Rate
Asynchronous  2-1-1-1
Sync Burst 2-1-1-1
Pipeline Burst 3-1-1-1
CPU Fastest Burst Rate
Pentium or Pentium Pro 2-1-1-1

For EDO RAM on a Triton-II (430HX) chip set, it is common to see burst rates like 5-2-2-2. Pipeline burst cache is much faster and can achieve burst rates of 3-1-1-1. In modern motherboards the burst rates are auto-detected by the BIOS. However, there is generally a screen in the CMOS Setup that allows you to override the defaults and change the burst timings. This may be useful if you are getting page faults or random lock-ups and you want to slow down the burst rates. However, this is probably a sign of a problem with your system. I've also had clients that have jumped into the CMOS setup and manually set the burst rates as high as possible. Then they wonder why their system is crashing all the time. Don't do that! You will corrupt your data and suffer intermittent crashes. Let your system run as it was designed.

The Intel Pentium and Pentium Pro CPU's can achieve a maximal burst rate of 2-1-1-1. In order to keep the system from experiencing wait states the cache must be able to supply data to the CPU with a 2-1-1-1 burst timing. Pipeline burst SRAM comes close to doing this as it bursts at 3-1-1-1. However, when there is a cache miss the Main Memory must be accessed, which means the burst rate falls all the way down to 5-2-2-2 in the case of EDO DRAM. Therefore larger caches are crucial to multi-tasking operating systems that have large memory requirements.

Pipelined Burst, Synchronous and Asynchronous SRAM 

The table below shows the burst timings of various types of cache at various bus speeds. The shaded region indicates the best price per performance at a given bus speed. Since present motherboard bus speeds range from 50 to 66MHz, the best cache to use would be Sync Burst SRAM. However, Sync Burst SRAM has never been brought into full production so Pipeline Burst SRAM is used instead. Note that Sync Burst SRAM can achieve burst rates of 2-1-1-1 which is the same as the CPU so it can actually support zero wait states.

Older PC's used Asynchronous SRAM which had a maximum clock-to-data time of 15ns. However, at higher bus speeds Asynchronous SRAM generates very slow burst timings. New technologies including bursting, pipelining and synchronization with the system clock have reduced maximum clock-to-data times to less than 6ns while maintaining good burst timings.

Synchronous DRAM is synchronized to the system clock and all signals are triggered on a clock edge. Very few systems support Synchronous DRAM. Burst capability, as described by burst timings, is the capacity of the RAM to auto-increment its address pointer and retrieve four words of data at a time. The first word takes longer to find, but once the absolute address is found the pointer logic incrementally adds three sets of words to the address and downloads the following three words in anticipation that the program will ask for this sequential data next. Pipelining is an architecture analogous to an assembly line. Data can be passed through a multi-staged pipe. Each stage of the pipe performs a different task. In this way several pieces of data can travel down the pipe like cars on an assembly line. In the case of Pipelined Burst SRAM input or output registers are added to the cache so that one set of registers can start to read new data while current data can be written. The register loading slows down the cache a little but this is more than offset by the ability to overlap the read and write tasks of different cache lines.

Cache Burst Timings at Various Bus Speeds

Bus Speed [MHz]  33 50 60 66 75 83 100 125
Async SRAM 2-1-1-1 3-2-2-2 3-2-2-2 3-2-2-2 3-2-2-2 3-2-2-2 3-2-2-2 3-2-2-2
Sync Burst SRAM 2-1-1-1 2-1-1-1 2-1-1-1 2-1-1-1 3-2-2-2 3-2-2-2 3-2-2-2 3-2-2-2
Pipelined Burst SRAM 3-1-1-1 3-1-1-1 3-1-1-1 3-1-1-1 3-1-1-1 3-1-1-1 3-1-1-1 3-1-1-1

Write-Through Cache and Write-Back Cache and Data Consistency 

Data in memory may be shared by a variety of processes at the same time. This saves memory and increases the system speed. This is quite common on today's multi-tasking PC's that implement implicit parallel processing in the superscalar design of their CPU's and explicit parallel processing by using multiple CPU's. For this reason cache is extremely important, and in extreme cases computers such as the KSR have been designed with a cache-only memory system, in which the global memory is just a collection of all the caches used for each processor.

For data that is read only, shared memory is fairly simple to implement. However, if data that is shared by several processes is written to and altered it can cause consistency problems. Care must be taken that altered data is written back to main memory. In systems using write-through cache, whenever a write occurs it is simultaneously written to the cache and the main memory. Alternatively a write-back system writes data to the cache and then marks it with a dirty bit. It is then written back to memory from the cache only when space is needed in the cache for further data.
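
The difference between the two policies can be sketched with a toy cache. Everything here (the class name, the two-entry capacity, evicting the oldest entry) is an assumption made for illustration and does not describe any real cache controller.

```python
class TinyCache:
    """Toy cache holding a few addresses; not a model of any real chip set."""

    def __init__(self, memory, capacity=2, write_back=True):
        self.memory = memory            # dict standing in for main memory
        self.capacity = capacity
        self.write_back = write_back
        self.lines = {}                 # address -> (value, dirty bit)
        self.memory_writes = 0

    def _store_to_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def write(self, addr, value):
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim = next(iter(self.lines))          # oldest entry in the cache
            old_value, dirty = self.lines.pop(victim)
            if dirty:                                # write back only altered data
                self._store_to_memory(victim, old_value)
        if self.write_back:
            self.lines[addr] = (value, True)         # mark dirty, defer the memory write
        else:
            self.lines[addr] = (value, False)
            self._store_to_memory(addr, value)       # write through immediately


def run(write_back):
    cache = TinyCache({}, capacity=2, write_back=write_back)
    for addr, value in [(0, "a"), (0, "b"), (0, "c"), (1, "d"), (2, "e")]:
        cache.write(addr, value)
    return cache


print("write-through memory writes:", run(False).memory_writes)
print("write-back memory writes:", run(True).memory_writes)
```

The write-back run touches main memory far less often, but until a dirty entry is evicted main memory holds stale data. This is exactly the consistency problem described above.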

Tag Chips and caching over 64MB's of RAM 

Cache has to be mapped to cover all of your RAM. That's where the tag chip comes in. The tag chip determines how the cache maps to the memory. If you have more than 64MB's and you want to cache beyond 64MB's you must have a tag chip that supports this. Most tag chips only allow for caching of the first 64MB's. Newer versions of cache have two tag chips and can be jumpered to cache either up to 64MB's of RAM or 128MB's of RAM. However, older chip sets like the Intel Triton-I (430FX) can not support caching above 64MB's. This is because Intel does not believe PC users need more than 64MB's of RAM. We encourage you to send your comments about this to Intel (Hint). Some newer chip sets like the Intel Triton-II (430HX) support caching more than 64MB's of RAM as long as you have two tag chips.

COAST's and CELP's 

New versions of SRAM cache are packaged in COAST's (Cache On A Stick) and fit in a CELP (Card Edge Low Profile) socket. It is very important to get the proper Cache module version for your motherboard or you'll get page faults and random crashes. COAST 1.2's only work on older Triton I boards. COAST 1.2's have low shoulders near the pins measuring 0.295 inches. COAST 2.1's or later work on all current motherboards and support global writes. These COAST's have high shoulders near the pins measuring 0.492 inches. COAST's with uneven shoulders should only be used with 430VX boards. Don't confuse the COAST version number with the chip version number sometimes printed on the COAST.

SIMM architectures: FPM, EDO, BEDO, SDRAM, etc. 

There are several alternate SIMM architectures available. These include the standard fast page mode (FPM) DRAM, Extended Data Out DRAM (EDO), Burst EDO (BEDO), Synchronous DRAM (SDRAM), SGRAM (synchronous graphics RAM), EDRAM (enhanced DRAM), CDRAM (cache DRAM), MDRAM (multibank DRAM) and RAMBus. Relative burst rates for some of the more important RAM types are listed above. Page mode was the earliest type of RAM and has since been replaced by FPM. Only recently has FPM been virtually replaced by EDO. Now SDRAM looks like it will take the place of EDO according to many experts.

As discussed earlier FPM DRAM is simply a bunch of capacitors arranged in rows and columns. A single row is accessed by lowering the RAS (Row Address Strobe) for that row. Then cycling CAS (Column Address Strobe) accesses consecutive columns of data one after the other. A low CAS allows data to be accessed. A high CAS locks the data and moves to the next column of data. In the case of EDO RAM a high CAS does not lock the data; instead a so-called data latch is used which is independent of CAS. So CAS can continue on to the next column before all the data from the current column is read. Reads in EDO can therefore overlap and the CAS cycle is 20-25% faster. This yields faster burst timings for EDO. However, the system speed improvement is only about 1-2% because most data access still goes on in the cache, since EDO DRAM is still much slower than SRAM. EDO RAM will probably have a limited life span if Bus Speeds continue to increase beyond 66MHz.

BEDO is another alternate architecture which has not been released yet. EDO RAM will work with older FPM style motherboards as long as they don't attempt to interleave the memory in FPM mode. BEDO is the same as EDO except it has an optimized burst address mechanism and it pipelines the latch system. Burst capability, as described by burst timings, is the capacity of the RAM to auto-increment its address pointer and retrieve four words of data at a time. The first word takes longer to find, but once the absolute address is found the pointer logic incrementally adds three sets of words to the address and downloads the following three words in anticipation that the program will ask for this sequential data next. Pipelining is an architecture analogous to an assembly line. Data can be passed through a multi-staged pipe. Each stage of the pipe performs a different task. In this way several pieces of data can travel down the pipe like cars on an assembly line. In the case of BEDO RAM input or output registers are added to the latch mechanism so that one set of registers can start to read new data while current data can be sent to the gate and forwarded to the CPU. The register loading slows down the latch a little but this is more than offset by the ability to overlap the reads on the data queue. However, like EDO its lifetime is limited by the fact that it can not exceed 66MHz bus speeds. Therefore BEDO may never become mainstream.

Synchronous DRAM is synchronized to the system clock and all signals are triggered on a clock edge. SDRAM is faster than EDO and it works at bus speeds of up to 100MHz. However it currently works with the Intel 430VX chip set which is an extremely slow chip set because it shares its memory bus with the video card. The 430VX chipset is designed for low end computers. 

RAM Price Changes 

For a decade RAM prices have fallen in cycles. Over the long term RAM prices have continuously dropped as production has increased in the Asian Pacific Rim. The drops have been somewhat cyclical, occurring in 1986-7, 1991-2, 1996, the spring of 1999 and the first quarter of 2000. There is no reliable way to gauge the short term RAM market. However, understanding some of the factors involved may aid in your purchasing decisions. According to industry analysts RAM prices will go up from the second quarter of 2000 through 2001.

From 1995-1997 supply and demand made for a volatile market. Through 1995 RAM prices remained high due to several factors. Because of the booming 1994 computer market, chip market analysts predicted huge demand for RAM in 1995. Hence huge inventories were produced. However, the 1995 computer market was soft. This was in part due to Intel's fast pace of CPU introduction and price drops. In addition the introduction of Win95 was expected to cause greater consumption of RAM as computer users upgraded to the new OS. However, Microsoft only sold half as many licenses as expected. Furthermore several large computer manufacturers, like Compaq, over-stocked RAM in anticipation of a bull computer market in 1995 and to insulate themselves from RAM shortages and rising prices. Indeed, even Intel stocked huge amounts of RAM as it plotted to take on its own resellers head on by producing its own PC's. The demand for RAM caused by these companies was misinterpreted by RAM manufacturers as an increased need for RAM by the end user (when actually the RAM was just getting stockpiled). The RAM manufacturers invested in more fabrication plants and increased production levels, causing even more excess inventory. Since many of the large purchasing decisions were made up to a year in advance, there was no immediate feedback in the market and prices continued to rise. Finally prices soared as Sumitomo's resin plant in Japan suffered a fire. This plant formerly supplied two thirds of the resin used in encapsulating RAM chips. The rapid increase in prices further augmented investments in new fabrication plants.

Once the sales pipeline was filled to capacity with RAM, it finally became clear that RAM supplies were far in excess of demand, and prices started to slip. In addition, several new plants in the Far East were producing RAM at prices lower than ever before. These plants focused on producing chips used in 16MB and 32MB SIMMs, which could be produced at the same price as the chips used in 4MB and 8MB SIMMs (although with a higher failure rate). This began a price war between the 4MB SIMMs and the larger SIMMs. Slipping prices spooked Intel and other computer companies into dumping their massive stockpiles in the spring of 1996, causing a sudden crash in RAM prices. Since then several manufacturers in the US, like Micron, and several overseas chip manufacturers have mothballed their plants, asserting that profit margins were too low to continue production. Kingston finally sold its plants to the Koreans and dropped prices. Further, the production of EDO RAM clearly pushed down the value of stocked FPM RAM.

RAM prices rose dramatically in September 1996, as larger computer manufacturers like Compaq sucked the market dry. Increased demand was due in part to the release of Windows NT 4.0 and in part to the rise of the book-to-bill ratio and seasonal stockpiling in preparation for the holiday season.

RAM prices then dropped gradually, with some volatility along the way, until late January 1997. At that time prices rose slightly due to the increased demand for PCs after Intel's price drops. From 1997 until the spring of 1999 RAM prices decreased due to oversupply.

Memory prices hit an all-time low of $4 per 64Mbit IC in May of 1999. The price went up to $7 in June and $14 by the end of August. In September prices ranged from $15 to $17, and they hit a peak of around $20 per IC in October 1999.
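To put the 1999 figures above in perspective, the month-to-month changes can be worked out with a few lines of Python. This is only an illustrative sketch: the prices are the approximate per-IC figures quoted above, the September value of $16 is an assumed midpoint of the quoted $15 to $17 range, and the function names are made up for this example.

```python
# Approximate 64Mbit IC prices in 1999, in US dollars, as quoted in the text.
# ASSUMPTION: September is given as a $15-$17 range; the midpoint is used here.
prices = [
    ("May", 4.0),
    ("June", 7.0),
    ("August", 14.0),
    ("September", 16.0),
    ("October", 20.0),
]


def percent_change(old, new):
    """Return the percentage change going from old to new."""
    return (new - old) / old * 100.0


def summarize(series):
    """Return (from_month, to_month, percent change) for each consecutive pair."""
    rows = []
    for (m0, p0), (m1, p1) in zip(series, series[1:]):
        rows.append((m0, m1, round(percent_change(p0, p1), 1)))
    return rows


steps = summarize(prices)
overall = percent_change(prices[0][1], prices[-1][1])

for m0, m1, pct in steps:
    print(f"{m0} -> {m1}: {pct:+.1f}%")
print(f"May -> October overall: {overall:+.1f}%")
```

On these figures the price per IC rose fivefold between May and October, a 400% increase, with the steepest single step being the doubling between June and the end of August.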

There were three catastrophic events in Taiwan that affected pricing in August of 1999. Prices increased due to two earthquakes in Taiwan (6.7 and 6.9). This news followed the loss of 15 million dollars' worth of SDRAM, including 1.5 million 64Mbit ICs, due to a substantial Taiwanese power outage which interrupted the curing process. However, Taiwan accounts for a little over one tenth of the total RAM market, so the impact of these events is small compared to market forces.

Many factors affected the price of SDRAM. In large part the higher pricing was due to demand outstripping supply. Demand in Asia had increased greatly with the recovery of local Asian economies. Globally, spending on new hardware had increased in preparation for Y2K. Moreover, there was a seasonal increase in demand for products in September that normally lasts through the winter months. Shortages have affected not only the SDRAM market but also many other components.

The supply of 64Mbit ICs used in 64MB, 128MB and 256MB SDRAM modules had decreased. 64Mbit ICs had been selling at a loss for a long time, so many manufacturers had been switching production to 128Mbit technology and ramping down production of 64Mbit ICs. For example, in August, one of the major producers of 64Mbit SDRAM, Samsung (Korea), announced it was closing most of its 64Mbit fabs. TI left the RAM market and sold its fabs to Micron. This led to the normal price increases associated with supply and demand.

From November 1999 until February 2000 prices dropped slowly. According to industry analysts, prices will now increase until 2001.

The experts are bearish on RAM prices in the face of a falling Yen.

It is expected that RAM prices may be back down by this summer due to the low value of the Yen (hence the increased buying power of the US market). Over the past two decades the Yen rose in value until it reached its all-time high in April 1995. Since then the Yen has been losing value, passing 120 Yen to the dollar, down 48% from its post-war high in April 1995. Moreover, the Nikkei has been losing value. Thus analysts expect prices for RAM imports to drop by this summer. Generally, changes in the value of the Yen don't affect consumer prices for about three months.

Are the experts wrong about the Yen/RAM equation?

Interestingly, if you look at the value of the Yen over the past ten years, you'll notice that the biggest RAM price drops, which occurred in 1986-7, 1991-2 and 1996, correspond to periods of rapid increase in the value of the Yen. This flies in the face of conventional wisdom and must make us question whether a cheaper Yen really means cheaper RAM. Indeed, looking at these data one might posit just the opposite, and maybe the experts really aren't doing their homework as well as they should.


Copyright (c) 1996 Net Express All Rights Reserved.