Emulating the Aging of NAND Flash Memories as a Time-Variant Communications Channel

Antonios Prodromakis, George Sklias and Theodore Antonakopoulos
University of Patras
Department of Electrical and Computer Engineering
Patras 26504, Greece
e-mails: aprodemakis@upatras.gr, ece6847@upnet.gr and antonako@upatras.gr

Abstract—The behavior of NAND Flash, the most successful non-volatile memory technology today, deteriorates as the number of write accesses increases. This process, known as aging, is not only irreversible but also critical for the design of systems that use NAND Flash (e.g. Solid-State Drives), since it affects the system’s I/O performance and the overhead required to achieve a specific level of reliability. Experimental characterization of NAND Flash-based systems over their whole lifetime is time-consuming and non-repeatable, since every additional programming cycle ages the device further and changes the system’s behavior. In this work, we present the architecture and experimental results of a system that emulates, in real time and with high precision, the behavior of NAND Flash memories under user-defined aging conditions. The system can be adjusted to the specific characteristics of any NAND technology and supports multi-level cells. The main advantages of this approach are the following: the emulated technology can be used under the same aging conditions for repeated experiments, and the same hardware setup can be used to compare different memory technologies at the system level and under different aging conditions.

I. INTRODUCTION

NAND Flash based Solid-State Drives (SSDs) have emerged in recent years as the most attractive technology for commercial and enterprise data storage systems [1], [2]. During the past decade, the capacity of NAND Flash memories has increased rapidly as a result of continuous scaling and advances in multi-level cell (MLC) technology. This growth in storage density has come at a cost: performance characteristics such as endurance, data retention and raw bit error ratio (BER) have deteriorated, which has opened new research challenges in this field. Although algorithms and techniques have been developed to deal with the effects of aging on NAND Flash memories, their use increases the system’s complexity and the required development time. Until now, such algorithms have been evaluated using real chips, which requires tens of thousands of program/erase (P/E) cycles. Furthermore, as the aging of a storage system is irreversible, successive tests of these algorithms cannot be performed at the same aging state.

The purpose of this work is to design a realistic, real-time emulation system that accurately reproduces the response time and the raw bit error characteristics of an MLC NAND Flash IC. The proposed design must be capable of emulating any user-specified stage of the aging process, thus eliminating the need for cycling a real chip. Furthermore, it must also make it possible to conduct different experiments at the same aging state of the memory, which is impossible when real chips are used. Emulations can be performed on a cell, chip (NAND Flash) or system (SSD) basis, supporting a variety of I/O interfaces (e.g. ONFI [3], toggle and high-speed serial interfaces). The experimental results of this work show that the threshold voltage distributions of a 2-bits/cell NAND Flash can be emulated with an accuracy of a few mV and an output rate of 50 MBps.

Section II describes the model used for 2-bits/cell NAND Flash voltage distributions as a function of the aging process. In Section III we discuss the architecture of the proposed emulation platform. The performance analysis and the experimental results of this work are presented in Section IV.

II. MODELING THE NAND FLASH VOLTAGE DISTRIBUTIONS

A. Multilevel Cell Technology

The need to produce NAND Flash memories with high storage density, and consequently reduced cost per bit, has led vendors to the mass development of MLC NAND Flash. MLC technology is achieved by accurately programming each cell of the memory to one of multiple voltage levels. More specifically, in an $n$-bits/cell NAND Flash, each cell can be programmed to $2^n$ different voltage levels. Each level corresponds to a symbol represented as an $n$-bit binary vector. The mapping of these symbols to levels can be specified using different schemes, such as Gray and direct mapping.
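The difference between the two mapping schemes can be illustrated with a short Python sketch. The assignment of bit patterns to level indices used here is an assumption made for illustration only (real devices typically assign the all-ones pattern to the erased level); the helper names are hypothetical.

```python
def direct_map(level: int, n_bits: int) -> str:
    """Direct mapping: level k stores the plain binary representation of k."""
    return format(level, f"0{n_bits}b")


def gray_map(level: int, n_bits: int) -> str:
    """Gray mapping: the symbols of adjacent levels differ in exactly one bit."""
    return format(level ^ (level >> 1), f"0{n_bits}b")


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two symbols differ."""
    return sum(x != y for x, y in zip(a, b))


n_bits = 2
levels = range(2 ** n_bits)
direct = [direct_map(k, n_bits) for k in levels]
gray = [gray_map(k, n_bits) for k in levels]

# Bit errors caused by misreading a level as its upper neighbour.
direct_flips = [hamming(direct[k], direct[k + 1]) for k in range(len(direct) - 1)]
gray_flips = [hamming(gray[k], gray[k + 1]) for k in range(len(gray) - 1)]

print(direct, direct_flips)  # ['00', '01', '10', '11'] [1, 2, 1]
print(gray, gray_flips)      # ['00', '01', '11', '10'] [1, 1, 1]
```

With Gray mapping, a misread into an adjacent level always costs a single bit error, whereas direct mapping costs two bit errors at the middle boundary.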

As the number of states increases, the margin separating adjacent states becomes smaller, and the MLC NAND Flash becomes more prone to bit errors than its single-level cell counterpart. It has been shown in various works that the probability density function of the stored symbols can be modeled as a set of Gaussian distributions [4], [5]. The likelihood function of a given input symbol $i$ is given by the Gaussian probability density function

$$P(v|s = i) = \frac{1}{\sqrt{2\sigma^2\pi}} e^{-(v-\mu)^2/2\sigma^2}$$ (1)
where \( s \) represents the input symbol and \( v \) is its voltage, while \( \mu \) and \( \sigma \) are the mean and standard deviation of the Gaussian distribution, respectively.
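As a minimal sketch of how the likelihood of eq. (1) can be used, the following Python code evaluates it for each symbol and selects the most likely one. The level means and the standard deviation are hypothetical placeholders (the values of Table I are not reproduced here), and `ml_detect` is an illustrative helper rather than part of the described system.

```python
import math


def likelihood(v: float, mu: float, sigma: float) -> float:
    """Gaussian likelihood P(v | s = i) of eq. (1) for one symbol."""
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(
        2 * math.pi * sigma ** 2
    )


# Hypothetical mean voltage (in volts) of each 2-bit symbol.
LEVEL_MEANS = {"11": 0.0, "10": 1.0, "00": 2.0, "01": 3.0}
SIGMA = 0.05  # hypothetical standard deviation, identical for all symbols


def ml_detect(v: float) -> str:
    """Return the symbol whose likelihood is largest for the read voltage v."""
    return max(LEVEL_MEANS, key=lambda s: likelihood(v, LEVEL_MEANS[s], SIGMA))


print(ml_detect(1.1))  # the level at 1.0 V is the closest one
```

When all symbols share the same standard deviation and are equally likely, this maximum-likelihood rule reduces to comparing the read voltage against thresholds placed midway between adjacent means.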

B. Aging Effect on NAND Flash Reliability

Continuous storage of new data to an SSD has a great impact on its performance (read and write times) [6], data retention and reliability (raw bit error ratio) [7]. In this work, we focus on the reliability issues introduced by the aging process of the NAND Flash. Regardless of the technology used (SLC or MLC), cycling a NAND Flash shifts and widens the voltage distributions of its cells. In fact, cycling alters the parameters \( \mu \) and \( \sigma \) of the Gaussian probability density function. More specifically, mean voltages are usually shifted to higher values and voltage distributions become wider, which increases the probability of detecting erroneous symbols [4]. As a result, hard data detectors, such as threshold comparators, may mistake the actual programmed symbol for another one, leading to a higher raw bit error ratio (BER).
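The effect of a shifted mean and a wider distribution on a fixed-threshold detector can be quantified with the Gaussian tail function. The sketch below is illustrative only: the nominal level (1.0 V), the decision thresholds (0.5 V and 1.5 V) and the fresh and aged parameters are assumptions, not values taken from the measurements.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def misread_probability(mean: float, sigma: float, lower: float, upper: float) -> float:
    """Probability that a voltage ~ N(mean, sigma^2) falls outside [lower, upper]."""
    return q_function((upper - mean) / sigma) + q_function((mean - lower) / sigma)


# Hypothetical inner level at 1.0 V with fixed thresholds at 0.5 V and 1.5 V.
fresh = misread_probability(mean=1.0, sigma=0.05, lower=0.5, upper=1.5)
aged = misread_probability(mean=1.0 + 0.1, sigma=0.2, lower=0.5, upper=1.5)

print(f"fresh cell: {fresh:.3e}")
print(f"aged cell:  {aged:.3e}")
```

Under these assumed numbers the misread probability grows by many orders of magnitude, which is why the raw BER is so sensitive to the aging state.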

In this paper, the results of [2] are used as a benchmark. Table I presents the mean voltages and the relationship between the standard deviations of the symbols of a 2-bits/cell NAND Flash memory. Fig. 1 illustrates the normalized voltage distributions at the beginning of the cell's lifetime and after a significant number of P/E cycles.

III. NAND FLASH EMULATOR ARCHITECTURE

As mentioned in the previous Section, each cell of a NAND Flash can be modeled as an Additive White Gaussian Noise (AWGN) channel affected by data-dependent noise. An AWGN source can be implemented using the Box-Muller method [12]. According to this method, if \( a \) and \( \varphi \) are independent random variables uniformly distributed on the intervals \((0, 1)\) and \((0, 2\pi)\) respectively, and

\[
X_1 = \sqrt{-2 \ln a} \cdot \cos \varphi, \qquad
X_2 = \sqrt{-2 \ln a} \cdot \sin \varphi
\] (2)

then \((X_1, X_2)\) is a pair of independent random variables from the same normal distribution with zero mean and unit variance \((X_1, X_2 \sim \mathcal{N}(0, 1))\). Finally, if we consider \( U_s \) as the initial mean voltage of input symbol \( s \), and \( V_s \) as the output voltage that is actually read from a cell, then

\[
V_s = (X_1 \cdot \sigma_s) + (\mu_s + U_s),
\] (3)

where parameters \( \mu_s \) and \( \sigma_s \) depend on the aging state of the cell and the input symbol \( s \), while \( V_s \sim \mathcal{N}(\mu_s + U_s, \sigma_s^2) \).

Fig. 1. Voltage distributions on a 2-bits/cell NAND Flash.
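The two equations above can be checked with a short software model. The following Python sketch is a floating point illustration, not the hardware implementation: the uniform inputs come from Python's `random` module instead of LFSRs, and the values chosen for \( U_s \), \( \mu_s \) and \( \sigma_s \) are hypothetical.

```python
import math
import random


def box_muller(a: float, phi: float) -> tuple[float, float]:
    """Map a in (0, 1] and phi in [0, 2*pi) to two independent N(0, 1) samples."""
    r = math.sqrt(-2.0 * math.log(a))
    return r * math.cos(phi), r * math.sin(phi)


def read_voltage(u_s: float, mu_s: float, sigma_s: float, rng: random.Random) -> float:
    """Emulated read voltage V_s = X1 * sigma_s + (mu_s + U_s)."""
    a = 1.0 - rng.random()             # shifted to (0, 1] so that log(a) is defined
    phi = 2.0 * math.pi * rng.random()
    x1, _ = box_muller(a, phi)
    return x1 * sigma_s + (mu_s + u_s)


rng = random.Random(1)
# Hypothetical aged cell: nominal level 1.0 V, mean shift 0.1 V, sigma 0.2 V.
samples = [read_voltage(1.0, 0.1, 0.2, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((v - mean) ** 2 for v in samples) / len(samples))
print(f"sample mean = {mean:.3f} V, sample std = {std:.3f} V")
```

The sample mean and standard deviation should be close to \( \mu_s + U_s \) and \( \sigma_s \), respectively, which is the property the emulator relies on.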

A. Cell Architecture

The proposed emulator is built from multiple instances of the cell described in this section. The inputs of each cell are the parameters \( \mu_s, \sigma_s \) and the symbol \( s \), while the output is the emulated read voltage \( V_s \) of equation (3). As Fig. 2 illustrates, the cell architecture is based on two parallel processes: the former implements the Box-Muller method of eq. (2), and the latter calculates the output \( V_s \) of eq. (3). Since the aim of the designed platform is to emulate NAND Flash bit error characteristics with high accuracy and speed, it is important to decide carefully which arithmetic operations are performed in floating point and which in fixed point representation [13]; the resulting partitioning is shown in Fig. 3.

The **Box-Muller Module** uses two type-2 Linear Feedback Shift Registers (LFSRs) with parallel outputs to produce the random variables \( a, \varphi \) in 1Q31 fixed point format. Additions, multiplications, square roots, logarithms and data type conversions are performed using single-precision floating point cores. Trigonometric functions are computed using the CORDIC technique in fixed point representation, as their range is bounded on the interval \([-1, 1]\). All other operations are performed by bit shifting and sign inversions.
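To make the random number generation concrete, the sketch below models one LFSR in software and converts its output to a 1Q31 fraction. Several points are assumptions: "type-2" is read here as the Galois (internal-XOR) form, the 32-bit width and the polynomial \(x^{32} + x^{22} + x^{2} + x + 1\) are a commonly tabulated maximal-length choice rather than the ones used in the design, and the parallel output is modeled by stepping the register 31 times per word.

```python
TAPS = 0x80200003  # assumed polynomial x^32 + x^22 + x^2 + x + 1 (Galois form)


def lfsr_step(state: int) -> int:
    """Advance a 32-bit Galois LFSR by one bit."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def next_q31(state: int) -> tuple[int, int]:
    """Collect 31 output bits; return (new_state, raw 31-bit fraction)."""
    raw = 0
    for _ in range(31):
        raw = (raw << 1) | (state & 1)
        state = lfsr_step(state)
    return state, raw


def q31_to_float(raw: int) -> float:
    """Interpret 31 fractional bits as a value in [0, 1)."""
    return raw / float(1 << 31)


state = 0xACE1ACE1  # any non-zero seed
values = []
for _ in range(1000):
    state, raw = next_q31(state)
    values.append(q31_to_float(raw))

print(values[:3])
```

A raw value of zero would have to be guarded against before taking the logarithm in eq. (2); how the hardware handles this case is not described in the text.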

The **Cell Logic Module** takes as input the output \( X_1 \) of the Box-Muller Module, the parameters \( \mu_s, \sigma_s \), which are defined by the aging state of the cell, and the input symbol \( s \). The 2-bits symbols are translated to their equivalent voltage \( U_s \) using the direct mapping analyzed in Table I. The additions and multiplications within this module are performed using single-precision floating point representation.

IV. EXPERIMENTAL RESULTS

The **Box-Muller Module** is implemented as a free-running process, producing a new registered \( X_1 \) value every clock cycle. Consequently, it has no impact on the total latency of the system, which is determined only by the operations performed within the cell logic. The **Cell Logic Module** produces one output per clock cycle and has a total latency of 24 clock cycles. The design has been implemented on a Xilinx ZC702 evaluation board (Zynq). The clock frequency of the developed design is 200 MHz; at one 2-bit symbol per clock cycle, this corresponds to a throughput of 200 Msymbols/second, i.e. 400 Mbit/s (50 MBps). Table II shows the used and available resources, as well as the utilization factor of the device.

<table>
<thead>
<tr>
<th>Resource</th>
<th>Used</th>
<th>Available</th>
<th>Utilization</th>
</tr>
</thead>
<tbody>
<tr>
<td>Slice Registers</td>
<td>7,412</td>
<td>4,674</td>
<td>6%</td>
</tr>
<tr>
<td>Slice LUTs</td>
<td>6,474</td>
<td></td>
<td>12%</td>
</tr>
<tr>
<td>Occupied Slices</td>
<td>2,121</td>
<td></td>
<td>15%</td>
</tr>
<tr>
<td>RAMB36E1</td>
<td>0</td>
<td></td>
<td>0%</td>
</tr>
<tr>
<td>BUFGE</td>
<td>2</td>
<td></td>
<td>6%</td>
</tr>
<tr>
<td>DSP48E1</td>
<td>12</td>
<td></td>
<td>5%</td>
</tr>
</tbody>
</table>
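The timing figures quoted before Table II follow from simple arithmetic, sketched below in Python. The calculation assumes a single cell delivering one 2-bit symbol per clock cycle; the constant names are illustrative.

```python
CLOCK_HZ = 200_000_000   # clock frequency of the design
SYMBOLS_PER_CLOCK = 1    # the Cell Logic Module produces one output per cycle
BITS_PER_SYMBOL = 2      # 2-bits/cell NAND Flash
LATENCY_CYCLES = 24      # latency of the Cell Logic Module

symbol_rate = CLOCK_HZ * SYMBOLS_PER_CLOCK      # symbols per second
bit_rate = symbol_rate * BITS_PER_SYMBOL        # bits per second
byte_rate = bit_rate // 8                       # bytes per second
latency_ns = LATENCY_CYCLES * 1e9 / CLOCK_HZ    # pipeline latency in ns

print(symbol_rate, bit_rate, byte_rate, latency_ns)
```

Under these assumptions the output rate is 50 MBps and the first emulated voltage appears 120 ns after the corresponding symbol enters the cell logic.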

The cell architecture was evaluated in the following steps. Initially, a MATLAB model was developed to generate the theoretical threshold voltage distributions, which are used as the reference for our analysis. Next, the NAND Flash emulator was embedded as a hardware peripheral in the Zynq device. The communication between the Zynq and the host PC was implemented over TCP/IP, so that experimental and theoretical data can be compared in the MATLAB environment.

1) **Voltage Distributions:** Fig. 4 illustrates the theoretical and the experimental voltage distributions of two different aging states. In the first aging state, denoted as initial state, the AWGN has zero mean and 0.05V standard deviation. In the second aging state, denoted as aged state, the standard deviation has been increased to 0.2V, while the AWGN mean voltages are \( \mu_{s11} = 0.2V, \mu_{s10} = 0.1V, \mu_{s00} = 0.1V \) and \( \mu_{s01} = 0.2V \) for the respective symbols. The experiment has been conducted for four random sets of 1 Msymbols/set.

2) **Mean Squared Error:** The accuracy of the emulated voltages is quantified by the mean squared error (MSE) between the theoretical and the experimental values,

\[
MSE = \frac{1}{n} \sum_{i=1}^{n} \left( \bar{Y}_i - Y_i \right)^2
\] (4)

where $\bar{Y}_i$ denotes the theoretical voltages, $Y_i$ the experimental ones and $n$ the number of samples. Fig. 5 presents the mean squared error and the squared difference of each sample at the initial state of the cell.
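A mean squared error of this kind can be computed as in the Python sketch below. It assumes that the theoretical and experimental samples are paired one-to-one (for example, by driving both models with the same inputs); the sample values are made up for illustration.

```python
def mean_squared_error(theoretical: list[float], experimental: list[float]) -> float:
    """Average of the squared differences between paired samples."""
    if not theoretical or len(theoretical) != len(experimental):
        raise ValueError("both sequences must be non-empty and of equal length")
    n = len(theoretical)
    return sum((t - e) ** 2 for t, e in zip(theoretical, experimental)) / n


# Made-up voltages (in volts) standing in for model and emulator outputs.
theoretical = [1.00, 1.05, 0.95, 1.10]
experimental = [1.01, 1.04, 0.97, 1.08]

squared_differences = [(t - e) ** 2 for t, e in zip(theoretical, experimental)]
mse = mean_squared_error(theoretical, experimental)
print(f"MSE = {mse:.2e} V^2")
```

The list `squared_differences` corresponds to the per-sample quantity plotted alongside the MSE.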

3) **BER Measurements:** Fig. 6 demonstrates the variation of the BER as a function of the aging process, when blocks of 256 ksymbols are used. The threshold detectors have been kept at the optimum threshold values. The described procedure has been repeated for 1K test cycles, varying the standard deviation ($\sigma$) of the noise from 0V to 0.16V, with steps of 1mV, while the mean voltage ($\mu$) has been kept at zero.
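A software counterpart of this measurement is sketched below in Python as a small Monte Carlo experiment. The level voltages (1 V apart), the midpoint thresholds and the symbol order are assumptions made for illustration, and the block size is reduced to keep the run short.

```python
import random

SYMBOLS = ["11", "10", "00", "01"]   # assumed symbol order, lowest level first
LEVELS = [0.0, 1.0, 2.0, 3.0]        # hypothetical level voltages U_s in volts
THRESHOLDS = [0.5, 1.5, 2.5]         # midpoints between adjacent levels


def detect(v: float) -> str:
    """Hard decision: count how many thresholds the read voltage exceeds."""
    return SYMBOLS[sum(v > t for t in THRESHOLDS)]


def measure_ber(sigma: float, mu: float = 0.0, n_symbols: int = 20_000, seed: int = 0) -> float:
    """Fraction of erroneous bits for one block of random symbols."""
    rng = random.Random(seed)
    bit_errors = 0
    for _ in range(n_symbols):
        k = rng.randrange(len(SYMBOLS))
        v = rng.gauss(LEVELS[k] + mu, sigma)
        read = detect(v)
        bit_errors += sum(a != b for a, b in zip(SYMBOLS[k], read))
    return bit_errors / (2 * n_symbols)


for sigma in (0.0, 0.08, 0.16):
    print(f"sigma = {sigma:.2f} V -> BER = {measure_ber(sigma):.2e}")
```

With equal standard deviations and equiprobable symbols, midpoint thresholds are the optimum hard-decision thresholds, so the sketch mirrors the condition stated above.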

V. CONCLUSIONS

In this work, we presented the basic components of a platform that can emulate the raw bit error characteristics of a 2-bits/cell NAND Flash memory as aging progresses. The experimental results show that the developed platform can emulate the variation of the threshold voltage distributions with high accuracy. The performance studies indicate that the NAND Flash emulator has great potential to reduce the duration of experiments related to SSD performance.

REFERENCES