# This document is an author-formatted work. The definitive version for citation appears as:

R. Zand, A. Roohi, S. Salehi and R. F. DeMara, "Scalable Adaptive Spintronic Reconfigurable Logic Using Area-Matched MTJ Design," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 7, pp. 678-682, July 2016. doi: 10.1109/TCSII.2016.2532099

http://ieeexplore.ieee.org/abstract/document/7412716/

# Scalable Adaptive Spintronic Reconfigurable Logic using Area-Matched MTJ Design

Ramtin Zand, Arman Roohi, Soheil Salehi, and Ronald F. DeMara Department of Engineering and Computer Science University of Central Florida Orlando, FL, USA 32816-2362

Abstract: Spin-Transfer Torque Random Access Memory (STT-RAM) has been researched as a promising alternative to SRAM in reconfigurable fabrics, especially in Look-Up Tables (LUTs), due to its non-volatility, low standby and static power, and high integration density. In this paper, we leverage the physical characteristics of Magnetic Tunnel Junctions (MTJs) to design a unique reference MTJ whose calibrated resistance matches the STT-based LUT (STT-LUT) circuit requirements to provide optimal read operation. The results show 42% and 70% Power-Delay Product (PDP) improvement over two previous MTJ-based LUT designs. Moreover, a 4-input Adaptive STT-based LUT (A-LUT) is proposed based on the developed STT-LUT, which is configurable to function in seven independent modes. An *n*-input A-LUT exhibits a PDP which can be a fraction of the *n*-input STT-LUT PDP when performing two-input to (*n*-1)-input Boolean logic functions.

Index Terms: Reconfigurable fabric, spintronics, low-power computation, magnetic tunnel junction (MTJ), spin-transfer torque (STT).

### I. INTRODUCTION

MOS scaling challenges include controlling leakage currents, short-channel effects, and drain saturation growth while reducing the power supply voltage for digital applications. Furthermore, intrinsic leakage current, dynamic power consumption, and process variation are the main factors limiting MOS scaling in the near future. Therefore, spin-based devices and technologies have attracted considerable attention in recent years as an alternative to CMOS-based technologies. Spintronic devices are characterized by non-volatility, near-zero standby power, high integration density, and radiation hardness, as a technology progression from CMOS [1], [2]. While spintronic-based neuromorphic architectures offer analog computation strategies [3], in this work we look to ultra-low power methods beyond them which embrace reconfigurability enabled by MTJs.

Currently, Static Random Access Memory (SRAM) is the primary component in most reconfigurable fabrics. However, SRAM's drawbacks, such as static power consumption, volatility, and low logic density, have led to extensive studies on emerging devices as alternative memory cells. Some of the non-volatile memory technologies which could be integrated in reconfigurable fabrics are STT-RAM [4], Phase Change Memory [5], and domain wall based racetrack memory [6].

Non-volatility, near-zero static and standby power, and instant on/off capability are some of the advantages of STT-RAM which make it one of the most promising alternative memory cells in reconfigurable fabrics. The Spin-Transfer Torque (STT) approach is utilized in STT-RAM to provide a high-speed and low-power switching method, compared to the previously proposed Field-Induced Magnetic Switching (FIMS) [7] and Thermally Assisted Switching (TAS) [8] approaches. In this paper, STT-RAM is used to implement the primary element of reconfigurable fabrics, the Look-Up Table (LUT). Herein, we develop two structures for LUTs. In the first structure, the STT-RAM based LUT (STT-LUT), a reference MTJ is designed with specific dimensions to match the circuit requirements for optimal sensing performance. In the second structure, an Adaptive STT-LUT (A-LUT) is proposed which can be configured to implement a variety of Boolean functions.

The remainder of this paper is organized as follows. In Section II, a review of MTJ devices and spin-based reconfigurable fabrics is provided. Section III provides the STT-LUT circuit and performance analysis, in addition to the A-LUT circuit design. A-LUT simulation and analysis are summarized in Section IV. Finally, Section V concludes the paper by highlighting the advantages and features of the proposed architecture.

#### II. BACKGROUND

# A. Fundamentals of Magnetic Tunnel Junction

An MTJ consists of two ferromagnetic (FM) layers, called the *fixed* layer and the *free* layer, and one *oxide barrier* layer, as shown in Fig. 1. The FM layers can be aligned in two different configurations, *parallel (P)* and *antiparallel (AP)*, in which the MTJ exhibits a low resistance ( $R_P$ ) or a high resistance ( $R_{AP}$ ), respectively [9].

Conventional approaches for switching between the MTJ states,  $R_{AP}$  and  $R_P$ , were based on applying a magnetic field. Producing this field required a current on the order of *mA*, which results in significant power consumption and hardware area overhead [10]. The STT approach was proposed in 1996 [11] as a promising alternative MTJ switching method. According to STT switching principles, the *P* or *AP* state of the MTJ is configured by means of the bidirectional current that passes through it,  $I_{MTJ}$ , which can be readily produced by simple MOS-based circuits. The state of the MTJ switches when  $I_{MTJ}$  becomes higher than a critical current,  $I_C$ , as shown in Fig. 1(a).

The MTJ resistance in parallel and anti-parallel states is presented in the equations below [12]:

$$R(\theta) = 2R \times \frac{1 + TMR}{2 + TMR + TMR \cos\theta} = \begin{cases} R_P = R, & \theta = 0^{\circ} \\ R_{AP} = R(1 + TMR), & \theta = 180^{\circ} \end{cases}$$
(1)

$$R_{MTJ} = \frac{t_{ox}}{Factor \times Area \times \sqrt{\varphi}} \exp\left(1.025 \times t_{ox} \sqrt{\varphi}\right)$$
(2)

$$TMR = \frac{TMR(0)}{1 + \left(\frac{V_b}{V_h}\right)^2}$$
(3)

where *TMR* is the tunnel magnetoresistance,  $t_{ox}$  is the oxide thickness of the MTJ, *Factor* is the tunneling conductivity, *Area* is the surface area of the MTJ,  $\varphi$  is the oxide layer energy barrier height,  $V_b$  is the bias voltage, and  $V_h = 0.5$ V is the bias voltage at which TMR is half of TMR(0). Fig. 1(b) shows the relationship between the MTJ resistance and  $I_{MTJ}$ . The MTJ state changes from AP to P if  $I_{MTJ}$  is greater than the positive critical current,  $I_{C+}$ , and it returns to the AP state if  $I_{MTJ}$  is smaller than the negative critical current,  $I_{C-}$ .
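The angle-dependent resistance in (1) and the bias dependence of TMR can be checked numerically with a minimal sketch. It reads the TMR relation as $TMR(0)/(1 + (V_b/V_h)^2)$, the form consistent with $V_h$ being the half-TMR voltage. The function names are hypothetical, and `TMR0 = 1.0` is an assumption inferred from the LUT MTJ values quoted in Section III ( $R_{AP} = 2.5k\Omega$ ,  $R_P = 1.25k\Omega$ ), not a parameter stated by the model.

```python
import math

def mtj_resistance(theta_deg, r_p, tmr):
    """Eq. (1): MTJ resistance versus the angle between the FM layers (R = r_p)."""
    theta = math.radians(theta_deg)
    return 2.0 * r_p * (1.0 + tmr) / (2.0 + tmr + tmr * math.cos(theta))

def tmr_at_bias(tmr0, v_b, v_h=0.5):
    """Bias-dependent TMR; v_h = 0.5 V is the value given in the text."""
    return tmr0 / (1.0 + (v_b / v_h) ** 2)

R_P = 1250.0   # ohms, LUT MTJ low resistance quoted in Section III
TMR0 = 1.0     # assumption: implied by R_AP = 2 * R_P in Section III

r_parallel = mtj_resistance(0.0, R_P, TMR0)        # theta = 0 degrees
r_antiparallel = mtj_resistance(180.0, R_P, TMR0)  # theta = 180 degrees
tmr_half = tmr_at_bias(TMR0, v_b=0.5)              # evaluated at V_b = V_h

print(f"R_P  = {r_parallel:.1f} ohm")
print(f"R_AP = {r_antiparallel:.1f} ohm")
print(f"TMR at V_b = V_h: {tmr_half:.2f}")
```

At the two limiting angles the sketch reduces to $R$ and $R(1 + TMR)$, matching the two cases listed in (1).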

The MTJ oxide barrier has a reliability of over ten years, which is validated by Time-Dependent Dielectric Breakdown (TDDB) experimental measurements [13]. Further accelerated TDDB measurements provided in [14] verify an endurance greater than 10<sup>16</sup> write cycles for MTJ devices. Nonetheless, if a percentage of MTJs become dysfunctional due to structural stress-related failures, then the reconfiguration capability of the reconfigurable fabrics can be invoked [15].

Fig. 1. (a) MTJ state changes from P to AP due to the positive current  $I_{MTJ} > I_{c+}$  condition, rather than a negative current  $I_{MTJ} > |I_{c-}|$  condition, where  $|I_{c-}| > I_{c+}$  and (b) MTJ resistance hysteresis curve.

| Designs                    | Device | Write/Read Operation | Features and Challenges                                                 |
|----------------------------|--------|----------------------|-------------------------------------------------------------------------|
| FIMS-LUT [7]               | MTJ    | Magnetic Field/TMR   | High Speed<br>High Power Consumption<br>High Area Overhead              |
| TAS-LUT [8]                | MTJ    | Magnetic Field/TMR   | Relatively High Speed<br>High Power Consumption<br>Medium Area Overhead |
| DW-LUT [6]                 | DW     | DWM/TMR              | High Speed<br>Medium Power Consumption<br>Low Area Overhead             |
| STT-LUT (developed herein) | MTJ    | STT/TMR              | High Speed<br>Low Power Consumption<br>Low Area Overhead                |
| A-LUT (developed herein)   | MTJ    | STT/TMR              | High Speed<br>Scalable Power Consumption<br>Low Area Overhead           |

Table I: Characteristics of LUT designs in related works.

# B. Spin-based Reconfigurable Fabrics

Although reconfigurable fabrics have shown great advantages over general purpose processors, they still occupy a niche market share of Application Specific Integrated Circuits (ASICs) due to *high standby energy*, *low logic density*, and *high static power* which are caused by configuration storage based on SRAM. We wish to achieve SRAM's fast data access that provides reconfigurability and fast computing speed, while using emerging devices to overcome the mentioned crucial drawbacks.

STT-RAM could be a promising alternative to SRAM in reconfigurable fabrics. It is a non-volatile and scalable memory cell with near-zero leakage power which shows advantages in read speed and energy. Despite these potential advantages, the limitations of the STT switching approach revolve around its relatively low speed and high power operation. To mitigate these drawbacks, techniques have been proposed such as enlarging the write circuit transistors, which is investigated in [16]. Moreover, STT can experience occasional read/write disturbances due to a common read/write path [4], which may require mitigation.

The Look-Up Table (LUT) is the basic element of reconfigurable computing circuits. It contains a  $1 \times 2^m$  bit memory to implement a Boolean logic function with *m* inputs. Due to the aforementioned drawbacks of SRAM, spin-based LUTs have attracted researchers' attention in recent years. In [7] and [8], MTJ-based LUTs are proposed in which the FIMS and TAS approaches are employed, respectively, to change MTJ logic states. Furthermore, a DW-based LUT is proposed in [6] that leverages DW Motion (DWM) to determine the logic function implemented by the LUT, and which consumes less reconfiguration power than the mentioned FIMS-LUT and TAS-LUT. However, the current-mode behavior of the DW-LUT results in a higher power consumption compared to the STT-LUT. A qualitative comparison between the mentioned spin-based LUTs is provided in Table I, which exhibits the superiority of the STT-LUT and A-LUT in terms of reconfiguration speed, power consumption, and area overhead.

The feasibility of the MTJ-based LUT in the light of MTJ stability and transport experimental data has been demonstrated by Suzuki et al. in [17] wherein a nonvolatile FPGA is fabricated using 6-input MTJ-based LUTs in 90nm CMOS and 75nm perpendicular MTJ technologies with 5 metal layers. They have reported 56% and 81% reduction in effective area and total average power, respectively, compared to the SRAM-FPGA.



Fig. 2: 4-input STT-LUT functional diagram.

### III. PROPOSED ADAPTIVE STT-LUT

#### A. Design and Analysis of Non-Adaptive STT-LUT

In this section, a 4-input STT-LUT is introduced which consists of read and write circuits, as shown in Fig. 2. The write circuit includes two transmission gates (TGs) which provide the desired charge current for STT switching, while the read circuit is comprised of a pre-charge Sense Amplifier (SA) [18], a TG-based Multiplexer (MUX), and a reference tree. Each MTJ cell of the LUT can be accessed according to the input signals, A, B, C, and D, through the MUX, which employs TGs instead of Pass Transistors (PTs). TGs have near-optimal full-swing switching behavior, which results in less delay. In addition, TG-based circuits are more resilient to process variation compared to PT-based designs [19].

The reference tree in the read circuit is designed to provide the SA with the reference resistance required to properly sense each MTJ cell state. The reference tree consists of four TGs in series configuration to compensate for the select tree active resistance. The reference MTJ resistance is designed such that its value in parallel configuration lies between the low resistance,  $R_P$ , and the high resistance,  $R_{AP}$ , of the LUT MTJ cells, as shown in the following relation:  $R_{P-reference MTJ} \cong \frac{1}{2}(R_{AP-LUT MTJ} + R_{P-LUT MTJ})$ .

According to Equation (2), the resistance of an MTJ can be altered by changing the oxide barrier thickness,  $t_{ox}$ , or the *Area*. The oxide thickness can only be changed between 0.7nm and 2.5nm in order to keep the resistance low and still exhibit the TMR effect. Additionally, as established in [20], fabricating MTJs with various oxide thicknesses requires different magnetic processes, which leads to a significant increase in fabrication cost. Thus, in this work the other effective factor, area, is used to set the desired value of the reference MTJ resistance. The dimensions of the LUT and reference MTJ cells are shown in Fig. 3, according to which  $R_{P-reference MTJ}$ ,  $R_{AP-LUT MTJ}$  and  $R_{P-LUT MTJ}$  are equal to  $1.8k\Omega$ ,  $2.5k\Omega$ , and  $1.25k\Omega$ , respectively.

The proposed design is simulated for LUTs with different numbers of inputs using a SPICE simulator with a 90nm library. Delay and power consumption results are summarized in Table II. As can be seen from the table, the power and delay of the STT-LUT are larger when the MTJ state is zero, due to the inequality shown in Equation (4), which results in a longer time required for the SA to be completely discharged.

$$R_{AP-LUT MTJ} - R_{P-reference MTJ} > R_{P-reference MTJ} - R_{P-LUT MTJ}$$
(4)
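As a quick arithmetic check, the two sides of (4) can be evaluated with the resistance values quoted above for Fig. 3. This is only a sketch of the arithmetic; the variable names are illustrative and do not come from the original design files.

```python
# Resistances in ohms, as quoted in the text for the cells of Fig. 3.
R_P_REFERENCE = 1.8e3   # reference MTJ, parallel state
R_AP_LUT = 2.5e3        # LUT MTJ, anti-parallel state
R_P_LUT = 1.25e3        # LUT MTJ, parallel state

# Left and right sides of inequality (4).
margin_ap = R_AP_LUT - R_P_REFERENCE
margin_p = R_P_REFERENCE - R_P_LUT

# Exact midpoint of the two LUT MTJ resistances, for comparison with
# the approximate design target for the reference MTJ.
midpoint = 0.5 * (R_AP_LUT + R_P_LUT)

print(f"R_AP-LUT - R_P-ref = {margin_ap:.0f} ohm")
print(f"R_P-ref - R_P-LUT  = {margin_p:.0f} ohm")
print(f"midpoint of LUT MTJ resistances = {midpoint:.0f} ohm")
print("inequality (4) holds:", margin_ap > margin_p)
```

The reference value of  $1.8k\Omega$  sits slightly below the exact midpoint, which is why the two margins differ.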



Fig. 3: Reference MTJ cell and LUT MTJ cell dimensions.

| Number of LUT inputs | Power consumption (µW), LUT MTJ state = "0" | Delay (ps), LUT MTJ state = "0" | Power consumption (µW), LUT MTJ state = "1" | Delay (ps), LUT MTJ state = "1" |
|----------------------|---------------------------------------------|---------------------------------|---------------------------------------------|---------------------------------|
| 2                    | 3.39                                        | 62                              | 3.35                                        | 52                              |
| 3                    | 3.87                                        | 71                              | 3.79                                        | 60                              |
| 4                    | 4.27                                        | 83                              | 4.26                                        | 69                              |
| 5                    | 4.70                                        | 96                              | 4.66                                        | 76                              |
| 6                    | 5.14                                        | 108                             | 5.12                                        | 86                              |

Table II: STT-LUT power and delay analysis for various input widths.

In [21], the first prototype of a two-input MTJ-based LUT is simulated. It contains four MTJs to store data, and a separate SA and write circuit for each MTJ, which lead to significant area overhead and power consumption. In [16], Suzuki et al. have proposed an optimized STT-MTJ based LUT. They reported a 44% reduction in active power for a 4-input XOR operation compared to the LUT designed in [21]. They employed a single SA for the whole LUT circuit instead of one for each memory cell, which results in area and active power reduction.

In this paper, the developed STT-LUT circuit is implemented utilizing both PTs and TGs. The performance of our STT-LUT implementations is compared with the SRAM-LUT [22] and the two above-mentioned MTJ-based LUTs, and summarized in Table III and Table IV, respectively. The STT-LUT provides high-speed and ultra-low power circuits with improved Power-Delay Product (PDP) values, shown in the PDP row of Table IV. Furthermore, the TG-based STT-LUT exhibits the lowest PDP value but uses a larger number of MOS transistors than the PT-based STT-LUT, which is the optimum choice from the area-efficiency point of view.

| Features                       | SRAM LUT [22] | PT based STT-LUT | TG based STT-LUT |
|--------------------------------|---------------|------------------|------------------|
| Area (µm × µm)                 | 14.3×16.55    | 7.2×8.35         | 13.5×15.75       |
| Delay (ps)                     | 85.86         | 94               | 83               |
| Dynamic Power Consumption (µW) | 1.217         | 4.3              | 4.27             |
| Leakage Power Consumption (µW) | 1.030         | 0                | 0                |
| Total Power Consumption (µW)   | 2.247         | 4.3              | 4.27             |

Table III: Performance comparison between STT-LUT and SRAM-LUT for 4-input NAND operation.

Table IV: Performance comparison for 4-input STT-LUT.

| Features                  | Zhao et al. [21] | Suzuki et al. [16] | PT based STT-LUT | TG based STT-LUT |
|---------------------------|------------------|--------------------|------------------|------------------|
| No. of MTJs               | 32               | 36                 | 17               | 17               |
| No. of MOSs               | 154              | 74                 | 59               | 112              |
| Delay (ps)                | 88               | 81                 | 94               | 83               |
| Active Power (µW)         | 13.4             | 7.58               | 4.3              | 4.27             |
| PDP (ps×µW)               | 1179.2           | 613.98             | 404.2            | 354.41           |
| Standby Power             | 0                | 0                  | 0                | 0                |
| PDP Improvement over [21] | -                | 48%                | 65.7%            | 70%              |
| PDP Improvement over [16] | -                | -                  | 34%              | 42%              |
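The PDP and PDP-improvement rows of Table IV follow directly from the delay and active power rows. The sketch below recomputes them, assuming that improvement is defined as the relative PDP reduction with respect to the baseline design. The helper names `pdp` and `improvement_percent` are illustrative only.

```python
# (delay in ps, active power in µW) for each design, copied from Table IV.
designs = {
    "Zhao et al. [21]": (88, 13.4),
    "Suzuki et al. [16]": (81, 7.58),
    "PT based STT-LUT": (94, 4.3),
    "TG based STT-LUT": (83, 4.27),
}

def pdp(delay_ps, power_uw):
    """Power-Delay Product in ps×µW."""
    return delay_ps * power_uw

def improvement_percent(baseline_pdp, new_pdp):
    """Relative PDP reduction versus a baseline, in percent (assumed definition)."""
    return 100.0 * (1.0 - new_pdp / baseline_pdp)

pdps = {name: pdp(*values) for name, values in designs.items()}

for name, value in pdps.items():
    vs_21 = improvement_percent(pdps["Zhao et al. [21]"], value)
    vs_16 = improvement_percent(pdps["Suzuki et al. [16]"], value)
    print(f"{name:20s} PDP = {value:8.2f}  vs [21]: {vs_21:5.1f}%  vs [16]: {vs_16:5.1f}%")
```

Under this definition the recomputed values round to the percentages reported in the table.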



Fig. 4: PDP growth of STT-LUT in terms of input widths.



Fig. 5: Circuit view of A-LUT schematic.

#### B. Proposed Adaptive STT-LUT (A-LUT)

In order to evaluate the scalability of the STT-LUT circuit, PDP values are calculated for 2-input to 6-input STT-LUTs, considering the worst-case scenario, i.e. when the MTJ state is zero. Figure 4 shows that PDP grows linearly with the number of LUT inputs with a low slope, which validates the STT-LUT scalability. This capability led to the proposition of a 4-input A-LUT, as shown in Fig. 5.
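The worst-case PDP values behind this trend can be recomputed from the MTJ state "0" columns of Table II. The sketch below does so and fits a least-squares line as one rough way to quantify the slope. The fit is an illustration added here, not a procedure described in the paper, and the function names are hypothetical.

```python
# Worst-case (LUT MTJ state "0") power in µW and delay in ps, copied from Table II.
table_ii = {2: (3.39, 62), 3: (3.87, 71), 4: (4.27, 83), 5: (4.70, 96), 6: (5.14, 108)}

def pdp(power_uw, delay_ps):
    """Power-Delay Product in ps×µW."""
    return power_uw * delay_ps

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

inputs = sorted(table_ii)
pdps = [pdp(*table_ii[n]) for n in inputs]
slope, intercept = least_squares_line(inputs, pdps)

for n, value in zip(inputs, pdps):
    print(f"{n}-input STT-LUT: PDP = {value:.2f} ps×µW")
print(f"fitted slope: {slope:.1f} ps×µW per additional input")
```

The 4-input entry reproduces the TG-based STT-LUT PDP listed in Table IV.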

The proposed 4-input A-LUT can be configured to operate as different LUTs in seven independent modes: four 2-input STT-LUTs, two 3-input STT-LUTs, and one 4-input STT-LUT. The output of each configuration is individually connected to the SA through a mode selector, which includes PTs to choose between the different operational modes described in Table V. For example, bitstream = 10'h104 configures the A-LUT to operate as a 2-input STT-LUT based on the logic function stored in MTJ4 to MTJ7.

The optimal reference tree resistance for an *n*-input STT-LUT,  $R_{Reference Tree}$ , is approximately equal to the average of the maximum and minimum resistances of the LUT MUX,  $R_{MUX,max}$  and  $R_{MUX,min}$ , as shown in Equation (5).  $R_{MUX,max}$  and  $R_{MUX,min}$  are equal to the active resistance of *n* TGs in series,  $n.R_{TG}$ , added to the LUT MTJ high resistance,  $R_{AP-LUT MTJ}$ , and low resistance,  $R_{P-LUT MTJ}$ , respectively.

$$R_{MUX,max} = n. R_{TG} + R_{AP-LUT MTJ}$$

$$R_{MUX,min} = n. R_{TG} + R_{P-LUT MTJ}$$

$$R_{Reference Tree} \cong \frac{1}{2} \left( R_{MUX,max} + R_{MUX,min} \right)$$
(5)
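As a worked step, and assuming the reference MTJ satisfies the relation  $R_{P-reference MTJ} \cong \frac{1}{2}(R_{AP-LUT MTJ} + R_{P-LUT MTJ})$  given in Section III-A, substituting the two MUX resistances into (5) can be sketched as:

$$R_{Reference Tree} \cong \frac{1}{2}\left[\left(n R_{TG} + R_{AP-LUT MTJ}\right) + \left(n R_{TG} + R_{P-LUT MTJ}\right)\right] = n R_{TG} + \frac{1}{2}\left(R_{AP-LUT MTJ} + R_{P-LUT MTJ}\right) \cong n R_{TG} + R_{P-reference MTJ}$$

The TG term and the MTJ term separate, so only the first term depends on *n*.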

|        | S21 | S22 | S23 | S24 | S31 | S32 | S4 | RS2 | RS3 | RS4 | bitstream | MTJ usage | Description     |
|--------|-----|-----|-----|-----|-----|-----|----|-----|-----|-----|-----------|-----------|-----------------|
| mode 0 | 1   | 0   | 0   | 0   | 0   | 0   | 0  | 1   | 0   | 0   | 10'h204   | 0-3       | 2-input STT-LUT |
| mode 1 | 0   | 1   | 0   | 0   | 0   | 0   | 0  | 1   | 0   | 0   | 10'h104   | 4-7       | 2-input STT-LUT |
| mode 2 | 0   | 0   | 1   | 0   | 0   | 0   | 0  | 1   | 0   | 0   | 10'h84    | 8-11      | 2-input STT-LUT |
| mode 3 | 0   | 0   | 0   | 1   | 0   | 0   | 0  | 1   | 0   | 0   | 10'h44    | 12-15     | 2-input STT-LUT |
| mode 4 | 0   | 0   | 0   | 0   | 1   | 0   | 0  | 0   | 1   | 0   | 10'h22    | 0-7       | 3-input STT-LUT |
| mode 5 | 0   | 0   | 0   | 0   | 0   | 1   | 0  | 0   | 1   | 0   | 10'h12    | 8-15      | 3-input STT-LUT |
| mode 6 | 0   | 0   | 0   | 0   | 0   | 0   | 1  | 0   | 0   | 1   | 10'h9     | 0-15      | 4-input STT-LUT |

Table V: Configuration specifications and MTJ usage for 2-input through 4-input LUT organization.
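The bitstream column of Table V can be reproduced by packing the ten select signals into one word. The sketch below assumes that S21 is the most significant bit and RS4 the least significant bit. This ordering is inferred from the table, since it reproduces every listed bitstream, and is not stated explicitly in the text. The function names are hypothetical.

```python
# Select signals in the column order of Table V; S21 is assumed to be the MSB.
SIGNALS = ["S21", "S22", "S23", "S24", "S31", "S32", "S4", "RS2", "RS3", "RS4"]

# Signals driven to 1 in each mode, copied from Table V.
MODES = {
    0: {"S21", "RS2"},
    1: {"S22", "RS2"},
    2: {"S23", "RS2"},
    3: {"S24", "RS2"},
    4: {"S31", "RS3"},
    5: {"S32", "RS3"},
    6: {"S4", "RS4"},
}

def encode_bitstream(active_signals):
    """Pack the active select signals into a 10-bit integer, MSB first."""
    value = 0
    for name in SIGNALS:
        value = (value << 1) | (1 if name in active_signals else 0)
    return value

def as_sized_hex(value):
    """Format the word in the 10'h... notation used in Table V."""
    return f"10'h{value:x}"

for mode, active in MODES.items():
    print(f"mode {mode}: {as_sized_hex(encode_bitstream(active))}")
```

Each mode asserts exactly one select signal and one reference-tree signal, so every bitstream contains exactly two set bits.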

Rewriting (5) in terms of  $R_{P-reference MTJ}$  shows that the reference tree of an *n*-input STT-LUT can be implemented by *n* TGs and a reference MTJ in series configuration, which provides a resistance equal to  $R_{Reference Tree} = n.R_{TG} + R_{P-reference MTJ}$ . Thus, changing the number of LUT inputs only affects the number of TGs that must be used in the reference tree, and no modification to the dimensions of the reference MTJ is needed to keep the optimized sensing behavior of the SA. Hence, Equation (5) is employed to design the A-LUT reference tree, which includes three different branches in parallel configuration that are serially connected to a single MTJ, as shown in Fig. 5. The branches contain two, three, and four TGs, which are used for the 2-input, 3-input, and 4-input A-LUT configurations, respectively. Figure 6 shows the layout of the A-LUT, which occupies a cell area of 13.5  $\mu$m × 15.75  $\mu$m in a 90nm process. A five-metal-layer design is depicted. The MTJ cell has a vertical structure which can be readily integrated at the back-end process of CMOS fabrication.

# IV. RESULTS AND ANALYSIS

The proposed A-LUT is examined using SPICE simulation in 90nm technology. Figure 7 illustrates the functionality of the proposed A-LUT for a 4-input NAND operation when the inputs ABCD = "1111" and ABCD = "0000" are applied. The former set of inputs selects MTJ15, which has a parallel configuration that denotes logic "0", while the latter selects MTJ0, which has an anti-parallel configuration representing logic "1". Herein, the mode selector's bitstream is equal to 10'h9, which selects mode 6, i.e. the A-LUT functioning as a 4-input STT-LUT.

Fig. 6: 13.5  $\mu$m×15.75  $\mu$m 4-input A-LUT layout.

Fig. 7: Transient response of A-LUT for 4-input NAND operation for *ABCD*= "1111" (top), and *ABCD*= "0000" (middle).

| Boolean function inputs | 8-input STT-LUT | 8-input A-LUT |
|-------------------------|-----------------|---------------|
| 2                       | 819.72          | 269.8         |
| 3                       | 819.72          | 353.58        |
| 4                       | 819.72          | 449.35        |
| 5                       | 819.72          | 549.98        |
| 6                       | 819.72          | 673.97        |
| 7                       | 819.72          | 798.64        |
| 8                       | 819.72          | 926.1         |

Table VI: PDP values for STT-LUT and A-LUT designs (ps×µW).

Herein, a comprehensive PDP analysis is performed to evaluate the performance of the A-LUT. To this end, an 8-input A-LUT and an 8-input STT-LUT are examined while implementing 2-input to 8-input Boolean logic functions. The PDP results are extracted for a worst-case NAND operation utilizing a 1.2V nominal voltage (VDD) and a 1GHz circuit clock (CLK) frequency, as listed in Table VI. Generally, an *n*-input A-LUT PDP is smaller than the *n*-input STT-LUT PDP when performing 2-input to (*n*-1)-input Boolean functions.
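The comparison in Table VI can be expressed as a ratio of the A-LUT PDP to the fixed 8-input STT-LUT PDP. The following sketch computes that ratio for each function width using only the tabulated values. The names are illustrative.

```python
# PDP values in ps×µW, copied from Table VI.
STT_LUT_PDP = 819.72  # 8-input STT-LUT, same value for every function width
A_LUT_PDP = {2: 269.8, 3: 353.58, 4: 449.35, 5: 549.98, 6: 673.97, 7: 798.64, 8: 926.1}

def pdp_ratio(function_inputs):
    """A-LUT PDP as a fraction of the 8-input STT-LUT PDP."""
    return A_LUT_PDP[function_inputs] / STT_LUT_PDP

for k in sorted(A_LUT_PDP):
    verdict = "lower" if A_LUT_PDP[k] < STT_LUT_PDP else "higher"
    print(f"{k}-input function: A-LUT/STT-LUT PDP = {pdp_ratio(k):.2f} ({verdict})")
```

For 2-input to 7-input functions the ratio stays below one, while the full 8-input mode shows a ratio above one in the tabulated data.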

# V. CONCLUSION

In this paper, we first developed a novel, non-volatile, high-speed, and ultra-low power 4-input STT-LUT. A reference MTJ with different dimensions from the LUT MTJs was utilized to provide the reference resistance required for achieving the optimum sensing behavior of the SA while maintaining area and power efficiency. The proposed STT-LUT achieved over 40% PDP improvement compared to the most performance-efficient previous designs. Our TG-based STT-LUT exhibited a linear relation between PDP and the number of LUT inputs, which verifies its scalability. Hence, we proposed a 4-input A-LUT with adaptive functionality which can be configured to function in seven independent modes. PDP results and analysis of the proposed A-LUT showed its performance superiority in addition to its functional flexibility. The proposed adaptive LUT can be generalized to an *n*-input A-LUT with a prominent performance improvement for implementing 2-input to (*n*-1)-input logic functions.

#### REFERENCES

- [1] D. E. Nikonov and I. A. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking," *Proceedings of the IEEE*, vol. 101, pp. 2498-2533, 2013.
- [2] J. Kim, A. Paul, P. A. Crowell, S. J. Koester, S. S. Sapatnekar, J.-P. Wang, *et al.*, "Spin-based computing: Device concepts, current status, and a case study on a high-performance microprocessor," *Proceedings of the IEEE*, vol. 103, pp. 106-130, 2015.
- [3] M. Sharad, D. Fan, K. Aitken and K. Roy, "Energy-Efficient Non-Boolean Computing With Spin Neurons and Resistive Memory," in *IEEE Transactions on Nanotechnology*, vol. 13, no. 1, pp. 23-34, Jan. 2014.
- [4] A. Vatankhahghadim, W. Song and A. Sheikholeslami, "A Variation-Tolerant MRAM-Backed-SRAM Cell for a Nonvolatile Dynamically Reconfigurable FPGA," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 62, no. 6, pp. 573-577, June 2015.
- [5] K. Huang, Y. Ha, R. Zhao, A. Kumar, and Y. Lian, "A Low Active Leakage and High Reliability Phase Change Memory (PCM) Based Non-Volatile FPGA Storage Element," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, pp. 2605-2613, 2014.
- [6] W. Zhao, D. Ravelosona, J. Klein, and C. Chappert, "Domain wall shift register-based reconfigurable logic," *Magnetics, IEEE Transactions on*, vol. 47, pp. 2966-2969, 2011.
- [7] W. Zhao, E. Belhaire, V. Javerliac, C. Chappert and B. Dieny, "Evaluation of a Non-Volatile FPGA based on MRAM technology," *IEEE International Conference on IC Design and Technology*, Padova, 2006, pp. 1-4.
- [8] W. Zhao, et al., "TAS-MRAM-based low-power high-speed runtime reconfiguration (RTR) FPGA," ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2009.
- [9] B. Behin-Aein, J.-P. Wang, and R. Wiesendanger, "Computing with spins and magnets," *MRS Bulletin*, vol. 39, pp. 696-702, 2014.
- [10] W. Kang, Y. Zhang, Z. Wang, J.-O. Klein, C. Chappert, D. Ravelosona, et al., "Spintronics: Emerging ultra-low-power circuits and systems beyond MOS technology," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 12, p. 16, 2015.
- [11] J. C. Slonczewski, "Current-driven excitation of magnetic multilayers," *Journal of Magnetism and Magnetic Materials*, vol. 159, pp. L1-L7, 1996.
- [12] Z. Xu, C. Yang, M. Mao, K. B. Sutaria, C. Chakrabarti, and Y. Cao, "Compact modeling of STT-MTJ devices," Solid-State Electronics, vol. 102, pp. 76-81, 2014.
- [13] C. Yoshida et al., "Reliability of MgO Tunneling Barrier for MRAM Device," 2006 IEEE International Reliability Physics Symposium Proceedings, San Jose, CA, 2006, pp. 697-698.
- [14] P. Khalili Amiri et al., "Low Write-Energy Magnetic Tunnel Junctions for High-Speed Spin-Transfer-Torque MRAM," in *IEEE Electron Device Letters*, vol. 32, no. 1, pp. 57-59, Jan. 2011.
- [15] N. Imran, J. Lee and R. F. DeMara, "Fault Demotion Using Reconfigurable Slack (FaDReS)," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 7, pp. 1364-1368, July 2013.
- [16] D. Suzuki, M. Natsui, and T. Hanyu, "Area-efficient LUT circuit design based on asymmetry of MTJ's current switching for a nonvolatile FPGA," in *Circuits and Systems (MWSCAS)*, 2012 IEEE 55th International Midwest Symposium on, 2012, pp. 334-337.
- [17] D. Suzuki, M. Natsui, A. Mochizuki, S. Miura, H. Honjo, H. Sato, et al., "Fabrication of a 3000-6-input-LUTs embedded and block-level power-gated nonvolatile FPGA chip using p-MTJ-based logic-in-memory structure," in VLSI Circuits (VLSI Circuits), 2015 Symposium on, 2015, pp. C172-C173.
- [18] W. Zhao, C. Chappert, V. Javerliac, and J.-P. Nozière, "High speed, high stability and low power sensing amplifier for MTJ/CMOS hybrid logic circuits," *Magnetics, IEEE Transactions on*, vol. 45, pp. 3784-3787, 2009.
- [19] A. Alzahrani and R. F. DeMara, "Process variation immunity of alternative 16nm HK/MG-based FPGA logic blocks," 2015 IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS), Fort Collins, CO, 2015, pp. 1-4.
- [20] W. Zhao, E. Belhaire, C. Chappert, and P. Mazoyer, "Spin transfer torque (STT)-MRAM--based runtime reconfiguration FPGA circuit," ACM Transactions on Embedded Computing Systems (TECS), vol. 9, p. 14, 2009.
- [21] W. Zhao, E. Belhaire, C. Chappert, F. Jacquet, and P. Mazoyer, "New non-volatile logic based on spin-MTJ," physica status solidi (a), vol. 205, pp. 1373-1377, 2008.
- [22] Y. Zhou, S. Thekkel, and S. Bhunia, "Low power FPGA design using hybrid CMOS-NEMS approach," in *Proceedings of the 2007 international symposium on Low power electronics and design*, 2007, pp. 14-19.