# Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology

Soheil Salehi

Department of Electrical and Computer Engineering, University of Central Florida, Orlando, Florida 32816-2362. Email: soheil.salehi@knights.ucf.edu

Ronald F. DeMara

Department of Electrical and Computer Engineering, University of Central Florida, Orlando, Florida 32816-2362. Email: demara@mail.ucf.edu

Abstract—The continuous increase in transistor density based on Moore's Law has led us to Complementary Metal-Oxide Semiconductor (CMOS) technologies beyond the 45nm process node. These highly scaled process technologies offer improved density as well as a reduction in nominal supply voltage. New challenges also arise, such as the growing relative proportion of leakage power in standby mode. In this paper, we compare 45nm and 15nm technologies in terms of power consumption and cell area. For this purpose, an IEEE 754 single precision floating-point unit implementation is analyzed in both technologies. The results show that the 15nm technology consumes about 4 times less energy and occupies a 3-fold smaller footprint.

Keywords—IEEE 754, Floating-Point, CMOS technology, energy aware design, Predictive Technology Model, 15nm process technology.

## I. INTRODUCTION

Power density and area have always been two important challenges for CMOS devices and designers [1]. As the trends enabled by Moore's Law allow the technology to shrink and the level of integration to increase, both benefits and challenges arise. One of the most promising device technologies for extending Moore's Law to 20nm and beyond is the self-aligned double-gate MOSFET structure (FinFET). FinFET transistors offer solutions to conventional planar CMOS issues such as sub-threshold leakage, poor short-channel electrostatic behavior, and high device variability. Furthermore, their ability to operate at a much lower supply voltage results in static and dynamic power savings [2]. Although issues such as process variation [3-4], aging, and bias temperature and threshold voltage instability [5-8] can become more significant at higher levels of integration, the capability of computing devices is greatly increased while their cost is decreased. In particular, scaling down the transistor size reduces the overall footprint of the device and accommodates a lower supply voltage, which yields a better dynamic power profile. Floating-point computation can represent a large portion of the power consumed by CPUs performing video processing and high-performance scientific computation, and floating-point hardware is a significant area component of most processors. In this research, we implement an IEEE 754 floating-point unit [10] in 15nm technology [9] in order to assess the relative advantages of the 15nm technology over 45nm technology [11]. First, in Section II we introduce related work on energy-aware techniques to improve the performance of floating-point units. We also review work on reducing power consumption and area using inexact arithmetic units. Next, in Section III we introduce the design of the IEEE 754 floating-point unit that we used as a case study. In Section IV, power, voltage, and technology relationships are discussed. In Section V, the simulation environment and the technology libraries used in our research are described. In Section VI, the experimental results are presented, and finally we conclude the work in Section VII.

## II. RELATED WORK

There are several techniques that are used to minimize the energy of CMOS logic devices for computation. Three main approaches are commonly used for energy reduction, as shown in Figure 1: 1) optimizing one or more steps of the computation procedure, 2) lowering the nominal supply voltage, and 3) allowing approximate arithmetic in applications that can tolerate reduced accuracy. In this paper, we concentrate on lowering the nominal voltage, which can be realized through improvements in process technology at the 15nm node. Alternative techniques based on near-threshold voltage operation are also possible, but they introduce significant switching delay in return for reduced energy. The primary emphasis of this paper is to examine the 15nm process technology, which allows a lower nominal supply voltage and thereby reduces energy consumption, along with a smaller cell area.



Figure 1: Energy Aware Techniques for FPU Design.

These three techniques can also be synergistic. For example, [12] proposes minimizing the bit-width of the floating-point representation for low-resolution sensory data, which results in a 66% reduction in multiplier energy. In [13], a method is proposed for improving the energy efficiency of a floating-point multiplier by partially truncating the mantissa computation and by allowing the mantissa bit-widths of the multiplicand, multiplier, and output product to be changed dynamically across floating-point computations. Some voltage scaling techniques to reduce energy consumption are presented in [14]. To minimize the power and energy consumption of digital systems implemented in CMOS, the supply voltage can be reduced to near the threshold voltage; this slows the logic, but the performance penalty is small compared to operation in the subthreshold region. Furthermore, [15] discusses the benefits and challenges of near-threshold voltage operation and its applications. Approximate computing is another concept that has recently been used frequently to reduce the energy, power, and area of CMOS devices. The results in [16] show that approximate computing reduces energy and area. Further, approximate or inexact computing allows tradeoffs between energy, performance, and area while introducing a perceptually tolerable level of error for some applications [17-18]. Using a new process technology is the most direct way to reduce the supply voltage without sacrificing speed, and it still increases the energy efficiency of CMOS switching devices. This technique is discussed in [19], which analyzes a floating-point unit in 90nm, 45nm, and 22nm technologies. Furthermore, Swaminathan et al. [20] investigated the switching time and energy consumption of a 32-bit CMOS full adder circuit at the 15nm node, for which the authors created their own cell library. Other 15nm arithmetic designs are still emerging in the literature at this time. In this paper, our main concept is to use a new technology that has a lower supply voltage and can make our circuit more energy efficient. We show an area reduction of about 3-fold and a 3-fold to 4-fold reduction in energy consumption in 15nm technology.

## III. IEEE 754 SINGLE PRECISION FLOATING-POINT UNIT

IEEE 754 is a well-known standard for floating-point arithmetic that is widely used in processors. Details about the standard can be found in [10], and we use it as our case study. Numbers in this standard are represented using a sign bit, an exponent, and a significand. The standard defines basic and extended formats; among the basic formats, a 32-bit floating-point number is in single precision format and a 64-bit number is in double precision format. IEEE 754 supports operations such as addition, subtraction, multiplication, division, square root, remainder, comparisons, and conversions between integer and floating-point formats. During arithmetic operations, events such as overflow, underflow, division by zero, or an invalid operation whose result is Not-a-Number (NaN) can occur, and each must be handled as an exception. After every floating-point operation, the result must also be rounded so that it fits the destination format as specified in [10].
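As an illustration of the single precision layout described above, the following minimal sketch splits a value into its sign, exponent, and fraction fields using Python's standard `struct` module. The function names `decode_float32` and `classify` are hypothetical helpers introduced here for illustration; they are not part of the FPU in [21].

```python
import struct


def decode_float32(x: float) -> dict:
    """Round x to IEEE 754 single precision and split it into its bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the fields can be masked out.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return {
        "sign": bits >> 31,                # 1 bit
        "exponent": (bits >> 23) & 0xFF,   # 8 bits, biased by 127
        "fraction": bits & 0x7FFFFF,       # 23 stored bits of the significand
    }


def classify(fields: dict) -> str:
    """Name the class of number encoded by the exponent and fraction fields."""
    exponent, fraction = fields["exponent"], fields["fraction"]
    if exponent == 0xFF:
        return "nan" if fraction else "infinity"
    if exponent == 0:
        return "subnormal" if fraction else "zero"
    return "normal"


fields = decode_float32(-6.5)  # -6.5 = -1.625 * 2**2
print(fields, classify(fields))
```

For normal numbers the significand has an implicit leading 1, so the 23 stored fraction bits give 24 bits of precision.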

In this study, we used a single precision Floating-Point Unit (FPU) [21] that is fully IEEE 754 compliant and can accept a new floating-point operation every cycle. It internally latches the operation type, rounding mode, and operands, and delivers the result after 4 clock cycles. The unit asserts the Signaling NaN (**SNAN**) flag only if operand **a** or operand **b** is a signaling NaN, in which case the output is a quiet NaN. It uses two pre-normalization units, one for addition and subtraction and another for multiplication and division, to adjust the exponents and mantissas. A post-normalization block normalizes the fraction of the output and then rounds it. Finally, the result is provided in single precision floating-point format. The FPU block diagram is shown in Figure 2.



Figure 2: FPU Functional Elements.
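The timing behavior described for this FPU (one operation accepted per cycle, result available after 4 cycles) can be sketched as follows. This is a simplified model that assumes a fully pipelined unit with no stalls; the function names and the 100-operation workload are hypothetical, and the 5 ns clock period is the one used for the 45nm experiments.

```python
def total_cycles(num_ops: int, latency: int = 4) -> int:
    """Cycles until the last result is available, issuing one operation per cycle."""
    if num_ops <= 0:
        return 0
    # The first result appears after `latency` cycles; every later result
    # follows one cycle after the previous one.
    return latency + (num_ops - 1)


def total_time_ns(num_ops: int, clock_period_ns: float, latency: int = 4) -> float:
    """Wall-clock time for a burst of operations at a given clock period."""
    return total_cycles(num_ops, latency) * clock_period_ns


# Hypothetical burst of 100 back-to-back operations at a 5 ns clock period.
print(total_cycles(100), "cycles,", total_time_ns(100, 5.0), "ns")
```

Under these assumptions, throughput approaches one result per clock period for long bursts, while the 4-cycle latency only matters for short ones.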

## IV. POWER, VOLTAGE AND TECHNOLOGY RELATIONSHIPS

Power consumption is an important metric of CMOS device performance. Power analysis lets us determine important factors such as power-supply sizing, current requirements, criteria for device selection, and the maximum reliable operating frequency. As shown in (1), the total power of a CMOS device is determined by two main components, dynamic power and static power:

$$P_{Total} = P_{Dynamic} + P_{Static} \tag{1}$$

CMOS static power consumption is a result of the leakage current that flows while the transistor is off. In general, static power consumption is the product of the device leakage current and the supply voltage, as shown in (2). Dynamic power consumption, in contrast, has a significant impact on the total power when the device's operating frequency is high, and it also grows with the capacitive load that must be charged and discharged. Dynamic power consists of two components, 1) signal transition power (transient power) and 2) short circuit power, as shown in (3), where $P_T$ and $P_{SC}$ stand for transient power and short circuit power, respectively.

$$P_{Static} = I_{Static} \times V_{dd} \tag{2}$$

$$P_{Dynamic} = P_T + P_{SC} \tag{3}$$

Dynamic power is the power consumed by legitimate logic transitions and spurious glitches, both of which result from input transitions. Its first component arises from the switching current required to charge the internal nodes, as shown in (4). Its second component arises from the through current, or short circuit current, that flows from $V_{dd}$ to *GND* when the p-channel and n-channel transistors are briefly on at the same time during a logic transition. The transient power and the short circuit power are given by the following equations:

$$P_T = E_T \times f_{clk} \times \alpha = C_L \times V_{dd}^2 \times f_{clk} \times \alpha \tag{4}$$

$$P_{SC} = E_{SC} \times f_{clk} \times \alpha \tag{5}$$

$$E_{SCf} = (t_f \times (V_{dd} - |V_{Tp}| - V_{Tn}) \times I_{SCmaxf})/2 \tag{6}$$

$$E_{SCr} = (t_r \times (V_{dd} - |V_{Tp}| - V_{Tn}) \times I_{SCmaxr})/2 \tag{7}$$

where $E_T$ is the transient energy, $E_{SC}$ is the short circuit energy, which is related to the rise and fall times of the input signal, $E_{SCr}$ is the rise time short circuit energy, $E_{SCf}$ is the fall time short circuit energy, $t_r$ and $t_f$ are the rise and fall times, $V_{Tp}$ and $V_{Tn}$ are the threshold voltages of the p-channel and n-channel transistors respectively, $I_{SCmaxf}$ and $I_{SCmaxr}$ are the maximum short circuit currents flowing during the fall time and rise time respectively, $f_{clk}$ is the operating frequency, $\alpha$ is the switching activity factor, $C_L$ is the capacitive load, and $V_{dd}$ is the supply voltage [22]. As can be inferred from (5) through (7), the duration of the short circuit current impulse is directly affected by the operating frequency, the rise and fall times, and the internal nodes of the device. The short circuit current that flows through the gate is negligible compared to the switching current when the operating frequency is high.
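To make the relationships in (1) through (7) concrete, the sketch below evaluates them numerically. The function names are hypothetical, and the load capacitance, activity factor, and clock frequency are placeholder values chosen only for illustration; the two supply voltages (1.1 V and 0.8 V) are the global operating voltages of the 45nm and 15nm libraries used in this study.

```python
def static_power(i_static, vdd):
    """Eq. (2): leakage current times supply voltage, in watts."""
    return i_static * vdd


def transient_power(c_load, vdd, f_clk, alpha):
    """Eq. (4): C_L * Vdd^2 * f_clk * alpha, in watts."""
    return c_load * vdd ** 2 * f_clk * alpha


def short_circuit_energy(t_edge, vdd, v_tp, v_tn, i_sc_max):
    """Eqs. (6) and (7): energy of one short circuit pulse, in joules."""
    return t_edge * (vdd - abs(v_tp) - v_tn) * i_sc_max / 2


def short_circuit_power(e_sc, f_clk, alpha):
    """Eq. (5): short circuit energy times frequency and activity factor."""
    return e_sc * f_clk * alpha


def total_power(p_transient, p_short_circuit, p_static):
    """Eqs. (1) and (3): dynamic power (transient + short circuit) plus static."""
    return (p_transient + p_short_circuit) + p_static


# Placeholder operating point (assumed values, not taken from the paper).
C_LOAD = 1e-15   # 1 fF load
F_CLK = 200e6    # 200 MHz, i.e. a 5 ns clock period
ALPHA = 0.1      # switching activity factor

p_t_45 = transient_power(C_LOAD, 1.1, F_CLK, ALPHA)
p_t_15 = transient_power(C_LOAD, 0.8, F_CLK, ALPHA)
voltage_scaling_factor = p_t_15 / p_t_45
print(f"Transient power scales by {voltage_scaling_factor:.3f}")  # 0.529
```

With capacitance, frequency, and activity held fixed, lowering the supply from 1.1 V to 0.8 V alone reduces transient power to about 53% of its original value, because of the quadratic dependence on $V_{dd}$ in (4). Any reduction beyond that would have to come from other factors, such as smaller load capacitance.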

## V. SIMULATION ENVIRONMENT

To compare the two technologies and analyze the FPU design, we used Design Compiler [23], an RTL synthesis tool by Synopsys. We synthesized the FPU circuit using the 45nm and 15nm libraries from NanGate and extracted the results. To use Design Compiler, the circuit is first expressed in a hardware description language and then synthesized into a gate-level netlist built from the library components defined in the technology library file. Figure 3 depicts the flow of gate-level netlist extraction.



Figure 3: Modeling Environment and Synthesis Flow.

## VI. EXPERIMENTAL RESULTS

After RTL synthesis of the FPU with Design Compiler, we extracted information about the resources used in the design. The gates used for the FPU design are listed in Tables 1 through 3. All of them are standard cells defined in the corresponding technology libraries.

Table 1: Constituent Gate Types and Usage Count (Simplex Gates).

| Gate Function | Simplex Gate | Quantity (45nm) | Quantity (15nm) | Gate Area, 45nm (µm²) | Gate Area, 15nm (µm²) |
|---------------|--------------|-----------------|-----------------|-----------------------|-----------------------|
| AND           | AND2_X1      | 38              | 19              | 1.0640                | 0.2949                |
| AND           | AND3_X1      | 6               | 9               | 1.3300                | 0.3932                |
| AND           | AND3_X2      | 0               | 2               | 1.5960                | 0.3932                |
| AND           | AND4_X1      | 2               | 4               | 1.5960                | 0.4424                |
| AND           | AND4_X2      | 0               | 1               | 1.8620                | 0.4915                |
| NAND          | NAND2_X1     | 63              | 120             | 0.7980                | 0.1966                |
| NAND          | NAND2_X2     | 0               | 5               | 1.3300                | 0.2949                |
| NAND          | NAND3_X1     | 26              | 28              | 1.0640                | 0.2949                |
| NAND          | NAND3_X2     | 2               | 2 17            | 1.8620                | 0.4424                |
| NAND          | NAND4_X1     | 29              | 16              | 1.3300                | 0.3441                |
| NAND          | NAND4_X2     | 0               | 1               | 2.3940                | 0.5407                |
| OR            | OR2_X1       | 4               | 7               | 1.0640                | 0.2949                |
| OR            | OR3_X1       | 7               | 7               | 1.3300                | 0.3932                |
| OR            | OR3_X2       | 9               | 1               | 1.5960                | 0.3932                |
| OR            | OR4_X1       | 2               | 3               | 1.5960                | 0.4424                |
| OR            | OR4_X2       | 0               | 2               | 2.3940                | 0.4915                |
| NOR           | NOR2_X1      | 27              | 28              | 0.7980                | 0.1966                |
| NOR           | NOR2_X2      | 0               | 1               | 1.3300                | 0.2949                |
| NOR           | NOR3_X1      | 37              | 23              | 1.0640                | 0.2949                |
| NOR           | NOR4_X1      | 27              | 24              | 1.3300                | 0.3441                |
| NOR           | NOR4_X2      | 0               | 4               | 2.3940                | 0.5407                |
| XNOR          | XNOR2_X1     | 2               | 0               | 1.5960                | 0.4424                |
| BUFFER        | BUF_X1       | 26              | 0               | 0.7980                | 0.2458                |
| BUFFER        | BUF_X2       | 4               | 22              | 1.0640                | 0.2458                |
| BUFFER        | BUF_X4       | 0               | 1               | 1.8620                | 0.3932                |
| BUFFER        | BUF_X8       | 0               | 2               | 3.4580                | 0.6881                |
| BUFFER        | CLKBUF_X1    | 16              | 0               | 0.7980                | 0.2458                |
| BUFFER        | CLKBUF_X2    | 0               | 1               | 1.0640                | 0.2458                |
| BUFFER        | CLKBUF_X4    | 0               | 1               | N/A                   | 0.3932                |
| BUFFER        | CLKBUF_X8    | 0               | 1               | N/A                   | 0.6881                |
| BUFFER        | CLKBUF_X12   | 0               | 5               | N/A                   | 0.9830                |
| INV           | INV_X1       | 237             | 173             | 0.5320                | 0.1475                |
| INV           | INV_X2       | 4               | 8               | 0.7980                | 0.1966                |
| INV           | INV_X4       | 0               | 7               | 1.3300                | 0.2949                |
| INV           | INV_X8       | 6               | 2               | 2.3940                | 0.4915                |

Due to technology scaling, we anticipated that the cell area in 15nm technology would be significantly smaller than in 45nm technology, and the synthesis results confirmed this hypothesis with specific area values. The total cell area of the FPU in 15nm technology is about 3-fold smaller than that of the FPU in 45nm technology. Figure 4 depicts the cell area analysis of the two technologies.

Table 2: Constituent Gate Types and Usage Count (Complex Gates).

| Gate Function | Complex Gate | Quantity (45nm) | Quantity (15nm) | Gate Area, 45nm (µm²) | Gate Area, 15nm (µm²) |
|---------------|--------------|-----------------|-----------------|-----------------------|-----------------------|
| AND-OR-INV    | AOI21_X1     | 5               | 31              | 1.0640                | 0.2949                |
| AND-OR-INV    | AOI22_X1     | 59              | 91              | 1.3300                | 0.3441                |
| AND-OR-INV    | AOI22_X2     | 0               | 11              | 2.3940                | 0.5898                |
| AND-OR-INV    | AOI211_X1    | 3               | 0               | 1.3300                | N/A                   |
| AND-OR-INV    | AOI221_X1    | 6               | 0               | 1.5960                | N/A                   |
| AND-OR-INV    | AOI221_X4    | 2               | 0               | 3.4580                | N/A                   |
| AND-OR-INV    | AOI222_X1    | 19              | 0               | 2.1280                | N/A                   |
| OR-AND-INV    | OAI21_X1     | 8               | 83              | 1.0640                | 0.2949                |
| OR-AND-INV    | OAI21_X2     | 0               | 10              | 1.8620                | 0.4424                |
| OR-AND-INV    | OAI22_X1     | 4               | 8               | 1.3300                | 0.3440                |
| OR-AND-INV    | OAI22_X2     | 0               | 10              | 2.3940                | 0.5898                |
| OR-AND-INV    | OAI211_X1    | 3               | 0               | 1.3300                | N/A                   |
| OR-AND-INV    | OAI221_X1    | 50              | 0               | 1.5960                | N/A                   |
| OR-AND-INV    | OAI221_X4    | 15              | 0               | 3.4580                | N/A                   |

Table 3: Constituent Gate Types and Usage Count (Registers).

| Technology | Registers | Quantity | Area (µm²) |
|------------|-----------|----------|------------|
| 45nm       | DFF_X1    | 234      | 4.5220     |
|            | DFF_X2    | 2        | 5.0540     |
|            | SDFF_X1   | 24       | 6.1180     |
| 15nm       | DFFSNQ_X1 | 256      | 1.2779     |
|            | DFFRNQ_X1 | 4        | 1.2779     |

Identical HDL was synthesized using the same tool under identical synthesis parameters. As Tables 1 through 3 show, library differences result in some variation in gate selection and gate count. Nonetheless, the overall trend in energy consumption between the two designs is representative of synthesis in the two process technologies.
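As a worked example of how the per-cell data translate into area, the sketch below sums quantity times cell area for the registers listed in Table 3. It covers only the registers, not the complete design, so the resulting ratio is an indication rather than the total cell area figure; the dictionary and function names are hypothetical.

```python
# (quantity, area per cell in µm²), copied from Table 3.
REGISTERS_45NM = {
    "DFF_X1": (234, 4.5220),
    "DFF_X2": (2, 5.0540),
    "SDFF_X1": (24, 6.1180),
}
REGISTERS_15NM = {
    "DFFSNQ_X1": (256, 1.2779),
    "DFFRNQ_X1": (4, 1.2779),
}


def total_area(cells: dict) -> float:
    """Sum of quantity * per-cell area over all listed cells, in µm²."""
    return sum(quantity * area for quantity, area in cells.values())


def total_count(cells: dict) -> int:
    """Total number of cell instances."""
    return sum(quantity for quantity, _ in cells.values())


area_45 = total_area(REGISTERS_45NM)
area_15 = total_area(REGISTERS_15NM)
area_ratio = area_45 / area_15

print(f"45nm: {total_count(REGISTERS_45NM)} registers, {area_45:.3f} µm²")
print(f"15nm: {total_count(REGISTERS_15NM)} registers, {area_15:.3f} µm²")
print(f"ratio: {area_ratio:.2f}x")
```

Both libraries use 260 registers, so for this part of the design the area difference comes entirely from the smaller 15nm cells.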



Figure 4: Cell Area Analysis of 45nm vs. 15nm Technologies.

Table 4 lists power consumption estimates for the FPU using the default testbench inputs from Design Compiler. The 45nm column indicates power consumption for a clock period of 5ns, which yields no negative slack. The dynamic power components in this column are 3.15-fold (cell internal) to 4.56-fold (net switching) larger than those of the same design synthesized in 15nm with default parameters at the same clock period. The rightmost column indicates that the FPU design can also operate significantly faster in 15nm technology than in 45nm technology: the minimum clock period that avoids negative timing slack is 400ps. Thus, for the default testbench, the FPU in 15nm technology can operate about 12.5 times faster than the same FPU in 45nm technology, albeit at a higher power consumption due to the faster clock.

Table 4: Power Analysis of 45nm vs 15nm technologies.

|                               | 45nm, τ = 5ns | 15nm, τ = 5ns | 15nm, τ = 0.4ns |
|-------------------------------|---------------|---------------|-----------------|
| Cell Internal Power (mW)      | 1.2367        | 0.3922        | 4.8297          |
| Net Switching Power (mW)      | 0.5863        | 0.1284        | 1.6604          |
| Total Dynamic Power (mW)      | 1.8230        | 0.5206        | 6.4901          |
| Cell Leakage Power (mW)       | 0.2250        | 0.1134        | 0.1215          |
| Total Power (mW)              | 2.0480        | 0.6340        | 6.6116          |
| Clock Period (ns)             | 5.0           | 5.0           | 0.4             |
| Global Operating Voltage (V)  | 1.1           | 0.8           | 0.8             |
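The ratios quoted for Table 4 can be reproduced directly from its entries. The following sketch recomputes the 45nm to 15nm power ratio for each component at the common 5ns clock period, together with the clock speedup; the variable and function names are hypothetical.

```python
# Power in mW from Table 4: (45nm at 5 ns, 15nm at 5 ns, 15nm at 0.4 ns).
TABLE4_MW = {
    "cell_internal": (1.2367, 0.3922, 4.8297),
    "net_switching": (0.5863, 0.1284, 1.6604),
    "total_dynamic": (1.8230, 0.5206, 6.4901),
    "cell_leakage": (0.2250, 0.1134, 0.1215),
    "total": (2.0480, 0.6340, 6.6116),
}


def ratio_45_to_15(component: str) -> float:
    """How many times larger the 45nm power is than 15nm at the same 5 ns clock."""
    p_45, p_15_at_5ns, _ = TABLE4_MW[component]
    return p_45 / p_15_at_5ns


ratios = {name: ratio_45_to_15(name) for name in TABLE4_MW}
clock_speedup = 5.0 / 0.4  # ratio of the two minimum clock periods

for name, value in ratios.items():
    print(f"{name:14s} {value:.2f}x")
print(f"clock speedup  {clock_speedup:.1f}x")
```

The same data show that leakage power shrinks by a smaller factor (about 2-fold) than the dynamic components, so leakage makes up a larger share of total power in the 15nm design at the 5ns clock period.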

Finally, Figure 5 shows the components of energy consumption and total energy consumption for the FPU in 45nm and 15nm technologies. Results indicate that using 15nm technology allows the FPU to consume about 4 times less energy than 45nm technology.



Figure 5: Energy Analysis of 45nm vs. 15nm Technologies.
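One way to relate the power figures in Table 4 to energy is to multiply total power by the clock period, which gives energy per clock cycle. The sketch below does this under the assumption that the FPU completes one operation per cycle, so energy per cycle approximates energy per operation. This is an illustrative calculation and is not necessarily how the values in Figure 5 were derived; the names used are hypothetical.

```python
# (total power in mW, clock period in ns) for each column of Table 4.
CONFIGS = {
    "45nm @ 5ns": (2.0480, 5.0),
    "15nm @ 5ns": (0.6340, 5.0),
    "15nm @ 0.4ns": (6.6116, 0.4),
}


def energy_per_cycle_pj(power_mw: float, period_ns: float) -> float:
    """Energy per clock cycle in pJ (1 mW * 1 ns = 1e-12 J = 1 pJ)."""
    return power_mw * period_ns


energy = {name: energy_per_cycle_pj(p, t) for name, (p, t) in CONFIGS.items()}
baseline = energy["45nm @ 5ns"]

for name, value in energy.items():
    print(f"{name:13s} {value:6.2f} pJ/cycle, 45nm/this = {baseline / value:.2f}")
```

Under this assumption the 15nm design uses roughly 3.2 times less energy per cycle at the 5ns clock period and roughly 3.9 times less at the 0.4ns clock period, which is in line with the 3-fold to 4-fold range reported here.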

## VII. CONCLUSION

Power density and area are two important challenges for CMOS devices. As discussed in this paper, using a new process technology is the most direct way to reduce the supply voltage, which increases the energy efficiency of CMOS switching devices without sacrificing speed. Our results indicate that, using this Predictive Technology Model, 15nm technology offers a 3-fold to 4-fold improvement in energy efficiency over 45nm technology and about a 3-fold reduction in cell area.

Although FinFET devices are one of the most promising alternatives to planar CMOS, they may suffer from reliability issues that need to be addressed. Self-heating is one of the problems that FinFET devices may face due to their complex geometry and confined dimensions. Because it decreases the reliability of the device, self-heating can be a cause of electro-migration and similar issues. As the number of fins grows, the impact of self-heating increases; however, an increase in the number of gates does not have a significant effect on self-heating [8]. Other important reliability issues that can influence FinFET performance and device behavior are Negative Bias Temperature Instability (NBTI) aging and Positive Bias Temperature Instability (PBTI) aging [5-7]. These issues can alter the threshold voltage of the device ($V_t$), which is a function of three main factors: $V_{GS}$, temperature, and time. Over long-term use of the device, $V_t$ can undergo significant degradation, which increases the critical path delay by as much as 7% to 10% [2]. Process variation is another reliability concern that needs to be taken into account. It results from the small geometries of FinFET devices, and its impact becomes more significant as the technology shrinks. Generally, these variations are caused by factors such as random dopant fluctuations, line edge roughness, layout-induced stress, and other process variations, which can result in changes in $V_t$, power, and timing [3]. Migrating to new device technologies such as 15nm can help reduce energy through a lower supply voltage; however, as mentioned earlier, process variation, aging, and related effects can cause reliability issues that need to be addressed at the 15nm technology node.

## REFERENCES

- [1] Tanay Karnik, Shekhar Borkar, and Vivek De, "Sub-90nm technologies: challenges and opportunities for CAD," Proceedings of the IEEE/ACM International Conference on *Computer-Aided Design*, ACM, 2002.
- [2] Jamil Kawa, "FinFET Design, Manufacturability, and Reliability," Synopsys DesignWare Technical Bulletin, 2013, available at: http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb-finfet-jan2013.aspx.
- [3] Hyung Beom Jang, Junhee Lee, Joonho Kong, Taeweon Suh, and Sung Woo Chung, "Leveraging Process Variation for Performance and Energy: In the Perspective of Overclocking," *IEEE Transactions on Computers*, vol.63, no.5, pp.1316-1322, May 2014.
- [4] Jamil Kawa, Andy Biddle, "FinFET: The Promises and the Challenges," Synopsys Insight Newsletter, 2012, available at: http://www.synopsys.com/Company/Publications/SynopsysInsight/Pages/Art2-finfet-challenges-ip-IssQ3-12.aspx.

- [5] S. Hamdioui, "NBTI modeling in the framework of temperature variation," *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, pp.283,286, 8-12 March 2010.
- [6] Vivek De, "Energy efficient computing in nanoscale CMOS: Challenges and opportunities," *IEEE Asian Solid-State Circuits Conference (A-SSCC)*, pp.121-124, 10-12 Nov. 2014.
- [7] A. Kerber and T. Nigam, "Challenges in the characterization and modeling of BTI induced variability in metal gate / High-k CMOS technologies," *IEEE International Reliability Physics Symposium (IRPS)*, pp.2D.4.1,2D.4.6, 14-18 April 2013.
- [8] M. I. Khan, A. R. Buzdar, F. Lin, "Self-heating and reliability issues in FinFET and 3D ICs," 12th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), IEEE, 2014.
- [9] NanGate FreePDK15 Open Cell Library, available at: http://nangate.com
- [10] "IEEE Standard for Binary Floating-Point Arithmetic," ANSI/IEEE Std 754-1985.
- [11] NanGate FreePDK45 Open Cell Library, available at: http://nangate.com
- [12] J.Y.F. Tong, D. Nagle, and R.A. Rutenbar, "Reducing power by optimizing the necessary precision/range of floating-point arithmetic," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol.8, no.3, pp.273 - 286, June 2000.
- [13] Shiann-Rong Kuang, Kun-Yi Wu, and Kee-Khuan Yu. "Energy-Efficient Multiple-Precision Floating-Point Multiplier for Embedded Applications." *Journal of Signal Processing Systems*, pp. 43-55, 2013.
- [14] R. A. Ashraf, A. Alzahrani, and R. F. DeMara, "Extending Modular Redundancy to NTV: Costs and Limits of Resiliency at Reduced Supply Voltage," *Workshop on Near Threshold Computing (WNTC-2014)*, Minneapolis, MN, USA, June 14, 2014.
- [15] H. Kaul, M. Anders, S. Hsu, A. Agarwal, R. Krishnamurthy, S. Borkar, "Near-threshold voltage (NTV) design — Opportunities and challenges," 49th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1149-1154, 3-7 June 2012.
- [16] Jie Han, Michael Orshansky, "Approximate computing: An emerging paradigm for energy-efficient design," 18th IEEE European Test Symposium (ETS), pp.1,6, 27-30 May 2013.
- [17] Naveed Imran, Rizwan A. Ashraf, and Ronald F. DeMara, "Power and quality-aware image processing soft-resilience using online multiobjective GAs," *International Journal of Computational Vision and Robotics*, in-press.
- [18] R. S. Oreifej, C. A. Sharma, R. F. DeMara, "Expediting GA-Based Evolution Using Group Testing Techniques for Reconfigurable Hardware," *IEEE International Conference on Reconfigurable Computing and FPGA's, ReConFig 2006*, pp.1-8, Sept. 2006
- [19] Sameh Galal, and Mark Horowitz. "Energy-efficient floating-point unit design," *IEEE Transactions on Computers*, 60.7 pp. 913-922, 2011.
- [20] K. Swaminathan, M. S. Kim, N. Chandramoorthy, B. Sedighi, R. Perricone, J. Sampson, V. Narayanan, "Modeling steep slope devices: From circuits to architectures," *Design, Automation and Test in Europe Conference and Exhibition (DATE)*, pp.1,6, 24-28 March 2014.
- [21] Rudolf Usselman, Documentation for Floating-point Unit, available at: http://www.opencores.org.
- [22] M. A. Ortega and J. Figueras, "Short Circuit Power Modeling in Submicron CMOS," *PATMOS '96*, pp. 147-166, Aug. 1996.
- [23] Synopsys, Inc., "Design Compiler 2010," available at: http://www.synopsys.com/Tools/Implementation/RTLSynthesis/DesignCompiler/Pages/default.