Efficient and optimized approximate GDI full adders based on dynamic threshold CNTFETs for specific least significant bits

2023-05-10 03:34:14 Ayoub SADEGHI, Razieh GHASEMI, Hossein GHASEMIAN, Nabiollah SHIRI

Frontiers of Information Technology & Electronic Engineering, 2023, Issue 4

Ayoub SADEGHI, Razieh GHASEMI, Hossein GHASEMIAN, Nabiollah SHIRI

1 Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz 7198774731, Iran

2 School of Electrical Engineering, Iran University of Science and Technology, Tehran 1684613114, Iran

3 Department of Electrical and Electronic Engineering, Shiraz University of Technology, Shiraz 7155713876, Iran

Abstract: Carbon nanotube field-effect transistors (CNTFETs) are reliable alternatives to conventional transistors, especially for use in approximate computing (AC) based error-resilient digital circuits. In this paper, CNTFET technology and the gate diffusion input (GDI) technique are merged, and three new AC-based full adders (FAs) are presented with 6, 6, and 8 transistors, respectively. The nondominated sorting based genetic algorithm II (NSGA-II) is used to attain the optimal performance of the proposed cells by considering the number of tubes and chirality vectors as its variables. The results confirm the circuits' improvement by about 50% in terms of power-delay-product (PDP) at the cost of area occupation. The Monte Carlo method (MCM) and 32-nm CNTFET technology are used to evaluate the lithographic variations and the stability of the proposed circuits during the fabrication process, in which the higher stability of the proposed circuits compared to those in the literature is observed. The dynamic threshold (DT) technique in the transistors of the proposed circuits amends the possible voltage drop at the outputs. Circuitry performance and error metrics of the proposed circuits nominate them for the least significant bit (LSB) parts of more complex arithmetic circuits such as multipliers.

Key words: Carbon nanotube field-effect transistor (CNTFET); Optimization algorithm; Nondominated sorting based genetic algorithm II (NSGA-II); Gate diffusion input (GDI); Approximate computing

1 Introduction

Digital circuits are vital parts of portable electronic devices, and in recent years, designers have tried to achieve high-performance circuits. The main challenge in integrated circuits (ICs) is the dimensions and number of transistors (Rafiee et al., 2021b). Shrinking the size of transistors toward the nanometer region poses significant challenges to reduce power, delay, and area (Cardenas et al., 2021). Reliable scaling down of transistors has faced vital issues, including short channel effects, drain-induced barrier lowering (DIBL), decreased gate controllability, and hot electron effects (Sadeghi et al., 2020). Hence, two fundamental solutions have been considered: the technology of transistors and the circuit design methodology. Carbon nanotube field-effect transistors (CNTFETs) (Deng and Wong, 2007a, 2007b) and fin field-effect transistors (FinFETs) have been proposed as alternatives for conventional metal-oxide-semiconductor field-effect transistors (MOSFETs). Therefore, an evaluation of circuit fabrication and transistor sizing is required to achieve an optimal design (Karimi and Rezai, 2016; Kordrostami et al., 2019). These are critical concepts in very-large-scale integration (VLSI) circuits, especially regarding approximate computing (AC) based arithmetic circuits (Strollo et al., 2020).

The basis of AC-based arithmetic circuits is the full adder (FA) (Mirzaei and Mohammadi, 2020). Various theories of transistor- and gate-level designs along with technology have been expressed in AC. In this research, AC-based FAs based on CNTFET technology are presented. The focus of arithmetic AC-based circuits is on area reduction and performance enhancement; thus, the design techniques are evaluated for this purpose. In this regard, a research gap is the lack of studies on the gate diffusion input (GDI) technique and its compatibility with CNTFET technology (Morgenshtein et al., 2002).

The contribution of this paper is the assessment of combining CNTFET technology and the GDI technique for designing AC-based arithmetic circuits. As shown in Fig. 1, the GDI technique is considered to design AC-based circuits using CNTFET technology. The most significant contribution of the AC concept is considered at the circuit level. By integrating pioneering design techniques and modern technology, high-efficiency specific-purpose chips, field programmable gate arrays (FPGAs), and systems-on-chip (SoCs) become accessible. In selecting a reliable and prospective technology, electrical characteristics such as ballistic transport and low OFF-current are important, and both are seen in CNTFETs.

Carbon nanotubes are defined as two types of CNT transistors (Karimi and Rezai, 2017), which were described in Abdul Hadi et al. (2022) and Ghasemian et al. (2022) as alternatives for MOSFETs. Additionally, CNT transistors have a significant ability to adjust the threshold voltage (Vth); hence, they are used as reliable devices for multithreshold applications. Fig. 1 shows a CNTFET-based inverter layout consisting of P-CNTs and N-CNTs with 10 tubes and a chirality vector of (38,0). The transistors' parameters, the width of the gate, pitch, Lch, Ldd, and Lss are shown in Fig. 1. In this paper, optimizing the performance of CNTFETs in a circuit, specifically when they are merged with the GDI technique, is covered by considering different values and theories for the geometric parameters. Therefore, the nondominated sorting genetic algorithm II (NSGA-II) (Abiri et al., 2020) is used as an optimization procedure, which improves the performance of a subject such as energy savings by considering two or more objectives simultaneously. In this regard, according to Fig. 1, the number of tubes and chirality vectors are critical parameters influencing the Vth of transistors and their performance. However, there is a possibility of errors during fabrication; for example, the number of tubes may change (Ghorbani et al., 2022). Therefore, it is necessary to evaluate the optimized circuits considering lithography (Cho and Lombardi, 2016).

Fig.1 Considered chart to attain future chips

From Fig. 1, the GDI cell is the selected technique, and there is no need to adjust the width of the CNTFETs to achieve equal rise/fall time (Ben-Jamaa et al., 2011). Hence, by changing the number of tubes and chirality vectors, the voltage transfer characteristics (VTC) change, as shown in Fig. 1. The GDI cells reduce the area, but they have a voltage swing drop at their outputs (Rafiee et al., 2021b). Although a single swing restoration (SR) transistor has been proposed, it increases the area and jeopardizes one of the main goals of AC-based circuits, namely lowering the complexity of the circuits in high-bit structures (Morgenshtein et al., 2014). Another solution is the dynamic threshold (DT) technique (Lindert et al., 1999). Fig. 2 indicates different GDI cells (e.g., for an AND gate), which use basic, DT, and SR transistor structures. When using the DT technique, each transistor substrate is dynamically aligned with the gate voltage, so the Vth of the device is adjusted dynamically. When a DT-CNT is ON, its Vth decreases, and the current and speed increase. On the other hand, when it is OFF, Vth increases, leakage current is reduced, and power dissipation is minimized (Homulle et al., 2018). Full-swing outputs increase the drivability of the circuit (Rafiee et al., 2022). As shown in Fig. 2, the highest output voltage under tube variations and fan-out 4 (FO4) belongs to the DT-based GDI AND gate. With DT in CNTFET-based circuits, Vth is controlled, and the voltage drop at the output is reduced (Rafiee et al., 2021b; Sadeghi et al., 2022).

Fig. 2 Implementations of the GDI-based AND gate: (a) basic; (b) DT; (c) SRT; (d) output voltage swing variations

The contributions of this paper are as follows:

1. The DT-GDI technique and CNTFET technology are used to achieve three new approximate FAs.

2. The FAs have low numbers of transistors (6, 6, 8) and high performance. The best values of transistor dimension and threshold voltage (Vth) are considered. The FAs are implemented in the least significant bit (LSB) parts of the partial product reduction tree (PPRT) of the multipliers.

3. The FAs are optimized based on NSGA-II, and the best performances of the circuits for power, delay, energy, and output swings are extracted.

4. Finally, the proposed circuits are directly used in an error-tolerant application, image processing, and a reasonable trade-off between the circuit and accuracy parameters is confirmed.

2 Investigation of AC-based FAs

2.1 Analysis of previous AC‐based FAs

Previous FAs are classified into full-custom and gate-level designs. However, a combination of them can be considered. In Gupta et al. (2013), approximate FAs were designed based on the general structure of a conventional mirror adder (CMA) to reduce the area and save more power. AMA1, AMA2, and AMA3 are implemented with this principle. Table 1 shows the implementation functions, the number of transistors, and the techniques in the literature.

Venkatachalam and Ko (2017) and Waris et al. (2019) proposed approximate FAs based on changing the conventional block diagram of an exact FA (XOR-XNOR-based). High area and high error rate are the defects of these circuits. Mahdiani et al. (2010) proposed a design known as lower-part OR adder (LOA), where an FA cell was designed as a part of a hardware implementation for most significant bit (MSB) or LSB computing, but the delay and power were not satisfactory. Similar to AMA1–AMA3, in Mirzaei and Mohammadi (2020, 2021), six other circuits were implemented based on the simplification of the CMA circuit. In this paper, all six designs are compared with the proposed circuits. These circuits include a high number of transistors and are designed to increase the stability against unintended variations during the fabrication process. Another circuit suggested in the literature is TGA2 (Yang et al., 2015), which combines transistor-level and gate-level design by integrating pass-transistor logic (PTL) and CMOS. Here, the drop in voltage swing is solved, but the power consumption is increased. A CMOS-based approximate FA consisting of NOR gates was presented in Waris et al. (2022) to reduce the error rate of the output carry (Cout). The circuit has one error in Cout and two errors in Sum, which produce an appropriate error rate. As given in Table 1, the maximum number of transistors is for VAFA, while the minimum is for AFA5, followed by AFA1. AFA5 is similar to LOA, without input carry (Cin).

Table 1 Comparison among designs in the literature

2.2 Proposed AC‐based DT‐GDI FAs

Three novel approximate FA cells are proposed (Fig. 3). These cells are based on different block diagrams and the GDI technique. As shown in Fig. 3a, Proposed-1 combines an XNOR with an AND using six transistors, the same number as AFA5. Compared to AFA5, Proposed-1 has Cin, so its implementation in complex structures does not require extra gates. Additionally, input A is used as Cout for passing to other gates in a ripple-carry-based structure. To overcome the voltage swing drop due to the use of the GDI AND, the DT-CNT technique is used. However, in high fan-out conditions, the voltage drop problem is still observable. Therefore, in Proposed-2, instead of using AND at Sum, the F1 gate is used, and the main advantage of F1 is that the inverter is created internally. The number of transistors is still 6, while the voltage swing of Sum is improved. Additionally, XOR is used instead of XNOR, and the Cout of these two circuits is not changed. In terms of function, Proposed-1 with Sum=(A⊙B)·Cin is similar to TGA2 and AMA1. However, in TGA2 and AMA1, 16 and 20 transistors are used to produce Sum, respectively. In AMA1, Sum depends on Cout, which requires an inverter on the output for swing restoration. Additionally, using F1 and considering Cin as the main input give Sum=(A⊕B)·C̄in. Considering Table 1, none of the circuits in the literature produce such an output.

Fig. 3 Structures of the proposed approximate full adders: (a) Proposed-1; (b) Proposed-2; (c) Proposed-3

Proposed-3 improves Cout drivability (Fig. 3c). Its function is similar to that of Proposed-2. The only difference is a transmission gate (TG) for boosting the speed and strengthening the input A as Cout. Since the XOR-based GDI uses an inverter for A, it is possible to use a TG with both PCNT and NCNT gate terminals connected to Ā; therefore, an extra inverter is not needed, so Proposed-3 produces its outputs with eight transistors. The TG is used in the Cout path because, in cascading structures such as multipliers (Rafiee et al., 2021b; Sadeghi et al., 2022), inputs come from circuits such as compressors and may have a voltage swing drop, so the TG overcomes this problem.

Table 2 shows the truth table of references and the proposed designs compared to their exact types. The highest error rate (ER) is for the circuits with 0.5 ER such as LOA, AFA2, AFA3, AFA5, Proposed-2, and Proposed-3. Considering the normalized mean error distance (NMED), AFA2, AFA3, AFA5, Proposed-2, and Proposed-3 have the maximum values. Transistor-level schematics of the proposed designs are shown in Figs. 4a–4c for Proposed-1 to Proposed-3. Additionally, the output waveforms under a frequency of 1 GHz are shown in Fig. 4d.
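The error rate quoted above can be checked by enumerating the eight input patterns of a full adder. The following is a minimal sketch, assuming the cell functions read from the text: Sum is XNOR(A,B) AND Cin for Proposed-1, Sum is XOR(A,B) AND NOT Cin for Proposed-2, and Cout=A for both. The function names are illustrative and do not come from the paper.

```python
from itertools import product

def exact_fa(a, b, cin):
    """Exact full adder: returns (sum, cout)."""
    total = a + b + cin
    return total & 1, total >> 1

def proposed1(a, b, cin):
    # Assumed reading of the text: Sum = XNOR(A, B) AND Cin, Cout = A.
    return (1 - (a ^ b)) & cin, a

def proposed2(a, b, cin):
    # Assumed reading of the text: Sum = XOR(A, B) AND NOT Cin, Cout = A.
    return (a ^ b) & (1 - cin), a

def error_rate(approx):
    """Fraction of the 8 input patterns where Sum or Cout differs from the exact FA."""
    patterns = list(product((0, 1), repeat=3))
    wrong = sum(approx(*p) != exact_fa(*p) for p in patterns)
    return wrong / len(patterns)

if __name__ == "__main__":
    for name, cell in (("Proposed-1", proposed1), ("Proposed-2", proposed2)):
        print(name, error_rate(cell))
```

Under these assumptions the Proposed-2 function yields an ER of 0.5, in line with the value stated for Proposed-2 and Proposed-3; the Proposed-1 figure printed by the sketch is only as reliable as the assumed function.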

Table 2 Truth table for the exact and approximate full adders

The proposed circuits benefit from a fast charge and discharge of internal capacitances, so high-speed outputs are expected. In Proposed-1, T1–T4 operate as XNOR. A conventional GDI-based XNOR uses an inverter for input B, but here it is used for input A to activate or deactivate the TG of Proposed-3 without an extra inverter.

In Proposed-1, T5 and T6 act as an AND, in which Cin is connected as the transistor activator and transmits GND and XNOR to the output. Due to the use of GND and the inherent properties of the GDI-AND gate, glitches are seen in the outputs of this circuit, which reduce speed or increase power consumption. Hence, F1 is used in Proposed-2, which prevents the Vth drop by generating fresh and inverted signals. With the inherent nature of the internal inverter of F1, Cin is converted to C̄in, and then it is ANDed with XOR. This output produces strong 0 and 1 in all cases, except when Cin=0 and XOR=0, where T5 produces an output of |Vthp| when the DT technique is not used. As an advantage, the GDI-based F1 function does not need SR transistors since the employed DT works efficiently; therefore, as shown in Fig. 4d, Sum is full-swing when Cin=0 and XOR=0. Another important point about T5 in Proposed-2 is its threshold voltage (Vth-T5). As will be illustrated in Section 2.4, the intelligent optimization algorithm obtains (35,0) as the best chirality vector for this transistor. Using this chirality vector, we have DCNT-T5=2.741 nm and Vth-T5=0.156 V, where DCNT is the CNT diameter. Compared to VDD=0.9 V, this Vth corresponds to a decrease of only 17.33% when the voltage swing drop occurs (below 20%), which confirms the high capability of DT for making this output full-swing with a desirable noise margin. In other words, the algorithm considers the largest value of the chirality vector and consequently the smallest value of Vth for T5 (among all transistors of Proposed-2). Another contribution of this research is the way in which NSGA-II is used to consider full-swing output. That is, in addition to optimizing power, delay, and power-delay-product (PDP), the algorithm considers the output waveform and compares it with the ideal value to obtain the best swing performance.

Fig. 4 Proposed approximate full adders: (a) Proposed-1; (b) Proposed-2; (c) Proposed-3; (d) their output waveforms under a frequency of 1 GHz
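The link between chirality vector, tube diameter, and threshold voltage used above can be illustrated numerically. The sketch below relies on the widely used first-order relations D = a·sqrt(n²+nm+m²)/π and Vth ≈ a·Vπ/(√3·e·D); the lattice constant a=0.246 nm and bond energy Vπ=3.033 eV are assumed values, not taken from the paper, so the results only approximately reproduce the quoted figures.

```python
import math

A_LATTICE_NM = 0.246  # assumed graphene lattice constant in nm
V_PI_EV = 3.033       # assumed carbon pi-pi bond energy in eV

def cnt_diameter_nm(n, m):
    """Diameter of an (n, m) carbon nanotube in nm."""
    return A_LATTICE_NM * math.sqrt(n * n + n * m + m * m) / math.pi

def threshold_voltage_v(n, m):
    """First-order Vth estimate in volts (half the band gap divided by e).

    With V_pi expressed in eV, the electron charge cancels out.
    """
    return A_LATTICE_NM * V_PI_EV / (math.sqrt(3) * cnt_diameter_nm(n, m))

def swing_drop_percent(vth, vdd):
    """Worst-case output drop of one Vth, as a percentage of VDD."""
    return vth / vdd * 100

if __name__ == "__main__":
    for chirality in ((35, 0), (38, 0)):
        d = cnt_diameter_nm(*chirality)
        vth = threshold_voltage_v(*chirality)
        print(chirality, round(d, 3), "nm", round(vth, 3), "V")
    print(round(swing_drop_percent(0.156, 0.9), 2), "% of VDD")
```

A larger chirality index gives a wider tube and a lower Vth, which is why the optimizer's choice of the largest chirality for T5 minimizes the possible swing drop. The last line reproduces the 17.33% figure directly from the quoted 0.156 V and 0.9 V.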

In about 50% of the cases, the GDI cell operates as a regular CMOS inverter (Morgenshtein et al., 2002). On the other hand, with the GDI AND, this possibility does not exist, so voltage swing drop and glitches appear in the output of Proposed-1. However, using DT, this issue is covered. In this case, only the DT technique operates as swing restoration, and full-swing output is obtained for all states (see the table provided in Fig. 4).

In Proposed-3, since an inverter is used in XOR, the possibility of using a TG is raised. Therefore, Cout is highly dependent on the state of A. The two transistors used in the TG structure of this circuit (T7 and T8) are not ON (activated) at the same time, and their activation states change according to the state of A. Therefore, with the appropriate use of NCNT and PCNT transistors for producing 0 and 1, respectively, no voltage swing drop occurs, and the speed is improved when embedded in cascaded structures. In CNT-based TG gates, the rise and fall times are equal, and the on-resistances of P-type and N-type CNTFETs are equal (R=Rn=Rp) (Ben-Jamaa et al., 2011). This yields smaller CNTFET gates compared to CMOS gates in the implementation of the same function. Simultaneously, since equally sized P-type and N-type CNTFETs have the same ON-resistance, they are more compact than MOSFET-based gates (which usually use a transistor sizing methodology such as WPMOS ≈ 3WNMOS). CNTFET technology overcomes the challenge of static power through small-size devices and low Vth (Majerus et al., 2013).

As shown in Fig. 4, Proposed-1 has sensitive glitches at Sum. Various procedures are considered to avoid the AND's glitches, including delay balancing, hazard filtering, gate sizing, and transistor sizing. F1, as a kind of AND, improves on the performance of Proposed-1 and introduces a solution to remove the glitches in Proposed-2 and Proposed-3.

With F1, Cin, which reaches F1 faster than the XOR output, is first inverted inside F1 without increasing the area compared to well-known solutions (Vasantha Kumar et al., 2012; Yang et al., 2015). This inversion slightly delays Cin on its way into F1, resulting in delay balancing. The procedure used in Proposed-2 and Proposed-3 increases the total delay in comparison with Proposed-1, but it prevents glitches and increases drivability and accuracy.

Moreover, looking at the K-map of Proposed-1, when Cin=0, T5 is ON and passes GND to the output; since T5 is a PCNT, it is inappropriate for passing GND, and a glitch hazard can therefore occur. In total, according to Fig. 4, Proposed-1 fails most often to produce a full-swing output; with DT these defects are reduced, but not completely avoided. On the other hand, for Proposed-2, first, the simultaneous arrival of inputs at F1 is handled by its inherent inverter, and second, the remaining non-full-swing output cases are minimal and are covered by DT. In conclusion, Proposed-2 and Proposed-3 are better options for cascade structures such as RCAs.

Regarding the highest operating frequency, Proposed-1 suffers from possible glitches, while the two other circuits have much better performance. Proposed-3, with a 2-GHz operating frequency, has a better condition. However, in this state, the voltage drop is equal to 0.211 V, which is approximately 20% of VDD (still acceptable). For Proposed-2, frequencies greater than 1.5 GHz jeopardize the swings. As expected, Proposed-1 has weak performance at high frequencies, and according to the noise margin, it has acceptable values up to 1.25 GHz. Proposed-1, in contrast to the two other circuits, has some drops regarding both high logic (e.g., 0.645 V) and low logic (e.g., 0.115 V). The peaks for the glitches are 0.48 V and 0.37 V. Another important point is the high stability of Cout in Proposed-3, produced by a TG gate, which makes this cell suitable for ripple effect structures.

Based on the ER of the proposed circuits and their low number of transistors, they are suitable for the LSB parts. Several approximate multipliers have been proposed, in which LSB outputs are classified into truncated or approximate. In truncated multipliers, some of the partial products are not formed, leading to errors in the outputs (Strollo et al., 2020).

To increase the accuracy of approximate multipliers, instead of truncating the LSB outputs, the proposed circuits are used without a significant increase in power and energy (Sadeghi et al., 2022), and a higher accuracy is achieved at the cost of a small area. These circuits are used to generate the LSB outputs P0 to P3 in an 8-bit multiplier. The proposed circuits are also used to generate the initial signals of the approximate multipliers in the final addition by an RCA. For this reason, in this study, different RCAs with different numbers of approximate bits (NABs) are used to evaluate the proposed circuits.

2.3 Optimization procedure using NSGA‐II

Usually, VLSI circuits are faced with transistor sizing (Naseri and Timarchi, 2018). To date, a reliable mechanism has not been proposed in the literature to achieve the best (optimal) performance of CNTFET-based circuits regarding approximate cells. In this study, this issue is addressed. Unlike many previous works on transistor sizing, we consider all circuit parameters with a specific priority. This prioritization involves the output voltage swing (indicating correct operation of the circuit during transistor sizing), the best power saving, minimum delay, and minimum PDP. NSGA-II is used (Fig. 5). It considers several variables and objectives at the same time to optimize several objective functions (Deb et al., 2002). Using these procedures often results in a trade-off between the best circuit performance and area consumption (Abiri et al., 2020). Therefore, the intervals considered in the following are chosen so as not to jeopardize the symmetry of the circuit too much. For the optimization of the cells, a direct link is established between the MATLAB and HSPICE tools.

Fig.5 Optimization procedure using NSGA‐II

This algorithm is carried out in three main steps. The CNTFET-based circuits are affected by the numbers of tubes and chirality vectors, which are therefore considered as the optimization variables. Before starting the algorithm, preknown data must be prepared. First, with an improved code in MATLAB, the desired circuit is simulated using HSPICE, and the results are stored so that they can be easily called when comparison becomes necessary. In this case, along with the power, delay, and PDP results, the output waveforms are stored as well. In the second step, the problem variables are considered; the number of variables is equal to V=(αi+βi)T, where αi and βi are the numbers of tubes and chirality vectors of transistor i, respectively, and T is the number of transistors. For example, for Proposed-1 (two variables per transistor, six transistors), V=12.

Now, random populations, the optional number of generations, and the considered intervals for objectives with prior knowledge stored in the previous step are adjusted for the algorithm. The algorithm is initiated with the desired iterations. Genetic operators, including mutation and crossover (Deb et al., 2002), are used to generate different ranks, each dominating the previous rank, to attain the best fronts. In the third step, one-by-one comparisons between the corresponding results of waveforms, power, delay, and PDP are performed. If the results are correct, the next comparison is performed; otherwise, the population is reset for the simulation. Optimization results may be obtained in each iteration. In this case, if the algorithm is stopped, better results that could be obtained in the next generations may be ignored. To avoid that, a storage environment is provided in MATLAB for all optimization results, the best of which are reported as the final results. This step is called "selection of the best optimization."
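The variable encoding and the ranking idea behind the steps above can be sketched as follows. This is not the authors' MATLAB/HSPICE flow: it shows only a chromosome with two genes per transistor and the nondominated (first-front) filter at the core of NSGA-II, with crowding distance, crossover, and mutation omitted. The gene ranges are hypothetical placeholders rather than the intervals of Table 3, and the objective tuples stand in for simulator results.

```python
import random

def random_chromosome(num_transistors, tube_range=(3, 20), chirality_range=(10, 38)):
    """Two genes per transistor: number of tubes and chirality index n of an (n, 0) tube.

    The ranges are hypothetical placeholders, not the intervals used in the paper.
    """
    genes = []
    for _ in range(num_transistors):
        genes.append(random.randint(*tube_range))
        genes.append(random.randint(*chirality_range))
    return genes

def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one.

    All objectives (e.g., power, delay, PDP) are minimized.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def first_front(objectives):
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(objectives)
        if not any(dominates(q, p) for j, q in enumerate(objectives) if j != i)
    ]

if __name__ == "__main__":
    print(len(random_chromosome(6)))  # V = 12 for a six-transistor cell
    # Made-up (power, delay) pairs standing in for simulator results.
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1)]
    print(first_front(candidates))
```

In the made-up example, the candidate (3, 4) is dropped because (2, 3) is better in both objectives, while the other three represent different trade-offs and stay on the front.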

2.4 Optimization procedure results

The explained mechanism is applied to the references and the proposed designs according to the specification provided in Table 3. The preknown data are attained according to constant conditions for the circuit parameters.

The optimization results are provided in Table 4. Additionally, the power, delay, and PDP results from the comparison between the nonoptimized and optimized versions of the proposed cells are given in Table 4.

Table 3 Optimization adjustment and desired conditions for the proposed approximate full adder cells

Proposed-1 suffers from delay, while the best optimization results belong to Proposed-1 for power and PDP with 2.10× and 2.09× improvements, respectively. Regarding delay, Proposed-2 shows a 0.36% improvement. A comparison between Proposed-2 and Proposed-3 shows better performance of Proposed-2 in terms of power and PDP.

2.5 Layout of approximate FAs

Electric VLSI 9.07 is a useful tool for MOSFET and CNTFET layouts (Huang JL et al., 2010, 2012). Fig. 6 shows the layout of the proposed FAs, where design rule checking (DRC), electrical rule checking (ERC), and layout versus schematic (LVS) are performed without error. These rules are based on λ (f=2λ), available in the tool as the mocmos-cn (cn=carbon nanotube) technology. The values of 0.090, 0.080, and 0.113 μm² are achieved as the area occupation of Proposed-1, Proposed-2, and Proposed-3, respectively, while for their optimized versions, these values are 0.109, 0.120, and 0.153 μm², respectively. That is, the area added by optimization amounts to 17.43%, 33.33%, and 26.14% of the optimized area of Proposed-1, Proposed-2, and Proposed-3, respectively.

Fig. 6 Layouts of Proposed-1 (a and d), Proposed-2 (b and e), and Proposed-3 (c and f): (a–c) nonoptimized; (d–f) optimized
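The area percentages above depend on which area is taken as the reference. The short sketch below recomputes them from the quoted areas in both ways; the function name is illustrative.

```python
# Quoted layout areas in square micrometres: (nonoptimized, optimized).
AREAS_UM2 = {
    "Proposed-1": (0.090, 0.109),
    "Proposed-2": (0.080, 0.120),
    "Proposed-3": (0.113, 0.153),
}

def area_overhead(non_opt, opt):
    """Return the added area as a percentage of (optimized, nonoptimized) area."""
    added = opt - non_opt
    return added / opt * 100, added / non_opt * 100

if __name__ == "__main__":
    for name, (non_opt, opt) in AREAS_UM2.items():
        rel_opt, rel_non = area_overhead(non_opt, opt)
        print(f"{name}: {rel_opt:.2f}% of optimized, {rel_non:.2f}% of nonoptimized")
```

The first percentage in each pair reproduces the 17.43%, 33.33%, and 26.14% reported in the text, which shows that those figures are taken relative to the optimized area; measured against the nonoptimized layouts, the same increase is a larger percentage.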

3 Simulation setup and results

In this study, the 32-nm SPICE-compatible compact model is used, which describes enhancement-mode, unipolar MOSFETs with semiconducting single-walled carbon nanotubes as channels (Deng and Wong, 2007a, 2007b). Additionally, the Synopsys HSPICE-H-2013.03-SP2 64-bit tool with Stanford University CNFETs Verilog-A Model v.2.1.1 is used for the simulations. The main consideration with this technology is its running speed in the HSPICE tool, for which its Verilog-A model can be used as a solution (Lee and Wong, 2015). The simulation parameters for the technology are given in Table 5. For constant conditions of simulations, the chirality vector and the number of tubes are adjusted as (38,0) and 10 for each transistor, respectively, which results in DCNT=2.97 nm and Vth=0.144 V.

Table 4 Optimization results for the proposed approximate full adder cells

Table 5 Parameters of the applied CNTFET technology

For the FAs, a circuit under test (CUT) is provided according to Fig. 7a (Hasan et al., 2020). The circuit inputs pass through two inverters, and a load capacitance of 1 fF is used at each output after a fan-out of 4 to check the drivability of the circuit (Kandpal et al., 2020). The average power consumption is measured from 0.01 ns over two periods of the intended operating frequency. The delay is measured along the path shown in the CUT, from before the buffers to the end of the path. Additionally, the worst propagation delay is estimated when outputs reach 50% of VDD in rising and falling conditions. All possible patterns of the FA truth table are applied for the measurements. Additionally, PDP is reported as a metric for circuit performance observation. To include the area consumption level based on the number of transistors, power-delay-area-product (PDAP) is reported.

Fig. 7 Simulation setup for the full adders (FAs) (a), RCA structure (b), gate-level structure of the exact FA (c), and transistor-level schematic of the exact FA (d) used in this paper

Additionally, ER, error distance (ED), mean error distance (MED), and NMED (Mirzaei and Mohammadi, 2021) are calculated. The RCA structure shown in Fig. 7b is used to evaluate the proposed circuits and circuits proposed in the literature. To obtain a fair comparison between the approximate FAs when embedded in an RCA, different scenarios are considered. Different NABs, including NAB1 to NAB4, are applied to the 4-bit RCA. For example, NAB1 means that only the first exact FA, with the LSB signals on its output, is replaced with an approximate FA. In each of the NABs, all 512 possible states are applied to the inputs. These scenarios are used in HSPICE and MATLAB for circuitry performance and error evaluations separately. Since the proposed circuits are based on the DT-GDI technique, the exact cell is selected based on this technique as well. Figs. 7c and 7d show the block diagram and transistor-level scheme of the DT-GDI-based exact FA for comparison and RCA implementation with different NABs.
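The NAB scenarios can be reproduced at the behavioral level with a short exhaustive sweep. The sketch below is an illustration under stated assumptions, not the authors' MATLAB code: the approximate cell uses the Proposed-2-style function read from the text (Sum is XOR(A,B) AND NOT Cin, Cout=A), and NMED is taken as MED divided by the maximum exact output (31 for a 4-bit RCA with carry-in), which is a common normalization and may differ from the cited definition.

```python
from itertools import product

def exact_fa(a, b, cin):
    """Exact full adder: returns (sum, cout)."""
    total = a + b + cin
    return total & 1, total >> 1

def approx_fa(a, b, cin):
    # Assumed Proposed-2-style behaviour: Sum = XOR(A, B) AND NOT Cin, Cout = A.
    return (a ^ b) & (1 - cin), a

def rca(x, y, cin, width=4, nab=0):
    """Ripple-carry addition; the nab lowest bit positions use the approximate cell."""
    result, carry = 0, cin
    for i in range(width):
        cell = approx_fa if i < nab else exact_fa
        s, carry = cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)

def error_metrics(width=4, nab=1):
    """Return (ER, MED, NMED, number of patterns) over all input patterns."""
    max_out = 2 * (2 ** width - 1) + 1  # assumed normalization constant
    values = range(2 ** width)
    eds = [abs(rca(x, y, c, width, nab) - (x + y + c))
           for x, y, c in product(values, values, (0, 1))]
    er = sum(ed > 0 for ed in eds) / len(eds)
    med = sum(eds) / len(eds)
    return er, med, med / max_out, len(eds)

if __name__ == "__main__":
    for nab in range(5):
        er, med, nmed, count = error_metrics(nab=nab)
        print(f"NAB{nab}: ER={er:.3f} MED={med:.3f} NMED={nmed:.4f} patterns={count}")
```

The pattern count of 512 follows from two 4-bit operands and one carry-in bit (2⁹ combinations), matching the number of states stated above.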

3.1 Approximate FA assessments

Simulations are carried out under the constant conditions provided in Table 6. The best performance regarding power, delay, PDP, and PDAP belongs to Proposed-1. VAFA has the highest power of 3.0946 μW due to its high number of transistors. TGA2, with 22 transistors, shows the highest results of delay, PDP, and PDAP. Proposed-2 and Proposed-3 have appropriate conditions. In this regard, in terms of power, Proposed-2 and Proposed-3 are placed after AMA3, AFA1, AFA2, and AFA3. Regarding PDAP, Proposed-1, Proposed-2, AMA2, and Proposed-3 rank the first four. The main difference between Proposed-1 and the other cells results from its low number of transistors. The main disadvantages of NxFA are a high number of transistors (i.e., 14), the use of an input inverter, and the existence of a high number of direct paths between VDD and GND. The results extracted from Table 6 show that, compared with the NxFA circuit, Proposed-1 has 69.20%, 49.96%, 84.50%, and 93.39% better results for power, delay, PDP, and PDAP, respectively. Although NxFA has a good error rate (which is comparable to that of Proposed-1), due to the weak results in terms of power, delay, and specifically PDAP (which are the most important comparative factors between circuits), it is not considered the main competitor in comparison with the proposed circuits.
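The four improvement figures quoted for Proposed-1 against NxFA are mutually consistent, which can be verified with a few lines of arithmetic. The sketch assumes PDP = power × delay and PDAP = PDP × number of transistors (the transistor count serving as the area proxy, as the text suggests); small deviations from the quoted values come from rounding of the published percentages.

```python
def combined_improvement(power_impr, delay_impr, n_new, n_ref):
    """Derive PDP and PDAP improvements (in %) from power and delay improvements.

    Assumes PDP = power * delay and PDAP = PDP * transistor count.
    """
    power_ratio = 1 - power_impr / 100
    delay_ratio = 1 - delay_impr / 100
    pdp_ratio = power_ratio * delay_ratio
    pdap_ratio = pdp_ratio * n_new / n_ref
    return (1 - pdp_ratio) * 100, (1 - pdap_ratio) * 100

if __name__ == "__main__":
    # Proposed-1 (6 transistors) versus NxFA (14 transistors).
    pdp_impr, pdap_impr = combined_improvement(69.20, 49.96, n_new=6, n_ref=14)
    print(f"PDP improvement  ~ {pdp_impr:.2f}%")
    print(f"PDAP improvement ~ {pdap_impr:.2f}%")
```

The derived values land within about 0.1 percentage point of the reported 84.50% and 93.39%, which supports reading PDAP as PDP scaled by the transistor count.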

Table 6 Approximate full adder simulation results under constant conditions＊

During the fabrication process, the considered mechanism is useful. For example, the intended number of tubes may change. Inevitably, the circuits are initially examined in terms of lithography and cost on the wafer before use based on the optimized values. The process-voltage-temperature (PVT) variations evaluate the performance of circuits, but it is better to establish more accurate analyses when using CNTFETs. Two types of variations are considered: conventional processes and CNT-specific processes. The former variations include channel length, channel width, oxide thickness, and threshold voltage variations, while the latter is about the gate length and width of CNTFETs (Cho and Lombardi, 2016).

Here, for the simulations, parameters such as the number of tubes (which has a direct influence on the density of CNTFETs), gate length, and gate width (relevant to the lithography) of the CNTFETs are considered the varied parameters of the Monte Carlo method (MCM) for 1000 runs with a Gaussian distribution (±5% distribution at the ±3σ level) (Ghorbani et al., 2022). A lithographic process using gate length and width variations has a small impact on the gate capacitance of a CNTFET. Variations in the lithographic process and density do not affect the power dissipation because they do not change the current in a CNTFET. When the number of tubes is changed, the power is varied. Hence, considering variations in both lithography and the number of tubes (density), a comprehensive understanding of CNTFET-based fabrication on wafers arises.
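The sampling rule "±5% at the ±3σ level" translates into a standard deviation of one third of 5% of the nominal value. The sketch below draws 1000 such samples for one continuous parameter; it does not reproduce the HSPICE Monte Carlo setup, the nominal value of 32 nm is only a placeholder, and the integer number of tubes would need separate handling.

```python
import random
import statistics

def sample_parameter(nominal, tolerance=0.05, n_sigma=3.0, runs=1000, seed=0):
    """Draw Gaussian samples so that +/-tolerance corresponds to +/-n_sigma.

    The seed is fixed only to make the illustration repeatable.
    """
    rng = random.Random(seed)
    sigma = nominal * tolerance / n_sigma
    return [rng.gauss(nominal, sigma) for _ in range(runs)]

if __name__ == "__main__":
    nominal_nm = 32.0  # placeholder nominal value, not taken from Table 5
    samples = sample_parameter(nominal_nm)
    inside = sum(abs(s - nominal_nm) <= 0.05 * nominal_nm for s in samples)
    print("mean:", round(statistics.mean(samples), 3))
    print("stdev:", round(statistics.stdev(samples), 3))
    print("samples within +/-5%:", inside, "of", len(samples))
```

With a Gaussian distribution, nearly all samples fall inside the ±5% band, but a few of the 1000 runs can land slightly outside it, which is expected at the 3σ level.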

The obtained results of the mean,minimum,and maximum for power,delay,PDP,and PDAP are shown in Fig.8.According to Fig.8a,the mean power consumption by Proposed-1 is the minimum,while AFA1,Proposed-2,and AFA2 follow.All three pro‐posed cells show the best results of mean PDAP,re‐sulting from their low PDP and small number of tran‐sistors.Similarly,the conclusion can be attributed to the results regarding the minimum and maximum values of power,delay,and PDP,as shown in Figs.8b and 8c.

By comparing the power,delay,and PDP distri‐bution results obtained using MCM,according to Fig.9,the stability of the proposed circuits in this field is achieved.Accordingly,a histogram closer to the starting point of theXandYaxes has higher sta‐bility in terms of circuitry performance.On the other hand,Fig.8d is provided in terms of stability of cir‐cuit output versus the fabrication process,in which higher sensitivity of Proposed-1 compared to those of two other cells is realized.This is due to the use of an unstable gate such as GDI-AND in its structure.The stability of the proposed circuits against mismatches in the fabrication process is acceptable,and one can rely on the results obtained from the optimization procedure.

Fig. 8 Lithographic variation results by MCM for mean (a), minimum (b), maximum (c), and output (d) waveforms

Fig. 9 Lithographic variation results by MCM analysis in terms of power (a), delay (b), and PDP (c)

3.2 Approximate RCA evaluations

Here, a comprehensive investigation of the approximate RCAs based on approximate FAs is provided. Fig. 10 shows the performance of the circuits versus different NABs in terms of power, delay, PDP, and PDAP. Regarding power, PDP, and PDAP, the proposed cells have the best results among all cells when the optimized values are considered. Among the proposed cells, Proposed-2 has the highest delay for NAB1 to NAB4. The proposed cells are appropriate alternatives for RCAs even with a higher number of input bits, such as 15 bits, for use in multipliers as final addition stages or even in their PPRT.

Higher NABs cause a higher possibility of error in the outputs. Fig. 11 depicts the performance of the RCAs in terms of NMED. As expected from Table 2, some circuits, such as AMA1, have higher output accuracy than the proposed circuits. Hence, AMA1, AMA2, VAFA, and AFA4 have lower NMED rates than the proposed circuits.
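To make the relation between NAB and NMED concrete, the following Python sketch exhaustively evaluates a small RCA whose lowest NAB positions use an approximate cell. Several points are assumptions made for illustration: the approximate cell `example_approx_fa` is a hypothetical one (exact carry, Sum taken as the complement of the carry) and is not one of the proposed designs, and the mean error distance is normalized by the largest exact sum, which is one common convention.

```python
from itertools import product


def exact_fa(a, b, cin):
    """Exact 1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def example_approx_fa(a, b, cin):
    """Hypothetical approximate cell, NOT one of the proposed designs:
    the carry is exact and Sum is approximated as the complement of the carry."""
    cout = (a & b) | (a & cin) | (b & cin)
    return 1 - cout, cout


def rca_add(x, y, n_bits, nab, approx_fa):
    """Ripple-carry addition with the `nab` least significant cells approximate."""
    carry, result = 0, 0
    for i in range(n_bits):
        cell = approx_fa if i < nab else exact_fa
        s, carry = cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << n_bits)


def nmed(n_bits, nab, approx_fa):
    """Mean error distance over all input pairs, divided by the largest exact sum."""
    total = 0
    for x, y in product(range(2 ** n_bits), repeat=2):
        total += abs(rca_add(x, y, n_bits, nab, approx_fa) - (x + y))
    mean_ed = total / 4 ** n_bits
    max_output = 2 ** (n_bits + 1) - 2  # (2^n - 1) + (2^n - 1)
    return mean_ed / max_output


for nab in range(4):
    print(nab, nmed(4, nab, example_approx_fa))
```

For this hypothetical cell the NMED grows as more LSB positions are approximated, which mirrors the trend described above. Replacing `example_approx_fa` with the truth table of another cell would give the corresponding curve.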

Fig. 11 NMED evaluations of the optimized approximate RCAs versus different numbers of approximate bits (NABs)

In terms of the circuitry behavior shown in Fig. 10, however, AMA1, AMA2, VAFA, and AFA4 have worse results. Therefore, an appropriate competency criterion is needed to evaluate the performance of approximate circuits. In this regard, Fig. 12 is given, showing NMED versus PDP and PDAP. Fig. 12a shows the PDP versus NMED results extracted from all considered NABs of the RCAs. Plotting the trendline of the proposed circuits on a logarithmic scale demonstrates the best Pareto optimal curve, due mainly to their appropriate PDP. Figs. 12b and 12c illustrate the trade-off between circuitry performance and accuracy. From Fig. 12b, in terms of average NMED and average PDP, the proposed cells are in a better position than AMA3, NFAx, LOA, AFA2, AFA1, AFA5, and AFA3. TGA2, AMA1, AFA4, AFA6, AMA2, VAFA, and NxFA are better only in terms of NMED. The main conclusion here is the strength of the proposed cells when circuitry performance and NMED are considered together. The same conclusion holds for Fig. 12c, which shows PDAP versus NMED; the proposed cells are better in both terms than most of the designs. Therefore, these results suggest the proposed cells as proper designs for implementation in specific-purpose future-generation chips that are error-tolerant and energy-efficient.
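The Pareto comparison used in this discussion can be expressed as a simple dominance test: a design is on the Pareto front if no other design is at least as good in both PDP and NMED and strictly better in one of them. The sketch below is a generic illustration of that test; the design names and numbers are made up for the example and are not results from Fig. 12.

```python
def pareto_front(designs):
    """Return the names of designs not dominated in (pdp, nmed).

    Both metrics are minimized. A design is dominated when another design
    is no worse in both metrics and strictly better in at least one.
    """
    front = []
    for name, pdp, nmed in designs:
        dominated = any(
            (p <= pdp and e <= nmed) and (p < pdp or e < nmed)
            for _, p, e in designs
        )
        if not dominated:
            front.append(name)
    return front


# Made-up (name, PDP, NMED) triples, for illustration only.
designs = [
    ("A", 1.0, 0.10),
    ("B", 2.0, 0.05),
    ("C", 2.5, 0.12),
    ("D", 0.8, 0.20),
]
print(pareto_front(designs))
```

In this made-up set, design C is dominated by design A, while A, B, and D each represent a different trade-off between PDP and NMED.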

Fig. 10 Performance of the optimized approximate RCAs versus different numbers of approximate bits: (a) power; (b) delay; (c) PDP; (d) PDAP

Fig. 12 Comparison of the optimized approximate RCAs for different numbers of approximate bits (NABs) in terms of all NABs’ PDP simultaneously (a), average PDP (b), and average PDAP (c) versus NMED

3.3 Case study: digital image addition

The proposed circuits are investigated in a real error-tolerant application, image addition (Huang JQ et al., 2021), based on the mechanism described in Sadeghi et al. (2022). Initially, gray input images (Figs. 13a and 13b) are converted to binary images (Figs. 13c and 13d), which are then converted to equivalent binary signals in piecewise linear (PWL) format by code developed in MATLAB. The resulting signals are applied to the digital circuits in HSPICE, and the real performance of these circuits is extracted in terms of circuitry parameters such as power, delay, and PDP, along with image quality assessments such as the peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM) according to Eqs. (1) and (2). Additionally, the figures of merit (FoMs) in Eqs. (3) and (4) give a better understanding of the designs. The procedure is carried out using an RCA with NAB3 for all references and the proposed cells. The output images obtained by Proposed-1, Proposed-2, and Proposed-3 are shown in Figs. 13e, 13f, and 13g, respectively. The output images of Proposed-2 and Proposed-3 have slightly blurred pixels compared to those of Proposed-1 due to their higher error rates.

Fig. 13 Image addition application: (a, b) greyscale input images; (c, d) binary-scale input images; (e) outputs of Proposed-1; (f) outputs of Proposed-2; (g) outputs of Proposed-3

In Eqs. (1) and (2), $m$ and $p$ are the image dimensions, $\mathrm{MAX}_I$ is the maximum value of each pixel, $I(i,j)$ and $K(i,j)$ are the exact and obtained values for each pixel, respectively, $\mu_x$ and $\mu_y$ are the pixel sample means of the $x$ and $y$ images, respectively, $\sigma_x^2$ and $\sigma_y^2$ are the variances of the $x$ and $y$ images, respectively, $\sigma_{xy}$ is the covariance of the $x$ and $y$ images, and $C_1=(K_1L)^2$ and $C_2=(K_2L)^2$ are two variables that stabilize the division when the denominator is weak, where $K_1$ and $K_2$ are small unitless constants generally equal to 0.01 and 0.03, respectively, and $L=255$ is the dynamic range of the pixel values.
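As an illustration of how these symbols combine, the sketch below implements the standard textbook forms of PSNR and SSIM that match the symbol list above: PSNR = 10·log10(MAX_I² / MSE), with MSE the mean squared pixel difference over the m × p image, and SSIM = (2μxμy + C1)(2σxy + C2) / ((μx² + μy² + C1)(σx² + σy² + C2)). It is assumed here that Eqs. (1) and (2) follow these standard definitions. The SSIM is computed over the whole image as a single window, which is a simplification, since practical implementations usually average over local windows. The function names are hypothetical.

```python
import math


def psnr(exact, obtained, max_i=255):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    m, p = len(exact), len(exact[0])
    mse = sum(
        (exact[i][j] - obtained[i][j]) ** 2
        for i in range(m) for j in range(p)
    ) / (m * p)
    return math.inf if mse == 0 else 10 * math.log10(max_i ** 2 / mse)


def ssim_global(x, y, k1=0.01, k2=0.03, dyn_range=255):
    """SSIM evaluated once over the whole image (single-window simplification)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((v - mu_x) ** 2 for v in xs) / n
    var_y = sum((v - mu_y) ** 2 for v in ys) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1, c2 = (k1 * dyn_range) ** 2, (k2 * dyn_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Tiny made-up images, for illustration only.
reference = [[0, 255], [255, 0]]
noisy = [[0, 255], [255, 1]]
print(psnr(reference, noisy), ssim_global(reference, noisy))
```

Identical images give an infinite PSNR and an SSIM of 1, and any pixel error lowers both values.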

In the results (Fig. 14), TGA2, the circuit with the worst results, is used as the baseline against which the other references are normalized. The proposed cells, with a low number of transistors and appropriate PDP and PDAP, have the best performance in the image processing application.

Fig. 14 FoM1 and FoM2 results for the approximate RCA with three approximate bits and different approximate full adders in image addition

With an FoM1 of 8% and an FoM2 of 3% relative to TGA2, Proposed-1 is the best circuit. Proposed-2 performs similarly to Proposed-1 regarding FoM2, while its FoM1 is similar to that of AMA2. The results show that the proposed cells establish an appropriate trade-off between circuitry performance and accuracy for error-tolerant applications.

Approximate cells are usually connected as an RCA with a large number of input bits. In this regard, as a multibit evaluation, the proposed cells and those in the literature are implemented in 8-bit and 16-bit RCAs with 50% approximate bits to evaluate them in real circumstances (Huang JQ et al., 2021). Simulations are performed, and the results are reported in terms of FoM3 (Eq. (5)), which combines circuitry and accuracy performance (Sabetzadeh et al., 2019), as shown in Fig. 15.

Fig. 15 Eight- and 16-bit RCA results versus FoM3 with 50% NABs

A design with a smaller FoM3 value reaches a better trade-off between hardware and accuracy. The simulations are performed at an operating frequency of 250 MHz, and the inputs are applied to the circuit to cover all possible states. Additionally, the circuits used in the simulations are all based on the optimized values attained previously. Accordingly, the three proposed circuits are better in terms of PDP and PDAP. However, in some cases, the delay and NMED values of the proposed circuits are worse than those of the others. As the number of input bits increases, the delay of Proposed-1 increases, whereas for Proposed-2 and Proposed-3 it is the NMED that increases. The poorer performance of Proposed-1 is caused by the voltage swing drop at its Sum output. In general, the proposed circuits, which perform better in terms of FoMs that combine different parameters such as power, delay, PDP, the number of transistors (through PDAP), and NMED, are more suitable for use in more complex circuits.

4 Conclusions

In this paper, a new approach for designing approximate computing based arithmetic circuits is proposed. A reliable combination of CNTFETs and GDI is established as the principal technology and design technique. Three approximate full adders are proposed with a small area and a small number of transistors. Regarding performance optimization, one of the main challenges in using CNTFETs, the NSGA-II algorithm is applied with the number of tubes and the chirality vectors of the transistors as its variables. The optimization results indicate an approximately 50% improvement in terms of power, delay, and PDP for some of the proposed cells compared to nonoptimized conditions. Additionally, a lithography evaluation based on the Monte Carlo method is performed, and the stability and reliability of the proposed cells based on the GDI technique are confirmed. These results are attributed to the dynamic threshold technique used for the transistors of the proposed cells. Error metrics in terms of the normalized mean error distance (NMED) and the circuitry performance of the proposed cells, implemented in a ripple-carry adder (RCA) under different numbers of approximate bits, are investigated using MATLAB and HSPICE. The results obtained in an error-tolerant application, image addition, indicate the appropriate performance of the proposed circuits in terms of both circuitry and accuracy.

Contributors

Ayoub SADEGHI designed the research. Razieh GHASEMI and Hossein GHASEMIAN processed the data. Ayoub SADEGHI and Hossein GHASEMIAN drafted the paper. Hossein GHASEMIAN and Nabiollah SHIRI supervised the study and revised and finalized the paper.

Compliance with ethics guidelines

Ayoub SADEGHI, Razieh GHASEMI, Hossein GHASEMIAN, and Nabiollah SHIRI declare that they have no conflict of interest.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.
