An efficient counter-based Wallace-tree multiplier with a hybrid full adder core for image blending

2022-06-30 05:58:58 Ayoub SADEGHI, Nabiollah SHIRI, Mahmood RAFIEE, Mahsa TAHGHIGH

Frontiers of Information Technology & Electronic Engineering, 2022, Issue 6

Ayoub SADEGHI, Nabiollah SHIRI, Mahmood RAFIEE, Mahsa TAHGHIGH

Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz 71987-74731, Iran

E-mail: na.shiri@iau.ac.ir

Received Sept. 10, 2021; Revision accepted Dec. 29, 2021; Crosschecked Apr. 26, 2022

Abstract: We present a new counter-based Wallace-tree (CBW) 8×8 multiplier. The multiplier's counters are implemented with a new hybrid full adder (FA) cell, which is based on the transmission gate (TG) technique. The proposed FA, a TG-based AND gate, and a hybrid half adder (HA) are used to build M:3 (4≤M≤7) digital counters that save at least 50% of the occupied area. Simulations in 90 nm technology demonstrate the superiority of the proposed FA and digital counters over state-of-the-art designs under different conditions. By using the proposed cells, the CBW multiplier exhibits high driving capability, low power consumption, and high speed. The CBW multiplier occupies a die area of 0.0147 mm² in a pad. The post-layout extraction confirms the accuracy of the experimental implementation. An image blending mechanism is proposed, in which a direct interface between MATLAB and HSPICE is used to evaluate the presented CBW multiplier in image processing applications. The peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM) are calculated as image quality parameters, and the results confirm that the presented CBW multiplier can be used as an alternative to designs in the literature.

Key words: Full adder; Transmission gate; Counter; Multiplier; Three-dimensional layout; Image blending

1 Introduction

The multiplier is an important block in digital signal processors (DSPs), central processing units (CPUs), multimedia applications (Kumar and Sharma, 2016; Fathi et al., 2020), and image processing, where the focus is on sharpening and smoothing (Momeni et al., 2015). Different types of multipliers are needed in larger systems such as the discrete cosine transform (DCT) and the sum of absolute differences (SAD). A three-stage multiplier consists of partial product (PP) generation, a partial product reduction tree (PPRT), and final addition. The PP size varies with the input data, and the PP stage is commonly implemented using two-input AND gates (Luo and Liu, 2017; Fathi et al., 2020; Jain and Pandey, 2021). Techniques such as modified Booth encoding (MBE) can also be used in this stage; MBE reduces the area, but causes problems in the next stages. Nevertheless, the more conventional AND gates are used in the PP stage, the higher the accuracy of the multiplier outputs (Jain and Pandey, 2021). The conventional way of building the PPRT is to arrange full adders (FAs) as an array. FAs show high propagation latency in cascaded structures, and they have been used less in the PPRT stage in recent years. Hence, designers focus on low area occupation and low power dissipation of the gates used in this stage (Ranasinghe and Gerez, 2020). The performance of the gates in the PPRT stage can determine the performance of the whole system (Fathi et al., 2020). To overcome the challenge of using FAs, digital compressors and counters have been suggested (Kumar and Sharma, 2016; Fathi et al., 2020). Extra data compression may cause output errors, which jeopardize the performance of the entire system (Tripathi et al., 2013). Therefore, a PPRT stage with the minimum output error and the highest circuit performance is of considerable interest in multipliers. Finally, applying ripple carry adders (RCAs) as the final addition stage after the PPRT stage reduces complexity (Fathi et al., 2020).

In this study, we present new structures for digital counters based on a new hybrid FA cell. The counters are 4:3, 5:3, 6:3, and 7:3, and they are used to implement a conventional 8-bit × 8-bit counter-based Wallace-tree (CBW) multiplier. A new multiplier structure is also presented, which exploits an appropriate combination of digital counters, FAs, and half adders (HAs) in the PPRT stage. Here, area occupation is significantly reduced to achieve lower power consumption than the conventional CBW. The proposed hybrid FA cell is based on the transmission gate (TG) technique and is low-power, high-speed, full-swing, and tolerant to different circumstances such as process-voltage-temperature (PVT) variations. These advantages make the proposed FA an efficient cell for implementation in the counters. In addition, in the proposed multiplier, the PP stage is implemented by the TG-based AND gate.

2 Literature review of full adders and counters

2.1 Full adders

FAs have three inputs, A, B, and Cin (input carry), and two outputs, Sum and Cout. Reducing power and area is the main challenge for FA designers. Fig. 1 (Taheri et al., 2016; Amini-Valashani et al., 2018; Sadeghi et al., 2020) shows two common schemes of FA implementation. In Fig. 1a, an XOR-XNOR module and two multiplexers (MUXs) produce the required outputs. Naseri and Timarchi (2018) and Safaei Mehrabani and Eshghi (2016) proposed their designs based on Fig. 1a. Double-pass logic (DPL) FAs with 28 transistors (Aguirre-Hernandez and Linares-Aranda, 2011) are reconfigured based on Fig. 1b, using XOR-XNOR and AND-OR gates at the first stage. The MUXs use the resulting signals of the first stage to control Sum and Cout. This structure has high power and low speed while occupying a large area. Some FA structures, like complementary CMOS (C-CMOS) (Chang et al., 2004), the transmission function adder (TFA) (Zhuang and Wu, 1992), and the transmission gate adder (TGA) (Weste and Eshraghian, 1993; Hasan et al., 2020), suffer from high power and high delay because of a poor trade-off among critical design parameters.

Fig. 1 Different structures of full adders: (a) 3-module based on hybrid design; (b) 4-module based on the logic of Aguirre-Hernandez and Linares-Aranda (2011)

This study deals with a new hybrid 18-transistor FA cell with a proper trade-off among circuit parameters. The techniques used here help achieve a low-power, high-speed cell that has a suitable number of transistors and high driving capability for complicated structures like multipliers.

2.2 M:3 (4≤M≤7) counters

The benefits of digital counters with 4 to 7 input bits in an 8×8 Wallace-tree multiplier are considered. Digital counters count the number of 1's at their inputs. As discussed in the literature, a popular 4:3 counter design uses an FA and two HAs in series. Unlike those designs, in this study the M:3 (4≤M≤7) counters are implemented with a new FA. A comprehensive comparison validates the efficiency of the proposed designs in terms of power and delay. The simplicity of the proposed cells gives a small-area design without considerable loss of information or output swing.

3 Proposed circuits

3.1 Proposed hybrid full adder cell

A new hybrid FA with 18 transistors is proposed based on three modules (Fig. 2). A TG-based XOR-XNOR is the first module (Module-1) with 9 transistors. The XOR gate has low power since its output in each state is produced by specific transistors. When AB=00, only M5 is ON and B is transferred to the output by this NMOS, which can pass 0 appropriately. When AB=01 and AB=10, M3 and M4 are ON, but their influence on the output is negligible. In these two states, the PMOS transistors M1 and M2 are sufficient to produce the output of the XOR.

Fig.2 Proposed hybrid full adder

Finally, when AB=11, M3 and M4 are ON and the inverted A is passed to the output. Therefore, the non-synchronous operation of the transistors results in a low-power and high-speed XOR. Also, the more often the transistors toggle, the more power is consumed; thus, the XOR of the proposed FA has very low power. In digital circuits, the power consumption is determined by the current drawn from the power supply (VDD). Dynamic power includes a short-circuit component, which arises when both the pull-up and pull-down networks are partially ON, and a switching component, which is consumed by delivering energy to charge a load capacitance and then dumping this energy to the ground (GND). The short-circuit power is strongly sensitive to the ratio V = Vth/VDD (Vth is the threshold voltage). Therefore, the absence of direct VDD and GND connections can reduce the short-circuit power (referred to here as static power) to nearly zero. This benefit is observed in the XOR used here.

The dynamic power and short-circuit power can be expressed as follows:

where C_L and VDD are the load capacitance and the power supply voltage, respectively, f_switching is the operating frequency, and I_VDD is the current drawn from the power supply.
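The two expressions announced above are missing from this copy of the text. A hedged reconstruction, assuming the usual textbook forms that match the symbols just defined, is given below. The exact expressions used by the authors, including any activity factor and the numbering as Eqs. (1) and (2), are assumptions.

```latex
\begin{align}
P_{\text{dynamic}} &= C_L \, V_{DD}^{2} \, f_{\text{switching}} \tag{1} \\
P_{\text{short-circuit}} &= I_{VDD} \, V_{DD} \tag{2}
\end{align}
```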

The main difference between this XOR gate and those in the literature is that VDD and GND are used only in the inverter of input A to produce the output. This greatly reduces the static power of the circuit because there are very few direct paths between VDD and GND, and the desired signals are produced directly by the inputs. Here, the required XNOR signal is produced using an inverter after the XOR. Although this increases the dynamic power of the whole circuit, some advantages are obtained, including high driving capability, high speed, and small area.

The novelty of the proposed FA cell lies in Module-2, which consists of 6 transistors: M6 and M7 form a TG with the XOR and XNOR signals as inputs, and Cin is the selector that generates the Sum. When XOR and XNOR are 0 and 1, respectively, the Sum output swing is not sufficiently strong. In this case, two NMOS (M10 and M11) and two PMOS (M8 and M9) transistors are used in series to speed up the transition of the Sum. Like M3 and M4 in the XOR gate, M8 and M9 are activated only when necessary. To evaluate Module-2 in terms of the ON/OFF states of its transistors, the state of Cin should be considered. When Cin=0 and X=0 (X̄=1), M8=M11=OFF, the states of M9 and M10 are insignificant, and Sum is equal to Cin through M6 and M7. On the other hand, when Cin=0 and X=1, M8 and M9 are ON and the other transistors have no impact on the output; then Sum=X. This state produces the high logic level (1) of Sum with high driving capability, since M8 and M9 are both PMOS transistors. Conversely, when Cin=1, if X=0, the Sum is equal to Cin (through M6 and M7); if X=1, then Sum=X̄ (through M10 and M11). Here, the low logic level of Sum (0) has a strong swing because M10 and M11 are NMOS transistors.

Another advantage of using series transistors is that they strengthen the driving capability of the Sum when the FA is used in structures like multipliers, where high-impedance outputs must be avoided. Here, a strong 0 and a strong 1 can be obtained. Despite these advantages, Module-2 may appear to have high power consumption. To solve this problem, instead of using VDD and GND in the pull-up and pull-down networks, the XOR-XNOR signals are mainly used to produce the desired signals. The absence of VDD and GND in the circuit is therefore one of the measures taken to decrease power consumption. In the proposed circuit, except for the inverters, VDD and GND are not connected to the drain/source of the transistors. This technique is known as the floating design method (Shiri Asmangerdi et al., 2012), where internal nodes are not directly connected to GND and VDD, while the essential signals for the outputs are the main input signals (A, B, and Cin). In addition, with this technique it is possible to increase the number of transistors to achieve higher driving capability, while the power and delay are kept in an appropriate range. The modules of the proposed circuit, especially the XOR-XNOR cell as the main module of the FA, use this technique, so the dynamic power is decreased significantly. In this module, the number of NMOS transistors is greater than the number of PMOS transistors, so the need for a power supply in the pull-up and pull-down networks is minimized. Consequently, the XOR-XNOR module has a high driving capability and low power consumption, and the power is reduced further while the speed of the signal transition is improved. Extracting the Boolean function of Module-2 shows that M6 to M11 work as a MUX with Cin as its selector:
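The Boolean function referred to above is not reproduced in this copy. Assuming the case analysis given for Module-2 (Sum follows X when Cin=0 and its complement when Cin=1), the multiplexer form would read as follows. The numbering as Eq. (3) is inferred from the later reference to Eq. (4).

```latex
\begin{equation}
\text{Sum} = \overline{C_{in}} \cdot X + C_{in} \cdot \overline{X},
\qquad X = A \oplus B \tag{3}
\end{equation}
```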

On the other hand, Module-3 uses only a gate-diffusion input (GDI) gate composed of M12 and M14. However, M13 is added to Module-3 as a single-swing restoration (SSR) transistor to resolve the Cout failure that occurs when its voltage suffers from a Vth drop, one of the main defects of the GDI concept. Cout, based on Eq. (4), is now sufficiently strong, especially for use in cascaded structures such as RCAs.
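The three modules described above can be checked at the logic level with a short behavioural sketch. It models no electrical effects (swing, Vth drop, drive strength). The function names `mux` and `hybrid_fa` are hypothetical, and the Cout selection (X chooses between A and Cin) is an assumption, since Eq. (4) is not reproduced in this copy.

```python
from itertools import product


def mux(sel, when0, when1):
    """2:1 multiplexer: returns when0 if sel is 0, otherwise when1."""
    return when1 if sel else when0


def hybrid_fa(a, b, cin):
    """Logic-only model of the three-module full adder."""
    x = a ^ b                  # Module-1: XOR of the two operands
    xn = 1 - x                 # XNOR, obtained by inverting the XOR
    s = mux(cin, x, xn)        # Module-2: Cin selects X or its complement
    cout = mux(x, a, cin)      # Module-3 (assumed form): X selects A or Cin
    return s, cout


# Exhaustive check against ordinary binary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = hybrid_fa(a, b, cin)
    assert s + 2 * cout == a + b + cin
    print(a, b, cin, "->", s, cout)
```

The exhaustive loop covers all eight input combinations, so the assumed Cout selection is at least consistent with full adder arithmetic.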

In the proposed FA, the TG technique is used to generate XOR and Sum, while the GDI technique is applied to generate Cout. Also, the use of dedicated transistors, such as M3-M5, M8-M11, and M13, for output voltage enhancement in the XOR, MUX, and Cout, respectively, without significant impact on the delay of critical paths, can lower the power and maintain high speed simultaneously. Finally, the series transistor structures in the pull-up and pull-down networks of the XOR and MUX reduce the total load capacitance of the circuit. Lowering the total capacitance and raising the voltage swing on the outputs of the proposed FA, together with the lack of direct VDD and GND paths in the final stages, can increase the driving capability and noise immunity, and save power. This makes the cell appropriate for implementation in more sophisticated structures.

3.2 Proposed hybrid half adder

Fig. 3 shows the presented hybrid HA for the implementation of digital counters and consequently the multiplier. The HA consists of two gates, AND and XOR. The XOR is based on the one used in the FA cell, while the AND is proposed based on the TG technique. The novelty of the HA is related to the AND gate.

Fig.3 Hybrid half adder

The proposed cell differs from previous TG-based AND gates in its input arrangement and in its number of inverters. Here, input B activates the transistors, and input A is passed to the output only when M1 and M2 are activated. Moreover, because of the way input B is used, an inverter for input A is unnecessary. Thus, area and power are inherently saved. Also, the four transistors (M1-M4) of the proposed AND are never all active in the same state: only one of the transistor pairs (M1, M2) and (M3, M4) is active in a given state and produces the output. Based on a unique feature of the ON/OFF states of the TGs, the equivalent resistance of the path in the proposed AND is halved (Fathi et al., 2020; Tirupathireddy et al., 2021). This makes data transmission faster than in other methods, such as CMOS and conventional TG-based ANDs, especially when transistor sizing is a concern.

3.3 M:3 (4≤M≤7) counter implementations

By using the proposed FA and HA cells, different counters are implemented as shown in Fig. 4. FA circuits, also known as 3:2 counters, are the basis for larger counter structures. Using the proposed FA circuit, counter cells with different dimensions are implemented. The HA, acting as a 2:2 counter, is also needed to achieve this goal. Counters with different dimensions are offered in the literature (Veeramachaneni et al., 2007; Fritz et al., 2017; Saha et al., 2018), and are considered in this paper. We aim to show that, by keeping the conventional structure of counters built from FAs and HAs, counter circuits with different dimensions can be achieved with much better performance in terms of power, delay, power-delay product (PDP), and occupied area. Here, the main aim is to implement circuits with the minimum area while the other trade-offs among the design parameters are established and considered very carefully. The proposed FA and HA cells are capable of producing full-swing outputs. Generally, counters are used to count the number of 1's at the inputs, as shown in Table 1.

Table 1 Simplified truth table of a 4:3 counter

Fig. 4a shows that the 4:3 counter consists of an FA and two HAs, whose outputs are generated based on Eqs. (5)-(7). The HAs generate the outputs.

Fig. 4 M:3 (4≤M≤7) counter implementations: (a) 4:3; (b) 5:3; (c) 6:3; (d) 7:3

Two carry outputs (Cout1 and Cout2) are generated by the second HA, while the Sum is produced by the first one. Also, because the HAs have strong outputs, the 4:3 counter is expected to have full-swing outputs. Unlike the 4:3 counter, the 5:3 counter in Fig. 4b uses two FAs and an HA. Here, the MUX is used to generate Sum with a high driving capability because of the series pull-up and pull-down networks. The 4:3 and 5:3 counters have 44 and 49 transistors, respectively. The proposed structures of the 6:3 and 7:3 counters are shown in Figs. 4c and 4d, respectively. They have 67 and 72 transistors, respectively, and compared to their counterparts, the proposed cells benefit from a significant reduction in the number of transistors, which in turn reduces power consumption. The proposed 6:3 counter contains three FAs and an HA, while the 7:3 counter is composed of FAs only. Counter cells with the minimum area play a very important role in the implementation of larger circuits such as multipliers, and the proposed counters built from the proposed FA and HA achieve this goal. This paper investigates the conventional structure of different counters based on a new highly efficient FA and HA. Counters with different input dimensions are needed for the PPRT stage of a multiplier. The largest counter that can be used in the 8×8 multiplier structure is the 7:3 counter (implemented by FAs), while the smallest is the 2:2 one, which is implemented by an HA.
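The counter compositions listed above (one FA and two HAs for 4:3, two FAs and one HA for 5:3, three FAs and one HA for 6:3, four FAs for 7:3) can be sketched behaviourally as follows. The internal wiring is an assumption, since Fig. 4 is not reproduced here; it is chosen so that the first HA of the 4:3 counter produces Sum and the second produces both carries, as the text states. The HA transistor count of 13 is inferred from 44 = 18 + 2 × 13 and is not stated in the text.

```python
from itertools import product

FA_T = 18   # transistors per FA, stated in the text
HA_T = 13   # transistors per HA, inferred from the 4:3 counter total (assumption)


def fa(a, b, c):
    """3:2 counter (full adder): returns (sum, carry)."""
    t = a + b + c
    return t & 1, t >> 1


def ha(a, b):
    """2:2 counter (half adder): returns (sum, carry)."""
    return a ^ b, a & b


def counter_4_3(a, b, c, d):
    s1, c1 = fa(a, b, c)
    s, c2 = ha(s1, d)            # first HA produces Sum
    cout1, cout2 = ha(c1, c2)    # second HA produces both carries
    return s, cout1, cout2


def counter_5_3(a, b, c, d, e):
    s1, c1 = fa(a, b, c)
    s, c2 = fa(s1, d, e)
    cout1, cout2 = ha(c1, c2)
    return s, cout1, cout2


def counter_6_3(a, b, c, d, e, f):
    s1, c1 = fa(a, b, c)
    s2, c2 = fa(d, e, f)
    s, c3 = ha(s1, s2)
    cout1, cout2 = fa(c1, c2, c3)
    return s, cout1, cout2


def counter_7_3(a, b, c, d, e, f, g):
    s1, c1 = fa(a, b, c)
    s2, c2 = fa(d, e, f)
    s, c3 = fa(s1, s2, g)
    cout1, cout2 = fa(c1, c2, c3)
    return s, cout1, cout2


# M -> (model, number of FAs, number of HAs)
COUNTERS = {4: (counter_4_3, 1, 2), 5: (counter_5_3, 2, 1),
            6: (counter_6_3, 3, 1), 7: (counter_7_3, 4, 0)}

for m, (fn, n_fa, n_ha) in COUNTERS.items():
    # Every input pattern must encode its own number of 1's.
    for bits in product((0, 1), repeat=m):
        s, cout1, cout2 = fn(*bits)
        assert s + 2 * cout1 + 4 * cout2 == sum(bits)
    print(f"{m}:3 counter: {n_fa * FA_T + n_ha * HA_T} transistors")
```

With the inferred HA size, the printed totals are 44, 49, 67, and 72, which agree with the transistor counts quoted in the text.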

3.4 Conventional and proposed counter-based Wallace-tree multipliers

The effectiveness of the proposed FA and counters is evaluated with 8×8 CBW multipliers under two conditions. In these multipliers, the inputs A7 to A0 and B7 to B0 are the 8-bit multiplicand and multiplier, respectively. These produce the outputs P0 to P15. Fig. 5a shows the conventional structure of the 8×8 CBW multiplier. Here, the PP stage is formed using the 2-input TG-based AND explained above, which has a considerable driving capability. Sixty-four ANDs are used for the first stage. In the second stage, counters of different sizes along with FAs and HAs are used for the PPRT. This stage is reduced until two rows remain, and then the final addition is performed by the RCA, which is constructed from the proposed FA and HA. Although different counters are used to reduce the complexity of the multiplier, it still suffers from a high number of transistors, 1711.

Fig. 5 Structure of conventional (a) and proposed (b) 8×8 CBW multipliers
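The three stages of a counter-based Wallace-tree multiplier (PP generation, PPRT, final addition) can be illustrated with a behavioural sketch. It does not reproduce the gate arrangement of Fig. 5: the grouping rule (chunks of up to seven bits per column) and the names `count_ones` and `cbw_multiply` are assumptions made for illustration only.

```python
from collections import defaultdict


def count_ones(bits):
    """Behavioural counter: encodes the number of 1's in `bits`, LSB first.
    2 bits -> HA (2:2), 3 bits -> FA (3:2), 4 to 7 bits -> M:3 counter."""
    width = 2 if len(bits) <= 3 else 3
    total = sum(bits)
    return [(total >> k) & 1 for k in range(width)]


def cbw_multiply(a, b, n=8, max_inputs=7):
    # PP stage: n*n two-input ANDs; the bit a_i AND b_j has weight 2**(i + j).
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)

    # PPRT stage: compress every column taller than two bits until two rows remain.
    while any(len(col) > 2 for col in cols.values()):
        nxt = defaultdict(list)
        for w, col in cols.items():
            if len(col) <= 2:
                nxt[w].extend(col)            # short column passes through
                continue
            for k in range(0, len(col), max_inputs):
                group = col[k:k + max_inputs]
                if len(group) == 1:
                    nxt[w].append(group[0])   # leftover single bit
                else:
                    for offset, bit in enumerate(count_ones(group)):
                        nxt[w + offset].append(bit)
        cols = nxt

    # Final addition: the remaining two rows are summed, as an RCA would do.
    return sum(bit << w for w, col in cols.items() for bit in col)


print(cbw_multiply(13, 11))
```

Each reduction pass preserves the weighted sum of all bits, so the result equals the ordinary product regardless of the grouping rule. Only the number of counters and the depth of the tree depend on that choice.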

Thus, a new structure is proposed, as shown in Fig. 5b. It is similar to the conventional structure, but the PPRT stage is different. In the PPRT stage, instead of 4:3 and 5:3 counters, FAs and HAs are used to reduce the number of transistors and the power consumption. This technique may increase the delay of data transfer, but the power reduction outweighs this shortcoming and saves energy and area. The conventional structure is expected to have higher power than the proposed configuration, because it uses fewer FAs in the PPRT stage and makes extra use of the 4:3 counter. For the PPRT stage of the proposed structure, the 4:3 counter is replaced by an FA in series with an HA, so the 44 transistors of the counter are reduced to 31, which also contributes to the transistor reduction in the next levels.

3.5 Physical comparison between the conventional and proposed multipliers

Table 2 compares the gates used in the conventional and proposed CBW multipliers (Wallace, 1964). The PP stage is composed of N² AND gates, where N is the number of input bits, which results in 64 ANDs. The conventional and proposed configurations use 27 and 24 HAs, respectively. The usage of 6:3 and 7:3 counters is similar.

Table 2 Comparison of the conventional and proposed 8-bit multipliers

The proposed multiplier, with 31 FAs and without 4:3 or 5:3 counters, has a lower area, while its total number of gates and its number of PPRT gates are larger than those in the conventional design. By removing the 4:3 and 5:3 counters and adding more FAs, the proposed structure has 125 fewer transistors than the conventional structure. This reduction lowers the power and increases the speed. Using higher-order compression circuits such as 6:3 and 7:3 counters gives a higher speed and a greater ability to reduce area than using 4:3 and 5:2 counters (Momeni et al., 2015). Therefore, in the proposed multiplier, especially on the critical path, it is preferred to use FAs together with 6:3 and 7:3 counters instead of 4:3 and 5:3 ones. By choosing the proper combination of counters and FAs, the critical paths are reduced.

4 Simulation results and comparison

4.1 Basic simulation results

For a fair comparison, all circuits are simulated using the 90 nm CMOS model, BSIM4 (level 54) version 4.4, with the HSPICE simulator under the conditions of Table 3. Then power, delay, and PDP are reported. To account for the physical structure of the circuits, the power-delay-area product (PDAP) is used as a figure of merit (FoM). Here, the proposed cell has lower power and PDP for both Sum and Cout. The proposed FA consumes 1.5188 μW, which is a 6.17% and 51.05% improvement over the TFA and the new non-driving full adder (NEW-ND-FA), respectively. The PDP-Carry of the proposed cell is 0.3036 fJ lower than that of HFA-17T, the second-ranked cell, which is the best result among the cited references. The proposed FA with 18 transistors has a PDAP of 57.800 fJ·μm², which constitutes 63.79% and 3.26% improvements over the TFA and HFA-17T with 16 and 17 transistors, respectively. According to Table 4, the simulations are carried out at a low frequency of 100 MHz, where DC power consumption (as static power), average dynamic power, worst-case delay, and worst-case PDP are extracted from 0.5 V to 1.3 V. The proposed circuit has the minimum static power (with an average value of 39.397 nW) compared to the other cells over the considered range, except at 0.5 V and 1.1 V, where HFA-17T shows better results.

The worst-case delay in Table 4 shows that the Vth drop in one of the Cout states increases the delay. Even so, over the range from 0.5 V to 1.3 V, the proposed cell, with an average delay of 9.128 ns, is 27.86% slower than the hybrid circuit, whose average delay is 7.139 ns. Average dynamic power consumption and worst-case PDP results are shown in Figs. 6a and 6b, respectively. In terms of dynamic power, the proposed circuit has the lowest consumption at all considered VDD values. Regarding the PDP, the hybrid FA has the lowest value only at 0.7 V, while at the other voltages the proposed circuit has the best results. Note that the threshold voltage drop at the Cout output can be solved by adding an NMOS in parallel with M14 when low-supply applications require extra reliability.

Table 3 Full adder constant circumstances

Table 4 Static power and delay time extraction versus VDD variation

4.1.1 Full adder cell tolerability evaluation

The tolerance of the FAs is evaluated under PVT variations for the typical-typical (TT), fast-fast (FF), and slow-slow (SS) corners, along with the Monte Carlo method (MCM), and the results are shown in Fig. 7.

Fig. 6 Average dynamic power (a) and worst-case PDP (b) versus VDD variation

Fig.7 Tolerability of the full adder cells against PVT variation for different corners

Here, the proposed circuit shows the maximum reliability and minimum sensitivity to the considered variations under the three corners. The PDP of the proposed circuit increases by 34.08%, 51.23%, and 12.79% from TT to FF, TT to SS, and FF to SS, respectively. The HFA-17T and hybrid circuits have the closest results to the proposed circuit for these corners. The outcomes confirm the reliability and superiority of the proposed FA for implementation in more complex circuits like counters and multipliers.

4.1.2 Counter cells

To investigate the performance of the 4:3, 5:3, 6:3, and 7:3 counters in more critical situations, VDD is varied from 1 V to 1.4 V (Fig. 8). The other parameters are a frequency of 250 MHz, a load capacitance of 1 fF, and a temperature of 25 ℃. Worst-case PDPs are extracted. According to Fig. 8a, at the nominal VDD of 1.2 V the proposed cell differs only slightly from the GDI-based cell proposed by Mukherjee and Ghosal (2019). Because of its low number of transistors, the circuit of Mukherjee and Ghosal (2019) is the main competitor to the proposed cell; however, it has weak and unreliable outputs. The CMOS-based references differ considerably from the proposed cell. Based on the 5:3 counter results (Fig. 8b), the proposed cell, with an average PDP of 245.48 fJ over the three considered points, achieves 28.34% and 50.16% improvements over those of Chowdhury et al. (2008) and Mehrabi et al. (2013), with average results of 342.55 fJ and 492.57 fJ, respectively. A similar situation applies to the 6:3 counters (Fig. 8c). The proposed 6:3 counter has a PDP of 124.07, 355, and 515.97 fJ at 1, 1.2, and 1.4 V, respectively, which is better than the other counters. These results show that the use of FAs in the counter can significantly reduce the number of transistors, area, and PDP. Finally, Fig. 8d gives the results for the most important type of digital counter, the 7:3 counter, which combines a high compression rate with a small area and can be chosen for the PPRT stage. The results of the circuit proposed in this study, Veeramachaneni et al. (2007), and Saha et al. (2018) (design 2) are close, and the proposed cell has better results than the other two. The presented circuits show a high and reliable capability under different voltage conditions and have stable dynamic power, resulting from the combination of techniques. The proposed 5:3, 6:3, and 7:3 counters have the minimum number of transistors, while for the 4:3 counter, the circuit of Mukherjee and Ghosal (2019) has the minimum number of transistors, 22.
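The improvement percentages quoted for the 5:3 counter follow from the average PDP values given in the text. The small sketch below recomputes them; the helper name `improvement` is hypothetical, and the mean PDP of the 6:3 counter is derived here and does not appear in the text.

```python
def improvement(ours, theirs):
    """Relative reduction of `ours` with respect to `theirs`, in percent."""
    return 100.0 * (theirs - ours) / theirs


# Average PDP of the 5:3 counters over the three supply points (fJ), as quoted.
pdp_proposed, pdp_chowdhury, pdp_mehrabi = 245.48, 342.55, 492.57
print(round(improvement(pdp_proposed, pdp_chowdhury), 2))   # 28.34
print(round(improvement(pdp_proposed, pdp_mehrabi), 2))     # 50.16

# PDP of the proposed 6:3 counter at 1, 1.2, and 1.4 V (fJ); the mean is derived.
pdp_6_3 = [124.07, 355.0, 515.97]
mean_pdp_6_3 = sum(pdp_6_3) / len(pdp_6_3)
print(round(mean_pdp_6_3, 2))                               # 331.68
```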

Fig. 8 Worst-case PDP results of 4:3 (a), 5:3 (b), 6:3 (c), and 7:3 (d) counters against VDD variation

4.2 Layout considerations

The proposed multiplier structure is checked in a real environment through its layout in a pad with 36 input ports, drawn in Cadence Virtuoso based on 90 nm TSMC technology (Fig. 9). A design rule check (DRC), layout versus schematic (LVS), and parasitic extraction (PEX) are performed. Then the layout is linked with Sonnet for the three-dimensional plot. The pad used for the layout has a width of 82.5 μm, a length of 178.5 μm, and a total die area of 0.0147 mm². Fig. 9 also gives the dimensions of the circuits involved in the multiplier, including the AND, FA, counters, and HA. The greatest area, 197.59 μm², is occupied by the 6:3 and 7:3 counters. As shown in Fig. 10, the post-layout simulation of the proposed FA is performed at 500 MHz and the accuracy of the circuit is confirmed.
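As a quick consistency check, the quoted die area follows from the pad width and length. The variable names below are illustrative only.

```python
width_um, length_um = 82.5, 178.5        # pad dimensions quoted in the text (micrometres)
area_um2 = width_um * length_um          # area in square micrometres
area_mm2 = area_um2 * 1e-6               # 1 mm^2 = 1e6 um^2
print(area_um2, round(area_mm2, 4))      # 14726.25 0.0147
```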

Fig.9 Layout of the proposed multiplier

Fig.10 Post-layout waveforms of the proposed full adder cell

4.3 8×8 Wallace-tree multiplier results

To compare the conventional and proposed structures of the multiplier based on the proposed circuits, two references, Ref1 and Ref2, which are the main competitors for the proposed cells, are selected. Ref1 and Ref2 are implemented based on Figs. 5a and 5b, respectively. In both references, the proposed cells are used for the FA, HA, and AND gates, while the counters are different. In this regard, the best references for 4:3 to 7:3 counters are selected (Veeramachaneni et al., 2007; Chowdhury et al., 2008; Fritz and Fam, 2017; Mukherjee and Ghosal, 2019). Simulations are performed and the results are summarized in Table 5. Here, the proposed multiplier has the best power, delay, and PDP. The minimum area, with 1586 transistors, also belongs to the proposed structure. The conventional structure, in turn, performs better than Ref1 and Ref2.

Normalized-PDAP measurements show the effectiveness of the proposed multiplier, whose PDAP is 29.79%, 47.61%, and 67% better than those of the conventional structure, Ref2, and Ref1, respectively. Also, the proposed multiplier has 7.31% (125 transistors), 19.7% (389 transistors), and 24.22% (507 transistors) fewer transistors (less area) than the conventional one, Ref2, and Ref1, respectively. The post-layout results of the proposed multiplier are checked, conformity is confirmed, and parameters including power and delay are extracted, as given in Table 5. The remaining differences in the parameters are due to the internal and parasitic resistances and capacitances.
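The transistor savings quoted above can be cross-checked from the stated total of the proposed multiplier (1586) and the stated differences. The baseline totals for Ref2 and Ref1 are derived here and are not given explicitly in the text.

```python
proposed = 1586                                            # transistors in the proposed multiplier
saved = {"conventional": 125, "Ref2": 389, "Ref1": 507}    # differences quoted in the text

baselines = {}
for name, delta in saved.items():
    baselines[name] = proposed + delta                     # implied transistor count of the baseline
    percent = 100.0 * delta / baselines[name]
    print(f"{name}: {baselines[name]} transistors, {percent:.2f}% saved")
```

The implied conventional total is 1711, which agrees with the figure stated for the conventional structure, and the percentages come out at 7.31%, 19.70%, and 24.22%.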

Table 5 Simulation results of multipliers

5 Image blending

One of the best ways to assess multiplier circuits is to apply them in image processing, for example in image blending. Several methods have been proposed for image multiplication (Momeni et al., 2015; Kumar and Sharma, 2016). Here, the circuit implementation processes each pixel, and the operation of the multiplier is evaluated from the output bits, which form the pixels of the output image. When multiplier circuits are used, the most important point to consider is their hardware implementation (Taheri et al., 2016). Hence, a new reliable mechanism based on Fig. 11 is presented.

Fig.11 The proposed image processing mechanism for the image blending application

Fig. 12 shows the input and output images of the proposed image blending mechanism for different circuits. Based on the image quality assessments in Table 6, the proposed multiplier has the best peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM). The conventional multiplier ranks second, with 2.95% and 0.85% lower quality, respectively.

Fig. 12 The proposed image blending mechanism: (a)-(d) applied input images in grey and binary version; (e) expected output image from MATLAB; (f) obtained output images by the proposed multiplier using the proposed cells; (g) output image of multiplier Ref1; (h) output image of multiplier Ref2
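To make the evaluation concrete, a minimal sketch of multiplicative blending and PSNR is given below. It is not the MATLAB-HSPICE flow of Fig. 11. Keeping the upper byte of each 16-bit product is an assumed scaling, the function names `blend` and `psnr` are hypothetical, and SSIM is omitted for brevity.

```python
import math


def blend(img_a, img_b):
    """Pixel-wise multiplicative blend of two 8-bit greyscale images (lists of rows).
    Each 16-bit product is scaled back to 8 bits by keeping its upper byte (assumption)."""
    return [[(pa * pb) >> 8 for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    sq_err = [(r - t) ** 2
              for row_r, row_t in zip(reference, test)
              for r, t in zip(row_r, row_t)]
    mse = sum(sq_err) / len(sq_err)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


a = [[200, 120], [64, 255]]
b = [[255, 128], [32, 255]]
exact = blend(a, b)
# Flip the least significant bit of every pixel to mimic an erroneous multiplier output.
noisy = [[p ^ 1 for p in row] for row in exact]
print(exact)
print(psnr(exact, exact), round(psnr(exact, noisy), 2))
```

In the flow described in the text, the reference image would come from MATLAB and the test image from the simulated multiplier; here the error is injected artificially only to exercise the metric.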

To demonstrate the trade-off between multiplier efficiency and quality factors, the FoM given in Eq. (8) is used (Salmanpour et al., 2021). The lower the FoM, the better the trade-off. The proposed circuit has the lowest FoM, which is 31.67% better than that of the conventional one and 57.61% better than the average of the two reference circuits.

Also, the proposed mechanism makes it possible to analyze the values of the design parameters. Based on Table 6, the two multipliers built with the proposed cells perform much better than the two reference circuits, especially the proposed multiplier, which has a power of 713.91 μW and a PDP of 177.04 pJ.

Table 6 Multiplier performance for image blending

6 Conclusions

In this paper, a counter-based Wallace-tree (CBW) multiplier is presented which is reliable and has a small area. Different digital counters have been implemented based on a new hybrid full adder (FA) whose main configuration uses the transmission gate (TG), which saves about 50% area. The proposed FA, with the TG technique and a minimum number of direct paths from the power supplies, provides full-swing outputs and a low short-circuit power, while its dynamic power and delay remain competitive with state-of-the-art designs. By implementing the proposed FA and counters, along with a TG-based AND gate and hybrid half adder (HA), in the proposed CBW multiplier, a total die area of 0.0147 mm² is attained in 90 nm TSMC technology. The results under different conditions show advantages such as low power, small area, and high energy saving for the proposed FA, counters, and multiplier. The multiplier has been applied to image blending in digital image processing, and the results have confirmed its efficiency.

Contributors

Ayoub SADEGHI and Nabiollah SHIRI designed the study. Ayoub SADEGHI, Mahmood RAFIEE, and Mahsa TAHGHIGH processed the data. Ayoub SADEGHI, Nabiollah SHIRI, Mahmood RAFIEE, and Mahsa TAHGHIGH drafted the paper. Nabiollah SHIRI revised and finalized the paper.

Compliance with ethics guidelines

Ayoub SADEGHI, Nabiollah SHIRI, Mahmood RAFIEE, and Mahsa TAHGHIGH declare that they have no conflict of interest.
