We have optimized Ling's earlier work on integer adders to avoid excessive fanout while preserving its speed advantage. We have also mapped the Ling approach to CMOS technology [P13].
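For readers unfamiliar with Ling's scheme, the following is a minimal bit-serial sketch of the pseudo-carry recurrence it rests on. It is only an illustration, not the optimized or CMOS-mapped design reported in [P13]; the function name `ling_add` and the `width` parameter are hypothetical. With g = a AND b and t = a OR b per bit, the Ling pseudo-carry is H[i] = g[i] OR (t[i-1] AND H[i-1]), and the true carry is recovered as c[i] = t[i] AND H[i].

```python
def ling_add(a, b, width=8):
    """Add two unsigned width-bit integers using Ling's pseudo-carry recurrence.

    Illustrative only: real Ling adders evaluate H in parallel lookahead
    logic; this loop just walks the recurrence bit by bit.
    """
    carry_in = 0   # t[i-1] & H[i-1]: the true carry into bit i (0 for bit 0)
    total = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t, p = ai & bi, ai | bi, ai ^ bi   # generate, transfer, half-sum
        total |= (p ^ carry_in) << i          # sum bit s[i] = p[i] XOR c[i-1]
        h = g | carry_in                      # pseudo-carry H[i]; note no t[i] term
        carry_in = t & h                      # true carry c[i] = t[i] & H[i]
    return total | (carry_in << width)        # include the carry-out bit
```

The point of the sketch is that H[i] omits the t[i] factor that an ordinary carry recurrence would carry along, which is where the logic saving in the lookahead network comes from.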
We have made basic improvements to the floating-point addition algorithm. Conventionally, floating-point addition consists of a subtraction of exponents, a shift of one fraction by an amount equal to the exponent difference, an addition or subtraction of the fractions, a shift of the result (on subtraction) to leave it in normalized form, and a rounding of the result. These steps are generally sequential, and require two shifts and three additions (the exponent subtraction, the fraction addition, and the rounding). Farmwald, in earlier work, showed that one of the shifts could be eliminated by creating two simultaneous paths, since a long preshift cannot be accompanied by a long postshift following the fraction addition or subtraction. Quach has shown that floating-point addition can be improved further by integrating the rounding step with the final fraction addition [P14]. This is done by computing up to four candidate results and multiplexing out the correct one. With careful management and integration of the paths, the overall hardware cost is only modestly higher than that of the state-of-the-art approach. A floating-point adder based upon this idea has been fabricated and is being tested. In a conventional 1-micron CMOS process, the delay for IEEE-standard floating-point addition is about 15-16 nanoseconds.
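The conventional sequence described above can be sketched in software as follows. This is a minimal illustration of the baseline five-step algorithm only; it does not model the two-path organization or the integrated rounding of [P14]. The names `unpack`, `shift_right_sticky`, and `fp_add` are hypothetical, the use of three extra low-order bits (guard, round, sticky) is an assumption of the sketch, and it handles only nonzero, finite, normal-range double-precision operands with round-to-nearest-even.

```python
import math

P = 53   # significand bits (IEEE double precision)
G = 3    # extra low-order bits kept during the add: guard, round, sticky

def unpack(x):
    """Split a nonzero finite float into (negative?, exponent, P-bit integer significand)."""
    m, e = math.frexp(abs(x))              # abs(x) == m * 2**e with 0.5 <= m < 1
    return x < 0, e - P, int(m * (1 << P))

def shift_right_sticky(v, n):
    """Right shift that ORs all discarded bits into the lowest kept bit."""
    if n == 0:
        return v
    lost = v & ((1 << n) - 1)
    return (v >> n) | (1 if lost else 0)

def fp_add(x, y):
    """Add two floats with the conventional sequential steps (sketch)."""
    sa, ea, ma = unpack(x)
    sb, eb, mb = unpack(y)
    # Put the larger magnitude first so the fraction difference is non-negative.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    d = ea - eb                                    # step 1: exponent subtraction
    big = ma << G
    small = shift_right_sticky(mb << G, d)         # step 2: alignment preshift
    total = big + small if sa == sb else big - small   # step 3: fraction add/subtract
    if total == 0:
        return 0.0
    e = ea - G
    top = total.bit_length() - (P + G)             # step 4: normalizing postshift
    if top > 0:
        total = shift_right_sticky(total, top)     # carry-out: shift right
    else:
        total <<= -top                             # cancellation: shift left
    e += top
    q, low = total >> G, total & ((1 << G) - 1)    # step 5: round to nearest even
    half = 1 << (G - 1)
    if low > half or (low == half and q & 1):
        q += 1
        if q == 1 << P:                            # rounding overflowed the significand
            q >>= 1
            e += 1
    result = math.ldexp(q, e + G)
    return -result if sa else result
```

On operands within these limits, `fp_add(a, b)` can be compared directly against the built-in `a + b`. Note that steps 1 through 5 each depend on the previous one, which is the serial dependence that the two-path and integrated-rounding techniques are intended to shorten.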