# **Application of Massively Parallel Processors to Real Time Processing of High Speed Images**

YoungJoong Joo, Suzanne Fike, Kee Shik Chung, Martin Brooke, Nan Marie Jokerst, D. Scott Wills

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

## Abstract

This paper demonstrates that the real-time utilization of image sequences, at frame rates far above what is currently possible, can now be achieved with an optically interconnected massively parallel processor. A focal plane processing chip with an on-chip array of sigma-delta analog to digital converter front ends under each pixel is presented. This two layer chip is a scaleable high frame rate image capture building block; however, it requires a third layer of data processing to filter the sigma-delta front end data to obtain images. The use of an array of optically connected processors beneath the chip proves to be the best solution to this challenging data processing task.

## 1. Introduction

Image processing systems that operate, in real time, on large images with frame rates in the high kHz or MHz range are beyond the capability of today's imaging systems. For example, a sigma-delta analog to digital converter (ADC) generating a sequence of 256x256 8 bit images at a frame rate of 400 kHz must be clocked at more than 1.7 THz. Even when parallel ADCs are placed along the edge of the imaging array, the problem is only partially mitigated because the speed at which the ADCs must operate still increases with image size. To generate 256x256 8 bit images at a frame rate of 400 kHz, 256 ADCs need to be clocked at more than 6.6 GHz. In this paper we describe a chip that implements a fully parallel front end with one ADC per pixel. This provides a scaleable solution to the real time high frame rate image capture problem when coupled to a massively parallel optically interconnected processor.
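
The clock rates quoted above can be approximated with a short calculation. The paper does not spell out the derivation, so the formula below (pixels × frame rate × oversampling ratio × 2) is an assumption: the oversampling ratio of 32 is the value used in Section 2, and the factor of two (Nyquist sampling) is a guess that brings the result close to the quoted figures. The function names are hypothetical.

```python
def single_adc_clock_hz(rows, cols, frame_rate_hz, osr, nyquist_factor=2):
    """Sampling clock needed if one sigma-delta front end serves every pixel.

    nyquist_factor=2 is an assumption of this sketch, not a value from the paper.
    """
    return rows * cols * frame_rate_hz * osr * nyquist_factor


def per_row_adc_clock_hz(rows, cols, frame_rate_hz, osr, nyquist_factor=2):
    """Sampling clock per converter when every row has its own ADC."""
    total = single_adc_clock_hz(rows, cols, frame_rate_hz, osr, nyquist_factor)
    return total // rows


# Example values from the text: 256x256 image, 400 kHz frame rate, OSR of 32.
single = single_adc_clock_hz(256, 256, 400_000, 32)
per_row = per_row_adc_clock_hz(256, 256, 400_000, 32)
print(f"single ADC:      {single / 1e12:.2f} THz")
print(f"one ADC per row: {per_row / 1e9:.2f} GHz")
```

Under these assumptions the sketch gives about 1.68 THz for a single converter and about 6.55 GHz for each row converter, close to the roughly 1.7 THz and 6.6 GHz quoted in the text.
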
We show that to generate a sequence of 256x256 8 bit images at a frame rate of 400 kHz, an 8x8 array of 102 MHz pipelined processors (each operating at 435 MIPS) is required. To keep the design scaleable the processors must reside beneath the imaging chip, which uses 3-D electrical interconnect for parallel connection to the detector plane. To connect to subsequent layers of processing, a through-substrate parallel optical data link is used.

## 2. Architecture of focal plane readout systems

In conventional focal-plane readout systems, a single ADC performs data conversion for the entire array. In this case, noise introduced in the long analog shift register chains cannot be removed and reduces the dynamic range. In addition, the analog circuits in a serial ADC are required to operate with the highest bandwidth of all focal-plane components, since the conversion rate is the same as the pixel data rate. For the example of sigma-delta conversion of a 400 kHz frame rate 256x256 8 bit image sequence, if we assume an oversampling ratio of 32 then the sampling rate of the ADC front end must be more than 1.7 THz, so a single ADC based architecture is clearly not a good choice for a high frame rate image readout system.

### 2.1 On-focal-plane readout

To improve the signal-to-noise ratio (SNR), serial on-focal-plane readout systems have been proposed, as shown in Figure 1(a). This architecture removes pick-up and vibration sensitivity since no off-chip analog cabling is required [1]. However, the required data rate is unchanged and noise can still be introduced in the analog shift registers between the focal plane and the ADC.

Semi-parallel architectures have been proposed [1], [2], [3] to overcome these problems, as shown in Figure 1(b). The ADC conversion rate of this circuit is the much lower row readout rate.
For our example of the sigma-delta conversion of a 400 kHz frame rate 256x256 8 bit image sequence, if we assume all 256 rows have an ADC attached in parallel as in Figure 1(b), then the clock rate of the ADC front end drops by a factor of 256 to slightly more than 6.6 GHz. This is at least a feasible clock speed, although only heterojunction bipolar transistor (HBT) technology has demonstrated the capability to realize near commercial ADCs at these speeds, and not as an array of 256 of them. Thus this is not currently a feasible architecture, and it still does not solve the problem of noise being introduced through the long analog lines between the pixels and the ADCs.

Figure 1. Architecture of on-focal-plane readout system

### 2.2 Fully parallel on-focal-plane readout

To overcome the limitations of serial and semi-parallel readout, we have implemented a focal plane array (FPA) using a fully parallel readout architecture. Figure 2 shows the basic concept of this architecture. Each pixel has its own ADC. To keep the area of the ADC circuitry small, only the front end of the sigma-delta converter is implemented on a per pixel basis. Since sigma-delta converters process only digital data after the front end, shifting the digital row data cannot introduce further noise into the signal.

A digital signal processor is now needed to complete the ADC process. We propose that the pixels be grouped into subarrays, each served by one digital signal processing (DSP) unit that performs the conversion. An integrated optoelectronic emitter on each subarray allows through-silicon wafer output of digital image data from the focal plane to the processor stacked below each subarray. This vertical coupling to the image plane allows the detector and processor arrays to be scaled while maintaining a fixed level of processing per pixel. The number of pixels included in the subarray depends on the bandwidth of the DSP circuits.
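
The division of labour described above, a one-bit front end under each pixel and a digital filter in the processor below, can be illustrated with a behavioural sketch. The code models a generic textbook first-order sigma-delta modulator followed by a plain block average. It is not the current-input circuit of this paper, and the function names, the input range of -1 to 1, and the averaging filter are all assumptions made for illustration.

```python
def sigma_delta_modulate(samples):
    """Generic first-order sigma-delta modulator (behavioural model).

    samples: iterable of floats in [-1, 1], already oversampled.
    Returns a list of 0/1 bits, one per input sample.
    """
    integrator = 0.0
    feedback = 0.0  # output of the 1-bit DAC from the previous step
    bits = []
    for x in samples:
        # Integrate the difference between the input and the fed-back level.
        integrator += x - feedback
        # The comparator decides the sign of the integrator state.
        bit = 1 if integrator >= 0 else 0
        # The 1-bit DAC maps the decision back to +1 or -1.
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def decimate_by_averaging(bits, osr):
    """Stand-in digital filter: average each block of `osr` bits.

    Returns one value in [-1, 1] per complete block.
    """
    outputs = []
    for start in range(0, len(bits) - osr + 1, osr):
        ones = sum(bits[start:start + osr])
        outputs.append(2.0 * ones / osr - 1.0)
    return outputs


# A constant input of 0.5, oversampled by 32, for ten output samples.
bitstream = sigma_delta_modulate([0.5] * 320)
print(decimate_by_averaging(bitstream, 32))
```

For a constant input, each averaged output stays within 4/32 of the input value because the integrator state is bounded. A block average of 32 bits resolves only 33 levels, so it stands in for, rather than reproduces, the digital filtering discussed in this paper.
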
An example of the type of processor proposed is the SIMPil processor described in [5], [6]. If an 8x8 subarray is used, the sizes of the processor and the focal plane subarray seem to match reasonably well, so that by tiling an 8x8 array of processor chips, each containing 16 nodes, a 256×256 pixel resolution focal plane could be achieved.

### 2.3 Massively parallel processor performance

To achieve a 400 kHz frame rate each processor would need to process data at more than 820 Mbps (for an 8x8 subimage oversampled by 32 at 400 kHz). If we assume a fully pipelined 8 bit processor clocked at 102 MHz (achieving more than 410 MIPS), a frame rate of only about 50 kHz is possible. To achieve a 400 kHz frame rate we need to assume that the first comb filter stage of the sigma-delta converter is also under each pixel. For our example, if we assume that a 4 tap comb filter (which is no more than a 2 bit counter) is added to each pixel, then the 435 MIPS processors need only implement the remaining 4 tap low pass filter to produce the 8 bit image output. This filter can be implemented by the proposed processor to achieve the required 400 kHz frame rate.

Figure 2. Parallel on-focal-plane ADC

## 3. Current input first order sigma-delta converter

To implement the parallel on-focal-plane readout system, a new ADC is needed that fits into each pixel and offers reasonable speed and dynamic range. This section presents the current input first order sigma-delta converter that satisfies this requirement.

### 3.1 Sigma-delta converter

ADCs can be divided into two classes: converters limited by component matching, and converters based on counting algorithms. Flash converters, pipeline converters, recursive converters and successive approximation converters belong to the first class. The accuracy of these converters is ultimately limited by the component matching. A solution to overcome this limit is to use ADCs based on counting algorithms.
One of these converters is the dual slope converter. This converter can be very accurate but its conversion rate is slow. For instance, to obtain 8 bit accuracy, the converter needs a clock 256 times faster than the sample rate. This implies that the conversion rate is 256 times slower than the internal clock frequency.

Sigma-delta converters are another type of data converter based on a counting algorithm [7], [8]. For a given accuracy, the internal clock frequency of a sigma-delta converter can be much lower than that of a dual slope ADC. For example, while a dual slope ADC requires an internal clock frequency of 256 times the sampling rate to obtain 8 bit accuracy, an oversampling ratio of 32 [7] or more is sufficient for a first order sigma-delta converter to achieve 8 bit accuracy. Because of its efficient counting algorithm, the trade-off between speed and accuracy is more advantageous for a sigma-delta converter.

In addition, the sigma-delta converter front end, or modulator, is small and low power. Only the modulator needs to be on the focal plane since its output is digital. Approximately 90% of the sigma-delta ADC circuitry is digital and can be located off the focal plane. This eases the problem of fitting the modulator into the allocated space under each pixel.

### 3.2 Current input sigma-delta converter

The current input sigma-delta ADC consists of a modulator and a digital filter section. The modulator samples the analog input and develops a corresponding digital bit stream, and the digital filter compresses the bit stream into Nyquist rate multibit codes and performs noise filtering.

Figure 3 shows the proposed current input sigma-delta modulator. It is composed of four parts. The first is a current buffer which controls the bias voltage of the photodetector. The second is a current integrator which is implemented with a capacitor. The capacitor size is as large as possible for low 1/f noise.
The third is a current D/A converter which provides a reference current to the input depending on the sign of the output. The last stage is a comparator. Figure 4 shows the PSPICE simulation of this converter with a sinusoidal input. The digital output contains the analog input and must be filtered digitally to get a digital representation of the signal.

Figure 3. First order sigma-delta modulator

Figure 4. Simulation results of first order sigma-delta modulator

### 3.3 8x8 array of sigma-delta converters with through wafer emitter driver

Figure 5 shows a fabricated chip with an 8×8 array of ADCs, associated readout shift registers, and emitter driver. This chip sits above the DSP node that processes its data. Figures 6(a) and 6(b) show the test results of this chip when electrical DC inputs are provided. The chip is functional, and with the addition of an array of detectors on top of each ADC and a DSP below, one tile of a scaleable high frame rate imaging system can be realized.

Figure 5. Chip with $8 \times 8$ ADCs, shift registers, and emitter driver

Figure 6. 8x8 ADC per pixel test results.

## 4. Thin film detector integration

There are several options for integrating the detector imaging array onto the silicon circuitry. The detectors can be fabricated using monomaterial (single material) integration in the silicon CMOS; however, to achieve high fill factors and scaleable arrays, the processing circuitry associated with each pixel must be minimized. An additional drawback of silicon detectors realized in a CMOS circuit process is the low absorption coefficient, which translates to low responsivity since the absorption length achievable in CMOS processes is small.

Hybrid integration, or the bonding of a separately grown detector array on top of the silicon circuitry, is an attractive option for the integration of the detector array.
These integration techniques include the bump bonding of arrays of detectors onto pads on the silicon circuit, which is currently a commonly used technique for fabricating infrared focal plane arrays [9]. Alternatively, thin film integration can be utilized, which is similar to bump bonding except that the detector devices and bonds are very thin in comparison to bump bonds [10].

Hybrid integration is an attractive alternative to monomaterial integration for two reasons. First, the material which comprises the detector array is fabricated separately from the electronics. The detectors therefore need not be the same material as the electronics (Si), which enables the use of higher responsivity materials such as direct gap compound semiconductors. This also implies that the detector materials do not have to be lattice matched to the circuitry materials (unlike, for example, direct growth onto the silicon circuitry). The second advantage of hybrid integration is that the detectors are integrated directly on top of the silicon circuitry, which enables scalability with high fill factors and the direct interconnection of every detector to the circuitry which lies beneath it.

One disadvantage of using flip chip bonding for hybrid integration of detector arrays is that the substrate must be transparent. Thin film bonded detector arrays do not suffer from this limitation since the substrate is removed from the devices.

There are two basic types of detectors which can be integrated with the silicon circuits: P-i-N and MSM (metal-semiconductor-metal) detectors. While MSM detectors are a viable option for integration since they are high speed, low capacitance detectors [11], they have a lower responsivity than the P-i-N detector and their geometry does not lend itself as well as the P-i-N to array integration. The MSM is a planar device which has two bonding pads in the same plane (the interdigitated fingers of the MSM).
To achieve individually addressable pixels using MSMs, the spacing of the devices has to be large enough to allow space for the contact pads, thus decreasing the fill factor.

The integration method used with the P-i-N results in a higher fill factor than that of the MSM. This increased fill factor comes from exploiting the non-planarity of the P-i-N structure. The P-i-N device contacts are on the top and bottom of the structure; this is useful for integration because the pixels can be individually addressed through the device side bonded to the silicon circuit and share a common top contact, eliminating the pad space between detectors. For example, to electrically connect an $N \times N$ array of P-i-N detectors only $N^2+1$ dedicated pads on the silicon circuit are necessary, with $N^2$ pads located underneath the detector array.

Examples of hybrid integration of compound semiconductor detectors onto silicon circuits include GaAs MSMs [10], which detect at 850 nm, and InP/InGaAs/InAlAs MSMs [10], which detect at 1.3 and 1.55 micron wavelengths. Arrays of GaAs/AlGaAs double heterostructure P-i-N detectors [12] have also been integrated directly on top of silicon circuits, as shown in the photomicrographs in Figure 7.

Figure 7. Photomicrographs: (a) the unintegrated circuit as received from MOSIS, (b) the silicon circuit with a bonded thin-film array, (c) the integrated OEIC with a common top contact.

Figure 7(a) is a photomicrograph of the unbonded silicon circuit as received from the MOSIS foundry. Figure 7(b) is a photomicrograph of the GaAs thin film detector array bonded onto the top of the circuitry, and Figure 7(c) shows the final integrated optoelectronic circuit, with the common top contact deposited onto the detector array, connecting the array to the silicon circuit. Each detector is individually interconnected to the silicon circuitry which lies beneath it through a metallized overglass cut on the circuit.

## 5. Conclusion

We have shown that a massively parallel focal plane data processing architecture, with sigma-delta modulators in each pixel and massively parallel processing optically interconnected below the focal plane, can achieve high frame rate imaging for large images. The parallel processing image capture method described is scaleable, which means it can be used to achieve arbitrarily high resolution without a degradation in frame rate. In addition, more optically connected parallel processor layers could be added to perform real time processing of the high frame rate images generated by the system.

## 6. Acknowledgment

This work is supported by a grant from the AFOSR (F49620-95-1-0246).

## 7. References

- [1] Bedabrata Pain, Eric R. Fossum, "Approaches and analysis for on-focal-plane analog-to-digital conversion," Proceedings of SPIE, The International Society for Optical Engineering, v 2226, pp. 208-218, 1994.
- [2] Ulf Ringh, Christer Jansson, Kevin Liddiard, "Readout concept employing a novel on chip 16bit ADC for smart IR focal plane arrays," Proceedings of SPIE, The International Society for Optical Engineering, v 2745, pp. 99-110, 1996.
- [3] Zhimin Zhou, Bedabrata Pain, Roger Panicacci, Barmak Mansoorian, Junichi Nakamura, Eric R. Fossum, "On-focal-plane ADC: recent progress at JPL," Proceedings of SPIE, The International Society for Optical Engineering, v 2745, pp. 111-122, 1996.
- [4] William J. Mandl, Carl Rutschow, "All-digital monolithic scanning readout based on sigma-delta analog-to-digital conversion," Proceedings of SPIE, The International Society for Optical Engineering, v 1684, pp. 233-246, 1992.
- [5] H. H. Cat, J. C. Eble, D. S. Wills, V. K. De, M. Brooke, N. M. Jokerst, "Low Power Opportunities for a SIMD VLSI Architecture Incorporating Integrated Optoelectronic Devices," GOMAC'96 Digest of Papers, pp. 59-62, Orlando, FL, March 1996.
- [6] Huy H. Cat, D. Scott Wills, Nan Marie Jokerst, Martin A. Brooke, April S. Brown, "Three-dimensional, massively parallel, optically interconnected silicon computational hardware and architectures for high-speed IR scene generation," Proceedings of SPIE, The International Society for Optical Engineering, v 2469, pp. 141-145, 1995.
- [7] J. C. Candy, G. C. Temes, "Oversampling methods for data conversion," Proc 91 IEEE Pacific Rim Conf Commun Comput Signal Process, pp. 498-502, 1991.
- [8] J. C. Candy, G. C. Temes, "A tutorial discussion of the oversampling method for A/D and D/A conversion," Proceedings IEEE International Symposium on Circuits and Systems, v 2, pp. 910-913, 1990.
- [9] "Special issue on solid state image sensors," *IEEE Trans. on Elec. Dev.*, vol. 38, 1991.
- [10] C. Camperi-Ginestet, Y.W. Kim, N.M. Jokerst, M.G. Allen, M.A. Brooke, "Vertical Electrical Interconnection of Compound Semiconductor Thin-Film Devices to Underlying Silicon Circuitry," *IEEE Photonics Technology Letters*, vol. 4, no. 9, pp. 1003-1006, September 1992.
- [11] Olivier Vendier, Nan Marie Jokerst, Richard P. Leavitt, "Thin Film Inverted MSM Photodetectors," *IEEE Photonics Technology Letters*, vol. 8, no. 2, pp. 266-268, February 1996.
- [12] S.M. Fike, B. Buchanan, N.M. Jokerst, M.A. Brooke, T.G. Morris, S.P. DeWeerth, "8x8 Array of Thin-Film Photodetectors Vertically Electrically Interconnected to Silicon Circuitry," *IEEE Photonics Technology Letters*, vol. 7, no. 10, pp. 1168-1170, October 1995.