I am now a Senior Research Scientist at NVIDIA, in the Mobile Visual Computing Team.
I received my Ph.D. in Computer Engineering from the University of California, Santa Cruz in the summer of 2011, under the guidance of Prof. R. Manduchi.
Before and during my studies at UCSC I worked at Canesta (now acquired by Microsoft), the Nokia Research Center, and in Dr. J.M. Miller's lab at the Smith and Kettlewell Research Institute.
My e-mail: orazio "at" soe "dot" ucsc "dot" edu
I love taking pictures. My Flickr page with my "less worse" shots is here
Stack-Based Algorithms for HDR Capture and Reconstruction
O. Gallo and P. Sen, Chapter in the book "High Dynamic Range Video: From Acquisition, to Display and Applications," Academic Press, 2016.
High-dynamic-range (HDR) images can be created with standard camera hardware by capturing and combining multiple pictures, each sampling a different segment of the irradiance distribution of a scene. This seemingly straightforward process involves several important steps, which will be the focus of this chapter. We start by examining the problem of selecting the set of exposures that properly measures the full dynamic range of a particular scene, a process known as metering for HDR. We then describe how to perform radiometric calibration, needed to estimate the incoming irradiance from the low-dynamic-range (LDR) images. After that, we offer an overview of methods to merge multiple LDR images into a single HDR image. Finally, we discuss methods to compensate for camera and scene motion, which would otherwise cause artifacts in the final HDR image.
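The merging step mentioned above can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes a linear camera response (so no radiometric calibration is needed), pixel values normalized to [0, 1], and a simple triangle weighting function. The names `hat_weight` and `merge_hdr` and the thresholds are hypothetical.

```python
def hat_weight(value, low=0.05, high=0.95):
    """Triangle weight: trust mid-range pixels, ignore near-black and near-saturated ones."""
    if value <= low or value >= high:
        return 0.0
    mid = 0.5 * (low + high)
    half = 0.5 * (high - low)
    return 1.0 - abs(value - mid) / half


def merge_hdr(stack, exposure_times):
    """Merge LDR images (flat lists of values in [0, 1]) into irradiance estimates.

    Assumption: the camera response is linear, so irradiance ~ value / exposure_time.
    """
    hdr = []
    for p in range(len(stack[0])):
        num = 0.0
        den = 0.0
        for image, t in zip(stack, exposure_times):
            w = hat_weight(image[p])
            num += w * image[p] / t
            den += w
        if den > 0.0:
            hdr.append(num / den)
        else:
            # Every sample is poorly exposed: fall back to the one closest to mid-gray.
            image, t = min(zip(stack, exposure_times),
                           key=lambda pair: abs(pair[0][p] - 0.5))
            hdr.append(image[p] / t)
    return hdr
```

Calling `merge_hdr` on three exposures taken at 1 s, 0.1 s, and 0.01 s returns one irradiance estimate per pixel, dominated by whichever exposure captured that pixel in its mid-range.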
Simulating the Visual Experience of Very Bright and Very Dark Scenes
D. Jacobs, O. Gallo, E. Cooper, K. Pulli, and M. Levoy, ACM ToG 2015
project page (with pdf, ppts, and bibtex)
The human visual system can operate in a wide range of illumination levels, due to several adaptation processes working in concert. For the most part, these adaptation mechanisms are transparent, leaving the observer unaware of his or her absolute adaptation state. At extreme illumination levels, however, some of these mechanisms produce perceivable secondary effects, or epiphenomena. In bright light, these include bleaching afterimages and adaptation afterimages, while in dark conditions these include desaturation, loss of acuity, mesopic hue shift, and the Purkinje effect. In this work we present new computational models for these effects and examine explicitly whether they can be used to extend the apparent dynamic range of a conventional computer display.
Locally Non-rigid Registration for Mobile HDR Photography
O. Gallo, A. Troccoli, J. Hu, K. Pulli, J. Kautz, Embedded Vision Workshop, CVPR 2015
project page (with pdf and ppt)
Image registration for stack-based HDR photography is challenging. If not properly accounted for, camera motion and scene changes result in artifacts in the composite image.
Unfortunately, existing methods to address this problem are either accurate but too slow for mobile devices, or fast but prone to failure. We propose a method that fills this void: our approach is extremely fast (under 700 ms on a commercial tablet for a pair of 5MP images) and prevents the artifacts that arise from insufficient registration quality.
FlexISP: A Flexible Camera Image Processing Framework
F. Heide, M. Steinberger, Y. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, J. Kautz, K. Pulli, SIGGRAPH Asia 2014.
Conventional pipelines for capturing, displaying, and storing images are usually defined as a series of cascaded modules, each responsible for addressing a particular problem. While this divide-and-conquer approach offers many benefits, it also introduces a cumulative error, as each step in the pipeline only considers the output of the previous step, not the raw sensor data. We propose an end-to-end system that is aware of the camera and image model, enforces natural-image priors, and directly generates a particular output representation (e.g., YUV, DCT), while jointly accounting for common image processing steps like demosaicking, denoising, deconvolution, and so forth. We demonstrate our approach on regular Bayer images as well as images from custom sensors.
HDR Deghosting: How to deal with Saturation?
J. Hu, O. Gallo, K. Pulli, and X. Sun, CVPR 2013.
project page (with pdf, bibtex, and *code*)
We present a novel method for aligning images in an HDR (high-dynamic-range) image stack to produce a new exposure stack where all the images are aligned and appear as if they were taken simultaneously, even in the case of highly dynamic scenes.
Our method produces plausible results even where the image used as a reference is either too dark or bright to allow for an accurate registration.
Retrieving Gray-Level Information from a Binary Sensor and its Application to Gesture Detection
O. Gallo, I. Frosio, L. Gasparini, K. Pulli, M. Gottardi, Embedded Vision Workshop, CVPR 2015
We report on the use of a CMOS Contrast-based Binary Vision Sensor (CBVS), with embedded contrast extraction, for gesture detection applications. The first advantage of using this sensor over commercial imagers is a dynamic range of 120 dB, made possible by a pixel design that effectively performs auto-exposure control. Another benefit is that, by only delivering the pixels detecting a contrast, the sensor requires a very limited bandwidth. We leverage the sensor's fast 150 µs readout to perform multiple reads during a single exposure; this allows us to estimate gray-level information from the otherwise binary pixels. As a use case for this novel readout strategy, we selected in-car gesture detection, for which we carried out preliminary tests showing encouraging results.
Dynamic Image Stacks
D. Jacobs, O. Gallo, and K. Pulli, Workshop on Mobile Vision, CVPR 2014.
project page (with pdf, bibtex, and ppt)
Since its invention, photography has been driven by a relatively fixed paradigm: capture, develop, and print.
Even with the advent of digital photography, the photographic process still continues to focus on creating a single, final still image suitable for printing. This implicit association between a display pixel and a static RGB value can constrain a photographer's creative agency.
We present dynamic image stacks, an interactive image viewer exploring what photography can become when this constraint is relaxed.
Many computational photography applications require the user to take multiple pictures of the same scene with different camera settings.
While this captures more information about the scene than is possible with a single image, the approach is limited by the requirement that the images be perfectly registered.
In a typical scenario the camera is hand-held and is therefore prone to moving during the capture of an image burst, while the scene is likely to contain moving objects. Combining such images without careful registration introduces annoying artifacts in the final image.
This paper presents a method to register exposure stacks in the presence of both camera motion and scene changes.
When capturing a High-Dynamic-Range (HDR) scene with a stack of images, it is crucial to accurately select the exposure times of the different pictures in the stack, so as to optimize the quality of the resulting HDR image and minimize the number of images acquired.
We propose a method to perform such selection based on the specific scene. The inset shows the effect on noise of an accurate selection consisting of only 3 images (our method, left) versus a geometric progression that covers the whole irradiance range with 5 images (right).
Reading Barcodes with Cellphones (project)
O. Gallo and R. Manduchi, WACV 2009 and PAMI 2011.
project page bibtex --- code now available!
Current camera-based barcode readers do not work well when the image has low resolution, is out of focus, or is motion-blurred. One main reason is that virtually all existing algorithms perform some sort of binarization, either by grayscale thresholding or by finding the bar edges. We propose a new approach to barcode reading that never needs to binarize the image. Instead, we use deformable barcode digit models in a maximum likelihood setting. We show that the particular nature of these models enables efficient integration over the space of deformations. Global optimization over all digits is then performed using dynamic programming.
Robust surface fitting to range data (project)
O. Gallo, R. Manduchi, Pattern Recognition Letters 2010 and Workshop on Time of Flight based Computer Vision, CVPR 2008.
project page bibtex
In this paper we present a study concerning the use of the Canesta TOF camera for recognition of curbs and ramps. Our approach is based on the detection of individual planar patches using CC-RANSAC, a modified version of the classic RANSAC robust regression algorithm. Whereas RANSAC uses the whole set of inliers to evaluate the fitness of a candidate plane, CC-RANSAC only considers the largest connected components of inliers. We provide experimental evidence that CC-RANSAC provides a more accurate estimation of the dominant plane than RANSAC with a smaller number of iterations.
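The difference between the two scoring rules can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the range data is a regular depth grid, candidate planes have the form z = a*x + b*y + c, connectivity is 4-neighbor adjacency on the grid, and only the single largest component is counted. All function names are hypothetical.

```python
from collections import deque


def inlier_mask(depth, plane, tol):
    """Mark grid cells whose depth is within tol of the plane z = a*x + b*y + c."""
    a, b, c = plane
    return [[abs(z - (a * x + b * y + c)) <= tol for x, z in enumerate(row)]
            for y, row in enumerate(depth)]


def largest_component_size(mask):
    """Size of the largest 4-connected component of True cells in a 2-D grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def ransac_score(depth, plane, tol=0.05):
    """Classic RANSAC fitness: count every inlier, wherever it lies."""
    return sum(cell for row in inlier_mask(depth, plane, tol) for cell in row)


def cc_ransac_score(depth, plane, tol=0.05):
    """CC-RANSAC-style fitness: count only the largest connected group of inliers."""
    return largest_component_size(inlier_mask(depth, plane, tol))
```

On a step-like surface such as a curb, a tilted candidate plane can collect inliers from both levels; those inliers are spatially disconnected, so the connected-component score penalizes the candidate while the plain inlier count does not.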
J. Davis, J. Arderiu, H. Lin, Z. Nevins, S. Schuon, O. Gallo, and M. Yang, Workshop on Computer Vision with Humans in the Loop, CVPR 2010.
Computer-mediated, human micro-labor markets have so far been treated as novelty services good for cheaply labeling training data and easy user studies. This paper claims that these markets can be characterized as Human co-Processing Units (HPU), and represent a first class computational platform. In the same way that Graphics Processing Units (GPU) represent a change in architecture from CPU based computation, HPU-based computation is different and deserves careful characterization and study. We demonstrate the value of this claim by showing that simplistic HPU computation can be more accurate, as well as cheaper, than complex CPU-based algorithms on some important computer vision tasks.
High Dynamic Range (HDR) images can be generated by taking multiple exposures of the same scene. When fusing information from different images, however, the slightest change in the scene can generate artifacts which dramatically limit the potential of this solution. We present a technique capable of dealing with a large amount of movement in the scene: we find, in all the available exposures, patches consistent with a reference image previously selected from the stack. We generate the HDR image by averaging the radiance estimates of all such regions and we compensate for camera calibration errors by removing potential seams.
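The idea of averaging only patches that agree with the reference can be sketched as below. This is a simplified illustration, not the published method: it assumes a linear camera response, 1-D images with values in [0, 1], fixed non-overlapping patches, and a relative-difference consistency test, and it omits the seam removal step. All names and thresholds are hypothetical.

```python
def radiance(value, exposure_time):
    """Radiance estimate under an assumed linear camera response."""
    return value / exposure_time


def well_exposed(value, low=0.05, high=0.95):
    return low < value < high


def patch_is_consistent(patch, t, ref_patch, ref_t, rel_tol=0.1):
    """True if every pixel well exposed in both patches agrees with the reference."""
    for v, r in zip(patch, ref_patch):
        if well_exposed(v) and well_exposed(r):
            e, e_ref = radiance(v, t), radiance(r, ref_t)
            if abs(e - e_ref) > rel_tol * e_ref:
                return False
    return True


def merge_consistent(stack, times, ref_index, patch_size=4):
    """Average radiance over the patches that are consistent with the reference image."""
    ref, ref_t = stack[ref_index], times[ref_index]
    hdr = []
    for start in range(0, len(ref), patch_size):
        end = min(start + patch_size, len(ref))
        # Keep only the exposures whose patch agrees with the reference patch.
        used = [(img[start:end], t) for img, t in zip(stack, times)
                if patch_is_consistent(img[start:end], t, ref[start:end], ref_t)]
        for p in range(end - start):
            samples = [radiance(patch[p], t) for patch, t in used
                       if well_exposed(patch[p])]
            if not samples:
                # No usable sample: fall back to the reference pixel itself.
                samples = [radiance(ref[start + p], ref_t)]
            hdr.append(sum(samples) / len(samples))
    return hdr
```

If an object moves between exposures, the affected patch fails the consistency test and is left out of the average, so the merged result follows the reference in that region instead of blending two different scene contents.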
A camera-based pointing interface for mobile devices
O. Gallo, S. Arteaga, J. E. Davis, ICIP 2008.
pdf bibtex project page
As the applications delivered by cellular phones are becoming increasingly sophisticated, the importance of choosing an input strategy is also growing. Touch-screens can greatly simplify navigation, but the vast majority of phones on the market are not equipped with them. Cameras, on the other hand, are widespread even amongst low-end phones: in this paper we propose a vision-based pointing system that allows the user to control the pointer's position by just waving a hand, with no need for additional hardware.
Stability of gold bead tissue markers
J.M. Miller, E.A. Rossi, M. Wiesmair, D.E. Alexander, and O. Gallo, Journal of Vision, 2006.
Significant soft tissue features, in particular in the orbit, may not be resolved by MRI or any other imaging method. Spatial resolution may not be the only limitation: existing techniques lack the ability to provide information about shears or deformations within tissues. We describe a new method that uses micrometric gold beads as markers to visualize movements of such tissues with high spatial resolution (~100 µm) and moderate temporal resolution (~100 ms).
3-D video oculography in monkeys
L. Ai, O. Gallo, D.E. Alexander, J.M. Miller, ARVO 2006.
Eye tracking is crucial in many research areas. While a number of techniques can reliably detect horizontal and vertical position, ocular torsion is still difficult to determine. Eye coils may represent a solution to this problem when avoiding implants in the orbit is not crucial. Video oculography (VOG) can exploit the presence of iris crypts to estimate torsion but is unreliable when said texture is too smooth, such as in the case of primates. We propose to use VOG in combination with scleral markers of different shapes and locations to improve robustness.
Combining conspicuity maps for hROIs prediction
C.M. Privitera, O. Gallo, G. Grimoldi, T. Fujita, and L.W. Stark, Workshop on Attention and Performance in Computational Vision, ECCV 2004. pdf
When looking at an image, we alternate rapid eye movements (saccades) and fixations; the resulting sequence is referred to as a scanpath and is crucial for us to perceive a much larger region of the scene than the area that projects on the fovea. In this paper we show that scanpaths exhibit a degree of predictability that can then be exploited for purposes such as smart image compression. This paper shows that it is possible, for a given class of images, to define an optimal combination of image processing algorithms to improve the prediction of the fixation loci.