Generative AI for Solving Inverse Problems in Computational Imaging
Generative AI offers great promise for effectively solving challenging ill-posed inverse problems, transforming the way we measure and infer the physical world around us, and enabling exciting new user-centric capabilities.
In many engineering and scientific applications, we face the challenge of recovering a hidden signal from low-quality measurements of it. The mathematical problem where one starts with the output of a degradation process (that is, the low-quality measurements) and works backward to determine the underlying signal at the input of that process is called an "inverse problem." Figure 1 shows several examples of inverse problems encountered in imaging applications. Solving inverse problems is pivotal, for instance, in medical imaging and geophysics, where direct measurement can be challenging. In particular, diagnosing certain medical conditions requires visual inspection of internal organs. Computed tomography (CT) is an imaging method that non-invasively measures how much of the radiation sent into the body passes through it, and uses these measurements to produce detailed images of the body's interior. Similarly, in geophysics, we cannot directly image the Earth's subsurface structures to explore rock formations or potential oil and gas reservoirs. Instead, this is done by sending seismic waves into the ground and analyzing the waves reflected back from the Earth's different layers. Inverse problems play a key role in our everyday lives; they are solved routinely, not only in hospitals or research facilities but also on our smartphones. Every time we take a picture or pick up a call, multiple inverse problems are being solved in the background. These improve the resolution, remove blur, and correct the colors in our acquired photo, as well as suppress background noise, undo reverberations, and cancel echoes in the audio signal we hear.
In most practical settings, the inverse problems we are interested in solving are "ill-posed," in the sense that many different signals (possibly an infinite number of them) may be plausible given the measurements. For instance, a 2D image of a thick biological sample captured by a microscope corresponds to the projection of the sample's 3D volume onto a 2D plane. Recovering 3D structures from 2D projections is an inverse problem that has multiple potential solutions because many different 3D structures can give rise to the same 2D image. Consequently, solving such problems requires advanced data processing algorithms that incorporate prior knowledge to narrow down the potential solutions.
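The ill-posedness of projection can be demonstrated in a few lines. In this toy NumPy example (the 2x2 "volumes" are purely illustrative), two different structures yield exactly the same measurements:

```python
import numpy as np

# A toy "projection" forward model: a 2D sample is observed only through
# the sums along its columns (a 1D shadow), so depth information is lost.
v1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
v2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])

p1 = v1.sum(axis=0)  # shadow of the first structure
p2 = v2.sum(axis=0)  # shadow of the second structure

# Two different structures produce identical measurements:
print(np.array_equal(p1, p2))  # True — the inverse problem is ill-posed
```

No algorithm can distinguish `v1` from `v2` based on the measurements alone; only prior knowledge can break the tie.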
Over the last few decades, researchers have developed various methods to integrate prior knowledge into the inverse problem-solving process. The early approaches relied on explicit assumptions about the signals of interest. A notable example is the sparse coding model, which posits that all signals within a specific domain (for example, natural images) can be represented as sparse combinations of elements from a large "dictionary" of signals [1]. This concept is akin to the periodic table in chemistry, where a few basic elements (atoms) combine to form all possible molecules. In sparse coding, the dictionary elements represent fundamental building blocks from which all signals can be constructed.
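As a rough illustration of the sparse coding idea, the following NumPy sketch builds a random dictionary (an assumption made for illustration; practical dictionaries are learned from data or carefully designed) and recovers a 2-sparse signal with a greedy pursuit:

```python
import numpy as np

rng = np.random.default_rng(1)

# A dictionary of 32 "building block" signals (atoms) of length 16,
# with unit-norm columns.
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)

# A signal that is a sparse combination of just two atoms.
code = np.zeros(32)
code[[3, 17]] = [1.5, -2.0]
s = D @ code

# Greedy sparse coding (orthogonal matching pursuit): repeatedly pick the
# atom most correlated with the residual, then refit on the chosen atoms.
support, residual = [], s.copy()
for _ in range(2):
    support.append(int(np.argmax(np.abs(D.T @ residual))))
    coef, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)
    residual = s - D[:, support] @ coef

# With a low-coherence dictionary, this typically recovers atoms 3 and 17.
```

The prior here is the assumption of sparsity itself: among all explanations of the signal, we prefer the one using the fewest atoms.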
However, despite their success, explicit models have seen somewhat limited use due to their restricted expressive power and the effort required to tailor the algorithm to each specific inverse problem and signal domain. This lack of a generic recipe was a significant drawback that hindered the broader adoption and transferability of such models.
The advent of artificial intelligence (AI) brought a new approach to solving inverse problems. The paradigm shift can be traced back to 2012, which saw the introduction of AlexNet, an artificial neural network for image classification that significantly outperformed all preceding classification methods. This breakthrough led researchers to harness deep neural networks for diverse tasks, including the solution of inverse problems [2]. Unlike earlier approaches, AI methods learn priors from data and do not rely on explicit mathematical models that are hand-engineered by humans. The process is straightforward: Gather pairs of input and output examples relevant to the inverse problem and train a neural network to learn the mapping between them. This approach has proven highly effective and flexible, significantly improving performance across various inverse problem domains.
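The gather-pairs-and-learn-a-mapping recipe can be sketched with a one-layer linear "network" fitted by least squares; the convolution kernel, noise level, and dataset below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Paired training examples: clean signals X and their blurred, noisy
# measurements Y (the same forward model applied to each example).
n, n_train = 16, 2000
kernel = np.array([0.25, 0.5, 0.25])
X = rng.standard_normal((n_train, n))
Y = np.stack([np.convolve(row, kernel, mode="same") for row in X])
Y += 0.05 * rng.standard_normal(Y.shape)

# "Training": learn a linear map from measurements back to signals by
# least squares (a one-layer stand-in for a deep network).
W, *_ = np.linalg.lstsq(Y, X, rcond=None)

# Apply the learned inverse to a fresh measurement.
x_new = rng.standard_normal(n)
y_new = np.convolve(x_new, kernel, mode="same") + 0.05 * rng.standard_normal(n)
x_hat = y_new @ W  # estimated signal; on average, closer to x_new than y_new is
```

A deep network replaces the single matrix `W` with a learned nonlinear mapping, which is what gives the approach its expressive power.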
While pushing the boundaries of what had previously been possible, the machine learning approach that attempts to map any degraded input into a single recovered output still has fundamental limitations. The reason we need to rely on prior knowledge is that the inverse problems we aim to solve are typically ill-posed. Nonetheless, even after taking into account such prior knowledge, we are often still left with a set of admissible solutions that comply with the measurements. This implies that any method that provides only a single output per input (whether it is based on AI or not) may deceive the user consuming its outputs because it does not report the inherent uncertainty that exists in its predictions. Such methods may therefore be of limited value when the predictions are to be used for driving scientific discovery or for influencing safety-critical decisions regarding patient health.
This is precisely where generative AI (GenAI) comes into play. GenAI models are a class of neural networks capable of sampling from a data distribution. These models are trained on a dataset of signals of interest, such that after training, they can sample new (artificial) signals with characteristics similar to those of the training distribution. For example, a generative model trained on a dataset of face images can be used to sample new facial images of people who do not exist. These models have evolved rapidly over the years and have recently reached excellent quality across multiple domains. Perhaps the most widely known model is the conversational chatbot ChatGPT. However, very strong models also exist for modalities other than text, including images, audio, videos, and 3D data. For example, DALL-E is a model capable of generating images from textual descriptions.
How can GenAI be used for solving inverse problems? A generative model encapsulates the prior knowledge that we have about the signals of interest. However, when solving inverse problems, we are not interested in sampling generic signals from this prior distribution; rather, we want to sample only signals that are plausible given the measurements. This approach, which is illustrated in Figure 2, is called "posterior sampling." The samples generated this way convey prediction uncertainty, offering the user a path to a more reliable measurement-driven decision. One way of obtaining a posterior sampler is to train a generative model that accepts the measurements as input. Such generative models are called "conditional" because they are conditioned on some input signal. Conditional generative models can achieve high quality, yet this approach requires training a separate model for every inverse problem we wish to solve. An alternative family of methods uses a generic (unconditional) generative model and modifies its sampling process such that it only generates samples from the posterior distribution [3, 4, 5]. This approach is often less accurate, but it has the advantage of not requiring a new model for every task.
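A toy setting where posterior sampling can be written down exactly is a scalar Gaussian signal observed in Gaussian noise. The numbers below are arbitrary; the point is that posterior samples concentrate around measurement-consistent values, while their spread reflects the remaining uncertainty.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model with a closed-form posterior:
#   prior        x ~ N(0, 1)
#   measurement  y = x + noise,  noise ~ N(0, sigma^2)
sigma = 0.5
y = 1.2  # an observed measurement

# Posterior over x given y is Gaussian (standard conjugate-prior result).
post_var = 1.0 / (1.0 + 1.0 / sigma**2)
post_mean = post_var * y / sigma**2

# "Unconditional" generation samples from the prior; posterior sampling
# draws only signals that are plausible given the measurement.
prior_samples = rng.normal(0.0, 1.0, size=10_000)
post_samples = rng.normal(post_mean, np.sqrt(post_var), size=10_000)
```

For images, neither the prior nor the posterior has a closed form; a trained generative model stands in for the prior, and the cited methods [3, 4, 5] steer its sampling process toward the posterior.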
Summarization and Visualization of Uncertainty
In principle, posterior sampling tells us everything that can be concluded about the hidden signal given the measurements. Indeed, by examining enough posterior samples, the user can theoretically grasp the range of possible solutions, as well as their likelihoods. However, in practice, posterior sampling has several limitations. First, current state-of-the-art generative models require a significant amount of computation to produce high-quality samples. This is time-consuming and limits their utility on low-resource devices. Second, after a large enough set of predictions has been obtained, users are tasked with skimming through them to either confirm or refute whatever suspicion they may have about the underlying signal. This process can become tedious very quickly, especially if we consider, for example, a radiologist who has to perform this task many times a day. The problem cannot be overcome by simply showing fewer samples to the user, because there may be valid solutions that are significantly less likely than others and will thus be encountered only if we generate a large enough set. In safety-critical settings, one typically needs to examine all options before making an informed decision, even those that are less probable than others.
To overcome these limitations, several recent studies examined ways to summarize the set of admissible solutions into a more compact and user-friendly representation. Figure 3 shows several example approaches. One summarization strategy attempts to answer the following question: Suppose we have a budget of only K samples that we can show the user (K can be rather small, for example, 3 or 5); what would be the most informative such set of K representatives? Cohen et al. proposed a method for generating a small number of solutions, which are meaningfully diverse and cover the perceptual range of options for a given measurement [6]. As can be seen in Figure 3a, this approach summarizes the posterior distribution more efficiently than K random posterior samples. A naive way to obtain this reduced set is by first generating a large set of posterior samples and then choosing representatives from among them. However, this approach is computationally impractical. The method proposed by Cohen et al. modifies the sampling process of a pre-trained unconditional generative model such that it directly outputs only the small set of semantically diverse solutions.
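The naive strategy mentioned above (generate many posterior samples, then choose representatives) can be sketched with farthest-point selection; the clustered 2D points below are a stand-in for full candidate reconstructions. The actual method of Cohen et al. avoids this costly two-stage procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in "posterior samples": 600 points drawn from three clusters
# (imagine each point is a full candidate reconstruction).
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
samples = np.concatenate(
    [c + 0.3 * rng.standard_normal((200, 2)) for c in centers]
)

# Naive summarization: pick K representatives by farthest-point selection,
# so the chosen set covers the range of solutions rather than duplicating
# the most likely mode.
K = 3
chosen = [0]
for _ in range(K - 1):
    # For each sample, its distance to the nearest already-chosen point.
    dists = np.min(
        np.linalg.norm(samples[:, None] - samples[chosen], axis=-1), axis=1
    )
    chosen.append(int(np.argmax(dists)))  # pick the most "uncovered" sample

representatives = samples[chosen]  # one point from each cluster, typically
```

With K=3, the selected representatives land in distinct clusters, whereas three random posterior samples would likely all come from the most probable mode.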
Another recent approach by Nehme et al. proposed to summarize the set of possible solutions using a tree [7], as illustrated in Figure 3b. Trees are commonly used to represent and organize data in a way that is easy to navigate and search, and as such they have been deployed in diverse tasks across computer science. Using a tree to summarize the possible solutions to an inverse problem can significantly accelerate exploration. For example, it allows iterative user interaction, focusing user efforts on the examination of a small number of solutions (not necessarily the most likely ones) to explore only the ones relevant to some specific hypothesis about the signal.
The use of discrete structures, like trees or a small set of representative solutions, inherently assumes that the solutions can be grouped into a few clusters, each of which can be summarized by one representative. However, in some cases the variation between the different solutions is continuous, and it makes more sense to present the different possibilities to the user using sliders depicting a continuum of options. Nehme et al. [8], Manor et al. [9], and Yair et al. [10] have explored this strategy by proposing methods that output both a mean solution and a set of directions along which the solution can vary in a meaningful manner. These approaches are illustrated in Figures 3c and 3d. The key idea in these methods is to predict the principal components of the posterior distribution. Principal components are a set of orthogonal directions in space along which the solutions vary the most. The directions are ordered such that the first captures the most dominant variations, the second corresponds to variations with the second-largest variance, and so on. Therefore, to explore the set of possibilities, the user can scan a slider around the mean solution (taken as the middle point) to get a grasp of the different possible solutions. Manor et al. and Yair et al. further complemented this set of sliders with an estimate of plausibility/likelihood (see Figure 3d), informing the user how likely the solution is at each slider position.
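The slider idea can be illustrated with ordinary PCA over stand-in posterior samples; note that the methods cited above predict these components directly with a neural network, without ever drawing samples.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in posterior samples: 1000 candidate "reconstructions" in 8
# dimensions, with most of the variation along one hidden direction.
direction = np.ones(8) / np.sqrt(8)
mean = np.full(8, 0.5)
samples = (
    mean
    + rng.standard_normal((1000, 1)) * direction  # dominant variation
    + 0.1 * rng.standard_normal((1000, 8))        # small isotropic spread
)

# Principal components of the posterior: eigenvectors of the sample
# covariance, ordered by the variance they explain.
centered = samples - samples.mean(axis=0)
cov = centered.T @ centered / len(samples)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# A "slider" traverses one component around the mean solution:
# position t corresponds to mean + t * std * direction.
def slider(t, component=0):
    return (
        samples.mean(axis=0)
        + t * np.sqrt(eigvals[component]) * eigvecs[:, component]
    )
```

Scanning `t` over, say, [-2, 2] sweeps through the plausible reconstructions along the most dominant mode of uncertainty, which is exactly the interaction the slider interfaces expose to the user.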
To summarize, solving challenging inverse problems is at the heart of many tasks in science and engineering, and plays a key role in technologies that affect our everyday lives. Traditionally, inverse problems were solved using sophisticated algorithms that were based on mathematical models and empirically derived signal assumptions. With the rise of AI methods, and specifically artificial neural networks, solving these tasks became conceptually easier with a unified solution framework that is transferable across signal domains and data types. Nonetheless, until successful generative AI methods came onto the scene, we were unable to handle prediction uncertainty. Recent years have seen a new surge of research aimed at harnessing the capabilities of generative AI to provide user-centered solutions.
Reliability and proper safeguards are key obstacles to the widespread adoption of generative AI models in practice, especially in safety-critical domains. Methods that address this challenge should not only focus on developing sound algorithms but should also take into account the user and properly factor in the convenience of a human interacting with the software. The recent advances in this direction pave a promising future for human-computer interfaces, with potential impact across the board.
Image Credits
Figure 1. GoPro; https://seungjunnah.github.io/Datasets/gopro
Nah, S. et al. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2017, 3883–3891.
Figure 1. PIRM; https://pirm.github.io
Blau, Y. et al. The 2018 PIRM challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision Workshops (ECCVW), 2018.
Figure 1. MRI.
Nickparvar, M. Brain tumor MRI dataset. Kaggle, 2021; 10.34740/kaggle/dsv/2645886. CC BY; https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset
Figure 1. CT scan of the thorax (axial, lung window). Ptrump16. Wikimedia. CC BY-SA; https://commons.wikimedia.org/wiki/File:CT-Thorax-5.0-B70f-Lungs.jpg
Figure 1. Virtual staining; https://github.com/whd0121/ImageJ-VirtualStain/tree/master/Data/Example_images
Rivenson, Y. et al. Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning. Nat Biomed Eng 3 (2019), 466–477; https://doi.org/10.1038/s41551-019-0362-y
Figure 1. CryoEM 101; https://cryoem101.org/chapter-5/
Figure 2. CelebAHQ; https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebAMask_HQ.html
Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2015, 3730–3738.
Figure 3. CelebAHQ; https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebAMask_HQ.html
Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2015, 3730–3738.
Figure 3. MNIST digits; https://yann.lecun.com/exdb/mnist/
LeCun, Y. et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11(1998), 2278–2324. CC BY-SA 3.0.
Figure 3. Bioimage translation; https://zenodo.org/records/3941889#.XxrkzWMzaV4
von Chamier, L. et al. Democratising deep learning for microscopy with ZeroCostDL4Mic. Nature Communications 12, 1 (2021), 2276. CC BY 4.0.
Figure 3. Kodak Lossless True Color Image Suite; https://r0k.us/graphics/kodak
Rich Franzen. Released by the Eastman Kodak Company for unrestricted usage.
[1] Elad, M. Sparse and Redundant Representations: From theory to applications in signal and image processing. Springer Science & Business Media, 2010.
[2] Ongie, G. et al. Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory 1, 1 (2020), 39–56.
[3] Kawar, B. et al. Denoising diffusion restoration models. Advances in Neural Information Processing Systems 35 (2022), 23593–23606.
[4] Chung, H. et al. Diffusion posterior sampling for general noisy inverse problems. In Proceedings of the 11th International Conference on Learning Representations. OpenReview, 2023; https://openreview.net/forum?id=OnD9zGAGT0k
[5] Zhao, Z., Ye, J.C., and Bresler, Y. Generative models for inverse imaging problems: From mathematical foundations to physics-driven applications. IEEE Signal Processing Magazine 40, 1 (2023), 148–163.
[6] Cohen, N. et al. From posterior sampling to meaningful diversity in image restoration. In Proceedings of the Twelfth International Conference on Learning Representations. OpenReview, 2024; https://openreview.net/forum?id=ff2g30cZxj
[7] Nehme, E., Mulayoff, R., and Michaeli, T. Hierarchical uncertainty exploration via feedforward posterior trees. arXiv:2405.15719 [cs.CV]. 2024.
[8] Nehme, E., Yair, O., and Michaeli, T. Uncertainty quantification via neural posterior principal components. Advances in Neural Information Processing Systems 36 (2023), 37128–37141.
[9] Manor, H. and Michaeli, T. On the posterior distribution in denoising: Application to uncertainty quantification. In Proceedings of the Twelfth International Conference on Learning Representations. OpenReview, 2024; https://openreview.net/forum?id=adSGeugiuj
[10] Yair, O., Nehme, E., and Michaeli, T. Uncertainty visualization via low-dimensional posterior projections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024), 11041–11051.
Elias Nehme is a Ph.D. candidate at the Technion–Israel Institute of Technology, working at the intersection of computational imaging and artificial intelligence. His research interests span the fields of computational photography, machine learning, and computer vision with an emphasis on optimal sensor design, 3D reconstruction, and uncertainty quantification.
Tomer Michaeli is an associate professor in the Faculty of Electrical and Computer Engineering at the Technion–Israel Institute of Technology. He completed his B.Sc. and Ph.D. degrees at the EE faculty of the Technion in 2005 and 2012, respectively. From 2012 to 2015 he was a postdoctoral fellow in the CS and applied math department at the Weizmann Institute of Science. In 2015 he joined the Technion as a faculty member. His research lies in the fields of computer vision and machine learning. He is the recipient of several awards, among which are the Krill Prize for Excellence in Scientific Research by the Wolf foundation (2020), the Best Paper Award (Marr Prize) at ICCV 2019, and the Alon Fellowship for Outstanding Young Scientists (2017–2019).
Figure 1. Inverse problems arise in various computational imaging tasks, ranging from standard photography to medical and scientific imaging.
Figure 2. The difference between "standard" AI and GenAI, illustrated in the task of image completion.
Figure 3. Different summarization and visualization techniques. (In memory of the late Nelson Ellis.)
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2024 ACM, Inc.