In the past few years, we have seen a perfect storm mixing two growing phenomena: a data deluge stemming from access to cheap sensing and computational equipment, and a corresponding growth in scholarly publications. At the same time, the supply of reviewers has remained nearly constant. Aside from imposing reproducible research standards, open access to government-funded work is the only short- and long-term policy decision that can quickly enable a larger pool of quality reviewing capability. In the end, it enables a more robust scientific process.
Introduction
With the advent of cheap high-throughput equipment, we are seeing the emergence of what one could call "the curse of dimensionality", i.e. the ability to cheaply produce large amounts of data coupled with a still limited ability to make sense of them. This data deluge is, in turn, the primary reason behind the growth in the number of scholarly journals and journal articles over the past few years.
Unfortunately, the pool of potential reviewers has remained about the same and has not grown to the level needed to deal with these two factors. One can certainly wonder what impact this is having on how Science is performed (i.e. judged). In particular, the growth in the number of journals has resulted in a reliance on fewer potential high-quality reviewers per journal. More insidiously, the growth in data production and/or computational experiments has removed from most time-constrained reviewers the practical ability to perform thorough reviews.
Peer-Review
In light of this situation, the response by non-profit and commercial publishing entities has been to exacerbate the problem by opening the gates to newer journals and conference venues instead of developing innovative processes for the one function generally thought to be their value added to scientific discovery: the management of the peer-review process. Of considerable interest is publishers' (commercial or non-profit) currently lukewarm ability to deal proactively and fairly with retractions. In particular, there is no system in place for publications to address the fact that they may have referenced a recently retracted publication, for instance.
Under a regime of government-funded open access to publications, new or established players could change the way peer review is performed by enabling systems such as post-publication peer review. This is just one example, but innovation has to enter this market in order for the different stakeholders to continue producing high-quality work at the lowest price to the government.
Conclusions
The interest of the US Government in open access to government-funded work can be clearly delineated into the following reasons:
Open access enables non-time-constrained post-publication peer review by a larger pool of reviewers, thereby making for a more robust scientific discovery process.
Open access allows for innovation in the marketplace by enabling new (commercial or non-profit) actors in the peer-review process. These new players may create opportunities that are seldom explored in the current landscape.
Open access potentially reduces large costs to the government by improving its ability to deal effectively with past flawed work and the attendant retractions. Some of these retracted works may have had broad policy implications.
Open access reinforces United States leadership in matters related to Science and Technology development.
I haven't had time to look back at the setup of the Sudoku problem as an underdetermined system of linear equations, or at whether the solution has to be positive, but I wonder where this 16/17 result recently found through brute force fits in this diagram. Do you know?
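Since I keep pointing to that formulation without writing it down, here is a minimal sketch of the standard binary-indicator setup (my own illustration with assumed variable names, not any particular paper's code); it also answers the positivity question, since the unknown is binary and hence nonnegative:

% Sketch: x(i,j,k) = 1 iff cell (i,j) holds digit k, vectorized into a
% 9^3 = 729-long nonnegative vector z. Every Sudoku rule is a linear
% equality, so a puzzle with clues is the underdetermined system A*z = b.
n = 9; N = n^3;
pos = @(i,j,k) sub2ind([n n n], i, j, k);   % index of x(i,j,k) inside z
A = zeros(4*n^2, N); r = 0;
for i = 1:n, for j = 1:n                     % each cell holds exactly one digit
    r = r + 1; for k = 1:n, A(r, pos(i,j,k)) = 1; end
end, end
for i = 1:n, for k = 1:n                     % each digit appears once per row
    r = r + 1; for j = 1:n, A(r, pos(i,j,k)) = 1; end
end, end
for j = 1:n, for k = 1:n                     % ... once per column
    r = r + 1; for i = 1:n, A(r, pos(i,j,k)) = 1; end
end, end
for bi = 0:2, for bj = 0:2, for k = 1:n      % ... once per 3x3 box
    r = r + 1;
    for i = 3*bi+(1:3), for j = 3*bj+(1:3), A(r, pos(i,j,k)) = 1; end, end
end, end, end
b = ones(size(A,1), 1);
% Each clue "cell (i,j) holds digit k" appends the row e_{pos(i,j,k)}' with
% right-hand side 1. The solution is binary, hence automatically nonnegative,
% and one can then try  min ||z||_1  s.t.  A*z = b, z >= 0  as a relaxation.

In this language, the 16/17 question asks how few clue rows suffice to make the binary solution unique.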
What are the connections between the different problems of finding a defective coin or coins within a large set of known non-defective ones when there is:
It could be coins, precious rings or balls; they all ask the same question: how do you assemble some measurement process in order to recover an ugly duckling? More importantly, all these riddles are a handle: the story reminds you
that the act of finding a defective coin is not as important as how you find it,
Why is that important? Being familiar with a particular problem sometimes hinders your ability to see the big picture. You may have worked for years in coded aperture yet cannot make a clear connection between that and fertilizers and crops. You may know certain techniques with no firm theoretical grounding, yet when you hear about compressive sensing, you find that there is indeed a connection. The handle acts as a Rosetta stone: a breakthrough in one field immediately translates into many other fields. Empirical methods that work all the time cannot remain empirical forever. Should you wait for this theoretical underpinning to come to fruition before you can publish? No worries, the filter bubble of peer review makes that decision for you.
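To make the coin riddle concrete, here is a minimal non-adaptive group-testing sketch (a hypothetical setup of my own, not taken from any of the riddles): each "weighing" only reports whether a randomly chosen pool contains the defective coin, and an item is cleared as soon as it shows up in a negative pool.

% One defective coin among n; each test pools a random subset of coins and
% returns 1 iff the defective coin is in the pool (assumed test model).
n = 100; tests = 25;
defective = randi(n);                 % the unknown ugly duckling
M = rand(tests, n) < 0.5;             % pooling design: M(t,i)=1 if coin i is in pool t
results = M(:, defective);            % pool t is positive iff it contains the defective coin
candidates = true(1, n);
for t = 1:tests
    if ~results(t)
        candidates(M(t, :)) = false;  % everyone in a negative pool is cleared
    end
end
find(candidates)                      % with enough pools, only the defective coin survives

The compressive sensing connection is that the pooling design M plays the role of the measurement matrix and the defective coins form a sparse vector.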
I'll feature this example in the Compressed Sensing games page. One of the things that came out of the discussion with Gabriel is that there is a set of problems that cannot seem to be solved by using the L1/sparsity or the POCS solution. Unbeknownst to me, there are difficult Sudoku problems, and it looks like sparsity alone is not a good enough constraint to provide a solution for some of them, such as AI Escargot. I learn something every day. If you recall, Gabriel has already released two other examples of compressive sensing in his numerical tours:
Sunday, August 22, 2010 1:39 AM from Winnipeg
Re: the Kodak engineers produced the first digital camera...
Maybe a line from the technical report written at the time sums it up best: "The camera described in this report represents a first attempt demonstrating a photographic system which may, with improvements in technology, substantially impact the way pictures will be taken in the future." ... But in reality, we had no idea.
Nonsense. I had written them a letter (in the days before e-mail!) suggesting that they go one step further, mosaicing focal planes to select in-focus pixels and produce the first camera that could not take an out of focus picture. Got no reply. The other obvious problem was lack of self-confidence at Kodak: they made a digital back for a Nikon camera. Nikon is still around, and Kodak is just about dead. Moribund leadership for a once great company, I suspect.
I know a fellow at University of Manitoba who still has this Kodak unit.
Yours, -Dick Gordon, gordonr@cc.umanitoba.ca
In a different direction, somebody asked me to provide a hyperlink linking his/her name to his/her webpage. Please, if your name appears with no hyperlink or with the wrong one, let me know and I'll try to correct it asap. To give you an idea as to why it is important to be linked from Nuit Blanche, I usually receive spam requests asking me to include links in recent entries for free. I also get some spam from people trying to use the comment section to link back to their "content". Recently, however, something new happened: I received an email from a gentleman who proposed to pay a hundred dollars to plant a link in an old entry of this blog... this is interesting on so many levels. Without further ado, and using you, the readers, as filters, here is a set of meetings that are probably of interest to you: Pierre Vandergheynst let me know of a meeting in Marseille, mostly in French, on signal processing in biology. It is here. Mark Plumbley also let me know of the INSPIRE 2010 Conference in London.
INSPIRE 2010 Conference on information representation and estimation
Date, Time & Location
University College London
Monday 06/09/2010 - Wednesday 08/09/2010
09:00-18:00
Description
Mathematical methods for signal processing and in general for data representation and inference are growing more and more sophisticated. Successful applications of such methods range from medical imaging to security. Developments in mathematics and signal processing are however often divergent. Therefore, the main aim of this conference is to bring together signal processing researchers with statisticians and mathematicians working on problems related to data modelling and estimation, to encourage the exchange of ideas as well as to discuss theoretical underpinning of the methods and domains of application.
The INSPIRE 2010 conference will be held at the Anatomy JZ Young LT at University College London from September 6 till September 8, 2010.
Please visit the conference website to register: http://www.network-inspire.org/events/view/3
The Wired article on Compressed Sensing has started a series of ripples on the interweb that have reached the mighty Slashdot. While I wanted to address this issue a little later, I think it is important to point out that something in the Wired article has been missed by many readers.
What is described in the Wired article is mostly equivalent to the diverse traditional post-processing steps involved in what is called inpainting. There is a large amount of work in that area, but it is just post-processing. In other words, if your hardware has not been able to pick up the details in the first place, then it really is nearly impossible to get the kind of superresolution commonly found in CSI and other TV shows (from the video, it looks like the problem started with Blade Runner).
The MRI example of the Wired article uses random "pixels" in the Fourier domain: these pixels are not in the real domain as the Obama picture seems to indicate. This distinction is fundamental. For compressed sensing to work, the scene of interest has to be sampled in some incoherent fashion, and only specific hardware can do that. Your average digicam is not incoherent (or not incoherent enough), as much money has been spent on lenses to make sure that a point in the real world is translated into a point in the focal plane. In other words, if the scene was acquired coherently (this is why your hard-earned cash pays for a spiffy DSLR or a high-resolution webcam in the first place), then there is very little one can do in terms of inpainting or superresolution.
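To see the distinction in code, here is a minimal sketch (my own toy example, not the Wired/MRI pipeline): the measurements are a random subset of Fourier coefficients of a spike train, and a plain iterative soft-thresholding loop recovers it; no random pixel removal is involved.

% Toy example: recover a spike train from m random Fourier-domain samples.
N = 256; s = 8; m = 64;
x0 = zeros(N,1); p = randperm(N); x0(p(1:s)) = randn(s,1);
F = exp(-2i*pi*(0:N-1)'*(0:N-1)/N)/sqrt(N);   % unitary DFT matrix
rows = randperm(N); A = F(rows(1:m), :);       % keep m random Fourier "pixels"
y = A*x0;                                      % what the (idealized) scanner records
% Iterative soft thresholding for min 0.5*||A*x-y||^2 + lambda*||x||_1
lambda = 1e-2; x = zeros(N,1);
soft = @(z,t) (z./max(abs(z),eps)).*max(abs(z)-t, 0);
for it = 1:500
    x = soft(x + A'*(y - A*x), lambda);        % step size 1 is safe: ||A|| <= 1
end
norm(real(x) - x0)/norm(x0)                     % small when m is large enough

A digicam, by contrast, measures the pixel domain directly, which is coherent with the bases in which natural images are sparse; dropping pixels then leaves you with inpainting, not compressed sensing.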
If, on the other hand, the lenses of your digicam were badly broken, then there would be a chance to obtain superresolution or inpainting, and this is why I referred to compressed sensing as a potential CSI technology in a previous entry (Going Hollywood with Compressive Sensing). In that entry, I mentioned the fantastic random lens imager. Here is what one can see with 90 percent of the "pixels" gone; the 10% of the "pixels" left contain enough information to allow the reconstruction of a sensible image (see the reconstructed picture below in the right-hand corner):
What would a typical CSI scenario have to do to incorporate compressed sensing and be scientifically sound?
Well, for one, one could imagine a crime scene where a camera with badly damaged lenses has been found along with photos taken by that camera. The next step would be for the CSI team to take that camera and calibrate it by taking many known pictures. Compressed sensing helps there by making sure they don't spend 200 years performing that calibration, reducing the number of known photos needed for calibration in the lab. But while the imaging is indeed very appealing, compressed sensing can also be used to find elements in large crowds, as is the case with the Cylons and group testing. Yaniv Erlich's video presentation of the DNA Sudoku paper mentioned earlier here also has potential. With some thought, we could even think of astounding sensing systems using nature :). All this to say that instead of being negative, compressed sensing can actually help in making scenarios a whole lot more sci-fi and grounded in real science at the same time. I know a few accomplished scientists who were raised on Star Wars, Blade Runner or Minority Report.
Why am I writing this?
I have already crossed paths with clients who had CSI-like expectations, and I would like to contribute to making compressive sensing a reality without undue expectations.
If one of you writes scripts for one of these TV shows, I can direct you to a willing academic on these matters, or consult on these technical insights for a tiny fee.
Thank you to Jordan Ellenberg for getting people interested in the subject and for a great human interest story as well.
A while back, I made a request to Julien Mairal and his co-authors about the need for their potentially very fast online dictionary learning toolbox to be available on win32 systems, as some of us do not do Linux/Unix. Julien came through on his promise to look into it and sent me the following:
The method known as compressed acquisition, better known under the English term `Compressive Sensing', is a new method that makes it possible to capture, and subsequently recover, a signal sampled at sub-Nyquist rates. In order to guarantee perfect reconstruction of the signal, this method requires the construction of a so-called `sensing' matrix possessing particular inversion properties. Here, a construction of this matrix using sequences generated by a chaotic system is proposed, and it is proven that this matrix satisfies the required reconstruction properties with overwhelming probability (higher than that of a Gaussian-type random construction).
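The note does not come with code; here is a minimal sketch of the general idea only (the logistic map, burn-in and normalization choices below are my assumptions, not the paper's construction): iterate a chaotic map, drop a transient, and reshape the sequence into the sensing matrix.

% Fill an m x n sensing matrix with a logistic-map sequence (r = 4 is the
% fully chaotic regime), then center and roughly normalize the columns.
m = 64; n = 256; r = 4; z = 0.3; burn = 1000;
L = burn + m*n; seq = zeros(L, 1);
for t = 1:L
    z = r*z*(1 - z);                 % logistic map iteration
    seq(t) = z;
end
Phi = reshape(seq(burn+1:end), m, n);
Phi = (Phi - mean(Phi(:))) / std(Phi(:)) / sqrt(m);   % zero-mean, unit-ish column norms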
Raghu Kainkaryam reminded me of how long I personally waited to see the poolMC paper, with the clear impact it will have on experimental pooling design and its connection to compressive sensing:
Hi Igor,
Thanks for linking to poolMC. It's been a long time coming. Hope to get some valuable feedback from the CS community.
You know what they say about traditions in Aggieland: "Do something twice and it's a tradition." Mirroring last year's Thank you post, here is this year's. First, in 2009 the readership increased and mostly shifted from direct hits on the website to feed readers.
The tone of the blog has probably changed a little, as I mostly show new preprints without discussing them. As it stands, I try to balance the need to have a place where the latest and greatest is shared rapidly throughout the community and my need to digest all this information by trying to understand what is new and what is not. In order to digest this information, y'all have been amazingly kind to me with my dumb questions. I framed these into Q&As and posted them on the blog; here is the list of the people who kindly obliged in this exercise:
It's Monday; before you gulp that morning cup of joe, you need to read this Fake Steve blog entry first, otherwise you might spill all that coffee on your new LCD screen. Now that you are back, bada bing bada boom, here is the long post of the week. First, I'll be at MIA'09, and the hashtag for it on Twitter will be #MIA09. Gabriel Peyre tells me there will be free wifi. Woohoo.
Daryl Lim (who, incidentally, is applying to graduate school in the US for Fall 2010) let me know the following:
Just letting you know that I found a couple of errors in the CoSaMP implementation on Nuit Blanche (cosamp2.m in the MMA14 archive).
Before being fixed, it only worked for signals which were positive (like the test images), as a modulus function was applied in the least squares estimation step (where it should not have been).
I have fixed it so it works for real valued signals now. Attached is the modified file.
By the way, your blog rocks.
The edited version is called cosamp3.m and is available here. Good call, Daryl; now that you have been featured on Nuit Blanche, I am sure people will look at your graduate application with additional interest. By the way, I agree with you, Daryl: Nuit Blanche rocks!
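For readers who will not download the file, here is a hedged illustration of the step Daryl is describing (a toy example of my own, not the actual cosamp2.m/cosamp3.m contents):

% On a merged support T, CoSaMP's estimation step is a plain least-squares
% solve; wrapping it in abs() discards sign information and only works for
% nonnegative signals.
m = 40; n = 100; A = randn(m, n)/sqrt(m);
x0 = zeros(n,1); x0([3 17 58]) = [1.5; -2.0; 0.7];   % note the negative entry
y = A*x0;
T = [3 17 58];                 % pretend the support estimate is already correct
b = zeros(n,1);
b(T) = A(:,T) \ y;             % correct: recovers x0, including the sign
bad = zeros(n,1);
bad(T) = abs(A(:,T) \ y);      % the kind of bug Daryl found: signs are lost
[norm(b - x0), norm(bad - x0)] % the abs() version misses the negative coefficient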
Tanish Agrawal asked a very interesting question (one that, in my view, is not asked often):
As part of my research work, I am working on compressed sensing for OFDM signals in UWB. I have been following some papers from the Rice University DSP repository.
In particular, I have some doubts regarding the following paper: 'Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals' by J. Tropp et al. They discuss implementing a random demodulator in the paper. In the part where they propose a matrix for the accumulate-and-dump sampler (page 5), it works by adding consecutive entries, for which one needs to have all the samples. So, how is it working at a sub-Nyquist rate when we still need to collect all the samples to integrate them?
Could you please take some time to help me understand this?
To which I responded:
The number of eventual measurements, i.e. linear combinations of samples, is much less than what the Nyquist rate would indicate. I agree with you that the signal is being sampled at a high rate.
The issue that needs to be understood is that there is instrumentation for which acquiring those linear combinations of samples is easier than acquiring each and every one of these samples.
In this case, the random demodulator is used to acquire signals that are in the GHz range. While sampling at that high rate (i.e. taking each sample and storing it) is not currently feasible, it is possible to flip the polarity of the sensor at a fast enough rate (GHz) to implement the modulator proposed in the paper. In effect, in this instance, Compressive Sensing enables a new capability that could not exist otherwise.

I think Tanish's question is a central one, in that using the terms subsampling or sub-Nyquist is an impediment to understanding that the new "samples" are actually linear combinations of "normal" samples, and there is nothing magic about it: we cannot have superresolution with compressive sensing if the hardware does not have the capability to "touch" that superresolution in the first place. Because there are few systems where this subsampling is an obvious enabler, we need to think of new technologies that have not been implemented yet. It is also important to have a living document on the current technologies implementing compressive sensing, explaining how different they are from the current state of the art. Thanks Tanish!
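To make the answer concrete, here is a minimal sketch of the measurement model as I read it from the paper (the sizes, tone positions and chipping sequence below are my own toy choices, not Tropp et al.'s setup): the only numbers ever stored are the m accumulate-and-dump outputs, each a linear combination of Nyquist-rate samples.

% Toy random demodulator model: y = H*D*x, where D flips polarity at the
% Nyquist rate and H integrates ("accumulate and dump") over blocks.
N = 512; m = 32;                   % Nyquist-rate length, number of stored measurements
t = (0:N-1)'/N;
p = randperm(N/2); f = p(1:4);     % a few active tones (toy spectrally sparse signal)
x = zeros(N,1);
for k = f, x = x + cos(2*pi*k*t); end
D = diag(sign(randn(N,1)));        % random +/-1 chipping sequence (the fast part)
H = kron(eye(m), ones(1, N/m));    % each row sums N/m consecutive chipped samples
y = H*D*x;                         % the m measurements actually kept (the slow part)

The signal is indeed "touched" at the Nyquist rate by the polarity flips, but only y, of length m much smaller than N, ever needs to be digitized and stored.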
The performance of estimating the common support for jointly sparse signals based on their projections onto lower-dimensional space is analyzed. Support recovery is formulated as a multiple-hypothesis testing problem. Both upper and lower bounds on the probability of error are derived for general measurement matrices, by using the Chernoff bound and Fano's inequality, respectively. The upper bound shows that the performance is determined by a quantity measuring the measurement matrix incoherence, while the lower bound reveals the importance of the total measurement gain. The lower bound is applied to derive the minimal number of samples needed for accurate direction-of-arrival (DOA) estimation for a sparse representation based algorithm. When applied to Gaussian measurement ensembles, these bounds give necessary and sufficient conditions for a vanishing probability of error for the majority of realizations of the measurement matrix. Our results offer surprising insights into sparse signal recovery. For example, as far as support recovery is concerned, the well-known bound in Compressive Sensing with the Gaussian measurement matrix is generally not sufficient unless the noise level is low. Our study provides an alternative performance measure, one that is natural and important in practice, for signal recovery in Compressive Sensing and other application areas exploiting signal sparsity.
In this paper, we describe a novel design of a Peak-to-Average-Power-Ratio (PAPR) reducing system, which exploits the relative temporal sparsity of Orthogonal Frequency Division Multiplexed (OFDM) signals to detect the positions and amplitudes of clipped peaks, by partial observation of their frequency content at the receiver. This approach uses recent advances in reconstruction of sparse signals from rank-deficient projections using convex programming collectively known as compressive sensing. Since previous work in the literature has focused on using the reserved tones as spectral support for optimum peak-reducing signals in the time-domain [5], the complexity at the transmitter was always a problem. In this work, we alternatively use extremely simple peak-reducing signals at the transmitter, then use the reserved tones to detect the peak-reducing signal at the receiver by a convex relaxation of an otherwise combinatorially prohibitive optimization problem. This in effect completely shifts the complexity to the receiver and drastically reduces it from a function of N (the number of subcarriers in the OFDM signal), to a function of m (the number of reserved tones) which is a small subset of N.
We settle the 1-pass space complexity of (1 ± ε)-approximating the Lp norm, for real p with 1 < p < 2, of a length-n vector updated in a length-m stream with updates to its coordinates. We assume the updates are integers in the range [-M, M]. In particular, we show the space required is O(ε^(-2) log(mM) + log log(n)) bits. Our result also holds for 0 < p < 1; although Lp is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM '06] and [Li, SODA '08]. This improvement comes from showing an improved derandomization of the Lp sketch of Indyk by using k-wise independence for small k, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan's PRG. Our lower bound improves upon previous work of [Alon-Matias-Szegedy, JCSS '99] and [Woodruff, SODA '04], and is based on showing a direct sum property for the 1-way communication of the gap-Hamming problem.
We discuss how recently discovered techniques and tools from compressed sensing can be used in tensor decompositions, with a view towards modeling signals from multiple arrays of multiple sensors. We show that with appropriate bounds on coherence, one could always guarantee the existence and uniqueness of a best rank-r approximation of a tensor. In particular, we obtain a computationally feasible variant of Kruskal’s uniqueness condition with coherence as a proxy for k-rank. We treat sparsest recovery and lowest-rank recovery problems in a uniform fashion by considering Schatten and nuclear norms of tensors of arbitrary order and dictionaries that comprise a continuum of uncountably many atoms.
In this work, we propose algorithms to recursively and causally reconstruct a sequence of natural images from a reduced number of linear projection measurements taken in a domain that is “incoherent” with respect to the image’s sparsity basis (typically wavelet) and demonstrate their application in real-time MR image reconstruction. For a static version of the above problem, Compressed Sensing (CS) provides a provably exact and computationally efficient solution. But most existing solutions for the actual problem are either offline and non-causal or cannot compute an exact reconstruction (for truly sparse signal sequences), except using as many measurements as those needed for CS. The key idea of our proposed solution (modified-CS) is to design a modification of CS when a part of the support set is known (available from reconstructing the previous image). We demonstrate the exact reconstruction property of modified-CS on full-size image sequences using much fewer measurements than those required for CS. Greatly improved performance over existing work is demonstrated for approximately sparse signals or noisy measurements.
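A minimal sketch of the modified-CS idea described above, written as a linear program (this is not the authors' code; the toy sizes are mine and the Optimization Toolbox's linprog is assumed): minimize the l1 norm only outside the known part T of the support, subject to the measurement constraints.

% min ||x_{T^c}||_1  s.t.  A*x = y, recast with x = u - v, u,v >= 0.
n = 200; m = 80; s = 20;
A = randn(m, n)/sqrt(m);
x0 = zeros(n,1); p = randperm(n); supp = p(1:s); x0(supp) = randn(s,1);
y = A*x0;
T = supp(1:15);                      % support known from the previous frame (toy choice)
w = ones(n,1); w(T) = 0;             % zero cost on the known support
f = [w; w];                          % cost on u and on v
uv = linprog(f, [], [], [A, -A], y, zeros(2*n,1), []);
x = uv(1:n) - uv(n+1:end);
norm(x - x0)/norm(x0)                % relative recovery error (should be small)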
In this paper, we propose and study the use of alternating direction algorithms for several L_1-norm minimization problems arising from sparse solution recovery in compressive sensing, including the basis pursuit problem, the basis-pursuit denoising problems of both unconstrained and constrained forms, as well as others. We present and investigate two classes of algorithms derived from either the primal or the dual forms of the L_1-problems. The construction of the algorithms consists of two main steps: (1) to reformulate an L_1-problem into one having partially separable objective functions by adding new variables and constraints; and (2) to apply an exact or inexact alternating direction method to the resulting problem. The derived alternating direction algorithms can be regarded as first-order primal-dual algorithms because both primal and dual variables are updated at each and every iteration. Convergence properties of these algorithms are established or restated when they already exist. Extensive numerical results in comparison with several state-of-the-art algorithms are given to demonstrate that the proposed algorithms are efficient, stable and robust. Moreover, we present numerical results to emphasize two practically important but perhaps overlooked points. One point is that algorithm speed should always be evaluated relative to appropriate solution accuracy; another is that whenever erroneous measurements possibly exist, the L_1-norm fidelity should be the fidelity of choice in compressive sensing.
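Along the same lines, here is a generic alternating-direction (ADMM) sketch for the unconstrained basis-pursuit-denoising form min 0.5||Ax - y||^2 + lambda||x||_1; it is a textbook variant for illustration only, not the YALL1 implementation, and the function name and parameters (rho, iters) are my own.

function x = admm_bpdn(A, y, lambda, rho, iters)
% Generic ADMM for min 0.5*||A*x - y||^2 + lambda*||x||_1 (illustration only).
    n = size(A, 2);
    x = zeros(n,1); z = zeros(n,1); u = zeros(n,1);
    Aty = A'*y;
    R = chol(A'*A + rho*eye(n));            % cache the Cholesky factor
    soft = @(v,t) sign(v).*max(abs(v) - t, 0);
    for k = 1:iters
        x = R \ (R' \ (Aty + rho*(z - u))); % x-update: ridge-type solve
        z = soft(x + u, lambda/rho);        % z-update: soft thresholding
        u = u + x - z;                      % scaled dual update
    end
end

Called as, say, x = admm_bpdn(A, y, 1e-2, 1, 200), both primal (x, z) and dual (u) variables are updated at every iteration, which is the first-order primal-dual flavor the abstract alludes to.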
YALL1 is available in Matlab. As most of you may already know, MATLAB R2009b has a potentially serious bug. You can find a description here, but since this is a serious problem relating to the use of the transpose sign, which is heavily used in compressive sensing solvers, let me copy the message here:
There seems to be a serious bug in the current Matlab release R2009b that occurs when solving a linear system of equations with the transpose. Here is a small example:
A=[ 0 2 ; 1 0 ]; b=[ 2 ; 0 ]; A'\b
ans =
     0
     1
whereas the correct solution is [ 0 ; 2 ]. The answer [ 0 ; 1 ] is the solution of A\b, so it appears that the transpose sign ' is ignored. However, the correct solution is returned when using
transpose(A)\b, A.'\b
instead of A'\b.
Interestingly, this bug does not occur for all matrices, for instance
A = [1 2 ; 1 1]; A'\b
returns the correct result [ -2 ; 4 ].
The bug seems to be new in release R2009b and applies to Linux and Windows 32 and 64 bit versions. A bug report has been filed.
Look under MATLAB, in the Mathematics Category, Exists In R2009b.
MATLAB customers should feel free to Watch this Bug for any updates, or Subscribe to an RSS feed of a group of Bugs of interest.
This Bug Report details the circumstances required to trigger the bug.
The summary is that A must be the permutation of a full, square, non-scalar triangular matrix (but not exactly triangular) and must be used as A'\b in the solve.
The Bug Report does not yet reflect this, but the bug will be fixed in our next release of MATLAB.
The ease of image storage and transmission in modern applications would be unfeasible without compression, which converts high-resolution images into a relatively small set of significant transform coefficients. Due to their specific content, many real-world images are highly sparse in an appropriate orthonormal basis. The inherent property of compressed sensing (CS) theory of working simultaneously as a sensing and compression protocol, using a small subset of random incoherent projection coefficients, enables a potentially significant reduction in the sampling and computation costs of images, favoring its use in real-time applications that do not require excellent reconstruction performance. In this paper, we develop a Bayesian CS (BCS) approach for obtaining highly sparse representations of images based on a set of noisy CS measurements, where the prior belief that the vector of projection coefficients should be sparse is enforced by fitting directly its prior probability distribution by means of a Gaussian Scale Mixture (GSM). The experimental results show that our proposed method, when compared with norm-based constrained optimization algorithms, maintains the reconstruction performance, in terms of the reconstruction error and the PSNR, while achieving an increased sparsity using far fewer basis functions.
In the Sparsity in Everything series, a series of posts that highlights sparsity and power laws in odd places for the purpose of shamelessly linking them back to compressive sensing, I mentioned the work of Michael Schmidt and Hod Lipson entitled Distilling Free-Form Natural Laws from Experimental Data, which seemed to amount to solving some sort of l_0 problem (i.e. fitting the data/phenomena at hand with a combinatorial search using genetic algorithms). Well, they just came out with a GUI that allows anyone to try that experiment for themselves. It's called Eureqa. In Equation search part 1, Andrew Gelman just wrote in a string of data he sent and he's waiting for an answer. He might be waiting for a long time... In the meantime, I am surprised to see that there is no response to the criticism made of the fitness function of the original article and now this program. The criticism was brought forth here:
A recent paper [1] introduced the fascinating idea that natural symbolic laws (such as Lagrangians and Hamiltonians) could be learned from experimental measurements in a physical system (such as a pendulum). The learning is based on a genetic algorithm that employs a fitness function involving the pair-wise partial derivatives of the time-series measurements of state-space data coming from the system. While the general approach is original and convincing, we feel that the particular algorithm described in the paper cannot accomplish these goals. We present theoretical and experimental arguments for this belief and also explain some mathematically rigorous ideas that can be used within the authors’ framework to find symbolic laws. We first recapitulate the main steps in the methodology
And while you are waiting for Eureqa to converge (to the wrong solution?), you might as well do two things:
1. Enjoy reading papers from NIPS 2009. They're out. Andrew McGregor points to some streaming papers. While reading some of them offline, I wondered aloud whether, in some exploratory instances, we should not go about the following process to do compressive sensing: