Showing posts with label MLHardware. Show all posts

Saturday, May 11, 2019

Saturday Morning Video: ISSCC 2019, Deep Learning Hardware: Past, Present, and Future, Yann LeCun

** Nuit Blanche is now on Twitter: @NuitBlog **

Yann LeCun mentioned his recent talk at ISSCC 2019 on his Twitter feed.

Slides are here.

Lots of good lessons learned. Because I am involved with LightOn, I liked the last slide. I like challenges. 



Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Other links:
Paris Machine Learning: Meetup.com || @Archives || LinkedIn || Facebook || @ParisMLGroup
About LightOn: Newsletter || @LightOnIO || on LinkedIn || on CrunchBase || our Blog
About myself: LightOn || Google Scholar || LinkedIn || @IgorCarron || Homepage || ArXiv

Friday, April 12, 2019

Data-Driven Design for Fourier Ptychographic Microscopy

One of the things that has changed in the past two years is the community's interest in building reconstruction solvers using Deep Neural Networks, in what we used to call here Data Driven Sensor Design. Here is a new example below. Fascinatingly enough, the demand for theoretical understanding that used to be made of Ptychography seems to have vanished :-)





Fourier Ptychographic Microscopy (FPM) is a computational imaging method that is able to super-resolve features beyond the diffraction-limit set by the objective lens of a traditional microscope. This is accomplished by using synthetic aperture and phase retrieval algorithms to combine many measurements captured by an LED array microscope with programmable source patterns. FPM provides simultaneous large field-of-view and high resolution imaging, but at the cost of reduced temporal resolution, thereby limiting live cell applications. In this work, we learn LED source pattern designs that compress the many required measurements into only a few, with negligible loss in reconstruction quality or resolution. This is accomplished by recasting the super-resolution reconstruction as a Physics-based Neural Network and learning the experimental design to optimize the network's overall performance. Specifically, we learn LED patterns for different applications (e.g. amplitude contrast and quantitative phase imaging) and show that the designs we learn through simulation generalize well in the experimental setting. Further, we discuss a context-specific loss function, practical memory limitations, and interpretability of our learned designs.

h/t Michael's tweet.





Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Thursday, April 05, 2018

Processeurs optiques et traitement de données de grande dimension/ Optical Co-Processors and High Dimensional Data Processing, Paris, April 5, 2018

So today, we'll give a presentation of where we are at LightOn. Both Laurent and I will be speaking at the Paris Science and Data event. Here is the announcement on Inria's website. Nicolas Keriven is one of our first alpha users of LightOn Cloud.




Paris Science & Data is a series of events jointly organized by the Cap Digital cluster, Inria and PSL, intended to present research in data science along with its applications in academia and in industry.
On the program for this 8th conference, several speakers will address the following topics:
  • From computational imaging to optical computing (Laurent Daudet - Professor, Paris Diderot/Institut Langevin & CTO LightOn)
  • Online sketches with random features (Nicolas Keriven - Researcher, ENS, CFM-ENS ''Laplace'' chair in Data Science)
  • LightOn: a new generation of optical co-processors (Igor Carron - CEO LightOn)





Saturday, December 16, 2017

Saturday Morning Video: Petascale Deep Learning on a Single Chip Speaker: Tapabrata Ghosh, Vathys


Tapa Ghosh of Vathys.ai is presenting a new technology that aims at dealing with the one bottleneck few people in hardware for AI are focusing on: data movement. Give this man his funding !
(Unrelated: At LightOn, we solve the data movement problem in a different fashion)




Vathys.ai is a deep learning startup that has been developing a new deep learning processor architecture with the goal of massively improved energy efficiency and performance. The architecture is also designed to be highly scalable, amenable to next generation DL models. Although deep learning processors appear to be the "hot topic" of the day in computer architecture, the majority (we argue all) of such designs incorrectly identify the bottleneck as computation and thus neglect the true culprits in inefficiency; data movement and miscellaneous control flow processor overheads. This talk will cover many of the architectural strategies that the Vathys processor uses to reduce data movement and improve efficiency. The talk will also cover some circuit level innovations and will include a quantitative and qualitative comparison to many DL processor designs, including the Google TPU, demonstrating numerical evidence for massive improvements compared to the TPU and other such processors. 

h/t Iacopo and Reddit 





Thursday, October 12, 2017

PhD Thesis: Efficient Methods and Hardware for Deep Learning by Song Han

Congratulations Dr. Han !

The thesis is listed below, but many of its themes are part of Lecture 15 of Stanford CS 231n's Convolutional Neural Networks for Visual Recognition (Spring 2017). Enjoy !




The future will be populated with intelligent devices that require inexpensive, low-power hardware platforms. Deep neural networks have evolved to be the state-of-the-art technique for machine learning tasks. However, these algorithms are computationally intensive, which makes it difficult to deploy on embedded devices with limited hardware resources and a tight power budget. Since Moore's law and technology scaling are slowing down, technology alone will not address this issue. To solve this problem, we focus on efficient algorithms and domain-specific architectures specially designed for the algorithm. By performing optimizations across the full stack from application through hardware, we improved the efficiency of deep learning through smaller model size, higher prediction accuracy, faster prediction speed, and lower power consumption. Our approach starts by changing the algorithm, using "Deep Compression" that significantly reduces the number of parameters and computation requirements of deep learning models by pruning, trained quantization, and variable length coding. "Deep Compression" can reduce the model size by 18x to 49x without hurting the prediction accuracy. We also discovered that pruning and the sparsity constraint not only applies to model compression but also applies to regularization, and we proposed dense-sparse-dense training (DSD), which can improve the prediction accuracy for a wide range of deep learning models. To efficiently implement "Deep Compression" in hardware, we developed EIE, the "Efficient Inference Engine", a domain-specific hardware accelerator that performs inference directly on the compressed model which significantly saves memory bandwidth. Taking advantage of the compressed model, and being able to deal with the irregular computation pattern efficiently, EIE improves the speed by 13x and energy efficiency by 3,400x over GPU.
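To make the first two stages of "Deep Compression" a bit more concrete, here is a minimal sketch in plain Python of magnitude pruning followed by weight sharing with a tiny 1-D k-means. The function names, the linear centroid initialisation and the toy sizes are my own assumptions for illustration, not the thesis code, and the variable length coding stage is left out entirely.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude `keep_fraction` of the weights.

    Ties at the threshold are all kept, so slightly more than the requested
    fraction can survive.
    """
    n_keep = max(1, round(len(weights) * keep_fraction))
    # Threshold is the magnitude of the n_keep-th largest weight.
    threshold = sorted((abs(w) for w in weights), reverse=True)[n_keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def share_weights(weights, n_clusters, n_iters=20):
    """Cluster the surviving weights with 1-D k-means (needs n_clusters >= 2).

    Returns (codebook, indices): each nonzero weight is replaced by the index
    of its nearest centroid, and index -1 marks a pruned weight.
    """
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    # Assumption: centroids start evenly spaced between the min and max weight.
    codebook = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]

    def nearest(w):
        return min(range(n_clusters), key=lambda c: abs(w - codebook[c]))

    for _ in range(n_iters):
        buckets = [[] for _ in range(n_clusters)]
        for w in nonzero:
            buckets[nearest(w)].append(w)
        # Empty clusters keep their previous centroid.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]

    indices = [-1 if w == 0.0 else nearest(w) for w in weights]
    return codebook, indices
```

With eight toy weights, a keep fraction of 0.5 and two clusters, the layer is stored as four small indices plus a two-entry codebook instead of eight floats, which is where the size reduction comes from.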



Thursday, September 21, 2017

CSHardware: InView Multi-Pix Camera Demonstrates 1FPS SWIR Imaging



It is rare to see CS and DL ideas embodied in sensing hardware that is actually in production. We mentioned the development at InView a while back. Here is a new announcement using compressive sensing technology and neural networks in the SWIR sensing realm. From the press release:
"...Having already harnessed the computational power of the famous Single-Pixel Camera architecture of the InView210 SWIR imager, InView has now enhanced its speed and image processing capability by incorporating a small array of pixels and new compressive computational methods. InView takes advantage of parallel measurements, matrix processing and efficient reconstruction algorithms to produce the highest resolution SWIR images at rates of just a few seconds per frame. As shown below, multi-pixel Compressive Sensing magnifies the resolution of a small pixel array. On the left, is a low-resolution image directly measured from a 64 x 64 InGaAs pixel array. When that same 64 x 64 array is used with compressive sensing, the image is transformed computationally into a detailed 512 x 512 image...."
The rest is here.



Friday, September 08, 2017

Super-Resolution Imaging Through Scattering Medium Based on Parallel Compressed Sensing / Cell Detection with Deep Convolutional Neural Network and Compressed Sensing / Exploit imaging through opaque wall via deep learning

So the great convergence between sensing and deep learning continues. The first paper is a potential improvement to our approach, the second mixes compressive sensing and Deep Learning, while the last paper uses our approach and builds a deep learning solver (several other groups have done similar things in the past; see some of the blog entries under the MLHardware hashtag). Enjoy !


Recent studies show that compressed sensing (CS) can recover sparse signal with much fewer measurements than traditional Nyquist theorem. From another point of view, it provides a new idea for super-resolution imaging, like the emergence of single pixel camera. However, traditional methods implemented measurement matrix by digital mirror device (DMD) or spatial light modulator, which is a serial imaging process and makes the method inefficient. In this paper, we propose a super resolution imaging system based on parallel compressed sensing. The proposed method first measures the transmission matrix of the scattering sheet and then recover high resolution objects by “two-step phase shift” technology and CS reconstruction algorithm. Unlike traditional methods, the proposed method realizes parallel measurement matrix by a simple scattering sheet. Parallel means that charge-coupled device camera can obtain enough measurements at once instead of changing the patterns on the DMD repeatedly. Simulations and experimental results show the effectiveness of the proposed method.


The ability to automatically detect certain types of cells in microscopy images is of significant interest to a wide range of biomedical research and clinical practices. Cell detection methods have evolved from employing hand-crafted features to deep learning-based techniques to locate target cells. The essential idea of these methods is that their cell classifiers or detectors are trained in the pixel space, where the locations of target cells are labeled. In this paper, we seek a different route and propose a convolutional neural network (CNN)-based cell detection method that uses encoding of the output pixel space. For the cell detection problem, the output space is the sparsely labeled pixel locations indicating cell centers. Consequently, we employ random projections to encode the output space to a compressed vector of fixed dimension. Then, CNN regresses this compressed vector from the input pixels. Using L1-norm optimization, we recover sparse cell locations on the output pixel space from the predicted compressed vector. In the past, output space encoding using compressed sensing (CS) has been used in conjunction with linear and non-linear predictors. To the best of our knowledge, this is the first successful use of CNN with CS-based output space encoding. We experimentally demonstrate that proposed CNN + CS framework (referred to as CNNCS) exceeds the accuracy of the state-of-the-art methods on many benchmark datasets for microscopy cell detection. Additionally, we show that CNNCS can exploit ensemble average by using more than one random encodings of the output space.
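The output-space encoding idea in this second abstract can be sketched without any CNN at all: encode a sparse vector of cell centers with a random matrix, then recover it from the compressed vector with an L1-regularised solver. The sketch below is only an illustration under my own assumptions: a flattened 1-D indicator vector stands in for the pixel grid, the exact compressed vector stands in for the CNN's prediction, and ISTA (iterative soft thresholding) is swapped in as the L1 solver. The paper's actual encoding scheme and solver may well differ.

```python
import random


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (proximal map of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def encode(vector, sensing):
    """Compressed vector y = A x, with the sensing matrix A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in sensing]


def ista_recover(y, sensing, lam=0.01, n_iters=2000):
    """Approximately minimise 0.5 * ||y - A x||^2 + lam * ||x||_1 with ISTA."""
    n_rows, n_cols = len(sensing), len(sensing[0])
    # Conservative step size: 1 / ||A||_F^2, which never exceeds 1 / ||A||_2^2.
    step = 1.0 / sum(a * a for row in sensing for a in row)
    x = [0.0] * n_cols
    for _ in range(n_iters):
        residual = [yi - pi for yi, pi in zip(y, encode(x, sensing))]
        # Gradient step uses A^T (y - A x), followed by soft thresholding.
        back_projection = [
            sum(sensing[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        x = [
            soft_threshold(xj + step * gj, step * lam)
            for xj, gj in zip(x, back_projection)
        ]
    return x


def random_sensing_matrix(n_rows, n_cols, seed=0):
    """Gaussian random projection with entries of variance 1 / n_rows."""
    rng = random.Random(seed)
    scale = n_rows ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_cols)] for _ in range(n_rows)]
```

In the paper's setting the compressed vector would come out of the CNN rather than from `encode`, so the recovery step also has to tolerate prediction noise, which this toy version does not model.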



Imaging through scattering media is encountered in many disciplines or sciences, ranging from biology and mesoscopic physics to astronomy. But it is still a big challenge because light suffers from multiple scattering in such media and can be totally decorrelated. Here, we propose a deep-learning-based method that can retrieve the image of a target behind a thick scattering medium. The method uses a trained deep neural network to fit the way of mapping of objects at one side of a thick scattering medium to the corresponding speckle patterns observed at the other side. For demonstration, we retrieve the images of a set of objects hidden behind a 3mm thick white polystyrene slab, the optical depth of which is 13.4 times of the scattering mean free path. Our work opens up a new way to tackle the longstanding challenge by using the technique of deep learning.



Wednesday, August 30, 2017

ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

At our weekly meeting, Iacopo talked about this recent interesting preprint: 



Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly "on" device.
We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks--a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler "projection" network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question "how many neural bits are required to solve a given task?" using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets.
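As a rough illustration of "random projections to transform inputs into bits", here is a minimal sketch that takes the sign of a few Gaussian random projections. This is an assumption on my part about one common way to do it (sign random projections, as used in locality sensitive hashing); the projection functions actually used in the preprint may differ, and the function names below are hypothetical.

```python
import random


def make_projections(n_bits, dim, seed=0):
    """Draw `n_bits` fixed Gaussian random directions in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project_to_bits(x, projections):
    """One bit per direction: 1 if the dot product is non-negative, else 0."""
    return [
        1 if sum(r_i * x_i for r_i, x_i in zip(row, x)) >= 0 else 0
        for row in projections
    ]


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(u != v for u, v in zip(a, b))
```

The projections are drawn once and never trained, so the on-device cost is just the dot products and a handful of bits per input; nearby inputs tend to share most of their bits, while an input and its negation share none.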


Thursday, August 24, 2017

Towards a Deeper Understanding of Training Quantized Neural Networks / BitNet: Bit-Regularized Deep Neural Networks / Deep Binary Reconstruction for Cross-modal Hashing

There is a strong interest in the low bit side of the Force !




Towards a Deeper Understanding of Training Quantized Neural Networks by Hao Li, Soham De,  Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein
Training neural networks with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memory capacity, and power consumption. Numerous recent publications have studied methods for training quantized networks, but these studies have been purely experimental. In this work, we investigate the theory of training quantized neural networks by analyzing the convergence properties of some commonly used methods. Our main result shows that training algorithms that exploit high-precision representations have an important annealing property that purely quantized training methods lack, which explains many of the observed empirical differences between these types of algorithms. 
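The distinction drawn in this abstract, between purely quantized training and training that keeps a high-precision copy of the weights, can be sketched on a single scalar weight. I am assuming here that stochastic rounding stands in for the purely quantized family and a BinaryConnect-like rule for the high-precision family; the function names and the uniform grid are illustrative choices, not the paper's code.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of `step`, rounding up with probability equal
    to the fractional remainder, so the result is unbiased on average."""
    scaled = x / step
    lower = math.floor(scaled)
    round_up = rng.random() < (scaled - lower)
    return step * (lower + (1 if round_up else 0))


def quantized_sgd_step(w_quantized, grad, lr, step, rng):
    """Purely quantized update: only the low-precision weight is ever stored."""
    return stochastic_round(w_quantized - lr * grad, step, rng)


def high_precision_step(w_float, grad_fn, lr, step):
    """High-precision variant: the gradient is evaluated at the quantized
    weight, but the update accumulates in a full-precision copy."""
    w_quantized = step * round(w_float / step)
    return w_float - lr * grad_fn(w_quantized)
```

In the second rule small gradient steps can accumulate in the float copy until they cross a quantization boundary, whereas in the first every step is immediately pushed back onto the grid.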

We present a novel regularization scheme for training deep neural networks. The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over the real line. Our key idea is to control the expressive power of the network by dynamically quantizing the range and set of values that the parameters can take. We formulate this idea using a novel end-to-end approach that regularizes the traditional classification loss function. Our regularizer is inspired by the Minimum Description Length principle. For each layer of the network, our approach optimizes a translation and scaling factor along with integer-valued parameters. We empirically compare BitNet to an equivalent unregularized model on the MNIST and CIFAR-10 datasets. We show that BitNet converges faster to a superior quality solution. Additionally, the resulting model is significantly smaller in size due to the use of integer parameters instead of floats.


With the increasing demand of massive multimodal data storage and organization, cross-modal retrieval based on hashing technique has drawn much attention nowadays. It takes the binary codes of one modality as the query to retrieve the relevant hashing codes of another modality. However, the existing binary constraint makes it difficult to find the optimal cross-modal hashing function. Most approaches choose to relax the constraint and perform thresholding strategy on the real-value representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis about the effectiveness of multimodal networks in preserving the inter- and intra-modal consistency. Based on the analysis, we provide a so-called Deep Binary Reconstruction (DBRC) network that can directly learn the binary hashing codes in an unsupervised fashion. The superiority comes from a proposed simple but efficient activation function, named as Adaptive Tanh (ATanh). The ATanh function can adaptively learn the binary codes and be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval task.


Monday, July 31, 2017

Randomized apertures: high resolution imaging in far field

Using glitter as a way to replace large-structure mirrors for space telescopes: this is what is suggested and measured here. The random PSF allows for sharper resolution (and Machine Learning is used). This is another instance of the Great Convergence, woohoo ! (And by the way, are we ever going to acknowledge that the Random Lens Imaging paper is one of the greatest preprints that did not make it into publication, ever ?)



We explore opportunities afforded by an extremely large telescope design comprised of ill-figured randomly varying subapertures. The veracity of this approach is demonstrated with a laboratory scaled system whereby we reconstruct a white light binary point source separated by 2.5 times the diffraction limit. With an inherently unknown varying random point spread function, the measured speckle images require a restoration framework that combine support vector machine based lucky imaging and non-negative matrix factorization based multiframe blind deconvolution. To further validate the approach, we model the experimental system to explore sub-diffraction-limited performance, and an object comprised of multiple point sources.








Wednesday, July 26, 2017

Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Here is a very interesting paper that uses only one sensor:

"... ECG data is sampled at a frequency of 200 Hz and is collected from a single-lead, non-invasive and continuous monitoring device called the Zio Patch which has a wear period up to 14 days (Turakhia et al., 2013)."
What is interesting is that it can do very well compared to the gold standard established by the researchers in the paper. The second aspect that is fascinating to me is the need for 34 convolutional layers: an architecture that would have been difficult to guess in the first place. 





Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks by Pranav Rajpurkar, Awni Y. Hannun, Masoumeh Haghpanahi, Codie Bourn, Andrew Y. Ng

We develop an algorithm which exceeds the performance of board certified cardiologists in detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. We build a dataset with more than 500 times the number of unique patients than previously studied corpora. On this dataset, we train a 34-layer convolutional neural network which maps a sequence of ECG samples to a sequence of rhythm classes. Committees of board-certified cardiologists annotate a gold standard test set on which we compare the performance of our model to that of 6 other individual cardiologists. We exceed the average cardiologist performance in both recall (sensitivity) and precision (positive predictive value).





Tuesday, July 18, 2017

Object classification through scattering media with deep learning on time resolved measurement


I love it ! Calibration Invariant Imaging.




We demonstrate an imaging technique that allows identification and classification of objects hidden behind scattering media and is invariant to changes in calibration parameters within a training range. Traditional techniques to image through scattering solve an inverse problem and are limited by the need to tune a forward model with multiple calibration parameters (like camera field of view, illumination position etc.). Instead of tuning a forward model and directly inverting the optical scattering, we use a data driven approach and leverage convolutional neural networks (CNN) to learn a model that is invariant to calibration parameters variations within the training range and nearly invariant beyond that. This effectively allows robust imaging through scattering conditions that is not sensitive to calibration. The CNN is trained with a large synthetic dataset generated with a Monte Carlo (MC) model that contains random realizations of major calibration parameters. The method is evaluated with a time-resolved camera and multiple experimental results are provided including pose estimation of a mannequin hidden behind a paper sheet with 23 correct classifications out of 30 tests in three poses (76.6% accuracy on real-world measurements). This approach paves the way towards real-time practical non line of sight (NLOS) imaging applications.




RLAGPU: High-performance Out-of-Core Randomized Singular Value Decomposition on GPU - implementation -

When the GPU cannot handle your randomized SVD:


RLAGPU: High-performance Out-of-Core Randomized Singular Value Decomposition on GPU by Yuechao Lu, Fumihiko Ino, Yasuyuki Matsushita and Kenichi Hagihara

Randomized Singular Value Decomposition (SVD)[1] is gaining attention in finding structure in scientific data. However, processing large-scale data is not easy due to the limited capacity of GPU memory. To deal with this issue, we propose RLAGPU, an out-of-core process method accelerating large-scale randomized SVD on GPU. The contribution of our method is as follows:
  • Out-of-core implementation that overcomes the GPU memory capacity limit.
  • High-performance. In-core and out-of-core routines switched automatically according to data size and available GPU memory.
We found that our proposed method outperforms the existing cuBLAS-XT by a margin up to 50%.

An implementation is here: https://github.com/luyuechao/
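For readers who have not met randomized SVD before, here is a minimal sketch of its range-finding stage in plain, dependency-free Python. It is not the RLAGPU code: the function names, the oversampling default and the Gram-Schmidt orthonormalisation are my own assumptions, and the final SVD of the small matrix B is left to whatever linear algebra library is at hand.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormalize_columns(y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of y.

    Columns that are numerically dependent on earlier ones are dropped,
    so the returned Q can have fewer columns than y.
    """
    basis = []
    for col in zip(*y):
        v = list(col)
        original_norm = sum(vi * vi for vi in v) ** 0.5
        for q in basis:
            coeff = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - coeff * qi for qi, vi in zip(q, v)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol * original_norm:
            basis.append([vi / norm for vi in v])
    return transpose(basis)


def randomized_range_finder(a, rank, oversample=5, seed=0):
    """Return (Q, B) such that A is approximately Q @ B, with B = Q^T A small."""
    rng = random.Random(seed)
    n_cols = len(a[0])
    # Gaussian test matrix with a few extra columns beyond the target rank.
    omega = [
        [rng.gauss(0.0, 1.0) for _ in range(rank + oversample)]
        for _ in range(n_cols)
    ]
    y = matmul(a, omega)            # random samples of the column space of A
    q = orthonormalize_columns(y)   # orthonormal basis for those samples
    b = matmul(transpose(q), a)     # small matrix whose SVD finishes the job
    return q, b
```

Both large products, A times the test matrix and Q transposed times A, can be accumulated one block of rows of A at a time, which is the general property that makes an out-of-core variant possible when A does not fit in GPU memory.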



Monday, July 10, 2017

Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent

Continuing our examination of SGD, on the hardware side this time !


Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent by Christopher De Sa, Matthew Feldman, Christopher Ré, Kunle Olukotun

Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called Buckwild! that uses both asynchronous execution and low-precision computation. We introduce the DMGC model, the first conceptualization of the parameter space that exists when implementing low-precision SGD, and show that it provides a way to both classify these algorithms and model their performance. We leverage this insight to propose and analyze techniques to improve the speed of low-precision SGD. First, we propose software optimizations that can increase throughput on existing CPUs by up to 11×. Second, we propose architectural changes, including a new cache technique we call an obstinate cache, that increase throughput beyond the limits of current-generation hardware. We also implement and analyze low-precision SGD on the FPGA, which is a promising alternative to the CPU for future SGD systems. 




Wednesday, July 05, 2017

Slides: BioComp Summer School 2017 on bio-inspired computing hardware


The good folks at the GdR BioComp (@GDRBioComp) organized what looked like an awesome BioComp Summer School 2017, which was to provide an introduction to subject areas around bio-inspired computing hardware. Here are the slides shown at the conference. Photos taken during the school are here.


Invited speakers and slides

  • Chloé-Agathe Azencott, Mines ParisTech, France, Machine Learning and applications to genomics  
  • Geoffrey W. Burr, IBM Almaden Research Center, USA, Memristive neuromorphic hardware 
  • Elisabetta Chicca, Cognitive Interaction Technology Center of Excellence, Bielefeld University, Germany, Neuromorphic electronic circuits for building autonomous cognitive systems
  • Steve Furber, School of Computer Science, University of Manchester, UK, SpiNNaker: ARM system-on-chip architecture  
  • Vincent Gripon, IMT Atlantique, Lab Sticc, France, Indexing, storing and retrieving data in neural networks  
  • Konstantin K. Likharev, Department of Physics and Astronomy, Stony Brook University, USA, Nanoelectronic/Superconductive Neuromorphic Networks  
  • Jean-Pascal Pfister, University of Zurich and ETH, Switzerland, Learning and inference at the level of single synapses and spiking neurons  
  • Panayiota Poirazi, Computational Biology Laboratory, Heraklion, Greece, Computational neuroscience: the role of dendrites in learning and memory  
  • Damien Querlioz, C2N CNRS, France, Computing with unreliable nanodevices  
  • Gregor Schöner, Institut für Neuroinformatik, Germany, Cognition in embodied and situated nervous systems  
  • Rufin VanRullen, CerCo, France, Brain oscillations and perception  





