Universal Approximation Property via Quantum Feature Maps
Quoc Hoan Tran*, Takahiro Goto*, and Kohei Nakajima
Physical Intelligence Lab, UTokyo
QTML2021
arXiv:2009.00298
Phys. Rev. Lett. 127, 090506 (2021)
(*) Equal contribution
The future of intelligent computations
2/20
[Figure: bits, qubits, and neurons converging toward a "quantum mind"]
Quantum computing: high efficiency + fragile
Neural networks: versatile + adaptive + robust
Two rising fields can be complementary
They converge to an adaptive platform: Quantum Neural Networks
QNN via Quantum Feature Map[1,2]
3/20
[1] V. Havlíček et al., Supervised Learning with Quantum-Enhanced Feature Spaces, Nature 567, 209 (2019)
[2] M. Schuld and N. Killoran, Quantum Machine Learning in Feature Hilbert Spaces, Phys. Rev. Lett. 122, 040504 (2019)
[3] Y. Liu et al., A Rigorous and Robust Quantum Speed-Up in Supervised Machine Learning, Nat. Phys. 17, 1013 (2021)
[Figure: a quantum feature map embeds the input $\boldsymbol{x} \in \mathcal{X}$ as a state $|\Psi(\boldsymbol{x})\rangle$ in a quantum Hilbert space, accessed via measurements or via the kernel $\kappa(\boldsymbol{x}, \boldsymbol{x}') = \langle \Psi(\boldsymbol{x}) | \Psi(\boldsymbol{x}') \rangle$; heuristic: Quantum Circuit Learning (K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Phys. Rev. A 98, 032309)]
◼ For a special problem, a formal quantum advantage [3] can be proven (but it is neither "near-term" nor practical)
◼ We expect quantum feature maps to be more expressive than their classical counterparts (or at least equally expressive)
Universal Approximation Property (UAP)
4/20
[Figure: a one-hidden-layer network mapping inputs $x_1, x_2$ through $K$ hidden units with output weights $w_1, w_2, \ldots, w_K$ to the output $y$]
◼ A feed-forward NN with one hidden layer can approximate any continuous function* on a closed and bounded subset of $\mathbb{R}^d$ to arbitrary accuracy $\varepsilon$ (Hornik et al. 1989, Cybenko 1989)
* This extends to the larger class of Borel measurable functions (beyond this talk)
➢ Given enough hidden units with any squashing activation function (step, sigmoid, tanh, ReLU); a numerical sketch follows below
➢ Extensive extensions to non-continuous activation functions and resource evaluation
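As a concrete illustration of this classical UAP statement (our sketch, not part of the original slides): random tanh hidden units with a least-squares readout drive the sup error down as the number of hidden units $K$ grows. The target function and weight scales below are arbitrary illustrative choices.

```python
# Minimal sketch (illustrative assumptions throughout): a one-hidden-layer
# network y(x) = sum_j w_j * tanh(a_j * x + b_j) with random hidden weights
# and a least-squares readout approximates a continuous target on [0, 1].
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
g = np.sin(4 * np.pi * x) + x          # an arbitrary continuous target

for K in (5, 20, 100):                  # number of hidden units
    a = rng.normal(size=(K, 1)) * 10.0  # random input weights
    b = rng.normal(size=(K, 1)) * 10.0  # random biases
    H = np.tanh(a * x + b).T            # (200, K) matrix of hidden activations
    w, *_ = np.linalg.lstsq(H, g, rcond=None)
    print(f"K={K:4d}: sup error {np.max(np.abs(H @ w - g)):.3f}")
```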
UAP of Quantum ML models
5/20
Data Re-uploading
Adrián Pérez-Salinas et al., Data re-uploading for a universal quantum
classifier, Quantum 4, 226 (2020)
Expressive power via a partial
Fourier series
M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine-learning models, Phys. Rev. A 103, 032430 (2021)
$f_{\boldsymbol{\theta}}(\boldsymbol{x}) = \sum_{\omega \in \Omega} c_{\omega}(\boldsymbol{\theta})\, e^{i \omega \boldsymbol{x}}$
Our simple but clear questions
◼ UAP in the language of observables
and quantum feature maps
◼ How well can the approximation perform?
◼ How many parameters does it need (vs. classical NNs)?
QNN via Quantum Feature Map
6/20
◼ The output is a linear expansion of basis functions (expectation values) with a suitable set of observables:
$\psi_i(\boldsymbol{x}) = \langle \Psi(\boldsymbol{x}) | O_i | \Psi(\boldsymbol{x}) \rangle, \qquad f(\boldsymbol{x}; \Psi, \boldsymbol{O}, \boldsymbol{w}) = \sum_{i=1}^{K} w_i \, \psi_i(\boldsymbol{x})$
◼ Choosing the variational circuit $\mathcal{W}$ is equivalent to selecting suitable observables: expanding over the Pauli group $P_\beta \in \{I, X, Y, Z\}^{\otimes N}$,
$w_\beta(\boldsymbol{\theta}) = \mathrm{Tr}\!\left[ \mathcal{W}^\dagger(\boldsymbol{\theta})\, Z^{\otimes N} \mathcal{W}(\boldsymbol{\theta})\, P_\beta \right] \quad \text{(weights)}, \qquad \psi_\beta(\boldsymbol{x}) = \mathrm{Tr}\!\left[\, |\Psi(\boldsymbol{x})\rangle\langle\Psi(\boldsymbol{x})|\, P_\beta \right] \quad \text{(basis functions)}$
E.g., the decision function [1] is $\mathrm{sign}\!\left( \frac{1}{2^N} \sum_\beta w_\beta(\boldsymbol{\theta})\, \psi_\beta(\boldsymbol{x}) + b \right)$
[1] V. Havlíček et al., Supervised Learning with Quantum-Enhanced Feature Spaces, Nature 567, 209 (2019)
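To make the expansion concrete, here is a small statevector sketch (ours, not the authors' code) that evaluates $\psi_\beta(\boldsymbol{x})$ over the full two-qubit Pauli group and forms $f(\boldsymbol{x}) = \sum_\beta w_\beta \psi_\beta(\boldsymbol{x})$. The feature map (Y rotations by $\arccos(\sqrt{x_k})$, borrowed from the parallel scenario introduced later) and the random weights are assumptions for illustration.

```python
# Sketch (not the authors' code): basis functions psi_beta(x) = <Psi(x)|P_beta|Psi(x)>
# over the 2-qubit Pauli group, and the model f(x) = sum_beta w_beta * psi_beta(x).
# Feature map: single-qubit rotations V_k = exp(-i * arccos(sqrt(x_k)) * Y);
# the weights w are a stand-in for w_beta(theta).
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}

def feature_state(x):
    """|Psi(x)> = (V_1(x) x ... x V_N(x)) |0...0>."""
    state = np.array([1.0 + 0j])
    for xk in x:
        theta = np.arccos(np.sqrt(xk))
        v0 = np.array([np.cos(theta), np.sin(theta)], dtype=complex)  # V|0>
        state = np.kron(state, v0)
    return state

def psi(x, label):
    """Expectation <Psi(x)| P_label |Psi(x)> for a Pauli string like 'ZX'."""
    P = np.array([[1.0 + 0j]])
    for c in label:
        P = np.kron(P, PAULIS[c])
    s = feature_state(x)
    return float(np.real(np.conj(s) @ (P @ s)))

x = np.array([0.3, 0.8])
labels = ["".join(p) for p in itertools.product("IXYZ", repeat=2)]
w = np.random.default_rng(0).normal(size=len(labels))    # stand-in weights
f = sum(wi * psi(x, lab) for wi, lab in zip(w, labels))  # linear expansion
print(f"psi_ZI = {psi(x, 'ZI'):.3f} (= 2*x_1 - 1 = {2*x[0] - 1:.3f}); f(x) = {f:.3f}")
```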
UAP via Quantum Feature Map
7/20
Given a compact set $\mathcal{X} \subset \mathbb{R}^d$, for any continuous function $g: \mathcal{X} \to \mathbb{R}$, can we find $\Psi, \boldsymbol{O}, \boldsymbol{w}$ s.t.
$f(\boldsymbol{x}; \Psi, \boldsymbol{O}, \boldsymbol{w}) = \sum_{i=1}^{K} w_i \, \psi_i(\boldsymbol{x}) \approx g(\boldsymbol{x})$?
UAP. Let $\mathcal{G}$ be a space of continuous functions $g: \mathcal{X} \to \mathbb{R}$. The quantum feature framework $\mathcal{F} = \{ f(\boldsymbol{x}; \Psi, \boldsymbol{O}, \boldsymbol{w}) \}$ has the UAP w.r.t. $\mathcal{G}$ and a norm $\| \cdot \|$ if, for any $g \in \mathcal{G}$ and any $\varepsilon > 0$, there exists $f \in \mathcal{F}$ s.t. $\| f - g \| < \varepsilon$.
Classification capability. For arbitrary disjoint regions $\mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_n \subset \mathcal{X}$, there exists $f \in \mathcal{F}$ s.t. $f$ can separate these regions (this can be deduced from the UAP).
[Figure: a function $f \in \mathcal{F}$ from $\mathcal{X}$ to $\mathbb{R}$ separating the regions $\mathcal{K}_1$ and $\mathcal{K}_2$]
UAP – Scenarios
8/20
[Figure: two circuit layouts, "Parallel" and "Sequential"]
◼ Encode the "nonlinear" property in the basis functions for UAP:
1. Using appropriate observables to construct a polynomial approximation
2. Using appropriate activation functions in pre-processing
Parallel scenario
9/20
Approach 1. Produce the nonlinearity via suitable observables
◼ Basis functions can be polynomials if we choose correlated measurement operators
◼ Then we apply the Weierstrass polynomial approximation theorem
Feature map: $|\Psi_N(\boldsymbol{x})\rangle = U_N(\boldsymbol{x}) |0\rangle^{\otimes N}$, where
$U_N(\boldsymbol{x}) = V_1(\boldsymbol{x}) \otimes \cdots \otimes V_N(\boldsymbol{x}), \quad V_j(\boldsymbol{x}) = e^{-i \theta_j(\boldsymbol{x}) Y}, \quad \theta_j(\boldsymbol{x}) = \arccos\!\left(\sqrt{x_k}\right) \text{ for } 1 \le k \le d,\ j \equiv k \ (\mathrm{mod}\ d)$
Observables: $O_{\boldsymbol{a}} = Z^{a_1} \otimes \cdots \otimes Z^{a_N}, \quad \boldsymbol{a} \in \{0,1\}^N$
Basis functions: $\psi_{\boldsymbol{a}}(\boldsymbol{x}) = \langle \Psi_N(\boldsymbol{x}) | O_{\boldsymbol{a}} | \Psi_N(\boldsymbol{x}) \rangle = \prod_{i=1}^{N} \left( 2 x_{[i]} - 1 \right)^{a_i}$, where $1 \le [i] \le d$ and $[i] \equiv i \ (\mathrm{mod}\ d)$
Parallel scenario
10/20
Approach 1. Produce the nonlinearity via suitable observables
Lemma 1. Consider a polynomial $P(\boldsymbol{x})$ of the input $\boldsymbol{x} = (x_1, x_2, \ldots, x_d) \in [0,1]^d$ where the degree of $x_j$ in $P(\boldsymbol{x})$ is at most $\lfloor (N + d - j)/d \rfloor$ for $j = 1, \ldots, d$ (with $N \ge d$); then there exists a collection of output weights $\{ w_{\boldsymbol{a}} \in \mathbb{R} \mid \boldsymbol{a} \in \{0,1\}^N \}$ s.t.
$\sum_{\boldsymbol{a} \in \{0,1\}^N} w_{\boldsymbol{a}} \, \psi_{\boldsymbol{a}}(\boldsymbol{x}) = P(\boldsymbol{x})$
Result 1 (UAP in the parallel scenario). For any continuous function $g: \mathcal{X} \to \mathbb{R}$,
$\lim_{N \to \infty} \min_{\boldsymbol{w}} \left\| \sum_{\boldsymbol{a} \in \{0,1\}^N} w_{\boldsymbol{a}} \psi_{\boldsymbol{a}}(\boldsymbol{x}) - g(\boldsymbol{x}) \right\|_{\infty} = 0$
※ The number of observables needed is at most $\left( 2 + \lfloor (N-1)/d \rfloor \right)^d$
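A numerical sanity check of Result 1 (our sketch, not from the paper): using the closed form $\psi_{\boldsymbol{a}}(\boldsymbol{x}) = \prod_i (2 x_{[i]} - 1)^{a_i}$, a least-squares fit of the $2^N$ output weights drives the error down as $N$ grows. The target $g$ and the sample points are arbitrary illustrative choices.

```python
# Sketch of Result 1 (illustrative, not from the paper): least-squares weights
# over the basis psi_a(x) = prod_i (2 * x_[i] - 1)^{a_i}, with qubit i
# encoding variable i mod d, fit to an arbitrary continuous target on [0,1]^2.
import itertools
import numpy as np

d = 2
rng = np.random.default_rng(0)
xs = rng.uniform(size=(400, d))                      # sample points
g = np.sin(np.pi * xs[:, 0]) * np.exp(-xs[:, 1])     # continuous target g(x)

for N in (2, 4, 8):
    cols = []
    for a in itertools.product((0, 1), repeat=N):    # all 2^N observables O_a
        col = np.ones(len(xs))
        for i, ai in enumerate(a):
            if ai:
                col *= 2.0 * xs[:, i % d] - 1.0      # factor (2 x_[i] - 1)
        cols.append(col)
    Phi = np.stack(cols, axis=1)                     # design matrix of psi_a
    w, *_ = np.linalg.lstsq(Phi, g, rcond=None)
    print(f"N={N}: max error {np.max(np.abs(Phi @ w - g)):.3e}")
```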
Parallel scenario
11/20
Approach 2. Activation function in pre-processing (straightforward)
[Huang+, Neurocomputing, 2007] Let $\sigma: \mathbb{R} \to [-1, 1]$ be a nonconstant piecewise continuous function such that the span of $\{\sigma(\boldsymbol{a} \cdot \boldsymbol{x} + b)\}$ is dense in $L^2(\mathcal{X})$. Then, for any continuous function $g: \mathcal{X} \to \mathbb{R}$ and any function sequence $\sigma_j(\boldsymbol{x}) = \sigma(\boldsymbol{a}_j \cdot \boldsymbol{x} + b_j)$, where $\boldsymbol{a}_j, b_j$ are randomly generated from any continuous sampling distribution,
$\lim_{N \to \infty} \min_{\boldsymbol{w}} \left\| \sum_{j=1}^{N} w_j \sigma_j(\boldsymbol{x}) - g(\boldsymbol{x}) \right\|_{L^2} = 0$
We obtain the UAP if we can implement the squashing activation function in the pre-processing step (Result 2):
• $U_N(\boldsymbol{x}) = V_1(\boldsymbol{x}) \otimes \cdots \otimes V_N(\boldsymbol{x})$
• $V_j(\boldsymbol{x}) = e^{-i \theta_j(\boldsymbol{x}) Y}$
• $\theta_j(\boldsymbol{x}) = \arccos\!\left( \sqrt{ \tfrac{1 + \sigma(\boldsymbol{a}_j \cdot \boldsymbol{x} + b_j)}{2} } \right)$
$\psi_j(\boldsymbol{x}) = \langle \Psi_N(\boldsymbol{x}) | Z_j | \Psi_N(\boldsymbol{x}) \rangle = \langle 0 | e^{i \theta_j Y} Z e^{-i \theta_j Y} | 0 \rangle = 2 \cos^2(\theta_j) - 1 = \sigma(\boldsymbol{a}_j \cdot \boldsymbol{x} + b_j) = \sigma_j(\boldsymbol{x})$
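The key identity behind Result 2 can be checked directly on one qubit (our sketch; $\sigma = \tanh$ and the random $\boldsymbol{a}_j, b_j$ are assumed examples of a squashing activation and its sampled parameters):

```python
# Check (illustrative) that theta_j(x) = arccos(sqrt((1 + sigma(a_j.x + b_j)) / 2))
# makes the single-qubit expectation <0| e^{i theta Y} Z e^{-i theta Y} |0>
# reproduce the classical random feature sigma_j(x) exactly.
import numpy as np

rng = np.random.default_rng(1)
sigma = np.tanh                                   # an assumed squashing function
x, a, b = rng.uniform(size=3), rng.normal(size=3), rng.normal()

s = sigma(a @ x + b)                              # classical feature sigma_j(x)
theta = np.arccos(np.sqrt((1.0 + s) / 2.0))       # pre-processing angle
state = np.array([np.cos(theta), np.sin(theta)])  # e^{-i theta Y} |0>
z_expect = state[0] ** 2 - state[1] ** 2          # = 2 cos^2(theta) - 1
assert np.isclose(z_expect, s)                    # psi_j(x) = sigma_j(x)
print(z_expect, s)
```

With this identity in place, the Huang et al. random-feature argument carries over verbatim to the quantum basis functions $\psi_j$.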
Sequential scenario with a single qubit
12/20
Finite input set $\mathcal{X} = \{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_M\}$
$V(\boldsymbol{x}) = e^{-\pi i \theta(\boldsymbol{x}) Y} \;\rightarrow\; V^n(\boldsymbol{x}) = e^{-\pi i n \theta(\boldsymbol{x}) Y}$
Basis functions with the Pauli Z:
$\psi_n(\boldsymbol{x}) = \langle 0 | V^{n\dagger}(\boldsymbol{x}) \, Z \, V^n(\boldsymbol{x}) | 0 \rangle = 2 \cos^2(\pi n \theta(\boldsymbol{x})) - 1 = \cos(2\pi n \theta(\boldsymbol{x})) = \cos(2\pi \{ n \theta(\boldsymbol{x}) \})$
where $\{\cdot\}$ denotes the fractional part. We can use the ergodic theory of dynamical systems on the $M$-dimensional torus:
Lemma 3. If the real numbers $1, a_1, \ldots, a_M$ are linearly independent over $\mathbb{Q}$, then $\{ (\{n a_1\}, \ldots, \{n a_M\}) \}_{n \in \mathbb{N}}$ is dense in $[0, 1]^M$.
Result 3. If $1, \theta(\boldsymbol{x}_1), \ldots, \theta(\boldsymbol{x}_M)$ are linearly independent over $\mathbb{Q}$, then for any function $g: \mathcal{X} \to \mathbb{R}$ and any $\varepsilon > 0$, there exists $n \in \mathbb{N}$ such that $| f_n(\boldsymbol{x}) - g(\boldsymbol{x}) | < \varepsilon$ for all $\boldsymbol{x} \in \mathcal{X}$
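Result 3 can be visualized numerically (our sketch; the $\theta$ values and targets below are assumed examples satisfying the $\mathbb{Q}$-linear-independence condition): scanning the repetition count $n$, the vector $(\psi_n(\boldsymbol{x}_1), \psi_n(\boldsymbol{x}_2))$ comes arbitrarily close to any target pattern on the finite input set.

```python
# Sketch of Result 3 on M = 2 inputs (illustrative values): with
# 1, theta(x_1), theta(x_2) linearly independent over Q, the sequence
# psi_n(x_m) = cos(2*pi*n*theta(x_m)) gets arbitrarily close to any target.
import numpy as np

theta = np.array([np.sqrt(2) / 2, np.sqrt(3) / 2])  # theta(x_1), theta(x_2)
target = np.array([0.3, -0.8])                      # desired g(x_1), g(x_2)

best_n, best_err = 0, np.inf
for n in range(1, 100_000):                         # circuit repetitions V^n
    err = np.max(np.abs(np.cos(2 * np.pi * n * theta) - target))
    if err < best_err:
        best_n, best_err = n, err
print(best_n, best_err)                             # err -> 0 as the scan deepens
```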
UAP – Approximation Rate
13/20
How do we describe relatively good and bad UAP?
Approximation rate = decay rate of the approximation error vs. the number of parameters
[Plot: error $\varepsilon$ vs. number of parameters $K$ for two rates $O(K^{-\alpha_1})$ and $O(K^{-\alpha_2})$ with $\alpha_1 < \alpha_2$; the larger exponent is the better rate]
Classical UAP of DNNs:
$\inf_{\boldsymbol{w}} \| f_{\boldsymbol{w}} - g \| = O(K^{-\beta/d})$
This is the optimal rate [3]. The statement [1,2] holds for continuous activation functions (with $L = 2$) and ReLU (with $L = O(\beta \log \tfrac{K}{d})$), where
➢ $d$: input dimension
➢ $K$: number of network parameters
➢ $\beta$: derivative order of $g$
➢ $L$: number of layers
[1] H. N. Mhaskar, Neural networks for optimal approximation of smooth and analytic functions, Neural Computation 8(1):164-177, 1996
[2] D. Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Networks 94:103-114, 2017
[3] R. A. DeVore, Optimal nonlinear approximation, Manuscripta Mathematica 63(4):469-478, 1989
UAP – Approximation Rate
14/20
Approximation rate = decay rate of the approximation error vs.
➢ the number $K$ of observables
➢ the input dimension $d$
➢ the number $N$ of qubits
[Plot: error $\varepsilon$ vs. $K$, comparing the rates $O(K^{-1/(3d)})$ and $O(K^{-1/d})$]
In the parallel scenario: $K = O(N^d) \;\rightarrow\; \varepsilon = O(K^{-1/d})$
Result 4. If $\mathcal{X} = [0,1]^d$ and the target function $g: \mathcal{X} \to \mathbb{R}$ is Lipschitz continuous w.r.t. the Euclidean norm, we can construct an explicit form of the approximator to $g$ in the parallel scenario with $N$ qubits and error $\varepsilon = O(d^{7/6} N^{-1/3})$. Furthermore, there exists an approximator with $\varepsilon = O(d^{3/2} N^{-1})$ (the optimal rate).
Sketch of the proof
15/20
Use multivariate Bernstein polynomials to approximate the continuous function $g$
Number of qubits: $N = nd$ ($d$: input dimension)
How do we create these terms with observables? (A sketch of the Bernstein construction follows below.)
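The multivariate Bernstein construction the proof relies on can be written down directly (our sketch; the formula below is the textbook definition, and the target $g$ is an arbitrary Lipschitz example): $B_n(g)(\boldsymbol{x}) = \sum_{\boldsymbol{p}} g(\boldsymbol{p}/n) \prod_j \binom{n}{p_j} x_j^{p_j} (1 - x_j)^{n - p_j}$, and each term is a product of the kind that suitable observables can generate.

```python
# Sketch (textbook Bernstein construction, not the paper's code): the d-variate
# degree-n Bernstein polynomial B_n(g) converges to a continuous g on [0,1]^d.
import itertools
from math import comb
import numpy as np

def bernstein(g, n, x):
    """B_n(g)(x) = sum_p g(p/n) * prod_j C(n,p_j) x_j^p_j (1-x_j)^(n-p_j)."""
    total = 0.0
    for p in itertools.product(range(n + 1), repeat=len(x)):
        weight = 1.0
        for j, pj in enumerate(p):
            weight *= comb(n, pj) * x[j] ** pj * (1.0 - x[j]) ** (n - pj)
        total += g(np.array(p) / n) * weight
    return total

g = lambda t: abs(t[0] - 0.5) + np.sin(3.0 * t[1])  # a Lipschitz target
x = np.array([0.2, 0.7])
for n in (4, 16, 64):                               # N = n*d qubits overall
    print(n, abs(bernstein(g, n, x) - g(x)))        # error decreases with n
```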
Consider the operators for each $\boldsymbol{p} = (p_1, \ldots, p_d)$
16/20
We choose the following observables
Basis functions
17/20
Approximation rate in the parallel scenario
18/20
We evaluate the approximation rate via how well multivariate Bernstein polynomials approximate a continuous function
Approximation rate in the parallel scenario
19/20
We use the Jackson theorem in higher dimensions, which gives quantitative information on the degree of polynomial approximation (D. Newman et al., 1964)
Bernstein basis polynomials form a basis of the linear space of $d$-variate polynomials of degree at most $N$ (but we do not know the explicit form)
Summary
20/20
◼ UAP via simple quantum feature maps with observables (though this requires a large number of qubits)
◼ Other questions:
➢ Evaluate how well the approximation can perform (vs. the classical scenario)
➢ UAP with other properties of the target function: non-continuous, smoothness varying from place to place
➢ Entanglement and UAP
➢ Approximation rate with the number of layers (such as in the data re-uploading scheme)
Thank you for listening!