A Guide to Tensors and Their Applications in
Machine Learning
A brief introduction
Vanessa Bridge¹, Prof. Gao²
¹Faculty of Mathematics,
York University
27 March 2023
Table of Contents
1 Introduction
2 Tensors
3 Decomposition
4 Machine Learning Applications
5 Research Applications
Introduction
In a world where dataset sizes are ever increasing, mathematicians keep
developing tools to analyze them. Tensors and their applications permit
the optimization of high-dimensional problems through a number of
techniques that will be covered in this seminar.
Complex tasks such as attribute-enhanced face recognition can be
performed using Neural Tensor Fusion Networks [1].
Feature extraction for incomplete data can likewise be handled with tensor methods [2].
Motivation
To understand the applications of tensors we will start with a
brief overview of their definitions. We will cover some of the key
operations, explain the concept of decomposition, and survey their many
uses in the field of machine learning.
Figure: Different Order Tensors
Tensors
Definition
A tensor can be thought of as a multi-way collection of numbers, which
typically come from a field such as $\mathbb{R}$. In the simplest high-dimensional
case, such a tensor is a three-dimensional array, which can be pictured as a
data cube.
Example: data in various forms, such as images, audio, video, and text, can be
represented as these multi-dimensional arrays.
Operations
Tensors are manipulated using linear algebra operations, such as addition,
multiplication, and structured products such as the Kronecker product
and Khatri-Rao product, to perform computations in neural networks. These
operations are efficient on modern hardware, such as GPUs, and can be
parallelized to accelerate training and inference.
Tensor Order
Notation
$\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times \cdots \times I_N}$ denotes an $N$th-order tensor.
$y_{i_1, i_2, \ldots, i_N}$ denotes the entries of an $N$th-order tensor $\mathcal{Y}$.
For example, a tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5 \times 6}$ is a tensor of order 4, with size 3 in
mode-1, size 4 in mode-2, size 5 in mode-3, and size 6 in mode-4.
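As a concrete companion to this notation (not from the original slides; the array and values are illustrative), a minimal NumPy sketch:

```python
import numpy as np

# A 4th-order tensor Y in R^{3x4x5x6}: the order is the number of modes.
Y = np.random.rand(3, 4, 5, 6)

print(Y.ndim)        # 4 -> the tensor order N
print(Y.shape)       # (3, 4, 5, 6) -> the mode sizes I1, ..., I4

# An entry y_{i1,i2,i3,i4} is a single scalar:
print(Y[0, 1, 2, 3])
```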
Tensor Indexing
Figure: Lateral, horizontal, and frontal slices of a mode-3 tensor
Fibers
We can create subarrays (or subfields) by fixing some of the given tensor’s
indices. Fibers are created when fixing all but one index, slices (or slabs)
are created when fixing all but two indices.
Example: For a third-order tensor the fibers are given as $x_{:jk}$
(column), $x_{i:k}$ (row), and $x_{ij:}$ (tube); the slices are given as $X_{::k} = X_k$
(frontal), $X_{:j:}$ (lateral), and $X_{i::}$ (horizontal).
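In NumPy terms, fibers and slices are just partial indexing; a small sketch (illustrative, not from the slides):

```python
import numpy as np

X = np.random.rand(3, 4, 5)   # a third-order tensor

# Fibers: fix all but one index.
col_fiber  = X[:, 1, 2]       # x_{:jk}  (column fiber)
row_fiber  = X[0, :, 2]       # x_{i:k}  (row fiber)
tube_fiber = X[0, 1, :]       # x_{ij:}  (tube fiber)

# Slices: fix all but two indices.
frontal    = X[:, :, 2]       # X_{::k}
lateral    = X[:, 1, :]       # X_{:j:}
horizontal = X[0, :, :]       # X_{i::}
```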
Operations
Tensor Addition
$\mathcal{C} = \mathcal{A} + \mathcal{B}$, where $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ are both $N$th-order
tensors, $\mathcal{C} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, and
$c_{i_1, i_2, \ldots, i_N} = a_{i_1, i_2, \ldots, i_N} + b_{i_1, i_2, \ldots, i_N}$.
Tensor Mode-n Product with a Matrix
$\mathcal{C} = \mathcal{A} \times_n^m B$, where in $\times_n^m$, $m$ means matrix and $n$ means mode-$n$;
$\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the $N$th-order tensor and $B \in \mathbb{R}^{J \times I_n}$ is the matrix.
The result $\mathcal{C} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ has entries
$c_{i_1, \ldots, i_{n-1}, j, i_{n+1}, \ldots, i_N} = \sum_{i_n=1}^{I_n} a_{i_1, \ldots, i_N}\, b_{j, i_n}$.
Tensor Mode-(a,b) Product or Tensor Contraction
$\mathcal{C} = \mathcal{A} \times_{(a,b)} \mathcal{B}$, where $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the $N$th-order tensor and
$\mathcal{B} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_M}$ is another tensor (with $I_a = J_b$).
$\mathcal{C} \in \mathbb{R}^{I_1 \times \cdots \times I_{a-1} \times I_{a+1} \times \cdots \times I_N \times J_1 \times \cdots \times J_{b-1} \times J_{b+1} \times \cdots \times J_M}$ with entries
$c_{i_1, \ldots, i_{a-1}, i_{a+1}, \ldots, i_N, j_1, \ldots, j_{b-1}, j_{b+1}, \ldots, j_M} = \sum_{i_a=1}^{I_a} a_{i_1, \ldots, i_a, \ldots, i_N}\, b_{j_1, \ldots, j_{b-1}, i_a, j_{b+1}, \ldots, j_M}$.
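Both products reduce to index contractions, which NumPy expresses directly with einsum or tensordot; a minimal sketch (shapes chosen purely for illustration):

```python
import numpy as np

A = np.random.rand(3, 4, 5)   # 3rd-order tensor
B = np.random.rand(7, 4)      # matrix B in R^{J x I2}, here J=7, I2=4

# Mode-2 product A x_2 B: sum over mode 2 of A against the rows of B.
C = np.einsum('ijk,mj->imk', A, B)                      # shape (3, 7, 5)

# The same via tensordot, moving the new axis back into position 1:
C2 = np.moveaxis(np.tensordot(A, B, axes=(1, 1)), -1, 1)
assert np.allclose(C, C2)

# Mode-(a,b) contraction of two tensors over axes of equal size (I3 = J1 = 5):
D = np.random.rand(5, 6, 2)
E = np.tensordot(A, D, axes=(2, 0))
print(E.shape)                                          # (3, 4, 6, 2)
```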
Tensor Contraction Visually Explained
Tensor Basic Product
Outer Product Of A Tensor
The outer product, denoted $\circ$, multiplies every element of one array by
every element of the other. For $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and
$\mathcal{B} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_M}$, the outer product $\mathcal{C} = \mathcal{A} \circ \mathcal{B}$
is an $(N+M)$th-order tensor with entries
$c_{i_1, \ldots, i_N, j_1, \ldots, j_M} = a_{i_1, \ldots, i_N}\, b_{j_1, \ldots, j_M}$.
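NumPy's ufunc outer computes exactly this, with the orders adding; a small sketch:

```python
import numpy as np

a = np.random.rand(3)
b = np.random.rand(4)

# Outer product of two vectors gives a rank-1 matrix.
M = np.multiply.outer(a, b)          # shape (3, 4); M[i, j] = a[i] * b[j]

# Outer product of two tensors: an (N+M)th-order result.
A = np.random.rand(2, 3)             # order 2
B = np.random.rand(4, 5, 6)          # order 3
C = np.multiply.outer(A, B)          # shape (2, 3, 4, 5, 6), order 5
assert np.isclose(C[1, 2, 0, 1, 2], A[1, 2] * B[0, 1, 2])
```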
Tensor Products
Right Kronecker Product
$\mathcal{C} = \mathcal{A} \otimes_R \mathcal{B}$, where $R$ means right; $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and
$\mathcal{B} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ yield a tensor $\mathcal{C} \in \mathbb{R}^{J_1 I_1 \times J_2 I_2 \times \cdots \times J_N I_N}$ with entries
$c_{\overline{i_1 j_1}, \ldots, \overline{i_N j_N}} = a_{i_1, \ldots, i_N}\, b_{j_1, \ldots, j_N}$,
where the combined index is $\overline{i_n j_n} = j_n + (i_n - 1) J_n$.
Figure: Example of a Right Kronecker Product
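For matrices this is the familiar np.kron; a tiny worked example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.eye(2)

# Each entry a_{ij} scales a full copy of B.
C = np.kron(A, B)                    # shape (4, 4)
print(C)
# [[1. 0. 2. 0.]
#  [0. 1. 0. 2.]
#  [3. 0. 4. 0.]
#  [0. 3. 0. 4.]]
```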
The Khatri-Rao Product
Right Khatri-Rao Product
$C = A \odot_R B = [a_1 \otimes_R b_1,\, a_2 \otimes_R b_2,\, \ldots,\, a_K \otimes_R b_K] \in \mathbb{R}^{IJ \times K}$,
where $A = [a_1, a_2, \ldots, a_K] \in \mathbb{R}^{I \times K}$ and $B = [b_1, b_2, \ldots, b_K] \in \mathbb{R}^{J \times K}$,
i.e. the column-wise Kronecker product.
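A minimal implementation of the column-wise Kronecker product (a sketch; khatri_rao is a hypothetical helper name, not a slide-provided routine):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I, K) and (J, K) -> (I*J, K)."""
    I, K = A.shape
    J, K2 = B.shape
    assert K == K2, "A and B must have the same number of columns"
    return np.einsum('ik,jk->ijk', A, B).reshape(I * J, K)

A = np.random.rand(3, 4)
B = np.random.rand(5, 4)
C = khatri_rao(A, B)                                     # shape (15, 4)
assert np.allclose(C[:, 0], np.kron(A[:, 0], B[:, 0]))   # column k = a_k kron b_k
```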
Why Use Tensors?
One of the main advantages of representing data as tensors is the
ability to apply techniques such as decomposition, which reduce
complexity and run-time in many applications. [7] Decomposition
essentially separates the data into relevant and irrelevant parts. These
techniques allow for the compression of high-dimensional data while
preserving consistency and correlation.
Tensor Decomposition
There exist many types of tensor decomposition; we will cover some in
the following section [3]:
Canonical Polyadic (CP) Decomposition
Tucker Decomposition
Eigenvalue Decomposition
Multilinear Singular Value Decomposition (SVD and HOSVD)
Hierarchical Tucker (HT)
Tensor Train (TT)
CP Decomposition
The key concept of rank decomposition is to express a tensor as the sum
of a finite number of rank-one tensors.
Rank-1 tensor
A rank-1 tensor: $\mathcal{Y} = b^{(1)} \circ b^{(2)} \circ \cdots \circ b^{(N)}$, where
$\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $b^{(n)} \in \mathbb{R}^{I_n}$, and $y_{i_1, i_2, \ldots, i_N} = b^{(1)}_{i_1} \cdots b^{(N)}_{i_N}$.
The constrained low-rank matrix factorization:
$C = A \Lambda B^T + E = \sum_{r=1}^{R} \lambda_r\, a_r b_r^T + E$
CP Decomposition
In CP decomposition, the tensor is decomposed into a linear sum of the
rank-1 terms defined above:
$\mathcal{Y} \approx \sum_{r=1}^{R} \lambda_r\, b^{(1)}_r \circ b^{(2)}_r \circ \cdots \circ b^{(N)}_r = \Lambda \times_1^m B^{(1)} \times_2^m B^{(2)} \cdots \times_N^m B^{(N)}$,
where $\lambda_r = \Lambda_{r, r, \ldots, r}$, $r \in [1, R]$, are the entries of the diagonal core tensor
$\Lambda \in \mathbb{R}^{R \times R \times \cdots \times R}$, and $B^{(n)} = [b^{(n)}_1, b^{(n)}_2, \ldots, b^{(n)}_R] \in \mathbb{R}^{I_n \times R}$ are the
factor matrices.
CP Challenges
Tensors have interference (i.e., data loss or noise).
No exact solution exists.
We must solve for one factor matrix at a time.
CP Approximation Solution
Solution
The idea is to find the factor matrices $B^{(n)}$ by minimizing an appropriate
loss function, similar to the least-squares method. [4]
We minimize the loss function in the form above using the
alternating least squares (ALS) method.
Each of the $N$ factor matrices $B^{(n)}$ is optimized separately in turn, keeping
the values of the other $N-1$ factor matrices unchanged, i.e.:
we first initialize all $N$ factor matrices, then optimize only $B^{(1)}$ by gradient
descent while keeping the initial values of $B^{(2)}$ to $B^{(N)}$ unchanged.
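A minimal NumPy sketch of CP-ALS for a third-order tensor, assuming each factor update solves its least-squares problem in closed form via the normal equations (an illustrative sketch, not the slides' exact algorithm):

```python
import numpy as np

def cp_als(X, rank, n_iter=100, seed=0):
    """Fit X ~ sum_r a_r o b_r o c_r by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Update one factor at a time; the other two stay fixed, so each
        # step is an ordinary linear least-squares problem.
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Exact rank-3 test tensor, so ALS should drive the residual near zero.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((5, 3)), rng.random((6, 3)), rng.random((7, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = cp_als(X, rank=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # small relative error
```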
CP Loss Function
Loss function: Mean Squared Error vs CP Decomposition
The general approach to solving these equations is to find
the factor matrices $B^{(n)}$ by minimizing an appropriate loss function, such as
the least-squares loss:
Sum of Squared Errors: $\sum_{ijk} (x(i,j,k) - \tilde{x}(i,j,k))^2 = \|\mathcal{X} - \tilde{\mathcal{X}}\|^2$,
where $\mathcal{X} \approx \tilde{\mathcal{X}} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r = [\![A, B, C]\!]$.
Figure: Decomposition for 3rd order tensor
CP Decomposition Algorithm
2. Tucker Decomposition
Tucker decomposition, like CP, divides the tensor into a small
core tensor and factor matrices; the key difference is that it does not
require a diagonal core, providing more flexibility.
Tucker Decomposition
$\mathcal{Y} \approx \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} a_{r_1 r_2 \ldots r_N}\, b^{(1)}_{r_1} \circ b^{(2)}_{r_2} \circ \cdots \circ b^{(N)}_{r_N} = \mathcal{A} \times_1^m B^{(1)} \times_2^m B^{(2)} \cdots \times_N^m B^{(N)}$
$Y^{v1} = (B^{(N)} \otimes_R B^{(N-1)} \cdots \otimes_R B^{(1)})\, A^{v1}$,
where $a_{r_1, r_2, \ldots, r_N}$ are the entries of the small core tensor
$\mathcal{A} \in \mathbb{R}^{R_1 \times R_2 \times \cdots \times R_N}$ and the $B^{(n)} = [b^{(n)}_1, \ldots, b^{(n)}_{R_n}]$ are the factor matrices. $Y^{v1}$
is the mode-1 vectorization of the tensor $\mathcal{Y}$.
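A sketch of reconstructing a tensor from a Tucker core and factor matrices via repeated mode-n products (mode_n_product is a hypothetical helper; shapes are illustrative):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M for a matrix M of shape (J, I_n)."""
    out = np.tensordot(T, M, axes=(n, 1))  # contracted mode moves to the end
    return np.moveaxis(out, -1, n)         # put the new mode back in place

rng = np.random.default_rng(0)
G  = rng.random((2, 3, 4))     # small core tensor, R1 x R2 x R3
B1 = rng.random((10, 2))       # factor matrices, I_n x R_n
B2 = rng.random((11, 3))
B3 = rng.random((12, 4))

# Y = G x_1 B1 x_2 B2 x_3 B3
Y = mode_n_product(mode_n_product(mode_n_product(G, B1, 0), B2, 1), B3, 2)
print(Y.shape)                 # (10, 11, 12): a large tensor from a 2x3x4 core
```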
Tucker Rank
Multilinear Rank
Unlike CP decomposition, Tucker decomposition uses what is called a
multilinear rank: $(R_1, R_2, \ldots, R_N)$.
For a tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$
we define its multilinear rank (Tucker rank) as
$r_{ml}(\mathcal{Y}) = (r(Y^{m1}), r(Y^{m2}), \ldots, r(Y^{mN}))$,
where $Y^{mn}$ is the mode-$n$ matricization of the tensor $\mathcal{Y}$ and $r(Y^{mn})$ is
the matrix rank of that matricization.
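The multilinear rank can be computed directly as the matrix ranks of the unfoldings; a sketch (helper names are illustrative):

```python
import numpy as np

def unfold(T, n):
    """Mode-n matricization: mode n becomes the rows, the rest the columns."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def multilinear_rank(T):
    return tuple(np.linalg.matrix_rank(unfold(T, n)) for n in range(T.ndim))

rng = np.random.default_rng(0)
G = rng.random((2, 3, 4))      # core of a Tucker model
Y = np.einsum('abc,ia,jb,kc->ijk', G,
              rng.random((10, 2)), rng.random((11, 3)), rng.random((12, 4)))
print(multilinear_rank(Y))     # (2, 3, 4) for generic random factors
```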
Tucker Decomposition vs CP
Both are outer product decompositions, but they have very different
structural properties. As a rule of thumb it is usually advised to use CPD
for latent parameter estimation and Tucker for subspace estimation,
compression, and dimensionality reduction.
3. Eigenvalue Decomposition
A rank-revealing decomposition associated with the outer product rank. The
symmetric eigenvalue decomposition of $\mathcal{A} \in S^3(\mathbb{R}^n)$ is
$\mathcal{A} = \sum_{i=1}^{r} \lambda_i\, v_i \otimes v_i \otimes v_i$,
where $\operatorname{rank}_{\otimes}(\mathcal{A}) = \min\{r \mid \mathcal{A} = \sum_{i=1}^{r} \lambda_i\, v_i \otimes v_i \otimes v_i\}$.
Eigenvalue decomposition can be used to obtain low-rank
approximations, which is useful when combined with other algorithms
covered later.
4. Multi-linear Singular Value SVD and HOSVD
The higher-order singular value decomposition can be thought of as another
special form of the Tucker decomposition, in which the factor matrices are
orthogonal and the core tensor is all-orthogonal. To illustrate, consider the
slices of a third-order core tensor $\mathcal{A}$:
HOSVD
$\langle \mathcal{A}(a, :, :),\, \mathcal{A}(b, :, :) \rangle = 0$ for $a \neq b$, $a, b \in [1, J]$
$\langle \mathcal{A}(:, c, :),\, \mathcal{A}(:, d, :) \rangle = 0$ for $c \neq d$, $c, d \in [1, J]$
$\langle \mathcal{A}(:, :, e),\, \mathcal{A}(:, :, f) \rangle = 0$ for $e \neq f$, $e, f \in [1, J]$
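A compact sketch of the (truncation-free) HOSVD: take the left singular vectors of each unfolding as factor matrices and project to get the core (helper names are illustrative, not from the slides):

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(T, M, axes=(n, 1)), -1, n)

def hosvd(X):
    # Factor matrix for mode n = left singular vectors of the mode-n unfolding.
    Us = [np.linalg.svd(unfold(X, n), full_matrices=False)[0]
          for n in range(X.ndim)]
    G = X
    for n, U in enumerate(Us):
        G = mode_n_product(G, U.T, n)   # project each mode onto its basis
    return G, Us

X = np.random.default_rng(0).random((4, 5, 6))
G, Us = hosvd(X)

# With no truncation the reconstruction is exact:
X_hat = G
for n, U in enumerate(Us):
    X_hat = mode_n_product(X_hat, U, n)
print(np.allclose(X, X_hat))            # True
```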
Spatial Representation
HOSVD Decomposition Algorithm
Algorithms
Tensor Regression
Tensor Variable Gaussian Process Regression
Support Tensor Machines (STM)
Tensor Regression
One of the common tensor applications in the field of machine learning
is regression models, often used for stock market forecasting,
weather forecasting, and more. [4] The traditional linear regression model is
$y = w^T x + b$,
where $x \in \mathbb{R}^N$ is a sample feature vector, $w \in \mathbb{R}^N$ is a coefficient
vector, and $b$ is a bias.
Tensor regression: $y = \mathcal{W} \bullet \mathcal{X} + b$,
where $\mathcal{W} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the coefficient tensor.
Researchers have also used tensor regression with applications in
neuroimaging data analysis.
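The only change from the vector model is that the inner product runs over all modes; a small sketch (shapes and the low-rank constraint are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.random((4, 5, 6))                   # coefficient tensor
b = 0.5
X = rng.random((100, 4, 5, 6))              # 100 tensor-valued samples

# y_i = <W, X_i> + b : the full inner product over all modes.
y = np.einsum('nijk,ijk->n', X, W) + b

# In practice W is often constrained to low rank (e.g. rank-1 CP form),
# so its parameter count grows additively, not multiplicatively, in the modes.
w1, w2, w3 = rng.random(4), rng.random(5), rng.random(6)
W_cp = np.einsum('i,j,k->ijk', w1, w2, w3)  # W = w1 o w2 o w3
y_cp = np.einsum('nijk,ijk->n', X, W_cp) + b
```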
Tensor Regression Algorithm
Tensor Variable Gaussian Process Regression
Tensor variable Gaussian process regression is similar to tensor linear
regression. It takes an input $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, an $N$th-order tensor,
and produces an output $y_i$, which is a scalar. The main difference is that the
function of the input is given a Gaussian process prior.
Tensor Variable Gaussian Regression
$y_i = f(\mathcal{X}_i) + \epsilon_i$, $i = 1, \ldots, N$,
where the $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ are $N$th-order tensors, $y_i$ is a scalar, and
$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
The non-linear function $f(\mathcal{X})$ is defined by
$f(\mathcal{X}) \sim GP(m(\mathcal{X}), k(\mathcal{X}, \tilde{\mathcal{X}}) \mid \theta)$,
where $m(\mathcal{X})$ is the mean function and $k(\mathcal{X}, \tilde{\mathcal{X}})$ is the kernel function.
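A bare-bones GP regression sketch on tensor inputs: here the kernel simply flattens each tensor (a dedicated tensor kernel would exploit the multilinear structure, but the posterior algebra is the same; all names and values are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """k(X, X') = exp(-||X - X'||^2 / (2 l^2)) on vectorized tensors."""
    a = X1.reshape(len(X1), -1)
    b = X2.reshape(len(X2), -1)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X_train = rng.random((30, 4, 5))                 # 30 tensor-valued inputs
y_train = X_train.sum(axis=(1, 2)) + 0.1 * rng.standard_normal(30)
X_test  = rng.random((5, 4, 5))

sigma2 = 0.01                                    # noise variance of eps_i
K      = rbf_kernel(X_train, X_train) + sigma2 * np.eye(30)
k_star = rbf_kernel(X_test, X_train)

# GP posterior mean with zero prior mean: k_* (K + sigma^2 I)^{-1} y
y_pred = k_star @ np.linalg.solve(K, y_train)
print(y_pred)
```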
SVM And STM
SVM and STM
The Support Vector Machine solves the minimization problem
$\min\; \frac{1}{2}\|w\|^2 + \frac{\lambda}{2}\, \xi^T \xi$
s.t. $y_j (w^T x_j + b) \geq 1 - \xi_j$, $\xi_j \geq 0$,
where $\xi = [\xi_1, \xi_2, \ldots, \xi_M] \in \mathbb{R}^M$.
Extending this to tensors gives the Support Tensor Machine:
$\min\; \frac{1}{2}\|\mathcal{W}\|^2 + C \sum_j \xi_j$
s.t. $y_j (\mathcal{W} \bullet \mathcal{X}_j + b) \geq 1 - \xi_j$, $j = 1, 2, \ldots, M$.
Usually we choose to decompose the tensor $\mathcal{W}$ as a rank-one vector
outer product, i.e. $\mathcal{W} = w^{(1)} \circ w^{(2)} \circ \cdots \circ w^{(N)}$, and the solution
of STM is found similarly to the solution of CP decomposition.
Applications
Tensor Algorithms In Machine Learning And Its Applications
Feature Tensor For Classification
The use of tensors comes with many advantages over other types of data
structures. It allows users to identify key features and reduce
dimensionality by applying the techniques seen before.
One example is feature tensor generation (tensorization),
which is often used in image processing. By finding feature tensors, that
is, extracting valid data, image classification can be improved. Usually
we need some means to convert 2D images into 3D feature tensors to
extract information.
Feature tensor generation transforms the original image X into a
3rd-order high-dimensional representation Y, which maintains the spatial
relationships within the image. The size of each transformed representation Y is
much smaller than the original image X, and the original image X can be
accurately recovered from the transformed 3D representation Y.
Algorithm For Tensor Based Feature For Image
Classification
Supervised Classification With Tucker Decomposition
Researchers have shown that Tucker decomposition can be used to fuse two
features, face recognition features (FRF) and facial attribute features (FAF),
to enhance face recognition performance in various challenging scenarios.
Tensors can also combine various high-dimensional features to improve
classification accuracy. The researchers found that combining these two
with Tucker decomposition enhances face recognition performance, as the
features provide complementary information. [1]
Tucker Decomposition For Feature Fusion
We can recover the original image by reversing the above steps. The
feature tensor is also highly compatible with the deep learning method
most commonly used on images, the convolutional neural network (CNN).
Tensor Based Feature For Face Recognition
“Overall, face recognition features (FRF) are very discriminative but less
robust; while facial attribute features (FAF) are robust but less
discriminative. Thus these two features are potentially complementary, if a
suitable fusion method can be devised. To the best of our knowledge, we
are the first to systematically explore the fusion of FAF and FRF in various
face recognition scenarios. We empirically show that this fusion can
greatly enhance face recognition performance”[1]
Data Pre-Processing Techniques
In data processing a common problem is incomplete values or data
corruption. These can appear as missing data in images due to low-quality
hardware or issues in storage and memory. There
exist many ways of completing the missing values, and techniques such as
tensor estimation and tensor completion can be applied. To solve for the
missing data, a minimization problem is established that
minimizes the mean squared error between the estimated value and the
original value.
Tensor Based Feature For Face Recognition With
Incomplete Data
One key example is the paper [2] in which researchers collected
multidimensional data with missing entries to test the ability to recreate
images at different levels of missing data. They use Tucker and CP
decomposition to propose two methods of low-rank tensor decomposition
with feature variance maximization. The results show that with these
methods, their algorithm outperforms existing techniques.
Tensor Based Feature For Face Recognition With
Incomplete Data
Tensors and Deep learning
The ability of tensors to capture large-scale data and allow for
compression in deep learning applications is being explored as well.
Tensors provide the ability to factorize large data into networks of smaller
elements, which can be used as input for deep learning, e.g. image
classification algorithms.
As noted earlier, the feature tensor is highly compatible with the deep
learning method most commonly used on images, the CNN. So general
image processing can proceed by first finding the feature tensor of the
image and then classifying with a CNN.
Tensors For Deep learning
Research Limitations
We must keep in mind that these algorithms, although effective, share a
common problem: initialization. In deep learning
and machine learning, inappropriate weight initialization
causes long convergence times or even non-convergence.
In the case of images, the challenges come from the quality of the target
image; there is a need for algorithms designed to capture
geometric local structure with tensors in mind, especially for dynamic
video.
In the context of deep learning, a major concern is finding saddle
points and local minima, a problem which grows with increasing
dimensionality.
Reference List
1 G. Hu, Y. Hua, Y. Yuan, Z. Zhang, Z. Lu, S. S. Mukherjee, T. M.
Hospedales, N. M. Robertson, and Y. Yang, “Attribute-enhanced face
recognition with neural tensor fusion networks,” in Proc. IEEE Int.
Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 3764–3773.
2 Q. Shi, Y. Cheung, Q. Zhao, and H. Lu, “Feature extraction for
incomplete data via low-rank tensor decomposition with feature
regularization,” IEEE Trans. Neural Netw. Learn. Syst.
3 S. Rabanser, O. Shchur, and S. Günnemann, “Introduction to tensor
decompositions and their applications in machine learning,” arXiv.org,
29-Nov-2017. [Online]. Available: https://arxiv.org/abs/1711.10781.
4 M. Hou, “Tensor-based regression models and applications,” Ph.D.
dissertation, Laval Univ., Quebec City, QC, Canada, 2017.
5 H. Chen, Q. Ren, and Y. Zhang, “A hierarchical support tensor
machine structure for target detection on high-resolution remote
Reference List Continued
6 H. Yang, J. Su, Y. Zou, B. Yu, and E. F. Y. Young, “Layout hotspot
detection with feature tensor generation and deep biased learning,” in
Proc. 54th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Austin,
TX, USA, 2017, pp. 1–6.
7 A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P.
Mandic, “Tensor networks for dimensionality reduction and large-
scale optimization: Part 1 low-rank tensor decompositions,” Found.
Trends Mach. Learn., vol. 9, nos. 4–5, pp. 249–429, 2016.