A Guide to Tensors and Their Applications in
Machine Learning
A brief introduction
Vanessa Bridge¹, Prof. Gao²
¹Faculty of Mathematics,
York University
27 March 2023
Table of Contents
1 Introduction
2 Tensors
3 Decomposition
4 Machine Learning Applications
5 Research Applications
Introduction
In a world where dataset sizes are ever increasing, mathematicians keep
developing tools to analyze them. Tensors and their applications permit
the optimization of high-dimensional problems through a number of
techniques that will be covered in this seminar.
Complex tasks such as attribute-enhanced face recognition can be
performed using Neural Tensor Fusion Networks [1].
Feature extraction for incomplete data can likewise be handled with tensor methods [2].
Motivation
To understand the applications of tensors we will start with a
brief overview of their definitions. We will cover some of the key
operations, explain the concept of decomposition, and survey their many
uses in the field of machine learning.
Figure: Different Order Tensors
Tensors
Definition
A tensor can be thought of as a multi-way collection of numbers, which
typically come from a field such as $\mathbb{R}$. In the simplest high-dimensional
case, such a tensor is a three-dimensional array, which can be pictured as a
data cube.
Example: data in various forms, such as images, audio, video, and text, can be
represented as these multi-dimensional arrays.
Operations
Tensors are manipulated using linear algebra operations, such as addition,
multiplication, and structured products such as the Kronecker product
and Khatri-Rao product, to perform computations in neural networks. These
operations are efficient on modern hardware, such as GPUs, and can be
parallelized to accelerate training and inference.
Tensor Order
Notation
$\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times \cdots \times I_N}$ denotes an $N$th-order tensor.
$y_{i_1, i_2, \ldots, i_N}$ denotes the entries of an $N$th-order tensor $\mathcal{Y}$.
For example, a tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5 \times 6}$ is a tensor of order 4, with size 3 in
mode-1, size 4 in mode-2, size 5 in mode-3, and size 6 in mode-4.
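As a concrete companion to this notation (not from the original slides; the array and values are illustrative), a minimal NumPy sketch:

```python
import numpy as np

# A 4th-order tensor Y in R^{3x4x5x6}: the order is the number of modes.
Y = np.random.rand(3, 4, 5, 6)

print(Y.ndim)        # 4 -> the tensor order N
print(Y.shape)       # (3, 4, 5, 6) -> the mode sizes I1, ..., I4

# An entry y_{i1,i2,i3,i4} is a single scalar:
print(Y[0, 1, 2, 3])
```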
Tensor Indexing
Figure: Lateral, horizontal, and frontal slices of a mode-3 tensor
Fibers
We can create subarrays (or subfields) by fixing some of the given tensor’s
indices. Fibers are created when fixing all but one index, slices (or slabs)
are created when fixing all but two indices.
Example: For a third-order tensor the fibers are given as $x_{:jk}$
(column), $x_{i:k}$ (row), and $x_{ij:}$ (tube); the slices are given as $X_{::k} = X_k$
(frontal), $X_{:j:}$ (lateral), and $X_{i::}$ (horizontal).
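In NumPy terms, fibers and slices are just partial indexing; a small sketch (illustrative, not from the slides):

```python
import numpy as np

X = np.random.rand(3, 4, 5)   # a third-order tensor

# Fibers: fix all but one index.
col_fiber  = X[:, 1, 2]       # x_{:jk}  (column fiber)
row_fiber  = X[0, :, 2]       # x_{i:k}  (row fiber)
tube_fiber = X[0, 1, :]       # x_{ij:}  (tube fiber)

# Slices: fix all but two indices.
frontal    = X[:, :, 2]       # X_{::k}
lateral    = X[:, 1, :]       # X_{:j:}
horizontal = X[0, :, :]       # X_{i::}
```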
Operations
Tensor Addition
$\mathcal{C} = \mathcal{A} + \mathcal{B}$, where $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ are both $N$th-order
tensors, $\mathcal{C} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, and
$c_{i_1, i_2, \ldots, i_N} = a_{i_1, i_2, \ldots, i_N} + b_{i_1, i_2, \ldots, i_N}$.
Tensor Mode-n Product with a Matrix
$\mathcal{C} = \mathcal{A} \times_n^m B$, where in $\times_n^m$, $m$ means matrix and $n$ means mode-$n$;
$\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the $N$th-order tensor and $B \in \mathbb{R}^{J \times I_n}$ is the matrix.
The result $\mathcal{C} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ has entries
$c_{i_1, \ldots, i_{n-1}, j, i_{n+1}, \ldots, i_N} = \sum_{i_n=1}^{I_n} a_{i_1, \ldots, i_N}\, b_{j, i_n}$.
Tensor Mode-(a,b) Product or Tensor Contraction
$\mathcal{C} = \mathcal{A} \times_{(a,b)} \mathcal{B}$, where $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the $N$th-order tensor and
$\mathcal{B} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_M}$ is another tensor (with $I_a = J_b$).
$\mathcal{C} \in \mathbb{R}^{I_1 \times \cdots \times I_{a-1} \times I_{a+1} \times \cdots \times I_N \times J_1 \times \cdots \times J_{b-1} \times J_{b+1} \times \cdots \times J_M}$ with entries
$c_{i_1, \ldots, i_{a-1}, i_{a+1}, \ldots, i_N, j_1, \ldots, j_{b-1}, j_{b+1}, \ldots, j_M} = \sum_{i_a=1}^{I_a} a_{i_1, \ldots, i_a, \ldots, i_N}\, b_{j_1, \ldots, j_{b-1}, i_a, j_{b+1}, \ldots, j_M}$.
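Both products reduce to index contractions, which NumPy expresses directly with einsum or tensordot; a minimal sketch (shapes chosen purely for illustration):

```python
import numpy as np

A = np.random.rand(3, 4, 5)   # 3rd-order tensor
B = np.random.rand(7, 4)      # matrix B in R^{J x I2}, here J=7, I2=4

# Mode-2 product A x_2 B: sum over mode 2 of A against the rows of B.
C = np.einsum('ijk,mj->imk', A, B)                      # shape (3, 7, 5)

# The same via tensordot, moving the new axis back into position 1:
C2 = np.moveaxis(np.tensordot(A, B, axes=(1, 1)), -1, 1)
assert np.allclose(C, C2)

# Mode-(a,b) contraction of two tensors over axes of equal size (I3 = J1 = 5):
D = np.random.rand(5, 6, 2)
E = np.tensordot(A, D, axes=(2, 0))
print(E.shape)                                          # (3, 4, 6, 2)
```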
Tensor Contraction Visually Explained
Tensor Basic Product
Outer Product Of A Tensor
The outer product, denoted $\circ$, multiplies every element of one array by
every element of the other. For $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and
$\mathcal{B} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_M}$, the outer product $\mathcal{C} = \mathcal{A} \circ \mathcal{B}$
is an $(N+M)$th-order tensor with entries
$c_{i_1, \ldots, i_N, j_1, \ldots, j_M} = a_{i_1, \ldots, i_N}\, b_{j_1, \ldots, j_M}$.
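NumPy's ufunc outer computes exactly this, with the orders adding; a small sketch:

```python
import numpy as np

a = np.random.rand(3)
b = np.random.rand(4)

# Outer product of two vectors gives a rank-1 matrix.
M = np.multiply.outer(a, b)          # shape (3, 4); M[i, j] = a[i] * b[j]

# Outer product of two tensors: an (N+M)th-order result.
A = np.random.rand(2, 3)             # order 2
B = np.random.rand(4, 5, 6)          # order 3
C = np.multiply.outer(A, B)          # shape (2, 3, 4, 5, 6), order 5
assert np.isclose(C[1, 2, 0, 1, 2], A[1, 2] * B[0, 1, 2])
```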
Tensor Products
Right Kronecker Product
$\mathcal{C} = \mathcal{A} \otimes_R \mathcal{B}$, where $R$ means right; $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and
$\mathcal{B} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ yield a tensor $\mathcal{C} \in \mathbb{R}^{J_1 I_1 \times J_2 I_2 \times \cdots \times J_N I_N}$ with entries
$c_{\overline{i_1 j_1}, \ldots, \overline{i_N j_N}} = a_{i_1, \ldots, i_N}\, b_{j_1, \ldots, j_N}$,
where the combined index is $\overline{i_n j_n} = j_n + (i_n - 1) J_n$.
Figure: Example of a Right Kronecker Product
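For matrices this is the familiar np.kron; a tiny worked example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.eye(2)

# Each entry a_{ij} scales a full copy of B.
C = np.kron(A, B)                    # shape (4, 4)
print(C)
# [[1. 0. 2. 0.]
#  [0. 1. 0. 2.]
#  [3. 0. 4. 0.]
#  [0. 3. 0. 4.]]
```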
The Khatri-Rao Product
Right Khatri-Rao Product
$C = A \odot_R B = [a_1 \otimes_R b_1,\, a_2 \otimes_R b_2,\, \ldots,\, a_K \otimes_R b_K] \in \mathbb{R}^{IJ \times K}$,
where $A = [a_1, a_2, \ldots, a_K] \in \mathbb{R}^{I \times K}$ and $B = [b_1, b_2, \ldots, b_K] \in \mathbb{R}^{J \times K}$,
i.e. the column-wise Kronecker product.
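A minimal implementation of the column-wise Kronecker product (a sketch; khatri_rao is a hypothetical helper name, not a slide-provided routine):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I, K) and (J, K) -> (I*J, K)."""
    I, K = A.shape
    J, K2 = B.shape
    assert K == K2, "A and B must have the same number of columns"
    return np.einsum('ik,jk->ijk', A, B).reshape(I * J, K)

A = np.random.rand(3, 4)
B = np.random.rand(5, 4)
C = khatri_rao(A, B)                                     # shape (15, 4)
assert np.allclose(C[:, 0], np.kron(A[:, 0], B[:, 0]))   # column k = a_k kron b_k
```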
Why Use Tensors?
One of the main advantages of representing data as tensors is the
ability to apply techniques such as decomposition, which reduce
complexity and run-time in many applications. [7] Decomposition
essentially separates the data into relevant and irrelevant parts. These
techniques allow for the compression of high-dimensional data while
preserving consistency and correlation.
Tensor Decomposition
There exist many types of tensor decomposition; we will cover some in
the following section [3]:
Canonical Polyadic (CP) Decomposition
Tucker Decomposition
Eigenvalue Decomposition
Multilinear Singular Value Decomposition (SVD and HOSVD)
Hierarchical Tucker (HT)
Tensor Train (TT)
CP Decomposition
The key concept of rank decomposition is to express a tensor as the sum
of a finite number of rank-one tensors.
Rank-1 tensor
A rank-1 tensor: $\mathcal{Y} = b^{(1)} \circ b^{(2)} \circ \cdots \circ b^{(N)}$, where
$\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $b^{(n)} \in \mathbb{R}^{I_n}$, and $y_{i_1, i_2, \ldots, i_N} = b^{(1)}_{i_1} \cdots b^{(N)}_{i_N}$.
The constrained low-rank matrix factorization:
$C = A \Lambda B^T + E = \sum_{r=1}^{R} \lambda_r\, a_r b_r^T + E$
CP Decomposition
In CP decomposition, the tensor is decomposed into a linear sum of the
rank-1 terms defined above:
$\mathcal{Y} \approx \sum_{r=1}^{R} \lambda_r\, b^{(1)}_r \circ b^{(2)}_r \circ \cdots \circ b^{(N)}_r = \Lambda \times_1^m B^{(1)} \times_2^m B^{(2)} \cdots \times_N^m B^{(N)}$,
where $\lambda_r = \Lambda_{r, r, \ldots, r}$, $r \in [1, R]$, are the entries of the diagonal core tensor
$\Lambda \in \mathbb{R}^{R \times R \times \cdots \times R}$, and $B^{(n)} = [b^{(n)}_1, b^{(n)}_2, \ldots, b^{(n)}_R] \in \mathbb{R}^{I_n \times R}$ are the
factor matrices.
CP Challenges
Tensors have interference (i.e., data loss or noise).
No exact solution exists.
We must solve for one factor matrix at a time.
CP Approximation Solution
Solution
The idea is to find the factor matrices $B^{(n)}$ by minimizing an appropriate
loss function, similar to the least-squares method. [4]
We minimize the loss function in the form above using the
alternating least squares (ALS) method.
Each of the $N$ factor matrices $B^{(n)}$ is optimized separately in turn, keeping
the values of the other $N-1$ factor matrices unchanged, i.e.:
we first initialize all $N$ factor matrices, then optimize only $B^{(1)}$ by gradient
descent while keeping the initial values of $B^{(2)}$ to $B^{(N)}$ unchanged.
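A minimal NumPy sketch of CP-ALS for a third-order tensor, assuming each factor update solves its least-squares problem in closed form via the normal equations (an illustrative sketch, not the slides' exact algorithm):

```python
import numpy as np

def cp_als(X, rank, n_iter=100, seed=0):
    """Fit X ~ sum_r a_r o b_r o c_r by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Update one factor at a time; the other two stay fixed, so each
        # step is an ordinary linear least-squares problem.
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Exact rank-3 test tensor, so ALS should drive the residual near zero.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((5, 3)), rng.random((6, 3)), rng.random((7, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = cp_als(X, rank=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # small relative error
```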
CP Loss Function
Loss function: Mean Squared Error vs CP Decomposition
The general approach to solving these equations is to find
the factor matrices $B^{(n)}$ by minimizing an appropriate loss function, such as
the least-squares loss:
Sum of Squared Errors: $\sum_{ijk} (x(i,j,k) - \tilde{x}(i,j,k))^2 = \|\mathcal{X} - \tilde{\mathcal{X}}\|^2$,
where $\mathcal{X} \approx \tilde{\mathcal{X}} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r = [\![A, B, C]\!]$.
Figure: Decomposition for 3rd order tensor
CP Decomposition Algorithm
2. Tucker Decomposition
Tucker decomposition, like CP, divides the tensor into a small
core tensor and factor matrices; the key difference is that it does not
require a diagonal core, providing more flexibility.
Tucker Decomposition
$\mathcal{Y} \approx \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} a_{r_1 r_2 \ldots r_N}\, b^{(1)}_{r_1} \circ b^{(2)}_{r_2} \circ \cdots \circ b^{(N)}_{r_N} = \mathcal{A} \times_1^m B^{(1)} \times_2^m B^{(2)} \cdots \times_N^m B^{(N)}$
$Y^{v1} = (B^{(N)} \otimes_R B^{(N-1)} \cdots \otimes_R B^{(1)})\, A^{v1}$,
where $a_{r_1, r_2, \ldots, r_N}$ are the entries of the small core tensor
$\mathcal{A} \in \mathbb{R}^{R_1 \times R_2 \times \cdots \times R_N}$ and the $B^{(n)} = [b^{(n)}_1, \ldots, b^{(n)}_{R_n}]$ are the factor matrices. $Y^{v1}$
is the mode-1 vectorization of the tensor $\mathcal{Y}$.
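A sketch of reconstructing a tensor from a Tucker core and factor matrices via repeated mode-n products (mode_n_product is a hypothetical helper; shapes are illustrative):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M for a matrix M of shape (J, I_n)."""
    out = np.tensordot(T, M, axes=(n, 1))  # contracted mode moves to the end
    return np.moveaxis(out, -1, n)         # put the new mode back in place

rng = np.random.default_rng(0)
G  = rng.random((2, 3, 4))     # small core tensor, R1 x R2 x R3
B1 = rng.random((10, 2))       # factor matrices, I_n x R_n
B2 = rng.random((11, 3))
B3 = rng.random((12, 4))

# Y = G x_1 B1 x_2 B2 x_3 B3
Y = mode_n_product(mode_n_product(mode_n_product(G, B1, 0), B2, 1), B3, 2)
print(Y.shape)                 # (10, 11, 12): a large tensor from a 2x3x4 core
```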
Tucker Rank
Multilinear Rank
Unlike CP decomposition, Tucker decomposition uses what is called a
multilinear rank: $(R_1, R_2, \ldots, R_N)$.
For a tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$
we define its multilinear rank (Tucker rank) as
$r_{ml}(\mathcal{Y}) = (r(Y^{m1}), r(Y^{m2}), \ldots, r(Y^{mN}))$,
where $Y^{mn}$ is the mode-$n$ matricization of the tensor $\mathcal{Y}$ and $r(Y^{mn})$ is
the matrix rank of that matricization.
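The multilinear rank can be computed directly as the matrix ranks of the unfoldings; a sketch (helper names are illustrative):

```python
import numpy as np

def unfold(T, n):
    """Mode-n matricization: mode n becomes the rows, the rest the columns."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def multilinear_rank(T):
    return tuple(np.linalg.matrix_rank(unfold(T, n)) for n in range(T.ndim))

rng = np.random.default_rng(0)
G = rng.random((2, 3, 4))      # core of a Tucker model
Y = np.einsum('abc,ia,jb,kc->ijk', G,
              rng.random((10, 2)), rng.random((11, 3)), rng.random((12, 4)))
print(multilinear_rank(Y))     # (2, 3, 4) for generic random factors
```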
Tucker Decomposition vs CP
Both are outer product decompositions, but they have very different
structural properties. As a rule of thumb it is usually advised to use CPD
for latent parameter estimation and Tucker for subspace estimation,
compression, and dimensionality reduction.
3. Eigenvalue Decomposition
A rank-revealing decomposition associated with the outer product rank. The
symmetric eigenvalue decomposition of $\mathcal{A} \in S^3(\mathbb{R}^n)$ is
$\mathcal{A} = \sum_{i=1}^{r} \lambda_i\, v_i \otimes v_i \otimes v_i$,
where $\operatorname{rank}_{\otimes}(\mathcal{A}) = \min\{r \mid \mathcal{A} = \sum_{i=1}^{r} \lambda_i\, v_i \otimes v_i \otimes v_i\}$.
Eigenvalue decomposition can be used to obtain low-rank
approximations, which is useful when combined with other algorithms
covered later.
4. Multi-linear Singular Value SVD and HOSVD
The higher-order singular value decomposition can be thought of as another
special form of the Tucker decomposition, in which the factor matrices are
orthogonal and the core tensor is all-orthogonal. To illustrate, consider the
slices of a third-order core tensor $\mathcal{A}$:
HOSVD
$\langle \mathcal{A}(a, :, :),\, \mathcal{A}(b, :, :) \rangle = 0$ for $a \neq b$, $a, b \in [1, J]$
$\langle \mathcal{A}(:, c, :),\, \mathcal{A}(:, d, :) \rangle = 0$ for $c \neq d$, $c, d \in [1, J]$
$\langle \mathcal{A}(:, :, e),\, \mathcal{A}(:, :, f) \rangle = 0$ for $e \neq f$, $e, f \in [1, J]$
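A compact sketch of the (truncation-free) HOSVD: take the left singular vectors of each unfolding as factor matrices and project to get the core (helper names are illustrative, not from the slides):

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(T, M, axes=(n, 1)), -1, n)

def hosvd(X):
    # Factor matrix for mode n = left singular vectors of the mode-n unfolding.
    Us = [np.linalg.svd(unfold(X, n), full_matrices=False)[0]
          for n in range(X.ndim)]
    G = X
    for n, U in enumerate(Us):
        G = mode_n_product(G, U.T, n)   # project each mode onto its basis
    return G, Us

X = np.random.default_rng(0).random((4, 5, 6))
G, Us = hosvd(X)

# With no truncation the reconstruction is exact:
X_hat = G
for n, U in enumerate(Us):
    X_hat = mode_n_product(X_hat, U, n)
print(np.allclose(X, X_hat))            # True
```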
Spatial Representation
HOSVD Decomposition Algorithm
Algorithms
Tensor Regression
Tensor Variable Gaussian Process Regression
Support Tensor Machines (STM)
Tensor Regression
One of the common tensor applications in the field of machine learning
is regression models, often used for stock market forecasting,
weather forecasting, and more. [4] The traditional linear regression model is
$y = w^T x + b$,
where $x \in \mathbb{R}^N$ is a sample feature vector, $w \in \mathbb{R}^N$ is a coefficient
vector, and $b$ is a bias.
Tensor regression: $y = \mathcal{W} \bullet \mathcal{X} + b$,
where $\mathcal{W} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the coefficient tensor.
Researchers have also used tensor regression with applications in
neuroimaging data analysis.
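The only change from the vector model is that the inner product runs over all modes; a small sketch (shapes and the low-rank constraint are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.random((4, 5, 6))                   # coefficient tensor
b = 0.5
X = rng.random((100, 4, 5, 6))              # 100 tensor-valued samples

# y_i = <W, X_i> + b : the full inner product over all modes.
y = np.einsum('nijk,ijk->n', X, W) + b

# In practice W is often constrained to low rank (e.g. rank-1 CP form),
# so its parameter count grows additively, not multiplicatively, in the modes.
w1, w2, w3 = rng.random(4), rng.random(5), rng.random(6)
W_cp = np.einsum('i,j,k->ijk', w1, w2, w3)  # W = w1 o w2 o w3
y_cp = np.einsum('nijk,ijk->n', X, W_cp) + b
```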
Tensor Regression Algorithm
Tensor Variable Gaussian Process Regression
Tensor variable Gaussian process regression is similar to tensor linear
regression. It takes an input $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, an $N$th-order tensor,
and produces an output $y_i$, which is a scalar. The main difference is that the
function of the input is given a Gaussian process prior.
Tensor Variable Gaussian Regression
$y_i = f(\mathcal{X}_i) + \epsilon_i$, $i = 1, \ldots, N$,
where the $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ are $N$th-order tensors, $y_i$ is a scalar, and
$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
The non-linear function $f(\mathcal{X})$ is defined by
$f(\mathcal{X}) \sim GP(m(\mathcal{X}), k(\mathcal{X}, \tilde{\mathcal{X}}) \mid \theta)$,
where $m(\mathcal{X})$ is the mean function and $k(\mathcal{X}, \tilde{\mathcal{X}})$ is the kernel function.
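A bare-bones GP regression sketch on tensor inputs: here the kernel simply flattens each tensor (a dedicated tensor kernel would exploit the multilinear structure, but the posterior algebra is the same; all names and values are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """k(X, X') = exp(-||X - X'||^2 / (2 l^2)) on vectorized tensors."""
    a = X1.reshape(len(X1), -1)
    b = X2.reshape(len(X2), -1)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X_train = rng.random((30, 4, 5))                 # 30 tensor-valued inputs
y_train = X_train.sum(axis=(1, 2)) + 0.1 * rng.standard_normal(30)
X_test  = rng.random((5, 4, 5))

sigma2 = 0.01                                    # noise variance of eps_i
K      = rbf_kernel(X_train, X_train) + sigma2 * np.eye(30)
k_star = rbf_kernel(X_test, X_train)

# GP posterior mean with zero prior mean: k_* (K + sigma^2 I)^{-1} y
y_pred = k_star @ np.linalg.solve(K, y_train)
print(y_pred)
```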
SVM And STM
SVM and STM
The Support Vector Machine solves the minimization problem
$\min\; \frac{1}{2}\|w\|^2 + \frac{\lambda}{2}\, \xi^T \xi$
s.t. $y_j (w^T x_j + b) \geq 1 - \xi_j$, $\xi_j \geq 0$,
where $\xi = [\xi_1, \xi_2, \ldots, \xi_M] \in \mathbb{R}^M$.
Extending this to tensors gives the Support Tensor Machine:
$\min\; \frac{1}{2}\|\mathcal{W}\|^2 + C \sum_j \xi_j$
s.t. $y_j (\mathcal{W} \bullet \mathcal{X}_j + b) \geq 1 - \xi_j$, $j = 1, 2, \ldots, M$.
Usually we choose to decompose the tensor $\mathcal{W}$ as a rank-one vector
outer product, i.e. $\mathcal{W} = w^{(1)} \circ w^{(2)} \circ \cdots \circ w^{(N)}$, and the solution
of STM is found similarly to the solution of CP decomposition.
Applications
Tensor Algorithms In Machine Learning And Its Applications
Feature Tensor For Classification
The use of tensors comes with many advantages over other types of data
structures. It allows users to identify key features and reduce
dimensionality by applying the techniques seen before.
One example is feature tensor generation (tensorization),
which is often used in image processing. By finding feature tensors, that
is, extracting valid data, image classification can be improved. Usually
we need some means to convert 2D images into 3D feature tensors to
extract information.
Feature tensor generation transforms the original image X into a
3rd-order high-dimensional representation Y, which maintains the spatial
relationships within the image. The size of each transformed representation Y is
much smaller than the original image X, and the original image X can be
accurately recovered from the transformed 3D representation Y.
Algorithm For Tensor Based Feature For Image
Classification
Supervised Classification With Tucker Decomposition
Researchers have shown that Tucker decomposition can be used to fuse two
features, face recognition features (FRF) and facial attribute features (FAF),
to enhance face recognition performance in various challenging scenarios.
Tensors can also combine various high-dimensional features to improve
classification accuracy. The researchers found that combining these two
with Tucker decomposition enhances face recognition performance, as the
features provide complementary information. [1]
Tucker Decomposition For Feature Fusion
We can recover the original image by reversing the above steps. The
feature tensor is also highly compatible with the deep learning method
most commonly used on images, the convolutional neural network (CNN).
Tensor Based Feature For Face Recognition
“Overall, face recognition features (FRF) are very discriminative but less
robust; while facial attribute features (FAF) are robust but less
discriminative. Thus these two features are potentially complementary, if a
suitable fusion method can be devised. To the best of our knowledge, we
are the first to systematically explore the fusion of FAF and FRF in various
face recognition scenarios. We empirically show that this fusion can
greatly enhance face recognition performance”[1]
Data Pre-Processing Techniques
In data processing a common problem is incomplete values or data
corruption. These can appear as missing data in images due to low-quality
hardware or issues in storage and memory. There
exist many ways of completing the missing values, and techniques such as
tensor estimation and tensor completion can be applied. To solve for the
missing data, a minimization problem is established that
minimizes the mean squared error between the estimated value and the
original value.
Tensor Based Feature For Face Recognition With
Incomplete Data
One key example is the paper [2] in which researchers collected
multidimensional data with missing entries to test the ability to recreate
images at different levels of missing data. They use Tucker and CP
decomposition to propose two methods of low-rank tensor decomposition
with feature variance maximization. The results show that with these
methods, their algorithm outperforms existing techniques.
Tensor Based Feature For Face Recognition With
Incomplete Data
Tensors and Deep learning
The ability of tensors to capture large-scale data and allow for
compression in deep learning applications is being explored as well.
Tensors provide the ability to factorize large data into networks of smaller
elements, which can be used as input for deep learning, e.g. image
classification algorithms.
As noted earlier, the feature tensor is highly compatible with the deep
learning method most commonly used on images, the CNN. So general
image processing can proceed by first finding the feature tensor of the
image and then classifying with a CNN.
Tensors For Deep learning
Research Limitations
We must keep in mind that these algorithms, although effective, share a
common problem: initialization. In deep learning
and machine learning, inappropriate weight initialization
causes long convergence times or even non-convergence.
In the case of images, the challenges come from the quality of the target
image; there is a need for algorithms designed to capture
geometric local structure with tensors in mind, especially for dynamic
video.
In the context of deep learning, a major concern is finding saddle
points and local minima, a problem which grows with increasing
dimensionality.
Reference List
1 G. Hu, Y. Hua, Y. Yuan, Z. Zhang, Z. Lu, S. S. Mukherjee, T. M.
Hospedales, N. M. Robertson, and Y. Yang, “Attribute-enhanced face
recognition with neural tensor fusion networks,” in Proc. IEEE Int.
Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 3764–3773.
2 Q. Shi, Y. Cheung, Q. Zhao, and H. Lu, “Feature extraction for
incomplete data via low-rank tensor decomposition with feature
regularization,” IEEE Trans. Neural Netw. Learn. Syst.
3 S. Rabanser, O. Shchur, and S. Günnemann, “Introduction to tensor
decompositions and their applications in machine learning,” arXiv.org,
29-Nov-2017. [Online]. Available: https://arxiv.org/abs/1711.10781.
4 M. Hou, “Tensor-based regression models and applications,” Ph.D.
dissertation, Laval Univ., Quebec City, QC, Canada, 2017.
5 H. Chen, Q. Ren, and Y. Zhang, “A hierarchical support tensor
machine structure for target detection on high-resolution remote
Reference List Continued
6 H. Yang, J. Su, Y. Zou, B. Yu, and E. F. Y. Young, “Layout hotspot
detection with feature tensor generation and deep biased learning,” in
Proc. 54th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Austin,
TX, USA, 2017, pp. 1–6.
7 A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P.
Mandic, “Tensor networks for dimensionality reduction and large-
scale optimization: Part 1 low-rank tensor decompositions,” Found.
Trends Mach. Learn., vol. 9, nos. 4–5, pp. 249–429, 2016.