International Journal of Reconfigurable and Embedded Systems (IJRES)
Vol. 12, No. 2, July 2023, pp. 269~275
ISSN: 2089-4864, DOI: 10.11591/ijres.v12.i2.pp269-275
Journal homepage: http://ijres.iaescore.com
Video saliency-detection using custom spatiotemporal fusion
method
Vinay C. Warad, Ruksar Fatima
Department of Computer Science and Engineering, Khaja Bandanawaz College of Engineering, Kalaburagi, India
Article Info ABSTRACT
Article history:
Received Jul 20, 2022
Revised Oct 15, 2022
Accepted Dec 10, 2022
There has been considerable research in the field of image saliency, but much less in video saliency. The proposed solution aims to increase precision and accuracy during compression while reducing coding complexity, time consumption and memory allocation problems. It is a modified high-efficiency video coding (HEVC), pixel-based, consistent spatiotemporal diffusion with temporal uniformity. It involves splitting the video into groups of frames, computing colour saliency, integrating temporal fusion, conducting pixel saliency fusion, and then letting the colour information guide the diffusion process for the spatiotemporal mapping with the help of a permutation matrix. The proposed solution is tested on a publicly available extensive dataset with five global saliency evaluation metrics and is compared with several other state-of-the-art saliency detection methods. The results show an overall best performance among all the candidates.
Keywords:
Computing colour saliency
High-efficiency video coding
Image saliency
Spatiotemporal diffusion
Video saliency
This is an open access article under the CC BY-SA license.
Corresponding Author:
Vinay C. Warad
Department of Computer Science and Engineering, Khaja Bandanawaz College of Engineering
Kalaburagi, Karnataka 585104, India
Email: vinay_c111@rediffmail.com
1. INTRODUCTION
Researchers have long tried to imitate the functioning of the human eye and the brain: the brain's ability to distinguish between the important and unimportant features of the view the eyes are seeing and to take in only what is necessary. Various researchers have imitated this process, and in today's world we have it in the form of conference videos, broadcasting and streaming. There has been considerable research in the field of image saliency, but much less in video saliency. A few studies have made a significant impact in this field. Itti's model [1] is one of the most researched and most prominent models for image saliency. Fourier transformation is used with the help of the phase spectrum, and [2], [3] address image saliency using frequency tuning. They use the principles of inhibition of return and winner-take-all, inspired by the visual nervous system [4], [5].
Video saliency detection is more difficult because the images are not still, which increases memory allocation and computational complexity. One video saliency detection methodology [6] involves determining the position of an object with reference to another. Others use the computation of a space-time saliency map as well as a motion saliency map [7]-[10]. Static and dynamic saliency maps have been fused [11] to obtain a space-time saliency detection model, and a dynamic texture model has been employed [12] to obtain motion patterns for both stationary and dynamic scenes. Fusion models have also been used, but they result in low-level saliency [13]-[15]. Global temporal clues have been used to forge a robust low-level saliency map [16], [17]. The disadvantage of these methodologies is that the accumulation of error is quite high, which has led to several wrong detections.
The proposed solution is a modified spatiotemporal fusion saliency detection method. It involves a spatiotemporal background to obtain high saliency values around the foreground objects. Then, after ignoring the hollow effects, a series of adjustments are made to the general saliency strategies to increase the efficiency of both the motion and colour saliencies. The usage of cross-frame super-pixels and one-to-one spatiotemporal fusion helps in an overall increase in accuracy and precision during compression.
2. RELATED WORK
In this section, the works of some of the research papers that have helped in the completion of the proposed algorithm are mentioned. The survey in [18] discusses the various video saliency methodologies along with their advantages and disadvantages. Borji [19] follows a similar outline, but also covers the various aspects that make it difficult for algorithms to imitate human eye-brain coordination and how to overcome them.
A notable contribution to this field of research is [20]. It introduces a database named dynamic human fixation 1K (DHF1K) that helps in pointing out the fixations needed during dynamic-scene free viewing, together with the attentive convolutional neural network-long short-term memory network (ACLNet), which augments the original convolutional neural network and long short-term memory (CNN-LSTM) model to enable fast end-to-end saliency learning. In [21], [22], corrections are made to the smooth pursuit (SP) logic, involving manual annotation of the SPs with fixations along the arithmetic points and SP salient locations obtained by training slicing convolutional neural networks.
High-efficiency video coding (HEVC) has become the standard video compression algorithm used today. Modifying the HEVC algorithms with the help of a spatial saliency algorithm that uses the concept of a motion vector [23] has led to better compression and efficiency. A salient object segmentation that combines a conditional random field (CRF) and a saliency measure has been introduced, using a statistical framework with local colour contrast, motion and illumination features [24]. Fang et al. [25] also use spatiotemporal fusion, with statistical uncertainty, to measure visual saliency. A geodesic robustness methodology has been used to obtain the saliency map [26], [27]. The super-pixel usage and adaptive colour quantization of [28]-[30] have been of great help to our solution: their measurement of the difference between spatial distances and histograms helps obtain the super-pixel saliency map. The works in [31], [32] gave us an overall idea of the various evaluation metrics to be used in this paper. The first section has the introduction and section 2 succeeds it with the related work [33]. Sections 3 and 4 present the proposed algorithm, its methodologies and modifications, along with the final experimentation and comparison. Section 5 concludes the paper.
3. PROPOSED SYSTEM
3.1. Modeling based saliency adjustment
The robustness is obtained by combining long-term inter-batch information with colour contrast computation. The background and foreground appearance models are represented by $BM \in \mathbb{R}^{3 \times bn}$ and $FM \in \mathbb{R}^{3 \times fn}$, with $bn$ and $fn$ being their respective sizes. The RGB history of the $i$-th super-pixel in all regions is handled with the following two terms: $intraC_i = \exp(\lambda - |\varphi(MC_i) - \varphi(CM_i)|)$ with $\lambda = 0.5$, and

$$interC_i = \varphi\!\left(\frac{\min\|(R_i,G_i,B_i),BM\|_2 \cdot \frac{1}{bn}\sum\|(R_i,G_i,B_i),BM\|_2}{\min\|(R_i,G_i,B_i),FM\|_2 \cdot \frac{1}{fn}\sum\|(R_i,G_i,B_i),FM\|_2}\right).$$

Here, $\lambda$ is the upper-bound discrepancy degree and helps invert the penalty between the motion and colour saliencies.
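These two consistency terms can be sketched directly from their definitions. The snippet below is a minimal illustration, assuming the motion map $MC$, colour map $CM$ and the background/foreground appearance models $BM$ and $FM$ are already available as NumPy arrays; `phi()` is taken here as a simple min-max normalisation, which is an assumption rather than the paper's exact choice.

```python
import numpy as np

def phi(x):
    # Assumed normalisation into [0, 1]; the paper does not spell out phi explicitly.
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def intra_consistency(mc, cm, lam=0.5):
    # intraC_i = exp(lambda - |phi(MC_i) - phi(CM_i)|), lambda = 0.5
    return np.exp(lam - np.abs(phi(mc) - phi(cm)))

def inter_consistency(rgb, BM, FM):
    # rgb: (N, 3) mean colours of all N super-pixels
    # BM: (3, bn) background appearance model, FM: (3, fn) foreground model
    scores = np.empty(len(rgb))
    for i, c in enumerate(rgb):
        d_bg = np.linalg.norm(BM - c[:, None], axis=0)  # distances to background samples
        d_fg = np.linalg.norm(FM - c[:, None], axis=0)  # distances to foreground samples
        scores[i] = (d_bg.min() * d_bg.mean()) / (d_fg.min() * d_fg.mean() + 1e-12)
    return phi(scores)  # phi applied over the whole set of super-pixels
```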
3.2. Contrast-based saliency mapping
The video sequence is now divided into several short groups of frames $G_i = \{F_1, F_2, F_3, \ldots, F_n\}$. Each frame $F_k$ (where $k$ denotes the frame number) is pre-processed using simple linear iterative clustering with a boundary-aware smoothing method, which removes unnecessary detail. The colour and motion gradients form the pixel-based spatiotemporal gradient map $SMT = \|u_x, u_y\|_2 \odot \|\nabla(F)\|_2$, where $(u_x, u_y)$ are the horizontal and vertical gradients of the optical flow and $\nabla(F)$ is the colour gradient map. We then calculate the $i$-th super-pixel's motion contrast using (1).
$$MC_i = \sum_{a_j \in \psi_i} \frac{\|U_i, U_j\|_2}{\|a_i, a_j\|_2}, \qquad \psi_i = \{\tau + 1 \geq \|a_i, a_j\|_2 \geq \tau\} \qquad (1)$$
Here the $\ell_2$ norm has been used, and $U$ and $a_i$ denote the optical-flow gradient in two directions and the $i$-th super-pixel's position centre, respectively. $\psi_i$ denotes the computational contrast range and is calculated using the shortest Euclidean distance between the spatiotemporal map and the $i$-th super-pixel.

$$\tau = \frac{r}{\|\Lambda(SMT)\|_0} \sum_{\tau \in \|\tau,i\| \leq r} \|\Lambda(SMT_\tau)\|_0; \quad r = 0.5\min\{width, height\}, \quad \Lambda \rightarrow \text{down sampling} \qquad (2)$$
Colour saliency is computed in the same way as the optical-flow gradient, except that the red, green and blue values of the $i$-th super-pixel are used, so the equation is

$$CM_i = \sum_{a_j \in \psi_i} \frac{\|(R_i, G_i, B_i), (R_j, G_j, B_j)\|_2}{\|a_i, a_j\|_2}.$$
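A compact sketch of the motion contrast in (1) and of the analogous colour contrast is given below. It assumes each super-pixel is summarised by its centre position $a_i$, its optical-flow gradient $U_i$ and its mean RGB colour, and it reads the $\|\cdot,\cdot\|$ notation as the Euclidean distance between the two vectors; the ring $\psi_i$ is realised as the set of neighbours whose centre distance lies in $[\tau, \tau+1]$.

```python
import numpy as np

def contrast_maps(centres, flow_grad, rgb, tau):
    # centres: (N, 2) super-pixel centres a_i
    # flow_grad: (N, 2) optical-flow gradients U_i
    # rgb: (N, 3) mean colours (R_i, G_i, B_i)
    # tau: contrast range from (2)
    n = len(centres)
    MC = np.zeros(n)
    CM = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(centres - centres[i], axis=1)  # ||a_i, a_j||_2
        mask = (d >= tau) & (d <= tau + 1.0)               # neighbourhood psi_i
        mask[i] = False
        if not mask.any():
            continue
        dist = d[mask]
        MC[i] = np.sum(np.linalg.norm(flow_grad[mask] - flow_grad[i], axis=1) / dist)
        CM[i] = np.sum(np.linalg.norm(rgb[mask] - rgb[i], axis=1) / dist)
    return MC, CM
```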
The following expression smoothens both $MC$ and $CM$, as temporal and saliency-value refinement is done by spatial information integration.

$$CM_{k,i} \leftarrow \frac{\sum_{\tau=k-1}^{k+1} \sum_{a_{\tau,j} \in \mu_\phi} \exp\!\left(-\|c_{k,i}, c_{\tau,j}\|_1 / \mu\right) \cdot CM_{\tau,j}}{\sum_{\tau=k-1}^{k+1} \sum_{a_{\tau,j} \in \mu_\phi} \exp\!\left(-\|c_{k,i}, c_{\tau,j}\|_1 / \mu\right)} \qquad (3)$$
Here, $c_{k,i}$ is the average RGB colour of the $i$-th super-pixel in the $k$-th frame, while $\sigma$ controls the smoothing strength. The constraint $\|a_{k,i}, a_{\tau,j}\|_2 \leq \theta$ needs to be satisfied, and this is enforced through $\mu$.
$$\theta = \frac{1}{m \times n} \sum_{k=1}^{n} \sum_{i=1}^{m} \left\| \frac{1}{m} \sum_{i=1}^{m} F(SMT_{k,i}),\ F(SMT_{k,i}) \right\|_1; \qquad m, n = \text{frame numbers} \qquad (4)$$
$$F(SMT_i) = \begin{cases} a_i, & SMT_i \leq \epsilon \times \frac{1}{m} \sum_{i=1}^{m} SMT_i \\ 0, & \text{otherwise} \end{cases}; \qquad \epsilon = \text{filter strength control} \qquad (5)$$
At each batch-frame level, the $q$-th frame's smoothing rate is dynamically updated with $(1 - \gamma)\theta_{s-1} + \gamma\theta_s \rightarrow \theta_s$, where $\gamma = 0.2$ is the learning weight. The colour and motion saliencies are then integrated to obtain the pixel-based saliency map $LLS = CM \odot MC$. This fused saliency map increases accuracy considerably, but the processing rate decreases, which is dealt with in the next section.
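The fusion step itself is an element-wise product and the batch-level smoothing rate is an exponential moving average; the short sketch below assumes $CM$ and $MC$ have already been rendered back to per-pixel maps of the same shape.

```python
import numpy as np

def fuse_saliency(CM, MC):
    # Pixel-based low-level saliency: LLS = CM (Hadamard product) MC
    return CM * MC

def update_smoothing_rate(theta_prev, theta_new, gamma=0.2):
    # theta_s <- (1 - gamma) * theta_{s-1} + gamma * theta_s, gamma = 0.2 (learning weight)
    return (1.0 - gamma) * theta_prev + gamma * theta_new
```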
3.3. Accuracy boosting
Matrix $M$ is considered as the input. It is decomposed into a sparse component $S$ and a low-rank component $D$ by solving $\min_{D,S} \alpha\|S\|_1 + \|D\|_*$ subject to $M = S + D$, where the nuclear norm of $D$ is used. This is solved with robust principal component analysis (RPCA) [30] and is expressed through the updates $S \leftarrow \operatorname{sign}(M - D - S)[|M - D - S| - \alpha\beta]_+$ and $D \leftarrow V[\Sigma - \beta I]_+ U$, $(V, \Sigma, U) \leftarrow \operatorname{svd}(Z)$, where $\operatorname{svd}(Z)$ denotes the singular value decomposition of the Lagrange multiplier and $\alpha$ and $\beta$ represent the low-rank and sparse threshold parameters, respectively.
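The low-rank/sparse split can be sketched with the standard alternating shrinkage and singular-value-thresholding updates of RPCA [30]. This is a generic illustration rather than the paper's exact solver; `alpha`, `beta` and the iteration count are assumed values.

```python
import numpy as np

def shrink(X, t):
    # Soft-thresholding (sparse shrinkage): sign(X) * [|X| - t]_+
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca_decompose(M, alpha=0.05, beta=1.0, n_iter=50):
    # Decompose M into a low-rank part D and a sparse part S with M ~ D + S.
    D = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Sparse update: shrink the residual towards zero.
        S = shrink(M - D, alpha * beta)
        # Low-rank update via singular-value thresholding: V[Sigma - beta]_+ U.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        D = (U * np.maximum(sig - beta, 0.0)) @ Vt
    return D, S
```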
To reduce incorrect detections caused by the misplacement of the optical flow of super-pixels in the foreground region, the rough foreground of the given region is located and the feature subspace of a frame $k$ is spanned as $gI_k = \{LLS_{k,1}, LLS_{k,2}, \ldots, LLS_{k,m}\}$, so that for the entire frame group we get $gB_\tau = \{gI_1, gI_2, \ldots, gI_n\}$. The rough foreground is then calculated as

$$RF_i = \left[\sum_{k=1}^{n} LLS_{k,i} - \frac{\omega}{n \times m} \sum_{k=1}^{n} \sum_{i=1}^{m} LLS_{k,i}\right]_+.$$
Here $\omega$ is a reliability control factor. Two subspaces are obtained from $LLS$ and the RGB colours, given by $SB = \{cv_1, cv_2, \ldots, cv_n\} \in \mathbb{R}^{3v \times n}$, where $cv_i = \{vec(R_{i,1}, G_{i,1}, B_{i,1}, \ldots, R_{i,m}, G_{i,m}, B_{i,m})\}^K$, and $SF = [vec(LLS_1), \ldots, vec(LLS_n)] \in \mathbb{R}^{v \times n}$. This establishes a one-to-one correspondence, after which the pixel-based saliency mapping is fused and dissipated over the entire group of frames. Aligning $SB$ over $SF$ causes disruptive foreground salient movements, and with the help of [31]-[33] this issue is resolved with an alternative formulation.
$$\min_{M_{cx}, S_{cx}, \vartheta, A \odot \vartheta} \|M_c\|_* + \|D_x\|_* + \|A + \vartheta\|^2 + \alpha_1\|S_c\|_1 + \alpha_2\|S_x\|$$
$$\text{s.t. } M_c = D_c + S_c,\ M_x = D_x + S_x,\ M_c = SB \odot \vartheta,\ M_x = SF \odot \vartheta,\ \vartheta = \{E_1, E_2, \ldots, E_n\},\ E_i \in \{0,1\}^{m \times m},\ E_i 1^K = 1, \qquad (6)$$

where $\|\cdot\|_*$ is the nuclear norm and $A$ is the position matrix.
The variables $D_c$, $D_x$ represent the colour and saliency mappings, $\vartheta$ is the permutation matrix, while $S_x$, $S_c$ represent the colour-feature sparse component space and the saliency-feature space. This set of equations corrects the super-pixel correspondences.
3.4. Mathematical model
Equation (6) generates a distributed version of the convex problem

$$D(M_{cx}, S_{cx}, \vartheta, A \odot \vartheta) = \alpha_1\|S_c\|_1 + \alpha_2\|E_x\|_2 + \beta_1\|M_c\|_* + \beta_2\|M_x\|_* + \|A \odot \vartheta\|^2 + \operatorname{trace}\!\left(Z_1^K(M_c - D_c - S_c)\right) + \operatorname{trace}\!\left(Z_2^K(M_x - D_x - S_x)\right) + \frac{\pi}{2}\left(\|M_c - D_c - S_c\|^2 + \|M_x - D_x - S_x\|^2\right),$$

where $Z_i$ represents the Lagrangian multiplier and $\pi$ denotes the iteration step. The optimized solution using partial derivatives is

$$S_{c,x}^{k+1} = \frac{1}{2}\left\|S_{c,x}^{k} - \left(M_{c,x}^{k} - S_{c,x}^{k} + Z_{1,2}^{k}/\pi^{k}\right)\right\|_2^2 + \min_{S_{c,x}^{k}} \alpha_{1,2}\left\|S_{c,x}^{k}\right\|_1/\pi^{k}$$

and

$$D_{c,x}^{k+1} = \frac{1}{2}\left\|D_{c,x}^{k} - \left(M_{c,x}^{k} - D_{c,x}^{k} + Z_{1,2}^{k}/\pi^{k}\right)\right\|_2^2 + \min_{D_{c,x}^{k}} \beta_{1,2}\left\|D_{c,x}^{k}\right\|_*/\pi^{k}.$$

$D_i$ is updated to become $D_{c,x}^{k+1} \leftarrow U^{K}\,V\!\left[\Sigma - \frac{\beta_{1,2}}{\pi^{k}}\right]_+$, where $(V, \Sigma, U) \leftarrow \operatorname{svd}\!\left(M_{c,x}^{k} - S_{c,x}^{k} + \frac{Z_{1,2}^{k}}{\pi^{k}}\right)$. Similarly, for $S_i$, $S_{c,x}^{k+1} \leftarrow \operatorname{sign}\!\left(\frac{|J|}{\pi^{k}}\right)\left[J - \frac{\alpha_{1,2}}{\pi^{k}}\right]_+$ with $J = M_{c,x}^{k} - D_{c,x}^{k} + Z_{c,x}^{k}/\pi^{k}$.
The value of $E$ is determined and used to compute the norm cost $L \in \mathbb{R}^{m \times m}$, calculated as $l_{i,j}^{k} = \|O_{k,i} - H(V_1, j)\|_2$ with $V_1 = H(SB, k) \odot E_k$, and $l_{i,j}^{k} = \|O_{k,i} - H(V_2, j)\|_2$ with $V_2 = H(SF, k) \odot E_k$. An objective matrix $O$ is then used to calculate the $k$-th part of $RF$ through $O_{k,i} = S_{c,x}(k,i) + D_{c,x}(k,i) - Z_{1,2}(k,i)/\pi^{k}$. Since it is hard to approximate the value of $\min\|A + \vartheta\|^2$, $L_\tau = \{r_{1,1}^{\tau} + d_{1,1}^{\tau}, r_{1,2}^{\tau} + d_{1,2}^{\tau}, \ldots, r_{m,m}^{\tau} + d_{m,m}^{\tau}\} \in \mathbb{R}^{m \times m}$ for $\tau = [k-1, k+1]$ is changed to $L_k$ as shown in (7).

$$H(L_k, j) \leftarrow \sum_{\tau=k-1}^{k+1} \sum_{p_{\tau,v} \in \xi} H(L_\tau, v) \cdot \exp\!\left(-\|c_{\tau,v}, c_{k,j}\|_1 / \mu\right) \qquad (7)$$
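In practice, the binary matrix $E_k$ amounts to a one-to-one assignment between super-pixels of neighbouring frames under the cost $L$. One standard way to obtain such a permutation from a cost matrix, not necessarily the exact procedure of the paper, is the Hungarian algorithm, sketched here with SciPy as an assumed dependency.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def permutation_from_cost(L):
    # L: (m, m) norm-cost matrix l_{i,j}; returns E in {0,1}^(m x m) with one 1 per row.
    rows, cols = linear_sum_assignment(L)  # minimum-cost one-to-one matching
    E = np.zeros_like(L)
    E[rows, cols] = 1.0
    return E
```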
The global optimization is solved using the updates $SF^{k+1} \leftarrow SF^{k} \odot \vartheta$, $SB^{k+1} \leftarrow SB^{k} \odot \vartheta$ and $Z_{1,2}^{k+1} \leftarrow \pi^{k}\left(M_{c,x}^{k} - D_{c,x}^{k} - S_{c,x}^{k}\right) + Z_{1,2}^{k}$, where $\pi^{k+1} \leftarrow \pi^{k} \times 1.05$.
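Putting these updates together, one iteration of the resulting inexact augmented-Lagrangian loop can be sketched as below; `shrink` and `svt` are the shrinkage and singular-value-thresholding operators from the earlier RPCA sketch, and the 1.05 growth factor follows the update of $\pi$ given above. This is a schematic of the general iteration, not a verbatim implementation of the paper's solver.

```python
import numpy as np

def svt(X, t):
    # Singular-value thresholding: V[Sigma - t]_+ U in the paper's notation.
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(sig - t, 0.0)) @ Vt

def shrink(X, t):
    # Soft-thresholding: sign(X) * [|X| - t]_+
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def alm_step(M, D, S, Z, pi, alpha, beta):
    # One inexact ALM iteration for M ~ D + S with multiplier Z and penalty pi.
    S = shrink(M - D + Z / pi, alpha / pi)   # sparse update
    D = svt(M - S + Z / pi, beta / pi)       # low-rank update
    Z = Z + pi * (M - D - S)                 # multiplier update
    pi = pi * 1.05                           # penalty growth, as in the text
    return D, S, Z, pi
```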
The alignment of the super-pixels is now given by $gS_i = \frac{1}{n-1} \sum_{\tau=1, \tau \neq i}^{n} H(SF \odot \vartheta, \tau)$. To reduce incorrect detections and alignments, $\widetilde{SF}$ is introduced and (8)-(10) are used.
$$\widetilde{SF} \leftarrow SF \odot \vartheta \qquad (8)$$

$$SF \leftarrow \widetilde{SF} \cdot \left(1^{m \times n} - X(S_c)\right) + \rho \cdot \overline{SF} \cdot X(S_c) \qquad (9)$$

$$\rho_{i,j} = \begin{cases} 0.5, & \frac{1}{n}\sum_{j=1}^{n} \widetilde{SF}_{i,j} < \widetilde{SF}_{i,j} \\ 2, & \text{otherwise} \end{cases} \qquad (10)$$
The mapping for the $i$-th video frame is given by $gS_i = \frac{H(\rho, i) - H(\rho, i) \cdot X(S_c)}{H(\rho, i)(n-1)} \sum_{\tau=1, \tau \neq i}^{n} H(SF \odot \vartheta, \tau)$. The inner temporal batch $x_r$ of the current group's frames then needs to be diffused based on the degree of colour similarity. The final output is given by

$$gS_{i,j} = \frac{x_r \cdot y_r + \sum_{i=1}^{n} y_i \cdot gS_{i,j}}{y_r + \sum_{i=1}^{n} y_i}; \qquad y_r = \exp\!\left(-\|c_{r,j}, c_{i,j}\|_2 / \mu\right),$$

where $y_r$ denotes the colour distance-based weight.
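The temporal diffusion at the end of the pipeline is, in essence, a colour-similarity-weighted average over the frames of the group. The sketch below is a loose interpretation of that weighting, assuming each frame contributes a per-super-pixel saliency vector and a mean-colour vector; `mu` and `ref_idx` are assumed inputs.

```python
import numpy as np

def diffuse_group(gS, colours, ref_idx, mu=10.0):
    # gS: (n_frames, n_superpixels) per-frame saliency values
    # colours: (n_frames, n_superpixels, 3) mean super-pixel colours
    # ref_idx: index r of the reference frame in the group
    n_frames, n_sp = gS.shape
    out = np.zeros(n_sp)
    for j in range(n_sp):
        # y_i = exp(-||c_{r,j} - c_{i,j}||_2 / mu): colour distance-based weights
        w = np.exp(-np.linalg.norm(colours[:, j] - colours[ref_idx, j], axis=1) / mu)
        out[j] = np.sum(w * gS[:, j]) / (np.sum(w) + 1e-12)
    return out
```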
4. RESULTS, EXPERIMENTS AND DATABASE
The proposed solution has been compared with [34] as the base reference, as well as with the operational block description length (OBDL) algorithm of [35], the dynamic adaptive whitening saliency (AWS-D) algorithm of [36], the object-to-motion convolutional neural network with two-layer long short-term memory (OMCNN-2CLSTM) algorithm in [36], the attentive convolutional (ACL) algorithm [37], and the saliency-aware video compression (SAVC) algorithm from [38] and [39]. The database used is the same as the one in the base paper. It is a high-definition eye-tracking database, openly available on GitHub at https://github.com/spzhubuaa/Video-based-Eye-Tracking-Dataset [40]. Ten video sequences with three different resolutions, 1920×1080, 1280×720, and 832×480, were taken for experimentation. For evaluating the performance of all the saliency methods, we employed five global evaluation metrics, namely area under the ROC curve (AUC), similarity (SIM), correlation coefficient (CC), normalized scanpath saliency (NSS) and Kullback-Leibler divergence (KL).
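For reference, four of these five metrics reduce to short closed-form expressions over a predicted saliency map and a ground-truth fixation or density map; the sketch below is a generic implementation of CC, SIM, NSS and KL (AUC requires ranking over fixation locations and is omitted for brevity), not tied to any particular benchmark toolkit.

```python
import numpy as np

def _norm_dist(x):
    # Normalise a map so it sums to one (treated as a probability distribution).
    x = np.asarray(x, dtype=float)
    return x / (x.sum() + 1e-12)

def cc(pred, gt_density):
    # Pearson correlation coefficient between prediction and ground-truth density.
    return np.corrcoef(pred.ravel(), gt_density.ravel())[0, 1]

def sim(pred, gt_density):
    # Similarity: sum of element-wise minima of the two normalised maps.
    return np.minimum(_norm_dist(pred), _norm_dist(gt_density)).sum()

def nss(pred, fixations):
    # Normalised scanpath saliency: mean standardised value at fixated pixels.
    z = (pred - pred.mean()) / (pred.std() + 1e-12)
    return z[fixations.astype(bool)].mean()

def kl(pred, gt_density):
    # Kullback-Leibler divergence of the ground truth from the prediction.
    p, g = _norm_dist(pred), _norm_dist(gt_density)
    return np.sum(g * np.log((g + 1e-12) / (p + 1e-12)))
```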
The XU algorithm is quite similar to HEVC; hence its saliency detection is better than that of most algorithms, but it faces problems when complex images are given as input. Other than that, our proposed solution has performed remarkably well and has the best compression efficiency and precision among all the algorithms in comparison. Table 1 shows results for the saliency algorithms that are used, and Figure 1 shows the saliency evaluation and comparison graph.
Table 1. Qualitative results for the saliency algorithms used (fixation maps, XU [40], base paper [34] and the proposed algorithm) on the BasketBall, FourPeople and RaceHorses sequences
Figure 1. Saliency evaluation and comparison graph
5. CONCLUSION
This paper has proposed a solution called the modified spatiotemporal fusion video saliency detection method. It involves a modified fusion calculation along with several changes to the basic HEVC pipeline to include colour contrast computation and to boost both the motion and colour values. There is also a spatiotemporal, pixel-based coherency boost to increase the temporal scope of the saliency. The proposed work is tested on the same database as that of the base paper and is compared with other state-of-the-art methods with the help of five global evaluation metrics: AUC, SIM, CC, NSS and KL. It is concluded that the proposed algorithm has the best performance out of all the mentioned methods, with better compression efficiency and precision.
REFERENCES
[1] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998, doi: 10.1109/34.730558.
[2] C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,” in 26th
IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Jun. 2008, pp. 1–8, doi: 10.1109/CVPR.2008.4587715.
[3] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in 2009 IEEE Conference on
Computer Vision and Pattern Recognition, Jun. 2010, pp. 1597–1604, doi: 10.1109/cvpr.2009.5206596.
[4] M. Cerf, E. P. Frady, and C. Koch, “Faces and text attract gaze independent of the task: experimental data and computer model,”
Journal of Vision, vol. 9, no. 12, pp. 10–10, Nov. 2009, doi: 10.1167/9.12.10.
[5] M. Cerf, J. Harel, W. Einhäuser, and C. Koch, “Predicting human gaze using low-level saliency combined with face detection,”
Advances in Neural Information Processing Systems 20 (NIPS 2007), 2008.
[6] L. J. Li and L. Fei-Fei, “What, where and who? Classifying events by scene and object recognition,” in Proceedings of the IEEE
International Conference on Computer Vision, 2007, pp. 1–8, doi: 10.1109/ICCV.2007.4408872.
[7] B. Scassellati, “Theory of mind for a humanoid robot,” Autonomous Robots, vol. 12, no. 1, pp. 13–24, 2002, doi:
10.1023/A:1013298507114.
[8] S. Marat, T. H. Phuoc, L. Granjon, N. Guyader, D. Pellerin, and A. Guérin-Dugué, “Spatio-temporal saliency model to predict eye
movements in video free viewing,” 2008 16th European Signal Processing Conference, Lausanne, 2008, pp. 1-5.
[9] Y. F. Ma and H. J. Zhang, “A model of motion attention for video skimming,” in IEEE International Conference on Image
Processing, 2002, vol. 1, pp. I-129-I–132, doi: 10.1109/icip.2002.1037976.
[10] S. Li and M. C. Lee, “Fast visual tracking using motion saliency in video,” in ICASSP, IEEE International Conference on
Acoustics, Speech and Signal Processing - Proceedings, 2007, vol. 1, pp. I-1073-I–1076, doi: 10.1109/ICASSP.2007.366097.
[11] R. J. Peters and L. Itti, “Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial
attention,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2007, pp. 1–8, doi:
10.1109/CVPR.2007.383337.
[12] A. C. Schütz, D. I. Braun, and K. R. Gegenfurtner, “Object recognition during foveating eye movements,” Vision Research, vol.
49, no. 18, pp. 2241–2253, 2009, doi: 10.1016/j.visres.2009.05.022.
[13] F. Zhou, S. B. Kang, and M. F. Cohen, “Time-mapping using space-time saliency,” in 2014 IEEE Conference on Computer
Vision and Pattern Recognition, Jun. 2014, pp. 3358–3365, doi: 10.1109/CVPR.2014.429.
[14] Z. Liu, X. Zhang, S. Luo, and O. Le Meur, “Superpixel-based spatiotemporal saliency detection,” IEEE Transactions on Circuits
and Systems for Video Technology, vol. 24, no. 9, pp. 1522–1540, Sep. 2014, doi: 10.1109/TCSVT.2014.2308642.
[15] Y. Li, S. Li, C. Chen, A. Hao and H. Qin, “Accurate and robust video saliency detection via self-paced diffusion,” in IEEE
Transactions on Multimedia, vol. 22, no. 5, pp. 1153-1167, May 2020, doi: 10.1109/TMM.2019.2940851.
[16] Y. Fang, G. Ding, J. Li and Z. Fang, “Deep3DSaliency: deep stereoscopic video saliency detection model by 3D convolutional
networks,” in IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2305-2318, May 2019, doi: 10.1109/TIP.2018.2885229.
[17] C. Chen, Y. Li, S. Li, H. Qin and A. Hao, “A novel bottom-up saliency detection method for video with dynamic background,” in
IEEE Signal Processing Letters, vol. 25, no. 2, pp. 154-158, Feb. 2018, doi: 10.1109/LSP.2017.2775212.
[18] T. M. Hoang and J. Zhou, “Recent trending on learning based video compression: A survey,” Cognitive Robotics, vol. 1, pp. 145–
158, 2021, doi: 10.1016/j.cogr.2021.08.003.
[19] A. Borji, “Saliency prediction in the deep learning era: successes and limitations,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 43, no. 2, pp. 679–700, Feb. 2021, doi: 10.1109/TPAMI.2019.2935715.
[20] W. Wang, J. Shen, J. Xie, M.-M. Cheng, H. Ling, and A. Borji, “Revisiting video saliency prediction in the deep learning era,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 220–237, Jan. 2021, doi:
10.1109/TPAMI.2019.2924417.
[21] M. Startsev and M. Dorr, “Supersaliency: a novel pipeline for predicting smooth pursuit-based attention improves generalisability
of video saliency,” IEEE Access, vol. 8, pp. 1276–1289, 2020, doi: 10.1109/ACCESS.2019.2961835.
[22] H. Li, F. Qi, and G. Shi, “A novel spatio-temporal 3D convolutional encoder-decoder network for dynamic saliency prediction,”
IEEE Access, vol. 9, pp. 36328–36341, 2021, doi: 10.1109/ACCESS.2021.3063372.
[23] F. Guo, W. Wang, Z. Shen, J. Shen, L. Shao, and D. Tao, “Motion-aware rapid video saliency detection,” in IEEE Transactions
on Circuits and Systems for Video Technology, vol. 30, no. 12, pp. 4887-4898, Dec. 2020, doi: 10.1109/TCSVT.2019.2906226.
[24] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä, “Segmenting salient objects from images and videos,” in Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6315
LNCS, no. PART 5, 2010, pp. 366–379, doi: 10.1007/978-3-642-15555-0_27.
[25] Y. Fang, Z. Wang, and W. Lin, “Video saliency incorporating spatiotemporal cues and uncertainty weighting,” in Proceedings -
IEEE International Conference on Multimedia and Expo, Jul. 2013, pp. 1–6, doi: 10.1109/ICME.2013.6607572.
[26] W. Wang, J. Shen, and F. Porikli, “Saliency-aware geodesic video object segmentation,” in 2015 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Jun. 2015, vol. 07-12-June, pp. 3395–3402, doi: 10.1109/CVPR.2015.7298961.
[27] W. Wang, J. Shen, and Ling Shao, “Consistent video saliency using local gradient flow optimization and global refinement,”
IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4185–4196, Nov. 2015, doi: 10.1109/TIP.2015.2460013.
[28] Z. Liu, L. Meur, and S. Luo, “Superpixel-based saliency detection,” in International Workshop on Image Analysis for Multimedia
Interactive Services, Jul. 2013, pp. 1–4, doi: 10.1109/WIAMIS.2013.6616119.
[29] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand, “What do different evaluation metrics tell us about saliency models?,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 3, pp. 740–757, Mar. 2019, doi:
10.1109/TPAMI.2018.2815601.
[30] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao, “Robust principal component analysis: exact recovery of corrupted low-rank
matrices by convex optimization,” in Advances in Neural Information Processing Systems 22 - Proceedings of the 2009
Conference, 2009, pp. 2080–2088.
[31] X. Zhou, C. Yang, and W. Yu, “Moving object detection by detecting contiguous outliers in the low-rank representation,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 597–610, 2013, doi: 10.1109/TPAMI.2012.132.
[32] Z. Zeng, T.-H. Chan, K. Jia, and D. Xu, “Finding correspondence from multiple images via sparse and low-rank decomposition,”
in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 7576 LNCS, no. PART 5, 2012, pp. 325–339, doi: 10.1007/978-3-642-33715-4_24.
[33] P. Ji, H. Li, M. Salzmann, and Y. Dai, “Robust motion segmentation with unknown correspondences,” in Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8694
LNCS, no. PART 6, 2014, pp. 204–219, doi: 10.1007/978-3-319-10599-4_14.
[34] S. Zhu, C. Liu, and Z. Xu, “High-definition video compression system based on perception guidance of salient information of a
convolutional neural network and HEVC compression domain,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 30, no. 7, pp. 1–1, 2020, doi: 10.1109/TCSVT.2019.2911396.
[35] S. H. Khatoonabadi, N. Vasconcelos, I. V. Bajic, and Y. Shan, “How many bits does it take for a stimulus to be salient?,” in 2015
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, vol. 07-12-June, pp. 5501–5510, doi:
10.1109/CVPR.2015.7299189.
[36] V. Leboran, A. Garcia-Diaz, X. R. Fdez-Vidal, and X. M. Pardo, “Dynamic whitening saliency,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 5, pp. 893–907, May 2017, doi: 10.1109/TPAMI.2016.2567391.
[37] W. Wang, J. Shen, F. Guo, M.-M. Cheng, and A. Borji, “Revisiting video saliency: a large-scale benchmark and a new model,” in
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 4894–4903, doi:
10.1109/CVPR.2018.00514.
[38] H. Hadizadeh and I. V. Bajic, “Saliency-aware video compression,” IEEE Transactions on Image Processing, vol. 23, no. 1, pp.
19–33, Jan. 2014, doi: 10.1109/TIP.2013.2282897.
[39] M. Xu, L. Jiang, X. Sun, Z. Ye, and Z. Wang, “Learning to detect video saliency with HEVC features,” IEEE Transactions on
Image Processing, vol. 26, no. 1, pp. 369–385, Jan. 2017, doi: 10.1109/TIP.2016.2628583.
[40] F. Zhang, “VED100: A video-based eye-tracking dataset on visual saliency detection,” Jan. 1, 2019. Distributed by GitHub. https://github.com/spzhubuaa/VED100-A-Video-Based-Eye-Tracking-Dataset-on-Visual-Saliency-Detection
BIOGRAPHIES OF AUTHORS
Vinay C. Warad is working as an assistant professor in the Department of Computer Science and Engineering at Khaja Bandanawaz College of Engineering. He has 8 years of teaching experience. His areas of interest are video saliency and image retrieval. He can be contacted at email: vinaywarad999@gmail.com.
Dr. Ruksar Fatima is a professor and head of the Department of Computer Science and Engineering, vice principal and examination in-charge at Khaja Bandanawaz College of Engineering (KBNCE), Kalaburagi, Karnataka. She is an advisory board member for IJESRT (International Journal of Engineering and Research Technology) and a member of The International Association of Engineers (IAENG). She can be contacted at email: ruksarf@gmail.com.
More Related Content

PDF
Efficient fusion of spatio-temporal saliency for frame wise saliency identifi...
PDF
Video saliency-recognition by applying custom spatio temporal fusion technique
PDF
Video saliency detection using modified high efficiency video coding and back...
PDF
Particle filter framework for salient object detection in videos
PDF
IRJET-A Review on Implementation of High Dimension Colour Transform in Domain...
PDF
Implementation of High Dimension Colour Transform in Domain of Image Processing
PDF
18 13 sep17 8aug 8314 9991-1-ed (edit ari)
PDF
18 13 sep17 8aug 8314 9991-1-ed (edit ari)
Efficient fusion of spatio-temporal saliency for frame wise saliency identifi...
Video saliency-recognition by applying custom spatio temporal fusion technique
Video saliency detection using modified high efficiency video coding and back...
Particle filter framework for salient object detection in videos
IRJET-A Review on Implementation of High Dimension Colour Transform in Domain...
Implementation of High Dimension Colour Transform in Domain of Image Processing
18 13 sep17 8aug 8314 9991-1-ed (edit ari)
18 13 sep17 8aug 8314 9991-1-ed (edit ari)

Similar to Video saliency-detection using custom spatiotemporal fusion method (20)

PDF
Reduced-reference Video Quality Metric Using Spatial Information in Salient R...
PDF
Automatic identification of animal using visual and motion saliency
PDF
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
PDF
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
DOCX
2.ack, abstract,contents page deepa
PDF
Qcce quality constrained co saliency estimation for common object detection
PDF
Image Retrieval using Graph based Visual Saliency
PDF
Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012
PDF
Automatic Foreground object detection using Visual and Motion Saliency
PPTX
A metric for no reference video quality assessment for hd tv delivery based o...
PDF
50120130405009
PDF
Poster 1-13-Paper ID 207
PDF
Discovering Anomalies Based on Saliency Detection and Segmentation in Surveil...
PPTX
Dataset and methods for 360-degree video summarization
PPTX
Iccv11 salientobjectdetection
PPTX
CVPR presentation
PDF
Object extraction using edge, motion and saliency information from videos
PDF
Content Based Image Retrieval using Color Boosted Salient Points and Shape fe...
PDF
Object Detection with Computer Vision
PDF
Visual Saliency Model Using Sift and Comparison of Learning Approaches
Reduced-reference Video Quality Metric Using Spatial Information in Salient R...
Automatic identification of animal using visual and motion saliency
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
2.ack, abstract,contents page deepa
Qcce quality constrained co saliency estimation for common object detection
Image Retrieval using Graph based Visual Saliency
Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012
Automatic Foreground object detection using Visual and Motion Saliency
A metric for no reference video quality assessment for hd tv delivery based o...
50120130405009
Poster 1-13-Paper ID 207
Discovering Anomalies Based on Saliency Detection and Segmentation in Surveil...
Dataset and methods for 360-degree video summarization
Iccv11 salientobjectdetection
CVPR presentation
Object extraction using edge, motion and saliency information from videos
Content Based Image Retrieval using Color Boosted Salient Points and Shape fe...
Object Detection with Computer Vision
Visual Saliency Model Using Sift and Comparison of Learning Approaches
Ad

More from International Journal of Reconfigurable and Embedded Systems (20)

PDF
Channel reconstruction through improvised deep learning architecture for high...
PDF
Energy-efficient clustering and routing using fuzzy k-medoids and adaptive ra...
PDF
Leveraging the learning focal point algorithm for emotional intelligence
PDF
A novel smart irrigation framework with timing allocation using solenoid valv...
PDF
Improving the performance of IoT devices that use Wi-Fi
PDF
Portable neonatus incubator based on global positioning system
PDF
Precision medicine in hepatology: harnessing IoT and machine learning for per...
PDF
IoT-enabled smart cities towards green energy systems: a review
PDF
Air quality monitoring system based on low power wide area network technology...
PDF
Design of IoT-based monitoring system for temperature and dissolved oxygen le...
PDF
Internet based highly secure data transmission system in health care monitori...
PDF
Internet of things and long range-based bridge slope early detection systems
PDF
Arowana cultivation water quality monitoring and prediction using autoregress...
PDF
Approximate single precision floating point adder for low power applications
PDF
Highly selective filtering power divider using substrate integrated waveguide...
PDF
An active two-stage class-J power amplifier design for smart grid’s 5G wirele...
PDF
Timing issues on power side-channel leakage of advanced encryption standard c...
PDF
Moving objects detection based on histogram of oriented gradient algorithm ch...
PDF
Smart farming based on IoT to predict conditions using machine learning
PDF
Smart farming based on IoT to predict conditions using machine learning
Channel reconstruction through improvised deep learning architecture for high...
Energy-efficient clustering and routing using fuzzy k-medoids and adaptive ra...
Leveraging the learning focal point algorithm for emotional intelligence
A novel smart irrigation framework with timing allocation using solenoid valv...
Improving the performance of IoT devices that use Wi-Fi
Portable neonatus incubator based on global positioning system
Precision medicine in hepatology: harnessing IoT and machine learning for per...
IoT-enabled smart cities towards green energy systems: a review
Air quality monitoring system based on low power wide area network technology...
Design of IoT-based monitoring system for temperature and dissolved oxygen le...
Internet based highly secure data transmission system in health care monitori...
Internet of things and long range-based bridge slope early detection systems
Arowana cultivation water quality monitoring and prediction using autoregress...
Approximate single precision floating point adder for low power applications
Highly selective filtering power divider using substrate integrated waveguide...
An active two-stage class-J power amplifier design for smart grid’s 5G wirele...
Timing issues on power side-channel leakage of advanced encryption standard c...
Moving objects detection based on histogram of oriented gradient algorithm ch...
Smart farming based on IoT to predict conditions using machine learning
Smart farming based on IoT to predict conditions using machine learning
Ad

Recently uploaded (20)

PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
web development for engineering and engineering
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
Digital Logic Computer Design lecture notes
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
composite construction of structures.pdf
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
PPTX
OOP with Java - Java Introduction (Basics)
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
DOCX
573137875-Attendance-Management-System-original
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Model Code of Practice - Construction Work - 21102022 .pdf
web development for engineering and engineering
Foundation to blockchain - A guide to Blockchain Tech
Digital Logic Computer Design lecture notes
bas. eng. economics group 4 presentation 1.pptx
composite construction of structures.pdf
CH1 Production IntroductoryConcepts.pptx
Strings in CPP - Strings in C++ are sequences of characters used to store and...
OOP with Java - Java Introduction (Basics)
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
573137875-Attendance-Management-System-original
UNIT-1 - COAL BASED THERMAL POWER PLANTS
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx

Video saliency-detection using custom spatiotemporal fusion method

  • 1. International Journal of Reconfigurable and Embedded Systems (IJRES) Vol. 12, No. 2, July 2023, pp. 269~275 ISSN: 2089-4864, DOI: 10.11591/ijres.v12.i2.pp269-275  269 Journal homepage: http://ijres.iaescore.com Video saliency-detection using custom spatiotemporal fusion method Vinay C. Warad, Ruksar Fatima Department of Computer Science and Engineering, Khaja Bandanawaz College of Engineering, Kalaburagi, India Article Info ABSTRACT Article history: Received Jul 20, 2022 Revised Oct 15, 2022 Accepted Dec 10, 2022 There have been several researches done in the field of image saliency but not as much as in video saliency. In order to increase precision and accuracy during compression, reduce coding complexity and time consumption along with memory allocation problems with our proposed solution. It is a modified high-definition video compression (HEVC) pixel based consistent spatiotemporal diffusion with temporal uniformity. It involves taking apart the video into groups of frames, computing colour saliency, integrate temporal fusion, pixel saliency fusion is conducted and then colour information guides the diffusion process for the spatiotemporal mapping with the help of permutation matrix. The proposed solution is tested on a publicly available extensive dataset with five global saliency valuation metrics and is compared with several other state-of-the-art saliency detection methods. The results display and overall best performance amongst all other candidates. Keywords: Computing colour saliency High-definition video compression pixel Image saliency Spatiotemporal diffusion Video saliency This is an open access article under the CC BY-SA license. Corresponding Author: Vinay C. Warad Department of Computer Science and Engineering, Khaja Bandanawaz College of Engineering Kalaburagi, Karnataka 585104, India Email: vinay_c111@rediffmail.com 1. INTRODUCTION The world has tried to imitate the functioning of the human eye and the brain. The marvel of the brain to distinguish among the important and non-important features of the view the eyes are seeing and take in only whatever is necessary. Various researchers have imitated this process and in today’s word, we have this in the form of conference videos, broadcasting and streaming. There have been several researches in the field of image saliency but not in video saliency. Few researches that have made a significant impact in this field. Itti’s model is one of the most [1] researched and most prominent models for image saliency. Fourier transformation is used with the help of phase spectrum and [2], [3] helps image saliency using frequency tuning. They have used the principles of inhibition of return and winner take all that is inspired from the visual nervous system [4], [5]. It is difficult for video saliency detection, as images are not still, making memory allocation and computational complexity increased. It has a video saliency detection methodology [6] that involves determining the position of an object with reference to another. They use computation of space-time-saliency map as well as computation of motion saliency map [7]-[10]. Fused static and dynamic saliency mapping [11] to obtain a space- time saliency detection model. Here dynamic texture model is employed [12] to obtain motion patterns for both stationary and dynamic scenes. They have used fusion model but it results in low-level saliency [13]-[15]. They have used global temporal clues to forge a robust low-level saliency map [16], [17]. 
The disadvantage of these methodologies is that the accumulation of error is quite high and this has led to several wrong detections.
  • 2.  ISSN: 2089-4864 Int J Reconfigurable & Embedded Syst, Vol. 12, No. 2, July 2023: 269-275 270 The proposed solution is a modified spatiotemporal fusion saliency detection method. It involves a spatiotemporal background to obtain high saliency values around the foreground objects. Then after ignoring the hollow effects, a series of adjustments are made to the general saliency strategies to increase efficiency of both motion and colour saliencies. The usage of cross frame super pixels and one to one spatial temporal fusion helps in overall increase in accuracy and precision during compression. 2. RELATED WORK In this section, the works of some of the research papers that have helped in the completion of the proposed algorithm have been mentioned. This survey talks about the various video saliency methodologies along with their advantages and disadvantages [18]. Borji [19], it has also the same outline of the paper but it also includes the various aspect, which make it difficult for the algorithms to imitate the human eye-brain coordination and how to overcome them. This paper has a notable contribution to this field of research [20]. It has a database named dynamic human fixation 1K (DHF1K) that helps in pointing out fixations that are needed during dynamic scene free viewing, then there is the attentive convolutional neural network-long short-term memory network (ACLNet) which has augmentations to the original convolutional neural network and long short-term memory (CNN- LSTM) model to enable fast end-to-end saliency learning. In this paper [21], [22] they have made some corrections in the smooth pursuits (SP) logic. It involves manual annotations of the SPs with fixation along the arithmetic points and SP salient locations by training slicing convolutional neural networks. High-definition video compression (HEVC) system has become the new standard video compression algorithms used today. With making changes to the HEVC algorithms with the help of a spatial saliency algorithm that uses the concept of a motion vector [23], It has led to better compression and efficiency. They haves introduced a salient object segmentation that uses the combination of conditional random field (CRF) and saliency measure. It has used statistical framework and local colour contrasting, motion and illumination features [24]. Fang et al. [25] is also using spatiotemporal fusion with uncertainty in statistics to measure visual saliency. They have used geodesic robustness methodology to get the saliency map [26], [27]. Has been a great help to our solution formation with its super-pixel usage and adaptive colour quantization [28]-[30]. Its measurement of difference between spatial distance and histograms has helped to obtain the super-pixel saliency map. They gave us an overall idea of the various evaluation metrics to be used in this paper [31], [32]. The first section has the introduction and section 2 succeeds it with the related work [33]. Section 3 and 4 displays the proposed algorithm, its methodologies and modifications along with its final experimentation and comparison. Section 5 concludes the paper. 3. PROPOSED SYSTEM 3.1. Modeling based saliency adjustment The robustness is obtained by combining long-term inter batch information with colour contrast computation. Background and foreground appearance models are represented by 𝐵𝑀 ∈ ℝ3×𝑏𝑛 and 𝐹𝑀 ∈ ℝ3×𝑓𝑛 with 𝑏𝑛 𝑎𝑛𝑑 𝑓𝑛 being their sizes respectively. 
The 𝑖 − 𝑡ℎ super pixel’s RGB history in all regions is taken care of with the following equations 𝑖𝑛𝑡𝑟𝑎𝐶𝑖 = exp(𝜆 − |𝜑(𝑀𝐶𝑖) − 𝜑(𝐶𝑀𝑖)|) ; 𝜆 = 0.5 and 𝑖𝑛𝑡𝑒𝑟𝐶𝑖 = 𝜑( min||(𝑅𝑖,𝐺𝑖,𝐵𝑖),𝐵𝑀||2 ⋅ 1 𝑏𝑛 ∑||(𝑅𝑖,𝐺𝑖,𝐵𝑖),𝐵𝑀||2 min||(𝑅𝑖,𝐺𝑖,𝐵𝑖),𝐹𝑀|| 2 ⋅ 1 𝑓𝑛 ∑||(𝑅𝑖,𝐺𝑖,𝐵𝑖),𝐹𝑀|| 2 ). Here, 𝜆 is the upper bound discrepancy degree and helps inversing the penalty between the motion and color saliencies. 3.2. Contrast-based saliency mapping The video sequence is now divided into several short groups of frames 𝐺𝑖 = {𝐹1, 𝐹2, 𝐹3, … . , 𝐹𝑛}. Each frame 𝐹𝑘, where (𝑘denotes the frame number) undergoes modification using simple linear iterative clustering with boundary-aware smoothing method which removes the unnecessary details. The colour and motion gradient mapping to help form the spatiotemporal gradient map with help of pixel-based computation is given by 𝑆𝑀𝑇 = ||𝑢𝑥, 𝑢𝑦||2 ⨀||∇(𝐹)||2 . That is, horizontal and vertical gradient of optical flow and ∇(𝐹) colour gradient map. We then calculate the 𝑖 − 𝑡ℎ super pixel’s motion contrast using (1). 𝑀𝐶𝑖 = ∑ ||𝑈𝑖,𝑈𝑗|| 2 ||𝑎𝑖,𝑎||2 , 𝑎𝑗∈𝜓𝑖 𝜓𝑖 = {𝜏 + 1 ≥ ||𝑎𝑖, 𝑎𝑗|| 2 ≥ 𝜏} (1)
  • 3. Int J Reconfigurable & Embedded Syst ISSN: 2089-4864  Video saliency-detection using custom spatiotemporal fusion method (Vinay C. Warad) 271 Where 𝑙2 norm has been used and 𝑈 and 𝑎𝑖 denote the optical flow gradient in two directions and 𝑖 − 𝑡ℎ super-pixel position centre respectively. 𝜓𝑖is used to denote computational contrast range and is calculated using shortest Euclidean distance between spatiotemporal map and 𝑖 − 𝑡ℎ superpixel. 𝜏 = 𝑟 ||Λ(𝑆𝑀𝑇)||0 ∑ ||Λ(𝑆𝑀𝑇𝜏 )|| 0 𝜏∈||𝜏,𝑖||≤𝑟 ; 𝑙 = 0.5 min{𝑤𝑖𝑑𝑡ℎ, ℎ𝑒𝑖𝑔ℎ𝑡} , Λ → 𝑑𝑜𝑤𝑛 𝑠𝑎𝑚𝑝𝑙𝑖𝑛𝑔 (2) Colour saliency is also computed the same way as optical flow gradient, except we use the red, blue and green notations for the 𝑖 − 𝑡ℎ super pixel. So, the equation is 𝐶𝑀 = ∑ ||(𝑅𝑖,𝐺𝑖,𝐵𝑖,),(𝑅𝑗,𝐺𝑗,𝐵𝑗)|| 2 ||𝑎𝑖,𝑎𝑗|| 2 𝑎𝑗∈𝜓𝑖 . The following equation smoothens both 𝑀𝐶 and 𝐶𝑀 as temporal and saliency value refining is done by spatial information integration. 𝐶𝑀𝑘,𝑖 ← ∑ ∑ exp (−||𝑐𝑘,𝑖 𝑎𝜏,𝑗∈𝜇𝜙 ,𝑐 𝜏,𝑗||1 𝜇)⋅𝐶𝑀𝜏,𝑗 ⁄ 𝑘+1 𝜏=𝑘−1 ∑ ∑ exp (−||𝑐𝑘,𝑖 𝑎𝜏,𝑗∈𝜇𝜙 ,𝑐 𝜏,𝑗||1 𝜇) ⁄ 𝑘+1 𝜏=𝑘−1 (3) Here, 𝑐𝑘,𝑖 is the average of the 𝑖 − 𝑡ℎ super-pixel RGB colour value in 𝑘 − 𝑡ℎ frame while 𝜎 controls smoothing strength. The ||𝑎𝑘,𝑖, 𝑎 𝜏,𝑗|| 2 ≤ 𝜃 needs to be satisfied and this is done using 𝜇. 𝜃 = 1 𝑚×𝑛 ∑ ∑ || 1 𝑚 𝑚 𝑖=1 𝑛 𝑘=1 ∑ 𝐹(𝑆𝑀𝑇𝑘,𝑖 𝑚 𝑖=1 ), 𝐹(𝑆𝑀𝑇𝑘,𝑖 )||1; 𝑚, 𝑛 = 𝑓𝑟𝑎𝑚𝑒 𝑛𝑢𝑚𝑏𝑒𝑟𝑠 (4) 𝐹(𝑆𝑀𝑇𝑖 ) = { 𝑎𝑖, 𝑆𝑀𝑇𝑖 ≤ 𝜖 × 1 𝑚 ∑ 𝑆𝑀𝑇𝑖 𝑚 𝑖=1 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 ; 𝜖 = 𝑓𝑖𝑙𝑡𝑒𝑟 𝑠𝑡𝑟𝑒𝑛𝑔ℎ𝑡 𝑐𝑜𝑛𝑡𝑟𝑜𝑙 (5) At each batch frame level, the 𝑞 − 𝑡ℎ frame’s smoothing rate is dynamically updated with (1 − 𝛾)𝜃𝑠−1 + 𝛾𝜃𝑠 → 𝜃𝑠; 𝛾 = (𝑙𝑒𝑎𝑟𝑛𝑖𝑛𝑔 𝑤𝑒𝑖𝑔ℎ𝑡 ,0.2). Now the colour and motion saliency is integrated to get the pixel-based saliency map𝐿𝐿𝑆 = 𝐶𝑀 ⊙ 𝑀𝐶. Since this fused saliency maps increases accuracy considerably but the rate decreases, so this will be dealt with in the next section. 3.3. Accuracy boosting Matrix 𝑀 is to be considered as the input. It will be decomposed using sparse 𝑆 and low level 𝐷 with min 𝐷,𝑆 𝛼||𝑆||1 + ||𝐷||∗ 𝑠𝑢𝑏𝑗 = 𝑀 = 𝑆 + 𝐷 where the nuclear form of 𝐷 is used. With the help of robust principal component analysis (RPCA) [30] and is showcased using 𝑆 ← 𝑠𝑖𝑔𝑛(𝑀 − 𝐷 − 𝑆)[|𝑀 − 𝐷 − 𝑆| − 𝛼𝛽]+ and 𝐷 ← 𝑉[Σ − 𝛽𝐼]+𝑈, (𝑉, Σ, 𝑈) ← 𝑠𝑣𝑑(𝑍). Where 𝑠𝑣𝑑(𝑍) denotes singular value decomposition of Lagrange multiplier and 𝛼 𝑎𝑛𝑑 𝛽 represent lesser-rank and sparse threshold parameters respectively. For reduction of incorrect detections caused by the misplacement of optical flow of super pixels in the foreground’s region, the given region’s rough foreground is located and feature subspace of a frame 𝑘 is spanned as 𝑔𝐼𝑘 = {𝐿𝐿𝑆𝑘,1 , 𝐿𝐿𝑆𝑘,2 , … . . 𝐿𝐿𝑆𝑘,𝑚 } and thus for the entire frame group we get 𝑔𝐵𝜏 = {𝑔𝐼1, 𝑔𝐼2, … . , 𝑔𝐼𝑛}. This way the rough foreground is calculated as 𝑅𝐹𝑖 = [∑ 𝐿𝐿𝑆𝑘,𝑖 − 𝑛 𝑘=1 𝜔 𝑛×𝑚 ∑ ∑ 𝐿𝐿𝑆𝑘,𝑖 ]+ 𝑚 𝑖=1 𝑛 𝑘=1 . Here 𝜔 is reliability cotrol factor and we also get two subspaces by 𝐿𝐿𝑆 and RGB colour and it is given by 𝑆𝐵 = {𝑐𝑣1, 𝑐𝑣2, … . , 𝑐𝑣𝑛} ∈ ℝ3𝑣×𝑛 where 𝑐𝑣𝑖 = {𝑣𝑒𝑐(𝑅𝑖,1, 𝐺𝑖,1, 𝐵𝑖,1, … . , 𝑅𝑖,𝑚, 𝐺𝑖,𝑚, 𝐵𝑖,𝑚)}𝐾 and 𝑆𝐹 = 𝑣𝑒𝑐(𝐿𝐿𝑆1 ), … . 𝑣𝑒𝑐(𝐿𝐿𝑆𝑛 ) ∈ ℝ𝑣×𝑛 . This helps in making a one-to-one correspondence and then pixel- based saliency mapping infusion that is dissipated on the entire group of frames. 𝑆𝐵over𝑆𝐹 causes disruptive foreground salient movements and hence with the help from [31]-[33] this issue was resolved with an alternate solution. min 𝑀𝑐𝑥,𝑆𝑐𝑥,𝜗,𝐴⊙𝜗 ||𝑀𝑐|| ∗ + ||𝐷𝑥|| ∗ + ||𝐴 + 𝜗||2 + 𝛼1||𝑆𝑐|| 1 + 𝛼2||𝑆𝑥||; || ∙ ||∗ 𝑛𝑢𝑐𝑙𝑒𝑎𝑟 𝑛𝑜𝑟𝑚, 𝐴 𝑖𝑠 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 𝑚𝑎𝑡𝑟𝑖𝑥𝑠. 𝑡 𝑀𝑐 = 𝐷𝑐 + 𝑆𝑐, 𝑀𝑠 = 𝐷𝑠 + 𝑆𝑥, 𝑀𝑐 = 𝑆𝐵 ⊙ 𝜗, 𝑀𝑥 = 𝑆𝐹 ⊙ 𝜗, 𝜗 = {𝐸1, 𝐸2, … . , 𝐸𝑛}, 𝐸𝑖 ∈ {0,1}𝑚×𝑚 , 𝐸𝑖1𝐾 = 1. 
(6) 𝐷𝑐, 𝐷𝑥variables represent colour and saliency mapping, 𝜗 is the permutation matrix while 𝑆𝑥, 𝑆𝑐 represents colour feature sparse component space and saliency feature space. This entire equation set helps in correcting super-pixel correspondences.
  • 4.  ISSN: 2089-4864 Int J Reconfigurable & Embedded Syst, Vol. 12, No. 2, July 2023: 269-275 272 3.4. Mathematical model As shown in (6) generates a distributed version of convex problems 𝐷(𝑀𝑐𝑥, 𝑆𝑐𝑥, 𝜗, 𝐴 ⊙ 𝜗) = 𝛼1||𝑆𝑐|| 1 + 𝛼2||𝐸𝑥|| 2 + 𝛽1||𝑀𝑐|| ∗ + 𝛽2||𝑀𝑥|| ∗ + ||𝐴 ⊙ 𝜗||2 + 𝑡𝑟𝑎𝑐𝑒(𝑍1 𝐾(𝑀𝑐 − 𝐷𝑐 − 𝑆𝑐)) + 𝑡𝑟𝑎𝑐𝑒(𝑍2 𝐾(𝑀𝑥 − 𝐷𝑥 − 𝑆𝑥)) + 𝜋 2 (||𝑀𝑐 − 𝐷𝑐 − 𝑆𝑐|| 2 + ||(𝑀𝑥 − 𝐷𝑥 − 𝑆𝑥)|| 2 ). Where 𝑍𝑖 represents Lagrangian multiplier. 𝜋denotes steps of iterations and the optimized solution using partial derivative 𝑆𝑐,𝑥 𝑘+1 = 1 2 ||𝑆𝑐,𝑥 𝑘 − (𝑀𝑐,𝑥 𝑘 − 𝑆𝑐,𝑥 𝑘 + 𝑍1,2 𝑘 𝜋𝑘||2 2 ⁄ + min 𝑆𝑐,𝑥 𝑘 𝛼1,2 ||𝑆𝑐,𝑥 𝑘 || 1 /𝜋𝑘 and 𝐷𝑐,𝑥 𝑘+1 = 1 2 ||𝐷𝑐,𝑥 𝑘 − (𝑀𝑐,𝑥 𝑘 − 𝐷𝑐,𝑥 𝑘 + 𝑍1,2 𝑘 𝜋𝑘||2 2 ⁄ + min 𝐷𝑐,𝑥 𝑘 𝛽1,2 ||𝐷𝑐,𝑥 𝑘 || ∗ /𝜋𝑘. 𝐷𝑖 is updated to become 𝐷𝑐,𝑥 𝑘+1 ← 𝑈𝐾 + 𝑉 [Σ − 𝛽1,2 𝜋𝑘 ], where (𝑉, Σ, 𝑈) ← 𝑠𝑣𝑑 (𝑀𝑐,𝑥 𝑘 − 𝑆𝑐,𝑥 𝑘 + 𝑍1,2 𝑘 𝜋𝑘 ). Similarly, for 𝑆𝑖,𝑆𝑐,𝑥 𝑘+1 ← 𝑠𝑖𝑔𝑛 ( |𝐽| 𝜋𝑘 ) [𝐽 − 𝛼1,2 𝜋𝑘 ] + as 𝐽 = 𝑀𝑐,𝑥 𝑘 − 𝐷𝑐,𝑥 𝑘 + 𝑍𝑐,𝑥 𝑘 /𝜋𝑘. Value of 𝐸 is determined are used to compute the norm cost 𝐿 ∈ ℝ𝑚×𝑚 is calculated as 𝑙𝑖,𝑗 𝑘 = ||𝑂𝑘,𝑖 − 𝐻(𝑉1, 𝑗)|| 2 , 𝑉1 = 𝐻(𝑆𝐵, 𝑘) ⊙ 𝐸𝑘 and 𝑙𝑖,𝑗 𝑘 = ||𝑂𝑘,𝑖 − 𝐻(𝑉2, 𝑗)|| 2 , 𝑉2 = 𝐻(𝑆𝐵, 𝑘) ⊙ 𝐸𝑘. Then we use and objective matrix 𝑂 to calculate the 𝑘 − 𝑡ℎ of 𝑅𝐹 and the equation is 𝑂𝑘,𝑖 = 𝑆𝑐,𝑥(𝑘, 𝑖) + 𝐷𝑐,𝑥(𝑘, 𝑖) − 𝑍1,2(𝑘, 𝑖)/𝜋𝑘 . There is a need to change 𝐿𝜏 as it is hard to approximate the value of 𝑚𝑖𝑛||𝐴 + 𝜗||2 . 𝐿𝜏 = {𝑟1,1 𝜏 + 𝑑1,1 𝜏 , 𝑟1,2 𝜏 + 𝑑1,2 𝜏 , … . , 𝑟𝑚,𝑚 𝜏 + 𝑑𝑚,𝑚 𝜏 } ∈ ℝ𝑚×𝑚 𝑓𝑜𝑟 𝑘 = [𝑘 − 1, 𝑘 + 1] is hanged to 𝐿𝑘 as shown in (7). 𝐻(𝐿𝑘, 𝑗) ← ∑ ∑ 𝐻(𝐿𝜏, 𝑣). exp (−||𝑐𝜏,𝑣, 𝑐𝑘,𝑗|| 1 𝜇) ⁄ 𝑝𝑡,𝑣∈𝜉 𝑘+1 𝜏=𝑘−1 (7) The global optimization is solved using the equations 𝑆𝐹𝑘+1 ← 𝑆𝐹𝑘 ⊙ 𝜗, 𝑆𝐵𝑘+1 𝑆𝐵𝑘 ⊙ 𝜗 and 𝑍1,2 𝑘+1 ← 𝜋𝑘(𝑀𝑐,𝑥 𝑘 − 𝐷𝑐,𝑥 𝑘 − 𝑆𝑐,𝑥 𝑘 ) + 𝑍1,2 𝑘 where 𝜋𝑘+1 ← 𝜋𝑘 × 1.05. The alignment of the super pixels is now given by 𝑔𝑆𝑖 = 1 𝑛−1 ∑ 𝐻(𝑆𝐹 ⊙ 𝜗, 𝜏) 𝑛 𝜏=1,𝑖≠𝜏 . To reduce the incorrect detections and alignments we introduce 𝑆𝐹 and use (8)-(10). 𝑆𝐹 ̃ ← 𝑆𝐹 ⊙ 𝜗 (8) 𝑆𝐹 ← 𝑆𝐹 ̃ ∙ (1𝑚×𝑛 − 𝑋(𝑆𝑐)) + 𝜌 ∙ 𝑆𝐹 ̅̅ ̅̅ ∙ 𝑋(𝑆𝑐) (9) 𝜌𝑖,𝑗 = { 0.5, 1 𝑛 ∑ 𝑆𝐹𝑖,𝑗 < ̃ 𝑛 𝑗=1 𝑆𝐹𝑖,𝑗 ̃ 2, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 (10) The equation for mapping for the 𝑖 − 𝑡ℎ video frame is given by 𝑔𝑆𝑖 = 𝐻(𝜌,𝑖)−(𝐻(𝜌,𝑖).𝑋(𝑆𝑐) 𝐻(𝜌,𝑖)(𝑛−1) ∑ 𝐻(𝑆𝐹 ⊙ 𝜗, 𝜏) 𝑛 𝜏=1,𝑖≠𝜏 . There is a need to diffuse inner temporal batch 𝑥𝑟 of the current group’s frames based of degree of colour similarity. The final output is given by 𝑔𝑆𝑖,𝑗 = 𝑥𝑟∙𝑦𝑟+∑ 𝑦𝑖∙𝑔𝑆𝑖,𝑗 𝑛 𝑖=1 𝑦𝑟+∑ 𝑦𝑖 𝑛 𝑖=1 ; 𝑦𝑟 = exp (− ||𝑐𝑟,𝑗, 𝑐𝑖,𝑗|| 2 /𝜇). Where 𝑥𝑙showcases the colour distance-based weights. 4. RESULTS, EXPERIMENTS AND DATABASE The proposed solution has been compared with [34] as a base reference as well as by [35]’s operational block description length (OBDL) algorithm, [36]’s dynamic adaptive whitening saliency (AWS- D) algorithm, the object-to-motion convolutional neural network two layer long short-term memory (OMCNN-2CLSTM) algorithm in [36], attentive convolutional (ACL) algorithm [37], saliency-aware video compression (SAVC) algorithm from [38] and [39]. The database used is the same as the one in the base paper. It is a high-definition eye-tracking database with its open source available at GitHub https://github.com/spzhubuaa/Video-based-Eye-Tracking-Dataset [40]. 10 video sequences with 3 different resolutions, 1920 × 1080, 1280 × 720, and 832 × 480, were taken for experimentation. For evaluating the performance of all the saliency methods, we employed five global evaluation metrics, namely area under the ROC curve (AUC), Similarity (SIM), correlation coefficient (CC), normalized scanpath saliency (NSS) and Kullback-Leibler (KL).
  • 5. Int J Reconfigurable & Embedded Syst ISSN: 2089-4864  Video saliency-detection using custom spatiotemporal fusion method (Vinay C. Warad) 273 The XU algorithm is quite similar to HEVC; hence its saliency detection is better than most algorithms but is faces problems when there are complex images as input. Other than that, our proposed solution has performed remarkably well and has the best compression efficiency and precision among all the algorithms in comparison. Table 1 shows results for saliency algorithms that are used. Figure 1 shows the saliency evaluation and comparison graph. Table 1. The following results for saliency algorithms used: fixation maps, XU [40], base paper [34] and proposed algorithm Parameter BasketBall FourPeople RaceHorses Fixation Maps XU [40] Base Paper [34] Proposed algorithm Figure 1. Saliency evaluation and comparison graph 5. CONCLUSION This paper has proposed a solution called modified spatiotemporal fusion video saliency detection method. It involves a modified fusion calculation along with several changes to the basic HEVC code to include colour contrast computations, boost both motions, and colour values. There is also spatiotemporal of pixel-based coherency boost to increase temporal scope saliency. The proposed work is tested on the database as same as that of the base paper and is compared with other state-of-the-art methods with the help of five global evaluation metrics AUC, SIM, CC, NSS and KL. It has been concluded that the proposed algorithm of this paper has the best performance out of all the mentioned methods with better compression efficiency and precision. REFERENCES [1] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998, doi: 10.1109/34.730558. [2] C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,” in 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Jun. 2008, pp. 1–8, doi: 10.1109/CVPR.2008.4587715. [3] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2010, pp. 1597–1604, doi: 10.1109/cvpr.2009.5206596. [4] M. Cerf, E. P. Frady, and C. Koch, “Faces and text attract gaze independent of the task: experimental data and computer model,” Journal of Vision, vol. 9, no. 12, pp. 10–10, Nov. 2009, doi: 10.1167/9.12.10. [5] M. Cerf, J. Harel, W. Einhäuser, and C. Koch, “Predicting human gaze using low-level saliency combined with face detection,” Advances in Neural Information Processing Systems 20 (NIPS 2007), 2008.
REFERENCES
[1] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998, doi: 10.1109/34.730558.
[2] C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,” in 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Jun. 2008, pp. 1–8, doi: 10.1109/CVPR.2008.4587715.
[3] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 1597–1604, doi: 10.1109/cvpr.2009.5206596.
[4] M. Cerf, E. P. Frady, and C. Koch, “Faces and text attract gaze independent of the task: experimental data and computer model,” Journal of Vision, vol. 9, no. 12, pp. 10–10, Nov. 2009, doi: 10.1167/9.12.10.
[5] M. Cerf, J. Harel, W. Einhäuser, and C. Koch, “Predicting human gaze using low-level saliency combined with face detection,” Advances in Neural Information Processing Systems 20 (NIPS 2007), 2008.
[6] L. J. Li and L. Fei-Fei, “What, where and who? Classifying events by scene and object recognition,” in Proceedings of the IEEE International Conference on Computer Vision, 2007, pp. 1–8, doi: 10.1109/ICCV.2007.4408872.
[7] B. Scassellati, “Theory of mind for a humanoid robot,” Autonomous Robots, vol. 12, no. 1, pp. 13–24, 2002, doi: 10.1023/A:1013298507114.
[8] S. Marat, T. H. Phuoc, L. Granjon, N. Guyader, D. Pellerin, and A. Guérin-Dugué, “Spatio-temporal saliency model to predict eye movements in video free viewing,” in 2008 16th European Signal Processing Conference, Lausanne, 2008, pp. 1–5.
[9] Y. F. Ma and H. J. Zhang, “A model of motion attention for video skimming,” in IEEE International Conference on Image Processing, 2002, vol. 1, pp. I-129–I-132, doi: 10.1109/icip.2002.1037976.
[10] S. Li and M. C. Lee, “Fast visual tracking using motion saliency in video,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2007, vol. 1, pp. I-1073–I-1076, doi: 10.1109/ICASSP.2007.366097.
[11] R. J. Peters and L. Itti, “Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial attention,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2007, pp. 1–8, doi: 10.1109/CVPR.2007.383337.
[12] A. C. Schütz, D. I. Braun, and K. R. Gegenfurtner, “Object recognition during foveating eye movements,” Vision Research, vol. 49, no. 18, pp. 2241–2253, 2009, doi: 10.1016/j.visres.2009.05.022.
[13] F. Zhou, S. B. Kang, and M. F. Cohen, “Time-mapping using space-time saliency,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2014, pp. 3358–3365, doi: 10.1109/CVPR.2014.429.
[14] Z. Liu, X. Zhang, S. Luo, and O. Le Meur, “Superpixel-based spatiotemporal saliency detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 9, pp. 1522–1540, Sep. 2014, doi: 10.1109/TCSVT.2014.2308642.
[15] Y. Li, S. Li, C. Chen, A. Hao, and H. Qin, “Accurate and robust video saliency detection via self-paced diffusion,” IEEE Transactions on Multimedia, vol. 22, no. 5, pp. 1153–1167, May 2020, doi: 10.1109/TMM.2019.2940851.
[16] Y. Fang, G. Ding, J. Li, and Z. Fang, “Deep3DSaliency: deep stereoscopic video saliency detection model by 3D convolutional networks,” IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2305–2318, May 2019, doi: 10.1109/TIP.2018.2885229.
[17] C. Chen, Y. Li, S. Li, H. Qin, and A. Hao, “A novel bottom-up saliency detection method for video with dynamic background,” IEEE Signal Processing Letters, vol. 25, no. 2, pp. 154–158, Feb. 2018, doi: 10.1109/LSP.2017.2775212.
[18] T. M. Hoang and J. Zhou, “Recent trending on learning based video compression: A survey,” Cognitive Robotics, vol. 1, pp. 145–158, 2021, doi: 10.1016/j.cogr.2021.08.003.
[19] A. Borji, “Saliency prediction in the deep learning era: successes and limitations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 2, pp. 679–700, Feb. 2021, doi: 10.1109/TPAMI.2019.2935715.
[20] W. Wang, J. Shen, J. Xie, M.-M. Cheng, H. Ling, and A. Borji, “Revisiting video saliency prediction in the deep learning era,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 220–237, Jan. 2021, doi: 10.1109/TPAMI.2019.2924417.
[21] M. Startsev and M. Dorr, “Supersaliency: a novel pipeline for predicting smooth pursuit-based attention improves generalisability of video saliency,” IEEE Access, vol. 8, pp. 1276–1289, 2020, doi: 10.1109/ACCESS.2019.2961835.
[22] H. Li, F. Qi, and G. Shi, “A novel spatio-temporal 3D convolutional encoder-decoder network for dynamic saliency prediction,” IEEE Access, vol. 9, pp. 36328–36341, 2021, doi: 10.1109/ACCESS.2021.3063372.
[23] F. Guo, W. Wang, Z. Shen, J. Shen, L. Shao, and D. Tao, “Motion-aware rapid video saliency detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 12, pp. 4887–4898, Dec. 2020, doi: 10.1109/TCSVT.2019.2906226.
[24] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä, “Segmenting salient objects from images and videos,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6315 LNCS, no. PART 5, 2010, pp. 366–379, doi: 10.1007/978-3-642-15555-0_27.
[25] Y. Fang, Z. Wang, and W. Lin, “Video saliency incorporating spatiotemporal cues and uncertainty weighting,” in Proceedings - IEEE International Conference on Multimedia and Expo, Jul. 2013, pp. 1–6, doi: 10.1109/ICME.2013.6607572.
[26] W. Wang, J. Shen, and F. Porikli, “Saliency-aware geodesic video object segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 3395–3402, doi: 10.1109/CVPR.2015.7298961.
[27] W. Wang, J. Shen, and L. Shao, “Consistent video saliency using local gradient flow optimization and global refinement,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4185–4196, Nov. 2015, doi: 10.1109/TIP.2015.2460013.
[28] Z. Liu, O. Le Meur, and S. Luo, “Superpixel-based saliency detection,” in International Workshop on Image Analysis for Multimedia Interactive Services, Jul. 2013, pp. 1–4, doi: 10.1109/WIAMIS.2013.6616119.
[29] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand, “What do different evaluation metrics tell us about saliency models?,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 3, pp. 740–757, Mar. 2019, doi: 10.1109/TPAMI.2018.2815601.
[30] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao, “Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization,” in Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference, 2009, pp. 2080–2088.
[31] X. Zhou, C. Yang, and W. Yu, “Moving object detection by detecting contiguous outliers in the low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 597–610, 2013, doi: 10.1109/TPAMI.2012.132.
[32] Z. Zeng, T.-H. Chan, K. Jia, and D. Xu, “Finding correspondence from multiple images via sparse and low-rank decomposition,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7576 LNCS, no. PART 5, 2012, pp. 325–339, doi: 10.1007/978-3-642-33715-4_24.
[33] P. Ji, H. Li, M. Salzmann, and Y. Dai, “Robust motion segmentation with unknown correspondences,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8694 LNCS, no. PART 6, 2014, pp. 204–219, doi: 10.1007/978-3-319-10599-4_14.
[34] S. Zhu, C. Liu, and Z. Xu, “High-definition video compression system based on perception guidance of salient information of a convolutional neural network and HEVC compression domain,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 7, pp. 1–1, 2020, doi: 10.1109/TCSVT.2019.2911396.
[35] S. H. Khatoonabadi, N. Vasconcelos, I. V. Bajic, and Y. Shan, “How many bits does it take for a stimulus to be salient?,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 5501–5510, doi: 10.1109/CVPR.2015.7299189.
[36] V. Leboran, A. Garcia-Diaz, X. R. Fdez-Vidal, and X. M. Pardo, “Dynamic whitening saliency,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 5, pp. 893–907, May 2017, doi: 10.1109/TPAMI.2016.2567391.
[37] W. Wang, J. Shen, F. Guo, M.-M. Cheng, and A. Borji, “Revisiting video saliency: a large-scale benchmark and a new model,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 4894–4903, doi: 10.1109/CVPR.2018.00514.
[38] H. Hadizadeh and I. V. Bajic, “Saliency-aware video compression,” IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 19–33, Jan. 2014, doi: 10.1109/TIP.2013.2282897.
[39] M. Xu, L. Jiang, X. Sun, Z. Ye, and Z. Wang, “Learning to detect video saliency with HEVC features,” IEEE Transactions on Image Processing, vol. 26, no. 1, pp. 369–385, Jan. 2017, doi: 10.1109/TIP.2016.2628583.
[40] F. Zhang, “VED100: a video-based eye-tracking dataset on visual saliency detection,” Jan. 1, 2019. Distributed by GitHub. https://github.com/spzhubuaa/VED100-A-Video-Based-Eye-Tracking-Dataset-on-Visual-Saliency-Detection

BIOGRAPHIES OF AUTHORS
Vinay C. Warad is working as an assistant professor in the Department of Computer Science and Engineering at Khaja Bandanawaz College of Engineering. He has 8 years of teaching experience. His areas of interest are video saliency and image retrieval. He can be contacted at email: vinaywarad999@gmail.com.
Dr. Ruksar Fatima is a professor and head of the Department of Computer Science and Engineering, vice principal, and examination in-charge at Khaja Bandanawaz College of Engineering (KBNCE), Kalaburagi, Karnataka. She is an advisory board member for IJESRT (International Journal of Engineering and Research Technology) and a member of the International Association of Engineers (IAENG). She can be contacted at email: ruksarf@gmail.com.