IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 2, FEBRUARY 2015 219
Low-Complexity Features for JPEG Steganalysis
Using Undecimated DCT
Vojtěch Holub and Jessica Fridrich, Member, IEEE
Abstract—This paper introduces a novel feature set for
steganalysis of JPEG images. The features are engineered as
first-order statistics of quantized noise residuals obtained from
the decompressed JPEG image using 64 kernels of the discrete
cosine transform (DCT) (the so-called undecimated DCT). This
approach can be interpreted as a projection model in the
JPEG domain, thus forming a counterpart to the projection
spatial rich model. The most appealing aspects of the proposed
feature set are its low computational complexity, its lower
dimensionality in comparison with other rich models, and its
competitive performance with respect to previously proposed
JPEG-domain steganalysis features.
Index Terms—Image, steganalysis, JPEG, DCT, features.
I. INTRODUCTION
STEGANALYSIS of JPEG images is an active and highly
relevant research topic due to the ubiquitous presence of
JPEG images on social networks, image sharing portals, and
in Internet traffic in general. There exist numerous stegano-
graphic algorithms specifically designed for the JPEG domain.
Such tools range from easy-to-use applications incorporat-
ing quite simplistic data hiding methods to advanced tools
designed to avoid detection by a sophisticated adversary.
According to the information provided by Wetstone Technolo-
gies, Inc, a company that keeps an up-to-date comprehensive
list of all software applications capable of hiding data in
electronic files, as of March 2014 a total of 349 applications
that hide data in JPEG images were available for download.¹
Historically, two different approaches to steganalysis have
been developed. One can start by adopting a model for the
statistical distribution of DCT coefficients in a JPEG file
and design the detector using tools of statistical hypothesis
testing [7], [30], [34]. In the second, much more common
approach, a representation of the image (a feature) is iden-
tified that reacts sensitively to embedding but does not vary
much due to image content. For some simple steganographic
methods that introduce easily identifiable artifacts, such as
Jsteg, it is often possible to identify a scalar feature – an
estimate of the payload length [4], [19], [31]–[33].
Manuscript received April 5, 2014; revised August 25, 2014; accepted
October 15, 2014. Date of publication October 23, 2014; date of current
version December 29, 2014. The work was supported by the Air Force Office
of Scientific Research, Arlington, VA, USA, under Grant FA9950-12-1-0124.
The associate editor coordinating the review of this manuscript and approving
it for publication was Prof. Hitoshi Kiya.
The authors are with the Department of Electrical and Computer Engi-
neering, Binghamton University, Binghamton, NY 13902 USA (e-mail:
vholub1@binghamton.edu; fridrich@binghamton.edu).
Digital Object Identifier 10.1109/TIFS.2014.2364918
¹Personal communication by Chet Hosmer, CEO of Wetstone Tech.
More sophisticated embedding algorithms usually require
higher-dimensional feature representation to obtain more accu-
rate detection. In this case, the detector is typically built
using machine learning through supervised training during
which the classifier is presented with features of cover as well
as stego images. Alternatively, a classifier can be trained
to recognize only cover images and mark all outliers as
suspected stego images [26], [28]. Recently, Ker and Pevný
proposed to shift the focus from identifying stego images
to identifying “guilty actors,” e.g., Facebook users, using
unsupervised clustering over actors in the feature space [17].
Irrespective of the chosen detection philosophy, the most
important component of a detector is the feature space –
the detection accuracy is directly tied to the ability of the
features to capture the steganographic embedding changes.
Selected examples of popular feature sets proposed for
detection of steganography in JPEG images are the historically
first image quality metric features [1], first-order statistics of
wavelet coefficients [8], Markov features formed by sample
intra-block conditional probabilities [29], inter- and intra-
block co-occurrences of DCT coefficients [6], the PEV feature
vector [27], inter- and intra-block co-occurrences calibrated
by difference and ratio [23], and the JPEG Rich
Model (JRM) [20]. Among the more general techniques that
were identified as improving the detection performance is the
calibration by difference and Cartesian calibration [18], [23].
By inspecting the literature on features for steganalysis, one
can observe a general trend – the features’ dimensionality
is increasing, a phenomenon elicited by developments in
steganography. More sophisticated steganographic schemes
avoid introducing easily detectable artifacts and more
information is needed to obtain better detection. To address
the increased complexity of detector training, simpler
machine learning tools were proposed that better scale w.r.t.
feature dimensionality, such as the FLD-ensemble [21] or
the perceptron [25]. Even with more efficient classifiers,
however, the obstacle that may prevent practical deployment
of high-dimensional features is the time needed to extract the
feature [3], [13], [16], [22].
In this article, we propose a novel feature set for
JPEG steganalysis, which enjoys low complexity, relatively
small dimension, yet provides competitive detection perfor-
mance across all tested JPEG steganographic algorithms.
The features are built as histograms of residuals obtained
using the basis patterns used in the DCT. The feature
extraction thus requires merely computing 64 convolutions of
the decompressed JPEG image with 64 8 × 8 kernels and
forming histograms. The features can also be interpreted
in the DCT domain, where their construction resembles
the PSRM with non-random orthonormal projection vectors.
Symmetries of these patterns are used to further compactify the
features and make them better populated. The proposed
features are called DCTR features (Discrete Cosine Transform
Residual).
In the next section, we introduce the undecimated DCT,
which is the first step in computing the DCTR features.
Here, we explain the essential properties of the undecimated
DCT and point out its relationship to calibration and other
previous art. The complete description of the proposed DCTR
feature set as well as experiments aimed at determining
the free parameters appear in Section III. In Section IV,
we report the detection accuracy of the DCTR feature set
on selected JPEG domain steganographic algorithms. The
results are contrasted with the performance obtained using
current state-of-the-art rich feature sets, including the JPEG
Rich Model and the Projection Spatial Rich Model. The
paper is concluded in Section V, where we discuss future
directions.
A condensed version of this paper was submitted to the
2014 IEEE International Workshop on Information Forensics
and Security (WIFS).
II. UNDECIMATED DCT
In this section, we describe the undecimated DCT and
study its properties relevant for building the DCTR feature
set in the next section. Since the vast majority of stegano-
graphic schemes embed data only in the luminance component,
we limit the scope of this paper to grayscale JPEG images.
For easier exposition, we will also assume that the size of all
images is a multiple of 8.
A. Description
Given an M × N grayscale image X ∈ R^{M×N}, the undecimated
DCT is defined as a set of 64 convolutions with the 64 DCT
basis patterns B^{(k,l)}:

U(X) = \{ U^{(k,l)} \mid 0 \le k, l \le 7 \}, \quad U^{(k,l)} = X \star B^{(k,l)},  (1)

where U^{(k,l)} ∈ R^{(M−7)×(N−7)} and '\star' denotes convolution
without padding. The DCT basis patterns are 8 × 8 matrices,
B^{(k,l)} = (B^{(k,l)}_{mn}), 0 ≤ m, n ≤ 7:

B^{(k,l)}_{mn} = \frac{w_k w_l}{4} \cos\frac{\pi k(2m+1)}{16} \cos\frac{\pi l(2n+1)}{16},  (2)

where w_0 = 1/\sqrt{2} and w_k = 1 for k > 0.
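As a concrete illustration, (1)–(2) can be implemented in a few lines. The sketch below is our own NumPy/SciPy illustration, not the authors' code; it uses cross-correlation, which for these cosine kernels reproduces the block DCT coefficients on the 8 × 8 sampling grid discussed in Section II-B:

```python
import numpy as np
from scipy.signal import correlate2d

def dct_basis():
    """The 64 8x8 DCT basis patterns B^(k,l) of Eq. (2)."""
    w = np.array([1.0 / np.sqrt(2)] + [1.0] * 7)
    m = np.arange(8)
    return {(k, l): (w[k] * w[l] / 4.0)
            * np.outer(np.cos(np.pi * k * (2 * m + 1) / 16),
                       np.cos(np.pi * l * (2 * m + 1) / 16))
            for k in range(8) for l in range(8)}

def undecimated_dct(X):
    """Eq. (1): 64 'valid' (no padding) filterings of the M x N image X;
    each output U^(k,l) has shape (M-7, N-7)."""
    return {kl: correlate2d(X, B, mode='valid')
            for kl, B in dct_basis().items()}
```

Subsampling any `U[(k, l)]` on the 8 × 8 grid then yields the unquantized block-DCT coefficients of mode (k, l), which is the property exploited in Section II-B.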
When the image is stored in the JPEG format, before
computing its undecimated DCT it is first decompressed to
the spatial domain without quantizing the pixel values to
{0, . . . , 255} to avoid any loss of information.
For better readability, from now on we will reserve the
indices i, j and k,l to index DCT modes (spatial frequencies);
they will always be in the range 0 ≤ i, j, k,l ≤ 7.
1) Relationship to Prior Art: The undecimated DCT has
already found applications in steganalysis. The concept of
calibration, first introduced in the targeted quantitative attack
on the F5 algorithm [9], formally consists of computing the
undecimated DCT, subsampling it on an 8 × 8 grid shifted
by four pixels in each direction, and computing a reference
feature vector from the subsampled and quantized signal.
Liu [23] made use of the entire transform by computing 63
inter- and intra-block 2D co-occurrences from all possible
JPEG grid shifts and averaging them to form a more powerful
reference feature that was used for calibration by difference
and by ratio. In contrast, in this paper we avoid using the
undecimated DCT to form a reference feature and instead
keep the statistics collected from all shifts separate.
B. Properties
First, notice that when subsampling the convolution
U^{(i,j)} = X \star B^{(i,j)} on the grid G_{8×8} = \{0, 8, 16, \ldots, M−8\} ×
\{0, 8, 16, \ldots, N−8\} (circles in Figure 1, left), one
obtains all unquantized values of the DCT coefficients for DCT
mode (i, j) that form the input into the JPEG representation
of X.
We will now take a look at how the values of the
undecimated DCT U(X) are affected by changing one DCT
coefficient of the JPEG representation of X. Suppose one
modifies a DCT coefficient in mode (k,l) in the JPEG file
corresponding to (m, n) ∈ G8×8. This change will affect all
8 × 8 pixels in the corresponding block and an entire 15 × 15
neighborhood of values in U(i, j) centered at (m, n) ∈ G8×8.
In particular, the values will be modified by what we call the
“unit response”

R^{(i,j)(k,l)} = B^{(i,j)} \otimes B^{(k,l)},  (3)

where \otimes denotes the full cross-correlation. While this unit
response is not symmetrical, its absolute values are symmetrical
about both axes: |R^{(i,j)(k,l)}_{a,b}| = |R^{(i,j)(k,l)}_{−a,b}| and
|R^{(i,j)(k,l)}_{a,b}| = |R^{(i,j)(k,l)}_{a,−b}| for all 0 ≤ a, b ≤ 7,
when indexing R ∈ R^{15×15} with indices in \{−7, \ldots, −1, 0, 1, \ldots, 7\}.
Figure 2 shows two examples of unit responses. Note that
the value at the center (0, 0) is zero for the response on the
left and 1 for the response on the right. This central value
equals 1 only when i = k and j = l.
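The unit responses are cheap to compute, and their claimed properties are easy to verify numerically. The following is an illustrative NumPy/SciPy sketch, not part of the reference implementation:

```python
import numpy as np
from scipy.signal import correlate2d

_w = np.array([1.0 / np.sqrt(2)] + [1.0] * 7)
_m = np.arange(8)

def basis(k, l):
    """8x8 DCT basis pattern B^(k,l), Eq. (2)."""
    return (_w[k] * _w[l] / 4.0) * np.outer(
        np.cos(np.pi * k * (2 * _m + 1) / 16),
        np.cos(np.pi * l * (2 * _m + 1) / 16))

def unit_response(i, j, k, l):
    """R^(i,j)(k,l): full cross-correlation of B^(i,j) with B^(k,l),
    Eq. (3); the paper's index (0, 0) sits at array position [7, 7]."""
    return correlate2d(basis(i, j), basis(k, l), mode='full')
```

A quick check confirms the 15 × 15 support, the central value δ_{(i,j),(k,l)}, and the axis symmetry of the absolute values stated above.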
We now take a closer look at how a particular value
u ∈ U(i, j) is computed. First, we identify the four neighbors
from the grid G8×8 that are closest to u (follow Figure 1
where the location of u is marked by a triangle). We will
capture the position of u w.r.t. its four closest neighbors
from G_{8×8} using relative coordinates. With respect to the
upper left neighbor (A), u is at position (a, b), 0 ≤ a, b ≤ 7
((a, b) = (3, 2) in Figure 1). The relative positions w.r.t. the
other three neighbors (B–D) are, correspondingly, (a, b − 8),
(a − 8, b), and (a − 8, b − 8). Also recall that the elements
of U(i, j) collected across all (i, j), 0 ≤ i, j ≤ 7, at A, form
all non-quantized DCT coefficients corresponding to the 8 × 8
block A (see again Figure 1).
Arranging the DCT coefficients from the neighboring blocks
A–D into 8×8 matrices Akl, Bkl, Ckl and Dkl, where k and l
Fig. 1. Left: Dots correspond to elements of U^{(i,j)} = X \star B^{(i,j)}; circles correspond to grid points from G_{8×8} (DCT coefficients in the JPEG representation
of X). The triangle is an element u ∈ U^{(i,j)} with relative coordinates (a, b) = (3, 2) w.r.t. its upper left neighbor (A) from G_{8×8}. Right: JPEG representation
of X when replacing each 8 × 8 pixel block with a block of quantized DCT coefficients.
denote the horizontal and vertical spatial frequencies in the
8 × 8 DCT block, respectively, u ∈ U^{(i,j)} can be expressed as

u = \sum_{k=0}^{7} \sum_{l=0}^{7} Q_{kl} \Big( A_{kl} R^{(i,j)(k,l)}_{a,b} + B_{kl} R^{(i,j)(k,l)}_{a,b−8} + C_{kl} R^{(i,j)(k,l)}_{a−8,b} + D_{kl} R^{(i,j)(k,l)}_{a−8,b−8} \Big),  (4)
where the subscripts of R^{(i,j)(k,l)}_{a,b} capture the position of u w.r.t.
its upper left neighbor and Q_{kl} is the quantization step of the
(k, l)-th DCT mode. This can be written as a projection of the
256 dequantized DCT coefficients from four adjacent blocks
of the JPEG file onto a projection vector p^{(i,j)}_{a,b}:
u = \big( Q_{00}A_{00}, \ldots, Q_{77}A_{77}, Q_{00}B_{00}, \ldots, Q_{77}B_{77}, Q_{00}C_{00}, \ldots, Q_{77}C_{77}, Q_{00}D_{00}, \ldots, Q_{77}D_{77} \big) \cdot p^{(i,j)}_{a,b},  (5)

where

p^{(i,j)}_{a,b} = \big( R^{(i,j)(0,0)}_{a,b}, \ldots, R^{(i,j)(7,7)}_{a,b}, R^{(i,j)(0,0)}_{a,b−8}, \ldots, R^{(i,j)(7,7)}_{a,b−8}, R^{(i,j)(0,0)}_{a−8,b}, \ldots, R^{(i,j)(7,7)}_{a−8,b}, R^{(i,j)(0,0)}_{a−8,b−8}, \ldots, R^{(i,j)(7,7)}_{a−8,b−8} \big)^{T}.
Fig. 2. Examples of two unit responses scaled so that medium gray
corresponds to zero.

It is proved in the Appendix that the projection vectors
form an orthonormal system satisfying, for all (a, b), (i, j),
and (k, l),

p^{(i,j)T}_{a,b} \cdot p^{(k,l)}_{a,b} = \delta_{(i,j),(k,l)},  (6)
where δ is the Kronecker delta. Projection vectors that are
too correlated (in the extreme case, linearly dependent) would
lead to undesirable redundancy (near duplication) of fea-
ture elements. Orthonormal (uncorrelated) projection vectors
increase features’ diversity and provide better dimensionality-
to-detection ratio.
The projection vectors also satisfy the following symmetry:

p^{(i,j)}_{a,b} = p^{(i,j)}_{a,b−8} = p^{(i,j)}_{a−8,b} = p^{(i,j)}_{a−8,b−8}  (7)

for all i, j and a, b, when the arithmetic operations on the
indices are interpreted mod 8.
III. DCTR FEATURES
The DCTR features are built by quantizing the absolute
values of all elements in the undecimated DCT and collecting
TABLE I
HISTOGRAMS h_{a,b} TO BE MERGED ARE LABELED WITH THE SAME
LETTER. ALL 64 HISTOGRAMS CAN THUS BE MERGED INTO 25.
LIGHT SHADING DENOTES MERGING OF FOUR HISTOGRAMS,
MEDIUM SHADING TWO HISTOGRAMS, AND DARK
SHADING DENOTES NO MERGING
the first-order statistic separately for each mode (k,l) and each
relative position (a, b), 0 ≤ a, b ≤ 7. Formally, for each
(k, l) we define the matrix² U^{(k,l)}_{a,b} ∈ R^{(M−8)/8 × (N−8)/8} as the
submatrix of U^{(k,l)} with elements whose relative coordinates
w.r.t. the upper left neighbor in the grid G_{8×8} are (a, b). Thus,
each U^{(k,l)} = \bigcup_{a,b=0}^{7} U^{(k,l)}_{a,b} and U^{(k,l)}_{a,b} \cap U^{(k,l)}_{a',b'} = \emptyset whenever
(a, b) ≠ (a', b'). The feature vector is formed by the normalized
histograms, for 0 ≤ k, l ≤ 7 and 0 ≤ a, b ≤ 7:

h^{(k,l)}_{a,b}(r) = \frac{1}{|U^{(k,l)}_{a,b}|} \sum_{u \in U^{(k,l)}_{a,b}} [Q_T(|u|/q) = r],  (8)

where Q_T is a quantizer with integer centroids \{0, 1, \ldots, T\},
q is the quantization step, and [P] is the Iverson bracket,
equal to 0 when the statement P is false and 1 when P is
true. We note that q could potentially depend on a and b as well
as on the DCT mode indices k, l and the JPEG quality factor
(see Section III-D for more discussion).
Because U^{(k,l)} = X \star B^{(k,l)} and the sum of all elements
of B^{(k,l)} is zero (they are DCT modes (2)), each
U^{(k,l)} is the output of a high-pass filter applied to X. For
natural images X, the distribution of u ∈ U^{(k,l)}_{a,b} will thus
be approximately symmetrical and centered at 0 for all a, b,
which allows us to work with the absolute values of u ∈ U^{(k,l)}_{a,b},
giving the features a lower dimension and making them better
populated.
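A minimal sketch of the histogram collection (8) may be helpful. This is an illustrative Python version, not the reference implementation; the rounding-and-clipping quantizer is our stand-in for Q_T, and the symmetrization of Table I is omitted here:

```python
import numpy as np
from scipy.signal import correlate2d

def dctr_histograms(X, q=4.0, T=4):
    """Unsymmetrized first-order DCTR statistics, Eq. (8): for every mode
    (k, l) and relative position (a, b), the normalized histogram of
    Q_T(|u| / q) with integer centroids {0, ..., T}."""
    w = np.array([1.0 / np.sqrt(2)] + [1.0] * 7)
    m = np.arange(8)
    h = {}
    for k in range(8):
        for l in range(8):
            B = (w[k] * w[l] / 4.0) * np.outer(
                np.cos(np.pi * k * (2 * m + 1) / 16),
                np.cos(np.pi * l * (2 * m + 1) / 16))
            U = correlate2d(X, B, mode='valid')        # (M-7) x (N-7)
            Q = np.minimum(np.round(np.abs(U) / q), T).astype(int)
            for a in range(8):
                for b in range(8):
                    sub = Q[a::8, b::8]                # submatrix U^(k,l)_{a,b}
                    h[(k, l, a, b)] = np.bincount(
                        sub.ravel(), minlength=T + 1) / sub.size
    return h
```

The 64 × 64 = 4096 histograms of T + 1 bins each are what the symmetrization step subsequently merges.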
Due to the symmetries of projection vectors (7), it is
possible to further decrease the feature dimensionality by
adding together the histograms corresponding to indices (a, b),
(a, 8−b), (8−a, b), and (8−a, 8−b) under the condition that
these indices stay within {0, . . . , 7}× {0, . . . , 7} (see Table I).
Note that for (a, b) ∈ \{1, 2, 3, 5, 6, 7\}², we merge four
histograms. When exactly one element of (a, b) is in \{0, 4\},
only two histograms are merged, and when both a and b
are in \{0, 4\} there is only one histogram. Thus, the total
dimensionality of the symmetrized feature vector is 64 ×
(36/4 + 24/2 + 4) × (T + 1) = 1600 × (T + 1).
²Since U^{(k,l)} ∈ R^{(M−7)×(N−7)}, the height (width) of U^{(k,l)}_{a,b} is larger by
one when a = 0 (b = 0).
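The merging rule and the resulting dimensionality can be verified by enumerating the orbits of (a, b). In the illustrative sketch below, the wrap (8 − a) mod 8 is our shorthand for discarding out-of-grid indices:

```python
def symmetry_classes():
    """Merge classes of Table I: (a, b), (a, 8-b), (8-a, b), (8-a, 8-b),
    where 8-a (8-b) falls outside the grid when a = 0 (b = 0), so the
    mod-8 wrap keeps all indices inside {0, ..., 7}^2."""
    seen, classes = set(), []
    for a in range(8):
        for b in range(8):
            if (a, b) in seen:
                continue
            orbit = {(x, y) for x in {a, (8 - a) % 8}
                            for y in {b, (8 - b) % 8}}
            seen |= orbit
            classes.append(orbit)
    return classes
```

The enumeration yields 9 classes of four histograms, 12 classes of two, and 4 singletons, i.e., 25 classes in total and, with T = 4, the 64 × 25 × 5 = 8000-dimensional feature vector.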
In the rest of this section, we provide experimental evi-
dence that working with absolute values and symmetrizing
the features indeed improves the detection accuracy. We also
experimentally determine the proper values of the threshold T
and the quantization step q, and evaluate the performance of
different parts of the DCTR feature vector w.r.t. the DCT mode
indices k,l.
A. Experimental Setup
All experiments in this section are carried out on BOSSbase
1.01 [2] containing 10,000 grayscale 512 × 512 images.
All detectors were trained as binary classifiers implemented
using the FLD ensemble [21] with default settings, available
from http://dde.binghamton.edu/download/ensemble.
As described in the original publication [21], the ensemble
by default minimizes the total classification error probability
under equal priors, P_E. The random subspace dimensionality
and the number of base learners are found by minimizing
the out-of-bag (OOB) estimate of the testing error, E_OOB,
on bootstrap samples of the training set. We also use E_OOB
to report the detection performance since it is an unbiased
estimate of the testing error on unseen data [5]. For experi-
ments in Sections III-B–III-E, the steganographic method was
J-UNIWARD at 0.4 bit per non-zero AC DCT coefficient
(bpnzAC) with JPEG quality factor 75. We selected this
steganographic method as an example of a state-of-the-art data
hiding method for the JPEG domain.
B. Symmetrization Validation
In this section, we experimentally validate the feature symmetrization.
We denote by E_OOB(X) the OOB error obtained
when using the feature set X. The histograms concatenated over the
DCT mode indices will be denoted as

h_{a,b} = \big( h^{(k,l)}_{a,b} \big)_{k,l=0}^{7}.  (9)
For every combination of indices (a, b), (c, d) ∈ \{0, \ldots, 7\}²,
we computed three types of error (the symbol '&' means
feature concatenation):

1) E^{Single}_{a,b} = E_OOB(h_{a,b});
2) E^{Concat}_{(a,b),(c,d)} = E_OOB(h_{a,b} & h_{c,d});
3) E^{Merged}_{(a,b),(c,d)} = E_OOB(h_{a,b} + h_{c,d});

to see the individual performance of the features across the
relative indices (a, b) as well as the impact of concatenating
and merging the features on detectability. In the following
experiments, we fixed q = 4 and T = 4. This gave each
feature h_{a,b} a dimensionality of 64 × (T + 1) = 320 (the
number of JPEG modes, 64, times the number of quantization
bins, T + 1 = 5).
Table II reports the individual performance of the
features h_{a,b}. Despite the rather low dimensionality of 320,
every h_{a,b} achieves a decent detection rate by itself (cf.
Figure 4 in Section IV).
The next experiment was aimed at assessing the loss of
detection accuracy when merging histograms corresponding
TABLE II
E^{Single}_{a,b} IS THE DETECTION OOB ERROR WHEN
STEGANALYZING WITH h_{a,b}

TABLE III
E^{Merged}_{(a,b),(c,d)} − E^{Concat}_{(a,b),(c,d)} FOR (a, b) AS A FUNCTION OF (c, d)

TABLE IV
E_OOB(h^{(k,l)}) AS A FUNCTION OF k, l
to different relative coordinates as opposed to concatenating
them. When this drop of accuracy is approximately zero, both
feature sets can be merged. Table III shows the detection
drop E^{Merged}_{(a,b),(c,d)} − E^{Concat}_{(a,b),(c,d)} when merging h_{1,2} with h_{c,d}
as a function of c, d. The results clearly show which features
should be merged; they are also consistent with the symmetries
analyzed in Section II-B.
C. Mode Performance Analysis
In this section, we analyze the performance of the DCTR
features by DCT modes when steganalyzing with the merger
h^{(k,l)} = \bigcup_{a,b=0}^{7} h^{(k,l)}_{a,b} of dimension 25 × (T + 1) = 125.
Table I explains why the total number of histograms can be
reduced from 64 to 25 by merging histograms for different
shifts a, b. Interestingly, as Table IV shows, for J-UNIWARD
Fig. 3. The effect of feature quantization without normalization (top charts)
and with normalization (bottom charts) on detection accuracy.
the histograms corresponding to high frequency modes provide
the same or better distinguishing power than those of low
frequencies.
D. Feature Quantization and Normalization
In this section, we investigate the effect of quantization and
feature normalization on the detection performance.
We carried out experiments for two quality factors,
75 and 95, and studied the effect of the quantization step q
on detection accuracy (the two top charts in Figure 3).
Additionally, we investigated whether it is advantageous,
prior to quantization, to normalize the features by
the DCT mode quantization step, Q_{kl}, and by scaling U^{(k,l)}
to zero mean and unit variance (the two bottom charts
in Figure 3).
Figure 3 shows that the effect of feature normalization is
quite weak; it appears slightly more advantageous not to
normalize the features and to keep the feature design simple.
The effect of the quantization step q is, however, much
stronger. For quality factor 75 (95), the optimal quantization
step was 4 (0.8). Thus, we opted for the following linear fit³
to obtain the proper value of q for an arbitrary quality factor
³Coincidentally, the term in the bracket corresponds to the multiplier used
for computing standard quantization matrices.
Fig. 4. Detection error EOOB for J-UNIWARD for quality factors 75 and 95
when steganalyzed with the proposed DCTR and other rich feature sets.
TABLE V
EOOB OF THE ENTIRE DCTR FEATURE SET WITH DIMENSIONALITY
1600 × (T + 1) AS A FUNCTION OF THE THRESHOLD T
FOR J-UNIWARD AT 0.4 BPNZAC
in the range 50 ≤ K ≤ 99:

q_K = 8 \times \left( 2 − \frac{K}{50} \right).  (10)
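In code, the fit (10) reads (illustrative sketch):

```python
def dctr_quant_step(K):
    """Quantization step q_K of Eq. (10) for quality factors 50 <= K <= 99;
    the bracketed term corresponds to the multiplier used for standard
    quantization matrices."""
    if not 50 <= K <= 99:
        raise ValueError("quality factor out of range")
    return 8.0 * (2.0 - K / 50.0)
```

This reproduces the two experimentally determined steps: q_75 = 4 and q_95 = 0.8.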
E. Threshold
As Table V shows, the detection performance is quite
insensitive to the threshold T. Although the best performance is
achieved with T = 6, the gain is negligible compared to
the dimensionality increase. Thus, in this paper we opted
for T = 4 as a good compromise between performance and
dimensionality.
Fig. 5. Detection error EOOB for UED with ternary embedding for quality
factors 75 and 95 when steganalyzed with the proposed DCTR and other rich
feature sets.
To summarize, the final form of the DCTR features includes the
symmetrization explained in Section III, no normalization,
quantization according to (10), and T = 4. This gives the
DCTR set a dimensionality of 8,000.
IV. EXPERIMENTS
In this section, we subject the newly proposed DCTR feature
set to tests on selected state-of-the-art JPEG steganographic
schemes as well as examples of older embedding schemes.
Additionally, we contrast the detection performance with that
of previously proposed feature sets. A separate classifier
is trained for each image source, embedding method, and
payload to expose the performance differences.
Figures 4, 5 and 6 show the detection error EOOB for
J-UNIWARD [14], ternary-coded UED (Uniform Embed-
ding Distortion) [12], and nsF5 [11] achieved using the
proposed DCTR, the JPEG Rich Model (JRM) [20] of
dimension 22,510, the 12,753-dimensional version of the
Spatial Rich Model called SRMQ1 [10], the merger of JRM
and SRMQ1 abbreviated as JSRM (dimension 35,263), and
the 12,870-dimensional Projection Spatial Rich Model with
Fig. 6. Detection error EOOB for nsF5 for quality factors 75 and 95 when
steganalyzed with the proposed DCTR and other rich feature sets.
quantization step 3 specially designed for the JPEG domain
(PSRMQ3) [13]. When interpreting the results, one needs
to take into account the fact that the DCTR has by far the
lowest dimensionality and computational complexity of all
tested feature sets.
The most significant improvement is seen for J-UNIWARD,
even though it remains very difficult to detect. Despite its com-
pactness and a significantly lower computational complexity,
the DCTR set is the best performer for the higher quality factor
and provides about the same level of detection as PSRMQ3 for
quality factor 75. For the ternary UED, the DCTR is the best
performer for the higher JPEG quality factor for all but the
largest tested payload. For quality factor 75, the much larger
35,263-dimensional JSRM gives a slightly better detection.
The DCTR also provides quite competitive detection for nsF5.
The detection accuracy is roughly at the same level as for the
22,510-dimensional JRM.
The DCTR feature set is also performing quite well
against the state-of-the-art side-informed JPEG algorithm
SI-UNIWARD [14] (Figure 7). On the other hand,
Fig. 7. Detection error EOOB for the side-informed SI-UNIWARD for quality
factors 75 and 95 when steganalyzed with the proposed DCTR and other rich
feature sets. Note the different scale of the y axis.
JSRM and JRM are better suited to detect NPQ [15] (Figure 8).
This is likely because NPQ introduces (weak) embedding
artifacts into the statistics of JPEG coefficients that are easier
to detect by the JRM, whose features are entirely built as
co-occurrences of JPEG coefficients. We also point out the
saturation of the detection error below 0.5 for quality factor
95 and small payloads for both schemes. This phenomenon,
which was explained in [14], is caused by the tendency of both
algorithms to place embedding changes into four specific DCT
coefficients.
In Table VI, we take a look at how complementary the
DCTR features are in comparison to the other rich models.
This experiment was run only for J-UNIWARD at 0.4 bpnzAC.
The DCTR seems to complement PSRMQ3 well, as this
20,870-dimensional merger achieves the best detection
of J-UNIWARD so far, decreasing E_OOB by more than 3% w.r.t.
PSRMQ3 alone. Next, we report on the computational
complexity of extracting the feature vector using Matlab
code. The extraction of the DCTR feature vector for one
BOSSbase image is twice as fast as JRM, ten times faster
Fig. 8. Detection error EOOB for the side-informed NPQ for quality factors
75 and 95 when steganalyzed with the proposed DCTR and other rich feature
sets.
TABLE VI
DETECTION OF J-UNIWARD AT PAYLOAD 0.4 BPNZAC WHEN MERGING
VARIOUS FEATURE SETS. THE TABLE ALSO SHOWS THE FEATURE
DIMENSIONALITY AND TIME REQUIRED TO EXTRACT A SINGLE
FEATURE FOR ONE BOSSBASE IMAGE ON AN INTEL I5
2.4 GHZ COMPUTER PLATFORM
than SRMQ1, and almost 200 times faster than the PSRMQ3.
Furthermore, a C++ (Matlab MEX) implementation takes
only 0.5–1 second.
V. CONCLUSION
This paper introduces a novel feature set for steganalysis
of JPEG images. Its name is DCTR because the features are
computed from noise residuals obtained using the 64 DCT
bases. Its main advantages over previous art are its relatively low
dimensionality (8,000) and a significantly lower computational
complexity while achieving competitive detection across
many JPEG algorithms. These qualities make DCTR a good
candidate for building practical steganography detectors and
for steganalysis applications where detection accuracy and
feature extraction time are critical.
The DCTR feature set utilizes the so-called undecimated
DCT. This transform has already found applications in
steganalysis in the past. In particular, the reference features
used in calibration are essentially computed from the undec-
imated DCT subsampled on an 8 × 8 grid shifted w.r.t. the
JPEG grid. The main point of this paper is the discovery that
the undecimated DCT contains much more information that is
quite useful for steganalysis.
In the spatial domain, the proposed feature set can be
interpreted as a family of one-dimensional co-occurrences
(histograms) of noise residuals obtained using kernels formed
by DCT bases. Furthermore, the feature set can also be viewed
in the JPEG domain as a projection-type model with orthonormal
projection vectors. Curiously, we were unable to improve
the detection performance by forming two-dimensional co-occurrences
instead of first-order statistics. This is likely
because neighboring elements in the undecimated DCT are
qualitatively different projections of DCT coefficients, which
makes them essentially independent.
We contrast the detection accuracy and computational
complexity of DCTR with four other rich models
when used for detection of five JPEG steganographic
methods, including two side-informed schemes. The
code for the DCTR feature vector is available from
http://dde.binghamton.edu/download/feature_extractors/ (note
for the reviewers: the code will be posted upon acceptance of
this manuscript).
Finally, we would like to mention that it is possible that
the DCTR feature set will be useful for forensic applications,
such as [24], since many feature sets originally designed for
steganalysis found applications in forensics. We consider this
as a possible future research direction.
APPENDIX
ORTHONORMALITY OF PROJECTION VECTORS IN
UNDECIMATED DCT
Here, we provide the proof of the orthonormality (6) of the
vectors p^{(i,j)}_{a,b} defined in (5). It will be useful to follow Figure 9 for
easier understanding. For each a, b, 0 ≤ a, b ≤ 7, the (i, j)th
DCT basis pattern B^{(i,j)}, positioned so that its upper left corner
has relative index (a, b), is split into four 8 × 8 subpatterns:
κ stands for cirκle, μ stands for diaμond, τ for τriangle, and
σ for σtar:

κ^{(i,j)}_{mn} = B^{(i,j)}_{m−a,n−b} for a ≤ m ≤ 7 and b ≤ n ≤ 7, and 0 otherwise,
Fig. 9. Diagram showing the auxiliary patterns κ (cirκle), μ (diaμond),
τ (τriangle), and σ (σtar). The black square outlines the position of the DCT
basis pattern B(i, j).
μ^{(i,j)}_{mn} = B^{(i,j)}_{m−a,8+n−b} for a ≤ m ≤ 7 and 0 ≤ n < b, and 0 otherwise,

τ^{(i,j)}_{mn} = B^{(i,j)}_{8+m−a,n−b} for 0 ≤ m < a and b ≤ n ≤ 7, and 0 otherwise,

σ^{(i,j)}_{mn} = B^{(i,j)}_{8+m−a,8+n−b} for 0 ≤ m < a and 0 ≤ n < b, and 0 otherwise.
In Figure 9 top, the four patterns are shown using four
different markers. The light-color markers correspond to zeros.
The first 64 elements of p^{(i,j)}_{a,b} are simply the projections of
κ^{(i,j)} onto the 64 patterns forming the DCT basis. The next
64 elements are the projections of μ^{(i,j)} onto the DCT basis, the
next 64 are the projections of τ^{(i,j)}, and the last 64 are the projections
of σ^{(i,j)}. We will denote these projections with the same Greek
letters but with a single index instead: (κ^{(i,j)}_1, \ldots, κ^{(i,j)}_{64}),
(μ^{(i,j)}_1, \ldots, μ^{(i,j)}_{64}), (τ^{(i,j)}_1, \ldots, τ^{(i,j)}_{64}), and (σ^{(i,j)}_1, \ldots, σ^{(i,j)}_{64}).
In terms of the introduced notation,

p^{(i,j)T}_{a,b} \cdot p^{(k,l)}_{a,b} = \sum_{r=1}^{64} \kappa^{(i,j)}_r \kappa^{(k,l)}_r + \sum_{r=1}^{64} \mu^{(i,j)}_r \mu^{(k,l)}_r + \sum_{r=1}^{64} \tau^{(i,j)}_r \tau^{(k,l)}_r + \sum_{r=1}^{64} \sigma^{(i,j)}_r \sigma^{(k,l)}_r.  (11)
Note that the sum κ^{(i,j)} + μ^{(i,j)} + τ^{(i,j)} + σ^{(i,j)} is the entire
DCT mode (i, j) split into four pieces and rearranged back
together to form an 8 × 8 block (Figure 9, bottom). For fixed
a, b, due to the orthonormality of the DCT modes (i, j) and (k, l),
κ^{(i,j)} + μ^{(i,j)} + τ^{(i,j)} + σ^{(i,j)} and κ^{(k,l)} + μ^{(k,l)} + τ^{(k,l)} + σ^{(k,l)}
are thus also orthonormal, and so are their projections onto the
DCT basis (because the DCT transform is orthonormal):

\sum_{r=1}^{64} \big( \kappa^{(i,j)}_r + \mu^{(i,j)}_r + \tau^{(i,j)}_r + \sigma^{(i,j)}_r \big) \big( \kappa^{(k,l)}_r + \mu^{(k,l)}_r + \tau^{(k,l)}_r + \sigma^{(k,l)}_r \big) = \delta_{(i,j),(k,l)}.  (12)
The orthonormality now follows from the fact that the LHS of (12) and the RHS of (11) have the exact same value, because the sum of every mixed term in (12) is zero (e.g., $\sum_{r=1}^{64} \kappa^{(i,j)}_r \tau^{(k,l)}_r = 0$, etc.). This is because the subpatterns $\kappa^{(i,j)}$ and $\tau^{(k,l)}$ have disjoint supports (their dot product in the spatial domain is 0, and thus the product in the DCT domain is also 0 because the DCT is orthonormal).
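The proof above can be checked numerically. The sketch below (again illustrative pure Python; `p_vector` and `subpatterns` are our names) projects the four subpatterns of a mode onto the 64 DCT basis patterns, concatenates the projections into the 256-dimensional vector $p^{(i,j)}_{a,b}$, and verifies that the dot product of two such vectors is the Kronecker delta $\delta_{(i,j),(k,l)}$:

```python
import math

def dct_basis(i, j):
    """Orthonormal 8x8 DCT basis pattern B^(i,j)."""
    w = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    return [[w(i) * w(j) / 4
             * math.cos(math.pi * i * (2 * m + 1) / 16)
             * math.cos(math.pi * j * (2 * n + 1) / 16)
             for n in range(8)] for m in range(8)]

BASIS = [dct_basis(i, j) for i in range(8) for j in range(8)]

def subpatterns(i, j, a, b):
    """The four subpatterns kappa, mu, tau, sigma of mode (i, j) at (a, b)."""
    B = dct_basis(i, j)
    pats = [[[0.0] * 8 for _ in range(8)] for _ in range(4)]
    for m in range(8):
        for n in range(8):
            which = 2 * (m < a) + (n < b)   # 0: kappa, 1: mu, 2: tau, 3: sigma
            pats[which][m][n] = B[(m - a) % 8][(n - b) % 8]
    return pats

def p_vector(i, j, a, b):
    """256-dim vector: projections of the four subpatterns onto the DCT basis."""
    p = []
    for P in subpatterns(i, j, a, b):
        for Q in BASIS:
            p.append(sum(P[m][n] * Q[m][n] for m in range(8) for n in range(8)))
    return p

dot = lambda u, v: sum(x * y for x, y in zip(u, v))

# Orthonormality: unit norm for identical modes, zero for distinct modes.
a, b = 3, 2
p1 = p_vector(2, 5, a, b)
p3 = p_vector(4, 1, a, b)
assert abs(dot(p1, p1) - 1.0) < 1e-9
assert abs(dot(p1, p3)) < 1e-9
```

The check mirrors the argument exactly: each projection preserves dot products because the DCT basis is orthonormal, and the mixed terms vanish because the subpatterns' supports are disjoint.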
ACKNOWLEDGMENT
The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of AFOSR or the U.S. Government.
REFERENCES
[1] I. Avcibas, N. D. Memon, and B. Sankur, “Steganalysis of watermark-
ing techniques using image quality metrics,” Proc. SPIE, vol. 4314,
pp. 523–531, Jan. 2001.
[2] P. Bas, T. Filler, and T. Pevný, “‘Break our steganographic system’: The
ins and outs of organizing BOSS,” in Proc. 13th Int. Conf. Inf. Hiding,
Prague, Czech Republic, May 2011, pp. 59–70.
[3] S. Bayram, A. E. Dirik, H. T. Sencar, and N. Memon, “An ensemble
of classifiers approach to steganalysis,” in Proc. 20th Int. Conf. Pattern
Recognit. (ICPR), Istanbul, Turkey, Aug. 2010, pp. 4376–4379.
[4] R. Böhme, “Weighted stego-image steganalysis for JPEG covers,” in
Proc. 10th Int. Workshop Inf. Hiding, vol. 5284, pp. 178–194, Jun. 2007.
[5] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2,
pp. 123–140, Aug. 1996.
[6] C. Chen and Y. Q. Shi, “JPEG image steganalysis utilizing both
intrablock and interblock correlations,” in Proc. IEEE Int. Symp. Circuits
Syst. (ISCAS), Seattle, WA, USA, May 2008, pp. 3029–3032.
[7] R. Cogranne and F. Retraint, “Application of hypothesis testing theory
for optimal detection of LSB matching data hiding,” Signal Process.,
vol. 93, no. 7, pp. 1724–1737, Jul. 2013.
[8] H. Farid and L. Siwei, “Detecting hidden messages using higher-order
statistics and support vector machines,” in Proc. 5th Int. Workshop Inf.
Hiding, Oct. 2002, pp. 340–354.
[9] J. Fridrich, M. Goljan, and D. Hogea, “Steganalysis of JPEG images:
Breaking the F5 algorithm,” in Proc. 5th Int. Workshop Inf. Hiding,
Oct. 2002, pp. 310–323.
[10] J. Fridrich and J. Kodovský, “Rich models for steganalysis of digital images,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 3, pp. 868–882, Jun. 2012.
[11] J. Fridrich, T. Pevný, and J. Kodovský, “Statistically undetectable JPEG steganography: Dead ends, challenges, and opportunities,” in Proc. 9th ACM Multimedia Security Workshop, Sep. 2007, pp. 3–14.
[12] L. Guo, J. Ni, and Y.-Q. Shi, “An efficient JPEG steganographic
scheme using uniform embedding,” in Proc. 4th IEEE Int. Workshop
Inf. Forensics Security, Tenerife, Spain, Dec. 2012, pp. 169–174.
[13] V. Holub and J. Fridrich, “Random projections of residuals for digital
image steganalysis,” IEEE Trans. Inf. Forensics Security, vol. 8, no. 12,
pp. 1996–2006, Dec. 2013.
[14] V. Holub and J. Fridrich, “Universal distortion design for steganography
in an arbitrary domain,” EURASIP J. Inf. Security, vol. 2014, no. 1,
pp. 1–13, 2014.
[15] F. Huang, J. Huang, and Y.-Q. Shi, “New channel selection rule for
JPEG steganography,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 4,
pp. 1181–1191, Aug. 2012.
[16] A. D. Ker, “Implementing the projected spatial rich features on a GPU,”
Proc. SPIE, vol. 9028, pp. 1801–1810, Feb. 2014.
[17] A. D. Ker and T. Pevný, “Identifying a steganographer in
realistic and heterogeneous data sets,” Proc. SPIE, vol. 8303,
pp. 83030N-1–83030N-13, Jan. 2012.
[18] A. D. Ker and T. Pevný, “Calibration revisited,” in Proc. 11th ACM
Multimedia Security Workshop, Sep. 2009, pp. 63–74.
[19] J. Kodovský and J. Fridrich, “Quantitative structural steganalysis of
Jsteg,” IEEE Trans. Inf. Forensics Security, vol. 5, no. 4, pp. 681–693,
Dec. 2010.
[20] J. Kodovský and J. Fridrich, “Steganalysis of JPEG images using rich
models,” Proc. SPIE, vol. 8303, pp. 83030A-1–83030A-13, Jan. 2012.
[21] J. Kodovský, J. Fridrich, and V. Holub, “Ensemble classifiers for
steganalysis of digital media,” IEEE Trans. Inf. Forensics Security, vol. 7,
no. 2, pp. 432–444, Apr. 2012.
[22] L. Li, H. T. Sencar, and N. Memon, “A cost-effective deci-
sion tree based approach to steganalysis,” Proc. SPIE, vol. 8665,
pp. 86650P-1–86650P-7, Feb. 2013.
[23] Q. Liu, “Steganalysis of DCT-embedding based adaptive steganogra-
phy and YASS,” in Proc. 13th ACM Multimedia Security Workshop,
Sep. 2011, pp. 77–86.
[24] Q. Liu and Z. Chen, “Improved approaches to steganalysis and seam-carved forgery detection in JPEG images,” ACM Trans. Intell. Syst. Technol., vol. 5, no. 4, pp. 39:1–39:30, 2014.
[25] I. Lubenko and A. D. Ker, “Going from small to large data in steganaly-
sis,” Proc. SPIE, vol. 8303, pp. 83030M-1–83030M-10, Jan. 2012.
[26] S. Lyu and H. Farid, “Steganalysis using higher-order image statis-
tics,” IEEE Trans. Inf. Forensics Security, vol. 1, no. 1, pp. 111–119,
Mar. 2006.
[27] T. Pevný and J. Fridrich, “Merging Markov and DCT fea-
tures for multi-class JPEG steganalysis,” Proc. SPIE, vol. 6505,
pp. 650503-1–650503-14, Feb. 2007.
[28] T. Pevný and J. Fridrich, “Novelty detection in blind steganaly-
sis,” in Proc. 10th ACM Multimedia Security Workshop, Sep. 2008,
pp. 167–176.
[29] Y. Q. Shi, C. Chen, and W. Chen, “A Markov process based approach
to effective attacking JPEG steganography,” in Proc. 8th Int. Workshop
Inf. Hiding, Jul. 2006, pp. 249–264.
[30] T. H. Thai, R. Cogranne, and F. Retraint, “Statistical model of quantized
DCT coefficients: Application in the steganalysis of Jsteg algorithm,”
IEEE Trans. Image Process., vol. 23, no. 5, pp. 1980–1993, May 2014.
[31] A. Westfeld, “Generic adoption of spatial steganalysis to trans-
formed domain,” in Proc. 10th Int. Workshop Inf. Hiding, Jun. 2007,
pp. 161–177.
[32] A. Westfeld and A. Pfitzmann, “Attacks on steganographic systems,” in
Proc. 3rd Int. Workshop Inf. Hiding, Sep./Oct. 1999, pp. 61–75.
[33] T. Zhang and X. Ping, “A fast and effective steganalytic technique
against JSteg-like algorithms,” in Proc. ACM Symp. Appl. Comput.,
Melbourne, FL, USA, Mar. 2003, pp. 307–311.
[34] C. Zitzmann, R. Cogranne, L. Fillatre, I. Nikiforov, F. Retraint,
and P. Cornu, “Hidden information detection based on quantized
Laplacian distribution,” in Proc. IEEE ICASSP, Kyoto, Japan, Mar. 2012,
pp. 1793–1796.
Vojtěch Holub is currently a Research and Development Engineer with Digimarc Corporation, Beaverton, OR, USA. He received the Ph.D. degree from the Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA, in 2014. The main focus of his dissertation was on steganalysis and steganography. He received the M.S. degree in software engineering from Czech Technical University in Prague, Prague, Czech Republic, in 2010.
Jessica Fridrich (M’05) is currently a Professor of Electrical and Computer Engineering with Binghamton University, Binghamton, NY, USA. She received the Ph.D. degree in systems science from Binghamton University, in 1995, and the M.S. degree in applied mathematics from Czech Technical University, Prague, Czech Republic, in 1987. Her main interests are in steganography, steganalysis, digital watermarking, and digital image forensics. Her research work has been generously supported by the U.S. Air Force and the Air Force Office of Scientific Research. Since 1995, she has received 19 research grants totaling over $9 million for projects on data embedding and steganalysis that led to over 160 papers and seven U.S. patents. She is a member of the Association for Computing Machinery.


  • 1. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 2, FEBRUARY 2015 219 Low-Complexity Features for JPEG Steganalysis Using Undecimated DCT Vojtˇech Holub and Jessica Fridrich, Member, IEEE Abstract—This paper introduces a novel feature set for steganalysis of JPEG images. The features are engineered as first-order statistics of quantized noise residuals obtained from the decompressed JPEG image using 64 kernels of the discrete cosine transform (DCT) (the so-called undecimated DCT). This approach can be interpreted as a projection model in the JPEG domain, forming thus a counterpart to the projection spatial rich model. The most appealing aspect of this proposed steganalysis feature set is its low computational complexity, lower dimensionality in comparison with other rich models, and a competitive performance with respect to previously proposed JPEG domain steganalysis features. Index Terms—Image, steganalysis, JPEG, DCT, features. I. INTRODUCTION STEGANALYSIS of JPEG images is an active and highly relevant research topic due to the ubiquitous presence of JPEG images on social networks, image sharing portals, and in Internet traffic in general. There exist numerous stegano- graphic algorithms specifically designed for the JPEG domain. Such tools range from easy-to-use applications incorporat- ing quite simplistic data hiding methods to advanced tools designed to avoid detection by a sophisticated adversary. According to the information provided by Wetstone Technolo- gies, Inc, a company that keeps an up-to-date comprehensive list of all software applications capable of hiding data in electronic files, as of March 2014 a total of 349 applications that hide data in JPEG images were available for download. 1 Historically, two different approaches to steganalysis have been developed. 
One can start by adopting a model for the statistical distribution of DCT coefficients in a JPEG file and design the detector using tools of statistical hypothesis testing [7], [30], [34]. In the second, much more common approach, a representation of the image (a feature) is iden- tified that reacts sensitively to embedding but does not vary much due to image content. For some simple steganographic methods that introduce easily identifiable artifacts, such as Jsteg, it is often possible to identify a scalar feature – an estimate of the payload length [4], [19], [31]–[33]. Manuscript received April 5, 2014; revised August 25, 2014; accepted October 15, 2014. Date of publication October 23, 2014; date of current version December 29, 2014. The work was supported by the Air Force Office of Scientific Research, Arlington, VA, USA, under Grant FA9950-12-1-0124. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Hitoshi Kiya. The authors are with the Department of Electrical and Computer Engi- neering, Binghamton University, Binghamton, NY 13902 USA (e-mail: vholub1@binghamton.edu; fridrich@binghamton.edu). Digital Object Identifier 10.1109/TIFS.2014.2364918 1Personal communication by Chet Hosmer, CEO of Wetstone Tech. More sophisticated embedding algorithms usually require higher-dimensional feature representation to obtain more accu- rate detection. In this case, the detector is typically built using machine learning through supervised training during which the classifier is presented with features of cover as well as stego images. Alternatively, the classifier can be trained that recognizes only cover images and marks all outliers as suspected stego images [26], [28]. Recently, Ker and Pevný proposed to shift the focus from identifying stego images to identifying “guilty actors,” e.g., Facebook users, using unsupervised clustering over actors in the feature space [17]. 
Irrespectively of the chosen detection philosophy, the most important component of the detectors is the feature space – their detection accuracy is directly tied to the ability of the features to capture the steganographic embedding changes. Selected examples of popular feature sets proposed for detection of steganography in JPEG images are the historically first image quality metric features [1], first-order statistics of wavelet coefficients [8], Markov features formed by sample intra-block conditional probabilities [29], inter- and intra- block co-occurrences of DCT coefficients [6], the PEV feature vector [27], inter and intra-block co-occurrences calibrated by difference and ratio [23], and the JPEG Rich Model (JRM) [20]. Among the more general techniques that were identified as improving the detection performance is the calibration by difference and Cartesian calibration [18], [23]. By inspecting the literature on features for steganalysis, one can observe a general trend – the features’ dimensionality is increasing, a phenomenon elicited by developments in steganography. More sophisticated steganographic schemes avoid introducing easily detectable artifacts and more information is needed to obtain better detection. To address the increased complexity of detector training, simpler machine learning tools were proposed that better scale w.r.t. feature dimensionality, such as the FLD-ensemble [21] or the perceptron [25]. Even with more efficient classifiers, however, the obstacle that may prevent practical deployment of high-dimensional features is the time needed to extract the feature [3], [13], [16], [22]. In this article, we propose a novel feature set for JPEG steganalysis, which enjoys low complexity, relatively small dimension, yet provides competitive detection perfor- mance across all tested JPEG steganographic algorithms. The features are built as histograms of residuals obtained using the basis patterns used in the DCT. 
The feature extraction thus requires computing mere 64 convolutions of the decompressed JPEG image with 64 8 × 8 kernels and 1556-6013 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. For More Details Contact G.Venkat Rao PVR TECHNOLOGIES 8143271457
  • 2. 220 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 2, FEBRUARY 2015 forming histograms. The features can also be interpreted in the DCT domain, where their construction resembles the PSRM with non-random orthonormal projection vectors. Symmetries of these patterns are used to further compactify the features and make them better populated. The proposed features are called DCTR features (Discrete Cosine Transform Residual). In the next section, we introduce the undecimated DCT, which is the first step in computing the DCTR features. Here, we explain the essential properties of the undecimated DCT and point out its relationship to calibration and other previous art. The complete description of the proposed DCTR feature set as well as experiments aimed at determining the free parameters appear in Section III. In Section IV, we report the detection accuracy of the DCTR feature set on selected JPEG domain steganographic algorithms. The results are contrasted with the performance obtained using current state-of-the-art rich feature sets, including the JPEG Rich Model and the Projection Spatial Rich Model. The paper is concluded in Section V, where we discuss future directions. A condensed version of this paper was submitted to the IEEE Workshop on Information Security and Foren- sics (WIFS) 2014. II. UNDECIMATED DCT In this section, we describe the undecimated DCT and study its properties relevant for building the DCTR feature set in the next section. Since the vast majority of stegano- graphic schemes embed data only in the luminance component, we limit the scope of this paper to grayscale JPEG images. For easier exposition, we will also assume that the size of all images is a multiple of 8. A. 
Description Given an M × N grayscale image X ∈ RM×N , the undeci- mated DCT is defined as a set of 64 convolutions with 64 DCT basis patterns B(k,l): U(X) = {U(k,l) |0 ≤ k,l ≤ 7} U(k,l) = X B(k,l) , (1) where U(k,l) ∈ R(M−7)×(N−7) and ‘ ’ denotes a convolution without padding. The DCT basis patterns are 8 × 8 matrices, B(k,l) = (B (k,l) mn ), 0 ≤ m, n ≤ 7: B(k,l) mn = wkwl 4 cos πk(2m + 1) 16 cos πl(2n + 1) 16 , (2) and w0 = 1/ √ 2, wk = 1 for k > 0. When the image is stored in the JPEG format, before computing its undecimated DCT it is first decompressed to the spatial domain without quantizing the pixel values to {0, . . . , 255} to avoid any loss of information. For better readability, from now on we will reserve the indices i, j and k,l to index DCT modes (spatial frequencies); they will always be in the range 0 ≤ i, j, k,l ≤ 7. 1) Relationship to Prior Art: The undecimated DCT has already found applications in steganalysis. The concept of cali- bration, for the first time introduced in the targeted quantitative attack on the F5 algorithm [9], formally consists of computing the undecimated DTC, subsampling it on an 8×8 grid shifted by four pixels in each direction, and computing a reference feature vector from the subsampled and quantized signal. Liu [23] made use of the entire transform by computing 63 inter- and intra-block 2D co-occurrences from all possible JPEG grid shifts and averaging them to form a more powerful reference feature that was used for calibration by difference and by ratio. In contrast, in this paper we avoid using the undecimated DCT to form a reference feature, and, instead keep the statistics collected from all shifts separated. B. Properties First, notice that when subsampling the convolution U(i, j) = X B(i, j) on the grid G8×8 = {0, 7, 15, . . . , M − 9} × {0, 7, 15, . . 
., N − 9} (circles in Figure 1 on the left), one obtains all unquantized values of DCT coefficients for DCT mode (i, j) that form the input into the JPEG representation of X. We will now take a look at how the values of the undecimated DCT U(X) are affected by changing one DCT coefficient of the JPEG representation of X. Suppose one modifies a DCT coefficient in mode (k,l) in the JPEG file corresponding to (m, n) ∈ G8×8. This change will affect all 8 × 8 pixels in the corresponding block and an entire 15 × 15 neighborhood of values in U(i, j) centered at (m, n) ∈ G8×8. In particular, the values will be modified by what we call the “unit response” R(i, j)(k,l) = B(i, j) ⊗ B(k,l) , (3) where ⊗ denotes the full cross-correlation. While this unit response is not symmetrical, its absolute values are symmet- rical by both axes: |R (i, j)(k,l) a,b | = |R (i, j)(k,l) −a,b |, |R (i, j)(k,l) a,b | = |R (i, j)(k,l) a,−b | for all 0 ≤ a, b ≤ 7 when indexing R ∈ R15×15 with indices in {−7, . . . , −1, 0, 1, . . . , 7}. Figure 2 shows two examples of unit responses. Note that the value at the center (0, 0) is zero for the response on the left and 1 for the response on the right. This central value equals to 1 only when i = k and j = l. We now take a closer look at how a particular value u ∈ U(i, j) is computed. First, we identify the four neighbors from the grid G8×8 that are closest to u (follow Figure 1 where the location of u is marked by a triangle). We will capture the position of u w.r.t. to its four closest neighbors from G8×8 using relative coordinates. With respect to the upper left neighbor (A), u is at position (a, b), 0 ≤ a, b, ≤ 7 ((a, b) = (3, 2) in Figure 1). The relative positions w.r.t. the other three neighbors (B–D) are, correspondingly, (a, b − 8), (a − 8, b), and (a − 8, b − 8). 
Also recall that the elements of U^(i,j), collected across all (i, j), 0 ≤ i, j ≤ 7, at A form all non-quantized DCT coefficients corresponding to the 8 × 8 block A (see, again, Figure 1).

Fig. 1. Left: Dots correspond to elements of U^(i,j) = X ⋆ B^(i,j); circles correspond to grid points from G_{8×8} (DCT coefficients in the JPEG representation of X). The triangle is an element u ∈ U^(i,j) with relative coordinates (a, b) = (3, 2) w.r.t. its upper left neighbor (A) from G_{8×8}. Right: JPEG representation of X when replacing each 8 × 8 pixel block with a block of quantized DCT coefficients.

Arranging the DCT coefficients from the neighboring blocks A–D into 8 × 8 matrices A_kl, B_kl, C_kl, and D_kl, where k and l denote the horizontal and vertical spatial frequencies in the 8 × 8 DCT block, respectively, u ∈ U^(i,j) can be expressed as

u = Σ_{k,l=0}^{7} Q_kl ( A_kl R^(i,j)(k,l)_{a,b} + B_kl R^(i,j)(k,l)_{a,b−8} + C_kl R^(i,j)(k,l)_{a−8,b} + D_kl R^(i,j)(k,l)_{a−8,b−8} ),   (4)

where the subscripts in R^(i,j)(k,l)_{a,b} capture the position of u w.r.t. its upper left neighbor and Q_kl is the quantization step of the (k, l)-th DCT mode. This can be written as a projection of the 256 dequantized DCT coefficients from the four adjacent blocks in the JPEG file onto a projection vector p^(i,j)_{a,b}:

u = ( Q_00 A_00, …, Q_77 A_77, Q_00 B_00, …, Q_77 B_77, Q_00 C_00, …, Q_77 C_77, Q_00 D_00, …, Q_77 D_77 ) · p^(i,j)_{a,b},   (5)

where p^(i,j)_{a,b} = ( R^(i,j)(0,0)_{a,b}, …, R^(i,j)(7,7)_{a,b}, R^(i,j)(0,0)_{a,b−8}, …, R^(i,j)(7,7)_{a,b−8}, R^(i,j)(0,0)_{a−8,b}, …, R^(i,j)(7,7)_{a−8,b}, R^(i,j)(0,0)_{a−8,b−8}, …, R^(i,j)(7,7)_{a−8,b−8} ).

Fig. 2. Examples of two unit responses, scaled so that medium gray corresponds to zero.

It is proved in the Appendix that the projection vectors form an orthonormal system satisfying, for all (a, b), (i, j), and (k, l),

p^(i,j)T_{a,b} · p^(k,l)_{a,b} = δ_{(i,j),(k,l)},   (6)

where δ is the Kronecker delta. Projection vectors that are too correlated (in the extreme case, linearly dependent) would lead to undesirable redundancy (near duplication) of feature elements.
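The orthonormality (6) can be checked numerically by building p^(i,j)_{a,b} via the construction used in the Appendix: place B^(i,j) at offset (a, b) on a 16 × 16 canvas spanning the four blocks A–D and project each 8 × 8 quadrant onto the DCT basis. A minimal sketch (function names and the entry ordering are ours; it reproduces p^(i,j)_{a,b} of Eq. (5) up to a permutation of its 256 entries):

```python
import numpy as np

def dct_basis(k, l):
    """8x8 DCT basis pattern B^(k,l) from Eq. (2)."""
    w = lambda t: 1.0 / np.sqrt(2.0) if t == 0 else 1.0
    m = np.arange(8)
    return (w(k) * w(l) / 4.0) * np.outer(np.cos(np.pi * k * (2*m + 1) / 16),
                                          np.cos(np.pi * l * (2*m + 1) / 16))

def projection_vector(i, j, a, b):
    """p^(i,j)_{a,b}: B^(i,j) placed at offset (a,b) on a 16x16 canvas
    covering blocks A-D; each 8x8 quadrant is projected onto the 64 DCT
    patterns (the kappa/mu/tau/sigma construction of the Appendix)."""
    canvas = np.zeros((16, 16))
    canvas[a:a+8, b:b+8] = dct_basis(i, j)
    quads = (canvas[:8, :8], canvas[:8, 8:], canvas[8:, :8], canvas[8:, 8:])
    return np.array([np.sum(q * dct_basis(k, l))
                     for q in quads for k in range(8) for l in range(8)])

# Eq. (6): for a fixed relative position (a,b), the 64 projection
# vectors form an orthonormal system.
a, b = 3, 2
P = np.stack([projection_vector(i, j, a, b)
              for i in range(8) for j in range(8)])
assert np.allclose(P @ P.T, np.eye(64))
```

Because each quadrant is expanded in a complete orthonormal basis, inner products of projection vectors equal inner products of the disjointly supported canvas patterns, which is exactly the argument of the Appendix.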
Orthonormal (uncorrelated) projection vectors increase the features' diversity and provide a better dimensionality-to-detection ratio. The projection vectors also satisfy the following symmetry for all i, j and a, b, when interpreting the arithmetic operations on the indices as mod 8:

p^(i,j)_{a,b} = p^(i,j)_{a,b−8} = p^(i,j)_{a−8,b} = p^(i,j)_{a−8,b−8}.   (7)

III. DCTR FEATURES

The DCTR features are built by quantizing the absolute values of all elements in the undecimated DCT and collecting
the first-order statistics separately for each mode (k, l) and each relative position (a, b), 0 ≤ a, b ≤ 7. Formally, for each (k, l) we define the matrix² U^(k,l)_{a,b} ∈ R^{(M−8)/8 × (N−8)/8} as the submatrix of U^(k,l) with elements whose relative coordinates w.r.t. the upper left neighbor in the grid G_{8×8} are (a, b). Thus, U^(k,l) = ∪_{a,b=0}^{7} U^(k,l)_{a,b} and U^(k,l)_{a,b} ∩ U^(k,l)_{a′,b′} = ∅ whenever (a, b) ≠ (a′, b′).

TABLE I. HISTOGRAMS h_{a,b} TO BE MERGED ARE LABELED WITH THE SAME LETTER. ALL 64 HISTOGRAMS CAN THUS BE MERGED INTO 25. LIGHT SHADING DENOTES MERGING OF FOUR HISTOGRAMS, MEDIUM SHADING TWO HISTOGRAMS, AND DARK SHADING DENOTES NO MERGING

The feature vector is formed by the normalized histograms, for 0 ≤ k, l ≤ 7 and 0 ≤ a, b ≤ 7:

h^(k,l)_{a,b}(r) = (1/|U^(k,l)_{a,b}|) Σ_{u ∈ U^(k,l)_{a,b}} [Q_T(|u|/q) = r],   (8)

where Q_T is a quantizer with integer centroids {0, 1, . . . , T}, q is the quantization step, and [P] is the Iverson bracket, equal to 0 when the statement P is false and 1 when P is true. We note that q could potentially depend on a, b as well as on the DCT mode indices k, l and the JPEG quality factor (see Section III-D for more discussion).

Because U^(k,l) = X ⋆ B^(k,l) and the sum of all elements of B^(k,l) is zero (they are DCT modes (2); the exception is the DC mode B^(0,0)), each U^(k,l) is the output of a high-pass filter applied to X. For natural images X, the distribution of u ∈ U^(k,l)_{a,b} will thus be approximately symmetrical and centered at 0 for all a, b, which allows us to work with the absolute values of u ∈ U^(k,l)_{a,b}, giving the features a lower dimension and making them better populated. Due to the symmetries of the projection vectors (7), it is possible to further decrease the feature dimensionality by adding together the histograms corresponding to indices (a, b), (a, 8−b), (8−a, b), and (8−a, 8−b), under the condition that these indices stay within {0, . . . , 7} × {0, . . . , 7} (see Table I).
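The histogram (8) reduces to a few lines of NumPy. This sketch assumes the common round-and-clip realization of the quantizer Q_T (the text specifies only its integer centroids {0, . . . , T}) and a JPEG grid at {0, 8, 16, . . .}; `dctr_histogram` is our name:

```python
import numpy as np

def dctr_histogram(Ukl, a, b, q=4.0, T=4):
    """Normalized histogram h^(k,l)_{a,b} of Eq. (8).

    Ukl    : the (M-7) x (N-7) plane U^(k,l) of the undecimated DCT.
    (a, b) : relative position w.r.t. the upper-left grid neighbor, i.e.
             the elements Ukl[8*i + a, 8*j + b].
    """
    sub = Ukl[a::8, b::8]                     # the submatrix U^(k,l)_{a,b}
    # Q_T: quantize |u|/q to the nearest centroid in {0, ..., T}
    r = np.clip(np.round(np.abs(sub) / q), 0, T).astype(int)
    h = np.bincount(r.ravel(), minlength=T + 1).astype(float)
    return h / h.sum()                        # normalize by |U^(k,l)_{a,b}|

# toy usage on a random high-pass residual plane (512x512 image)
Ukl = np.random.randn(505, 505) * 6.0
h = dctr_histogram(Ukl, a=3, b=2)
assert h.shape == (5,) and np.isclose(h.sum(), 1.0)
```

The slicing `Ukl[a::8, b::8]` also reproduces footnote 2: when a = 0 (b = 0), the submatrix is one row (column) taller (wider).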
Note that for (a, b) ∈ {1, 2, 3, 5, 6, 7}², we merge four histograms. When exactly one element of (a, b) is in {0, 4}, only two histograms are merged, and when both a and b are in {0, 4}, there is only one histogram. Thus, the total dimensionality of the symmetrized feature vector is 64 × (36/4 + 24/2 + 4) × (T + 1) = 1600 × (T + 1).

² Since U^(k,l) ∈ R^{(M−7)×(N−7)}, the height (width) of U^(k,l)_{a,b} is larger by one when a = 0 (b = 0).

In the rest of this section, we provide experimental evidence that working with absolute values and symmetrizing the features indeed improves the detection accuracy. We also experimentally determine the proper values of the threshold T and the quantization step q, and evaluate the performance of different parts of the DCTR feature vector w.r.t. the DCT mode indices k, l.

A. Experimental Setup

All experiments in this section are carried out on BOSSbase 1.01 [2], containing 10,000 grayscale 512 × 512 images. All detectors were trained as binary classifiers implemented using the FLD ensemble [21] with default settings, available from http://dde.binghamton.edu/download/ensemble. As described in the original publication [21], the ensemble by default minimizes the total classification error probability under equal priors, P_E. The random subspace dimensionality and the number of base learners are found by minimizing the out-of-bag (OOB) estimate of the testing error, E_OOB, on bootstrap samples of the training set. We also use E_OOB to report the detection performance since it is an unbiased estimate of the testing error on unseen data [5]. For the experiments in Sections III-B–III-E, the steganographic method was J-UNIWARD at 0.4 bit per non-zero AC DCT coefficient (bpnzAC) with JPEG quality factor 75. We selected this steganographic method as an example of a state-of-the-art data hiding method for the JPEG domain.

B. Symmetrization Validation

In this section, we experimentally validate the feature symmetrization.
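The count of 25 merged histograms per DCT mode can be reproduced by enumerating the orbits of (a, b) under the symmetry (7); reading the reflections 8 − a and 8 − b modulo 8 is our interpretation of the index wraparound:

```python
# Orbits of relative positions (a, b) under (a, b) -> ((8-a) % 8, b) and
# (a, b) -> (a, (8-b) % 8); positions in one orbit share a histogram.
classes = set()
for a in range(8):
    for b in range(8):
        orbit = frozenset({(a, b),
                           (a, (8 - b) % 8),
                           ((8 - a) % 8, b),
                           ((8 - a) % 8, (8 - b) % 8)})
        classes.add(orbit)

assert len(classes) == 25        # 9 four-way + 12 two-way + 4 singletons
T = 4
assert 64 * len(classes) * (T + 1) == 8000   # final DCTR dimensionality
```

The coordinates 0 and 4 are their own reflections mod 8, which is exactly why those rows and columns of Table I merge fewer histograms.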
We denote by E_OOB(X) the OOB error obtained when using the feature set X. The histograms concatenated over the DCT mode indices are denoted

h_{a,b} = ( h^(0,0)_{a,b}, . . . , h^(7,7)_{a,b} ).   (9)

For every combination of indices (a, b), (c, d) ∈ {0, . . . , 7}², we computed three types of error (the symbol '&' means feature concatenation):

1) E^Single_{a,b} = E_OOB(h_{a,b}),
2) E^Concat_{(a,b),(c,d)} = E_OOB(h_{a,b} & h_{c,d}),
3) E^Merged_{(a,b),(c,d)} = E_OOB(h_{a,b} + h_{c,d}),

to see the individual performance of the features across the relative indices (a, b) as well as the impact of concatenating and merging the features on detectability. In the following experiments, we fixed q = 4 and T = 4. This gives each feature h_{a,b} the dimensionality 64 × (T + 1) = 320 (the number of DCT modes, 64, times the number of quantization bins, T + 1 = 5).

Table II informs us about the individual performance of the features h_{a,b}. Despite the rather low dimensionality of 320, every h_{a,b} achieves a decent detection rate by itself (cf. Figure 4 in Section IV).

The next experiment was aimed at assessing the loss of detection accuracy when merging histograms corresponding
to different relative coordinates, as opposed to concatenating them. When this drop of accuracy is approximately zero, both feature sets can be merged. Table III shows the detection drop E^Merged_{(a,b),(c,d)} − E^Concat_{(a,b),(c,d)} when merging h_{1,2} with h_{c,d}, as a function of (c, d). The results clearly show which features should be merged; they are also consistent with the symmetries analyzed in Section II-B.

TABLE II. E^Single_{a,b} IS THE DETECTION OOB ERROR WHEN STEGANALYZING WITH h_{a,b}

TABLE III. E^Merged_{(a,b),(c,d)} − E^Concat_{(a,b),(c,d)} FOR (a, b) = (1, 2) AS A FUNCTION OF (c, d)

TABLE IV. E_OOB(h^(k,l)) AS A FUNCTION OF (k, l)

C. Mode Performance Analysis

In this section, we analyze the performance of the DCTR features by DCT modes when steganalyzing with the merger h^(k,l) of all h^(k,l)_{a,b}, 0 ≤ a, b ≤ 7, of dimension 25 × (T + 1) = 125. Table I explains why the total number of histograms can be reduced from 64 to 25 by merging histograms for different shifts (a, b). Interestingly, as Table IV shows, for J-UNIWARD the histograms corresponding to high-frequency modes provide the same or better distinguishing power than those of low frequencies.

Fig. 3. The effect of feature quantization without normalization (top charts) and with normalization (bottom charts) on detection accuracy.

D. Feature Quantization and Normalization

In this section, we investigate the effect of quantization and feature normalization on the detection performance. We carried out experiments for two quality factors, 75 and 95, and studied the effect of the quantization step q on detection accuracy (the two top charts in Figure 3). Additionally, we also investigated whether it is advantageous, prior to quantization, to normalize the features by the DCT mode quantization step, Q_kl, and by scaling U^(k,l) to zero mean and unit variance (the two bottom charts in Figure 3).
Figure 3 shows that the effect of feature normalization is quite weak, and it appears slightly more advantageous not to normalize the features and keep the feature design simple. The effect of the quantization step q is, however, much stronger. For quality factor 75 (95), the optimal quantization step was 4 (0.8). Thus, we opted for the following linear fit³ to obtain the proper value of q for an arbitrary quality factor

³ Coincidentally, the term in the bracket corresponds to the multiplier used for computing standard quantization matrices.
in the range 50 ≤ K ≤ 99:

q_K = 8 × (2 − K/50).   (10)

Fig. 4. Detection error E_OOB for J-UNIWARD for quality factors 75 and 95 when steganalyzed with the proposed DCTR and other rich feature sets.

E. Threshold

TABLE V. E_OOB OF THE ENTIRE DCTR FEATURE SET WITH DIMENSIONALITY 1600 × (T + 1) AS A FUNCTION OF THE THRESHOLD T FOR J-UNIWARD AT 0.4 BPNZAC

As Table V shows, the detection performance is quite insensitive to the threshold T. Although the best performance is achieved with T = 6, the gain is negligible compared to the increase in dimensionality. Thus, in this paper we opted for T = 4 as a good compromise between performance and dimensionality.

Fig. 5. Detection error E_OOB for UED with ternary embedding for quality factors 75 and 95 when steganalyzed with the proposed DCTR and other rich feature sets.

To summarize, the final form of the DCTR features includes the symmetrization explained in Section III, no normalization, quantization according to (10), and T = 4. This gives the DCTR set a dimensionality of 8,000.

IV. EXPERIMENTS

In this section, we subject the newly proposed DCTR feature set to tests on selected state-of-the-art JPEG steganographic schemes as well as examples of older embedding schemes. Additionally, we contrast its detection performance with previously proposed feature sets. Each time, a separate classifier is trained for each image source, embedding method, and payload to see the performance differences.
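The quality-factor rule (10) above is a one-liner, checked at the two calibrated factors (`quantization_step` is our name):

```python
def quantization_step(K):
    """Eq. (10): quantization step q_K for JPEG quality factor K,
    valid for 50 <= K <= 99."""
    return 8.0 * (2.0 - K / 50.0)

assert quantization_step(75) == 4.0               # optimum found for QF 75
assert abs(quantization_step(95) - 0.8) < 1e-12   # optimum found for QF 95
```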
Figures 4, 5, and 6 show the detection error E_OOB for J-UNIWARD [14], the ternary-coded UED (Uniform Embedding Distortion) [12], and nsF5 [11], achieved using the proposed DCTR, the JPEG Rich Model (JRM) [20] of dimension 22,510, the 12,753-dimensional version of the Spatial Rich Model called SRMQ1 [10], the merger of JRM and SRMQ1 abbreviated as JSRM (dimension 35,263), and the 12,870-dimensional Projection Spatial Rich Model with
quantization step 3 specially designed for the JPEG domain (PSRMQ3) [13]. When interpreting the results, one needs to take into account that the DCTR has by far the lowest dimensionality and computational complexity of all the tested feature sets.

Fig. 6. Detection error E_OOB for nsF5 for quality factors 75 and 95 when steganalyzed with the proposed DCTR and other rich feature sets.

The most significant improvement is seen for J-UNIWARD, even though it remains very difficult to detect. Despite its compactness and significantly lower computational complexity, the DCTR set is the best performer for the higher quality factor and provides about the same level of detection as PSRMQ3 for quality factor 75. For the ternary UED, the DCTR is the best performer for the higher JPEG quality factor for all but the largest tested payload. For quality factor 75, the much larger 35,263-dimensional JSRM gives slightly better detection. The DCTR also provides quite competitive detection for nsF5; the detection accuracy is roughly at the same level as for the 22,510-dimensional JRM.

The DCTR feature set also performs quite well against the state-of-the-art side-informed JPEG algorithm SI-UNIWARD [14] (Figure 7). On the other hand, JSRM and JRM are better suited to detect NPQ [15] (Figure 8). This is likely because NPQ introduces (weak) embedding artifacts into the statistics of JPEG coefficients that are easier to detect by the JRM, whose features are entirely built as co-occurrences of JPEG coefficients. We also point out the saturation of the detection error below 0.5 for quality factor 95 and small payloads for both schemes.

Fig. 7. Detection error E_OOB for the side-informed SI-UNIWARD for quality factors 75 and 95 when steganalyzed with the proposed DCTR and other rich feature sets. Note the different scale of the y axis.
This phenomenon, which was explained in [14], is caused by the tendency of both algorithms to place embedding changes into four specific DCT coefficients.

In Table VI, we take a look at how complementary the DCTR features are to the other rich models. This experiment was run only for J-UNIWARD at 0.4 bpnzAC. The DCTR seems to complement PSRMQ3 well, as this 20,870-dimensional merger achieves so far the best detection of J-UNIWARD, decreasing E_OOB by more than 3% w.r.t. PSRMQ3 alone.

Next, we report on the computational complexity of extracting the feature vector using Matlab code. The extraction of the DCTR feature vector for one BOSSbase image is twice as fast as for JRM, ten times faster
than for SRMQ1, and almost 200 times faster than for PSRMQ3. Furthermore, a C++ (Matlab MEX) implementation takes only 0.5–1 s per image.

Fig. 8. Detection error E_OOB for the side-informed NPQ for quality factors 75 and 95 when steganalyzed with the proposed DCTR and other rich feature sets.

TABLE VI. DETECTION OF J-UNIWARD AT PAYLOAD 0.4 BPNZAC WHEN MERGING VARIOUS FEATURE SETS. THE TABLE ALSO SHOWS THE FEATURE DIMENSIONALITY AND THE TIME REQUIRED TO EXTRACT A SINGLE FEATURE VECTOR FOR ONE BOSSBASE IMAGE ON AN INTEL i5 2.4 GHZ COMPUTER PLATFORM

V. CONCLUSION

This paper introduces a novel feature set for steganalysis of JPEG images. Its name is DCTR because the features are computed from noise residuals obtained using the 64 DCT bases. Its main advantage over previous art is its relatively low dimensionality (8,000) and a significantly lower computational complexity while achieving competitive detection across many JPEG algorithms. These qualities make DCTR a good candidate for building practical steganography detectors and for steganalysis applications where the detection accuracy and the feature extraction time are critical.

The DCTR feature set utilizes the so-called undecimated DCT. This transform has already found applications in steganalysis in the past; in particular, the reference features used in calibration are essentially computed from the undecimated DCT subsampled on an 8 × 8 grid shifted w.r.t. the JPEG grid. The main point of this paper is the discovery that the undecimated DCT contains much more information that is quite useful for steganalysis.

In the spatial domain, the proposed feature set can be interpreted as a family of one-dimensional co-occurrences (histograms) of noise residuals obtained using kernels formed by DCT bases. Furthermore, the feature set can also be viewed in the JPEG domain as a projection-type model with orthonormal projection vectors.
Curiously, we were unable to improve the detection performance by forming two-dimensional co-occurrences instead of first-order statistics. This is likely because neighboring elements in the undecimated DCT are qualitatively different projections of DCT coefficients, which makes the neighboring elements essentially independent.

We contrast the detection accuracy and computational complexity of DCTR with four other rich models when used for the detection of five JPEG steganographic methods, including two side-informed schemes. The code for the DCTR feature vector is available from http://dde.binghamton.edu/download/feature_extractors/.

Finally, we would like to mention that the DCTR feature set may also prove useful for forensic applications, such as [24], since many feature sets originally designed for steganalysis have found applications in forensics. We consider this a possible future research direction.

APPENDIX
ORTHONORMALITY OF PROJECTION VECTORS IN THE UNDECIMATED DCT

Here, we provide the proof of orthonormality (6) of the vectors p^(k,l)_{a,b} defined in (5). It will be useful to follow Figure 9 for easier understanding. For each a, b, 0 ≤ a, b ≤ 7, the (i, j)-th DCT basis pattern B^(i,j), positioned so that its upper left corner has relative index (a, b), is split into four 8 × 8 subpatterns, where κ stands for cirκle, μ for diaμond, τ for τriangle, and σ for σtar:

κ^(i,j)_mn = B^(i,j)_{m−a,n−b} for a ≤ m ≤ 7 and b ≤ n ≤ 7, and 0 otherwise,
Fig. 9. Diagram showing the auxiliary patterns κ (cirκle), μ (diaμond), τ (τriangle), and σ (σtar). The black square outlines the position of the DCT basis pattern B^(i,j).

μ^(i,j)_mn = B^(i,j)_{m−a,8+n−b} for a ≤ m ≤ 7 and 0 ≤ n < b, and 0 otherwise,

τ^(i,j)_mn = B^(i,j)_{8+m−a,n−b} for 0 ≤ m < a and b ≤ n ≤ 7, and 0 otherwise,

σ^(i,j)_mn = B^(i,j)_{8+m−a,8+n−b} for 0 ≤ m < a and 0 ≤ n < b, and 0 otherwise.

In Figure 9 (top), the four patterns are shown using four different markers; the light-color markers correspond to zeros. The first 64 elements of p^(i,j)_{a,b} are simply the projections of κ^(i,j) onto the 64 patterns forming the DCT basis. The next 64 elements are the projections of μ^(i,j) onto the DCT basis, the next 64 are the projections of τ^(i,j), and the last 64 are the projections of σ^(i,j). We denote these projections with the same Greek letters but with a single index: (κ^(i,j)_1, . . . , κ^(i,j)_64), (μ^(i,j)_1, . . . , μ^(i,j)_64), (τ^(i,j)_1, . . . , τ^(i,j)_64), and (σ^(i,j)_1, . . . , σ^(i,j)_64). In terms of the introduced notation,

p^(i,j)T_{a,b} · p^(k,l)_{a,b} = Σ_{r=1}^{64} κ^(i,j)_r κ^(k,l)_r + Σ_{r=1}^{64} μ^(i,j)_r μ^(k,l)_r + Σ_{r=1}^{64} τ^(i,j)_r τ^(k,l)_r + Σ_{r=1}^{64} σ^(i,j)_r σ^(k,l)_r.   (11)

Note that the sum κ^(i,j) + μ^(i,j) + τ^(i,j) + σ^(i,j) is the entire DCT mode (i, j) split into four pieces and rearranged back together to form an 8 × 8 block (Figure 9, bottom). For fixed a, b, due to the orthonormality of the DCT modes (i, j) and (k, l), the blocks κ^(i,j) + μ^(i,j) + τ^(i,j) + σ^(i,j) and κ^(k,l) + μ^(k,l) + τ^(k,l) + σ^(k,l) are thus also orthonormal, and so are their projections onto the DCT basis (because the DCT transform is orthonormal):

Σ_{r=1}^{64} (κ^(i,j)_r + μ^(i,j)_r + τ^(i,j)_r + σ^(i,j)_r)(κ^(k,l)_r + μ^(k,l)_r + τ^(k,l)_r + σ^(k,l)_r) = δ_{(i,j),(k,l)}.   (12)

The orthonormality now follows from the fact that the LHS of (12) and the RHS of (11) have exactly the same value, because the sum of every mixed term in (12) is zero (e.g., Σ_{r=1}^{64} κ^(i,j)_r τ^(k,l)_r = 0, etc.). This is because the subpatterns κ^(i,j) and τ^(k,l) have disjoint supports: their dot product in the spatial domain is 0, and thus the dot product of their DCT projections is also 0 because the DCT is orthonormal.

ACKNOWLEDGMENT

The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of AFOSR or the U.S. Government.

REFERENCES

[1] I. Avcibas, N. D. Memon, and B. Sankur, "Steganalysis of watermarking techniques using image quality metrics," Proc. SPIE, vol. 4314, pp. 523–531, Jan. 2001.
[2] P. Bas, T. Filler, and T. Pevný, "'Break our steganographic system': The ins and outs of organizing BOSS," in Proc. 13th Int. Conf. Inf. Hiding, Prague, Czech Republic, May 2011, pp. 59–70.
[3] S. Bayram, A. E. Dirik, H. T. Sencar, and N. Memon, "An ensemble of classifiers approach to steganalysis," in Proc. 20th Int. Conf. Pattern Recognit. (ICPR), Istanbul, Turkey, Aug. 2010, pp. 4376–4379.
[4] R. Böhme, "Weighted stego-image steganalysis for JPEG covers," in Proc. 10th Int. Workshop Inf. Hiding, vol. 5284, Jun. 2007, pp. 178–194.
[5] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996.
[6] C. Chen and Y. Q. Shi, "JPEG image steganalysis utilizing both intrablock and interblock correlations," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Seattle, WA, USA, May 2008, pp. 3029–3032.
[7] R. Cogranne and F. Retraint, "Application of hypothesis testing theory for optimal detection of LSB matching data hiding," Signal Process., vol. 93, no. 7, pp. 1724–1737, Jul. 2013.
[8] H.
Farid and L. Siwei, "Detecting hidden messages using higher-order statistics and support vector machines," in Proc. 5th Int. Workshop Inf. Hiding, Oct. 2002, pp. 340–354.
[9] J. Fridrich, M. Goljan, and D. Hogea, "Steganalysis of JPEG images: Breaking the F5 algorithm," in Proc. 5th Int. Workshop Inf. Hiding, Oct. 2002, pp. 310–323.
[10] J. Fridrich and J. Kodovský, "Rich models for steganalysis of digital images," IEEE Trans. Inf. Forensics Security, vol. 7, no. 3, pp. 868–882, Jun. 2012.
[11] J. Fridrich, T. Pevný, and J. Kodovský, "Statistically undetectable JPEG steganography: Dead ends, challenges, and opportunities," in Proc. 9th ACM Multimedia Security Workshop, Sep. 2007, pp. 3–14.
[12] L. Guo, J. Ni, and Y.-Q. Shi, "An efficient JPEG steganographic scheme using uniform embedding," in Proc. 4th IEEE Int. Workshop Inf. Forensics Security, Tenerife, Spain, Dec. 2012, pp. 169–174.
[13] V. Holub and J. Fridrich, "Random projections of residuals for digital image steganalysis," IEEE Trans. Inf. Forensics Security, vol. 8, no. 12, pp. 1996–2006, Dec. 2013.
[14] V. Holub and J. Fridrich, "Universal distortion function for steganography in an arbitrary domain," EURASIP J. Inf. Security, vol. 2014, no. 1, pp. 1–13, 2014.
[15] F. Huang, J. Huang, and Y.-Q. Shi, "New channel selection rule for JPEG steganography," IEEE Trans. Inf. Forensics Security, vol. 7, no. 4, pp. 1181–1191, Aug. 2012.
[16] A. D. Ker, "Implementing the projected spatial rich features on a GPU," Proc. SPIE, vol. 9028, pp. 1801–1810, Feb. 2014.
[17] A. D. Ker and T. Pevný, "Identifying a steganographer in realistic and heterogeneous data sets," Proc. SPIE, vol. 8303, pp. 83030N-1–83030N-13, Jan. 2012.
[18] A. D. Ker and T. Pevný, "Calibration revisited," in Proc. 11th ACM Multimedia Security Workshop, Sep. 2009, pp. 63–74.
[19] J. Kodovský and J. Fridrich, "Quantitative structural steganalysis of Jsteg," IEEE Trans. Inf. Forensics Security, vol. 5, no. 4, pp. 681–693, Dec. 2010.
[20] J. Kodovský and J. Fridrich, "Steganalysis of JPEG images using rich models," Proc. SPIE, vol. 8303, pp. 83030A-1–83030A-13, Jan. 2012.
[21] J. Kodovský, J. Fridrich, and V. Holub, "Ensemble classifiers for steganalysis of digital media," IEEE Trans. Inf. Forensics Security, vol. 7, no. 2, pp. 432–444, Apr. 2012.
[22] L. Li, H. T. Sencar, and N. Memon, "A cost-effective decision tree based approach to steganalysis," Proc. SPIE, vol. 8665, pp. 86650P-1–86650P-7, Feb. 2013.
[23] Q. Liu, "Steganalysis of DCT-embedding based adaptive steganography and YASS," in Proc. 13th ACM Multimedia Security Workshop, Sep. 2011, pp. 77–86.
[24] Q. Liu and Z. Chen, "Improved approaches to steganalysis and seam-carved forgery detection in JPEG images," ACM Trans. Intell. Syst. Technol., vol. 5, no. 4, pp. 39:1–39:30, 2014.
[25] I. Lubenko and A. D. Ker, "Going from small to large data in steganalysis," Proc. SPIE, vol. 8303, pp. 83030M-1–83030M-10, Jan. 2012.
[26] S. Lyu and H. Farid, "Steganalysis using higher-order image statistics," IEEE Trans. Inf. Forensics Security, vol. 1, no. 1, pp. 111–119, Mar. 2006.
[27] T. Pevný and J. Fridrich, "Merging Markov and DCT features for multi-class JPEG steganalysis," Proc. SPIE, vol. 6505, pp. 650503-1–650503-14, Feb. 2007.
[28] T. Pevný and J. Fridrich, "Novelty detection in blind steganalysis," in Proc. 10th ACM Multimedia Security Workshop, Sep. 2008, pp. 167–176.
[29] Y. Q. Shi, C. Chen, and W. Chen, "A Markov process based approach to effective attacking JPEG steganography," in Proc. 8th Int. Workshop Inf. Hiding, Jul. 2006, pp. 249–264.
[30] T. H. Thai, R. Cogranne, and F. Retraint, "Statistical model of quantized DCT coefficients: Application in the steganalysis of Jsteg algorithm," IEEE Trans. Image Process., vol. 23, no. 5, pp. 1980–1993, May 2014.
[31] A. Westfeld, "Generic adoption of spatial steganalysis to transformed domain," in Proc. 10th Int. Workshop Inf. Hiding, Jun. 2007, pp. 161–177.
[32] A. Westfeld and A. Pfitzmann, "Attacks on steganographic systems," in Proc. 3rd Int. Workshop Inf. Hiding, Sep./Oct. 1999, pp. 61–75.
[33] T. Zhang and X. Ping, "A fast and effective steganalytic technique against JSteg-like algorithms," in Proc. ACM Symp. Appl. Comput., Melbourne, FL, USA, Mar. 2003, pp. 307–311.
[34] C. Zitzmann, R. Cogranne, L. Fillatre, I. Nikiforov, F. Retraint, and P. Cornu, "Hidden information detection based on quantized Laplacian distribution," in Proc. IEEE ICASSP, Kyoto, Japan, Mar. 2012, pp. 1793–1796.

Vojtěch Holub is currently a Research and Development Engineer with Digimarc Corporation, Beaverton, OR, USA. He received the Ph.D. degree from the Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA, in 2014; the main focus of his dissertation was on steganalysis and steganography. He received the M.S. degree in software engineering from Czech Technical University in Prague, Prague, Czech Republic, in 2010.

Jessica Fridrich (M'05) is currently a Professor of Electrical and Computer Engineering with Binghamton University, Binghamton, NY, USA. She received the Ph.D. degree in systems science from Binghamton University in 1995 and the M.S. degree in applied mathematics from Czech Technical University, Prague, Czech Republic, in 1987. Her main interests are in steganography, steganalysis, digital watermarking, and digital image forensics. Her research has been generously supported by the U.S. Air Force and the Air Force Office of Scientific Research. Since 1995, she has received 19 research grants totaling over $9 million for projects on data embedding and steganalysis that have led to over 160 papers and seven U.S. patents. She is a member of the Association for Computing Machinery.