Image Processing Computer Vision and Pattern Recognition Hamid R. Arabnia


Publication of the 2018 World Congress in Computer Science,
Computer Engineering, & Applied Computing (CSCE’18)
July 30 - August 02, 2018 | Las Vegas, Nevada, USA
https://americancse.org/events/csce2018
Copyright © 2018 CSREA Press
IPCV’18
PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON
IMAGE PROCESSING, COMPUTER VISION, & PATTERN
RECOGNITION
Editors
Hamid R. Arabnia
Leonidas Deligiannidis, Fernando G. Tinetti

American Council on Science and Education (ACSE)
ISBN: 1-60132-485-5
Printed in the United States of America
https://americancse.org/events/csce2018/proceedings
This volume contains papers presented at the 2018 International Conference on Image Processing, Computer
Vision, & Pattern Recognition. Their inclusion in this publication does not necessarily constitute endorsements
by editors or by the publisher.
Copyright and Reprint Permission
Copying without a fee is permitted provided that the copies are not made or distributed for direct
commercial advantage, and credit to source is given. Abstracting is permitted with credit to the source.
Please contact the publisher for other copying, reprint, or republication permission.

Foreword
It gives us great pleasure to introduce this collection of papers to be presented at the 2018 International
Conference on Image Processing, Computer Vision, & Pattern Recognition (IPCV’18), July 30 – August 2,
2018, at Luxor Hotel (a property of MGM Resorts International), Las Vegas, USA.
An important mission of the World Congress in Computer Science, Computer Engineering, and Applied
Computing, CSCE (a federated congress with which this conference is affiliated), includes "Providing a
unique platform for a diverse community of constituents composed of scholars, researchers, developers,
educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated
with diverse entities (such as: universities, institutions, corporations, government agencies, and research
centers/labs) from all over the world. The congress also attempts to connect participants from institutions
that have teaching as their main mission with those who are affiliated with institutions that have research
as their main mission. The congress uses a quota system to achieve its institution and geography diversity
objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in
the USA. We are proud to report that this federated congress has authors and participants from 67 different
nations representing a variety of personal and scientific experiences that arise from differences in culture and
values. As can be seen below, the program committee of this conference as well as the program
committees of all other tracks of the federated congress are as diverse as its authors and participants.
The program committee would like to thank all those who submitted papers for consideration. About 58%
of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two
experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory
recommendations, a member of the conference program committee was charged with making the final decision;
often, this involved seeking help from additional referees. In addition, papers whose authors included a
member of the conference program committee were evaluated using the double-blind review process.
One exception to the above evaluation process was for papers that were submitted directly to
chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were
responsible for the evaluation of such submissions. The overall paper acceptance rate for regular papers
was 23%; 15% of the remaining papers were accepted as poster papers (at the time of this writing, we had
not yet received the acceptance rate for a couple of individual tracks.)
We are very grateful to the many colleagues who offered their services in organizing the conference. In
particular, we would like to thank the members of the Program Committee of IPCV’18, members of the
congress Steering Committee, and members of the committees of federated congress tracks that have topics
within the scope of IPCV. Many individuals listed below will be requested after the conference to provide
their expertise and services for selecting papers for publication (extended versions) in journal special
issues as well as for publication in a set of research books (to be prepared for publishers including
Springer, Elsevier, BMC journals, and others).
• Prof. Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer
Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit,
Michigan, USA
• Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS);
The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Fellow, Center of
Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC).
• Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV), Department of
Computer Engineering (DISCA), Valencia, Spain
• Prof. Juan Jose Martinez Castillo; Director, The Acantelys Alan Turing Nikola Tesla Research Group and
GIPEB, Universidad Nacional Abierta, Venezuela
• Prof. Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering
Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit
Mercy, Detroit, Michigan, USA
• Prof. Zhangisina Gulnur Davletzhanovna; Vice-rector of the Science, Central-Asian University, Kazakhstan,
Almaty, Republic of Kazakhstan; Vice President of International Academy of Informatization, Kazakhstan,
Almaty, Republic of Kazakhstan

• Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems,
Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
• Dr. Trung Duong; Research Faculty at Center for Advanced Infrastructure and Transportation (CAIT),
Rutgers University, the State University of New Jersey, New Jersey, USA
• Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering
Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering,
University of California Los Angeles, Los Angeles (UCLA), California, USA
• Prof. Byung-Gyu Kim (Congress Steering Committee); Multimedia Processing Communications
Lab.(MPCL), Department of Computer Science and Engineering, College of Engineering, SunMoon
University, South Korea
• Prof. Dr. Guoming Lai; Computer Science and Technology, Sun Yat-Sen University, Guangzhou, P. R. China
• Prof. Hyo Jong Lee; Director, Center for Advanced Image and Information Technology, Division of
Computer Science and Engineering, Chonbuk National University, South Korea
• Dr. Muhammad Naufal Bin Mansor; Faculty of Engineering Technology, Department of Electrical,
Universiti Malaysia Perlis (UniMAP), Perlis, Malaysia
• Dr. Andrew Marsh (Congress Steering Committee); CEO, HoIP Telecom Ltd (Healthcare over Internet
Protocol), UK; Secretary General of World Academy of BioMedical Sciences and Technologies (WABT) a
UNESCO NGO, The United Nations
• Prof. Aree Ali Mohammed; Head, Computer Science Department, University of Sulaimani, Kurdistan Region,
Iraq
• Prof. Dr., Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical &
Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Nigeria
• Prof. James J. (Jong Hyuk) Park (Congress Steering Committee); Department of Computer Science and
Engineering (DCSE), SeoulTech, Korea; President, FTRA, EiC, HCIS Springer, JoC, IJITCC; Head of
DCSE, SeoulTech, Korea
• Prof. Dr. R. Ponalagusamy; Department of Mathematics, National Institute of Technology, India
• Dr. Akash Singh (Congress Steering Committee); IBM Corporation, Sacramento, California, USA;
Chartered Scientist, Science Council, UK; Fellow, British Computer Society; Member, Senior IEEE, AACR,
AAAS, and AAAI; IBM Corporation, USA
• Ashu M. G. Solo (Publicity), Fellow of British Computer Society, Principal/R&D Engineer, Maverick
Technologies America Inc.
• Prof. Dr. Ir. Sim Kok Swee; Fellow, IEM; Senior Member, IEEE; Faculty of Engineering and Technology,
Multimedia University, Melaka, Malaysia
• Dr. Ahmad P. Tafti (Vice Chair, Technical Tracks); Associate Research Scientist, Biomedical Informatics
Research Center, Marshfield Clinic Research Institute, Marshfield, WI, USA
• Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science, Universidad
Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones Cientificas de la Prov. de Bs.
As., Argentina
• Prof. Hahanov Vladimir (Congress Steering Committee); Vice Rector, and Dean of the Computer
Engineering Faculty, Kharkov National University of Radio Electronics, Ukraine and Professor of Design
Automation Department, Computer Engineering Faculty, Kharkov; IEEE Computer Society Golden Core
Member; National University of Radio Electronics, Ukraine
• Dr. Haoxiang Harry Wang (CSCE); Cornell University, Ithaca, New York, USA; Founder and Director,
GoPerception Laboratory, New York, USA
• Prof. Shiuh-Jeng Wang (Congress Steering Committee); Director of Information Cryptology and
Construction Laboratory (ICCL) and Director of Chinese Cryptology and Information Security Association
(CCISA); Department of Information Management, Central Police University, Taoyuan, Taiwan; Guest Ed.,
IEEE Journal on Selected Areas in Communications.
• Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of
Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia
Polytechnic Institute & State University, Blacksburg, Virginia, USA
• Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong
Polytechnic University, Kowloon, Hong Kong
We would like to extend our appreciation to the referees, the members of the program committees of
individual sessions, tracks, and workshops; their names do not appear in this document; they are listed on
the web sites of individual tracks.
As Sponsors-at-large, partners, and/or organizers each of the following (separated by semicolons)
provided help for at least one track of the Congress: Computer Science Research, Education, and
Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science &
Education & Federated Research Council (http://www.americancse.org/). In addition, a number of
university faculty members and their staff (names appear on the cover of the set of proceedings), several
publishers of computer science and computer engineering books and journals, chapters and/or task forces of
computer science associations/organizations from 3 regions, and developers of high-performance machines
and systems provided significant help in organizing the conference as well as providing some resources.
We are grateful to them all.
We express our gratitude to keynote, invited, and individual conference/tracks and tutorial speakers - the
list of speakers appears on the conference web site. We would also like to thank the following: UCMSS
(Universal Conference Management Systems & Support, California, USA) for managing all aspects of the
conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the
staff of Luxor Hotel (Convention department) at Las Vegas for the professional service they provided. Last
but not least, we would like to thank the Co-Editors of IPCV’18: Prof. Hamid R. Arabnia, Prof. Leonidas
Deligiannidis, Prof. Fernando G. Tinetti.
We present the proceedings of IPCV’18.
Steering Committee, 2018
http://americancse.org/


Contents
SESSION: COMPUTER VISION, RECOGNITION AND DETECTION SYSTEMS + APPLICATIONS
A Preliminary Evaluation of Pedestrian Detection on Real-World Video Surveillance
Gustavo R. Valiati, David Menotti
Convolutional Neural Network-based Human Recognition for Vision Occupancy Sensors
Seung Soo Lee, Manbae Kim
2D and 3D Face Analysis for Ticketless Rail Travel
Lyndon Smith, Wenhao Zhang, Melvyn L. Smith
Metrics for Complete Evaluation of OCR Performance
Romain Karpinski, Devashish Lohani, Abdel Belaid
Towards Creation of a Curl Pattern Recognition System
Kymberlee Hill, Adesola Abimbola, Prajjwhal Danagl, Gloria Washington, Legand Burge
Named Entity Recognition by Neural Prediction
Nouha Hammadi, Abdel Belaid, Yolande Belaid
Computer-Aided Inspection of Optical Components Using Computer Vision Technologies
Hong-Dar Lin, Hsing-Lun Chen
Intelligent Computer Vision Tracking and Embedded Microcontroller in the Sporting Domain
Bassam Shaer, Timothy Stewart
SESSION: IMAGE PROCESSING + FEATURE EXTRACTION AND CLASSIFICATION AND APPLICATIONS
MRI Brain Tumor Segmentation
Anod Alhazmi, Sudhanshu Semwal
A Novel Modality of Compton Scattering Tomography, Image Formation and Reconstruction
Cecilia Tarpau, Mai K. Nguyen
A Haar-Cascade classifier based Smart Parking System
Praveen Meduri, Eric Telles
Automating the Segmentation of Necrotized Regions in Cassava Root Images
Flavia Delmira Ninsiima, Godliver Owomugisha, Ernest Mwebaze
Spatial Location of 3D Reconstructed Subthalamic Nuclei
Sihai Tang, Rajan Alex, Shijun Tang

SESSION: IMAGING SCIENCE AND NOVEL APPLICATIONS
Software Development for Unsupervised Approach to Identification of a Multi Temporal Spatial Analysis Model
Mauro Mazzei
Applying Genetic Algorithms to Generating Paintings
Alexander Hansen, Mark C. Lewis
Encrypting ImageNet with Chaotic Logistic Maps and AES in ECB Mode
Pablo Rivas-Perea, Patrick Handley, Raul Aragon Franco
Literature-Based Biomedical Image Retrieval with Multimodal Query Expansion and Data Fusion Based on Relevance Feedback (RF)
Md Mahmudur Rahman
Decision Making and Decision Fusion Using Grey Relational Analysis in Layered Sensing Systems
Bakhita Salman, Mohammed I. Thanoon, Saleh Zein-Sabatto, Fenghui Yao
Forensic Data Collection Techniques for Fingerprint Identification
Ifeoma Ugochi Ohaeri, Micheal Bukohwo Esiefarienhe
Arabic Handwritten Characters Recognition Using Support Vector Machine, Normalized Central Moments, and Local Binary Patterns
Hassan Althobaiti, Chao Lu
Non-dominant Object Recognition using Convolutional Neural Networks
Lei Huang, Daniel Oleas
A Comparative Study of Autoencoders against Adversarial Attacks
I-Ting Chen, Birsen Sirkeci-Mergen
Application of Machine Learning Technique to Nowcast Severe Atmospheric Event
Himadri Chakrabarty (Bhattacharyya), Sonia Bhattacharya
SESSION: POSTER PAPERS AND EXTENDED ABSTRACTS
CNN-Based Brain Tumor Segmentation and 3D Visualization
Jae-Hong So, Boo-Kyeong Choi, Nuwan Madusanka Kaluwa Hewage, Heung-Kook Choi
Visualization of Hippocampus Regional Atrophy in Alzheimer's Disease
Nuwan Madusanka Kaluwa Hewage, Jae-Hong So, Boo-Kyeong Choi, Heung-Kook Choi

SESSION: LATE BREAKING PAPERS
On Image Registration for Study of Thyroid Disorders by Infrared Exams
Jose R Gonzelez Montero, Yanexis Pupo Toledo, Maira B Hernandez Moran, Aura Conci, Trueman Machenry, Wilian G Fiirst
A Tool for Detection and Analysis of a Human Face for Aesthetical Quality Using Mobile Devices
Shuyi Zhao, Scott King, Dulal Kar
Automated Cell Nuclei Segmentation in Overlapping Cervical Images Using Deep Learning Model
Arti Taneja, Priya Ranjan, Amit Ujlayan
High Dynamic Range Video Processing Using Temporal Bidirectional Similarity and Coherency Sensitive Hashing
Kung-Hsuan Lo, Jin-Jang Leou
Edge Detection and Dominant Color Masking of Thermal Imagery Data Sets
Katelyn Koiner, Andrew Rosener, Prakash Ranganathan
An Efficient Clustering for Epipolar Geometry based MOD
Yeongyu Choi, Ju H. Park, Ho-Youl Jung
Semantic labelling and Instance Segmentation using Weakly Supervised Convolutional Neural Network
Abdul Wahid, Hyo Jong Lee

SESSION
COMPUTER VISION, RECOGNITION AND
DETECTION SYSTEMS + APPLICATIONS
Chair(s)
TBA
Int'l Conf. IP, Comp. Vision, and Pattern Recognition | IPCV'18 | 1
ISBN: 1-60132-485-5, CSREA Press ©


A Preliminary Evaluation of Pedestrian Detection on Real-World
Video Surveillance
Gustavo R. Valiati, David Menotti
Department of Informatics, Federal University of Paraná (UFPR), Curitiba, Paraná, Brazil
{grvaliati,menotti}@inf.ufpr.br
Abstract— Pedestrian detection is an important task in many real-world application areas such as
robotics, human behavior analysis, assistive technology, autonomous vehicles, health care and public
security. In this work, we evaluate the performance of well-known object/pedestrian detectors on our
newly introduced real-world surveillance dataset, composed of 7,927 frames from more than 20 video
cameras with 28,510 annotated pedestrian bounding boxes. Our hypothesis is that previous evaluations
of such detectors do not reflect their performance in real-world scenarios. We theorize that the
commonly evaluated datasets are visually different from practical surveillance environments. Our
experiments showed that the performance of all the evaluated models decreases on our dataset. The
results are compared with those achieved in the PASCAL VOC challenge. With default models and
configurations for all detectors, YOLOv2 attained better accuracy and speed than Faster R-CNN and
SSD. We also discuss some future directions from this preliminary work.
Keywords: Pedestrian Detection; Video Surveillance; Convolutional Neural Networks
1. Introduction
The pedestrian detection task refers to locating people in images and videos [39] and is a popular
topic in Computer Vision [42]. It has become a key issue [7], [17] for several applications [3], [6],
[9], [19], [24], [42] in real-world tasks such as surveillance, human behavior analysis, driving
assistance and safety, autonomous vehicles, gait recognition, assistive technology for the elderly and
disabled, robotics, content indexing (images and video), advanced human interfaces, biometry, and
others. Because of its variety of important applications and its potential to positively impact
quality of life [7], it has received great attention in the last decade [7], [24].
In this work, we focus on video surveillance systems, which play an essential role in maintaining
public security [6], [9], [41]. It is worth noting that more than 80% of such systems are used for
monitoring people [25]. In video surveillance, the pedestrian detection task has many applications
such as danger understanding and recognition [19], [41], risky behavior and violence detection [17],
[20], detection of hand-held weapons [21], person re-identification [45], person tracking [13], etc.
There is an increasing demand for these tasks, probably due to the growing need for public safety and
the spread of surveillance camera networks [40], [45]. As pedestrian detection is a vital step for
those tasks and because it is impractical to use human labor to handle it [45], automatic detection
of people in video security monitoring is desired.
Pedestrian detection is considered one of the most challenging problems in computer vision [37]. It
has been tackled with increasingly complex solutions that use compute-intensive methods [17]. It seems
to be difficult due to the high articulation of the human body [36].
The most recent successful object detectors and image classifiers use Convolutional Neural Networks
(CNNs) to achieve state-of-the-art results in many well-known challenges and datasets [8], [16], [41],
[42], [45], and demonstrate higher accuracy than traditional methods such as Viola & Jones and the
Deformable Part Model (DPM) [16]. Faster R-CNN [34], Single Shot Multibox Detector (SSD) [23] and
YOLOv2 [32] have been considered key works for object and pedestrian detection.
Having presented the importance of pedestrian detection for public security in video surveillance, we
raise the main question: How well do the current best detectors maintain the accuracy and speed
achieved on previously tested datasets when evaluated on a very realistic dataset composed of
challenging monitoring images, which we find qualitatively very different from the currently evaluated
scenarios?
Our questioning is reinforced by the authors of [42], who claim that the large progress made on
pedestrian detection benchmarks might not translate into performance in non-ruled scenarios, for
instance real-world scenarios [28]. We summarize this by saying that the literature has been
developing and evaluating solutions [17], [24], [25], [40], [42] for pedestrian detection mostly on
more controlled, stable and well-known datasets such as INRIA [4], ETH [10], Caltech Pedestrians [6],
TUD-Brussels [38], KITTI [2], PETS2009 [11], PNN-Parking-Lot2/Pizza [5], [35], etc. Those datasets are
suitable for comparison among different works; however, they do not seem to represent an overall
video surveillance application scenario.

In this paper, we start to address this question by introducing a new dataset and using it to
evaluate the most successful pedestrian detectors that we have found in the literature, so that we
can measure the performance variation on our dataset. To keep this work concise, we consider the
following papers: Faster R-CNN [34], SSD [23], and YOLOv2 [32]. The introduced dataset is called
“PTI01 Pedestrian Dataset (PTI01)” and is built from a small subset of a large, real video
surveillance circuit; we consider it complementary to others presented in the literature.
The remainder of this paper is organized as follows. Section 2 briefly surveys some of the works
found in the literature that tackle pedestrian detection and video surveillance. In Section 3, we
give more details about our dataset and about the evaluation methodology. The three evaluated
detectors are summarized in Section 4. The experiments are presented in Section 5. We briefly discuss
future directions in Section 6. Finally, we conclude our work in Section 7.
2. Related work
In this paper, we discuss pedestrian detection in video
surveillance scenarios. To address that, we have reviewed
the general ideas of previous and recent works that handle
one or more aspects of our discussion.
Granger et al. [16] apply YOLOv2 [32] and SSD [23] to face and head detection, as a sub-task of
pedestrian detection. In their experiments, they find that those frameworks perform well in terms of
speed and could potentially fit in real-time industrial applications. The authors also conclude that,
despite some detectors reaching top accuracy on public datasets, some of them are not suitable for
deployment in many real-time video surveillance applications, especially if they use high-resolution
images.
Many works consider the usage of the YOLOv2 detector. Molchanov et al. [25] tackle YOLOv2’s low
accuracy for grouped small objects. Alimuin et al. [1] successfully integrated their own modified
version of YOLOv2, called YOLO SNN, with a surveillance system. In another work, Peng et al. [30]
evaluate the results from YOLOv2 in combination with those from a Gaussian Mixture Model to monitor a
transformer substation for pedestrian and equipment safety. In [43], Zhang et al. propose a
Storm-based distributed real-time computation platform, running YOLO (v1) [31] as the object detector
on an embedded platform built with NVIDIA Jetson. Han et al. [18] show that integrating the
fine-tuned YOLOv2 [32] with their proposed two-stream video classification model achieved better
results than the baseline model for human action localization on the respective tested dataset. In
[20], Ko et al. introduce an autonomous method based on YOLOv2 to detect abnormal human behavior in
real time, which, according to the authors, is applicable to smart surveillance/monitoring systems.
In [26], Nguyen et al. proposed a real-time pedestrian detector using images from fisheye cameras
based on the YOLO model, with additional information coming from an Adaptive Gaussian Mixture Model
(AGMM). The authors suggested it could be used in embedded applications.
There are works that use the Faster R-CNN [34] detector. Zhang et al. [42] propose some CNN
adaptations to Faster R-CNN, using their introduced pedestrian dataset (CityPersons) for additional
training, and then achieve state-of-the-art results on the Caltech Pedestrian dataset [6]. In [29],
Papandreou et al. proposed a method for multi-person detection and 2-D pose estimation, using Faster
R-CNN to predict the location of people. Zhang et al. [41] investigate Faster R-CNN as a pedestrian
detector, proposing the use of a Region Proposal Network (RPN) followed by boosted forests on feature
maps, achieving competitive results on the evaluated datasets.
In the literature, other techniques are also proposed for pedestrian detection. In [24], Mao et al.
introduce a new architecture named HyperLearner for evaluating the use of extra features in CNNs,
more specifically Faster R-CNN [34], for pedestrian detection. Ye et al. [40] present an evaluation
of five "real-world" datasets built from surveillance cameras. The tested datasets are mostly related
to outdoor video monitoring and some are from high-resolution cameras.
Among the reviewed works, we have not seen a study that considers the application of any
state-of-the-art pedestrian detector to real-world video monitoring circuits similar to our
introduced dataset. What we have observed is that the evaluation of detectors is based on datasets
that either are not from video surveillance circuits (e.g., Caltech Pedestrian [6] and KITTI [2],
made with a camera on a moving car) or are significantly visually different from our proposed dataset
(e.g., PETS2009 [11], made up of very similar outdoor scenes). Based on these observations, we explore
the conjectured dissimilarity between our dataset and others by evaluating well-known
state-of-the-art detectors.
3. Benchmark
To answer our main question, we introduce a new and complementary dataset. Its details and the
evaluation methodology are discussed in this section.
3.1 Dataset
The introduced “PTI01” dataset aims to provide images with high similarity to real video surveillance
environments. Current datasets such as Caltech Pedestrian [6] and MARS [44] fulfill many useful
aspects such as image quantity (for deep learning training), and annotations for tracking and
identity. However, they differ from common video surveillance circuits, since the latter have a very
high number of cameras, low-resolution video, low Frames per Second (FPS), uncommon camera angles,
variation in scene appearance due to day/night and weather changes, and, finally, different camera
hardware that generates images of distinct qualities. Evaluating detectors on common datasets from
the literature would not be so meaningful when investigating how well those techniques really work in
real applications [16]. We also understand that every new dataset introduced represents an
opportunity to evaluate many available techniques.
Fig. 1: PTI01 dataset samples. White masks are applied to protect people’s identities. Better
visualized in color.
Our dataset is built with images of a real video surveillance circuit from the Itaipu Technological
Park (PTI), Brazil¹ (samples available in Fig. 1). The circuit has more than 250 cameras, monitoring
about six thousand active individuals. Due to the huge number of cameras and also because they are
active 24 hours per day, the video recording is done at ~2.5 frames per second. PTI01, in its current
version (v1), has 7,927 frames with 28,510 bounding boxes (labeled pedestrian detections), equivalent
to ~52 minutes of video at a resolution of 640 × 480 pixels from more than 20 different cameras. A
significant part of the dataset annotation was done by the first author and the remainder through an
on-line platform. For the manual labeling, an annotation tool was modified and made publicly
available².
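The dataset figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constants are the values stated in the text, the function names are hypothetical, and the recording rate is treated as exactly 2.5 FPS although the text gives it as approximate.

```python
# Figures quoted in the text for PTI01 (v1).
FRAMES = 7_927
BOXES = 28_510
RECORDING_FPS = 2.5  # approximate rate stated in the text; treated as exact here


def video_minutes(frames: int, fps: float) -> float:
    """Duration, in minutes, of `frames` frames recorded at `fps` frames per second."""
    return frames / fps / 60.0


def boxes_per_frame(boxes: int, frames: int) -> float:
    """Average number of annotated bounding boxes per frame."""
    return boxes / frames


if __name__ == "__main__":
    print(f"duration: {video_minutes(FRAMES, RECORDING_FPS):.1f} min")
    print(f"boxes per frame: {boxes_per_frame(BOXES, FRAMES):.2f}")
```

Under these assumptions the frames correspond to about 52.8 minutes of video, consistent with the stated ~52 minutes, and to roughly 3.6 annotated pedestrians per frame.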
In contrast to the huge MARS dataset [44], which provides cropped images of the pedestrian bounding
boxes, PTI01 offers the original frames exactly as they are stored in the real video monitoring
dataset managed by the ZoneMinder system³. No changes or pre-processing are applied, keeping the data
as realistic as possible for training and evaluation.
¹ www.pit.org.br/en
² github.com/gustavovaliati/BBox-Label-Tool
³ zoneminder.com
It is important to have a dataset that better represents a real video-monitoring environment, so that
it becomes more challenging and coherent. In future versions of PTI01, tracklets and identities are
going to be added to the annotations so that tasks such as tracking and re-identification can also be
evaluated. In addition, more frames, cameras and new identities will be added, enabling better
training for deep learning, increasing image/scenario diversity and making the dataset even more
challenging.
Currently, the dataset is available through formal requests⁴ and is subject to PTI’s terms.
3.2 Protocols and Measures
Instead of implementing our own version of each
framework, two open-source libraries are used. They contain
native implementations of the needed models. For the evaluation
of SSD [23] (300 and 512) and Faster R-CNN [34],
the chainercv library [27] is used. The darkflow5 library
is chosen for reproducing YOLOv2 [32]. Both libraries
are written in Python, which enables easier and faster
implementations for evaluation purposes, instead of having
to build wrappers around heterogeneous technologies. Although
the Python implementations are expected to give a much
lower FPS for the models, all of them run under
similar circumstances, which is fairer than comparing
them across languages with very different performance, such as
C (e.g. darknet6) and Python. To accurately compare the
speed of the models, they would need to be implemented
in the same framework and executed on the same hardware
configuration. In terms of speed, for now, we provide only a basic
comparison.
For the metrics, instead of manually implementing them
according to their raw definitions, we chose to
use an implementation of the PASCAL VOC challenge7 metrics
available in the chainercv library. Our evaluation script is
publicly available8.
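To make the metric concrete, the sketch below illustrates how a PASCAL VOC style Average Precision can be computed for one class. It is a simplified, single-image illustration written for this text, not the chainercv implementation used in the experiments: it assumes boxes in (x_min, y_min, x_max, y_max) format, an IoU threshold of 0.5 and the 11-point interpolation of VOC 2007, and it ignores the "difficult" flag.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thresh=0.5):
    """11-point interpolated AP for one class.

    preds: list of (score, box); gts: list of ground-truth boxes.
    """
    matched = [False] * len(gts)
    hits = []
    # Visit detections from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        overlaps = [iou(box, g) for g in gts]
        best = max(range(len(gts)), key=lambda j: overlaps[j], default=-1)
        if best >= 0 and overlaps[best] >= iou_thresh and not matched[best]:
            matched[best] = True   # true positive
            hits.append(1)
        else:
            hits.append(0)         # false positive (miss or duplicate)
    precisions, recalls, true_pos = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_pos += hit
        precisions.append(true_pos / rank)
        recalls.append(true_pos / len(gts) if gts else 0.0)
    ap = 0.0
    for level in (i / 10 for i in range(11)):
        ap += max((p for p, r in zip(precisions, recalls) if r >= level),
                  default=0.0) / 11
    return ap


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)),
              (0.8, (50, 50, 60, 60)),
              (0.7, (20, 20, 30, 30))]
print(round(average_precision(detections, ground_truth), 3))
```

The mAP reported in the tables is the mean of such per-class AP values; for a pedestrian-only evaluation it reduces to the AP of the person class.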
4. The studied frameworks
For this experiment, three of the most prominent recent
works in object detection (to our knowledge) have been
chosen: Faster R-CNN [34], SSD [23] and YOLOv2 [32].
These frameworks fulfill a mandatory requirement of our
paper: they are reproducible. The selected frameworks have
their code publicly available, making it possible to, at least,
replicate the reported results. Evaluating these works in
such a scenario may be enough for a basic understanding
of the effectiveness of pedestrian detection for video security
monitoring. In the next subsections, those works are
briefly reviewed, providing a general idea of their main aspects.
4Requests can be sent to: eiitec@pti.org.br
5github.com/thtrieu/darkflow
6pjreddie.com/darknet
7host.robots.ox.ac.uk/pascal/VOC
8github.com/gustavovaliati/pti01eval
4.1 Faster R-CNN
Faster R-CNN [34] is the third version in its series;
it has been highly referenced in the literature and
has become the basis of many other works.
It works by choosing regions of interest in the image and
classifying them according to known categories, using a CNN
as the feature extractor. The idea has been the same since the first
version (R-CNN [15]), which used Selective Search
for region proposals. Faster R-CNN is composed of
two main modules: 1) a Region Proposal Network (RPN) that
efficiently chooses relevant regions, replacing Selective Search;
and 2) a neural network that classifies the proposed regions,
corresponding to the network introduced in the previous work
(Fast R-CNN [14]).
In the third (current) version, it relies mostly on the RPN,
which scans every location of the image by defining regions
(anchor boxes) and ranks them according to their probability
of containing objects. While the images pass through the
convolutional network, the anchors are generated by the RPN
by sliding a 3x3 window over a feature map in the higher
layers. The network outputs the bounding box offsets through
regression and the confidence scores for a possible object
presence through classification. Only regions with high
confidence continue through the processing, reducing the
computational cost and making the network faster.
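The anchor mechanism can be illustrated with a short sketch. The stride of 16 pixels, the three scales and the three aspect ratios below are the defaults commonly quoted for Faster R-CNN and are assumptions here, since the text does not state them; the function name is illustrative only.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x_min, y_min, x_max, y_max) in image coordinates.

    One set of len(scales) * len(ratios) anchors is centred on every
    feature-map location. Each ratio is height / width and every anchor
    of a given scale has an area of scale ** 2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell, mapped back to the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    width = scale / math.sqrt(ratio)
                    height = scale * math.sqrt(ratio)
                    anchors.append((cx - width / 2, cy - height / 2,
                                    cx + width / 2, cy + height / 2))
    return anchors


# A 640 x 480 frame with stride 16 gives a 30 x 40 feature map.
print(len(generate_anchors(30, 40)))
```

The RPN then scores each of these candidate boxes and regresses offsets for them; only the highest-scoring ones are passed on to the classification stage.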
4.2 SSD
The SSD [23] method achieved better accuracy and speed
than Faster R-CNN [34] and YOLOv1 [31], while requiring less
processing power. According to [23], its architecture makes
it easier to train and to integrate into other systems.
Liu et al. [23] state that the state-of-the-art object
detectors were based on the idea of creating box proposals,
extracting features from these boxes and finally feeding them
into a high-quality classifier. For them, this kind of approach
is too computationally expensive for real-time applications
or low-power (embedded) processing systems.
Instead of computing those box proposals and the
consequent stages (feature/pixel re-sampling), SSD uses a
set of default boxes with different aspect ratios and scales
per feature map location. It then generates scores for each
class in every default box and produces adjustments to fit
the object shape. It also combines predictions from multiple
feature maps with different resolutions, and is thus able to better
detect objects of many different sizes.
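As an illustration of how the default boxes vary across feature maps, the sketch below follows the linear scale rule described in [23], in which the scale grows from a minimum to a maximum value over the feature maps. The values 0.2 and 0.9, the number of feature maps and the aspect ratios are assumptions taken for illustration and are not stated in the text above.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scale (relative to the image size) of the default boxes per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) pairs, relative to the image, for one scale.

    Each aspect ratio is width / height; every shape keeps an area of scale ** 2.
    """
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar))
            for ar in aspect_ratios]


for scale in default_box_scales(6):
    print(round(scale, 2), default_box_shapes(scale))
```

Early, high-resolution feature maps receive the small scales and are responsible for small objects, while the last feature maps cover large objects.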
4.3 YOLOv2
YOLOv2 [32] is the second version of YOLO [31], a
real-time object detection system. It has defined the state of
the art in challenges such as PASCAL VOC and COCO9. In
addition to its high accuracy, YOLOv2 is known for being
fast. It has shown better results when compared
with other competitive frameworks such as SSD [23] and
Faster R-CNN [34], and is much faster. According to
Redmon et al. [32], the model was designed with
low-latency processing in mind, for applications such as robotics and
autonomous vehicles.
9cocodataset.org
YOLOv2 learns by training detection and classification
simultaneously. In [32], the detection training
was executed on the COCO dataset and the classification
training on the ImageNet10 dataset. Through a hierarchical
combination of annotations from both the COCO and
ImageNet datasets, YOLOv2 was able to learn how to detect
more than 9000 classes of objects. Redmon et al. used
this approach to exploit the large amount of data available in
classification datasets, since detection datasets are normally
much smaller.
When predicting, YOLOv2 divides the image into a grid
of cells and predicts bounding boxes along with confidence
scores related to the likelihood that they contain a known
object class. The boxes with higher confidence and from the
same class are merged, giving the location of the
objects. YOLOv2 does all the computation in a single pass
through its CNN.
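One common way to carry out this merging step is greedy non-maximum suppression (NMS). The sketch below is an assumption about how such a step can be implemented, not a transcription of the darkflow code; the IoU threshold of 0.5 and the detection tuple layout are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_thresh=0.5):
    """Keep the most confident box of every group of overlapping same-class boxes.

    detections: list of (score, label, box) tuples.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        _, label, box = det
        # Drop the box if a more confident box of the same class overlaps it.
        if all(k[1] != label or iou(k[2], box) <= iou_thresh for k in kept):
            kept.append(det)
    return kept


raw = [(0.9, "person", (0, 0, 10, 10)),
       (0.8, "person", (1, 1, 11, 11)),
       (0.7, "person", (50, 50, 60, 60))]
for score, label, box in non_max_suppression(raw):
    print(score, label, box)
```

In this example the second box overlaps the first one heavily and is discarded, while the distant third box is kept as a separate pedestrian.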
5. Experiments
The focus of our experimentation is to evaluate the
performance of some state-of-the-art techniques for pedestrian
detection when applied to a very realistic environment. The
PTI01 dataset represents this kind of environment for
video surveillance. It enables a complementary verification
of how effective the best pedestrian detectors are in a
real-world scenario, and therefore how good they would be for
practical applications.
The evaluation consists of running the detectors with their
own pre-trained weights, without any significant
modifications. These weights should represent the normal potential
of the optimized model for the application it was
made for. As in [7], we do not change them. Improvements
to the evaluated models to better fit the target application
(surveillance) are out of the scope of this paper, but are
suitable for future work.
For each detector, the “mean Average Precision” (mAP)
metric from PASCAL VOC is applied. First, we collect
the results (mAP and FPS) reported in each model’s paper,
which were obtained on the PASCAL VOC 2007 test set.
That score gives us the general performance achieved by
the respective frameworks in a common and well-known
evaluation environment. Then we compare it with the
evaluation of the same models in the same configuration
on our introduced dataset. The results are shown in Table 1.
10image-net.org

Table 1: Evaluation results. The VOC results refer to training
on PASCAL VOC 2007+2012 and testing on PASCAL
VOC 2007 test set, and the VOC mAP metric corresponds
to results from respective papers. The PTI01 results are
achieved using Python frameworks as reported in Section 3.
Model               Dataset   mAP    FPS
Faster R-CNN        VOC       73.2   7.0
Faster R-CNN        PTI01     51.6   3.6
SSD 300 × 300       VOC       74.3   46
SSD 300 × 300       PTI01     55.9   4.9
SSD 512 × 512       VOC       76.8   19
SSD 512 × 512       PTI01     65.6   4.1
YOLOv2 544 × 544    VOC       78.6   40
YOLOv2 544 × 544    PTI01     67.1   10.9
Fig. 2: Example of detections in the tested models. The
red bounding boxes are the predictions. Better visualized
in color.
Given the drop in scores from VOC to PTI01, it is
clear that our dataset represents a harder task
than PASCAL VOC 2007. The main reason
is probably related to the training set. The models
are the ones trained on PASCAL VOC 2007 (train) and
2012 (train + test). This means that the models did not
have the opportunity to train on images that are similar
to the test set. Consequently, the
generalization power provided by the current configuration
is not sufficient when applied to the new dataset. As our
objective is to evaluate the current models in their default
configuration, we did not execute a full training or even a
fine-tuning using PTI01 samples, which ruled out any attempt
to improve the detection quality. We conclude
that it is relevant to evaluate PTI01 with its samples included
in the training phase to verify whether the performance
improves. We even propose that the models should be trained
from scratch on as many datasets as possible, using a
PTI01 training slice for fine-tuning.
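The size of the drop can be quantified directly from Table 1. The short sketch below only restates the mAP values of the table and computes absolute and relative differences; the FPS columns are left out because the VOC and PTI01 speeds come from different implementations and hardware.

```python
# (VOC mAP, PTI01 mAP) pairs copied from Table 1.
RESULTS = {
    "Faster R-CNN": (73.2, 51.6),
    "SSD 300": (74.3, 55.9),
    "SSD 512": (76.8, 65.6),
    "YOLOv2 544": (78.6, 67.1),
}


def map_drop(voc_map, pti_map):
    """Absolute drop (mAP points) and relative drop (%) from VOC to PTI01."""
    absolute = round(voc_map - pti_map, 1)
    relative = round(100.0 * (voc_map - pti_map) / voc_map, 1)
    return absolute, relative


DROPS = {model: map_drop(*scores) for model, scores in RESULTS.items()}
for model, (absolute, relative) in DROPS.items():
    print(f"{model}: -{absolute} points ({relative}% relative)")
```

Every model loses more than 11 mAP points, and the loss is largest for Faster R-CNN and SSD 300, the two models with the lowest PTI01 scores.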
Another relevant conclusion is related to the individual
models’ performance. Table 1 shows that YOLOv2 [32] has
the best mAP as well as the best FPS, which puts
YOLOv2 in the lead among the compared frameworks.
This means that further experiments could focus on
YOLOv2 if we are looking for improvements.
As can be verified in Fig. 2, the SSD [23] models
present a very high number of imprecise or missed detections.
Apparently, this is related to the threshold configuration. The
experiments have been done using the chainercv library [27]
with the default configurations for the natively implemented
models and with the same parameters used to achieve the
PASCAL VOC results. Therefore, we did not fine-tune any
of the evaluated models.
Fig. 3: Precision × Recall for each model.
Another way to verify the better results obtained by
YOLOv2 is to look at Fig. 3, which plots the
Precision × Recall relation for each model. The YOLOv2
plot shows that the model sustains Precision better
as Recall increases. For example, when reaching
0.6 in Recall, YOLOv2 [32] still has high Precision (about
0.9), while the others either do not even reach that Recall (Faster
R-CNN [34]) or suffer a heavy drop in Precision (SSD 300 and
512 [23]).
6. Future works
As this paper is an introductory work, we are able to
discuss and propose some future directions.
To better evaluate the generalization power of a specific
model, we propose the experiment of training it from scratch
on datasets specific to pedestrian detection, then fine-tuning
it on the PTI01 training set. If the experiment shows significant
improvement in accuracy, the trained model (without the
fine-tuning) could be used as a basis in future works instead
of the default one provided by the frameworks. In other
experiments or applications, a very simple fine-tuning could be
applied with any local dataset for the model’s specialization.
Instead of just measuring the evaluation performance by
the mAP and the Precision × Recall plot, we propose a
more complete metric report. This could include methods
from the unified evaluation framework of Dollár et al. [7].
It would also be important to have more statistics about the
PTI01 dataset, such as the ones presented in [7], for
example, data about scale, occlusion and positions.

Fig. 4: YOLOv2 bad or miss detections on PTI01 dataset.
The green boxes are ground truths and the red ones are
predictions. White masks are applied to protect people’s
identities. Better visualized in color.
This work evaluates three of the most referenced works in
the literature: Faster R-CNN [34], SSD [23] and YOLOv2 [32].
However, there are newer versions of those (YOLOv3 [33],
DSSD [12]) and others with high performance (FPN
FRCN [22], Fused DNN [8]).
An evaluation of these would be important to find the
most efficient models, always considering reproducibility,
speed and accuracy.
We have seen that YOLOv2 performs better than the
other compared models. It is known that YOLOv2 is able to
detect many classes at once. A relevant experiment would
be to specialize the YOLO network to detect only
pedestrians. The network might become simpler and improve,
at least, the computation speed, which is important. The
training process could also become faster.
Despite showing the best performance, YOLOv2 has
room for improvement, as can be verified in Fig. 4. The
model still makes many mistakes: confusing other objects
with pedestrians, placing several incorrect boxes on the same
person, and missing people when they appear at uncommon
angles, are carrying objects, or are occluded by other
people, bushes or the environment. This suggests that
the datasets used for training the model probably have a
less rich variety of scenes.
7. Conclusion
In this paper we studied pedestrian detection in computer
vision and its importance for society, mainly related
to public security. Some works in video surveillance were
briefly described, as well as the most common evaluation
datasets and deep learning models related to the task. We
formulated the hypothesis that the current state-of-the-art
pedestrian detectors would probably not perform as
well on a more realistic dataset as they do on other datasets.
To support our claims, we designed experiments and
introduced a new dataset called “PTI01”. The dataset proved to be
more challenging than others known in the literature when
default model configurations from other benchmarks are
used, in our case, PASCAL VOC
pedestrian detection.
Three of the most prominent object detectors in the
literature have been evaluated on our introduced dataset.
It was possible to verify that the default models did not
have enough generalization power for direct application
to the new dataset, resulting in worse performance when
compared to the PASCAL VOC results. We confirmed that
YOLOv2 [32] performs better than the other evaluated
models in terms of accuracy and speed. We also discussed
and proposed some future works.
Acknowledgements
We are thankful to the Itaipu Technological Park for
providing the video surveillance images. We also gratefully
acknowledge the support of NVIDIA Corporation for the
donation of the Titan X GPU used in our experiments.
References
[1] Ryann Alimuin, Aldrich Guiron, and Elmer Dadios. Surveillance
systems integration for real time object identification using weighted
bounding single neural network. In Humanoid, Nanotechnology, In-
formation Technology, Communication and Control, Environment and
Management (HNICEM), 2017 IEEE 9th International Conference on,
pages 1–6. IEEE, 2017.
[2] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready
for Autonomous Driving? The KITTI Vision Benchmark Suite. In
Conference on Computer Vision and Pattern Recognition (CVPR),
2012.
[3] Artur Jordão Lima Correia. The Good, The Fast And The Better
Pedestrian Detector. 2016.
[4] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for
human detection. In Computer Vision and Pattern Recognition, 2005.
CVPR 2005. IEEE Computer Society Conference on, volume 1, pages
886–893. IEEE, 2005.
[5] Afshin Dehghan, Shayan Modiri Assari, and Mubarak Shah. Gmmcp
tracker: Globally optimal generalized maximum multi clique problem
for multiple object tracking. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 4091–4099, 2015.
[6] Piotr Dollár, Christian Wojek, Bernt Schiele, and Pietro Perona.
Pedestrian detection: A benchmark. In Computer Vision and Pattern
Recognition, 2009. CVPR 2009. IEEE Conference on, pages 304–311.
IEEE, 2009.
[7] Piotr Dollár, Christian Wojek, Bernt Schiele, and Pietro Perona.
Pedestrian detection: An evaluation of the state of the art. IEEE
transactions on pattern analysis and machine intelligence, 34(4):743–
761, 2012.

[8] Xianzhi Du, Mostafa El-Khamy, Jungwon Lee, and Larry Davis.
Fused DNN: A deep neural network fusion approach to fast and robust
pedestrian detection. In Applications of Computer Vision (WACV),
2017 IEEE Winter Conference on, pages 953–961. IEEE, 2017.
[9] Lingyu Duan, Yihang Lou, Shiqi Wang, Wen Gao, and Yong Rui.
AI Oriented Large-Scale Video Management for Smart City: Tech-
nologies, Standards and Beyond. arXiv preprint arXiv:1712.01432,
2017.
[10] Andreas Ess, Bastian Leibe, and Luc Van Gool. Depth and appearance
for mobile scene analysis. In Computer Vision, 2007. ICCV 2007.
IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
[11] J Ferryman and A Shahrokni. PETS2009: Dataset and challenge. In
Performance Evaluation of Tracking and Surveillance (PETS-Winter),
2009 Twelfth IEEE International Workshop on, pages 1–6. IEEE,
2009.
[12] Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexan-
der C. Berg. DSSD : Deconvolutional single shot detector. CoRR,
abs/1701.06659, 2017.
[13] Vandit Gajjar, Ayesha Gurnani, and Yash Khandhediya. Human
detection and tracking for video surveillance: A cognitive science
approach. arXiv preprint arXiv:1709.00726, 2017.
[14] Ross Girshick. Fast R-CNN. arXiv preprint arXiv:1504.08083, 2015.
[15] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik.
Rich feature hierarchies for accurate object detection and semantic
segmentation. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 580–587, 2014.
[16] Eric Granger, Madhu Kiran, Louis-Antoine Blais-Morin, et al. A
comparison of CNN-based face and head detectors for real-time video
surveillance applications. In Image Processing Theory, Tools and
Applications (IPTA), 2017 Seventh International Conference on, pages
1–7. IEEE, 2017.
[17] Zhonghua Guo, Fengjie Wu, Haishan Chen, Junying Yuan, and
Canzeng Cai. Pedestrian violence detection based on optical flow
energy characteristics. In Systems and Informatics (ICSAI), 2017 4th
International Conference on, pages 1261–1265. IEEE, 2017.
[18] Charles Han, Chao Wang, and Evelyn Mei. Yolo-based adaptive win-
dow two-stream convolutional neural network for video classification.
[19] Shiyu Huang and Deva Ramanan. Expecting the Unexpected: Training
Detectors for Unusual Pedestrians with Adversarial Imposters. In
The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), volume 1, 2017.
[20] Kwang-Eun Ko and Kwee-Bo Sim. Deep convolutional framework
for abnormal behavior detection in a smart surveillance system.
Engineering Applications of Artificial Intelligence, 67:226–234, 2018.
[21] Justin Lai and Sydney Maples. Developing a real-time gun detection
classifier.
[22] Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath
Hariharan, and Serge J. Belongie. Feature pyramid networks for object
detection. CoRR, abs/1612.03144, 2016.
[23] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy,
Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single
Shot Multibox Detector. In European conference on computer vision,
pages 21–37. Springer, 2016.
[24] Jiayuan Mao, Tete Xiao, Yuning Jiang, and Zhimin Cao. What Can
Help Pedestrian Detection? In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), volume 1, page 3, 2017.
[25] VV Molchanov, BV Vishnyakov, YV Vizilter, OV Vishnyakova, and
VA Knyaz. Pedestrian detection in video surveillance using fully
convolutional YOLO neural network. In Automated Visual Inspection
and Machine Vision II, volume 10334, page 103340Q. International
Society for Optics and Photonics, 2017.
[26] Thanh Binh Nguyen, Sun-Tae Chung, et al. ConvNets and AGMM
based real-time human detection under fisheye camera for embedded
surveillance. In Information and Communication Technology Con-
vergence (ICTC), 2016 International Conference on, pages 840–845.
IEEE, 2016.
[27] Yusuke Niitani, Toru Ogawa, Shunta Saito, and Masaki Saito. Chain-
erCV: a Library for Deep Learning in Computer Vision. In ACM
Multimedia, 2017.
[28] Rameswar Panda, Amran Bhuiyan, Vittorio Murino, and Amit K Roy-
Chowdhury. Unsupervised Adaptive Re-identification in Open World
Dynamic Camera Networks. arXiv preprint arXiv:1706.03112, 2017.
[29] George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev,
Jonathan Tompson, Chris Bregler, and Kevin Murphy. Towards
accurate multiperson pose estimation in the wild. arXiv preprint
arXiv:1701.01779, 8, 2017.
[30] Qiwei Peng, Wang Luo, Gongyi Hong, Min Feng, Yuan Xia, Lei Yu,
Xiaolong Hao, Xu Wang, and Mingxuan Li. Pedestrian Detection
for Transformer Substation Based on Gaussian Mixture Model and
YOLO. In Intelligent Human-Machine Systems and Cybernetics
(IHMSC), 2016 8th International Conference on, volume 2, pages
562–565. IEEE, 2016.
[31] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
You Only Look Once: Unified, Real-Time Object Detection. In
Proceedings of the IEEE conference on computer vision and pattern
recognition, pages 779–788, 2016.
[32] Joseph Redmon and Ali Farhadi. YOLO9000: Better, Faster, Stronger.
In 2017 IEEE Conference on Computer Vision and Pattern Recogni-
tion (CVPR), 2017.
[33] Joseph Redmon and Ali Farhadi. YOLOv3: An Incremental Improve-
ment. Technical report, 2018.
[34] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster
R-CNN: Towards real-time object detection with region proposal
networks. In Advances in neural information processing systems,
pages 91–99, 2015.
[35] Guang Shu, Afshin Dehghan, Omar Oreifej, Emily Hand, and
Mubarak Shah. Part-based multiple-person tracking with partial
occlusion handling. In Computer Vision and Pattern Recognition
(CVPR), 2012 IEEE Conference on, pages 1815–1821. IEEE, 2012.
[36] Domonkos Varga and Tamás Szirányi. Robust real-time pedestrian
detection in surveillance videos. Journal of Ambient Intelligence and
Humanized Computing, 8(1):79–85, 2017.
[37] Wenfeng Wang, Xi Chen, Guiwei Zhang, Jing Qian, Peng Wei, Boqian
Wu, and Hongwei Zheng. Precision Security: Integrating Video
Surveillance with Surrounding Environment Changes. Complexity,
2018, 2018.
[38] Christian Wojek, Stefan Walk, and Bernt Schiele. Multi-cue onboard
pedestrian detection. In Computer Vision and Pattern Recognition,
2009. CVPR 2009. IEEE Conference on, pages 794–801. IEEE, 2009.
[39] Weiwei Xing, Pingping Bai, Shunli Zhang, and Peng Bao. Scene-
specific pedestrian detection based on transfer learning and saliency
detection for video surveillance. Automatic Control and Computer
Sciences, 51(3):180–192, 2017.
[40] Qixiang Ye, Tianliang Zhang, Qiang Qiu, Baochang Zhang, Jie
Chen, and Guillermo Sapiro. Self-learning scene-specific pedestrian
detectors using a progressive latent model. CoRR, abs/1611.07544, 2,
2016.
[41] Liliang Zhang, Liang Lin, Xiaodan Liang, and Kaiming He. Is
Faster R-CNN Doing Well for Pedestrian Detection? In European
Conference on Computer Vision, pages 443–457. Springer, 2016.
[42] Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele. Cityper-
sons: A diverse dataset for pedestrian detection. arXiv preprint
arXiv:1702.05693, 2017.
[43] Weishan Zhang, Dehai Zhao, Liang Xu, Zhongwei Li, Wenjuan Gong,
and Jiehan Zhou. Distributed embedded deep learning based real-time
video processing. In Systems, Man, and Cybernetics (SMC), 2016
IEEE International Conference on, pages 001945–001950. IEEE,
2016.
[44] Liang Zheng, Zhi Bie, Yifan Sun, Jingdong Wang, Chi Su, Shengjin
Wang, and Qi Tian. MARS: A video benchmark for large-scale person
re-identification. In European Conference on Computer Vision, pages
868–884. Springer, 2016.
[45] Liang Zheng, Yi Yang, and Alexander G Hauptmann. Per-
son re-identification: Past, present and future. arXiv preprint
arXiv:1610.02984, 2016.

Convolutional Neural Network-based Human Recognition
for Vision Occupancy Sensors
Seung Soo Lee and Manbae Kim*
Dept. of Computer and Communications Engineering, Kangwon National University
Chunchon, Republic of Korea
E-mail: {rugh, *manbae}@kangwon.ac.kr
Abstract – In buildings and households, occupancy motion
sensors are installed so that the lights are turned on/off
according to the detection of moving objects by the occupancy
sensors. One disadvantage is that a PIR sensor cannot detect
a stationary person, because it works by detecting the
variation of thermal temperature. To solve this problem, the
use of camera vision sensors has gained interest,
where object tracking is used for detecting stationary
persons. However, object tracking has inherent
problems such as tracking drift. Therefore, the recognition of
humans for static trackers is an important task. In this paper,
we propose a CNN-based human recognition method to determine
whether a static tracker contains humans. Experimental
results validate that human(s) and non-humans (background)
are classified with an accuracy of about 85% and that the
proposed method can be incorporated into practical vision
occupancy sensors.
Keywords: Vision sensor, occupancy sensor, convolutional
neural network, human classification
1 Introduction
Most occupancy sensors installed in buildings, households
and so forth are pyroelectric infra-red (PIR) sensors [1-6].
PIRs operate by detecting the thermal temperature of
humans; more precisely, they detect changes in temperature.
Since PIRs detect the variation of the temperature, they have
three demerits: 1) it is impossible to detect stationary people,
2) hot airflow might cause the sensors to malfunction, and
3) distinguishing humans from cats and dogs is not an easy task.
Recently, works utilizing vision sensors have
been introduced to replace PIRs with camera sensors [3, 10].
The use of camera sensors can not only overcome the
inherent problems of PIRs, but also provide additional information
for tasks such as people counting, tracking, human activity
analysis, and intelligent surveillance.
Generally, the vision sensors carry out object tracking for
stationary people detection, which can be accomplished
only by camera sensors. In other words, tracking can
detect static humans.
However, tracking drift is a main problem, which happens
due to image color similarity, illumination changes and so
forth. If drift occurs, most drifting trackers will settle at
a fixed location because the background has no color
variations. People can also be stationary for short-term
and long-term durations. Therefore, the classification of
drifting trackers becomes an important issue.
Fig. 1 shows the bounding boxes of static trackers. Static
trackers are marked with a blue square. The bounding box in Fig.
1(a) indicates a static tracker without humans. Therefore, this
tracker needs to be removed. Another real example is shown
in Fig. 1(b). We observe four static trackers, where the tracker in
the bottom-right contains a human, but the other three trackers
contain only background. Therefore, the latter are subject to
elimination from the tracking process.
Fig. 1. The detected stationary bounding boxes, in panels (a) and (b),
marked in blue. The moving boxes are marked in red.
This paper presents CNN-based stationary people detection.
Using this neural network, we intend to determine whether a
static tracker contains a human or background, thereby
improving the performance of occupancy sensors.
A convolutional neural network (CNN) is comprised of one
or more convolutional layers (often with a subsampling step)
and then followed by one or more fully connected layers as in
a standard multilayer neural network. The architecture of a
CNN is designed to take advantage of the 2D structure of an
input image. This is achieved with local connections and tied
weights followed by some form of pooling which results in
translation invariant features. Another benefit of CNNs is that
they are easier to train and have many fewer parameters than
fully connected networks with the same number of hidden
units.
Applying CNN classification to all the trackers has
high computational complexity. Moving trackers have a high
probability of containing humans. Therefore, we limit the
application of CNN classification to static trackers only,
thereby reducing the processing time. The tracking
performance is beyond the scope of this paper. The purpose is
the recognition of human and non-human (background) for
each static tracker.
This paper is organized as follows. The next section presents
the overall methodology of this paper. In Section 3, the object
tracking method is introduced. CNN-based human
classification is presented in Section 4. Experimental results
are described in Section 5, followed by the conclusion and future
works.
2 Proposed Method
Fig. 2. Flow diagram of the proposed method.
The overall structure of the proposed method is shown in
Fig. 2. Object tracking is needed for examining occupancy by
stationary people. The basic tracking method is based on the
motion history image (MHI) for real-time operation. Since trackers
alternate between dynamic and static motions, CNN-based classification
is carried out only for static trackers. If the CNN classifier
decides that a tracker is a background tracker, the tracker is
automatically removed.
3 MHI-based Tracking
This section introduces the tracking method used in this paper.
Even though the aim of our work is the decision of human
occupancy, tracking is needed to acquire stationary trackers
that could contain a human or background.
For each occupant, a search window is put on a tracker. An
overhead camera is installed on the ceiling of the space. The
height of the ceiling is 3 m. Since humans can move fast,
tracking drift can result. In other words, the tracker in
the next frame has a high probability of being located outside
the search window.
The MHI proposed by Bovick [7] is mainly used in the
recognition of human activities as well as depth generation [8].
A basic definition of MHI is as follows: a differential image
$\Delta_t^i$ is derived from the current and previous images by

$\Delta_t^i = |I_t^i - I_{t-1}^i|$ (1)

where i is a pixel index and t is a frame number. Then, MHI is
computed by

$MHI_t^i = \begin{cases} \tau & \text{if } \Delta_t^i > T \\ \max(0,\ MHI_{t-1}^i - \delta) & \text{otherwise} \end{cases}$ (2)

where τ denotes the number of frames in which a person’s
action continues and δ is a positive decay parameter. If
$\Delta_t^i > T$, then MHI is set to τ; otherwise it is decreased
by δ. MHI increases rapidly at a large τ and drastically drops at
a large δ. As well, the MHI of a pixel whose motion has recently
occurred will quickly approach τ. The initial value $MHI_0^i$ is 0.
Fig. 3 shows consecutive images with their MHIs.
Fig. 3. Subsequent images and their MHIs
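The update rule of Eqs. (1) and (2) can be sketched in a few lines. The version below works on plain 2D lists so that it stays self-contained; the threshold, τ and δ values in the example are arbitrary illustrative choices, not the settings used in the experiments.

```python
def update_mhi(prev_frame, cur_frame, prev_mhi, threshold, tau, delta):
    """One MHI update step for grayscale frames given as 2D lists.

    A pixel whose absolute frame difference (Eq. (1)) exceeds `threshold`
    is set to `tau`; otherwise its MHI value decays by `delta`, never
    going below zero (Eq. (2)).
    """
    new_mhi = []
    for prev_row, cur_row, mhi_row in zip(prev_frame, cur_frame, prev_mhi):
        out_row = []
        for prev_px, cur_px, mhi_px in zip(prev_row, cur_row, mhi_row):
            diff = abs(cur_px - prev_px)
            if diff > threshold:
                out_row.append(tau)                       # recent motion
            else:
                out_row.append(max(0, mhi_px - delta))    # decay
        new_mhi.append(out_row)
    return new_mhi


frame0 = [[10, 10], [10, 10]]
frame1 = [[10, 60], [10, 10]]       # one pixel changes between frames
mhi = [[0, 0], [0, 0]]              # MHI starts at zero
mhi = update_mhi(frame0, frame1, mhi, threshold=30, tau=20, delta=5)
print(mhi)
mhi = update_mhi(frame1, frame1, mhi, threshold=30, tau=20, delta=5)
print(mhi)
```

After the first update the moving pixel holds τ; when the scene stays still in the next frame, its value decays by δ while the other pixels remain at zero.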
Pixel-wise search requires high computational time. In
order to reduce the time, we decompose a tracking search
window into 8x8 blocks (Fig. 4(b)). Then the MHI energy $E_k$ is
computed for each block, where k is the block number. Fig. 4(c)
shows the close-up of MHI energy in Fig. 4(d).

$E_k = \sum_{i \in \mathrm{Block}(k)} MHI^i$ (3)
Fig. 4. Saving the sum of all pixel MHIs into a 8x8 block after
decomposition of a search window
We search for the block with the maximum energy among the
current block and its eight neighboring blocks; its location
becomes the tracking point TPn of tracker n. The
determination of a stationary tracker is easily made from the
energy variation. The CNN that classifies the object as human
or background takes this stationary tracker as input.
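The block-energy search can be sketched as follows. The code assumes that "8x8 blocks" means tiles of 8 × 8 pixels and that ties are resolved in favor of the current block; both are assumptions made for illustration, and the function names are hypothetical.

```python
def block_energies(mhi, block=8):
    """Sum the MHI values inside each block x block tile, as in Eq. (3)."""
    rows, cols = len(mhi) // block, len(mhi[0]) // block
    energy = [[0] * cols for _ in range(rows)]
    for y in range(rows * block):
        for x in range(cols * block):
            energy[y // block][x // block] += mhi[y][x]
    return energy


def next_tracking_block(energy, row, col):
    """Block with the largest energy among (row, col) and its 8 neighbours."""
    best = (row, col)
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            r, c = row + d_row, col + d_col
            inside = 0 <= r < len(energy) and 0 <= c < len(energy[0])
            if inside and energy[r][c] > energy[best[0]][best[1]]:
                best = (r, c)
    return best


# 16 x 16 MHI with motion traces in two different blocks.
mhi = [[0] * 16 for _ in range(16)]
mhi[2][10] = 5
mhi[9][9] = 3
energy = block_energies(mhi)
print(energy, next_tracking_block(energy, 1, 0))
```

A tracker whose best block stops changing, because the surrounding energy has decayed to zero, can then be flagged as stationary and handed to the classifier.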
Fig. 5 shows the resulting tracking images in the upper two
rows and their associated MHIs in the bottom two rows. Two
trackers accompany a single person. The red box is a motion
tracker and the blue box indicates a stationary tracker.

Fig. 5. Resulting tracking images obtained by the MHI
energy-based tracking method
4 Proposed Human Recognition based
on CNN Model
The aim of the CNN model is object classification. We
explain the procedure of the classification. Two settings are
considered, with the number of object classes fixed to 5 and 2.
The labels and their
associated objects in 5-label classification (L5-CNN) are
described in Table 1. L1 contains floor and wall, which have little
change in intensity. Chairs are categorized into L2. L3
contains desks and whiteboards with vertical boundaries. Bookshelves,
computers and boxes are L4. L1∼L4 are background. Humans
belong to L5. In 2-label classification (L2-CNN), binary
labels (0, 1) are used, where the L1∼L4 classes are grouped
into L1 and humans become L2 (Fig. 7). The human and
background objects are divided into only two classes, which
reduces the training time. On the other hand, the use of five labels can
deliver more interesting and diverse information. Figs. 6 and 7
show examples of label objects.
Table 1. Labels and objects for 5-label classification CNN
(L5-CNN)
Label Object
L1 floor, wall
L2 chair
L3 desk, whiteboard
L4 bookshelf, computer, box
L5 human
For practical applications, the complexity of a vision
occupancy sensor needs to be comparable to or lower than
that of PIR sensors. To satisfy this requirement, we first
design the simplest neural network model, which is
composed of a single convolutional layer and a single
fully-connected layer. We then compare its performance with
the well-known SVM (support vector machine). The
performance needs to satisfy the minimum requirement of
occupancy sensors. Fig. 8 shows the baseline CNN model.
Fig. 6. Training images belonging to [L1, L5] in L5-CNN
Fig. 7. Training images belonging to [L1, L2] in L2-CNN
Fig. 8. Network structure of 5-class classification (L5-CNN).
The input to the network is a 64x64 grayscale image, so
the number of input nodes is 4,096. The number of
convolutional layers varies in [1, 4] and the number of
fully-connected layers in [1, 2]. The activation function is
ReLU [9] and 2x2 max pooling is used. The final
classification is carried out by Softmax. Xavier weight
initialization [10] is used and stochastic gradient descent
(SGD) is employed to train the network by backpropagation.
Table 2. Parameter values used in neural networks
Layer              Parameter value      Activation function
input layer        64 x 64 nodes        -
convolution layer  3 x 3 x 15           ReLU
pooling layer      2 x 2 max pooling    -
hidden layer       50                   ReLU
output layer       5                    Softmax

5 Experimental Results
Fig. 9. Images acquired in the lab from the overhead camera
The test images were acquired in our lab. Fig. 9 shows
images captured in the lab. The overhead camera is mounted
on the ceiling and points vertically downward. The image
resolution is 720x480 with RGB channels. The distance
between the floor and the ceiling is 2.7 m. The code is
written in Matlab and C/C++. Human classification requires
a labelling process. Non-human images of size 64x64 were
captured while the lab was unoccupied. 500 images were
produced per label, and 5,000 test images are available. To
acquire human images, several people repeatedly entered
and left the lab, and the human images were captured
manually. In addition, while the camera sensor is operating,
an image is captured whenever a tracker becomes
stationary. Since such an image may contain a human or
background, we manually assigned each one to its label
category.
The purpose of the classification is to decide whether an
image patch contains a human. The labelled images are used
for training. Non-human images were captured in the
unoccupied state, and human images were captured by the
vision sensor while several people walked around the lab.
The architecture of the network is summarized in Table 3.
Our network is composed of l convolution layers and n fully
connected layers. Each network is denoted by l-n CNN
throughout this paper, where l ∊ {1, 2, 3, 4} represents the
number of convolution layers and n ∊ {1, 2} denotes the
number of hidden layers. The first convolutional layer takes
64x64 input images with 16 kernels of size 5x5x3. The
rectified linear unit (ReLU) is used as the activation function
for each convolutional layer.
$$y = \max(0, x) \qquad (4)$$
where x and y are the input and output values, respectively.
This function can reduce the vanishing gradient problem [13]
that might occur when a sigmoid or hyperbolic tangent
function is used in back-propagation training, and it is
faster to compute than these saturating activation functions.
This makes it suitable for our vision occupancy sensor.
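The vanishing gradient argument can be illustrated numerically. The sketch below compares the product of local activation gradients across ten layers for ReLU and for a sigmoid; the depth of ten and the pre-activation value of 2.0 are arbitrary illustrative choices, and weight matrices are ignored.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)           # Eq. (4)

def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)    # 1 for positive inputs, 0 otherwise

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)                # never exceeds 0.25

# Product of the local gradients through 10 layers at the same pre-activation
x = np.array([2.0])
relu_chain = relu_grad(x) ** 10
sigmoid_chain = sigmoid_grad(x) ** 10
```

For a positive input the ReLU chain stays at 1, whereas the sigmoid chain shrinks by several orders of magnitude, which is the effect described above.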
The learning rate is 0.01. We varied the number of
convolutional and hidden layers. The number of filters starts
at 16 and doubles with each additional convolutional layer.
Zero padding is used to keep the output the same size as the
input. The number of hidden nodes in the fully-connected
layers is 50 for one hidden layer and 100-50 for two hidden
layers. In the experiment, we used 80% of the data for
training and 20% for testing. The data was randomly shuffled.
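To make the family of l-n networks concrete, the following sketch computes the final feature-map size and the number of trainable parameters for a given configuration. Several details are assumptions not stated in the text: every convolution is followed by 2x2 max pooling, the kernel size k is the same in every layer (3 is used here only as a default), and the input has one grayscale channel. The function name is hypothetical.

```python
def cnn_summary(n_conv, hidden, in_size=64, in_ch=1, k=3,
                first_filters=16, n_classes=5):
    """Final feature-map size, channel count and parameter count of an l-n CNN.

    Assumptions: each convolution is followed by 2x2 max pooling, and the
    kernel size k is identical in all convolutional layers.
    """
    size, ch, filters, params = in_size, in_ch, first_filters, 0
    for _ in range(n_conv):
        params += (k * k * ch + 1) * filters   # kernel weights plus biases
        ch = filters
        size //= 2        # zero padding keeps the size; pooling halves it
        filters *= 2      # the filter count doubles with each layer
    fan_in = size * size * ch
    for width in list(hidden) + [n_classes]:
        params += (fan_in + 1) * width         # dense weights plus biases
        fan_in = width
    return size, ch, params

simplest = cnn_summary(1, [50])        # 1-1 CNN
deepest = cnn_summary(4, [100, 50])    # 4-2 CNN
```

Under these assumptions the deepest network has fewer parameters than the simplest one, because repeated pooling shrinks the input to the first fully-connected layer far more than the extra convolution kernels add.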
Table 3 shows the accuracy for L5-CNN. The 4-1 CNN
achieves the highest accuracy, 62.8%. The simplest model,
1-1 CNN, has an accuracy of 50%. Since the 5-class
classification accuracy is not satisfactory, we tested a binary
classification, L2-CNN, that distinguishes only human from
background.
Table 3. Accuracy by number of convolution layers and
hidden layers in L5-CNN. In an l-n CNN, l is the number of
convolution layers and n is the number of hidden layers.
CNN type   No. of conv layers   No. of hidden layers   Accuracy
1-1 CNN    1                    1                      50.00%
1-2 CNN    1                    2                      43.60%
2-1 CNN    2                    1                      52.80%
2-2 CNN    2                    2                      56.80%
3-1 CNN    3                    1                      60.00%
3-2 CNN    3                    2                      50.40%
4-1 CNN    4                    1                      62.80%
4-2 CNN    4                    2                      55.60%
Table 4. Accuracy by number of convolution layers and
hidden layers in L2-CNN. In an l-n CNN, l is the number of
convolution layers and n is the number of hidden layers.
CNN type   No. of conv layers   No. of hidden layers   Accuracy
1-1 CNN    1                    1                      83.17%
1-2 CNN    1                    2                      84.67%
2-1 CNN    2                    1                      83.00%
2-2 CNN    2                    2                      83.00%
3-1 CNN    3                    1                      87.17%
3-2 CNN    3                    2                      88.17%
4-1 CNN    4                    1                      86.00%
4-2 CNN    4                    2                      86.67%
For comparative performance validation, our CNN models
are compared with a support vector machine (SVM) [12].
The pixels of the estimated regions of interest (ROIs) are
made into feature vectors, and human classification is
performed via the SVM. In [14], histograms of oriented
gradients (HOGs) are acquired from the ROIs of each image.
The humans and background were differentiated via the
SVM. To make a fair comparison, we used the same training
data set. SVM has shown excellent performance in object
recognition. The SVM is implemented as L2-SVM for two
classes and L5-SVM for five classes. The performance is
shown in Table 5. The accuracy of L5-SVM is 51.6%,
slightly better than 1-1 CNN and 1-2 CNN by 1.6 and 8.0
percentage points, respectively. Of the models from 2-1 CNN
to 4-2 CNN, all except 3-2 CNN (50.4%) outperform the SVM,
by a maximum of 11.2 percentage points (4-1 CNN).
Therefore, for 5-class classification, we need to use at least
two convolutional layers to outperform the SVM. This
observation suggests that a sufficiently deep CNN can be
better at human recognition than a conventional machine
learning method.
In the comparative classification accuracy of Table 5,
L2-SVM shows an accuracy of 60.5% in binary classification.
In L2-CNN, even the simplest model, 1-1 CNN, shows an
accuracy of 83.17%. As the number of convolutional layers
increases, a maximum of 88.17% is reached (3-2 CNN). The
experimental results show that 1-1 CNN can outperform
the SVM. It was mentioned in the introduction that the
camera vision sensor needs to be comparable with PIRs in
cost to be competitive in practical applications. The
experimental results support this assertion.
Table 5. Classification accuracy of the proposed CNN with
SVM for L2 and L5.
5 Labels 2 Labels
Classifier Accuracy Classifier Accuracy
L5-SVM 51.60% L2-SVM 60.50%
1-1 CNN 50.00% 1-1 CNN 83.17%
1-2 CNN 43.60% 1-2 CNN 84.67%
2-1 CNN 52.80% 2-1 CNN 83.00%
2-2 CNN 56.80% 2-2 CNN 83.00%
3-1 CNN 60.00% 3-1 CNN 87.17%
3-2 CNN 50.40% 3-2 CNN 88.17%
4-1 CNN 62.80% 4-1 CNN 86.00%
4-2 CNN 55.60% 4-2 CNN 86.67%
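The margins between each CNN and the SVM baseline follow directly from Table 5. The short script below recomputes them in percentage points; it uses only the values printed in the table, and the variable names are arbitrary.

```python
svm = {"L5": 51.60, "L2": 60.50}
cnn = {   # model: (5-label accuracy, 2-label accuracy), copied from Table 5
    "1-1": (50.00, 83.17), "1-2": (43.60, 84.67),
    "2-1": (52.80, 83.00), "2-2": (56.80, 83.00),
    "3-1": (60.00, 87.17), "3-2": (50.40, 88.17),
    "4-1": (62.80, 86.00), "4-2": (55.60, 86.67),
}
# Margin over the SVM in percentage points, for both label settings
margins = {name: (round(l5 - svm["L5"], 2), round(l2 - svm["L2"], 2))
           for name, (l5, l2) in cnn.items()}
best_l5 = max(margins, key=lambda name: margins[name][0])
below_svm_l5 = [name for name, (m5, _) in margins.items() if m5 < 0]
smallest_l2_margin = min(m2 for _, m2 in margins.values())
```

In the 2-label setting every CNN exceeds the SVM by more than 20 percentage points, while in the 5-label setting some configurations fall below it.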
Finally, misclassified images are shown in Fig. 10. In
L5-CNN, a bookshelf is misclassified as floor, a computer as
a chair, and a human as a chair. In L2-CNN, three different
humans are classified as background objects. This
misclassification needs further research. In practice, such
background objects can be eliminated using the stop duration
time [11]. Combining this empirical decision with our
CNN is expected to yield a satisfactory vision occupancy
sensor.
Fig. 10. Images with classification error and their labels. (a)
L5-CNN and (b) L2-CNN
6 Conclusion
In this paper, we proposed a vision occupancy sensor
that is expected to replace current PIR motion sensors.
Tracking is one of its important functionalities, since it
allows stationary people to be followed, whereas PIRs
cannot detect static objects. Tracking drift is unavoidable
in the system and thus requires a decision on removing
drifting trackers. To solve this, CNN-based people
recognition has been presented in this paper. In L5-CNN,
at least two convolutional layers are needed to
outperform SVM. In L2-CNN, a single convolutional
layer and one hidden layer outperform SVM by a large
margin. The experimental results suggest that a vision
occupancy sensor could replace conventional PIRs in
terms of installation and operating cost.
7 Acknowledgement
This research was supported by Basic Science Research
Program through the National Research Foundation of Korea
(NRF) funded by the Ministry of Education (No.
2017R1D1A3B03028806).
8 References
[1] P. Liu et al. “Occupancy inference using pyroelectric
infrared sensors through hidden Markov model”, IEEE
Sensors Journal, 16(4), Feb. 2016.
[2] F. Wahl, M. Milenkovic, and O. Amft, “A distributed PIR-
based approach for estimating people count in office
environments”, IEEE Conf. on Computational Science
and Engineering, 2012.
[3] Y. Benezeth et al. “Towards a sensor for detecting human
presence and characterizing activity”, Energy and
Buildings, 43, 2011.
[4] J. Han and B. Bhanu, “Fusion of color and infrared video
for moving human detection”, Pattern Recognition, 40,
2007.
[5] S. Nakashima, Y. Kitazono, L. Zhang, and S. Serikawa,
“Development of privacy-preserving sensor for person
detection”, Procedia, 2, 2010.
[6] I. Amin, A. Taylor, F. Junejo, A. Al-Habaibeh, and R.
Parkin, “Automated people-counting by using low-
resolution infrared and visual cameras”, Measurement, 41,
2008.
[7] A. Bobick and J. Davis, “The recognition of human
movement using temporal templates”, IEEE Trans.
Pattern Analysis and Machine Intelligence, Vol. 23, No. 3,
Mar. 2001.
[8] J. Gil and M. Kim, “Motion depth generation using MHI
for 2D-to-3D conversion,” Electronics Letters, Vol. 53,
No. 23, pp. 1520-1522, Nov. 2017.
[9] A. L. Maas, A. Y. Hannun, A. Y. Ng, “Rectifier
Nonlinearities Improve Neural Network Acoustic
Models,” Proc. of the 30th Int. Conf. on Machine
Learning, Atlanta, USA, June 2013.
[10] X. Glorot and Y. Bengio, “Understanding the difficulty
of training deep feedforward neural networks”, Int. Conf.
Artificial Intelligence and Statistics, Society for Artificial
Intelligence and Statistics, 2010.
[11] J. Gil and M. Kim, “Real-time People Occupancy
Detection by Camera Vision Sensor”, Journal of
Broadcast Engineering, Vol. 22, No. 6, Nov. 2017.
[12] C-C. Chang and C-J. Lin, “LIBSVM: A library for
support vector machines”, ACM Trans. Intelligent
Systems and Technology, Vol. 2, No. 3, pp. 27:1-27:27,
2011.
[13] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse
rectifier neural networks”, Int. Conf. Artificial
Intelligence and Statistics, Apr. 2011.

2D and 3D Face Analysis for Ticketless Rail Travel
L. N. Smith, W. Zhang, and M. L. Smith
Centre for Machine Vision, BRL, Department of Engineering Design and Mathematics
University of the West of England, Bristol BS16 1QY UK
Abstract - Research is reported into the design,
implementation and functionalities of a vision system that
employs the human face as a biometric for enabling ticketless
rail travel. The system has been developed to optimise
performance in the relatively unstructured railway station
scenario. In addition to establishing the working vision
system, major outputs of the work have included
demonstration that 3D face recovery prevents ‘spoofing’ by
use of photographs; and also the finding that incorporation
of 3D data into facial modelling has enabled a 6%
improvement in face classification. Our conclusion is that 3D
data increase face recognition reliability significantly and
will be the enabling factor for ensuring revenue protection
when employing vision systems for implementation of
ticketless rail travel.
Keywords: 2D 3D face analysis biometric
1 Introduction
Currently many rail transportation companies ensure
revenue protection by issuing tickets (or electronic swipe
cards) that enable customers to access platform areas by
passing through paddle gates. While this approach works
relatively well during most times of the day, it creates some
quite serious problems at peak periods – in the form of
pronounced bottlenecks and associated passenger
throughput delays. If the paddle gates could be eliminated,
whilst still ensuring revenue protection, the experience of the
traveller could be significantly improved, in terms of their
journeys being made easier, quicker and more pleasant. Also,
the rail operators might benefit from reduced costs arising
from not having to operate and maintain large amounts of
relatively expensive and complex electro-mechanical paddle
gate equipment. The ultimate long-term aim is therefore to
replace the currently employed sets of paddle gates with
‘gateless gatelines’, where there is one large open entrance
which passengers could pass through while the system
automatically detects that the fare has been paid, thereby
producing minimal obstruction and inconvenience and
ensuring maximum passenger flow through stations. Machine
vision provides a convenient and useful means of
implementing such an entrance, without the need for
passengers to carry active items such as RFID cards or Smart
Phones. This can be achieved by using the face as a useful
biometric and one that cannot easily be hidden from view (this
would itself generate suspicion). Also, unlike most biometric-
based technologies, face recognition (FR) does not require
particular actions by the users, such as having to place
hands/fingers on designated areas for palm/fingerprint
recognition. However, the relatively unstructured nature of
the situation, and the need to use vision to identify a reliable
biometric, so that passengers can be reliably identified and
repeat fare evaders detected, means that face analysis for
ticketless rail travel is a quite challenging vision application.
The situation of rail passengers casually passing through a
large entrance area in groups is considerably more involved
than, for example, face authentication for a person standing in
front of a door. Reasons for this include the fact that in the
rush hour, passengers are likely to be moving quite quickly
and to be close together and the large entrance allows a
number of people to pass through simultaneously. This may
result in occlusion, the same face appearing more than once,
non-uniform illuminations, low resolution (due to the size of
the entrance and the consequent relatively wide camera field-
of-view), and/or significant variation in poses. Therefore,
reliable FR for this application requires relatively advanced
machine vision hardware and software implementations for
generating the needed rich data sets and in-depth modelling.
Implementing this has required research investigation of:
• Major challenges associated with the application context,
including incomplete/noisy data, such as significant changes
in illumination and passenger behaviours, facial occlusion, a
relatively long imaging distance, variable walking paces of
passengers and space limitations in a station environment.
• A preliminary 2D facial recognition system. This includes the
study of feature extraction and classification, as well as
evaluation of system performance when subjecting it to a
realistic real-world environment. While this 2D system has
yielded promising results, it confirmed the need to explore 3D
features to ensure robust and reliable operation in this kind of
relatively unstructured application environment.
• Analysis of the face in 3D. A 3D system, based on recovery
of facial surface normals using a new type of two-source
photometric stereo, is described. The accuracy of surface
gradient recovery is found to be directly linked to facial
recognition accuracy.

A significant finding that emerged from the above
investigations was that capture and analysis of 3D face data,
in addition to 2D data, is required to provide reliability in face
detection and recognition in the rail travel situation. The
investigations were preceded by a detailed literature review on
FR research, the full details of which are beyond the scope of
this paper. However, below the main approaches that have
been employed are identified, and illustrated by reference to
various researchers, and the benefits that can be realised by
employing 3D data in the analysis, are explained.
2 Background
2.1 2D face recognition
Here conventional digital images are processed by direct
analysis of pixel intensities, where one approach is to employ
feature-based methods to process an image to identify,
extract, measure, and compare facial features. Relevant
features include eyes, nose, mouth, ears, and/or other fiducial
marks. The aim is to establish geometric relationships between
facial points, by measuring the distance between the eyes,
lips, chin, and eye brows, with statistical techniques then
being used to recognize faces using relationships between
these measurements. An example of a recent feature-based
approach is the work by Thomas and Peter [1], where they
used Part-based One vs One Features for bird species
identification using the Caltech UCSD Birds (CUB) dataset
and on the Labelled Faces in the Wild (LFW) dataset. They
reported an average accuracy of 73.30% on CUB and 93.13%
on the LFW datasets. These feature-based approaches have
the advantage of high-speed matching due to their compact
representation of the face images, as well as invariance to size
and illumination. Disadvantages include errors that occur
when the face is presented at an angle to the camera (metrics
can be applied with the aim of reducing the magnitudes of
these errors, but some errors are likely to remain) and the
difficulty associated with reliable automatic feature detection.
The other major approach to 2D FR is the use of global
methods. Here analysis is performed on the entire image of the
face rather than on local features so that similarities of the
whole face are compared, while ignoring individual features
like eyes, mouth, nose etc. In global methods, analysis is
generally undertaken by application of statistical or artificial
intelligence methods. Statistical methods involve a mask to
iteratively compare a face image to all other faces in the
database. Such approaches have been shown to work
efficiently under controlled environments. For example, Yang
and Zhou [2] employed a statistical method known as
Augmented Lagrangian Methods (ALM) to recognise faces in
images corrupted by disguise, illumination, and pose – an
accuracy of up to 82% was reported. Artificial Intelligence
global approaches utilize tools such as neural networks and
machine learning techniques to recognize faces in 2D texture
data. Examples of this approach include: Neural Networks
(NN), Support Vector Machines (SVM), Hidden Markov
Model (HMM) and Local Binary Pattern (LBP). Although
such methods can produce better recognition results than
feature-based techniques, they are computationally intensive
since they process each facial pixel and they do not perform
effectively under large variations in pose, scale and
illumination. In order to address this, Marisco et al. [3]
proposed a scheme that aimed to quantitatively assess
pose and illumination changes. Experiments conducted on
LFW, FERET, and SCface databases revealed an accuracy of
61%, 95%, and 89%, respectively. To summarise, 2D global
methods can provide good functionality but lack the
robustness needed for reliable operation in unstructured
situations such as that of the ticketless rail travel application.
2.2 3D face recognition
The authors have investigated 3D face reconstruction
methods that utilize 3D features for achieving higher accuracy
and robustness in face analysis [4]. 3D facial features reveal
facial topology by providing geodesic distances and surface
curvatures. They have thus shown promise for bringing
higher accuracy to face recognition and improved robustness
to practical applications where scenes are complex and
dynamic. However, the exploitation of 3D vision is not
currently sufficient to enable a wide array of 3D vision based
applications. This is mainly due to 3D reconstruction
techniques being associated with 3D imaging systems that are
commonly perceived as being slow in operation, having
limited work envelopes and requiring bulky and/or expensive
setups. Consequently, many algorithms struggle to find their
way into real-world scenarios. To address this, we introduced
a variation on the standard three or four light photometric
stereo (PS) method, where two light sources are employed [4];
and, for the first time, we apply it to various types of realistic
data.
2.3 Photometric stereo principles
Photometric stereo (PS) enables calculation of surface
normals from reflectance maps obtained from images of the
same object captured under different illumination directions.
As shown in Fig. 1, the object is viewed from a single camera
and imaged sequentially by a number of structured lights in
known locations. Four lights (and so four views) are shown in
Fig. 1, but Woodham [5] showed that three views are
sufficient to uniquely determine surface normal as well as
albedo at each image point, provided that the directions of
incident illumination are not collinear in azimuth. PS
techniques are superior in capturing detailed high-frequency
3D textures and are less affected by image noise compared to
triangulation-based techniques [6], which recover distance to
a surface rather than surface orientations and may require the
use of a projected pattern. The latter will often place
constraints upon the system’s range and/or work envelope,
while the camera interface bandwidth and the resolution of the

projected pattern will limit the resolution with which
triangulation systems can measure object distance, hence
introducing noise and artefacts and limiting their ability to
resolve 3D surface textures. In contrast, PS methods can
directly recover 3D surface textures with a high resolution that
is only limited by the camera sensor and lens, while normally
requiring only one camera for image capture - thereby
simplifying the calibration process and allowing for high
efficiency, a greater depth of field and a large work envelope.
Fig 1. The principle of photometric stereo, which employs a
single camera to capture multiple images of a surface
illuminated by multiple light sources.
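The three-or-more-light case can be sketched as a per-pixel least-squares problem. The example below assumes a Lambertian surface, distant light sources and no shadows, which is the textbook formulation rather than a description of any particular system; the light directions and surface values are synthetic.

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover albedo and unit surface normal at one pixel.

    intensities: shape (m,), one measurement per light, m >= 3
    light_dirs:  shape (m, 3), unit vectors pointing towards each light
    Assumes a Lambertian surface, distant lights and no shadows.
    """
    # Solve light_dirs @ g = intensities, where g = albedo * normal
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)
    albedo = np.linalg.norm(g)
    return albedo, g / albedo

# Synthetic check with three lights spread in azimuth (illustrative values)
L = np.array([[0.5, 0.0, 0.866],
              [-0.25, 0.433, 0.866],
              [-0.25, -0.433, 0.866]])
L /= np.linalg.norm(L, axis=1, keepdims=True)
n_true = np.array([0.2, -0.1, 1.0])
n_true /= np.linalg.norm(n_true)
I = 0.7 * (L @ n_true)                 # noise-free Lambertian intensities
albedo, normal = photometric_stereo(I, L)
```

With exactly three lights the system is square and the solution is exact for noise-free data; a fourth light makes it overdetermined, which helps with noise and shadowed measurements.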
3 Method
3.1 Two-source photometric stereo
Although the standard 4-source PS method is highly
accurate and relatively efficient in recovering 3D surface
normals, it is not ideal for facilitating real-world applications.
Generally, its implementation is prohibited by the need for
capturing at least 3 (or more commonly 4 or more) images at
high frame rate for every full 3D object reconstruction. Further
limitations can arise from the complexity of the data capture
system. It is usual for a large set of light sources to be
deployed, which results in a setup that is more expensive and
requires more space, which can be inconvenient in some
cases. We propose to simplify both data capture and
hardware design by employing a two-source PS variation
where only two light sources are required and therefore only
two images need to be captured per reconstruction.
Our aim is to provide a 3D reconstruction algorithm,
together with a stereo imaging system, for robust face
recognition in unstructured environments. We combine
analysis of the 3D data from this system with our 2D global
methods, in order to obtain significant increases in the
reliability of our face recognition system. To facilitate accurate
and robust facial recognition technology for ticketless travel,
we developed a two-source Photometric Stereo (PS) system –
the first prototype of which is shown in Fig. 2.
Fig. 2. System structure for the 2D+3D face recognition
system for ticketless rail travel.
The PS system employs a single camera and two near
infra-red illumination sources to capture greyscale images of
an object illuminated from different angles. The greyscale
images are then used to reconstruct surface normals as a type
of 3D feature. The uncertainty associated with the two-source
Photometric Stereo problem was compensated for by having
two lights symmetrically located on both sides of the camera.
This enables us to recover a one-directional surface normal,
i.e. horizontal gradients, independent from gradient in the
perpendicular direction. Uniform local binary patterns are then
employed to further encode surface gradient information (as
opposed to conventional intensity analysis). Various
classification methods have been implemented and a number
were found to achieve desirable performance, including K-
nearest neighbour and Support Vector Machine. This is due
to the features being more robust than those from intensity
images and therefore having higher discriminability.
Interestingly, similar results were also achieved when local
binary patterns were not employed, but when normalised
surface gradient data were used directly for classification.
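One way to see why two symmetric lights isolate the horizontal gradient is the following simplified model; it is an illustrative assumption, not necessarily the algorithm of [4]. For a Lambertian surface with normal proportional to (-p, -q, 1) and distant lights at (±sin θ, 0, cos θ), the difference of the two images is proportional to the x component of the normal and their sum to the z component, so the ratio depends on neither the albedo nor the vertical gradient q.

```python
import numpy as np

def horizontal_gradient(img_left, img_right, theta, eps=1e-6):
    """Estimate the horizontal surface gradient p = dz/dx at every pixel.

    Assumed setup (a simplification): Lambertian surface, distant lights at
    (+sin(theta), 0, cos(theta)) for img_right and (-sin(theta), 0, cos(theta))
    for img_left. With unit normal n proportional to (-p, -q, 1):
        I_right - I_left = 2 * rho * sin(theta) * n_x
        I_right + I_left = 2 * rho * cos(theta) * n_z
    so the ratio equals -p * tan(theta), independent of rho and q.
    """
    num = img_right.astype(np.float64) - img_left
    den = img_right.astype(np.float64) + img_left
    ratio = num / np.maximum(den, eps)   # eps guards against dark pixels
    return -ratio / np.tan(theta)

# Synthetic check on a planar patch with known gradients
theta = np.deg2rad(30.0)
p_true, q_true, rho = 0.3, -0.5, 0.8
n = np.array([-p_true, -q_true, 1.0])
n /= np.linalg.norm(n)
l_right = np.array([np.sin(theta), 0.0, np.cos(theta)])
l_left = np.array([-np.sin(theta), 0.0, np.cos(theta)])
img_right = np.full((2, 2), rho * (l_right @ n))
img_left = np.full((2, 2), rho * (l_left @ n))
p_est = horizontal_gradient(img_left, img_right, theta)
```

The estimate recovers p even though q is non-zero, which mirrors the independence from the perpendicular gradient noted above.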
A representative PS based facial recognition system
developed in our CMV laboratory, the Photoface system [7],
employed four illumination sources and realised high
recognition rates. However, the system shown in Fig. 2 only
uses two light sources for reconstruction of (horizontal)
surface gradients, such that the practicability of the system is
sufficient to suit railway station scenarios. Near Infrared (NIR)
lights are to be employed to make the system more covert and
less intrusive. The surface gradient (3D) features have the
advantages that they are independent of illumination and are
robust to head pose changes to a large extent, meaning that
they are most appropriate for accommodating dynamic
passenger behaviours. Furthermore, in being able to capture

PS images and to reconstruct potentially high-resolution 3D
features at a high frame rate, this system is superior to stereo
vision systems in fulfilling the task of increasing the
throughput in station gate scenarios. A 3D facial recognition
system also has the advantage that it is not prone to
‘spoofing’ by a 2D photograph that cannot be differentiated
from a real face by conventional 2D feature based systems.
3.2 Construction of face databases
To facilitate ticketless rail travel, two types of face
databases need to be constructed, in order that enrolled or
registered users/passengers can be differentiated from those
who are unregistered. A global face database is constructed
by passengers going to any station where an enrolment booth
is available. Each passenger needs to be enrolled once only in
order to use the system. The hardware structure of the
enrolment system is similar to that of the facial recognition
system, subject to minor adjustments (e.g. smaller in scale).
This database is updated when a passenger gets registered or
deregistered. Each local (per-station) face database is also
constructed as a subset of the global face database. A local
face database is updated when a passenger arrives at a station
and ‘taps in’ (i.e. is added to the database) or a passenger is
successfully recognised by the facial recognition system
while departing from a station (i.e. removed from the
database).
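The relationship between the global and local databases can be sketched with plain sets. This is a toy model: passenger IDs stand in for stored face templates, all class and method names are hypothetical, and removing a deregistered passenger from every local database is an assumption made here to keep each local database a subset of the global one.

```python
class StationDatabases:
    """Toy model of the global and per-station (local) face databases."""

    def __init__(self):
        self.global_db = set()
        self.local_db = {}            # station name -> set of passenger IDs

    def enrol(self, pid):
        self.global_db.add(pid)

    def deregister(self, pid):
        self.global_db.discard(pid)
        # Assumption: keep every local database a subset of the global one
        for present in self.local_db.values():
            present.discard(pid)

    def tap_in(self, station, pid):
        if pid not in self.global_db:
            return False              # unregistered passengers are not added
        self.local_db.setdefault(station, set()).add(pid)
        return True

    def recognised_on_exit(self, station, pid):
        present = self.local_db.get(station, set())
        if pid in present:
            present.remove(pid)       # recognised while departing
            return True
        return False

db = StationDatabases()
db.enrol("passenger-1")
added = db.tap_in("Station A", "passenger-1")
rejected = db.tap_in("Station A", "passenger-2")
left = db.recognised_on_exit("Station A", "passenger-1")
```

Keeping the local database small matters because recognition at the gate only has to match against passengers currently recorded at that station rather than against every enrolled user.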
3.3 Conditions for a gateless gateline scenario
The conditions listed below allow design of a facial
recognition system that can be more specific to the gateless
gateline scenario, thereby allowing maximised performance for
this application:
1) No direct sunlight (ambient sunlight is allowed).
2) Co-operative passengers, i.e. the passenger not
intentionally avoiding the camera (i.e. no extreme head pose).
Note that a passenger will not be required to stop and look up
to the camera.
3) No severe occlusion of the face to be recognised.
4) In the initial system, there will be a single-file channel for
each gate, i.e. no more than one passenger walking through
the gate at a given time.
5) Persons will walk steadily through the gate (e.g. at a fast
walking pace), rather than running/moving very quickly.
3.4 Facial Recognition Algorithm Chain
This facial recognition system realises its functionalities
by following five main stages:
1) Data capture: the system achieves sequential high-speed
synchronisation of a camera (referred to as camera A in the
rest of the report) with two NIR light sources. Pairs of
greyscale images of a gate corridor scene are captured - with
one image illuminated by one of the NIR lights and the other
image by the other light.
2) Face detection: the system then searches to detect face
regions in the greyscale images.
3) Surface gradient recovery: for every face image present in a
pair of PS images, the system performs a two-source PS
algorithm to recover (horizontal) surface gradient features.
4) Feature processing: gradient features are then processed in
order to provide enhanced discriminative power.
5) Facial recognition: discriminative features are compared to
those in the local face database, and a face is classified as
belonging to a particular passenger (i.e. a ticket holder) or
otherwise an unenrolled/unregistered passenger (i.e. a
potential fare evader).
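The feature processing stage can be illustrated with a uniform local binary pattern histogram computed on a 2D map, such as a surface gradient image. The sketch uses the common radius-1, 8-neighbor variant with 59 bins; the neighborhood, the bin layout and the function name are assumptions for illustration, not the system's confirmed settings.

```python
import numpy as np

def uniform_lbp_histogram(img):
    """Normalised 59-bin histogram of uniform 8-neighbour LBP codes (radius 1)."""
    img = img.astype(np.float64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Neighbours in circular order, starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= centre).astype(np.int64) << bit
    # Uniform codes (at most two 0/1 transitions around the circle) get one
    # bin each (58 in total); all other codes share the last bin, index 58
    lut = np.full(256, 58, dtype=np.int64)
    next_bin = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            lut[code] = next_bin
            next_bin += 1
    hist = np.bincount(lut[codes].ravel(), minlength=59).astype(np.float64)
    return hist / hist.sum()

rng = np.random.default_rng(0)
gradient_map = rng.normal(size=(32, 32))   # stand-in for a recovered gradient image
features = uniform_lbp_histogram(gradient_map)
```

The resulting 59-dimensional vector could then be compared with enrolled templates by a nearest-neighbor or SVM classifier in the recognition stage.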
3.5 Facial recognition hardware configuration
Given that the aforementioned two-source PS system will
fulfil the role of a facial recognition system, it is worth
exploring the influence of the system parameters (e.g. camera
height, camera angle, etc.) at this stage, such that the
implementation of the system can be tailored for use in a
railway station. We therefore sought to obtain the optimal
system parameters to shed light upon our initial design choice.
This was achieved by a series of simulations, trigonometry
calculations and trial experiments. A camera (designated as
‘Camera A’), was employed to record PS images for 3D
reconstruction and facial recognition, while the utilisation of a
second vertically mounted camera was intended for passenger
tracking from an overhead viewpoint. Although the
investigation of tracking methods was out of the scope of the
authors’ work on this project, integration of video tracking
with the face recognition system offers potential to strengthen
revenue protection by helping to avoid face
occlusion/overlapping and reducing imaging variations
caused by extreme dynamic passenger behaviours.
By incorporating the popular Viola-Jones face detector
[8] for face detection and the uniform Local Binary Pattern
(LBP) histogram features for facial recognition [9], this system
can image and detect the face (without severe head poses
present) of a passenger of average height (e.g. 1.75m in the
experiments). In the experiments the camera was positioned at
a height of 2.35m, which is the minimum clearance according to
the UK National Rail standard. It was observed that, at an
imaging distance between 1.2m and 2.8m, this provided good-
quality facial data – the faces were not severely occluded, they
were mostly included in the camera’s field of view, and they
were of sufficient image sizes (over 200×200 pixels). Although
results indicated that this can provide face recognition
capability in many gateless gateline scenarios, it was observed
that when lighting conditions changed, the recognition
accuracy plunged drastically. This again proved that the
incorporation of 3D features (i.e. surface gradients from the PS
system) would be very significant for ensuring robust facial
recognition. For 3D imaging, the distance between the two NIR


France, the restored king or his successor was hardly more than a
figurehead. A new class, the Third Estate, remained in fact master of
France. There had been a change in the ownership of the land;
power through the control of vested property rested with the group
which in 1789 began its revolt under the leadership of Mirabeau. A
new dictatorship had succeeded the old. And this is what a
revolution is—the dictatorship of a new crowd. The Russian
revolutionists now candidly admit this fact in their use of the phrase
"the dictatorship of the proletariat." Of course it is claimed that this
dictatorship is really the dictatorship of "all the people." But this is
simply the old fiction with which every dominant crowd disguises
seizure of power. Capitalist republicanism is also the rule of all the
people, and the pope and the king, deriving their authority from
God, are really but "the servants of all."
As we have seen, the crowd mind as such wills to dominate.
Society is made up of struggle groups, or organized crowds, each
seeking the opportunity to make its catchwords realities and to
establish itself in the position of social control. The social order is
always held intact by some particular crowd which happens to be
dominant. A revolution occurs when a new crowd pushes the old one
out and itself climbs into the saddle. When the new crowd is only
another faction within the existing dominant crowd, like one of our
established political parties, the succession will be accomplished
without resort to violence, since both elements of the ruling crowd
recognize the rules of the game. It will also not result in far-reaching
social changes for the same reason. A true revolution occurs when
the difference between the dominant crowd and the one which
supplants it is so great as to produce a general social upheaval. The
Reformation, the French Revolution, and the "Bolshevist" coup d'état
in Russia, all were of this nature. A new social leadership was
established and secured by a change in each case in the personnel
of the ownership of such property as would give the owners the
desired control. In the first case there was a transfer of property in
the church estates, either to the local congregations, or the state, or
the denomination. In the second case the property transferred was
property in land, and with the Russian revolutionists landed property
was given to the peasants and vested capital turned over to the
control of industrial workers.
Those who lay all emphasis on this transfer of property naturally
see only economic causes in revolutionary movements. Economics,
however, is not a science of impersonal things. It treats rather of
men's relations to things, and hence to one another. It has to do with
valuations and principles of exchange and ownership, all of which
need psychological restatement. The transfer of the ownership of
property in times of revolution to a new class is not an end, it is a
means to a new crowd's social dominance. The doctrines, ideals, and
principles believed by the revolutionary crowd also serve this end of
securing its dominance, as do the social changes which it effects,
once in power.
Revolutions do not occur directly from abuses of power, for in
that case there would be nothing but revolution all the time, since
every dominant crowd has abused its power. It is an interesting fact
that revolution generally occurs after the abuses of which the
revolutionists complain have been in great measure stopped—that is,
after the ruling crowd has begun to make efforts at reform. The
Reformation occurred in the pontificate of Leo X. If it had been the
result of intolerable abuse alone, it would have happened in the time
of Alexander VI, Borgia. The French Revolution fell upon the mild
head of Louis XVI, though the wrongs which it tried to right mostly
happened in the reign of his predecessor. In most cases the abuses,
the existence of which a revolutionary crowd uses for propaganda
purposes, are in turn repeated in new form by itself after it becomes
dominant. The Reformers in the sixteenth and seventeenth centuries
resorted to much the same kind of persecution from which they had
themselves earlier suffered. The Constituent Assembly, though it had
demanded liberty, soon set up a more outrageous tyranny through
its own committees than any that the Louies had dreamed of.
Bolshevists in capitalist countries are the greatest advocates of free
speech; in Russia they are the authors of a very effective press-
censorship.
No, it is hardly the abuses which men suffer from their ruling
crowds which cause insurrection. People have borne the most
terrible outrages and suffered in silence for centuries. Russia itself is
a good example of this.
A revolution occurs when the dominant crowd begins to weaken.
I think we find proof of this in the psychology of revolutionary
propaganda. A general revolution is not made in a day; each such
cataclysm is preceded by a long period of unrest and propaganda of
opposition to the existing order and its beneficiaries. The Roman
Republic began going to pieces about a hundred years before the
battle of Actium. The social unrest which followed the Punic Wars
and led to the revolt of the brothers Gracchi was never wholly
checked during the century which followed. The dominant party had
scarcely rid itself of these troublesome "demagogues" than revolt
broke out among the slave population of Sicily. This was followed by
the revolt of the Italian peasants, then again by the insurrection of
Spartacus, and this in turn by the civil war between Marius and
Sulla, the conspiracy of Catiline, the brief triumph of Julius Cæsar
over the Senate, the revenge of the latter in the assassination of
Cæsar, and the years of turmoil during the Second Triumvirate.
It is doubtful if there was at any time a very clear or widespread
consciousness of the issues which successively arose during that
unhappy century. It would seem that first one counter-crowd and
then another, representing various elements of the populace, tried
issue with the ruling crowd. The one factor which remained constant
through all this was the progressive disintegration of the dominant
party. The supremacy of the Patres Conscripti et Equites became in
fact a social anachronism the day that Tiberius Gracchus demanded
the expropriation of the landed aristocracy. The ideas whereby the
dominant crowd sought to justify its pre-emptions began to lose
their functional value. Only the undisguised use of brute force was
left. Such ideas ceased to convince. Men of unusual independence of
mind, or men with ambitious motives, who had grown up within the
dominant crowd, began to throw off the spell of its control-ideas,
and, by leaving it, to weaken it further from within. No sooner was
this weakness detected by other groups than every sort of grievance
and partisan interest became a moral justification for efforts to
supplant the rulers. The attempt of the dominant crowd to retain its
hold by repeating its traditional justification-platitudes, unchanged,
but with greater emphasis, may be seen in the orations of Cicero. It
would be well if some one besides high-school students and their
Latin teachers were to take up the study of Cicero; the social and
psychological situation which this orator and writer of moral essays
reveals has some suggestive similarities to things which are
happening to-day.
The century and more of unrest which preceded both the
Reformation and the French Revolution is in each instance a long
story. But in both there is the same gradual loss of prestige on the
part of the dominant crowd; the same inability of this crowd to
change with the changes of time; to find new sanctions for itself
when the old ones were no longer believed; the same unadaptability,
the same intellectual and moral bankruptcy, therefore, the same
gradual disintegration from within; the same resort to
sentimentalism and ineffective use of force, the same circle of
hungry counter-crowds waiting around with their tongues hanging
out, ready to pounce upon that before which they had previously
groveled, and to justify their ravenousness as devotion to principle;
the same growing fearlessness, beginning as perfectly loyal desire to
reform certain abuses incidental to the existing order, and
advancing, with every sign of disillusionment or weakness, to moral
indignation, open attack upon fundamental control ideas, bitter
hostility, augmented by the repressive measures taken by the
dominant crowd to conserve a status quo which no longer gained
assent in the minds of a growing counter-crowd; finally force, and a
new dominant crowd more successful now in justifying old tyrannies
by principles not yet successfully challenged.

In the light of these historical analogies the record of events
during the last seventy-five years in western Europe and America is
rather discomforting reading, and I fear the student of social
psychology will find little to reassure him in the pitiable lack of
intellectual leadership, the tendency to muddle through, the
unteachableness and general want of statesmanlike vision displayed
by our present dominant crowds. If a considerable number of people
of all classes, those who desire change as well as those who oppose
it, could free their thinking from the mechanisms of the crowd-mind,
it might be possible to find the working solution of some of our
pressing social problems and save our communities from the
dreadful experience of another revolution. Our hope lies in the
socially minded person who is sufficiently in touch with reality to be
also a non-crowd man.
Anyone who is acquainted with the state of the public mind at
present, knows that a priori arguments against revolution as such
are not convincing, except to those who are already convinced on
other ground. The dominant crowd in each historical epoch gained
its original supremacy by means of revolution. One can hardly make
effective use of the commonplace antirevolutionary propaganda of
defense of a certain order which has among its most ardent
supporters people who are proud to call themselves sons and
daughters of the Revolution. Skeptics at once raise the question
whether, according to such abstract social ethics, revolutionists
become respectable only after they are successful or have been a
long time dead. In fact, the tendency to resort to such reasoning is
one among many symptoms that the conservative mind has
permitted itself to become quite as much a crowd-phenomenon as
has the radical mind.
The correct approach here is psychological and pragmatic. There
is an increasingly critical social situation, demanding far-reaching
reconstructive change; only the most hopeless crowd-man would
presume to deny this fact. The future all depends upon the mental
processes with which we attempt to meet this situation. Nothing but
useless misery can result from dividing crowd against crowd. Crowd-
thinking, as I have said, does not solve problems. It only creates
ideal compensations and defense devices for our inner conflicts.
Conservative crowd-behavior has always done quite as much as
anything else to precipitate a revolutionary outbreak. Radical crowd-
behavior does not resolve the situation, it only inverts it. Any real
solution lies wholly outside present crowd-dilemmas. What the social
situation demands most is a different kind of thinking, a new
education, an increasing number of people who understand
themselves and are intellectually and morally independent of the
tyranny of crowd-ideas.
From what has been said above, it follows that revolutionary
propaganda is not directly the cause of insurrection. Such
propaganda is itself an effect of the unconscious reaction between a
waning and a crescent crowd. It is a symptom of the fact that a
large number of people have ceased to believe in or assent to the
continued dominance of the present controlling crowd and are
looking to another.
There is always a tendency among conservative crowds to
hasten their own downfall by the manner in which they deal with
revolutionary propaganda. The seriousness of the new issue is
denied; the crowd seeks to draw attention back to the old issue
which it fought and won years ago in the hour of its ascendancy.
The fact that the old charms and shibboleths no longer work, that
they do not now apply, that the growing counter-crowd is able to
psychoanalyze them, discover the hidden motives which they
disguise, and laugh at them, is stoutly denied. The fiction is
maintained to the effect that present unrest is wholly uncalled-for,
that everything is all right, that the agitators who "make people
discontented" are alien and foreign and need only be silenced with a
time-worn phrase, or, that failing, shut up by force or deported, and
all will be well.

I do not doubt that before the Reformation and the French
Revolution there were ecclesiastics and nobles aplenty who were
quite sure that the masses would never have known they were
miserable if meddling disturbers had not taken the trouble to tell
them so. Even an honest critical understanding of the demands of
the opposing crowd is discouraged, possibly because it is rightly felt
that the critical habit of mind is as destructive of one crowd-complex
as the other and the old crowd prefers to remain intact and die in
the last ditch rather than risk dissolution, even with the promise of
averting a revolution. Hence the Romans were willing to believe that
the Christians worshiped the head of an ass. The mediæval
Catholics, even at Leo's court, failed to grasp the meaning of the
outbreak in north Germany. Thousands saw in the Reformation only
the alleged fact that the monk Luther wanted to marry a wife. To-
day one looks almost in vain among business men, editors, and
politicians for a more intelligent understanding of socialism. A crowd
goes down to its death fighting bogies, and actually running upon
the sword of its real enemy, because a crowd, once its constellation
of ideas is formed, never learns anything.
The crowd-group contains in itself, in the very nature of crowd-
thinking, the germs which sooner or later lay it low. When a crowd
first becomes dominant, it carries into a place of power a number of
heterogeneous elements which have, up to this time, been united in
a great counter-crowd because of their common dissatisfaction with
the old order. Gradually the special interests of these several groups
become separated. The struggle for place is continued as a factional
fight within the newly ruling crowd. This factional struggle greatly
complicates every revolutionary movement. We witness this in the
murderously hostile partisan conflicts which broke out in the
revolutionary Assemblies in France. It is seen again in the
Reformation, which had hardly established itself when the
movement was rent by intense sectarian rivalries of all sorts. The
same is true of Russia since the fall of the Tsar, and of Mexico ever
since the overthrow of the Diaz regime. If these factional struggles
go so far as to result in schism—that is, in a conscious repudiation
by one or more factions of the revolutionary creed which had
formerly united them all, there is disintegration and in all probability
a return to the old ruling crowd.
This reaction may also be made possible by a refusal of one
faction to recognize the others as integral parts of the newly
triumphant crowd. If the new crowd after its victory can hold itself
together, the revolution is established. It then becomes the task of
the leading faction in the newly dominant crowd to grab the lion's
share of the spoils for itself, give the other factions only so much
prestige as will keep alive in their minds the belief that they, too,
share in the new victory for "humanity" and hold the new social
order together, while at the same time justifying its own leadership
by the compulsive power of the idea which they all alike believe.
This belief, as we have seen, is the sine qua non of the continued
existence of any crowd. A dominant crowd survives so long as its
belief is held uncritically and repeated and acted upon automatically
both by the members of the crowd and its victims. When the
factions which have been put at a disadvantage by the leading
faction renounce the belief, or awake to the fact that they "have
been cheated," disintegration begins.
Between the crowd's professed belief and the things which it
puts into practice there is a great chasm. Yet the fiction is uniformly
maintained that the things done are the correct and faithful
application of the great principles to which the crowd is devoted. We
saw in our study of crowd-ideas in general that such ideas are not
working programs, but are screens which disguise and apparently
justify the real unconscious motive of crowd-behavior. The crowd
secures its control, first, by proclaiming in the most abstract form
certain generally accepted principles, such as freedom,
righteousness, brotherly love—as though these universal "truths"
were its own invention and exclusive monopoly. Next, certain logical
deductions are made from these principles which, when carried to
their logical conclusions regardless of fact or the effect produced,
make the thing which the crowd really wants and does appear to be
a vindication of the first principles. It is these inferences which go to
make up the conscious thinking or belief of the crowd. Thus in the
revolutionary convention in France all agree to the principles of
Liberty, Equality, and Fraternity. Fidelity to these principles would to
a non-crowd mean that the believer should not try to dictate to his
fellows what they must believe and choose, that he would exercise
good will in his dealings with them and show them the same respect
which he wished them to have for himself. But the crowd does not
understand principles in this manner. Do all agree to the great
slogan of the revolution? Well, then, fidelity to Liberty, Equality, and
Fraternity demands that the enemies of these principles and the
crowd's definition of them be overthrown. The Mountain is the truly
faithful party, hence to the guillotine with the Gironde. This chasm
between crowd faith and crowd practice is well illustrated in the case
of those Southern patriots in America who were ready to fight and
die for the rights of man as expressed in the Declaration of
Independence, but refused to apply the principle of the inalienable
rights of all men to their own black slaves. Or, again in the case of
nineteenth-century capitalism, liberty must be given to all alike.
Liberty means equal opportunity. Equal opportunity means free
competition in business. Free competition exists only where there is
an "incentive"; hence the investor must be encouraged and his gains
protected by law. Therefore anti-capitalistic doctrines must be
suppressed as subversive of our free institutions. Immigrants to
whom for a generation we have extended the hospitality of our
slums and labor camps, and the opportunity of freely competing with
our well-intrenched corporations, must be made to feel their
ingratitude if they are so misguided as to conclude, from the fact
that hundreds of leading radicals have been made to serve jail
sentences, while after thirty years of enforcing the antitrust law not
a single person has ever been sent to prison, that possibly this is not
a free land.
Or again—one convicts himself of being a crowd-man who
shows partiality among crowds—the principle of democracy is
generally accepted. Then there should be industrial democracy as
well as political—hence the "Dictatorship of the Proletariat"—for the
workers are "the people." Parliamentary assemblies elected by all
the people do not necessarily represent labor. Organized labor,
therefore, though a minority of the whole, should establish
"industrial democracy" by force. So, according to Bolshevist crowd-
logic, democracy means the rule of a minority by means of force.
Now it is this fictitious, paranoiac, crowd-logic which one must
be able to dispel before he can extricate himself from the clutches of
his crowd. If he subjects the whole fabric of abstractions to critical
analysis, revalues it, puts himself above it, assumes a pragmatic
attitude toward whatever truths it contains, dares to test these
truths by their results in experience and to use them for desired
ends; if, in short, he scrutinizes his own disguised impulses, brings
them to consciousness as what they are, and refuses to be deceived
as to their real import, even when they appear dressed in such
sheep's clothing as absolutes and first principles, he becomes a non-
crowd man, a social being in the best sense.
Those, however, who continue to give assent to the crowd's first
principles, who still accept its habit of a priori reasoning, merely
substituting for its accepted deductions others of their own which in
turn serve to conceal and justify their own unconscious desires, will
turn from the old crowd only to be gobbled up by a new and
counter-crowd. Such people have not really changed. They
denounce the old crowd on the ground that "it has not lived up to its
principles." It is a significant fact that a crowd's rule is generally
challenged in the name of the very abstract ideas of which it has
long posed as the champion.
For instance, there is liberty. Every crowd demands it when it is
seeking power; no crowd permits it when it is in power. A crowd
which is struggling for supremacy is really trying to free itself and as
many people as possible from the control of another crowd.
Naturally, the struggle for power appears to consciousness as a
struggle for liberty as such. The controlling crowd is correctly seen to
be a tyrant and oppressor. What the opposition crowd does not
recognize is its own wish to oppress, hidden under its struggle for
power. We have had occasion to note the intolerance of the crowd-
mind as such. A revolutionary crowd, with all its lofty idealism about
liberty, is commonly just as intolerant as a reactionary crowd. It
must be so in order to remain a crowd. Once it is triumphant it may
exert its pressure in a different direction, but the pinch is there just
the same. Like its predecessor, it must resort to measures of
restraint, possibly even a "reign of terror," in order that the new-won
"liberty"—which is to say, its own place at the head of the procession
—may be preserved. The denial of freedom appears therefore as its
triumph, and for a time people are deceived. They think they are
free because everyone is talking about liberty.
Eventually some one makes the discovery that people do not
become free just by repeating the magic word "liberty." A
disappointed faction of the newly emancipated humanity begins to
demand its "rights." The crowd hears its own catchwords quoted
against itself. It proceeds to prove that freedom exists by
denouncing the disturbers and silencing them, if necessary, by force.
The once radical crowd has now become reactionary. Its dream of
world emancipation is seen to be a hoax. Lovers of freedom now
yoke themselves in a new rebel crowd so that oppressed humanity
may be liberated from the liberators. Again, the will to power is
clothed in the dream symbols of an emancipated society, and so on
around and around the circle, until people learn that with crowds
freedom is impossible. For men to attain to mastery of themselves is
as abhorrent to one crowd as to another. The crowd merely wants
freedom to be a crowd—that is, to set up its own tyranny in the
place of that which offends the self-feeling of its members.
The social idealism of revolutionary crowds is very significant for
our view of the crowd-mind. There are certain forms of revolutionary
belief which are repeated again and again with such uniformity that
it would seem the unconscious of the race changes very little from
age to age. The wish-fancy which motivates revolutionary activity
always appears to consciousness as the dream of an ideal society, a
world set free; the reign of brotherly love, peace, and justice. The
folly and wickedness of man is to cease. There will be no more
incentive for men to do evil. The lion and the lamb shall lie down
together. Old extortions and tyrannies are to be left behind. There is
to be a new beginning, poverty is to be abolished, God's will is to be
done in earth, or men are at last to live according to reason, and the
inalienable rights of all are to be secured; or the co-operative
commonwealth is to be established, with no more profit-seeking and
each working gladly for the good of all. In other words, the mind of
revolutionary crowds is essentially eschatological, or Messianic. The
crowd always imagines its own social dominance is a millennium.
And this trait is common to revolutionary crowds in all historical
periods.
We have here the psychological explanation of the Messianic
faith which is set forth with tremendous vividness in Biblical
literature. The revolutionary import of the social teaching of both the
Hebrew and Christian religions is so plain that I do not see how any
honest and well-informed person can even attempt to deny it. The
telling effectiveness with which this element in religious teaching
may be used by clever radicals to convict the apologists of the
present social order by the words out of their own mouths is evident
in much of the socialist propaganda to-day. The tendency of the will
to revolt, to express itself in accepted religious symbols, is a thing to
be expected if the unconscious plays the important part in crowd-
behavior that we have contended that it does.
The eighth-century Hebrew prophet mingles his denunciations of
those who join house to house and field to field, who turn aside the
way of the meek, and sit in Samaria in the corner of a couch and on
the silken cushions of a bed, who have turned justice to wormwood
and cast down righteousness to the earth, etc., etc.,—reserving his
choicest woes of course for the foreign oppressors of "my people"—
with promises of "the day of the Lord" with all that such a day
implies, not only of triumph of the oppressed over their enemies, but
of universal happiness.
Similarly the same complex of ideas appears in the writings
which deal with the Hebrew "Captivity" in the sixth century B.C.,
with the revolt of the Maccabeans, and again in the impotent hatred
against the Romans about the time of the origin of Christianity.
The New Testament dwells upon some phase of this theme on
nearly every page. Blessed are ye poor, and woe unto you who are
rich, you who laugh now. The Messiah has come and with him the
Kingdom of the Heavens, but at present the kingdom is revealed
only to the believing few, who are in the world, but not of it.
However, the Lord is soon to return; in fact, this generation shall not
pass away until all these things be accomplished. After a period of
great trial and suffering there is to be a new world, and a new and
holy Jerusalem, coming down from the skies and establishing itself
in place of the old. All the wicked, chiefly those who oppress the
poor, shall be cast into a lake of fire. There shall be great rejoicing,
and weeping and darkness and death shall be no more.
The above sketch of the Messianic hope is so brief as to be
hardly more than a caricature, but it will serve to make my point
clear, that Messianism is a revolutionary crowd phenomenon. This
subject has been presented in great detail by religious writers in
recent years, so that there is hardly a member of the reading public
who is not more or less familiar with the "social gospel." My point is
that all revolutionary propaganda is "social gospel." Even when
revolutionists profess an antireligious creed, as did the Deists of the
eighteenth century, and as do many modern socialists with their
"materialist interpretation of history," nevertheless the element of
irreligion extends only to the superficial trappings of the
revolutionary crowd-faith, and even here is not consistent. At bottom
the revolutionist's dream of a new world is religious.

I am using the word "religious" in this connection in its popular
sense, meaning no more than that the revolutionary crowd
rationalizes its dream of a new world-order in imagery which repeats
over and over again the essentials of the Biblical "day of the Lord,"
or "kingdom of heaven" to be established in earth. This notion of
cosmic regeneration is very evident in the various "utopian" socialist
theories. The Fourierists and St. Simonists of the early part of the
nineteenth century were extremely Messianic. So-called "scientific
socialists" are now inclined to ridicule such idealistic speculation, but
one has only to scratch beneath the surface of present-day socialist
propaganda to find under its materialist jargon the same old dream
of the ages. A great world-change is to come suddenly. With the
triumph of the workers there will be no more poverty or ignorance,
no longer any incentive to men to do evil to one another. The
famous "Manifesto" is filled with such ideas. Bourgeois society is
doomed and about to fall. Forces of social evolution inevitably point
to the world-wide supremacy of the working class, under whose mild
sway the laborer is to be given the full product of his toil, the
exploitation of children is to cease, true liberty will be achieved,
prostitution, which is somehow a bourgeois institution, is to be
abolished, everyone will be educated, production increased till there
is enough for all, the cities shall no more lord it over the rural
communities, all alike will perform useful labor, waste places of the
earth will become cultivated lands and the fertility of the soil will be
increased in accordance with a common plan, the state, an
instrument of bourgeois exploitation, will cease to exist; in fact, the
whole wicked past is to be left behind, for as
The Communist revolution is the most radical rupture with
traditional property relations, no wonder that its development
involves the most radical rupture with traditional ideas.
In fine,

In place of the old bourgeois society with its classes and
class antagonisms we shall have an association in which the free
development of each is the condition for the free development
of all.
Le Bon says of the French Revolution:
The principles of the Revolution speedily inspired a wave of
mystic enthusiasm analogous to those provoked by the various
religious beliefs which had preceded it. All they did was to
change the orientation of a mental ancestry which the centuries
had solidified.
So there is nothing astonishing in the savage zeal of the
men of the Convention. Their mystic mentality was the same as
that of the Protestants at the time of the Reformation. The
principal heroes of the Terror—Couthon, Saint Just, Robespierre,
etc.—were apostles. Like Polyeuctes destroying the altars of the
false gods to propagate his faith, they dreamed of converting
the globe.... The mystic spirit of the leaders of the Revolution
was betrayed in the least details of their public life. Robespierre,
convinced that he was supported by the Almighty, assured his
hearers in a speech that the Supreme Being had "decreed the
Republic since the beginning of time."
A recent writer, after showing that the Russian revolution has
failed to put the Marxian principles into actual operation, says of
Lenin and his associates:
They have caught a formula of glittering words; they have
learned the verbal cadences which move the masses to ecstasy;
they have learned to paint a vision of heaven that shall outflare
in the minds of their followers the shabby realities of a Bolshevik
earth. They are master phraseocrats, and in Russia they have
reared an empire on phraseocracy.

The alarmists who shriek of Russia would do well to turn
their thoughts from Russia's socialistic menace. The peril of
Russia is not to our industries, but to our states. The menace of
the Bolsheviki is not an economic one, it is a political menace. It
is the menace of fanatic armies, drunken with phrases and
sweeping forward under Lenin like a Muscovite scourge. It is the
menace of intoxicated proletarians, goaded by invented visions
to seek to conquer the world.
In Nicolai Lenin the Socialist, we have naught to fear. In
Nicolai Lenin the political chief of Russia's millions, we may well
find a menace, for his figure looms over the world. His Bolshevik
abracadabra has seduced the workers of every race. His stealthy
propaganda has shattered the morale of every army in the
world. His dreams are winging to Napoleonic flights, and well he
may dream of destiny; for in an age when we bow to phrases, it
is Lenin who is the master phraseocrat of the world.
Passing over the question of Lenin's personal ambitions, and
whether our own crowd-stupidity, panic, and wrong-headed Allied
diplomacy may not have been contributing causes of the menace of
Bolshevism, it can hardly be denied that Bolshevism, like all other
revolutionary crowd-movements, is swayed by a painted vision of
heaven which outflares the miseries of earth. Every revolutionary
crowd of every description is a pilgrimage set out to regain our lost
Paradise.
Now it is this dream of paradise, or ideal society, which deserves
analytical study. Why does it always appear the minute a crowd is
sufficiently powerful to dream of world-power? It will readily be
conceded that this dream has some function in creating certain
really desirable social values. But such values cannot be the
psychogenesis of the dream. If the dream were ever realized, I think
William James was correct in saying that we should find it to be but
a "sheep's heaven and lubberland of joy," and that life in it would be
so "mawkish and dishwatery" that we should gladly return to this
world of struggle and challenge, or anywhere else, if only to escape
the deadly inanity.
We have already noted the fact that this dream has the function
of justifying the crowd in its revolt and will to rule. But this is by no
means all. The social idealism has well been called a dream, for that
is just what it is, the daydream of the ages. It is like belief in fairies,
or the Cinderella myth. It is the Jack-and-the-beanstalk philosophy.
The dream has exactly the same function as the Absolute, and the
ideal world-systems of the paranoiac; it is an imaginary refuge from
the real. Like all other dreams, it is the realization of a wish. I have
long been impressed with the static character of this dream; not only
is it much the same in all ages, but it is always regarded as the great
culmination beyond which the imagination cannot stretch. Even
those who hold the evolutionary view of reality and know well that
life is continuous change, and that progress cannot be fixed in any
passing moment, however sweet, are generally unable to imagine
progress going on after the establishment of the ideal society and
leaving it behind.
Revolutionary propaganda habitually stops, like the nineteenth-
century love story, with a general statement, "and so they lived
happily ever after." It is really the end, not the beginning or middle
of the story. It is the divine event toward which the whole creation
moves, and having reached it, stops. Evolution having been wound
up to run to just this end, time and change and effort may now be
discontinued. There is nothing further to do. In other words, the
ideal is lifted clear out of time and all historical connections. As in
other dreams, the empirically known sequence of events is ignored.
Whole centuries of progress and struggle and piecemeal experience
are telescoped into one imaginary symbolic moment. The moment
now stands for the whole process, or rather it is substituted for the
process. We have taken refuge from the real into the ideal. The
"Kingdom of Heaven," "Paradise," "The Return to Man in the State of
Nature," "Back to Primitive New Testament Christianity," "The Age of
Reason," "Utopia," the "Revolution," the "Co-operative
Commonwealth," all mean psychologically the same thing. And that
thing is not at all a scientific social program, but a symbol of an
easier and better world where desires are realized by magic, and
everyone's check drawn upon the bank of existence is cashed. Social
idealism of revolutionary crowds is a mechanism of compensation
and escape for suppressed desires.
Is there any easier way of denying the true nature and
significance of our objective world than by persuading ourselves that
that world is even now doomed, and is bound suddenly to be
transformed into the land of our heart's desire? Is it not to be
expected that people would soon learn how to give those desires
greater unction, and to encourage one another in holding to the
fictions by which those desires could find their compensation and
escape, by resorting to precisely the crowd-devices which we have
been discussing?
The Messianists of Bible times expected the great transformation
and world cataclysm to come by means of a divine miracle. Those
who are affected by the wave of premillennialism which is now
running through certain evangelical Christian communions are
experiencing a revival of this faith with much of its primitive
terminology.
Evolutionary social revolutionists expect the great day to come
as the culmination of a process of economic evolution. This is what
is meant by "evolutionary and revolutionary socialism." The wish-
fancy is here rationalized as a doctrine of evolution by revolution.
Thus the difference between the social revolutionist and the Second
Adventist is much smaller than either of them suspects. As Freud
would doubtless say, the difference extends only to the "secondary
elaboration of the manifest dream formation"—the latent dream
thought is the same in both cases. The Adventist expresses the wish
in the terminology of a prescientific age, while the social
revolutionist makes use of modern scientific jargon. Each alike finds
escape from reality in the contemplation of a new-world system. The
faith of each is a scheme of redemption—that is, of "compensation."
Each contemplates the sudden, cataclysmic destruction of the
"present evil world," and its replacement by a new order in which
the meek shall inherit the earth. To both alike the great event is
destined, in the fullness of time, to come as a thief in the night. In
the one case it is to come as the fulfillment of prophecy; in the other
the promise is underwritten and guaranteed by impersonal forces of
"economic evolution."
This determinism is in the one case what Bergson calls "radical
finalism," and in the other "radical mechanism." But whether the
universe exists but to reel off a divine plan conceived before all
worlds, or be but the mechanical swinging of the shuttle of cause
and effect, what difference is there if the point arrived at is the
same? In both cases this point was fixed before the beginning of
time, and the meaning of the universe is just that and nothing else,
since that is what it all comes to in the end.
Whether the hand which turns the crank of the world-machine
be called that of God or merely "Evolution," it is only a verbal
difference; it is in both cases "a power not ourselves which makes
for righteousness." And the righteousness? Why, it is just the
righteousness of our own crowd—in other words, the crowd's bill of
rights painted in the sky by our own wish-fancy, and dancing over
our heads like an aurora borealis. It is the history of all crowds that
this dazzling pillar of fire in the Arctic night is hailed as the "rosy-
fingered dawn" of the Day of the Lord.
Or, to change the figure somewhat, the faithful crowd has but to
follow its fiery cloud to the promised land which flows with milk and
honey; then march for an appointed time about the walls of the
wicked bourgeois Jericho, playing its propaganda tune until the walls
fall down by magic and the world is ours. No revolution is possible
without a miracle and a brass band.

I have no desire to discourage those who have gone to work at
the real tasks of social reconstruction—certainly no wish to make this
study an apology for the existing social order. In the face of the ugly
facts which on every hand stand as indictments of what is called
"capitalism," it is doubtful if anyone could defend the present system
without recourse to a certain amount of cynicism or cant. The
widespread social unrest which has enlisted in its service so much of
the intellectual spirit of this generation surely could never have come
about without provocation more real than the work of a mere
handful of "mischief-making agitators." The challenge to modern
society is not wholly of crowd origin.
But it is one thing to face seriously the manifold problems of
reconstruction of our social relations, and it is quite another thing to
persuade oneself that all these entangled problems have but one
imaginary neck which is waiting to be cut with a single stroke of the
sword of revolution in the hands of "the people." Hundreds of times
I have heard radicals, while discussing certain evils of present
society, say, "All these things are but symptoms, effects; to get rid of
them you must remove the cause." That cause is always, in
substance, the present economic system.
If this argument means that, instead of thinking of the various
phases of social behavior as isolated from one another, we should
conceive of them as so interrelated as to form something like a more
or less causally connected organic whole, I agree. But if it means
something else—and it frequently does—the argument is based upon
a logical fallacy. The word "system" is not a causal term; it is purely
descriptive. The facts referred to, whatever connections we may
discover among them, are not the effects of a mysterious "system"
behind the facts of human behavior; the facts themselves, taken
together, are the system.
The confusion of causal and descriptive ideas is a habit common
to both the intellectualist philosopher and the crowd-minded. It
enables people to turn their gaze from the empirical Many to the
fictitious One, from the real to the imaginary. The idea of a system
behind, over, outside, and something different from the related facts
which the term "system" is properly used to describe, whether that
system be a world-system, a logical system, or a social system,
whether it be capitalism or socialism, "system" so conceived is a
favorite crowd-spook. It is the same logical fallacy as if one spoke of
the temperature of this May day as the effect of the climate, when
all know that the term climate is simply (to paraphrase James) the
term by which we characterize the temperature, weather, etc., which
we experience on this and other days. We have already seen to what
use the crowd-mind puts all such generalizations.
A popular revolutionary philosophy of history pictures the
procession of the ages as made up of a pageant of spook-social
systems, each distinct from the others and coming in its appointed
time. But social systems do not follow in a row, like elephants in a
circus parade—each huge beast with its trunk coiled about the end
of his predecessor's tail. The greater part of this "evolutionary and
revolutionary" pageantry is simply dream-stuff. Those who try to
march into Utopia in such an imaginary parade are not even trying
to reconstruct society; they are sociological somnambulists.
The crowd-mind clings to such pageantry because, as we saw in
another connection, the crowd desires to believe that evolution
guarantees its own future supremacy. It then becomes unnecessary
to solve concrete problems. One need only possess an official
program of the order of the parade. In other words, the crowd must
persuade itself that only one solution of the social problem is
possible, and that one inevitable—its own.
Such thinking wholly misconceives the nature of the social
problem. Like all the practical dilemmas of life, this problem,
assuming it to be in any sense a single problem, is real just because
more than one solution is possible. The task here is like that of
choosing a career. Whole series of partially foreseen possibilities are
contingent upon certain definite choices. Aside from our choosing,
many sorts of futures may be equally possible. Our intervention at
this or that definite point is an act by which we will one series of
possibilities rather than another into reality. But the act of
intervention is never performed once for all. Each intervention leads
only to new dilemmas, among which we must again choose and
intervene. It is mainly in order to escape from the necessity of facing
this terrifying series of unforeseeable dilemmas that the crowd-man
walketh in a vain show.
In pointing out the futility of present-day revolutionary crowd-
thinking, I am only striving to direct, in however small a degree, our
thought and energies into channels which lead toward desired
results. It is not by trombones that we are to redeem society, nor is
the old order going to tumble down like the walls of Jericho, and a
complete new start be given. Civilization cannot be wiped out and
begun all over again. It constitutes the environment within which
our reconstructive thinking must, by tedious effort, make certain
definite modifications. Each such modification is a problem in itself,
to be dealt with, not by belief in miracle, but by what Dewey calls
"creative intelligence." Each such modification must be achieved by
taking all the known facts, which are relevant, into account. As such
it is a new adaptation, and the result of a series of such adaptations
may be as great and radical a social transformation as one may have
the courage to set as the goal of a definite policy of social effort. But
there is a world of difference between social thinking of this kind,
where faith is a working hypothesis, and that which ignores the
concrete problems that must be solved to reach the desired goal,
and, after the manner of crowds, dreams of entering fairyland, or of
pulling a new world en bloc down out of the blue, by the magic of
substituting new tyrannies for old.
Revolutionary crowd-thinking is not "creative intelligence." It is
hocus-pocus, a sort of social magic formula like the "mutabor" in the
Arabian Nights; it is an Aladdin's-lamp philosophy. And here we may
sum up this part of our argument. The idea of the revolution is to
the crowd a symbol, the function of which is compensation for the
burdens of the struggle for existence, for the feeling of social
inferiority, and for desires suppressed by civilization. It is an
imaginary escape from hard reality, a new-world system in which the
ego seeks refuge, a defense mechanism under the compulsive
influence of which crowds behave like somnambulistic individuals. It
is the apotheosis of the under crowd itself and the transcendental
expression and justification of its will to rule. It is made up of just
those broad generalizations which are of use in keeping that crowd
together. It gives the new crowd unction in its fight with the old,
since it was precisely these same dream-thoughts which the old
crowd wrote on its banners in the day when it, too, was blowing
trumpets outside the walls of Jericho.

VIII
THE FRUITS OF REVOLUTION—NEW CROWD-TYRANNIES
FOR OLD
So much for the psychology of the revolutionary propaganda.
Now let us look at what happens in the moment of revolutionary
outbreak. We have dwelt at some length on the fact that a
revolution occurs when a new crowd succeeds in displacing an old
one in position of social control. At first there is a general feeling of
release and of freedom. There is a brief period of ecstasy, of good
will, a strange, almost mystical magnanimity. A flood of oratory is
released in praise of the "new day of the people." Everyone is a
"comrade." Everyone is important. There is an inclination to trust
everyone. This Easter-morning state of mind generally lasts for some
days—until people are driven by the pinch of hunger to stop talking
and take up again the routine tasks of daily living. We have all read
how the "citizens" of the French Revolution danced in the streets for
sheer joy in their new-won liberty. Those who were in Petrograd
during the days which immediately followed the downfall of the Tsar
bear witness to a like almost mystical sense of the general goodness
of human kind and of joy in human fellowship.
With the return to the commonplace tasks of daily life, some
effort, and indeed further rationalization, is needed to keep up the
feeling that the new and wonderful age has really come to stay.
Conflicts of interest and special grievances are viewed as involving
the vital principles of the Revolution. People become impatient and
censorious. There is a searching of hearts. People watch their
neighbors, especially their rivals, to make sure that nothing in their
behavior shall confirm the misgivings which are vaguely felt in their
own minds. The rejoicing and comradeship which before were
spontaneous are now demanded. Intolerance toward the vanquished
crowd reappears with increased intensity, not a little augmented by
the knowledge that the old enemies are now at "the people's" mercy.
There is a demand for revenge for old abuses. The displaced
crowd likely as not, foreseeing the doom which awaits its members,
seeks escape by attempting a counter-revolution. A propaganda of
sympathy is carried on among members of this same class who
remain in the dominant crowd in communities not affected by the
revolution. There is secret plotting and suspicion of treason on every
hand. People resort to extravagant expressions of their revolutionary
principles, not only to keep up their own faith in them, but to show
their loyalty to the great cause. The most fanatical and
uncompromising members of the group gain prominence because of
their excessive devotion. By the very logic of crowd-thinking,
leadership passes to men who are less and less competent to deal
with facts and more and more extreme in their zeal. Hence the usual
decline from the Mirabeaus to the Dantons and Carriers, and from
these to the Marats and Robespierres, from the Milukoffs to the
Kerenskys and from the Kerenskys to the Trotzkys. With each excess
the crowd must erect some still new defense against the inevitable
disclosure of the fact that the people are not behaving at all as if
they were living in the kingdom of heaven. With each farther
deviation from the plain meaning of facts, the revolution must resort
to more severe measures to sustain itself, until finally an
unsurmountable barrier is reached, such as the arrival on the scene
of a Napoleon. Then the majority are forced to abandon the vain
hope of really attaining Utopia, and content themselves with fictions
to the effect that what they have really is Utopia—or with such other
mechanisms as will serve to excuse and minimize the significance of
existing facts and put off the complete realization of the ideal until
some future stage of progress. It is needless to add that those who
have most profited by the revolutionary change are also most ready
to take the lead in persuading their neighbors to be content with
these rational compromises.
Meanwhile, however, the revolutionary leaders have set up a
dictatorship of their own, which, while necessary to "save the
revolution," is itself a practical negation of the revolutionary dream
of a free world. This dictatorship, finally passing into the hands of
the more competent element of the revolutionary crowd, justifies
itself to the many; professing and requiring of all a verbal assent to
the revolutionary creed of which its very existence is a fundamental
repudiation. This group becomes in time the nucleus about which
society finally settles down again in comparative peace and
equilibrium.
In general, then, it may be said that a revolution does not and
cannot realize the age-long dream of a world set free. Its results
may be summed up as follows: a newly dominant crowd, a new
statement of old beliefs, new owners of property in the places of the
old, new names for old tyrannies. Looking back over the history of
the several great tidal waves of revolution which have swept over
the civilization which is to-day ours, it would appear that one effect
of them has been to intensify the hold which crowd-thinking has
upon all of us, also to widen the range of the things which we
submit to the crowd-mind for final judgment. In confirmation of this
it is to be noted that it is on the whole those nations which have
been burnt over by both the Reformation and the eighteenth-century
revolution which exhibit the most chauvinistic brand of nationalism and
crowd-patriotism. It is these same nations also which have most
highly depersonalized their social relationships, political structures,
and ideals. It is these nations also whose councils are most
determined by spasms of crowd-propaganda.
The modern man doubtless has a sense of self in a degree
unknown—except by the few—in earlier ages, but along with this
there exists in "modern ideas," a complete system of crowd-ideas
with which the conscious self comes into conflict at every turn. Just
how far the revolutionary crowds of the past have operated to
provide the stereotyped forms in which present crowd-thinking is
carried on, it is almost impossible to learn. But that their influence
has been great may be seen by anyone who attempts a
psychological study of "public opinion."
Aside from the results mentioned, I think the deposit of
revolutionary movements in history has been very small. It may be
that, in the general shake-up of such a period, a few vigorous spirits
are tossed into a place where their genius has an opportunity which
it would otherwise have failed to get. But it would seem that on the
whole the idea that revolutions help the progress of the race is a
hoax. Where advancement has been achieved in freedom, in
intelligence, in ethical values, in art or science, in consideration for
humanity, in legislation, it has in each instance been achieved by
unique individuals, and has spread chiefly by personal influence,
never gaining assent except among those who have power to
recreate the new values won in their own experience.
Whenever we take up a new idea as a crowd, we at once turn it
into a catchword and a fad. Faddism, instead of being merely a
hunger for the new, is rather an expression of the crowd-will to
uniformity. To be "old-fashioned" and out of date is as truly to be a
nonconformist as to be a freak or an originator. Faddism is neither
radicalism nor a symptom of progress. It is a mark of the passion for
uniformity or the conservatism of the crowd-mind. It is change; but
its change is insignificant.
It is often said that religious liberty is the fruit of the
Reformation. If so it is an indirect result and one which the
reformers certainly did not desire. They sought liberty only for their
own particular propaganda, a fact which is abundantly proved by
Calvin's treatment of Servetus and of the Anabaptists, by Luther's
attitude toward the Saxon peasants, by the treatment of Catholics in
England, by the whole history of Cromwell's rule, by the persecution
of Quakers and all other "heretics" in our American colonies—
Pennsylvania, I believe, excepted—down to the date of the American
Revolution.
It just happened that Protestantism as the religion of the
bourgeois fell into the hands of a group who, outside their
religious-crowd interests, were destined to be the greatest practical
beneficiaries of the advancement of applied science. Between
applied science and science as a cultural discipline—that is, science
as a humanistic study—the line is hard to draw. The Humanist spirit
of the sciences attained a certain freedom, notwithstanding the fact
that the whole Reformation was really a reactionary movement
against the Renaissance; in spite, moreover, of the patent fact that
the Protestant churches still, officially at least, resist the free spirit of
scientific culture.
It is to the free spirits of the Italian Renaissance, also to the
Jeffersons and Franklins and Paines, the Lincolns and Ingersolls, the
Huxleys and Darwins and Spencers, the men who dared alone to
resist the religious crowd-mind and to undermine the abstract ideas
in which it had intrenched itself, to whom the modern world owes its
religious and intellectual liberty.
The same is true of political liberty. England, which is the most
free country in the world to-day, never really experienced the
revolutionary crowd-movement of the eighteenth century. Instead,
the changes came by a process of gradual reconstruction. And it is
with just such an opportunist reconstructive process that England
promises now to meet and solve the problems of the threatened
social revolution. In contrast with Russia, Socialism in England has
much ground for hope of success. The radical movement in England
is on the whole wisely led by men who with few exceptions can think
realistically and pragmatically, and refuse to be swept off their feet
by crowd-abstractions. The British Labor party is the least crowd-
minded of any of the socialistic organizations of our day. The
Rochdale group has demonstrated that if it is co-operation that
people desire as a solution of the economic problem, the way to
solve it is to co-operate along definite and practicable lines; the co-
operators have given up belief in the miracle of Jericho. The British
trade-union movement has demonstrated the fact that organization
of this kind succeeds in just the degree that it can rise above crowd-
thinking and deal with a succession of concrete problems according
to a statesmanlike policy of concerted action.
To be sure it cannot be denied that the social reconstruction in
England is seriously menaced by the tendency to crowd-behavior. At
best it reveals hardly more than the superior advantage to the whole
community of a slightly less degree of crowd-behavior; but when
compared with the Socialist movement in Russia, Germany, and the
United States, it would seem that radicalism in England has at least
a remote promise of reaching a working solution of the social
problem; and that is more than can at present be said for the
others.
In the light of what has been said about the psychology of
revolution, I think we may hazard an opinion about the vaunted
"Dictatorship of the Proletariat"—an idea that has provided some
new catchwords for the crowd which is fascinated by the soviet
revolution in Russia. Granting for the sake of argument that such a
dictatorship would be desirable from any point of view—I do not see
how the mere fact that people work proves their capacity to rule,
horses also work—would it be possible? I think not. Even the
temporary rule of Lenin in Russia can hardly be called a rule of the
working class. Bolshevist propaganda will have it that such a
dictatorship of the working class is positively necessary if we are
ever to get away from the abuses of present "capitalistic society."
Moreover, it is argued that this dictatorship of the organized workers
could not be undemocratic, for since vested property is to be
abolished and everyone forced to work for his living, all will belong
to the working class, and therefore the dictatorship of the proletariat
is but the dictatorship of all.

In the first place, assuming that it is the dictatorship of all who
survive the revolution, this dictatorship of all over each is not liberty
for anyone; it may leave not the tiniest corner where one may be
permitted to be master of himself. The tyranny of all over each is as
different from freedom as is pharisaism from spiritual living.
Again, what is there to show that this imagined dictatorship of
all is to be shared equally by all, and if not have we not merely set
up a new privileged class—the very thing which the Socialist Talmud
has always declared it is the mission of the workers to destroy
forever? While the workers are still a counter-crowd, struggling for
power against the present ruling class, they are of course held
together by a common cause—namely, their opposition to capital.
But with labor's triumph, everybody becomes a worker, and there is
no one longer to oppose. That which held the various elements of
labor together in a common crowd of revolt has now ceased to exist;
"class consciousness" has therefore no longer any meaning. Labor
itself has ceased to exist as a class by reason of its very triumph.
What then remains to hold its various elements together in a
common cause? Nothing at all. The solidarity of the workers
vanishes, when the struggle which gave rise to that solidarity
ceases. There remains now nothing but the humanitarian principle of
the solidarity of the human race. Solidarity has ceased to be an
economic fact, and has become purely "ideological."
Since by hypothesis everyone is a worker, the dictatorship of the
workers is a dictatorship based not on labor as such, but upon a
universal human quality. It would be quite as truly a dictatorship of
everyone if based upon any other common human quality—say, the
fact that we are all bipeds, that we all have noses, or the fact of the
circulation of the blood. As the purely proletarian character of this
dictatorship becomes meaningless, the crowd-struggle switches from
that of labor as a whole against capital, to a series of struggles
within the dominant labor group itself.

The experience of Russia has even now shown that if the soviets
are to save themselves from nation-wide bankruptcy, specially
trained men must be found to take charge of their industrial and
political activities. Long training is necessary for the successful
management of large affairs, and becomes all the more
indispensable as industry, education, and political affairs are
organized on a large scale. Are specially promising youths to be set
apart from early childhood to prepare themselves for these positions
of authority? Or shall such places be filled by those vigorous few
who have the ambition and the strength to acquire the necessary
training while at the same time working at their daily tasks? In either
case an intellectual class must be developed. Does anyone imagine
that this new class of rulers will hesitate to make use of every
opportunity to make itself a privileged class?
"But what opportunity can there be," is the reply, "since private
capital is to be abolished?" Very well, there have been ruling classes
before in history who did not enjoy the privilege of owning private
property. The clergy of the Middle Ages was such a class, and their
dominance was quite as effective and as enduring as is that of our
commercial classes today. But let us not deceive ourselves; in a
soviet republic there would be opportunity aplenty for exploitation.
As the solidarity of labor vanished, each important trade-group
would enter into rivalry with the others for leadership in the co-
operative commonwealth. Every economic advantage which any
group possessed would be used in order to lord it over the rest.
For instance, let us suppose that the workers in a strategic
industry, such as the railways, or coal mines, should make the
discovery that by going on a strike they could starve the community
as a whole into submission and gain practically anything they might
demand. Loyalty to the rest of labor would act no more as a check
to such ambitions than does loyalty to humanity in general now. As
we have seen, the crowd is always formed for the unconscious
purpose of relaxing the social control by mechanisms which mutually
justify such antisocial conduct on the part of members of the crowd.

There is every reason, both economic and psychological, why the
workers in each industry would become organized crowds seeking to
gain for their particular groups the lion's share of the spoils of the
social revolution. What would there be, then, to prevent the workers
of the railroads or some other essential industry from exploiting the
community quite as mercilessly as the capitalists are alleged to do at
present? Nothing but the rivalry of other crowds who were seeking
the same dominance. In time a modus vivendi would doubtless be
reached whereby social control would be shared by a few of the
stronger unions—and their leaders.
The strike has already demonstrated the fact that in the hands
of a well-organized body of laborers, especially in those trades
where the number of apprentices may be controlled, industrial
power becomes a much more effective weapon than it is in the
hands of the present capitalistic owners.
A new dictatorship, therefore, must inevitably follow the social
revolution, in support of which a favored minority will make use of
the industrial power of the community, just as earlier privileged
classes used military power and the power of private property. And
this new dominance would be just as predatory, and would justify
itself, as did the others, by the platitudes of crowd-thinking. The so-
called dictatorship turns out, on examination, to be the dictatorship
of one section of the proletariat over the rest of it. The dream of
social redemption by such means is a pure crowd-idea.

IX
FREEDOM AND GOVERNMENT BY CROWDS
The whole philosophy of politics comes down at last to a
question of four words. Who is to govern? Compared with this
question the problem of the form of government is relatively
unimportant. Crowd-men, whatever political faith they profess,
behave much the same when they are in power. The particular forms
of political organization through which their power is exerted are
mere incidentals. There is the same self-laudation, the same tawdry
array of abstract principles, the same exploitation of under crowds,
the same cunning in keeping up appearances, the same preference
of the charlatan for positions of leadership and authority.
Machiavelli's Prince, or Dostoievsky's Grand Inquisitor, would serve
just as well as the model for the guidance of a Cæsar Borgia, a
leader of Tammany Hall, a chairman of the National Committee of a
political party, or a Nicolai Lenin.
Ever since the days of Rousseau certain crowds have persisted in
the conviction that all tyrannies were foisted upon an innocent
humanity by a designing few. There may have been a few instances
in history where such was the case, but tyrannies of that kind have
never lasted long. For the most part the tyrant is merely the
instrument and official symbol of a dominant crowd. His acts are his
crowd's acts, and without his crowd to support him he very soon
goes the way of the late Sultan of Turkey. The Cæsars were hardly
more than "walking delegates," representing the ancient Roman
Soldiers' soviet. They were made and unmade by the army which,
though Cæsars might come and Cæsars might go, continued to lord
it over the Roman world. While the army was pagan, even the mild
Marcus Aurelius followed Nero's example of killing Christians. When
finally the army itself became largely Christian, and the fiction that
the Christians drank human blood, worshiped the head of an ass,
and were sexually promiscuous was no longer good patriotic
propaganda, the Emperor Constantine began to see visions of the
Cross in the sky. The Pope, who is doubtless the most absolute
monarch in the Occident, is, however, "infallible" only when he
speaks ex-cathedra—that is, as the "Church Herself." His infallibility
is that of the Church. All crowds in one way or another claim
infallibility. The tyrant Robespierre survived only so long as did his
particular revolutionary crowd in France.
The fate of Savonarola was similar. From his pulpit he could rule
Florence with absolute power just so long as he told his crowd what
it wished to hear, and so long as his crowd was able to keep itself
together and remain dominant. The Stuarts, Hohenzollerns,
Hapsburgs, and Romanoffs, with all their claims to divine rights,
were little more than the living symbols of their respective nation-
crowds. They vanished when they ceased to represent successfully
the crowd-will.
In general, then, it may be said that where the crowd is, there is
tyranny. Tyranny may be exercised through one agent or through
many, but it nearly always comes from the same source—the crowd.
Crowd-rule may exist in a monarchical form of government, or in a
republic. The personnel of the dominant crowd will vary with a
change in the form of the state, but the spirit will be much the
same. Conservative writers are in the habit of assuming that
democracy is the rule of crowds pure and simple. Whether crowd-
government is more absolute in a democracy than in differently
constituted states is a question. The aim of democratic constitutions
like our own is to prevent any special crowd from intrenching itself in
a position of social control and thus becoming a ruling class. As the
experiment has worked out thus far it can hardly be said that it has
freed us from the rule of crowds. It has, however, multiplied the
number of mutually suspicious crowds, so that no one of them has
for long enjoyed a sufficiently great majority to make itself clearly
supreme, though it must be admitted that up to the present the
business-man crowd has had the best of the deal. The story of the
recent Eighteenth Amendment shows how easy it is for a determined
crowd, even though in a minority, to force its favorite dogmas upon
the whole community. We shall doubtless see a great deal more of
this sort of thing in the future than we have in the past. And if the
various labor groups should become sufficiently united in a
"proletarian" crowd there is nothing to prevent their going to any
extreme.
We are passing through a period of socialization. All signs point
to the establishment of some sort of social state or industrial
commonwealth. No one can foresee the extent to which capital now
privately owned is to be transferred to the public. It is doubtful if
anything can be done to check this process. The tendency is no
sooner blocked along one channel than it begins to seep through
another. In itself there need be nothing alarming about this
transition. If industry could be better co-ordinated and more wisely
administered by non-crowd men for the common good, the change
might work out to our national advantage.
It is possible to conceive of a society in which a high degree of
social democracy, even communism, might exist along with a
maximum of freedom and practical achievement. But we should first
have to get over our crowd-ways of thinking and acting. People
would have to regard the state as a purely administrative affair. They
would have to organize for definite practical ends, and select their
leaders and administrators very much as certain corporations now
do, strictly on the basis of their competency. Political institutions
would have to be made such that they could not be seized by special
groups to enhance themselves at the expense of the rest.
Partisanship would have to cease. Every effort would have to be
made to loosen the social control over the individual's personal
