BIG DATA
IN COMPLEX
AND SOCIAL
NETWORKS
Chapman & Hall/CRC
Big Data Series
PUBLISHED TITLES
SERIES EDITOR
Sanjay Ranka
AIMS AND SCOPE
This series aims to present new research and applications in Big Data, along with the computational tools and techniques currently in development. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of social networks, sensor networks, data-centric computing, astronomy, genomics, medical data analytics, large-scale e-commerce, and other relevant topics that may be proposed by potential contributors.
BIG DATA COMPUTING: A GUIDE FOR BUSINESS AND TECHNOLOGY
MANAGERS
Vivek Kale
BIG DATA IN COMPLEX AND SOCIAL NETWORKS
My T. Thai, Weili Wu, and Hui Xiong
BIG DATA OF COMPLEX NETWORKS
Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, and Andreas Holzinger
BIG DATA: ALGORITHMS, ANALYTICS, AND APPLICATIONS
Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
NETWORKING FOR BIG DATA
Shui Yu, Xiaodong Lin, Jelena Mišić, and Xuemin (Sherman) Shen
BIG DATA
IN COMPLEX
AND SOCIAL
NETWORKS
EDITED BY
My T. Thai
University of Florida, USA
Weili Wu
University of Texas at Dallas, USA
Hui Xiong
Rutgers, The State University of New Jersey, USA
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
Version Date: 20161014
International Standard Book Number-13: 978-1-4987-2684-9 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Editors
Section I Social Networks and Complex Networks
Chapter 1  Hyperbolic Big Data Analytics within Complex and Social Networks
Eleni Stai, Vasileios Karyotis, Georgios Katsinis, Eirini Eleni Tsiropoulou and Symeon Papavassiliou
Chapter 2  Scalable Query and Analysis for Social Networks
Tak-Lon (Stephen) Wu, Bingjing Zhang, Clayton Davis, Emilio Ferrara, Alessandro Flammini, Filippo Menczer and Judy Qiu
Section II Big Data and Web Intelligence
Chapter 3  Predicting Content Popularity in Social Networks
Yan Yan, Ruibo Zhou, Xiaofeng Gao and Guihai Chen
Chapter 4  Mining User Behaviors in Large Social Networks
Meng Jiang and Peng Cui
Section III Security and Privacy Issues of Social Networks
Chapter 5  Mining Misinformation in Social Media
Liang Wu, Fred Morstatter, Xia Hu and Huan Liu
Chapter 6  Rumor Spreading and Detection in Online Social Networks
Wen Xu and Weili Wu
Section IV Applications
Chapter 7  A Survey on Multilayer Networks and the Applications
Huiyuan Zhang, Huiling Zhang and My T. Thai
Chapter 8  Exploring Legislative Networks in a Multiparty System
Jose Manuel Magallanes
Index
Preface
In the past decades, the world has witnessed the blossoming of online social networks such as Facebook and Twitter. These networks have revolutionized human interaction and drastically changed the landscape of information sharing in cyberspace. Along with the explosive growth of social networks, huge volumes of data have been generated. Research on big data, that is, on these large datasets, gives insight into many domains, especially complex and social network applications.
In big data research, the management and analysis of large-scale datasets are challenging because the collected data are highly unstructured. The large size of social networks, spatio-temporal effects and interactions between users are among the challenges in uncovering behavioral mechanisms. Many recent research projects process and analyze data from social networks in an attempt to better understand complex networks, which motivated us to prepare in-depth material on recent advances in the areas of big data and social networks.
This handbook presents recent developments in the theoretical, algorithmic and application aspects of big data in complex and social networks. It consists of four parts covering a wide range of topics. The first part focuses on data storage and data processing. Efficient storage of data fundamentally supports intensive data access and queries, which in turn enables sophisticated analysis. Data processing and visualization help to communicate information clearly and efficiently. The second part is devoted to the extraction of essential information and the prediction of web content. Big data analysis helps us better understand the interests, location and search history of users and predict their behavior more accurately. Part 3 focuses on the protection of privacy and security. Modern social media enable people to share and seek information effectively, but they also provide effective channels for the propagation of rumors and misinformation. It is therefore essential to model rumor diffusion, identify misinformation in massive data and design intervention strategies. Finally, Part 4 discusses emerging applications of big data and social networks, with particular attention to multilayer networks and multiparty systems.
We would like to take this opportunity to thank all authors, the anonymous
referees, and Taylor & Francis Group for helping us to finalize this handbook.
Our thanks also go to our students for their help during the processing of all
contributions. Finally, we hope that this handbook will encourage research on
the many intriguing open questions and applications in the area of big data
and social networks that still remain.
My T. Thai
Weili Wu
Hui Xiong
Editors
My T. Thai is a professor and associate chair for research in the department
of computer and information sciences and engineering at the University of
Florida. She received her PhD degree in computer science from the Univer-
sity of Minnesota in 2005. Her current research interests include algorithms,
cybersecurity and optimization on network science and engineering, including
communication networks, smart grids, social networks and their interdepen-
dency. The results of her work have led to 5 books and 120+ articles published
in various prestigious journals and conferences on networking and combina-
torics.
Dr. Thai has engaged in many professional activities. She has been a TPC-
chair for many IEEE conferences, has served as an associate editor for Journal
of Combinatorial Optimization (JOCO), Optimization Letters, Journal of Dis-
crete Mathematics, IEEE Transactions on Parallel and Distributed Systems,
and a series editor of Springer Briefs in Optimization. Recently, she has co-
founded and is co-Editor-in-Chief of Computational Social Networks journal.
She has received many research awards including a UF Research Foundation
Fellowship, UF Provost’s Excellence Award for Assistant Professors, a Department
of Defense (DoD) Young Investigator Award, and an NSF (National
Science Foundation) CAREER Award.
Weili Wu is a full professor in the department of computer science, Univer-
sity of Texas at Dallas. She received her PhD in 2002 and MS in 1998 from
the department of computer science, University of Minnesota, Twin Cities. She
received her BS in 1989 in mechanical engineering from Liaoning University of
Engineering and Technology in China. From 1989 to 1991, she was a mechani-
cal engineer at Chinese Academy of Mine Science and Technology. She was an
associate researcher and associate chief engineer in Chinese Academy of Mine
Science and Technology from 1991 to 1993. Her current research mainly deals
with the general research area of data communication and data management.
Her research focuses on the design and analysis of algorithms for optimiza-
tion problems that occur in wireless networking environments and various
database systems. She has published more than 200 research papers in various prestigious journals and conferences such as IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Mobile Computing (TMC), IEEE Transactions on Multimedia (TMM), ACM Transactions on Sensor Networks (TOSN), IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE/ACM Transactions on Networking (TON), Journal of Global Optimization (JGO), Journal of Optical Communications and Networking (JOCN), Optimization Letters (OPTL), IEEE Communications Letters (ICL), Journal of Parallel and Distributed Computing (JPDC), Journal of Computational Biology (JCB), Discrete Mathematics (DM), Social Network Analysis and Mining (SNAM), Discrete Applied Mathematics (DAM), IEEE INFOCOM (The Conference on Computer Communications), ACM SIGKDD (International Conference on Knowledge Discovery & Data Mining), International Conference on Distributed Computing Systems (ICDCS), International
Conference on Database and Expert Systems Applications (DEXA), SIAM
Conference on Data Mining, as well as many others. Dr. Wu is associate edi-
tor of SOP Transactions on Wireless Communications (STOWC), Computa-
tional Social Networks, Springer and International Journal of Bioinformatics
Research and Applications (IJBRA). Dr. Wu is a senior member of IEEE.
Hui Xiong is currently a full professor of management science and informa-
tion systems at Rutgers Business School and the director of Rutgers Center for
Information Assurance at Rutgers, the State University of New Jersey, where
he received a two-year early promotion/tenure (2009), the Rutgers Univer-
sity Board of Trustees Research Fellowship for Scholarly Excellence (2009),
and the ICDM-2011 Best Research Paper Award (2011).
Dr. Xiong is a prominent researcher in the areas of business intelligence,
data mining, big data, and geographic information systems (GIS). For his out-
standing contributions to these areas, he was elected an ACM Distinguished
Scientist. He has a distinguished academic record that includes 200+ refereed
papers and an authoritative Encyclopedia of GIS (Springer, 2008). He is serv-
ing on the editorial boards of IEEE Transactions on Knowledge and Data En-
gineering (TKDE), ACM Transactions on Management Information Systems
(TMIS) and IEEE Transactions on Big Data. Also, he served as a program
co-chair of the Industrial and Government Track for the 18th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining (KDD), a
program co-chair for the IEEE 2013 International Conference on Data Mining
(ICDM-2013), and a general co-chair for the IEEE 2015 International Confer-
ence on Data Mining (ICDM-2015).
I
Social Networks and Complex
Networks
CHAPTER 1
A Hyperbolic Big Data
Analytics Framework
within Complex and
Social Networks
Eleni Stai, Vasileios Karyotis, Georgios Katsinis, Eirini
Eleni Tsiropoulou and Symeon Papavassiliou
CONTENTS
1.1 Introduction
1.1.1 Scope and Objectives
1.1.2 Outline
1.2 Big Data and Network Science
1.2.1 Complex Networks, Big Data and the Big Data Chain
1.2.2 Big Data Challenges and Complex Networks
1.3 Big Data Analytics based on Hyperbolic Space
1.3.1 Fundamentals of Hyperbolic Geometric Space
1.4 Data Correlations and Dimensionality Reduction in Hyperbolic Space
1.4.1 Example
1.5 Embedding of Networked Data in Hyperbolic Space and Applications
1.5.1 Rigel Embedding in the Hyperboloid Model
1.5.2 HyperMap Embedding
1.6 Greedy Routing over Hyperbolic Coordinates and Applications within Complex and Social Networks
1.7 Optimization Techniques over Hyperbolic Space for Decision-Making in Big Data
1.7.1 The Case of Advertisement Allocation over Online Social Networks
1.7.2 The Case of File Allocation Optimization in Wireless Cellular Networks
1.8 Visualization Analytics in Hyperbolic Space
1.8.1 Adaptive Focus in Hyperbolic Space
1.8.2 Hierarchical (Tree) Graphs
1.8.3 General Graphs
1.9 Conclusions
Acknowledgment
Further Reading
Data management and analysis have stimulated paradigm shifts in decision-making in various application domains. In particular, the emergence of big data, along with complex and social networks, has stretched the imposed requirements to the limit, while promising numerous and crucial benefits. In this chapter, based on a novel approach for big data analytics (BDA), we focus on data processing and visualization and their relations with complex network analysis. We thus adopt a holistic perspective on the complex/social networks that generate massive data and on the relevant analytics techniques, which jointly impact societal operations (e.g., marketing, advertising, resource allocation), closing a loop between data generation and exploitation within the complex networks themselves. Recent literature shows a strong relation between hyperbolic geometry and complex networks, as the latter exhibit a hidden hyperbolic structure. Inspired by this fact, the methodology adopted in this chapter leverages key properties of the hyperbolic metric space for complex and social networks, exploited in a general framework that includes processes for data correlation/clustering, inference of missing data (e.g., links), efficient computation of social network analysis metrics, optimization, resource (advertisements, files, etc.) allocation and visualization analytics. More specifically, the proposed framework consists of the above hyperbolic geometry based processes/components, arranged in a chain. Some of those components can also be applied independently and potentially combined with other traditional statistical learning techniques. We emphasize the efficiency of each process in the complex networks domain, while also pinpointing open and interesting research directions.
1.1 INTRODUCTION
Data processing and analysis was one of the main drivers for the proliferation of computers (processing) and communications networks (analysis and transfer). Lately, however, a paradigm shift has been witnessed in which networks themselves, e.g., social networks and sensor networks, create data as well, and in fact in massive quantities. Indeed, gigantic datasets are produced, deliberately or spontaneously, and stored by traditional and new applications/services.
Characteristic examples include the envisaged Internet of Things (IoT) paradigm [1], where pervasive sensors and actuators for almost every aspect of human activity will collect, process and make decisions on massive data, e.g., for surveillance, healthcare, etc. Similarly, the Internet, mobile networks, and overlaying (social) networks, e.g., Google, Facebook and others described in [2], [3], are responsible for the explosion of produced and transferred data. Collecting, processing and analyzing these data, generated at unprecedented rates, has lately attracted significant research, technological and financial interest within a broader framework popularly known as “big data analytics” (BDA) [2]. This trend is expected to intensify, since the expanding complex and social networks are expected to generate even larger amounts of intricately inter-related information and to impose harsher data storage, processing, analysis and visualization requirements.
1.1.1 Scope and Objectives
Given the aforementioned setting and the fact that significant research and
technological progress has taken place regarding the lower level aspects, e.g.,
storage and processing, this chapter focuses more on aspects of data analytics.
It aspires to provide a framework for combining traditional methodologies
(e.g., statistical learning) with novel techniques (e.g., communications theory)
providing holistic and efficient solutions.
More specifically, we adopt a radical perspective on data analytics, advocating the use of cross-discipline mathematical tools and, in particular, exploiting properties of hyperbolic space [4], [5]. We postulate that hyperbolic metric spaces can provide the substrate that data analytics requires in order to keep pace with the explosion of data volume and the associated processing. The main goal is to briefly describe a holistic framework for data representation, analysis (e.g., correlation, clustering, prediction), visualization, and decision-making in complex and social networks, based on the principles of hyperbolic geometry and its properties. The chapter then touches on several key BDA aspects, i.e., data correlation, dimensionality reduction, embedding of data and networks, navigation, and computation and optimization of social network analysis (SNA) metrics, and shows how they are accommodated by the above framework, along with the associated benefits. The chapter also explains the salient characteristics of these approaches in relation to the features and properties of the complex and social networks of interest, which generate massive datasets of diverse types. Finally, throughout the chapter, we highlight key directions of great potential interest for the future.
1.1.2 Outline
The rest of this chapter is organized as follows. Section 1.2 presents the relation between complex networks and big data processes, together with their emerging challenges, while Section 1.3 introduces and analyzes the proposed hyperbolic geometry based approach. Section 1.4 describes how to perform data correlation and dimensionality reduction over hyperbolic space. Section 1.5 studies several types of data embeddings in hyperbolic space, along with their properties, especially those related to complex networks. Section 1.6 examines the navigability of complex networks embedded in hyperbolic space via greedy routing techniques. Section 1.7 describes optimization methodologies over large complex and social network graphs using hyperbolic space and pinpoints applications to advertisement and file allocation problems. Section 1.8 surveys visualization techniques based on hyperbolic space and their properties/advantages over Euclidean based ones. Finally, Section 1.9 concludes the chapter.
1.2 BIG DATA AND NETWORK SCIENCE
1.2.1 Complex Networks, Big Data and the Big Data Chain
Diverse types of complex and social networks are nowadays responsible for
both massive data generation and transfers. The corresponding research and
technological progress has been cumulatively addressed under the Network
Science/Complex Network Analysis (CNA) domain [6].
It has been observed that several types of networks demonstrate similar or identical behaviors. For example, modern societies are nowadays characterized as connected, inter-connected and inter-dependent via various network structures. Communication and social networks have been co-evolving in the last decade into a complex hierarchical system, which expands asymmetrically in time, as shown in Figure 1.1. The interconnecting physical layer expands orders of magnitude faster than the overlaying social one. This leads to the generation of massive quantities of data from both layers, for different purposes, e.g., data transferred in the lower layer, control and peer data at the higher one, etc., at unprecedented rates compared to the past. This form of “social IoT” (s-IoT) [7] is tightly related to the big data setting, as storage, analysis and inference over gigantic datasets impose stringent resource requirements and are tightly inter-related with the structure and operation of the complex and social networks involved. Various forms of BDA are applied nowadays in diverse disciplines, e.g., banking, retail chains/shopping, healthcare, insurance, public utilities, SNA, etc., where diverse complex networks produce and transfer data.
Computers have revolutionized the whole process chain of data analytics, allowing automation in a supervised manner. Nowadays, such a chain is part of a broader BDA pipeline that includes collection, correlation, management, search & retrieval and visualization of data and analysis results, at unprecedented scales compared to the past [2]. More specifically, the BDA pipeline consists of data generation, acquisition, storage, analysis, visualization and interpretation processes.
FIGURE 1.1 Communication (complex) – social network co-evolution.
Data generation involves creating data from multiple, diverse and distributed sources, including sensors, video, click streams, etc. Data acquisition refers to obtaining information and is subdivided into data collection, data transmission, and data pre-processing. The first refers to retrieving raw data from real-world objects, the second to transmitting data from the sources to appropriate storage systems, and the third to all those techniques that may be needed prior to the main analysis stage, e.g., data integration, cleansing, transformation and reduction. Data integration aims at combining data residing in different sources and providing a unified data perspective. Data cleansing refers to identifying inaccurate, incomplete, or unreasonable data and amending or removing (transforming) these data to improve data quality. Data reduction aims at decreasing the redundancy of the available data, which would otherwise increase data transmission overhead and storage costs and lead to data inconsistency, reduced reliability and data corruption.
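The three pre-processing steps named above (integration, cleansing, reduction) can be illustrated with a minimal Python sketch. The record layout, the field names (`user`, `age`, `clicks`), the age bound and the merge rule are all hypothetical choices made for this illustration; the chapter does not prescribe any of them.

```python
# Hypothetical raw records from two sources; field names are illustrative only.
source_a = [
    {"user": "u1", "age": 34, "clicks": 12},
    {"user": "u2", "age": None, "clicks": 7},    # incomplete record
]
source_b = [
    {"user": "u2", "age": 29, "clicks": 7},
    {"user": "u3", "age": 410, "clicks": 3},     # unreasonable age
    {"user": "u1", "age": 34, "clicks": 12},     # redundant copy
]

def integrate(*sources):
    """Data integration: combine records from several sources into one view."""
    return [dict(rec) for src in sources for rec in src]

def cleanse(records, max_age=120):
    """Data cleansing: mark unreasonable ages as unknown (None)."""
    cleaned = []
    for rec in records:
        age = rec.get("age")
        if age is not None and not (0 <= age <= max_age):
            age = None
        cleaned.append({**rec, "age": age})
    return cleaned

def reduce_redundancy(records):
    """Data reduction: keep one record per user, filling gaps from duplicates."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec["user"], {})
        for key, value in rec.items():
            if target.get(key) is None:   # keep the first known value per field
                target[key] = value
    return list(merged.values())

records = reduce_redundancy(cleanse(integrate(source_a, source_b)))
for rec in records:
    print(rec)
```

In this sketch the redundant copy of `u1` is dropped, the missing age of `u2` is filled from the second source, and the implausible age of `u3` is left unknown rather than guessed.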
Analysis is the main stage of the BDA pipeline and can take multiple forms. The goal is to extract useful values, suggest conclusions and/or support decision-making. It can be descriptive, predictive or prescriptive. It may use data visualization techniques, statistical analysis or data mining techniques in order to fulfill its goals and interpret the results. All the pre-analytics, analytics and post-analytics stages (i.e., visualization and interpretation) of BDA described above can only become more diverse and more informative within the complex and social network ecosystems considered in this chapter. Thus, even though BDA is characterized by the four V’s, namely Volume (of data), Velocity (generation speed), Veracity (quality) and Variety (heterogeneity), the above settings create a new “V” feature for BDA, namely Value, rendering big data essentially a new and in fact “expensive” commodity for our information societies.
1.2.2 Big Data Challenges and Complex Networks
Several challenges emerge because big data carry special characteristics, e.g., heterogeneity, spurious correlations, incidental endogeneities, noise accumulation, etc. [2], which become even more intense within the complex/social network environment. Challenges related to BDA can be divided into challenges related to data and challenges related to the processes of the BDA pipeline. Table 1.1 summarizes these two types of challenges.
Data-related challenges correspond to the four “V’s” of BDA, with the addition of privacy, which relates more to personal data protection. The first two (volume and velocity) concern storage and timeliness issues emerging from the explosion of data generated/collected, and the following two (veracity and variety) concern the reliability and heterogeneity of data due to multiple sources and types of data.
Additional challenges emerging with respect to the big data pipeline deal
with the data collection and transferring requirements imposed, the pre-
processing and analysis of data with respect to the associated complexity,
accurate and distributed computation, the accumulated noise, as well as other
peripheral issues, such as data and results visualization, interpretation of re-
sults and issues related to cloud storage, computing and services in general.
TABLE 1.1 Big Data Challenges
Data-Related    BDA Pipeline-Related
Volume          Collection, Transferring
Velocity        Pre-processing, Analysis
Veracity        Complexity, Distributed operation
Variety         Accuracy, Noise, Visualization
Privacy         Cloud computing, Interpretation
1.3 BIG DATA ANALYTICS BASED ON HYPERBOLIC SPACE
The aforementioned challenges will require radical approaches for efficiently tackling the emerging problems and keeping up with the anticipated explosion of produced data. In this chapter, we describe a methodology that is capable of addressing the above challenges holistically and of providing impetus for more efficient analytics in the future. The framework is shown conceptually in Figures 1.2 and 1.3 and is mainly based on the properties of hyperbolic metric spaces (a brief summary of which is included in subsection 1.3.1). This approach provides a generic computational substrate for data representation, analysis (e.g., correlation and clustering), inference, visualization, search & navigation, and decision-making (via, e.g., optimization). The proposed framework builds on primitive pre-processing operations of traditional BDA techniques, e.g., statistical learning, and further complements them in terms of analytics and interpretation/visualization to allow more scalable, powerful and efficient inference and decision-making.
Figure 1.2 shows the observed evolution of data volumes until today, where more than big, i.e., “hyperbolic”, data now require processing. The proposed framework suggests a lean approach for coping with such scaling. Input data may take either raw or networked form, where the latter corresponds to correlated data (nodes) and their correlations/relations (links between nodes) drawn from combinations of complex/social networks. Their analysis leads to sophisticated decision-making for challenging problems over large data sets, e.g., resource allocation and optimization, thus eventually having an impact on the networks themselves and closing the loop of an evolutionary bond between networks (humans, IoT), data and machines (analytics) (Figure 1.2).
FIGURE 1.2 Evolution of data volume (from data to “hyperbolic” data), proposed framework’s functionalities and interaction with complex and social networks. [Diagram blocks: Data collectors/owners; Normal Data; Big Data; Hyperbolic Data; Embedding on Hyperbolic space; Data Correlations, Clustering, Network Creation; Data Visualization; Decision Making/Optimization; Search & Navigation, Efficient Computations & Optimization.]
FIGURE 1.3 The workflow of the proposed hyperbolic geometry based approach for BDA over complex and social networks. [Diagram blocks: Big Data; (Hyperbolic) Dimensionality Reduction; (Hyperbolic) Correlations, Network Estimation; Hyperbolic Embedding; Hyperbolic Resource Allocation Optimization; Hyperbolic Visualization Analytics; Inference, Clustering, Search & Navigation, SNA Metrics Computations.]
The role of the term “hyperbolic” in the proposed approach is twofold.
On one hand, it successfully indicates the passage from “big data” to even
more, i.e., “hyperbolic data”, denoting the tendency of growth of the avail-
able data to be handled and analyzed in the future. On the other hand, it
emphasizes the benefit of the use of hyperbolic geometry for BDA. The core
of this approach is the fact that, as it is shown in the literature, networks
of arbitrarily large size can be embedded in low-dimensional (even as small
as two) hyperbolic spaces without sacrificing important information as far as
network communication (e.g., routing) and structure (e.g., scale-free proper-
ties [50]) are concerned [8], [9], [5]. Thus, hyperbolic spaces are congruent with
complex network topologies and are much more appropriate for representing
and analyzing big data than Euclidean spaces.
The specific workflow of the proposed framework is shown in Figure 1.3.
It starts with obtaining data and determining a suitable data representation
model. Input (big) data from complex and social networks might be in raw
(e.g., list) form, or in the form of a data network representing their correla-
tions. Pre-processing of data follows, consisting of dimensionality reduction,
correlations and generation of networks over data that may be performed
either following traditional techniques or using hyperbolic geometry’s prop-
erties. The data representation after their pre-processing (e.g., network or
raw form) will either lead to or determine the appropriate methodology for
the following data embedding into the hyperbolic geometric space (subject
of Section 1.5). Data embedding is the assignment of coordinates to network
nodes in the hyperbolic metric space. Properly visualizing the accumulated
and inferred data following the analysis bears significant importance. The
proposed framework will leverage flexible (systolic) hyperbolic-geometry-based
mechanisms for data visualization, in order to allow a view of the data that
is simultaneously holistic and focused, and more informed decision-making. This is capable
of providing visualization tools that capture simultaneously global patterns
and structural information, e.g., hierarchy, node centrality/importance, etc.,
and local characteristics, e.g., similarities, in an efficient and systolic manner,
which hides/reveals detail when this is required by the decision-making in a
scalable manner. The latter approach can be very useful in applications and
studies of CNA/SNA.
In this chapter, we also describe techniques for extracting useful infor-
mation from the data under processing and analysis for different application
domains. Following and depending on the data embedding, further data cor-
relation/clustering and inference may be attained, in which various forms of
(possibly hierarchical) data communities/clusters will be built and missing
data (e.g., links) will be predicted from the input data within accuracy and
time constraints imposed. Leveraging the hyperbolic distance function and
greedy routing techniques, efficient SNA metrics computations (such as cen-
tralities, the computation of which becomes hard over large data sets) will
be studied and proposed. The proposed framework also supports efficient
optimization, suitable for large data sets, of advertisement allocation and
other resource allocation problems, mainly of a discrete nature
(e.g., file allocation over distributed cache memories in a 5G environment).
In the following, we first present some background on hyperbolic space
and then present the proposed framework in more detail. We then describe
the techniques enabled by the framework for performing and
exploiting the analytics over the embedded data.
1.3.1 Fundamentals of Hyperbolic Geometric Space
Non-Euclidean geometries, e.g., hyperbolic geometry [4], emerged by ques-
tioning and modifying the fifth (parallel) postulate of Euclidean geometry.
According to the latter, given a line and a point that does not lie on it, there
is exactly one line going through the given point that is parallel to the given
line. As far as hyperbolic geometry is concerned, the parallel postulate changes
as follows: Given a line and a point that does not lie on it, there is more than
one line going through the given point that is parallel to the given line.
The n-dimensional hyperbolic space, denoted as $\mathbb{H}^n$, is an n-dimensional
Riemannian manifold with negative curvature c which is most often considered
constant and equal to c = −1. Several models of hyperbolic space exist such
as the Poincare disk model, the Poincare half-space model, the Hyperboloid
model, the Klein model, etc. These models are isometric (see footnote 1),
i.e., any two of
them can be related by a transformation which preserves all the geometrical
properties (e.g., distance) of the space. We will describe in detail and use in
our approach the Poincare models (disk and half space) which are mostly used
in practical applications.
For instance, the Hyperboloid model realizes the hyperbolic space $\mathbb{H}^n$ as
a hyperboloid in $\mathbb{R}^{n+1} = \{(x_0, \ldots, x_n) \mid x_i \in \mathbb{R},\ i = 0, 1, \ldots, n\}$ such that
$x_0^2 - x_1^2 - \ldots - x_n^2 = 1$, $x_0 > 0$. Hyperbolic spaces have a metric function
(distance) that differs from the familiar Euclidean distance and also differs
among the diverse models. In the case of the Hyperboloid model, for two
points $x = (x_0, \ldots, x_n)$, $y = (y_0, \ldots, y_n)$, their hyperbolic distance is given by
[4]:
\[
\cosh d_H(x, y) = \sqrt{\left(1 + \|x\|^2\right)\left(1 + \|y\|^2\right)} - \langle x, y \rangle, \tag{1.1}
\]
where $\|\cdot\|$ is the Euclidean norm and $\langle \cdot, \cdot \rangle$ represents the inner product.
Equation (1.1) is consistent with the hyperboloid constraint when the norm and
the inner product are taken over the coordinates $(x_1, \ldots, x_n)$ only, since
then $x_0 = \sqrt{1 + \|x\|^2}$.
The Hyperboloid model can be used to construct the Poincare disk/ball model,
where the latter is a perspective projection of the former viewed from
$(x_0 = -1, x_1 = 0, \ldots, x_n = 0)$, projecting the upper half hyperboloid onto an $\mathbb{R}^n$
unit ball centered at $x_0 = 0$.
Specifically, focusing on two dimensions, the whole infinite hyperbolic
plane can be represented inside the finite unit disk $D = \{z : \|z\| < 1\}$ of
the Euclidean space, which is the 2-dimensional Poincare disk model. The
hyperbolic distance function $d_{PD}(z_i, z_j)$, for two points $z_i$, $z_j$, in the Poincare
disk model is given by [4], [11]:
\[
\cosh d_{PD}(z_i, z_j) = \frac{2\|z_i - z_j\|^2}{\left(1 - \|z_i\|^2\right)\left(1 - \|z_j\|^2\right)} + 1. \tag{1.2}
\]
The Euclidean circle $\partial D = \{z : \|z\| = 1\}$ is the boundary at infinity for
the Poincare disk model. In addition, in this model, the shortest hyperbolic
path between two points is either a part of a diameter of $D$, or a part of
a Euclidean circle in $D$ perpendicular to the boundary $\partial D$, as illustrated in
Figure 1.4(a). Note that these shortest path curves differ from the chords that
would be implied by the Euclidean metric.
Let us now consider the following map in two dimensions, $z = \frac{w - i}{1 - iw}$,
where z, with $\|z\| < 1$, is a point expressed as a complex number on the
Poincare disk model and i is the imaginary unit. Then w is a point (complex
number) on the Poincare half-space model. This map sends z = −i to w = 0,
z = 1 to w = 1 and z = i to w = ∞ (note that the extension to more
dimensions is trivial).
According to the Poincare half-space model of $\mathbb{H}^n$, every point is represented
by a pair $(w_0, w)$ where $w_0 \in \mathbb{R}^+$ and $w \in \mathbb{R}^{n-1}$. The distance
between two points $(w_0^1, w^1)$, $(w_0^2, w^2)$ on the Poincare half-space model is
defined as [12]:
\[
\cosh d_{PH}\big((w_0^1, w^1), (w_0^2, w^2)\big) = 1 + \frac{(w_0^1 - w_0^2)^2 + \|w^1 - w^2\|^2}{2 w_0^1 w_0^2}. \tag{1.3}
\]
FIGURE 1.4 Poincare disk (a) and half-space (b) models along with their
shortest paths in two dimensions: part of a diameter of $D$ or a part
of a Euclidean circle in $D$ perpendicular to the boundary $\partial D$ for the
disk model and vertical lines and semicircles perpendicular to $\mathbb{R}$ for
the half-space model. (c) shows the Voronoi tessellation of the Poincare
disk into hyperbolic triangles of equal area.
Footnote 1: An isometry is a map between metric spaces that preserves distance [10].
Figure 1.4(b) depicts indicative shortest path curves for the Poincare half-
space model similarly with the Poincare disk model in Figure 1.4(a).
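As a rough numerical illustration of Equations (1.1)-(1.3), the following Python
sketch evaluates the three distance functions on the same pair of points and
lets one check that they agree. The function names are ours and purely
illustrative. Three assumptions are made: the inverse map w = (z + i)/(1 + iz)
is obtained by solving the map above for w; the two-dimensional half-space
coordinates are read as w0 = Im(w) and w = Re(w); and Equation (1.1) is
evaluated on the coordinates (x1, ..., xn) only, using the standard lift
x = 2u/(1 − ‖u‖²) from the disk to the hyperboloid.

```python
import math


def dist_poincare_disk(zi: complex, zj: complex) -> float:
    """Hyperbolic distance in the 2D Poincare disk, following Eq. (1.2)."""
    num = 2.0 * abs(zi - zj) ** 2
    den = (1.0 - abs(zi) ** 2) * (1.0 - abs(zj) ** 2)
    return math.acosh(1.0 + num / den)


def disk_to_half_plane(z: complex) -> complex:
    """Solve z = (w - i) / (1 - i w) for w, giving w = (z + i) / (1 + i z)."""
    return (z + 1j) / (1.0 + 1j * z)


def dist_half_plane(wa: complex, wb: complex) -> float:
    """Eq. (1.3) in 2D, reading w0 = Im(w) > 0 and the other coordinate as Re(w)."""
    num = (wa.imag - wb.imag) ** 2 + (wa.real - wb.real) ** 2
    return math.acosh(1.0 + num / (2.0 * wa.imag * wb.imag))


def disk_to_hyperboloid(z: complex) -> tuple:
    """Lift a disk point to the coordinates (x1, x2) of the hyperboloid."""
    scale = 2.0 / (1.0 - abs(z) ** 2)
    return (scale * z.real, scale * z.imag)


def dist_hyperboloid(x: tuple, y: tuple) -> float:
    """Eq. (1.1), with norm and inner product over (x1, ..., xn) (assumption)."""
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    inner = sum(a * b for a, b in zip(x, y))
    # max(...) guards against rounding slightly below 1 for nearly equal points.
    return math.acosh(max(1.0, math.sqrt((1.0 + nx) * (1.0 + ny)) - inner))


z1, z2 = 0.3 + 0.1j, -0.5 + 0.4j
d_disk = dist_poincare_disk(z1, z2)
d_half = dist_half_plane(disk_to_half_plane(z1), disk_to_half_plane(z2))
d_hyp = dist_hyperboloid(disk_to_hyperboloid(z1), disk_to_hyperboloid(z2))
print(d_disk, d_half, d_hyp)
```

The three printed values should coincide up to floating-point rounding, which
is what the isometry between the models implies.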
A remarkable advantage of hyperbolic space, regarding its application in
BDA (see Sections 1.5 and 1.8), is its property of “exponential scaling” with
respect to the radial coordinate. Specifically, the circumference C and area A
of a circle of radius r in the 2-dimensional (2D) Poincare disk model are given
by the following relations [46], [4], [8]:
\[
C(r) = 2\pi \sinh(r), \qquad A(r) = 4\pi \sinh^2(r/2). \tag{1.4}
\]
Therefore, for small radius r, e.g., around the center of the Poincare disk, the
hyperbolic space looks flat, while for larger r, both the circumference and the
area grow exponentially with r. The exponential scaling with radius is
illustrated in Figure 1.4(c), which shows a tessellation of the Poincare disk
into hyperbolic triangles of equal area. The triangles appear increasingly smaller the
closer they are to the circumference in the Euclidean visual representation of
the triangulation. In the following, we describe the different components that
make up the proposed framework, even though several parts can be combined
and employed jointly.
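To make the exponential scaling of Equation (1.4) concrete, the short Python
sketch below compares the hyperbolic circumference and area with their
Euclidean counterparts 2πr and πr² for a few radii. The function names and the
chosen radii are illustrative assumptions only.

```python
import math


def circumference(r: float) -> float:
    """Hyperbolic circumference of a circle of radius r, Eq. (1.4)."""
    return 2.0 * math.pi * math.sinh(r)


def area(r: float) -> float:
    """Hyperbolic area of a disk of radius r, Eq. (1.4)."""
    return 4.0 * math.pi * math.sinh(r / 2.0) ** 2


for r in (0.1, 1.0, 5.0, 10.0):
    c_ratio = circumference(r) / (2.0 * math.pi * r)  # hyperbolic / Euclidean
    a_ratio = area(r) / (math.pi * r * r)
    print(f"r={r:5.1f}  C ratio={c_ratio:12.3f}  A ratio={a_ratio:12.3f}")
```

For small r both ratios are close to 1 (the space looks flat), while for large
r each unit increase of the radius multiplies both quantities by roughly e.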
1.4 DATA CORRELATIONS AND DIMENSIONALITY REDUCTION
IN HYPERBOLIC SPACE
In this section, we describe two basic functionalities of the proposed frame-
work (Figures 1.2 and 1.3). The first deals with inferring correlations among
data, yielding network structures representing such relations (nodes-data,
correlations-edges). The second deals with a distance-preserving dimensional-
ity reduction approach over the hyperbolic space (i.e. multidimensional scaling
[12], [13]) with multiple practical applications, e.g., various efficient compu-
tations, efficient data visualization, etc. Each functionality of course can be
applied independently.
We assume generic forms of “data items”, each of which can be unrolled
in a set of features. The set of features will be common for all data items,
e.g., customer’s parameters such as payment information, demographic in-
formation, etc., when customers correspond to data items. Before analytics
one needs to apply a method for clustering/reduction of these features to a
set of latent features (considered important to fully describe each data item).
Examples of such methods include spectral clustering [principal component
analysis (PCA)] [14], [15], singular value decomposition (SVD) [14], [15], etc.,
where each can be appropriately sped up to scale with large datasets, as in [15],
[16], [17]. Next, correlations may be inferred via the application of
similarity/distance metrics to quantify similarities on various data aspects (e.g.,
between pairs of data items). A thorough survey of similarity metrics such as
cosine, Pearson, etc. is provided in [18]. Another widely accepted approach
for computing similarities is to identify distribution functions in
the parameters of interest and then apply an appropriate distribution
comparison metric, e.g., Kullback-Leibler divergence [19], [20] for probability
distributions. Hyperbolic distance may also serve as a similarity measure, as
described in the following. Other ways of clustering and network estimation
include [14] partitional algorithms (k-means and its variations, etc.), hierarchi-
cal algorithms (agglomerative, divisive), the “lasso” algorithm and its variants
that are based on convex optimization [21] producing a graph representation
of the data, etc. In the case of the proposed framework, it is beneficial to con-
sider hierarchical clustering of data for allowing efficient visualization using
the two- or three-dimensional hyperbolic space (Section 1.8).
Data correlations in hyperbolic space can be achieved via the hyperbolic
distance function over the hyperbolic space of a suitable dimension — e.g.,
equal to the number of important features of users/products — applied on
pairs of data items to reveal their hidden dependencies/correlations with re-
spect to their features to a controllable extent. As an example, if having only
two latent features describing the data items, we can assign the radial and an-
gular coordinates of the 2D Poincare disk model according to the values of each
feature correspondingly. Then, we consider linking two nodes together only if
their hyperbolic distance (e.g., Equation (1.2) for the Poincare disk model) is
less than a predefined upper bound. By controlling this upper bound, one can
control the “neighborhood” of each node and thus the extent to which the
correlations among data reach. In other words, important correlations may be
considered up to a controllable extent via a threshold value over hyperbolic
distance. This is a simple model of data correlation; however, its effective-
ness lies in its simplicity and the fact that it can lead to a simultaneous data
correlation, analysis and visualization.
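The threshold rule described above can be sketched in a few lines of Python.
This is only an illustration under our own assumptions: each data item has two
latent features normalized to [0, 1), the first is used directly as the
Euclidean radius inside the unit disk and the second is scaled to an angle,
and the item names, feature values and threshold values are hypothetical.

```python
import math
from itertools import combinations


def to_disk(radial_feature: float, angular_feature: float) -> complex:
    """Place an item in the unit disk (assumed mapping of two features)."""
    theta = 2.0 * math.pi * angular_feature
    return complex(radial_feature * math.cos(theta), radial_feature * math.sin(theta))


def dist_poincare_disk(zi: complex, zj: complex) -> float:
    """Hyperbolic distance in the Poincare disk, Eq. (1.2)."""
    num = 2.0 * abs(zi - zj) ** 2
    den = (1.0 - abs(zi) ** 2) * (1.0 - abs(zj) ** 2)
    return math.acosh(1.0 + num / den)


def correlation_edges(items: dict, max_dist: float) -> list:
    """Link two items only if their hyperbolic distance is below max_dist."""
    points = {name: to_disk(*features) for name, features in items.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(points), 2)
        if dist_poincare_disk(points[a], points[b]) < max_dist
    )


# Hypothetical items: (radial feature, angular feature).
items = {"a": (0.10, 0.00), "b": (0.15, 0.05), "c": (0.90, 0.50), "d": (0.92, 0.52)}
for bound in (1.0, 1.5, 4.0):
    print(bound, correlation_edges(items, bound))
```

Raising the upper bound enlarges each node's neighborhood: the tight bound
links only the two items near the center, while the loosest one links every
pair.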
After embedding the data pieces/nodes on the k-dimensional Poincare half-
space model (k corresponds to the number of latent features), one can apply a
dimension reduction distance-preserving technique over the hyperbolic space,
such as the one proposed in [13], [12]. Importantly, if choosing the dimension
of the final metric space equal to 2 or 3, we will be able to achieve simultane-
ously a visualization of the data set and its analysis/navigation (Sections 1.6
and 1.8). Particularly, regarding the dimensionality reduction over hyperbolic
space, we provide the following two theorems from the literature [12], [22].
Given an n-point subset S of the hyperbolic space, let T be its projection
on $\mathbb{R}^{n-1}$ (i.e., the Poincare half-space model, Section 1.3.1). By the
Johnson-Lindenstrauss Lemma [22], there exists an embedding of T, determined by
a function f, into the $O\!\left(\frac{\log n}{\varepsilon^2}\right)$-dimensional Euclidean space such that for
every pair of points $x_1, x_2 \in T$, $\|x_1 - x_2\| \le \|f(x_1) - f(x_2)\| \le (1 + \varepsilon)\|x_1 - x_2\|$,
$\varepsilon > 0$.
Theorem 1.1 (Dimension reduction for $\mathbb{H}^n$)
Consider the map $g : \mathbb{H}^n \to \mathbb{H}^{O(\log n)}$ defined by $g(w_0, w) = (w_0, f(w))$. Then
for every two points $(w_0^1, w^1)$, $(w_0^2, w^2)$ at hyperbolic distance $\Delta$, we have:
\[
\Delta \le d_{PH}\big(g(w_0^1, w^1), g(w_0^2, w^2)\big) \le \left(1 + \frac{3\varepsilon}{1 + \Delta}\right)\Delta. \tag{1.5}
\]
Dimensionality reduction can be performed efficiently via the Fast
Johnson-Lindenstrauss Transform of Ailon and Chazelle [22], which is a low-distortion
embedding of d-dimensional hyperbolic space to O(log n)-dimensional hyperbolic
space (n is the number of points to be embedded) based on the preconditioning
of a sparse projection matrix with a randomized Fourier transform.
Note that n will be equal to the number of data items, and this will be
achieved by assigning the zero value to every dimension of each data item
from the (k+1)th up to the (n − 1)th one.
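The structure of the map g(w0, w) = (w0, f(w)) can be illustrated with the
Python sketch below. It is not the Fast Johnson-Lindenstrauss Transform of
[22]: as a simplifying assumption, f is replaced by a plain dense Gaussian
random projection, which does not by itself guarantee the one-sided bound
required of f above. The dimensions, the sample points and all function names
are hypothetical.

```python
import math
import random


def gaussian_projection(dim_in: int, dim_out: int, rng: random.Random) -> list:
    """Dense random matrix with N(0, 1/dim_out) entries (stand-in for f)."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix: list, w: list) -> list:
    """Matrix-vector product, i.e., f(w) for the stand-in projection."""
    return [sum(m * x for m, x in zip(row, w)) for row in matrix]


def g(point: tuple, matrix: list) -> tuple:
    """Keep w0 untouched and reduce only the remaining coordinates w."""
    w0, w = point
    return (w0, project(matrix, w))


def dist_half_space(p: tuple, q: tuple) -> float:
    """Hyperbolic distance in the Poincare half-space model, Eq. (1.3)."""
    (p0, pw), (q0, qw) = p, q
    num = (p0 - q0) ** 2 + sum((a - b) ** 2 for a, b in zip(pw, qw))
    return math.acosh(1.0 + num / (2.0 * p0 * q0))


rng = random.Random(0)
points = [(rng.uniform(0.5, 2.0), [rng.gauss(0.0, 1.0) for _ in range(200)])
          for _ in range(20)]
matrix = gaussian_projection(200, 40, rng)
reduced = [g(p, matrix) for p in points]

ratios = [dist_half_space(reduced[i], reduced[j]) / dist_half_space(points[i], points[j])
          for i in range(len(points)) for j in range(i + 1, len(points))]
print(min(ratios), max(ratios))
```

The printed extremes give an empirical feel for the distance distortion of
this particular random draw; they are not a substitute for the bound (1.5).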
Theorem 1.2 (Embedding into the hyperbolic plane (for visualization purposes))
Assume that the distance between every two points in S is at least $\frac{\ln(12n)}{\varepsilon}$.
Then there exists an embedding of S into the hyperbolic plane $\mathbb{H}^2$ with distance
distortion at most $1 + \varepsilon$.
1.4.1 Example
A similar methodology of data correlation over hyperbolic space is applied in
[9], where the new nodes added in the network embedding in the hyperbolic
space form connections with existing ones. The popularity of the latter and
the similarity of the new nodes with the existing ones are taken into account in
determining the connections of the new nodes in the embedding. More
specifically, newcomers choose existing nodes to connect to by optimizing the product
of similarity and popularity with them. In [9], the procedure of the
simultaneous data embedding/visualization and correlation in hyperbolic space is as
follows, starting with an initially empty network.
1. At time $t \ge 1$, a new node t is added to the embedded network and it is
assigned the polar coordinates $(r_t, \theta_t)$ where the angular coordinate, $\theta_t$,
is sampled uniformly at random from $[0, 2\pi]$ and the radial coordinate,
$r_t$, relates to the birth date of node t via the relation $r_t(t) = \ln t$. Every
existing node $s < t$ increases its radial coordinate to
$r_s(t) = \beta r_s(t) + (1 - \beta) r_t(t)$, $\beta \in [0, 1]$.
2. The new node t connects with a subset of existing nodes $\{s\}$, where
$s < t$, $\forall s$. This subset consists of the m nodes with the m smallest
values of the product $s \cdot \theta_{st}$, where m is a parameter controlling the average
node degree (i.e., the extent of the correlations among nodes), and $\theta_{st}$
is the angular distance between nodes s and t.
Actually, by following the above steps for a network construction over data,
it turns out that new nodes connect simply to their closest m nodes in
hyperbolic distance. The hyperbolic distance in the Poincare disk (Equation (1.2))
between two nodes at polar coordinates $(r_t, \theta_t)$ and $(r_s, \theta_s)$ is approximately
equal to $x_{st} = r_s + r_t + \ln(\theta_{st}/2) = \ln(s \cdot t \cdot \theta_{st}/2)$. Therefore, the sets of nodes
$\{s\}$ minimizing $x_{st}$ or $s \cdot \theta_{st}$ for each newcomer t are identical. At the second
step above, in order to reduce network clustering [23], the newcomer node t,
instead of connecting with its m closest nodes, may select randomly a node $s < t$
and form a connection with s with probability equal to $p(x_{st}) = \frac{1}{1 + e^{(x_{st} - R_t)/T}}$,
where T is a temperature parameter and $R_t$ is a threshold value. This step is
repeated until m nodes are selected to connect to node t.
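The two steps above can be sketched as a small Python simulation. This is a
minimal reading of the procedure rather than the implementation of [9]: the
function names, the seed and the network size are hypothetical, newcomers are
ranked by the approximate distance x_st (which for β = 1 gives the same ranking
as the product s · θ_st), the faded radius of an existing node is computed from
its birth radius ln s (our interpretation of the update rule), and the
probabilistic variant with temperature T is omitted.

```python
import math
import random


def angular_distance(a: float, b: float) -> float:
    """Angular distance between two angles, always in [0, pi]."""
    return math.pi - abs(math.pi - abs(a - b))


def approx_distance(s: int, t: int, theta: dict, beta: float) -> float:
    """x_st = r_s(t) + r_t + ln(theta_st / 2), with r_s(t) faded from ln s."""
    r_t = math.log(t)
    r_s = beta * math.log(s) + (1.0 - beta) * r_t  # assumed reading of the update
    theta_st = max(angular_distance(theta[s], theta[t]), 1e-12)  # avoid log(0)
    return r_s + r_t + math.log(theta_st / 2.0)


def grow_network(n_nodes: int, m: int, beta: float = 1.0, seed: int = 0):
    """Add nodes one at a time; each newcomer links to its m closest predecessors."""
    rng = random.Random(seed)
    theta, edges = {}, []
    for t in range(1, n_nodes + 1):
        theta[t] = rng.uniform(0.0, 2.0 * math.pi)
        older = sorted(range(1, t), key=lambda s: approx_distance(s, t, theta, beta))
        edges.extend((s, t) for s in older[:m])
    return theta, edges


theta, edges = grow_network(n_nodes=50, m=3)
degree = {}
for s, t in edges:
    degree[s] = degree.get(s, 0) + 1
    degree[t] = degree.get(t, 0) + 1
print(sorted(degree.items(), key=lambda kv: -kv[1])[:5])
```

Printing the highest-degree nodes of a run gives a quick way to inspect how
connections are distributed between older and newer nodes.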
Here, the radial coordinate abstracts the popularity of a node. The smaller
the radial coordinate of a node (the closer the node is to the center of the
Poincare disk) the more popular it is, thus the more likely it is for it to attract
new connections (we will elaborate more on this fact in Section 1.5, see also
the hyperbolic distance functions in Section 1.3.1). The increase of the radial
coordinate expresses any attenuation of nodes’ popularity with time, which
is equal to zero when β = 1. Note that, in complex networks the time pres-
ence of a node in the network is strongly related to its popularity. Specifically,
the scale-free structure of complex networks is mainly due to the preferential
attachment of newcomers, as the network grows, to existing nodes with high
degree. Thus, nodes of high degree continue to increase their connectivity, and
these nodes are with higher probability older nodes assuming that initially all
nodes have the same degree. Therefore, in the above mapping of nodes to
hyperbolic coordinates, the similarity characteristic is mapped to the angu-
lar coordinate (here assigned randomly), while the popularity characteristic
is mapped to the radial coordinate and hyperbolic distance is used to pre-
dict/infer connections between pairs of nodes based on their characteristics.
As a result, hyperbolic distance serves as a convenient single-metric represen-
tation of a combination of popularity (radial) and similarity (angular).
1.5 EMBEDDING OF NETWORKED DATA IN HYPERBOLIC
SPACE AND APPLICATIONS
In this section, in order to perform data embedding, it is assumed that data
items are already available in network form. Thus, the focus shifts to obtaining
different embeddings into latent hyperbolic coordinates in conjunction with
several applications over complex large-scale networks, such as graph
theoretic and SNA metrics’ computation (e.g., centrality metrics), missing links’
prediction, etc. Two types of embedding in the low-dimensional hyperbolic
space are presented. In the first (Subsection 1.5.1), the latent node
coordinates in hyperbolic space are determined so that the hyperbolic distances
between node pairs are approximately equal to their graph distances in the
initial network. Towards this objective, multidimensional scaling (MDS) is applied
[24]. Given n the number of network nodes (data items), MDS has a running
time of $O(n^3)$ and requires space $O(n^2)$ (distance matrix between all node
pairs). Since the complexity of MDS is extremely high for large-scale networks,
landmark-based MDS has been introduced [24], based on the graph and
hyperbolic space distances among k chosen landmarks and the rest of the nodes.
With landmark-based MDS, the running time reduces to $O(kn)$ and the space
to $O(dkn + k^3)$, where d is the dimension of the hyperbolic space and it should
also hold that $d < k < n$. By considering d, k as small constants, landmark-based
MDS has a linear running time complexity. The second type of embedding
(Subsection 1.5.2) applies statistical learning methods to embed a complex
network graph in hyperbolic space by constructing a new network graph that
mimics with high probability the initial graph structure [5]. Contrary
to the first approach, the node pairs’ hyperbolic distances may differ
significantly from their initial graph distances. The statistical learning techniques
applied are based on maximum likelihood estimation for the node coordinates’
inference, while global (i.e., for the whole network) and local (i.e., for every
node) likelihood functions are defined and maximized, where local likelihood
functions serve to approximate the global ones for complexity reductions.
1.5.1 Rigel Embedding in the Hyperboloid Model
Several complex and social network analysis problems such as computation
of node centralities, community detection, etc., are based on node distances
which appear hard to compute within large-scale graphs such as online so-
cial networks with millions of nodes. However, for marketing purposes, such
computations become necessary or even critical for companies, e.g., to locate
the more influential/central node for achieving efficient marketing. Therefore,
several works in literature [24], [12] have attempted to propose algorithms for
network embeddings (e.g., in Euclidean or hyperbolic space) so that the in-
ferred coordinates can be used for approximating node distances in the initial
graph. We will focus on large-scale network embedding in hyperbolic space
and specifically on the Rigel embedding proposed in [24], which achieves low
distance distortion and answers queries for node distances and
shortest paths in microseconds even for up to 43 million nodes, compared to
the order of seconds for a traditional breadth-first-search (BFS) algorithm.
Importantly, Rigel allows for parallelization in computations which is a great
advantage in the field of BDA. Experimental results in [25], [8], focused on
embedding Internet distances in hyperbolic space, have shown less distortion
with respect to the node distances in the initial graph, compared with other
embeddings in Euclidean coordinates. This fact is also verified in [26] via em-
pirical computation of distortion metrics for diverse coordinate systems where
it is shown that hyperbolic space achieves significantly more accurate results
than Euclidean and spherical ones.
Let us assume that the network consists of N nodes. Rigel employs the
Hyperboloid model of hyperbolic space with distance function given by Equation
(1.1). Rigel applies landmark-based MDS, where $L < N$ nodes are chosen
as landmarks in the network graph. Landmarks may be chosen as high-degree
nodes if the given network is scale-free; otherwise they can be chosen
randomly. First, the hyperbolic coordinates of the landmarks are computed with
the aid of a global optimization algorithm aiming to achieve that the dis-
tances between the landmarks in the Hyperboloid are as close as possible to
their matching path distances in the graph. This is the bootstrapping step
of Rigel. Then, the hyperbolic coordinates of the rest of the nodes are cali-
brated, so that each node’s distances to all landmarks in the Hyperboloid are
very close to the corresponding actual path distances in the network graph.
Note that the authors of [24] studied the accuracy of Rigel with respect to
the dimensions of the hyperbolic space and showed that the former increases
with the increase of the latter. However, the number of landmarks should be
higher than the dimension of the embedding space, thus leading to a trade-off
between accuracy and complexity [24].
Importantly for large-scale network graphs, a parallel version of Rigel is
proposed in [24], offering great improvement in the complexity of Rigel, the
latter increasing linearly with the network size. Both steps of Rigel (boot-
strapping and embedding in the Hyperboloid model) can be parallelized in a
number of servers at most equal to the number of landmarks. One or more
landmarks are assigned to each server and the rest of the nodes are distributed
in a balanced way across servers. It is shown that parallel Rigel performs sim-
ilarly with respect to accuracy as Rigel.
Concerning the effectiveness and efficiency of Rigel in computing SNA
[27] and graph analysis metrics, experiments and comparisons with existing
schemes are performed in [24]. Regarding the graph analysis metrics of ra-
dius, diameter and average path length, which are applied in identifying the
small-world property of a network [6], [51], Rigel resulted in values extremely
close to the ground truth. Note that distances in Rigel are given by Equa-
tion (1.1). Rigel’s performance in computing node centralities that constitute
an important SNA metric for industries is also examined in [24]. Closeness
centrality [27] is considered, according to which the most central node is the
one that has the lowest average distance to all other nodes in the network.
Rigel achieved a high accuracy in identifying the node ranking with respect
to closeness centrality and outperforms existing schemes.
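The definition of closeness centrality used above can be illustrated with the
following Python sketch, which finds the node with the lowest average distance
to all others. It computes exact distances with BFS on a small hypothetical
graph and is not Rigel itself; the `distance` argument is our own hook,
intended to show where an embedding-based estimate such as Equation (1.1) on
node coordinates could be substituted for the exact computation.

```python
from collections import deque


def bfs_distances(adj: dict, source) -> dict:
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def make_exact_distance(adj: dict):
    """Return distance(u, v) backed by one cached BFS per source node."""
    cache = {}

    def distance(u, v):
        if u not in cache:
            cache[u] = bfs_distances(adj, u)
        return cache[u][v]

    return distance


def average_distance(adj: dict, node, distance) -> float:
    """Average distance from node to all other nodes."""
    others = [v for v in adj if v != node]
    return sum(distance(node, v) for v in others) / len(others)


def most_central(adj: dict, distance):
    """Node with the lowest average distance, i.e., highest closeness."""
    return min(adj, key=lambda u: average_distance(adj, u, distance))


# Hypothetical connected graph given as adjacency lists.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"], "d": ["c"], "e": ["c"]}
exact = make_exact_distance(adj)
print(most_central(adj, exact), average_distance(adj, "c", exact))
```

With exact BFS the cost is one traversal per node, which is what becomes
prohibitive on graphs with millions of nodes and motivates coordinate-based
distance estimates.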
1.5.2 HyperMap Embedding
This section uses statistical learning methods to embed a social graph in
hyperbolic coordinates, focusing on the HyperMap embedding algorithm in-
troduced in [5]. HyperMap leverages the emerging relation between complex
network topologies and hyperbolic geometry [8]. Due to their scale-free prop-
erty, complex networks exhibit hierarchical, i.e., tree-like structure [28], while
hyperbolic geometry is the geometry of trees. More specifically, the similarity
between an infinite tree graph and the hyperbolic space provides an intuition
about the hidden hyperbolic structure of complex networks. The exponential
scaling of a circle and an area of a disk in hyperbolic space (explained in Sec-
tion 1.3.1) coincides with the scaling of the number of nodes with respect to
their distance from the root of the tree in an “e-ary” tree [8]. To make this
clearer, let us examine a b-ary tree which is a tree with branch factor equal
to b. The number of nodes located at distance exactly R from the root of the
tree is $(b + 1)b^{R-1} \sim b^R$ and the number of nodes being at distance at most
R from the root of the tree is $\frac{(b+1)b^R - 2}{b - 1} \sim b^R$. As a result, hyperbolic space
can be seen as a continuous version of a tree, a fact realized as the exponen-
tial expansion property of the hyperbolic space. Scale-free complex networks
are characterized by heterogeneity regarding the node degree, where the ma-
jority of nodes is assigned low node degree (power-law degree distribution),
implying a tree-like network organization indicating the existence of a hidden
hyperbolic metric space [28].
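The two node counts quoted above can be checked with a few lines of Python.
The sketch counts nodes level by level, assuming (as the formulas imply) that
the root has b + 1 children and every other node has b children, and compares
the result with the closed-form expressions; the function names are ours.

```python
def nodes_at_distance(b: int, R: int) -> int:
    """Count nodes exactly R hops from the root by expanding level by level."""
    if R == 0:
        return 1  # the root itself
    level = b + 1  # the root has b + 1 children (assumption stated above)
    for _ in range(R - 1):
        level *= b  # every other node has b children
    return level


def nodes_within_distance(b: int, R: int) -> int:
    """Count nodes at most R hops from the root, root included."""
    return sum(nodes_at_distance(b, k) for k in range(R + 1))


b = 2
for R in range(1, 6):
    exact_closed = (b + 1) * b ** (R - 1)
    within_closed = ((b + 1) * b ** R - 2) // (b - 1)
    print(R, nodes_at_distance(b, R), exact_closed,
          nodes_within_distance(b, R), within_closed)
```

Both counts grow by a factor of b per level, mirroring the exponential growth
of circumference and area with radius in Equation (1.4).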
The example of the simultaneous embedding and creation of a growing ran-
dom network provided in Subsection 1.4.1 leads to the formation of network
graphs with the following two characteristics: (i) they appear to be highly
clustered [23] since the links added between close nodes in hyperbolic dis-
tance lead to the formation of a large number of triangles and (ii) they have
power-law degree distribution, i.e., two basic properties of complex networks’
structure. These statements further support the existence of an underlying
hidden hyperbolic space in complex networks’ structure. On one hand, a ran-
dom network created over hyperbolic space as in Subsection 1.4.1 emerges to
be scale-free while on the other hand, a scale-free network is proven to have
negative curvature [8] (similarly to hyperbolic metric spaces).
Based on these studies and observations, HyperMap aims at embedding a
given complex (social) network in hyperbolic space in a way that is congruent
with the embedding of an extended version of the model of Subsection 1.4.1.
The extension lies basically in providing the possibility to add links between
existing nodes, while in Subsection 1.4.1 new links can be added only between a
newcomer and an existing node. Precisely, HyperMap finds nodes’ angular and
radial coordinates such that the probability that the given complex network
is produced by this extended model of Subsection 1.4.1 is maximized.
HyperMap assigns hyperbolic coordinates to the nodes inside the Poincare
disk by maximizing approximately but in an efficient manner a globally de-
fined likelihood function over the node pairs’ hyperbolic distances (which are
functions of nodes’ hyperbolic coordinates) expressed considering the given
complex network’s links. Specifically, in order to mimic the network cre-
ation/hyperbolic embedding of Subsection 1.4.1, it first performs a maximum
likelihood estimation of the appearance (i.e., birth) times of the given net-
work’s nodes (let t denote their number). Then, after estimating the time
sequence of nodes’ arrivals, it replays the hyperbolic growth of the network
roughly similarly to the steps of the model of Subsection 1.4.1. The difference
lies in the computation of the angular coordinates where HyperMap computes
the angular coordinate θi of node i, i.e., with sequence number i, via maximiz-
ing a local likelihood function defined for node i equivalently to maximizing
the aforementioned global likelihood function with respect to θi. Specifically,
the HyperMap embedding algorithm receives as basic input the adjacency
matrix of the given complex network and performs the following steps:
1. It sorts nodes in decreasing order with respect to their degree in the
given complex network, where node 1 corresponds to the one with the
highest node degree. Node 1 receives r1 = 0 and a random angular
coordinate θ1 ∈ [0, 2π] (i.e., it is placed on the center of the Poincare
disk model).
2. For i = 1 to t do
(a) Node i arrives (is born) and is assigned the radial coordinate
$r_i = \frac{2}{\zeta} \ln i$, where $\zeta = |c|$ (Subsection 1.3.1) is the constant
absolute curvature value of the hyperbolic space provided as input to
HyperMap. Usually, $\zeta = 1$. Every existing node $s < i$ increases its
radial coordinate to $r_s(t) = \beta r_s(t) + (1 - \beta) r_i(t)$, $\beta \in [0, 1]$, where
β is provided as input to HyperMap.
(b) The angular coordinate θi is computed via maximizing a local like-
lihood function defined for node i.
HyperMap embedding also provides the possibility to predict missing links
of the given complex network, efficiently and with high accuracy. Link predic-
tion is a very important process on the study of large-scale networks since
topology measurements for inferring their structure may miss part of the
links. In HyperMap, prediction is based on the aforementioned possibility
of internal link addition, i.e., between pairs of existing nodes. Specifically, two
Hyperbolic Big Data Analytics within Complex and Social Networks  21
(non-neighboring) existing nodes k, l are connected at time t (i.e., prediction
of a missing link in the initial complex network) with probability equal to
$p(x_{kl}) = \frac{1}{1 + e^{\zeta (x_{kl} - R_t)/(2T)}}$. HyperMap’s performance in predicting missing links is
evaluated according to diverse indices and shown to be very satisfactory, while
it outperforms several well-known classical link prediction methods such as
Common-Neighbors, Katz Index, Hierarchical Random Graph Model, Degree-
Product, Inverse Shortest Path, etc. [5].
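The connection probability above can be turned into a simple ranking of candidate links. The following Python sketch is only illustrative: the function names are hypothetical, the hyperbolic distances x_kl are assumed to be already available from the embedding, and R_t and T are taken as given inputs (as defined earlier in the chapter).

```python
import math


def connection_probability(x_kl, R_t, T, zeta=1.0):
    """p(x_kl) = 1 / (1 + exp(zeta * (x_kl - R_t) / (2 * T))), with T > 0."""
    exponent = zeta * (x_kl - R_t) / (2.0 * T)
    if exponent > 700.0:
        # exp() would overflow a float; the probability is numerically zero.
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))


def rank_candidate_links(distances, R_t, T, zeta=1.0):
    """Order non-adjacent node pairs from most to least likely missing link.

    `distances` maps a pair (k, l) to its hyperbolic distance x_kl.
    """
    return sorted(
        distances,
        key=lambda pair: connection_probability(distances[pair], R_t, T, zeta),
        reverse=True,
    )
```

Pairs at distance exactly R_t receive probability 1/2, and pairs that are closer than R_t are ranked as more likely missing links.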
1.6 GREEDY ROUTING OVER HYPERBOLIC COORDINATES AND
APPLICATIONS WITHIN COMPLEX AND SOCIAL NETWORKS
This section mostly concerns the navigability of networks embedded in hy-
perbolic space [28]. A network embedded in a geometric space is navigable,
if one can perform efficient greedy routing on the network using the node
coordinates in the underlying geometric space [5].
After embedding the network graph (or the correlated data) in the hy-
perbolic geometric space, greedy routing over hyperbolic coordinates can be
used to navigate or route messages from source to destination. Specifically,
each node forwards the message to the neighbor that is closest in hyperbolic distance
to the destination. As a result, greedy routing uses only local information, i.e.,
each node’s necessary knowledge is limited to the hyperbolic coordinates of its
neighbors and the destination. Due to this fact, greedy routing can be adapted
and applied for performing efficient search and navigation in large data sets
[24], [26], [29], while we foresee its applications in SNA metrics’ computation
and in recommender systems [30]. A disadvantage of greedy routing lies in the
case of failure to deliver a message to the destination when a node does not
have a neighbor closer to the destination than itself (local minima of distance).
In this case, the message gets blocked in the specific node [11] with no further
forwarding via greedy routing.
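The forwarding rule and its failure case (a local minimum of distance) can be sketched as follows. This is a minimal Python illustration under stated assumptions: node positions are given in polar coordinates (r, θ), distances follow the hyperbolic law of cosines, and the names `hyperbolic_distance` and `greedy_route` as well as the adjacency-dictionary layout are hypothetical choices made for this example.

```python
import math


def hyperbolic_distance(p, q, zeta=1.0):
    """Hyperbolic distance between two points given as (r, theta)."""
    (r1, t1), (r2, t2) = p, q
    arg = (math.cosh(zeta * r1) * math.cosh(zeta * r2)
           - math.sinh(zeta * r1) * math.sinh(zeta * r2) * math.cos(t1 - t2))
    # Clamp against rounding errors that could push arg slightly below 1.
    return math.acosh(max(arg, 1.0)) / zeta


def greedy_route(adjacency, coords, source, target, zeta=1.0):
    """Return the greedy path from source to target, or None if it gets stuck."""
    path = [source]
    current = source
    while current != target:
        here = hyperbolic_distance(coords[current], coords[target], zeta)
        # Only local information: coordinates of the neighbors and the target.
        best = min(
            adjacency[current],
            key=lambda n: hyperbolic_distance(coords[n], coords[target], zeta),
            default=None,
        )
        if best is None or hyperbolic_distance(coords[best], coords[target], zeta) >= here:
            return None  # local minimum: no neighbor is closer to the target
        path.append(best)
        current = best
    return path
```

Because the distance to the target strictly decreases at every hop, the loop always terminates, either at the target or at a local minimum.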
With respect to networks with hidden hyperbolic structure (i.e., scale-free
complex networks), greedy routing based on hyperbolic coordinates/distances
achieves a very high success rate (close to 100%), as shown
experimentally in the literature. Also, in this case the paths obtained
via greedy routing are very close to the global shortest paths between the
corresponding node pairs. Specifically, in [8], [9], the performance of greedy
routing is studied over the synthetic networks constructed similarly to the
example of Subsection 1.4.1 (in a way congruent to the exponential expansion
of hyperbolic space) and it is shown to achieve success rate close to 100% and
stretch with respect to the shortest paths close to 1. This is a very important
property showing the small-world navigability of this particular category of
networks [31]. The success of greedy routing over hyperbolic space is strongly
tied with the fact that hyperbolic space has a tightly connected core, where
all paths between nodes pass through. This is the reason why shortest paths
in hyperbolic space can be found efficiently and with high accuracy [8]. In
[5], the performance of greedy routing in the AS Internet graph is examined
when using the HyperMap inferred hyperbolic coordinates (Section 1.5). Note
that the AS Internet graph exhibits a scale-free structure [6], [23]. Due to the
congruency between the scale-free network topology and hyperbolic geometry,
the success of greedy routing over hyperbolic coordinates is much improved
compared to the case when the real coordinates are used, while the length of
the paths paved by greedy routing is roughly the same as that of the shortest
paths. HyperMap actually estimates the node coordinates that best fit a given
network.
“Greedy embeddings” in other than Euclidean metric spaces [11], [49] have
been proposed to optimize greedy routing techniques. In the case of a greedy
embedding of any network (not only scale-free) in hyperbolic space, the success
rate of greedy routing becomes exactly 100%. In [11], a distributed implemen-
tation of a greedy embedding in two-dimensional hyperbolic space is proposed,
which also can be applied in dynamic network conditions, by assigning hyper-
bolic coordinates to new nodes without re-embedding the whole network. The
greedy embedding is constructed by choosing a spanning tree of the graph of
the initial network and then embedding the spanning tree into the hyperbolic
space according to the algorithm of [11]. Following this algorithm, after having
assigned hyperbolic coordinates to the root of the tree inside a specific area
of the Poincare disk model, each node computes its own coordinates using the
ones of its parent, in such a way that the hyperbolic bisector of the embed-
ded spanning tree edge between the node and its parent does not intersect
any other embedded edge of the spanning tree. The greedy embedding of a
spanning tree of a graph implies the greedy embedding of the whole graph.
Importantly, it is proven that every graph has a greedy embedding in two-
dimensional hyperbolic space [49]. For all these reasons, hyperbolic geometry
dominates over the Euclidean one for performing greedy routing. Note that a
greedy embedding basically ensures the existence of at least one greedy path
between each source-destination pair, thus 100% success of greedy routing.
Greedy embeddings have been applied successfully in communications networks,
e.g., [11], [32], [33]. However, in the case of large-scale networks their
implementation may pose challenges due to the need for a spanning tree of
the whole graph, thus opening new research directions in BDA. The average
length of the paths paved by greedy routing is a crucial performance factor
to evaluate. In the case of greedy hyperbolic embedding, different choices of
spanning tree and the root of the spanning tree (e.g., shortest path spanning
tree rooted at the node with highest degree, or spanning tree derived via a
random walk) will lead to different routing paths and path lengths between
pairs of sources and destinations [34].
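The property that a greedy embedding guarantees, namely that every node has a neighbor strictly closer to any other destination, can be checked by brute force on small graphs. The Python sketch below is only a verification aid, not the embedding algorithm of [11]: the function name is hypothetical, and `distance` is any callable returning the distance between two embedded nodes (for instance a hyperbolic distance).

```python
def is_greedy_embedding(adjacency, distance):
    """Check that every node has, for every other destination, a neighbor
    that is strictly closer to that destination than the node itself.

    Runs over all ordered node pairs, so it is meant for small graphs only.
    """
    nodes = list(adjacency)
    for source in nodes:
        for target in nodes:
            if source == target:
                continue
            d_here = distance(source, target)
            if not any(distance(n, target) < d_here for n in adjacency[source]):
                return False  # greedy routing would get stuck at `source`
    return True
```

When the check succeeds, greedy routing delivers every message, since each hop strictly reduces the distance to the destination.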
Greedy routing can become node-degree aware by exploiting the node de-
gree metric available in network graphs [26]. This enhancement may improve
its performance, since apart from the reason that high degree nodes are “more
connected” to other nodes, they also tend to be embedded nearer to the core
of the network (e.g., center of the Poincare disk) than the lower degree nodes.
Other enhancements of greedy routing (e.g., Gravity-Pressure Greedy For-
warding [11]) have been also proposed to enhance its performance for dynamic
network conditions, e.g., random node arrivals and departures [11]. Based on
all the advantages of greedy routing techniques over hyperbolic coordinates,
we envision their suitability and efficiency for the computation of SNA met-
rics that demand knowledge of paths between node pairs, e.g., betweenness
centrality [27] often used for defining most influential nodes for information
propagation purposes.
1.7 OPTIMIZATION TECHNIQUES OVER HYPERBOLIC SPACE
FOR DECISION-MAKING IN BIG DATA
1.7.1 The Case of Advertisement Allocation over Online Social Networks
Analysis of big data leads to problems of large-scale optimization. Since op-
timization involving large data sets is not only expensive but suffers from
slow numerical rates of convergence, new approaches are required. Through-
out this subsection, we will describe and study the advertisement (ad) al-
location problem and how it can be significantly simplified computationally
leveraging hyperbolic space’s properties for large-scale networks, following the
approach of [35]. A common advertising mechanism used by, e.g., an online
social network (OSN) platform for the distribution of advertisements over its
users is of auction-style where the advertisers place bids on users’ impressions
(e.g., clicks) based on their budget constraints, while the platform’s owner
seeks to maximize its revenue. In an OSN, users’ impressions are not ad hoc
since users get influenced by their acquaintances and, therefore, the social
influence should be taken into consideration in the optimization. This is due
to the fact that a user’s engagement may influence other users depending on
the influence strength of the former. According to [35], a fairness constraint
should be added in the optimization problem so that “a similar users’ influence
distribution becomes assigned to each advertiser”.
Initially, we review the conventional way to formulate an advertisement
allocation problem over an OSN, which is the following (Equations (1.6)-(1.9))
Integer Programming (IP) problem.
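$$\max_{S,I} \; \sum_{j=1}^{|A|} p_j \sum_{u_i \in S_j} I_{i,j}\, g(u_i) \quad \text{subject to:} \tag{1.6}$$
$$p_j \sum_{u_i \in S_j} I_{i,j}\, g(u_i) \le b_j, \quad \forall a_j \in A \quad \text{(budget constraint)} \tag{1.7}$$
$$\sum_{j:\, u_i \in S_j} I_{i,j} \le I_i, \quad \forall u_i \in V \quad \text{(impression constraint)} \tag{1.8}$$
$$I_{i,j} \in \mathbb{N}, \quad (S, I) \in R_D \quad \text{(domain constraint)} \tag{1.9}$$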
where aj is an advertisement (corresponding to an advertiser), A is the set
of all advertisements, pj the bid of the advertiser j which is considered ho-
mogeneous over all users, ui is a user of the OSN (node of the network) with
maximum number of impressions assigned to all advertisers ($\sum_{j:\, u_i \in S_j} I_{i,j}$)
equal to $I_i$ and social influence given by $g(u_i)$. $R_D$ is a feasible set expressing
domain constraints, e.g., fairness or priority constraints among advertisers.
Furthermore, S, I are the optimization variables, where $S = \{S_1, S_2, ..., S_{|A|}\}$
is the allocation strategy, i.e., the set of users assigned to each advertiser, and
$I = \{I_{i,j} \mid u_i \in S_j, a_j \in A\}$ is the users’ impressions allocation strategy, i.e.,
the number of impressions of a user assigned to each advertiser, where $I_{i,j} = 0$
if $u_i \notin S_j$. Also, V stands for the set of users. Note that the total number of
impressions of a user is upper bounded due to the limited time that a user
spends on OSNs daily.
The IP problem formulation has two significant disadvantages. Firstly, the
decision variable I has an order of |A| · |V |, implying an extreme increase in
dimensionality for the modern OSNs consisting of billions of users. Secondly,
the domain constraints mentioned above are hard to express in such an IP
formulation setting. The most common and important domain constraint (RD)
is the fairness one as it constitutes a requirement and business model of most
OSN platforms. Besides fairness, several other kinds of domain constraints are
described and handled in [35], such as the priority model and the hybrid model
that combines fairness with priority. In this chapter, we will focus only on the
fairness constraint, as it is very representative in indicating the computational
efficiency gained when utilizing the properties of hyperbolic space in the ad allocation
problem for large-scale OSNs.
In [35], an alternative problem formulation of the advertisement allocation
problem is proposed based on the mapping of the OSN in hyperbolic space
(performed as in Sections 1.4 and 1.5). Following the new methodology, the
disadvantages of the IP problem formulation are tackled to a significant degree
as (i) the discrete nature of the advertisement allocation problem (due to
I, S) becomes continuous leveraging region-wise integrals on the continuous
hyperbolic geometric space, allowing for dimensionality reduction reaching
a final one of order O(|A|), (ii) in many cases the domain constraints can
be efficiently represented and visualized. For the latter and considering the
fairness domain constraints, note that two fan (or pie) shapes on the Poincare
disk indicate the same distribution of user influence due to the properties of
a complex network’s (e.g., OSN) mapping in hyperbolic space (Section 1.5)
that will be also pinpointed below.
For the network mapping in hyperbolic space, the HyperMap scheme (Sec-
tion 1.5) is used. The mapping exhibits the important properties of OSNs,
such as the power-law degree distribution (scale-free property), the commu-
nity structure, and the efficient network navigability via greedy routing using
local information (related to small-world phenomenon, Section 1.6). One im-
portant aspect is that after the mapping of the network on the Poincare disk,
the expected node degree, $p_d(r)$, depends on the radial coordinate and is given
by $p_d(r) \propto e^{-r/2}$, while the node density is expressed as $p_n(r) \propto e^{r}$. This means
that every circle on the Poincare disk has uniform node density, while the node
degree and the node density vary exponentially along the radius. This exponential
dependence of node degree and node density on the radius can be
exploited for capturing the users’ influence factor discussed above, while the
continuity of the hyperbolic space can be leveraged for approximating the sum
over users of the advertisement allocation problem with integrals over certain
areas where users are mapped to. In this case, the advertisement allocation
problem seeks an optimal allocation strategy that assigns to each advertiser
a region of the user population so that maximum revenue is achieved.
Considering all the above, the advertisement allocation problem, after the
mapping of the OSN in hyperbolic space, becomes:
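$$\max_{S,I} \; \sum_{j=1}^{|A|} p_j f_j(S, I) \quad \text{(volume assignment)} \quad \text{subject to:} \tag{1.10}$$
$$p_j f_j(S, I) \le b_j, \quad \forall j \in \{1, ..., |A|\} \quad \text{(budget constraint)} \tag{1.11}$$
$$\sum_{j=1}^{|A|} \sigma_i(S_j, I) \le I_i, \quad \forall u_i \in V \quad \text{(impression constraint)} \tag{1.12}$$
$$I_{i,j} \in \mathbb{N}^{+}, \quad \forall u_i \in S_j : a_j \in A \tag{1.13}$$
$$S \in R_D \quad \text{(domain constraint)} \tag{1.14}$$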
where fj(S, I) is a function of the impressions assigned to the advertisement
aj, σi(Sj, I) is the amount of the impressions of user ui that become assigned
on advertisement aj. According to this meta-formulation, an allocation strat-
egy or a shape design is given for S (e.g., fan-shape for the fairness model,
ring-shape for the priority model [35]) which also determines the fj(S, I) func-
tion. The dependence of the fj, σi functions on I is due to the multiple im-
pressions that a user has and may assign to different advertisers. Therefore,
the areas assigned to different advertisers over the hyperbolic space may be
overlapping complicating the optimization problem (Equations (1.10)–(1.14)).
However in [35], this issue is resolved via a methodology denoted as Unit Im-
pression Decomposition that leads to a multi-stage optimization problem with
unit impressions (and nonoverlapping areas among advertisers) at each stage.
For simplicity, suppose that Ii = 1, ∀ ui ∈ V . Thus, fj, σi depend only on
S. In the following, we will study the case of the fan-shape allocation strat-
egy that expresses fairness with respect to social influence in users’ allocation
among advertisers. Then, the allocation area Sj for the advertiser j has a
fan-shape or pie-shape of angle θj in the Poincare disk (as shown in Figure
1.5). Then, fj(Sj) is computed as follows:
$$f_j(S_j) = f_j(\theta_j) = a \int_0^{R} e^{\tau} \int_0^{\theta_j} \big(1 + w \cdot \delta(\tau)\big)\, dl\, d\tau = q \cdot \theta_j, \tag{1.15}$$
FIGURE 1.5 An example of an OSN’s users’ allocation to six advertisers
considering fairness with respect to the social influence (node degree).
Each advertiser is assigned a pie-shaped area over the Poincare disk,
on which the users’ OSN is embedded.
where R  1 the radius of the disk inside the Poincare disk where the em-
bedded OSN network lies in and q a constant appropriately determined after
tedious computations. Also, the quantity $a(1 + w \cdot \delta(\tau))$ in the integral
represents the profit that each node lying at radius τ contributes to its assigned
advertiser, where w and a are constants and δ(τ) is the node degree, with
$\delta(\tau) = g \cdot e^{\tau/2}$ and g a constant. Thus, the advertisement allocation problem (for one stage in
case of non-unit users’ impressions) with fairness domain constraints attains
a linear programming (LP) form as follows:
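$$\max_{\Theta} \; \sum_{j=1}^{|A|} p_j\, q\, \theta_j \quad \text{subject to:} \tag{1.16}$$
$$p_j \cdot q \cdot \theta_j \le b_j, \quad \forall j \in \{1, ..., |A|\} \tag{1.17}$$
$$\theta_j \ge 0, \quad \forall j \in \{1, ..., |A|\} \tag{1.18}$$
$$\sum_{j=1}^{|A|} \theta_j \le 2\pi \tag{1.19}$$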
In this problem formulation (Equations (1.16)–(1.19)), the optimization variable
is $\Theta = \{\theta_1, ..., \theta_{|A|}\} \in [0, 2\pi]^{|A|}$, which has only |A| dimensions, a significant
reduction compared with the |A| × |V| dimensions of the conventional problem
formulation (note that |V| is potentially in the order of billions). Each advertiser
$a_j$ is assigned a sector of angle $\theta_j$ in the Poincare disk. Note that
the variable S is no longer needed in the problem formulation. Also, it is
a convex problem that can be solved efficiently [35]. Two more observations
further support the efficiency of the last problem formulation.
Since the regions can be arranged tightly next to each other,
all the users’ impressions will be utilized as long as the demand (budget of
advertisers) is greater than or equal to the supply (users’ impressions). Also, due to
this fact, all the stages of the unit impression decomposition (in the case of
non-unit users’ impressions) can be performed in parallel to reduce the computation
time [35], which is a very important advantage of this approach for
big data analysis and computations.
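To illustrate how small the resulting problem is, the LP of Equations (1.16)–(1.19) can be solved without a general-purpose solver. The Python sketch below is not the solution method of [35]; it relies on the observation that every unit of angle given to advertiser j yields revenue p_j · q, so the LP is a fractional knapsack and a greedy pass over the advertisers in decreasing bid order is optimal. The function name and the numeric inputs are hypothetical, and q > 0 is assumed to be known.

```python
import math


def allocate_sectors(bids, budgets, q):
    """Solve max sum_j p_j*q*theta_j subject to p_j*q*theta_j <= b_j,
    theta_j >= 0 and sum_j theta_j <= 2*pi (Equations (1.16)-(1.19)).

    Assumes q > 0. Advertisers with non-positive bids receive no angle.
    Returns (thetas, revenue).
    """
    remaining = 2.0 * math.pi
    thetas = [0.0] * len(bids)
    # Serve the highest bidders first: each unit of angle earns p_j * q.
    for j in sorted(range(len(bids)), key=lambda k: bids[k], reverse=True):
        if bids[j] <= 0 or remaining <= 0:
            continue
        cap = budgets[j] / (bids[j] * q)  # largest angle the budget allows
        thetas[j] = min(cap, remaining)
        remaining -= thetas[j]
    revenue = sum(p * q * th for p, th in zip(bids, thetas))
    return thetas, revenue
```

For example, with bids (3, 2, 1), budgets (3, 4, 100) and q = 1, the first two advertisers exhaust their budgets with sectors of 1 and 2 radians, and the third receives the remaining 2π − 3 radians.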
1.7.2 The Case of File Allocation Optimization in Wireless Cellular
Networks
In modern wireless cellular networks the shift from the reactive to proactive
networking paradigm is a common trend [36]. The need for a smarter network
that incorporates proactive mechanisms is driven by the increasing mobile
data traffic [37]. One type of mechanism for proactive network operation which
has already been proposed in the literature [36, 38] is the file/content caching
at the edge of the network, i.e., at the evolved NodeBs (eNBs), small cell base
stations (Home eNBs) or at the user equipment (UE) devices. Pushing content
to the edge of the network relieves the network of redundant data traffic
and serves user requests with lower transmission delays.
In this subsection, we focus on the problem of optimal file placement in
different cache memories lying at various components of a mobile cellular net-
work. This problem can be cast in a form similar to the problem of Subsection
1.7.1 for achieving efficiency, since it bears similar social and complex charac-
teristics, as well as a similarly large-scale nature, as will be explained in the
following. The size and especially the number of the available files become
extremely large and the number of connected devices is increasing [39].
In this subsection, we describe a formulation of an optimization problem for
distributing files having a complex networked structure over a large number of
heterogeneous caches in a fair way, aiming to reduce the system delay of
file downloading. Fairness is meant in terms of the popularity of each file, e.g.,
a particular cache should not monopolize all popular files. For example, consider
the WWW graph [23], where an edge represents a link from one webpage
to another. Therefore, high (in-) degree [6] of a page implies high popularity,
since this webpage is pointed to by many others and is thus more likely to be visited,
i.e., requested. In this context, the following file placement optimization
problem is formulated (Equations (1.20)–(1.24)), aiming to determine in an
optimal way the allocation of files in cache memories.
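$$\max_{I,S} \; \sum_{f_i \in F,\, j \in M} \frac{l_i}{c_j} \cdot I_{i,j} \cdot g(f_i) \quad \text{subject to} \tag{1.20}$$
$$\sum_{j=1}^{|M|} I_{i,j} \le I_i, \quad \forall i = 1, 2, ..., |F| \tag{1.21}$$
$$\frac{1}{c_j} \sum_{i=1}^{|F|} l_i \cdot I_{i,j} \cdot g(f_i) \le s_j, \quad \forall j = 1, 2, ..., |M| \tag{1.22}$$
$$I_{i,j} \in \{0, 1\}, \quad \forall i = 1, 2, ..., |F|, \; \forall j = 1, 2, ..., |M| \tag{1.23}$$
$$(S, I) \in R_D \quad \text{(domain constraint)} \tag{1.24}$$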
where cj is the capacity of the transmission link between the memory cache
j and the provider of the file fi, li is the size of the file fi, M is the set of
the memory caches, F is the set of files, Ii,j is the indicator variable of the
placement of a file fi in memory cache j and g(fi) is a social influence factor
associated to file fi. Ii,j is either 1 if the file fi is placed in memory cache j
or 0 otherwise, while Ii stands for the maximum number of caches into which
the file fi can be stored and sj relates to the capacity of cache j. Finally,
S = {S1, S2, ..., S|M|} is the allocation strategy, i.e., the set of files assigned
to each cache memory.
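As a concrete reading of Equations (1.20)–(1.23), the following Python sketch evaluates the objective and checks the constraints for a given placement matrix. It is only an illustration: the function name and argument layout are hypothetical, and the domain constraint (1.24) is deliberately left out because its form depends on the chosen fairness or priority policy.

```python
def evaluate_placement(I, sizes, link_caps, influence, storage, max_copies):
    """Evaluate a file placement against Equations (1.20)-(1.23).

    I[i][j] is 1 if file i is placed in cache j and 0 otherwise.
    Returns (objective, feasible); the domain constraint (1.24) is not checked.
    """
    n_files, n_caches = len(sizes), len(link_caps)
    # Objective (1.20): sum of (l_i / c_j) * I_ij * g(f_i).
    objective = sum(
        sizes[i] / link_caps[j] * I[i][j] * influence[i]
        for i in range(n_files)
        for j in range(n_caches)
    )
    # Constraint (1.21): at most I_i copies of each file.
    copies_ok = all(sum(I[i]) <= max_copies[i] for i in range(n_files))
    # Constraint (1.22): weighted load of each cache bounded by s_j.
    storage_ok = all(
        sum(sizes[i] * I[i][j] * influence[i] for i in range(n_files)) / link_caps[j]
        <= storage[j]
        for j in range(n_caches)
    )
    # Constraint (1.23): binary placement indicators.
    binary_ok = all(x in (0, 1) for row in I for x in row)
    return objective, copies_ok and storage_ok and binary_ok
```

The |F| × |M| size of the matrix `I` is exactly the dimensionality that the hyperbolic reformulation aims to avoid.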
The placement of a file fi at the memory cache j has a certain benefit
for the network in terms of the average system delay improvement. Each
placement of a file to a cache memory offloads the network from the time
needed to download a file from the file/content provider. This benefit can be
quantified on average by the term $l_i/c_j$. Thus, the above file allocation problem
maximizes the total benefit in terms of the system delay improvement from
the placement of certain files in the available cache memories. This problem
is of integer programming form, thus being NP-hard, while also attaining
large-scale characteristics, as mentioned before. Thus, alternative approaches
need to be taken into account in order to tackle efficiently the large scale and
discrete nature of this problem. It can be observed by the following mapping
table (Table 1.2) that the file placement problem in memory caches is of a
similar nature to the problem of advertisement allocation, presented in the
previous section (Subsection 1.7.1).

TABLE 1.2 Mapping of the file allocation problem to the advertisement allocation problem.

| Advertisement allocation in users | File allocation in caches |
|---|---|
| Advertisement (A) | Cache memory (M) |
| Users ($u_i$, V) | Files ($f_i$, F) |
| Price bid ($p_j$) | The inverse of the capacity of the link between cache and the file provider ($1/c_j$) |
| Social factor $g(u_i)$ | Social factor $g(f_i)$ |
| Ad budget constraint ($b_j$) | Storage capacity constraint of the cache memory ($s_j$) |

Following the arguments and analysis of
Subsection 1.7.1, the files’ network graph can be embedded in the hyperbolic
space. After this mapping the file allocation problem takes the following form:
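$$\max_{I} \; \sum_{j \in M} \frac{1}{c_j} \cdot f_j(S, I) \quad \text{(volume assignment)} \quad \text{subject to} \tag{1.25}$$
$$\frac{1}{c_j} \cdot f_j(S, I) \le s_j, \quad \forall j = 1, 2, ..., |M| \quad \text{(storage constraint)} \tag{1.26}$$
$$\sum_{j=1}^{|M|} \sigma_i(S_j, I) \le I_i, \quad \forall i = 1, 2, ..., |F| \tag{1.27}$$
$$I_{i,j} \in \{0, 1\}, \quad \forall i = 1, 2, ..., |F|, \; \forall j = 1, 2, ..., |M| \tag{1.28}$$
$$(S, I) \in R_D \quad \text{(domain constraint)} \tag{1.29}$$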
where fj(S, I) is a function of the number (or size) of files and their social in-
fluence that are assigned to the memory cache j. Following the lines of Section
1.7.1, this formulation leads to a significant reduction of the dimensionality
from O(|M||F|) to O(|M|), and provides the flexibility of applying the desired
fairness policy with respect to the social characteristics of the available files.
1.8 VISUALIZATION ANALYTICS IN HYPERBOLIC SPACE
Visual analytics consists of analytical reasoning facilitated by the visual in-
terface, integrating the analytic capabilities of the computer and the abilities
of the human analyst. The visual analytics approach relies on interactive and
integrated visualizations for exploratory data analysis in order to identify un-
expected trends, outliers or patterns. By putting a human back into the loop
to guide the analysis, interactive data visualizations have an important role
to play, e.g., as in [41].
Large datasets challenge the ability to visualize, navigate and understand
relationships among data. In general, displaying large collections of data
(rolled out in many dimensions) within a limited display area requires caution
to avoid missing the necessary details. Especially when data analytics yield
graphs of nodes (data points) and edges (relations among data points), prop-
erly depicting such inter-relations is crucial for facilitating better analysis.
Displays of large graphs (typically derived in BDA) in Euclidean spaces may
not utilize efficiently the available space and impose limitations on the order
of the graph that can be handled. Contrary to that, hyperbolic space offers
significant advantages in this direction by allowing the display of an arbitrarily
large structure within a bounded, finite space (e.g., Poincare disk model), si-
multaneously providing the possibility of changing the focus to specific areas,
while retaining the whole picture of the data structure.
Hyperbolic-based visualization may significantly assist data analysis and
corresponding decision making via a holistic rather than a focused view on
the data structure and correlations. For example, it is possible to identify
important/influential nodes, thus avoiding or reducing a significant amount
of computations over large data sets, e.g., shortest paths for identifying node
centralities (SNA metrics). The advantages of hyperbolic space with respect
to data visualization and BDA can be summarized as follows:
i. The hyperbolic space grows exponentially with its radius around each point.
This property is ideal for embedding hierarchical data represented as tree
graphs, and consequently scale-free graphs often emerging in social network
analysis and BDA (see Section 1.5).
ii. The Poincare disk model of hyperbolic space exhibits a fish-eye property of
dynamic focus, allowing real-time interactive navigation, e.g., via the mouse.
There are many visualization techniques that utilize hyperbolic space em-
bedding. Most of them focus on hierarchical or tree-like graph embedding.
Generally, depending on the data representation, different techniques can be
applied, as described in the following.
1.8.1 Adaptive Focus in Hyperbolic Space
The visualization of large datasets in general suffers from a difficulty to show
both focus and global context. Adjusting the focus is an important advan-
tage of using hyperbolic space for data visualization. In order to change the
focus point in the Poincare disk, a translation operation can be applied that
corresponds to a user’s mouse click and drag events. This translation is a
Mobius transformation, denoted by T(z), where z is a point of the Poincare disk
expressed as a complex number. In this case, the isometric Mobius
transformation for a point z can be written as [46]:
$$z' = T(z; c, b) = \frac{bz + c}{\bar{c}\, b z + 1}, \quad |b| = 1, \; |c| < 1. \tag{1.30}$$
The complex number b describes a pure rotation of the Poincare disk around
the origin 0. The translation by c maps the origin to c, and −c becomes the
new center 0 (if b = 1). In Figure 1.4 (c), the triangles mapped in the center
of the Poincare disk can be seen in detail, a fact that does not hold for the
triangles mapped close to the periphery, although all triangles are of equal
size. Applying such Mobius transformations (Equation (1.30)) can transfer
the focus to other triangles of interest by moving them to the center of the
Poincare disk.
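The change of focus via Equation (1.30) maps directly onto complex arithmetic. The following Python sketch is a minimal illustration with a hypothetical function name; it assumes that the denominator of Equation (1.30) contains the complex conjugate of c, as required for the map to send the unit disk onto itself.

```python
def mobius(z, c, b=1 + 0j):
    """Apply T(z; c, b) = (b*z + c) / (conj(c)*b*z + 1) to a point of the disk.

    Requires |c| < 1 (translation) and |b| = 1 (pure rotation).
    """
    if abs(c) >= 1:
        raise ValueError("|c| must be smaller than 1")
    if abs(abs(b) - 1.0) > 1e-12:
        raise ValueError("|b| must be equal to 1")
    c = complex(c)
    return (b * z + c) / (c.conjugate() * b * z + 1)
```

To bring a point p of interest into focus, one can apply `mobius(z, -p)` to every displayed point z, since this choice of c sends p itself to the center of the disk.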
1.8.2 Hierarchical (Tree) Graphs
Data visualization inside the Poincare disk (in 2D) can be performed by using
successive applications of the Mobius transformation given in Section 1.8.1,
Equation (1.30) [40]. Each tree node receives a certain open space, called “pie
segment”, where it chooses the locations of its siblings. This is denoted as a
treemap [42]. Then, for all its siblings, it calls recursively the layout routine
after applying the Mobius transformation. A similar visualization technique
is developed in [43] in the 3D hyperbolic space, although navigation in 3D
is more complex. Given a hierarchical structure of data (similar to a tree
structure), large directed graphs can be efficiently visualized in 3D hyperbolic
space, since due to its exponential increase the same room can be allocated
to every embedded node no matter how deep it lies in the tree.
1.8.3 General Graphs
In this case, two basic visualization techniques in the 2D hyperbolic space can
be identified. The combination of these two techniques in a hybrid scheme
allows for a more efficient visualization.
Self-Organizing Map (SOM) in Hyperbolic Space (HSOM) [44].
Firstly, a feature map is built, composed by a lattice of nodes (neurons) while a
reference vector (prototype vector) is attached to each node. The position of a
new data vector in the visualization is determined by the discrete (best-match)
node in the lattice chosen via minimizing the hyperbolic distance (Poincare
disk) of the new vector over all existing prototype vectors (nodes) in the lat-
tice.
Hyperbolic Multidimensional Scaling (HMS) [45]. This visualization
technique suitably represents the proximity relations (dissimilarities) of N
objects by distances between points in the Poincare disk model of hyper-
bolic space. Therefore, comparing the spatial positions of two nodes on the
Poincare disk provides strong intuition for the similarity/dissimilarity of their
corresponding features.
Hybrid Scheme [46]. Each one of the HSOM and HMS schemes has advantages
and disadvantages. HSOM processes only vectorial data and scales
linearly in the number of nodes, whereas HMS uses dissimilarity data and its cost grows
as the square of the number of nodes. Thus, HSOM may accommodate higher
data quantities, while HMS accommodates a more general data form, i.e., the
dissimilarity one. The proposed hybrid scheme in [46] exploits the advantages
of each isolated visualization technique. It firstly creates a coarse-grain theme
map of the data via HSOM (which accommodates more data) and then uses
HMS for detailed inspection of data subsets where data similarities are con-
tinuously reflected as spatial proximities. Importantly, the display paradigm
employs in both cases the hyperbolic plane in order to profit from its focus
and context technique (as explained above).
Finally, there are two existing applications for visualizing data in hyper-
bolic space, namely Hyperbolic Tree Viewer [47] and Hypertree [48]. Although
many other examples exist, they all suffer from different shortcomings, in par-
ticular problems regarding the inclusion of additional data dimensions, and
the absence of a means to guide the user to those regions of the data that
might be called “interesting”, calling for novel approaches.
1.9 CONCLUSIONS
In this chapter, we developed a big data analytics (BDA) and exploitation
framework for complex and social networks leveraging significant properties
of hyperbolic geometry in this field. Briefly, many scale-free complex and social
networks are characterized by a hidden hyperbolic structure, and thus embedding
the large-scale data they produce in hyperbolic space arises naturally, also
allowing for their efficient handling, processing and exploitation via information
extraction. In this context, our proposed framework collects methodologies
over hyperbolic coordinates for several processes concerning complex and
social networks, such as correlations and clustering, missing links’ inference,
efficient SNA metrics’ computations, optimized resource allocation and visual-
ization analytics. We envision that the proposed framework will revolutionize
BDA in complex and social networks and will maximize the benefit from data
analytics generated from the latter, for the latter.
ACKNOWLEDGMENT
This research is co-financed by the European Union (European Social Fund)
and Hellenic national funds through the Operational Program ‘Education and
Lifelong Learning’ (NSRF 2007-2013).
FURTHER READING
1. D. Puccinelli, M. Haenggi, “Wireless Sensor Networks: Applications and
Challenges of Ubiquitous Sensing”, IEEE Circuits and Systems Magazine,
Vol. 5, No. 3, pp. 19-31, 2005.
2. C.L.P. Chen, C.-Y. Zhang, “Data-intensive Applications, Challenges, Tech-
niques and Technologies: A Survey on Big Data”, Elsevier Information
Sciences, No. 275, pp. 314-347, 2014.
3. K.-C. Chen, M. Chiang, H.V. Poor, “From Technological Networks to
Social Networks”, IEEE Journal on Selected Areas in Communica-
tions/Supplement (JSAC), Vol. 31, No. 9, pp. 548-572, September 2013.
4. J. W. Anderson, Hyperbolic Geometry, 2nd ed. Springer, 2007.
5. F. Papadopoulos, C. Psomas, D. Krioukov, “Network Mapping by Replaying
Hyperbolic Growth”, IEEE/ACM Transactions on Networking, Vol. 23,
No. 1, pp. 198-211, Feb. 2015.
6. V. Karyotis, E. Stai, S. Papavassiliou, Evolutionary Dynamics of Complex
Communications Networks, CRC Press - Taylor & Francis Group, Boca
Raton, FL, 2013.
7. L. Atzori, A. Iera, G. Morabito, M. Nitti, “The Social Internet of Things
(SIoT) - When social networks meet the Internet of Things: Concept, Ar-
chitecture and Network Characterization”, Computer Networks, Elsevier,
Vol. 56, No. 16, pp. 3594-3608, 2012.
8. F. Papadopoulos, D. Krioukov, M. Boguñá, A. Vahdat, “Greedy Forwarding
in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric
Spaces”, in Proc. of IEEE INFOCOM, pp. 14-19, March 2010.
9. F. Papadopoulos, M. Kitsak, M. A. Serrano, M. Boguñá, D. Krioukov,
“Popularity vs. Similarity in Growing Networks”, Nature, Vol. 489,
pp. 537-540, Sept. 2012.
10. F. S. Beckman, D. A. Quarles, “On Isometries of Euclidean Space”, Proc.
Amer. Math. Soc., Vol. 4, pp. 810-815, 1953.
11. A. Cvetkovski, M. Crovella, “Hyperbolic Embedding and Routing for Dy-
namic Graphs”, IEEE INFOCOM, pp. 1647-1655, April 2009.
12. I. Benjamini, Y. Makarychev, “Dimension Reduction for Hyperbolic
Space”, American Mathematical Society, Vol. 137, No. 2, pp. 695-698,
Feb. 2009.
13. D. A. Tran, K. Vu, “Dimensionality Reduction in Hyperbolic Data Spaces:
Bounding Reconstructed-Information Loss”, in Proc. of 7th IEEE/ACIS
Int’l Conf. on Computer and Information Science, pp. 133-139, May 2008.
14. L. Rokach, O. Maimon, “Clustering methods”, Data Mining and Knowledge
Discovery Handbook, Springer US, pp. 321-352, 2005.
15. D. Yan, L. Huang, M. I. Jordan, “Fast Approximate Spectral Clustering”,
in Proc. of the 15th ACM Conference on Knowledge Discovery and Data
Mining (SIGKDD), Paris, France, 2009.
16. Y. Koren, R. Bell, C. Volinsky, “Matrix Factorization Techniques for Rec-
ommender Systems”, Computer, Vol. 42, No. 8, pp. 30-37, August 2009.
17. A. K. Menon, C. Elkan, “Fast Algorithms for Approximating the Singular
Value Decomposition”, ACM Transactions on Knowledge Discovery from
Data (TKDD), Vol. 5, No. 2, Feb. 2011.
18. S.-H. Cha, “Comprehensive Survey on Distance/Similarity Measures be-
tween Probability Density Functions”, Int’l Journal of Mathematical
Models and Methods in Applied Sciences, Vol. 1, No. 4, pp. 300-307, 2007.
19. L. Lee, “Measures of Distributional Similarity”, 37th Annual Meeting of
the Association for Computational Linguistics, pp. 25-32, 1999.
20. M.E.J. Newman, Networks: An Introduction, Oxford, UK: Oxford Univer-
sity Press, 2010.
21. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning,
Springer, 2008.
22. N. Ailon, B. Chazelle, “The Fast Johnson-Lindenstrauss Transform and
Approximate Nearest Neighbors”, SIAM J. Comput., Vol. 39, No. 1, pp.
302-322, 2009.
23. R. Albert, A.-L. Barabasi, “Statistical Mechanics of Complex Networks”,
Reviews of Modern Physics, Vol. 74, No. 1, pp. 47-97, Jan. 2002.
24. X. Zhao, A. Sala, H. Zheng, B. Y. Zhao, “Efficient Shortest Paths on
Massive Social Graphs”, IEEE Collaborate Communic., pp. 77-86, 2011.
25. Y. Shavitt, T. Tankel, “Hyperbolic Embedding of Internet Graph for Dis-
tance Estimation and Overlay Construction”, IEEE/ACM Trans. on Net-
working, Vol. 16, No. 1, pp. 25-36, Feb. 2008.
26. X. Ban, J. Gao, A. van de Rijt, “Navigation in Real-World Complex Net-
works through Embedding in Latent Spaces”, ALENEX, pp. 138-148,
2010.
27. S. P. Borgatti, “Centrality and Network Flow”, Social Networks (Elsevier),
pp. 55-71, 2004.
28. M. Boguñá, D. Krioukov, K. C. Claffy, “Navigability of Complex
Networks”, Nature Physics, Vol. 5, pp. 74-80, 2009.
29. J. Zhang, “Greedy Forwarding for Mobile Social Networks Embedded in
Hyperbolic Spaces”, in Proc. of the ACM SIGCOMM, New York, NY,
pp. 555-556, 2013.
30. J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez, “Recommender Sys-
tems Survey”, Knowledge-Based Systems, Elsevier, Vol. 46, pp. 109-132,
April 2013.
31. J. Kleinberg, “The Small-World Phenomenon: an Algorithmic Perspective”,
in Proc. of the 32nd Annual ACM Symposium on Theory of Computing
(STOC ’00), New York, NY, USA, pp. 163-170, 2000.
32. E. Stai, J. S. Baras, S. Papavassiliou, “Social Networks over Wireless Net-
works”, in Proc. of the 51st IEEE Conf. on Decision and Control (CDC),
Hawaii, Dec. 2012.
33. E. Stai, S. Papavassiliou, J. S. Baras, “Performance-Aware Cross-
Layer Design in Wireless Multihop Networks via a Weighted Back-
pressure Approach”, IEEE/ACM Transactions on Networking, DOI:
10.1109/TNET.2014.2360942, October 2014.
34. E. Stai, S. Papavassiliou, J. S. Baras, “A Coalitional Game Based Approach
for Multi-Metric Optimal Routing in Wireless Networks”, in Proc. of
the 24th Annual IEEE Int’l Symposium on Personal, Indoor and Mobile
Radio Commun. (PIMRC), pp. 1935-1939, London, UK, Sept. 2013.
35. P. Gao, H. Miao, J. S. Baras, “Social Network Ad Allocation via Hyperbolic
Embedding”, in Proc. of 53rd IEEE Conference on Decision and Control
(CDC), pp. 4875-4880, December 2014.
36. E. Bastug, M. Bennis, M. Debbah, “Living on the Edge: The role of Proac-
tive Caching in 5G Wireless Networks”, IEEE Communications Maga-
zine, Vol. 52, No. 8, pp. 82-89, 2014.
37. Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Fore-
cast Update 2013-2018”, White Paper, [Online] http://goo.gl/l77HAJ,
2014.
38. F. Pantisano, M. Bennis, W. Saad, M. Debbah, “In-Network Caching and
Content Placement in Cooperative Small Cell Networks”, 1st Int’l Con-
ference on 5G for Ubiquitous Connectivity (5GU), 2014.
39. Ericsson, “5G Radio Access-Research and Vision”, White Paper, June 2013.
40. J. Lamping, R. Rao, P. Pirolli, “A Focus+Context Technique Based on
Hyperbolic Geometry for Viewing Large Hierarchies”, ACM SIGCHI, pp.
401-408, 1995.
41. U. C. Turker, S. Balcisoy, “A Visualization Technique for Large Temporal
Social Network Datasets in Hyperbolic Space”, Journal of Visual Lan-
guages and Computing, Vol. 25, pp. 227-242, 2014.
42. H.-C. Lam, I.D. Dinov, “Hyperbolic Wheel: A Novel Hyperbolic Space
Graph Viewer for Hierarchical Information Content”, ISRN Computer
Graphics, Volume 2012, article ID 609234, 2012.
43. T. Munzner, “H3: Laying out Large Directed Graphs in 3D Hyperbolic
Space”, in Proc. of IEEE Symposium on Information Visualization, pp.
2-10, 1997.
44. H. Ritter, “Self-organizing Maps on non-Euclidean Spaces”, In Kohonen
Maps, Elsevier, pp. 97-110, 1999.
45. J. Walter, H. Ritter, “On Interactive Visualization of High-Dimensional
Data Using the Hyperbolic Plane”, in Proc. of ACM Int’l. Conference on
Knowledge Discovery and Data Mining (SIGKDD), pp. 123-131, 2002.
46. J. Walter, J. Ontrup, D. Wessling, H. Ritter, “Interactive Visualization and
Navigation in Large Data Collections Using the Hyperbolic Space”, in
Proc. of the 3rd IEEE International Conference on Data Mining (ICDM),
pp. 355-362, Nov. 2003.
47. J. Lamping, R. Rao, “Laying out and Visualizing Large Trees Using a
Hyperbolic Space”, in Proc. ACM Symp User Interface Software and
Technology, pp. 13-14, 1994.
48. J. Bingham, S. Sudarsanam, “Visualizing Large Hierarchical Clusters in
Hyperbolic Space”, Bioinformatics, Vol. 16, No. 7, pp. 660-661, 2000.
49. R. Kleinberg, “Geographic Routing Using Hyperbolic Space”, in Proc. of
IEEE INFOCOM, pp. 1902-1909, May 2007.
50. A.-L. Barabasi, E. Bonabeau, “Scale-Free Networks”, Scientific American,
pp. 50-59, May 2003.
51. D. J. Watts, S. H. Strogatz, “Collective Dynamics of ‘Small-World’ Net-
works”, Nature, Vol. 393, pp. 440-442, Jun. 1998.
CHAPTER 2
Scalable Query and Analysis for Social Networks: An Integrated
High-Level Dataflow System with Pig and Harp
Tak-Lon (Stephen) Wu, Bingjing Zhang, Clayton Davis,
Emilio Ferrara, Alessandro Flammini, Filippo Menczer
and Judy Qiu
CONTENTS
2.1 Introduction
2.2 Apache High-Level Language, Syntax and its Common Features
    2.2.1 Pig
    2.2.2 Hive
    2.2.3 Spark SQL/Shark
2.3 Pig, Hive and Spark SQL Comparison
2.4 Ad-hoc Queries: Truthy and Twitter Data
2.5 Iterative Scientific Applications
    2.5.1 K-means Clustering and PageRank
2.6 Benchmarks
    2.6.1 Performance of Ad-hoc Queries
    2.6.2 Performance of Data Analysis
2.7 Conclusion
Bibliography
Every day, vast amounts of data are collected from social network
applications (e.g., Twitter), and in response there is a growing need for
analysis methods that can handle this terabyte-scale input. To provide an
effective and advanced data processing environment for various types of social
data analysis, such as political discourse, trending topics, evolution of user
behavior, social bot detection and orchestrated campaigns, we need to support
both queries and complex analysis efficiently. The use of high-level scripting
languages to solve big data problems has become a mainstream approach for
sophisticated data mining and analysis. In particular, high-level interfaces
such as Pig, Hive, and Spark SQL are being used on top of the Hadoop framework.
This simplifies the coding of complex tasks in MapReduce-style systems while
improving on the flexibility of database systems through user-defined
aggregations. In this chapter we compare different approaches to building
high-level dataflow systems and propose an integrated solution with Pig and
Harp (a plugin to Hadoop), along with extensive benchmarks. The results show
that the Pig and Harp integration runs sophisticated iterative applications
2 to 10 times faster than Pig or Hive implementations executed on Hadoop.
2.1 INTRODUCTION
Social media is a valuable data source, providing tremendous amounts of
streaming information for analytics and research applications. Many research
projects perform intensive analysis on such data, and the outcomes of this
analysis are drawing attention in various application areas, including market
sales analysis, societal studies (including political polarization [10],
congressional elections [14, 13], protest events [12, 11], and the spread
of misinformation [47, 37]) and information diffusion [24]. Compared to other
problems in computing, social media analysis is “special”: it normally focuses
on a subset of data related to a target social event within a specific time
frame. To further investigate the inter-relationships of such subsets of data,
various sophisticated algorithms and complex data transformations may be
applied in a series of stages [19]. Therefore, a programmable solution for
social media data must offer expressiveness, data extraction capabilities,
reusability and interoperability with different computation runtimes.
Apache high-level languages and the Apache Hadoop [1] ecosystem are some of
the existing building blocks that match these requirements for social
network analysis.
The use of high-level language platforms is not limited to social media
data. Other fields of research such as workflow provenance [7], network traffic
analysis [26, 23], and geographic data analysis [6] have shown that adopting
these solutions speeds up and scales up their historical data analysis. However,
the complex workflows characterizing existing platforms make it difficult for
users to decide which language and low-level runtime best match their needs.
Motivated by these challenges, our goal is to provide a comprehensive survey
of these high-level abstractions, involving experiments with real social media
data examples and common query and analysis applications.
The rest of the chapter is organized as follows. Section 2.2 gives an overview
of Apache high-level languages, especially Pig [22], Hive [40] and Spark SQL
[2, 44]. The first two build on Hadoop, while Harp [47] and Spark [46] are
iterative MapReduce frameworks offering support for complex parallel
data systems. Section 2.3 compares the features of these languages,
especially the important user-defined functions that make MapReduce a
simplified and scalable solution. Sections 2.4 and 2.5 introduce applications that
are used for benchmarking later in the chapter. Section 2.4 introduces the
Truthy project and the types of queries that it needs to run on top of Twitter
data, while Section 2.5 discusses three data analytics use-cases and how to
express them in high-level languages. Section 2.6 presents the performance
evaluation of the applications presented in Sections 2.4 and 2.5, and the tech-
nologies of Section 2.2. Section 2.7 draws our conclusions.
2.2 APACHE HIGH-LEVEL LANGUAGE, SYNTAX AND ITS
COMMON FEATURES
Programming languages have been developed for more than 50 years. Each
language has its own compiler or interpreter, which ultimately executes a
physical plan on top of the low-level (operating) system. Apache high-level
languages share the common features of traditional programming languages; in
many cases, a compiler built for such a language supports several fundamental
functions and operations: a syntax parser, type and compile-time semantic
checking, a logical plan generator and optimizer, and a physical plan
generator and executor. ANTLR (ANother Tool for Language Recognition) [34] is
the general syntax parser for Pig, Hive, and Spark SQL. Each language has its
own types, plan generator and optimizer, but all of them use YARN [42] as
their resource management tool. The next sections discuss the details of
Apache Pig, Apache Hive and Apache Spark SQL.
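The compiler stages listed above can be illustrated with a deliberately tiny
sketch in Python. Everything in it is an assumption made for illustration: the
two-statement FILTER/PROJECT mini-language, the function names (parse, check,
optimize, to_physical, execute) and the single optimization rule are
hypothetical, and none of it reflects the actual grammar or internals of Pig,
Hive or Spark SQL. It only shows how a script might pass from syntax parsing
through semantic checking and logical optimization to an executable physical
plan.

```python
from dataclasses import dataclass

# Hypothetical mini-language (NOT Pig Latin, HiveQL or Spark SQL):
#   FILTER <field> > <int>
#   PROJECT <field>[,<field>...]

@dataclass(frozen=True)
class Filter:
    field: str
    threshold: int

@dataclass(frozen=True)
class Project:
    fields: tuple

def parse(script):
    """Syntax parser: turn script text into a list of logical operators."""
    plan = []
    for line in script.strip().splitlines():
        tokens = line.split()
        if len(tokens) == 4 and tokens[0] == "FILTER" and tokens[2] == ">":
            plan.append(Filter(tokens[1], int(tokens[3])))
        elif len(tokens) == 2 and tokens[0] == "PROJECT":
            plan.append(Project(tuple(tokens[1].split(","))))
        else:
            raise SyntaxError(f"cannot parse: {line!r}")
    return plan

def check(plan, schema):
    """Semantic check: each operator may only use fields available to it."""
    available = set(schema)
    for op in plan:
        used = {op.field} if isinstance(op, Filter) else set(op.fields)
        missing = used - available
        if missing:
            raise NameError(f"unknown field(s): {sorted(missing)}")
        if isinstance(op, Project):
            available = set(op.fields)  # projection narrows the schema
    return plan

def optimize(plan):
    """Logical optimizer: merge adjacent '>' filters on the same field."""
    out = []
    for op in plan:
        prev = out[-1] if out else None
        if (isinstance(op, Filter) and isinstance(prev, Filter)
                and prev.field == op.field):
            out[-1] = Filter(op.field, max(prev.threshold, op.threshold))
        else:
            out.append(op)
    return out

def to_physical(plan):
    """Physical plan generator: one Python callable per logical operator."""
    steps = []
    for op in plan:
        if isinstance(op, Filter):
            steps.append(lambda rows, op=op:
                         [r for r in rows if r[op.field] > op.threshold])
        else:
            steps.append(lambda rows, op=op:
                         [{f: r[f] for f in op.fields} for r in rows])
    return steps

def execute(steps, rows):
    """Executor: apply the physical steps in order."""
    for step in steps:
        rows = step(rows)
    return rows

def run(script, schema, rows):
    logical = optimize(check(parse(script), schema))
    return execute(to_physical(logical), rows)
```

In a real system the executor would hand the physical plan to a distributed
runtime rather than apply Python callables to an in-memory list; the sketch
keeps everything local so that each stage can be inspected on its own.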
2.2.1 Pig
Pig is a high-level dataflow system that expresses simple data
transformations as a pipeline over large amounts of semi-structured data
stored in Hadoop-compatible file storage. It is regularly used for
applications such as massive system log analysis and traditional Extract,
Transform, and Load (ETL) data processing.
Pig was first introduced by Yahoo!, and became one of the most
The Project Gutenberg eBook of Harry Harding
—Messenger 45
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.
Title: Harry Harding—Messenger 45
Author: Alfred Raymond
Release date: July 15, 2016 [eBook #52578]
Most recently updated: October 23, 2024
Language: English
Credits: Produced by Donald Cummings and the Online Distributed
Proofreading Team at http://www.pgdp.net
*** START OF THE PROJECT GUTENBERG EBOOK HARRY HARDING
—MESSENGER 45 ***
Harry Harding
—Messenger “45”
By
ALFRED RAYMOND
Copyright 1917, by
CUPPLES & LEON COMPANY
CONTENTS
CHAPTER
I. A Menace to the School
II. On the Trail of a Job
III. An Anxious Moment
IV. A Surprise and a Disappointment
V. Friends and Foes
VI. At the End of the Day
VII. Teddy Comes Into His Own
VIII. The Recruits to Company A
IX. The Bitterness of Injustice
X. Breakers Ahead for Harry
XI. Teddy Burke Distinguishes Himself
XII. A Disastrous Combat
XIII. The Measure of a Man
XIV. The Price of Honesty
XV. A Fateful Game of Catch
XVI. All in the Day’s Work
XVII. The Singer and the Song
XVIII. Confidences
XIX. The Belated Dawn
XX. Teddy’s Triumph
XXI. Getting Even with the Gobbler
XXII. A Disturbing Conversation
XXIII. Harry Pays His Debt
XXIV. Writing the Welcome Address
XXV. Commencement
HARRY HARDING
—Messenger “45”
CHAPTER I
A MENACE TO THE SCHOOL
“I will drown and no one shall help me,” announced Miss Alton
defiantly.
The first class in English accepted this remarkable statement in
absolute silence, their eyes fixed on their teacher. As she stood high
and dry on the platform, facing her class, there seemed little
possibility of such a catastrophe overtaking her, therefore, they
knitted their wise young brows, not in fear of her demise by
drowning, but in puzzled worry over the intricacies of shall and will.
“I will drown,” repeated Miss Alton firmly, “and no one——”
“Oh-h-h!” a piercing shriek rent the grammar-laden air. As though
about to prove her declaration, Miss Alton made a sudden dive off
the platform that carried her half-way up an aisle toward the
immediate vicinity of that anguished voice.
The first class in grammar immediately forgot the uses of shall and
will and twisted about on their benches to view their teacher’s
hurried progress toward the scene of action.
“It’s Teddy Burke,” muttered a boy to his nearest classmate.
“Wonder what he’s done.”
Miss Alton had now brought up between two seats at the rear of
the room. In one of them sat a little girl, her head buried in her
arms. Directly opposite her sat a red-haired boy. His thin face wore
an expression of deep disgust, but his big black eyes were dancing
with mischief. As the teacher approached, he made an ineffectual
dive toward a grayish object on the floor. Miss Alton was too quick
for him. She stooped, uttered a half-horrified exclamation, then
gathered the object in. It was a most terrifying imitation of a snake,
made of rubber, and coiled realistically.
“Theodore Burke, what does this mean?” she demanded, holding
out the snake and glaring at the offender.
The little girl raised her head from her arms and eyed the culprit
with reproachful horror. “He put it on my seat,” she accused. “I
thought it was alive, and it scared me awful.” Her voice rose to a
wail on the last word.
“This is too much. You’ve gone just a little too far, young man.
Come with me.” Miss Alton stood over the red-haired lad, looking like
a grim figure of Justice.
The boy shot a glance of withering scorn at his tearful victim, then
rose from his seat.
Grasping him none too gently by the arm, Miss Alton piloted him
down the aisle and out of the door. It closed with a resounding
bang.
A buzz of conversation began in the big schoolroom. Two or three
little girls left their seats and gathered about the heroine of the
disquieting adventure, while half a dozen boys of the eighth grade of
the West Park Grammar School put their heads together to discuss
this latest bit of mischief on the part of their leader and idol, Teddy
Burke.
Meanwhile, Teddy, of the black eyes and Titian hair, was being
marched rapidly toward the principal’s office.
Miss Alton flung open the door and ushered him into the august
presence of Mr. Waldron, the principal, with, “Here is an incorrigible
boy, Mr. Waldron.”
The principal, a short, stern-faced man, adjusted his eye-glasses
and stared hard at Teddy. The boy hung his head, then raising his
eyes regarded Mr. Waldron defiantly.
“So you are here again, young man, for the third time in two
weeks,” thundered the principal. “What has this bad boy done, Miss
Alton?”
Miss Alton began an indignant recital of Teddy’s latest misdeed.
The principal frowned as he listened. When she had finished, he
fixed Teddy with severe eyes.
“Let me see. The last time you were here it was for interrupting
the devotional exercises by putting a piece of ice inside the collar of
one of your schoolmates. Aren’t you ashamed of yourself? How
would you like to have your schoolmates play upon you the unkind
pranks you are so fond of playing upon them?”
“I wouldn’t care,” returned the boy, unabashed. “I wouldn’t make
a fuss, either.”
“Miss Alton is right,” snapped Mr. Waldron, his face reddening
angrily at the boy’s retort. “You are, indeed, an incorrigible boy. I
think I had better put your case before the Board of Education.
There are special schools for bad boys like you. We don’t care to
have such a boy among us. You are a menace to the school.” He
continued to lecture Teddy sharply, ending with, “Take him back to
your room for the day, Miss Alton, but make him remain after the
others have gone home this afternoon. By that time I shall have
decided what we had better do with him.”
Teddy walked down the corridor ahead of Miss Alton with a sinking
heart. Was he a menace to the school and could Mr. Waldron really
put him in a school for bad boys? He had heard of such schools. He
had heard, too, that sometimes the boys came out of them much
worse than when they entered. The murmur of voices came to his
ears as Miss Alton flung open the door and urged him into the
schoolroom. The noise died a sudden death as she stepped over the
threshold.
“Go to your seat,” she ordered coldly.
Teddy obeyed. The little girl, whose shriek had caused his
downfall, eyed him with horror. Even in the midst of his troubles he
could not resist giving her an impish grin. She promptly made a face
at him and looked the other way. The smile vanished from Teddy’s
face. Then he folded his hands on his desk and thought busily for
the next five minutes.
The class resumed its interrupted recitation. Suddenly the boy
reached into his desk and began stealthily to take out his
belongings. The books belonged to the school, but a pencil box, a
knife, a box of marbles, a top, a dilapidated baseball, a magnet and
a small, round mirror with which he delighted to cast white shadows
on the books of the long-suffering eighth-grade girls, were treasures
of his own. Stuffing them into his pockets he replaced the books;
then he sat very still. It was almost time for the recess bell to ring.
He hardly thought Miss Alton would order him to keep his seat. Such
light punishments were not for him. To-night—but there would be no
to-night in school for him. When recess came he would go outside
and say good-bye to the fellows, then he would start out and hunt a
job. He was almost sixteen, and the law said a boy could work when
he was fourteen, if he had a certificate. Well, he would get that
certificate. His mother would let him go to work if he wanted to. She
was so busy with her own affairs she never cared much what he did.
If he had a job, then Mr. Waldron couldn’t send him to a reform
school. That was the place where incorrigible boys were sent.
Teddy did not stop to consider that his mother might prove a
match for Miss Alton and Mr. Waldron when it came to a question of
her son’s incorrigibility. He thought only of putting himself beyond
the reach of the school authorities by his own efforts.
The recess bell rang at last and the pupils filed out in orderly rows
to the big, grassy yard, at one side of the school building. Teddy was
at once surrounded by half a dozen boys, his particular friends. The
girls collected in little groups about the yard to comment on Teddy’s
iniquity. They eyed him askance with curious, aloof glances. The
boys, however, were deeply interested in the possible outcome of
Teddy’s rash defiance.
“You’re goin’ to get fired all right,” was the cheerful prophecy of
one boy. “What’ll your mother say?”
“She won’t say,” giggled a freckle-faced boy. “She’ll just take Ted
across her knee and——”
“Well, I guess not,” flung back Teddy. “I’m not going to wait to get
fired, either. I’m going to beat it. When the recess bell rings I’m not
going in with the rest of you. See here,” Teddy began pulling his
various treasured belongings out of his pockets. “I brought all this
stuff out to give you fellows. I sha’n’t want it. I’m going down to
Martin Brothers’ Department Store and get a job. That’s what I’m
going to do. Here’s my looking glass, Sam. Every time you cast a
shadow with it, think of me. And you can have my marbles, Bob.”
Teddy distributed his belongings rapidly about the little circle. The
boys took them with some reluctance. They had far rather have
Teddy Burke, ringleader of all their mischief, with them than his
belongings.
“Aw, why don’t you get your mother to come down here and fix it
up with those old cranks?” demanded Sam Marvin regretfully. “It
ain’t your stuff we want, Ted. It’s you. What’re we goin’ to do
without you?”
“Be good,” grinned Teddy. “I’m a menace to the school, you
know.”
“I wish I was goin’ to work,” said Bob Rayburn sadly. “Pa won’t let
me, though.”
“Honestly, won’t your mother lick you if she finds out about what
happened to-day?” inquired Arthur Post, a tall, thin boy with a
solemn face.
“Lick nothing,” retorted Ted. “She isn’t going to find out about it.
I’m going to tell her myself. She’ll say I can go to work if I feel like
it.”
His chums eyed him with mingled admiration and regret. To them
Teddy was a hero.
“There goes the bell. I’ve got to beat it. Don’t any of you start to
go in till I get to the corner,” directed Ted. “Then she,” he jerked his
thumb in Miss Alton’s direction, “won’t know I’ve skipped until it’s
too late. I’ll let you know where I am as soon as I get that job.
Good-bye, fellows. Be sure and do what smarty Alton tells you, and
don’t go bringing any rubber snakes to school. You can have that
one of mine if you can get it away from old Cross-patch.”
With an air of gay bravado Teddy raised his hand in a kind of
parting salute, then darted down the yard and through the gateway
to the street. At the corner he waved his hand again, then swung
out of sight, leaving a little knot of boys to gaze regretfully after him
and wonder how they could possibly get along without wide-awake,
mischievous Teddy Burke.
CHAPTER II
ON THE TRAIL OF A JOB
“I don’t know what we are going to do, Harry, if the cost of living
goes any higher.” Mrs. Harding stared across the little center
table at her sixteen-year-old son, an expression of deep worry
looking out of her patient, brown eyes. “A dollar used to seem like
quite a lot of money, but it doesn’t go far these days. I’ve spent
every cent I dare this week for groceries, and we’ve still three days
to go until I’ll have the money for this dress. I’ve got to sew every
minute to get it done. Thank goodness, the rent’s paid for this
month. But you must have a new pair of shoes and I don’t know
where they are going to come from.” The little woman sighed, then
attacked her sewing with fresh energy. “I can’t stop even to
complain,” she added bravely.
“You’ll just have to let me go to work, Mother.” Harry Harding laid
the text-book he was studying on the table and regarded his mother
with serious eyes.
“But I don’t want to take you out of school, Harry,” she protested.
“You are getting along so well. Why, next year you’ll be in high
school.”
“No, I won’t, Mother. Do you think that a great big boy like me is
going to let his mother support him any longer? It’s time I went to
work. Besides, I haven’t the money for clothes and books and all the
other things high school fellows have to have. I’m past sixteen. Lots
of boys have to go to work when they’re only fourteen. I guess it
won’t hurt me any to begin now.”
“But I want you to have an education, Harry. If your father had
lived, he intended to let you go through high school and then to
college.” Mrs. Harding’s voice trembled a little. The sudden death of
her husband two years previous had been a shock from which she
had never quite recovered. It was hard for her even to mention his
name without shedding tears.
“I’ll get an education, somehow, and work, too,” returned Harry
confidently. “There are night schools where a fellow can go and
learn things. Please let me quit school to-morrow and try,” he
pleaded. “I can’t earn much at first, but even three dollars a week’ll
help some. I’ve got to start some time, you know. If you won’t let
me go to work I could sell papers after school.”
“No, you couldn’t,” retorted his mother with decision. “I’d rather
have you leave school than see you racing around the city streets
selling papers. That’s one thing you sha’n’t do.”
“Then let me go and hunt a job,” begged the boy.
“I’ll think it over. Now go on studying your lesson and don’t tease
me any more about it.”
Harry took up his book obediently enough. His frequent pleading
to leave school to go to work had always been promptly vetoed by
his mother. She had struggled desperately to keep her son in school
and was willing to go on with the struggle. It was Harry himself who
had repeatedly begged her to allow him to take his place in the
work-a-day world. She could never quite bring herself to the point of
consenting to the boy’s plea. But, to-night, as she thought darkly of
their poverty and of their continual fight against actual want she was
nearer consent than she had ever been before.
Perhaps Harry felt this, for it was not long until the book went
down on the table again. “Do say you’ll let me try, Mother,” he
implored earnestly. “You don’t know how much it means to me. It
isn’t as if I’d stop trying to learn things as soon as I started to work.
I’d study harder than ever. Just think how much the money would
help us after I’d been working awhile. Why, some of the greatest
men that ever lived had to quit school and go to work when they
were lots younger than I. Benjamin Franklin did, and so did Abraham
Lincoln. Just yesterday the teacher read us a story of how Lincoln
earned his first dollar when he was a boy.”
Mrs. Harding looked wistfully at her son’s eager face. “My little
son, do you want to help mother so much?” she asked tenderly. Her
voice trembled a little.
“You know I do. Oh, Mother, may I try? Are you going to say ‘yes’
at last?” Harry sprang from his chair and going to his mother’s chair
slipped his arm around her neck.
“Well,” began the little woman reluctantly, “if you are so set on
working, I guess you might as well try it. But remember, Harry, if
you don’t like it, you can go back to school. We’ll get along some
way.”
“But I shall like it,” protested Harry. “I’ve always said I was going
to be a business man when I grew up. If I start right now maybe I’ll
be one in a few years.”
“But where are you going to look for work, child?” asked Mrs.
Harding. Now that she had given her son the longed-for sanction to
make his own way, she began to feel something of his boyish
enthusiasm.
“I don’t know,” returned Harry thoughtfully. Then, seized with a
sudden inspiration, “I guess I’ll look in the Journal. That always has
a lot of advertisements.”
Picking up the evening paper, which lay on the center table, Harry
turned its leaves to the column of “Male Help Wanted,” and scanned
it earnestly. “Here’s one, Mother. ‘Boy wanted for errands, good
chance for advancement. Opportunity to learn business. 894 Tyler.’
That sounds good.” Taking the stub of a lead pencil from his pocket,
Harry carefully marked it. “Oh, here’s another. ‘Bright boy for office
work. 1684 Cameron.’” This advertisement was duly checked. Harry
went eagerly down the column until he had marked six
advertisements. “There, that will do to start with. If I don’t get a
position at any of those places I’ll try again when to-morrow’s paper
comes out. But surely some of them will have a chance for me. It’s
nine o’clock. I guess I’ll go to bed right now, so as to be up bright
and early in the morning.”
Piling his books on one arm, Harry went over to his mother and
kissed her good night. “You must keep thinking hard that I’m going
to get one of those positions, Mother,” he said brightly. Then he went
into the tiny room that was really half of his mother’s room,
curtained off for his use. Harry was very proud of his little room. It
was so small it held nothing but his cot bed, one chair, a small table
and a bamboo book-case of two shelves, which he had bought in a
second-hand store for a quarter. This held the few books he owned
and was dear to his heart.
After he had undressed and lain down on his bed he found that he
was too much excited over the prospect of his new venture to sleep.
Already he could see himself in a beautiful office, with soft rugs on
the floor and shining oak furniture. He could imagine himself saying,
“Yes, sir,” and “no, sir,” to his employer, and listening with alert
respectfulness to his orders. He would prove himself so willing to
work and perform whatever he was given to do so faithfully that in
time he would be promoted to something better. His favorite story-
book hero, Dick Reynolds, had begun work as an office boy and had
done wonderful things. Why couldn’t the same things happen again
to him?
When at ten o’clock his mother stole into the room, as was her
nightly custom before going to bed, for a last look at her son, she
saw two bright, wide-awake eyes peering at her. “This will never do,
little man,” she said, patting his cheek. “You must go to sleep, if you
are anxious to be up early to-morrow morning.”
“I’ll try, Mother,” sighed Harry, “but I just can’t help thinking about
it.”
After his mother had kissed him again and gone to her own room,
Harry shut his eyes tightly and resolved to go to sleep. When finally
the sandman did visit him, he dreamed that he was Dick Reynolds
and had secured a position in a bank. He was the president’s office
boy, and the president had sent him to the City Hall with a bag full
of bank notes. He ran all the way from the bank to the Hall and was
just going in the door when two boys leaped out from behind it and
tried to take the bag away from him. He fought like a tiger, but he
had to hang on to the bag with one hand while he knocked down
the thieves with the other. As fast as he knocked them down they
bobbed up again. Finally, one of them hit him over the head with an
arithmetic. It was his own book. He recognized it by the green paper
cover he had put on it. He wondered as he fought how the boy
happened to have his arithmetic. Then the other boy suddenly took
a long coil of rope from under his coat and lassoed him. He felt
himself falling, falling. He struck the pavement with a terrible crash.
Then——
“Why, Harry, what is the matter?” The City Hall, the money bag,
even the robbers had faded away, and Harry found himself sitting on
the bare floor, blinking up at his mother, who bent anxiously over
him.
“I guess I must have been asleep, Mother, and fell out of bed.”
Harry eyed his mother sheepishly. “I dreamed I had a job in a bank
and was fighting two fellows who tried to take a whole lot of money
away from me. What time is it?”
“It’s ten minutes to twelve. Now, go straight to sleep, or I won’t
call you early.”
Harry obediently climbed back into bed and was not heard from
again that night. It seemed to him as though he had hardly gone to
sleep before he heard his mother calling, “Six o’clock, Harry.” The
boy was out of bed in an instant. He pattered to the window,
rubbing the sleep out of his eyes as he went. The light of a perfect
day in early October shone in as he raised the shade. If good
weather were a happy omen, then surely he would obtain that which
he was going forth so earnestly to seek.
His mother had taken special pains with his breakfast that
morning, and though he was quivering with excitement over what
was to be his first venture into the busy world of trade, he tried to
show his appreciation of her tender thoughtfulness by eating a
hearty meal. In his neat, blue serge suit (he had put on his Sunday
best), his well-shined shoes and his clean, white shirt with its
immaculate collar, he was above reproach as far as attire went, and
his bright, boyish face with its clear, blue eyes and clean-cut,
resolute mouth made him a boy to be proud of. So his mother
thought as she looked approvingly at him across the table. She
stifled the sigh of regret that her boy must so early take his place
among the bread-winners, and listened to his eager plan of what he
intended to do with an encouraging smile.
“Well, Mother, I’m off. That was a dandy breakfast. You know what
I like, don’t you? I wish all the boys in the world had mothers like
you. I don’t know when I’ll be back. If I don’t come home all day,
you’ll know I’m working.” Reaching to the nail where he always hung
his cap, Harry stood for an instant with it in his hand. Then he kissed
his mother and went manfully down the two flights of stairs to the
street.
He had clipped from the paper the section of the want column
with the advertisements he had marked. Now he studied it earnestly
and set out for the Tyler Street address. It was at least fifteen
squares from his home, but the clock on a nearby church had just
chimed out the hour of seven. In his pocket reposed twenty cents in
small change. He had earned it by doing errands after school. But he
made up his mind that not a penny of it should go for carfare if he
could help it. He had plenty of time to walk. He would very likely
reach the place he had selected for his first call before the office was
open. He wondered what sort of building it would be, and whether it
was an office building or a factory. More than one person glanced in
friendly fashion at the erect, manly lad as he hurried along. There
was something in his earnest young face that commanded attention
and instant approbation.
“There it is,” he murmured as, after a half-hour’s brisk walk, he
came opposite a tall, rather dingy-looking brick building. “That must
be the office over there where the sign is hanging out.”
Hurrying across the street the boy approached the door over
which hung the sign, “The Knickerbocker Worsted Mills.” He read it
aloud, then looked a trifle disappointed. This did not exactly accord
with his ideas of a position. Then he laughed at his own mental
hesitation. “What do you care if it is a mill office, Harry Harding,” he
murmured. “It’s work you’re looking for, and you can’t expect to
have everything just the way you want it.”
Turning the knob on the door that bore a small sign of “Office,”
the boy opened it and stepped inside a long room that had the
shining oak furniture of his dreams. This room was divided off into
many compartments by little oak fences with swinging gates. Near
the door, at a little desk, sat a boy of about his own age. As he
stepped into the room the boy rose to meet him.
“Whada yuh want?” he asked superciliously.
“Good morning,” said Harry politely. “I came in answer to your
advertisement in the Journal for a boy. To whom do I go?”
“Yuh don’t go unless I let yuh in,” declared the boy ill-naturedly.
“Anyway, the position’s filled. The boss just hired a boy about ten
minutes ago. That’s him over there.” He pointed to a black-haired
lad, who had just emerged from a room adjoining the long office.
“That’s the kid. Yuh better beat it. Nothin’ doin’ around here.”
“Can’t I see the manager or—or—someone?” persisted Harry.
“Naw, yuh can’t. Think I wanta get my head snapped off by buttin’
in where Mr. Warner’s openin’ his mail? Guess I know my business.
Didn’t the boss just say, ‘Fred, if any more boys come here answerin’
our ad, tell ’em we’ve hired a boy?’ There’s nothin’ doin’, I tell yuh.
Can’t yuh understand that?”
“Yes, I can understand that,” retorted Harry with spirit. “What I
can’t understand is how a big firm like this happens to have such a
rude office boy. Good morning.”
Harry walked away, his cheeks burning, eyes snapping, leaving the
disagreeable boy to gaze after him in positive astonishment.
Once outside the office, Harry paused and taking out the section
of newspaper he had marked, scanned it earnestly. The next nearest
place he had selected was at least a mile and a half from where he
stood. It was twenty minutes to eight o’clock. “I guess I’d better
ride,” mused Harry. “The earlier I reach a place, the better my
chance will be to get something to do. I hope all the places won’t be
like that mill. Why, I didn’t have a chance to talk to a soul except
that smart office boy.”
When, at a few minutes after eight o’clock, Harry climbed the
steps of an imposing building of white stone, and was waved to a
door on the right by a uniformed attendant, he entered a good-sized
ante-room, only to find it filled with boys of anywhere from fourteen
to eighteen years of age. They were not making so much noise as
one might expect at least fifteen active boys to make, yet a distinct
buzz of conversation was going on.
Harry paused irresolutely. His eyes met those of a thin, red-haired,
black-eyed boy with a mischievous face who stood just to the right
of the door. The black-eyed boy grinned in friendly fashion. “Hullo,”
he said.
“Good-morning,” returned Harry, answering the grin with a
pleasant smile. “Are all these boys looking for the same position?”
“Yep,” nodded the black-eyed boy. “I guess the fellow that’s in the
office now is going to get it. He’s been there quite a while.”
He had hardly finished speaking when the door to the inner office
opened and a tall, severe-looking man appeared. “We won’t need
you, boys,” he said curtly. “The position is filled.” He waved his arm
as though to shoo the waiting throng of lads out of the ante-room,
then disappeared. The door closed after him with a reverberating
bang that shattered the hopes of the fifteen waiting youngsters.
“Huh,” ejaculated the black-eyed boy in disgust, “no more offices
like this for me. I’ve been to two before this, and every time I’m too
late. I guess these fellows that get the jobs get up in the middle of
the night. Me for Martin’s Department Store. That’s where I ought to
have gone in the first place.”
“Do they need boys there?” asked Harry. He had walked beside his
new acquaintance as far as the door. Here they paused. The
attendant eyed them threateningly.
“I hope so. Come on. Let’s get out of here. That man in the
uniform will hurt his eyes tryin’ to look a hole through us.” The thin
little boy urged Harry out of the building and down the steps to the
street. “Say, what’s your name?” he asked curiously.
“Harry Harding. What is yours?”
“My name’s Theodore Burke, but everybody calls me Ted or Teddy,
and I just quit school to find a job.”
“I haven’t quit yet,” declared Harry, “but I’m going to as soon as I
find work.”
“Then you didn’t get fired?”
“Oh, no. I am going to work to help my mother. I am obliged to
find work.”
“I had a fight with the teacher,” related Teddy, with unabashed
candor. “She said I was a menace to the West Park School, and she
was going to have me put in a school for tough kids. So I gave the
fellows my stuff and beat it at recess. Ma was mad, but she got over
it right away and said I could go to work if I wanted to.”
“The teacher couldn’t put you in a school for tough boys, unless
you did something pretty bad,” informed Harry.
“I put a rubber snake in a girl’s seat,” confessed Ted, “and she
hollered like anything.” His black eyes twinkled.
Harry laughed. “Nobody could put you in a reform school for that,”
he said wisely. “The teacher was trying to scare you. I guess you’re
just full of mischief, that’s all.”


Big Data In Complex And Social Networks My T Thai Weili Wu Hui Xiong

  • 6. BIG DATA IN COMPLEX AND SOCIAL NETWORKS
  • 7. Chapman & Hall/CRC Big Data Series
    SERIES EDITOR: Sanjay Ranka
    AIMS AND SCOPE: This series aims to present new research and applications in Big Data, along with the computational tools and techniques currently in development. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of social networks, sensor networks, data-centric computing, astronomy, genomics, medical data analytics, large-scale e-commerce, and other relevant topics that may be proposed by potential contributors.
    PUBLISHED TITLES:
    BIG DATA COMPUTING: A GUIDE FOR BUSINESS AND TECHNOLOGY MANAGERS, Vivek Kale
    BIG DATA IN COMPLEX AND SOCIAL NETWORKS, My T. Thai, Weili Wu, and Hui Xiong
    BIG DATA OF COMPLEX NETWORKS, Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, and Andreas Holzinger
    BIG DATA: ALGORITHMS, ANALYTICS, AND APPLICATIONS, Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
    NETWORKING FOR BIG DATA, Shui Yu, Xiaodong Lin, Jelena Mišić, and Xuemin (Sherman) Shen
  • 8. BIG DATA IN COMPLEX AND SOCIAL NETWORKS EDITED BY My T. Thai University of Florida, USA Weili Wu University of Texas at Dallas, USA Hui Xiong Rutgers, The State University of New Jersey, USA
  • 9. CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
    © 2017 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, an Informa business. No claim to original U.S. Government works. Printed on acid-free paper. Version Date: 20161014. International Standard Book Number-13: 978-1-4987-2684-9 (Hardback)
    This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
    Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
    For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
    Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
  • 10. Contents
    Preface
    Editors
    Section I: Social Networks and Complex Networks
    Chapter 1: Hyperbolic Big Data Analytics within Complex and Social Networks. Eleni Stai, Vasileios Karyotis, Georgios Katsinis, Eirini Eleni Tsiropoulou and Symeon Papavassiliou
    Chapter 2: Scalable Query and Analysis for Social Networks. Tak-Lon (Stephen) Wu, Bingjing Zhang, Clayton Davis, Emilio Ferrara, Alessandro Flammini, Filippo Menczer and Judy Qiu
    Section II: Big Data and Web Intelligence
    Chapter 3: Predicting Content Popularity in Social Networks. Yan Yan, Ruibo Zhou, Xiaofeng Gao and Guihai Chen
    Chapter 4: Mining User Behaviors in Large Social Networks. Meng Jiang and Peng Cui
    Section III: Security and Privacy Issues of Social Networks
    Chapter 5: Mining Misinformation in Social Media. Liang Wu, Fred Morstatter, Xia Hu and Huan Liu
  • 11. Chapter 6: Rumor Spreading and Detection in Online Social Networks. Wen Xu and Weili Wu
    Section IV: Applications
    Chapter 7: A Survey on Multilayer Networks and the Applications. Huiyuan Zhang, Huiling Zhang and My T. Thai
    Chapter 8: Exploring Legislative Networks in a Multiparty System. Jose Manuel Magallanes
    Index
  • 12. Preface
    In the past decades, the world has witnessed the blossoming of online social networks, such as Facebook and Twitter. This has revolutionized human interaction and drastically changed the landscape of information sharing in cyberspace. Along with the explosive growth of social networks, huge volumes of data have been generated. Research on big data, the term for these large datasets, gives insight into many domains, especially complex and social network applications.
    In big data research, the management and analysis of large-scale datasets are quite challenging because the collected data are highly unstructured. The large size of social networks, spatio-temporal effects and the interactions between users are among the challenges in uncovering behavioral mechanisms. Many recent research projects process and analyze data from social networks in an attempt to better understand complex networks, which motivated us to prepare in-depth material on recent advances in the areas of big data and social networks.
    This handbook presents recent developments in the theoretical, algorithmic and application aspects of big data in complex and social networks. The handbook consists of four parts, covering a wide range of topics. The first part focuses on data storage and data processing. Efficient storage of data fundamentally supports intensive data access and queries, which enables sophisticated analysis. Data processing and visualization help to communicate information clearly and efficiently. The second part of this handbook is devoted to the extraction of essential information and the prediction of web content. By performing big data analysis, we can better understand the interests, location and search history of users and predict users’ behaviors more accurately. Part 3 focuses on the protection of privacy and security. Modern social media enables people to share and seek information effectively, but also provides effective channels for the propagation of rumors and misinformation. It is therefore important to model rumor diffusion, identify misinformation in massive data and design intervention strategies. Finally, Part 4 discusses emerging applications of big data and social networks, focusing in particular on multilayer networks and multiparty systems.
    We would like to take this opportunity to thank all authors, the anonymous referees, and Taylor & Francis Group for helping us to finalize this handbook. Our thanks also go to our students for their help during the processing of all contributions. Finally, we hope that this handbook will encourage research on the many intriguing open questions and applications that still remain in the area of big data and social networks.
    My T. Thai, Weili Wu, Hui Xiong
  • 14. Editors My T. Thai is a professor and associate chair for research in the department of computer and information sciences and engineering at the University of Florida. She received her PhD degree in computer science from the Univer- sity of Minnesota in 2005. Her current research interests include algorithms, cybersecurity and optimization on network science and engineering, including communication networks, smart grids, social networks and their interdepen- dency. The results of her work have led to 5 books and 120+ articles published in various prestigious journals and conferences on networking and combina- torics. Dr. Thai has engaged in many professional activities. She has been a TPC- chair for many IEEE conferences, has served as an associate editor for Journal of Combinatorial Optimization (JOCO), Optimization Letters, Journal of Dis- crete Mathematics, IEEE Transactions on Parallel and Distributed Systems, and a series editor of Springer Briefs in Optimization. Recently, she has co- founded and is co-Editor-in-Chief of Computational Social Networks journal. She has received many research awards including a UF Research Foundation Fellowship, UF Provosts Excellence Award for Assistant Professors, a Depart- ment of Defense (DoD) Young Investigator Award, and an NSF (National Science Foundation) CAREER Award. Weili Wu is a full professor in the department of computer science, Univer- sity of Texas at Dallas. She received her PhD in 2002 and MS in 1998 from the department of computer science, University of Minnesota, Twin City. She received her BS in 1989 in mechanical engineering from Liaoning University of Engineering and Technology in China. From 1989 to 1991, she was a mechani- cal engineer at Chinese Academy of Mine Science and Technology. She was an associate researcher and associate chief engineer in Chinese Academy of Mine Science and Technology from 1991 to 1993. 
Her current research mainly deals with the general research area of data communication and data management. Her research focuses on the design and analysis of algorithms for optimiza- tion problems that occur in wireless networking environments and various database systems. She has published more than 200 research papers in vari- ous prestigious journals and conferences such as IEEE Transaction on Knowl- edge and Data Engineering (TKDE), IEEE Transactions on Mobile Comput- ing (TMC), IEEE Transactions on Multimedia (TMM), ACM Transactions on Sensor Networks (TOSN), IEEE Transactions on Parallel and Distributed ix
  • 15. x Editors Systems (TPDS), IEEE/ACM Transactions on Networking (TON), Journal of Global Optimization (JGO), Journal of Optical Communications and Net- working (JOCN), Optimization Letters (OPTL), IEEE Communications Let- ters (ICL), Journal of Parallel and Distributed Computing (JPDC), Journal of Computational Biology (JCB), Discrete Mathematics (DM), Social Network Analysis and Mining (SNAM), Discrete Applied Mathematics (DAM), IEEE INFOCOM (The Conference on Computer Communications), ACM SIGKDD (International Conference on Knowledge Discovery Data Mining), Interna- tional Conference on Distributed Computing Systems (ICDCS), International Conference on Database and Expert Systems Applications (DEXA), SIAM Conference on Data Mining, as well as many others. Dr. Wu is associate edi- tor of SOP Transactions on Wireless Communications (STOWC), Computa- tional Social Networks, Springer and International Journal of Bioinformatics Research and Applications (IJBRA). Dr. Wu is a senior member of IEEE. Hui Xiong is currently a full professor of management science and informa- tion systems at Rutgers Business School and the director of Rutgers Center for Information Assurance at Rutgers, the State University of New Jersey, where he received a two-year early promotion/tenure (2009), the Rutgers Univer- sity Board of Trustees Research Fellowship for Scholarly Excellence (2009), and the ICDM-2011 Best Research Paper Award (2011). Dr. Xiong is a prominent researcher in the areas of business intelligence, data mining, big data, and geographic information systems (GIS). For his out- standing contributions to these areas, he was elected an ACM Distinguished Scientist. He has a distinguished academic record that includes 200+ referred papers and an authoritative Encyclopedia of GIS (Springer, 2008). 
He is serving on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Transactions on Management Information Systems (TMIS) and IEEE Transactions on Big Data. Also, he served as a program co-chair of the Industrial and Government Track for the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), a program co-chair for the IEEE 2013 International Conference on Data Mining (ICDM-2013), and a general co-chair for the IEEE 2015 International Conference on Data Mining (ICDM-2015).
I Social Networks and Complex Networks
CHAPTER 1

A Hyperbolic Big Data Analytics Framework within Complex and Social Networks

Eleni Stai, Vasileios Karyotis, Georgios Katsinis, Eirini Eleni Tsiropoulou and Symeon Papavassiliou

CONTENTS
1.1 Introduction
    1.1.1 Scope and Objectives
    1.1.2 Outline
1.2 Big Data and Network Science
    1.2.1 Complex Networks, Big Data and the Big Data Chain
    1.2.2 Big Data Challenges and Complex Networks
1.3 Big Data Analytics based on Hyperbolic Space
    1.3.1 Fundamentals of Hyperbolic Geometric Space
1.4 Data Correlations and Dimensionality Reduction in Hyperbolic Space
    1.4.1 Example
1.5 Embedding of Networked Data in Hyperbolic Space and Applications
    1.5.1 Rigel Embedding in the Hyperboloid Model
    1.5.2 HyperMap Embedding
1.6 Greedy Routing over Hyperbolic Coordinates and Applications within Complex and Social Networks
1.7 Optimization Techniques over Hyperbolic Space for Decision-Making in Big Data
    1.7.1 The Case of Advertisement Allocation over Online Social Networks
    1.7.2 The Case of File Allocation Optimization in Wireless Cellular Networks
1.8 Visualization Analytics in Hyperbolic Space
    1.8.1 Adaptive Focus in Hyperbolic Space
    1.8.2 Hierarchical (Tree) Graphs
    1.8.3 General Graphs
1.9 Conclusions
Acknowledgment
Further Reading

Data management and analysis have stimulated paradigm shifts in decision-making in various application domains. In particular, the emergence of big data alongside complex and social networks has stretched the imposed requirements to the limit, while promising numerous and crucial benefits. In this chapter, based on a novel approach for big data analytics (BDA), we focus on data processing and visualization and their relations with complex network analysis. Thus, we adopt a holistic perspective with respect to complex/social networks that generate massive data and the relevant analytics techniques, which jointly impact societal operations, e.g., marketing, advertising, resource allocation, etc., closing a loop between data generation and exploitation within complex networks themselves. Recent literature shows a strong relation between hyperbolic geometry and complex networks, as the latter exhibit a hidden hyperbolic structure.
Inspired by this fact, the methodology adopted in this chapter leverages key properties of the hyperbolic metric space for complex and social networks, exploited in a general framework that includes processes for data correlation/clustering, inference of missing data (e.g., links), efficient computation of social network analysis metrics, optimization, resource (advertisements, files, etc.) allocation and visualization analytics. More specifically, the proposed framework consists of the above hyperbolic geometry based processes/components, arranged in a chain form. Some of those components can also be applied independently, and potentially combined with other traditional statistical learning techniques. We emphasize the efficiency of each process in the complex networks domain, while also pinpointing open and interesting research directions.

1.1 INTRODUCTION

Data processing and analysis were among the main drivers for the proliferation of computers (processing) and communications networks (analysis and transfer). Lately, however, a paradigm shift has been witnessed where networks themselves, e.g., social networks and sensor networks, can create data as well, and, in fact, in massive quantities. Indeed, gigantic datasets are produced on purpose or spontaneously, and stored by traditional and new applications/services. Characteristic examples include the envisaged Internet of Things (IoT) paradigm [1], where pervasive sensors and actuators for almost every aspect of human activity will collect, process and make decisions on massive data, e.g., for surveillance, healthcare, etc. Similarly, the Internet, mobile networks, and overlaying (social) networks, e.g., Google, Facebook and others described in [2], [3], are responsible for the explosion of produced and transferred data. Collecting, processing and analyzing these data, generated at unprecedented rates, has lately attracted significant research, technological and financial interest, in a broader framework popularly known as “big data analytics” (BDA) [2]. The current setting is only expected to intensify in the future, since the expanding complex and social networks are expected to generate even more massive amounts of complexly inter-related information and impose harsher data storage, processing, analysis and visualization requirements.

1.1.1 Scope and Objectives

Given the aforementioned setting and the fact that significant research and technological progress has taken place regarding the lower level aspects, e.g., storage and processing, this chapter focuses more on aspects of data analytics. It aspires to provide a framework for combining traditional methodologies (e.g., statistical learning) with novel techniques (e.g., communications theory), providing holistic and efficient solutions.

More specifically, we adopt a radical perspective for performing data analytics, advocating the use of cross-discipline mathematical tools, and in particular exploiting properties of hyperbolic space [4], [5].
We postulate that hyperbolic metric spaces can provide the substrate required in data analytics for keeping up with the pace of data volume explosion and required processing. The main goal is to briefly describe a holistic framework for data representation, analysis (e.g., correlation, clustering, prediction), visualization, and decision making in complex and social networks, based on the principles of hyperbolic geometry and its properties. Then, the chapter will touch on several key BDA aspects, i.e., data correlation, dimensionality reduction, data and networks’ embeddings, navigation, social networks analysis (SNA) metrics’ computation and optimization, and show how they are accommodated by the above framework, along with the associated benefits achieved. The chapter will also explain the salient characteristics of these approaches related to the features and properties of complex and social networks of interest generating massive datasets of diverse types. Finally, throughout the chapter, we highlight the key directions that will be of great potential interest in the future.
1.1.2 Outline

The rest of this chapter is organized as follows. In Section 1.2 the relation between complex networks and big data processes, along with their emerging challenges, is presented, while in Section 1.3 the proposed hyperbolic geometry based approach is introduced and analyzed. Section 1.4 describes how to perform data correlation and dimensionality reduction over hyperbolic space. In Section 1.5 several types of data embeddings on hyperbolic space are studied, along with their properties, especially those related to complex networks. In Section 1.6, we examine the navigability of complex networks embedded in hyperbolic space via greedy routing techniques. In Section 1.7 optimization methodologies over large complex and social network graphs using hyperbolic space are described, while applications on advertisement and file allocation problems are pinpointed. In Section 1.8, visualization techniques based on hyperbolic space and their properties/advantages versus Euclidean based ones are surveyed. Finally, Section 1.9 concludes the chapter.

1.2 BIG DATA AND NETWORK SCIENCE

1.2.1 Complex Networks, Big Data and the Big Data Chain

Diverse types of complex and social networks are nowadays responsible for both massive data generation and transfers. The corresponding research and technological progress has been cumulatively addressed under the Network Science/Complex Network Analysis (CNA) domain [6].

It has been observed that several types of networks demonstrate similar, or identical, behaviors. For example, modern societies are nowadays characterized as connected, inter-connected and inter-dependent via various network structures. Communication and social networks have been co-evolving in the last decade into a complex hierarchical system, which asymmetrically expands in time, as shown in Figure 1.1. The interconnecting physical layer expands orders of magnitude faster than the overlaying social one.
This leads to the generation of massive quantities of data from both layers, for different purposes, e.g., data transferred in the lower layer, control and peer data at the higher one, etc., at unprecedented rates compared to the past. This form of “social IoT” (s-IoT) [7] is tightly related to the big data setting, as storage, analysis and inference over gigantic datasets impose stringent resource requirements and are tightly inter-related with the structure and operation of the complex and social networks involved. Various forms of BDA are applied nowadays in diverse disciplines, e.g., banking, retail chains/shopping, healthcare, insurance, public utilities, SNA, etc., where diverse complex networks produce and transfer data.

FIGURE 1.1 Communication (complex) – social network co-evolution.

Computers have revolutionized the whole process chain of data analytics, allowing automation in a supervised manner. Nowadays, such a chain is part of a broader BDA pipeline that includes collection, correlation, management, search and retrieval, and visualization of data and analysis results, at unprecedented scales compared to the past [2]. More specifically, the BDA pipeline consists of data generation, acquisition, storage, analysis, visualization and interpretation processes.

Data generation involves creating data from multiple, diverse and distributed sources including sensors, video, click streams, etc. Data acquisition refers to obtaining information and it is subdivided into data collection, data transmission, and data pre-processing. The first refers to retrieving raw data from real-world objects, the second to a transmission process from data sources to appropriate storage systems, and the third to all those techniques that may be needed prior to the main analysis stage, e.g., data integration, cleansing, transformation and reduction. Data integration aims at combining data residing in different sources and providing a unified data perspective. Data cleansing refers to determining inaccurate, incomplete, or unreasonable data and amending or removing (transforming) these data to improve data quality. Data reduction aims at decreasing the degree of redundancy of available data, which would otherwise increase data transmission overhead and storage costs and could lead to data inconsistency, reduced reliability and data corruption.

Analysis is the main stage of the BDA pipeline and can take multiple forms. The goal is to extract useful values, suggest conclusions and/or support decision-making. It can be descriptive, predictive and prescriptive. It may use data visualization techniques, statistical analysis or data mining techniques in order to fulfill its goals and interpret the results. All the pre-analytics, analytics and post-analytics stages (i.e., visualization and interpretation) of BDA described above can only become more diverse and more informative within the complex and social network ecosystems considered in this chapter. Thus, even though BDA is characterized by the four V’s, namely Volume (of data), Velocity (generation speed), Veracity (quality) and Variety (heterogeneity), the above settings create a new “V” feature for BDA, namely Value, rendering data essentially a new and in fact “expensive” commodity for our information societies.

1.2.2 Big Data Challenges and Complex Networks

Several challenges emerge due to the fact that big data carry special characteristics, e.g., heterogeneity, spurious correlations, incidental endogeneities, noise accumulation, etc. [2], which become even more intense within the complex/social network environment. Challenges related to BDA can be divided into challenges related to data and challenges related to processes of the BDA pipeline. Table 1.1 summarizes these two types of challenges.

Data-related challenges correspond to the four “V’s” of BDA with the addition of privacy, which relates more to personal data protection. The first two deal with storage and timeliness issues emerging from the explosion of data generated/collected, and the following two with the reliability and heterogeneity of data due to multiple sources and types of data.
Additional challenges emerging with respect to the big data pipeline deal with the data collection and transferring requirements imposed, the pre-processing and analysis of data with respect to the associated complexity, accurate and distributed computation, the accumulated noise, as well as other peripheral issues, such as data and results visualization, interpretation of results and issues related to cloud storage, computing and services in general.

TABLE 1.1 Big Data Challenges

Data-Related    BDA Pipeline-Related
------------    ---------------------------------
Volume          Collection, Transferring
Velocity        Pre-processing, Analysis
Veracity        Complexity, Distributed operation
Variety         Accuracy, Noise, Visualization
Privacy         Cloud computing, Interpretation
1.3 BIG DATA ANALYTICS BASED ON HYPERBOLIC SPACE

The aforementioned challenges will require radical approaches for efficiently tackling the emerging problems and keeping up with the anticipated explosion of produced data. In this chapter, we describe a methodology that is capable of addressing the above challenges holistically and of providing impetus for more efficient analytics in the future. The framework is conceptually shown in Figures 1.2 and 1.3 and it is mainly based on the properties of hyperbolic metric spaces (a brief summary of which is included in Section 1.3.1). This approach provides a generic computational substrate for data representation, analysis (e.g., correlation and clustering), inference, visualization, search and navigation, and decision-making (via, e.g., optimization). The proposed framework builds on primitive pre-processing operations of traditional BDA techniques, e.g., statistical learning, and further complements them in terms of analytics and interpretation/visualization to allow more scalable, powerful and efficient inference and decision-making.

Figure 1.2 shows the observed evolution of data volumes until today, where nowadays more than big, i.e., “hyperbolic”, data require processing. The proposed framework suggests a lean approach for tackling such scaling. Input data may take either raw or networked form, where the latter corresponds to correlated data (nodes) and their correlations/relations (links between nodes) drawn from combinations of complex/social networks.
Their analysis leads to sophisticated decision-making for challenging problems over large data sets, e.g., resource allocation and optimization, thus eventually having an impact on the networks themselves, closing the loop of an evolutionary bond between networks (humans, IoT), data and machines (analytics) (Figure 1.2).

FIGURE 1.2 Evolution of data volume (from data to “hyperbolic” data), proposed framework’s functionalities and interaction with complex and social networks. [Diagram blocks: Data collectors/owners; Normal Data; Big Data; Hyperbolic Data; Embedding on Hyperbolic space; Data Correlations, Clustering, Network Creation; Data Visualization; Search, Navigation, Efficient Computations; Decision Making/Optimization.]

FIGURE 1.3 The workflow of the proposed hyperbolic geometry based approach for BDA over complex and social networks. [Diagram blocks: Big Data; (Hyperbolic) Dimensionality Reduction; (Hyperbolic) Correlations, Network Estimation; Hyperbolic Embedding; Inference, Clustering, Search, Navigation, SNA Metrics Computations; Hyperbolic Resource Allocation Optimization; Hyperbolic Visualization Analytics.]

The role of the term “hyperbolic” in the proposed approach is twofold. On one hand, it indicates the passage from “big data” to even more, i.e., “hyperbolic data”, denoting the tendency of growth of the available data to be handled and analyzed in the future. On the other hand, it emphasizes the benefit of the use of hyperbolic geometry for BDA. The core of this approach is the fact that, as shown in the literature, networks of arbitrarily large size can be embedded in low-dimensional (even as small as two) hyperbolic spaces without sacrificing important information as far as network communication (e.g., routing) and structure (e.g., scale-free properties [50]) are concerned [8], [9], [5]. Thus, hyperbolic spaces are congruent with complex network topologies and are much more appropriate for representing and analyzing big data than Euclidean spaces.

The specific workflow of the proposed framework is shown in Figure 1.3. It starts with obtaining data and determining a suitable data representation model. Input (big) data from complex and social networks might be in raw (e.g., list) form, or in the form of a data network representing their correlations.
Pre-processing of data follows, consisting of dimensionality reduction, correlations and generation of networks over data, which may be performed either with traditional techniques or using the properties of hyperbolic geometry. The data representation after pre-processing (e.g., network or raw form) will either lead to or determine the appropriate methodology for the subsequent data embedding into the hyperbolic geometric space (the subject of Section 1.5). Data embedding is the assignment of coordinates to network nodes in the hyperbolic metric space. Properly visualizing the accumulated and inferred data following the analysis is of significant importance. The proposed framework will leverage flexible (systolic) hyperbolic geometry based mechanisms for data visualization, in order to allow a holistic and simultaneously focused view of the data and more informed decision-making. This is capable of providing visualization tools that simultaneously capture global patterns and structural information, e.g., hierarchy, node centrality/importance, etc., and local characteristics, e.g., similarities, in an efficient and systolic manner, which hides/reveals detail in a scalable way when this is required by the decision-making. The latter approach can be very useful in applications and studies of CNA/SNA.

In this chapter, we also describe techniques for extracting useful information from the data under processing and analysis for different application domains. Following and depending on the data embedding, further data correlation/clustering and inference may be attained, in which various forms of (possibly hierarchical) data communities/clusters will be built and missing data (e.g., links) will be predicted from the input data within the accuracy and time constraints imposed. Leveraging the hyperbolic distance function and greedy routing techniques, efficient computations of SNA metrics (such as centralities, the computation of which becomes hard over large data sets) will be studied and proposed.
The proposed framework also allows efficient optimization, suitable for large data sets, for the allocation of advertisements and of other resources, mainly of a discrete nature (e.g., file allocation over distributed cache memories in a 5G environment).

In the following, we first present some background on hyperbolic space and then present the proposed framework in more detail. Subsequently, we describe in more detail techniques enabled by the framework for performing and exploiting the analytics over the embedded data.

1.3.1 Fundamentals of Hyperbolic Geometric Space

Non-Euclidean geometries, e.g., hyperbolic geometry [4], emerged by questioning and modifying the fifth (parallel) postulate of Euclidean geometry. According to the latter, given a line and a point that does not lie on it, there is exactly one line going through the given point that is parallel to the given line. As far as hyperbolic geometry is concerned, the parallel postulate changes as follows: given a line and a point that does not lie on it, there is more than one line going through the given point that is parallel to the given line.

The n-dimensional hyperbolic space, denoted as $\mathbb{H}^n$, is an n-dimensional Riemannian manifold with negative curvature $c$, which is most often considered constant and equal to $c = -1$. Several models of hyperbolic space exist, such as the Poincaré disk model, the Poincaré half-space model, the Hyperboloid model, the Klein model, etc. These models are isometric (footnote 1), i.e., any two of them can be related by a transformation which preserves all the geometrical properties (e.g., distance) of the space. We will describe in detail and use in our approach the Poincaré models (disk and half-space), which are mostly used in practical applications.

For instance, the Hyperboloid model realizes the $\mathbb{H}^n$ hyperbolic space as a hyperboloid in $\mathbb{R}^{n+1} = \{(x_0, \dots, x_n) \mid x_i \in \mathbb{R},\ i = 0, 1, \dots, n\}$ such that $x_0^2 - x_1^2 - \dots - x_n^2 = 1$, $x_0 > 0$. Hyperbolic spaces have a metric function (distance) that differs from the familiar Euclidean distance and also differs among the diverse models. In the case of the Hyperboloid model, for two points $x = (x_0, \dots, x_n)$, $y = (y_0, \dots, y_n)$, their hyperbolic distance is given by [4]:

$$\cosh d_H(x, y) = \sqrt{\left(1 + \|x\|^2\right)\left(1 + \|y\|^2\right)} - \langle x, y \rangle, \tag{1.1}$$

where $\|\cdot\|$ is the Euclidean norm and $\langle \cdot, \cdot \rangle$ represents the inner product, both taken over the spatial coordinates $(x_1, \dots, x_n)$, so that $x_0 = \sqrt{1 + \|x\|^2}$ on the hyperboloid.

The Hyperboloid model can be used to construct the Poincaré disk/ball model, where the latter is a perspective projection of the former viewed from $(x_0 = -1, x_1 = 0, \dots, x_n = 0)$, projecting the upper half hyperboloid onto an $\mathbb{R}^n$ unit ball centered at $x_0 = 0$. Specifically, focusing on two dimensions, the whole infinite hyperbolic plane can be represented inside the finite unit disk $\mathbb{D} = \{z \in \mathbb{C} : \|z\| < 1\}$ of the Euclidean space, which is the 2-dimensional Poincaré disk model. The hyperbolic distance function $d_{PD}(z_i, z_j)$, for two points $z_i, z_j$, in the Poincaré disk model is given by [4], [11]:

$$\cosh d_{PD}(z_i, z_j) = \frac{2\,\|z_i - z_j\|^2}{\left(1 - \|z_i\|^2\right)\left(1 - \|z_j\|^2\right)} + 1. \tag{1.2}$$

The Euclidean circle $\partial\mathbb{D} = \{z \in \mathbb{C} : \|z\| = 1\}$ is the boundary at infinity for the Poincaré disk model. In addition, in this model, the shortest hyperbolic path between two nodes is either a part of a diameter of $\mathbb{D}$, or a part of a Euclidean circle in $\mathbb{D}$ perpendicular to the boundary $\partial\mathbb{D}$, as illustrated in Figure 1.4(a).
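As an illustration of Equations (1.1) and (1.2), the following minimal Python sketch evaluates both distance functions in two dimensions and checks numerically that they agree when a disk point is lifted to the hyperboloid. The function names are hypothetical, and the lifting formula is the standard inverse of the perspective projection from $(x_0 = -1, 0, 0)$; it is an assumption in the sense that the chapter does not write it out.

```python
import math

def poincare_disk_distance(z1: complex, z2: complex) -> float:
    """Hyperbolic distance of Eq. (1.2) for points strictly inside the unit disk."""
    num = 2.0 * abs(z1 - z2) ** 2
    den = (1.0 - abs(z1) ** 2) * (1.0 - abs(z2) ** 2)
    return math.acosh(1.0 + num / den)

def disk_to_hyperboloid(z: complex) -> tuple:
    """Lift a disk point to the spatial hyperboloid coordinates (x1, x2).

    Assumed inverse of the projection viewed from (x0 = -1, 0, 0);
    x0 is implied by x0 = sqrt(1 + x1^2 + x2^2).
    """
    s = 1.0 - abs(z) ** 2
    return (2.0 * z.real / s, 2.0 * z.imag / s)

def hyperboloid_distance(x: tuple, y: tuple) -> float:
    """Hyperbolic distance of Eq. (1.1), with x and y given by spatial coordinates."""
    norm_x = sum(c * c for c in x)
    norm_y = sum(c * c for c in y)
    inner = sum(a * b for a, b in zip(x, y))
    arg = math.sqrt((1.0 + norm_x) * (1.0 + norm_y)) - inner
    # Guard against rounding slightly below 1 for (nearly) identical points.
    return math.acosh(max(1.0, arg))

if __name__ == "__main__":
    a, b = 0.3 + 0.4j, -0.5 + 0.1j
    print(poincare_disk_distance(a, b))
    print(hyperboloid_distance(disk_to_hyperboloid(a), disk_to_hyperboloid(b)))
```

The two printed values should coincide up to floating-point error, which is one concrete way to see that the models are isometric.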
Note that these shortest path curves differ from the chords that would be implied by the Euclidean metric.

Let us now consider the following map in two dimensions, $z = \frac{w - i}{1 - iw}$, where $z$, with $\|z\| < 1$, is a point expressed as a complex number on the Poincaré disk model and $i$ is the imaginary unit. Then $w$ is a point (complex number) on the Poincaré half-space model. This map sends $z = -i$ to $w = 0$, $z = 1$ to $w = 1$ and $z = i$ to $w = \infty$ (note that the extension to more dimensions is trivial).

According to the Poincaré half-space model of $\mathbb{H}^n$, every point is represented by a pair $(w_0, w)$ where $w_0 \in \mathbb{R}^+$ and $w \in \mathbb{R}^{n-1}$. The distance between two points $(w_0^1, w^1)$, $(w_0^2, w^2)$ on the Poincaré half-space model is defined as [12]:

$$\cosh d_{PH}\big((w_0^1, w^1), (w_0^2, w^2)\big) = 1 + \frac{(w_0^1 - w_0^2)^2 + \|w^1 - w^2\|^2}{2\, w_0^1 w_0^2}. \tag{1.3}$$

Figure 1.4(b) depicts indicative shortest path curves for the Poincaré half-space model, in the same way as Figure 1.4(a) does for the Poincaré disk model.

FIGURE 1.4 Poincaré disk (a) and half-space (b) models along with their shortest paths in two dimensions: part of a diameter of $\mathbb{D}$ or a part of a Euclidean circle in $\mathbb{D}$ perpendicular to the boundary $\partial\mathbb{D}$ for the disk model, and vertical lines and semicircles perpendicular to $\mathbb{R}$ for the half-space model. (c) shows the Voronoi tessellation of the Poincaré disk into hyperbolic triangles of equal area.

A remarkable advantage of hyperbolic space, regarding its application in BDA (see Sections 1.5 and 1.8), is its property of “exponential scaling” with respect to the radial coordinate. Specifically, the circumference $C$ and area $A$ of a circle of radius $r$ in the 2-dimensional (2D) Poincaré disk model are given by the following relations [46], [4], [8]:

$$C(r) = 2\pi \sinh(r), \qquad A(r) = 4\pi \sinh^2(r/2). \tag{1.4}$$

Therefore, for small radius $r$, e.g., around the center of the Poincaré disk, the hyperbolic space looks flat, while for larger $r$, both the circumference and the area grow exponentially with $r$. The exponential scaling with radius is illustrated in Figure 1.4(c), which shows a tessellation of the Poincaré disk into hyperbolic triangles of equal area. In the Euclidean visual representation of the triangulation, the triangles appear increasingly smaller the closer they are to the circumference. In the following, we describe the different components that make up the proposed framework, even though several parts can be combined and employed jointly.

Footnote 1: An isometry is a map between metric spaces that preserves distance [10].
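The disk to half-space map and the exponential scaling of Equation (1.4) can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, `disk_to_half_plane` is the algebraic inverse $w = (z + i)/(1 + iz)$ of the map given above, and in two dimensions the half-space pair $(w_0, w)$ is read off a complex number as (imaginary part, real part).

```python
import math

def poincare_disk_distance(z1: complex, z2: complex) -> float:
    """Eq. (1.2): distance between two points strictly inside the unit disk."""
    num = 2.0 * abs(z1 - z2) ** 2
    den = (1.0 - abs(z1) ** 2) * (1.0 - abs(z2) ** 2)
    return math.acosh(1.0 + num / den)

def disk_to_half_plane(z: complex) -> complex:
    """Inverse of z = (w - i) / (1 - i w), i.e. w = (z + i) / (1 + i z)."""
    return (z + 1j) / (1.0 + 1j * z)

def half_plane_distance(w1: complex, w2: complex) -> float:
    """Eq. (1.3) in 2D, taking w0 = Im(w) > 0 and w = Re(w) (an assumed reading)."""
    num = (w1.imag - w2.imag) ** 2 + (w1.real - w2.real) ** 2
    return math.acosh(1.0 + num / (2.0 * w1.imag * w2.imag))

def circumference(r: float) -> float:
    """Circumference of a hyperbolic circle of radius r, Eq. (1.4)."""
    return 2.0 * math.pi * math.sinh(r)

def area(r: float) -> float:
    """Area of a hyperbolic circle of radius r, Eq. (1.4)."""
    return 4.0 * math.pi * math.sinh(r / 2.0) ** 2

if __name__ == "__main__":
    a, b = 0.3 + 0.4j, -0.5 + 0.1j
    # The same pair of points measured in both Poincare models.
    print(poincare_disk_distance(a, b))
    print(half_plane_distance(disk_to_half_plane(a), disk_to_half_plane(b)))
    # Near the center the area is close to the Euclidean pi * r^2 ...
    print(area(1e-3), math.pi * 1e-6)
    # ... while for large r each extra unit of radius multiplies C(r) by about e.
    print(circumference(10.0) / circumference(9.0), math.e)
```

The first two printed values should agree, reflecting that the map is an isometry between the two models; the last two lines illustrate the flat regime near the center and the exponential regime far from it.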
1.4 DATA CORRELATIONS AND DIMENSIONALITY REDUCTION IN HYPERBOLIC SPACE

In this section, we describe two basic functionalities of the proposed framework (Figures 1.2 and 1.3). The first deals with inferring correlations among data, yielding network structures representing such relations (data as nodes, correlations as edges). The second deals with a distance-preserving dimensionality reduction approach over the hyperbolic space (i.e., multidimensional scaling [12], [13]) with multiple practical applications, e.g., various efficient computations, efficient data visualization, etc. Each functionality can of course be applied independently.

We assume generic forms of “data items”, each of which can be unrolled into a set of features. The set of features will be common for all data items, e.g., customer parameters such as payment information, demographic information, etc., when customers correspond to data items. Before analytics, one needs to apply a method for clustering/reduction of these features to a set of latent features (considered sufficient to fully describe each data item). Examples of such methods include spectral clustering [principal component analysis (PCA)] [14], [15], singular value decomposition (SVD) [14], [15], etc., where each can be appropriately sped up to scale with large datasets, as in [15], [16], [17]. Subsequently, correlations may be inferred via the application of similarity/distance metrics that quantify similarities on various data aspects (e.g., between pairs of data items). A thorough survey of similarity metrics such as cosine, Pearson, etc. is performed in [18]. Another widely accepted approach for computing similarities is to identify distribution functions in the parameters of interest and then apply an appropriate distribution comparison metric, e.g., the Kullback-Leibler divergence [19], [20] for probability distributions.
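The similarity measures named above can be sketched in a few lines of Python. This is a minimal illustration using their standard textbook definitions, not the specific variants surveyed in [18]; the function names and the toy feature vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson_correlation(u, v):
    """Pearson correlation: cosine similarity of the mean-centered vectors.

    Undefined (division by zero) if either vector is constant.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    # Two hypothetical data items described by three latent features each.
    item_a = [1.0, 2.0, 3.0]
    item_b = [2.0, 4.1, 5.9]
    print(cosine_similarity(item_a, item_b))
    print(pearson_correlation(item_a, item_b))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Note that the Kullback-Leibler divergence is not symmetric, so when it is used as a pairwise similarity a symmetrized variant is often preferred; that choice is left open here.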
Hyperbolic distance may also serve as a similarity measure, as described in the following. Other ways of clustering and network estimation include [14] partitional algorithms (k-means and its variations, etc.), hierarchical algorithms (agglomerative, divisive), the “lasso” algorithm and its variants that are based on convex optimization [21] and produce a graph representation of the data, etc. In the case of the proposed framework, it is beneficial to consider hierarchical clustering of data, as it allows efficient visualization using the two- or three-dimensional hyperbolic space (Section 1.8).

Data correlations in hyperbolic space can be obtained via the hyperbolic distance function over a hyperbolic space of suitable dimension (e.g., equal to the number of important features of users/products), applied on pairs of data items to reveal, to a controllable extent, their hidden dependencies/correlations with respect to their features. As an example, with only two latent features describing the data items, we can assign the radial and angular coordinates of the 2D Poincaré disk model according to the values of the two features, respectively. Then, we link two nodes together only if their hyperbolic distance (e.g., Equation (1.2) for the Poincaré disk model) is less than a predefined upper bound. By controlling this upper bound, one can control the “neighborhood” of each node and thus the extent to which the correlations among data reach. In other words, important correlations may be considered up to a controllable extent via a threshold value over hyperbolic distance. This is a simple model of data correlation; however, its effectiveness lies in its simplicity and the fact that it can lead to simultaneous data correlation, analysis and visualization.

After embedding the data pieces/nodes on the k-dimensional Poincaré half-space model (k corresponds to the number of latent features), one can apply a distance-preserving dimension reduction technique over the hyperbolic space, such as the one proposed in [13], [12]. Importantly, if the dimension of the final metric space is chosen equal to 2 or 3, we will be able to achieve simultaneously a visualization of the data set and its analysis/navigation (Sections 1.6 and 1.8). Regarding dimensionality reduction over hyperbolic space in particular, we provide the following two theorems from the literature [12], [22].

Given an n-point subset $S$ of the hyperbolic space, let $T$ be its projection on $\mathbb{R}^{n-1}$ (i.e., the Poincaré half-space model, Section 1.3.1). By the Johnson-Lindenstrauss Lemma [22], there exists an embedding of $T$, determined by a function $f$, into the $O\!\left(\frac{\log n}{\varepsilon^2}\right)$-dimensional Euclidean space such that for every two points $x_1, x_2 \in T$, $\|x_1 - x_2\| \le \|f(x_1) - f(x_2)\| \le (1 + \varepsilon)\,\|x_1 - x_2\|$, $\varepsilon > 0$.

Theorem 1.1 (Dimension reduction for $\mathbb{H}^n$) Consider the map $g : \mathbb{H}^n \to \mathbb{H}^{O(\log n)}$ defined by $g(w_0, w) = (w_0, f(w))$. Then for every two points $(w_0^1, w^1)$, $(w_0^2, w^2)$ at hyperbolic distance $\Delta$, we have:

$$\Delta \le d_{PH}\big(g(w_0^1, w^1),\, g(w_0^2, w^2)\big) \le \left(1 + \frac{3\varepsilon}{1 + \Delta}\right)\Delta.$$
(1.5) Dimensionality reduction can be performed efficiently via the Fast Johnson- Linderstrauss Transform of Ailon and Chazelle [22], which is a low-distortion embedding of d-dimensional hyperbolic space to O(log n)-dimensional hyper- bolic space (n is the number of points to be embedded) based on the precon- ditioning of a sparse projection matrix with a randomized Fourier transform. Note that n will be equal to the number of data items, and this will be achieved by assigning the zero value to all dimensions of each data piece after the kth dimension in raw up to the n − 1 one. Theorem 1.2 (Embedding into the hyperbolic plane (for visualization pur- poses)) Assume that the distance between every two points in S is at least ln(12n) ε , then there exists an embedding of S into the hyperbolic plane H2 with distance distortion at most 1 + ε. 1.4.1 Example A similar methodology of data correlation over hyperbolic space is applied in [9], where the new nodes added in the network embedding in the hyperbolic
space form connections with existing ones. The popularity of the latter and the similarity of the new nodes to the existing ones are taken into account in determining the connections of the new nodes in the embedding. More specifically, newcomers choose the existing nodes to connect to by optimizing the product of similarity and popularity with them. In [9], the procedure of simultaneous data embedding/visualization and correlation in hyperbolic space is as follows, starting with an initially empty network.

1. At time $t \ge 1$, a new node $t$ is added to the embedded network and is assigned the polar coordinates $(r_t, \theta_t)$, where the angular coordinate $\theta_t$ is sampled uniformly at random from $[0, 2\pi]$ and the radial coordinate $r_t$ relates to the birth date of node $t$ via the relation $r_t(t) = \ln t$. Every existing node $s < t$ increases its radial coordinate to $r_s(t) = \beta r_s(s) + (1 - \beta) r_t(t)$, $\beta \in [0, 1]$, where $r_s(s) = \ln s$ is the coordinate that node $s$ received at birth.

2. The new node $t$ connects with a subset of existing nodes $\{s\}$, where $s < t$, $\forall s$. This subset consists of the $m$ nodes with the $m$ smallest values of the product $s \cdot \theta_{st}$, where $m$ is a parameter controlling the average node degree (i.e., the extent of the correlations among nodes), and $\theta_{st}$ is the angular distance between nodes $s$ and $t$.

By following the above steps for a network construction over data, it turns out that new nodes simply connect to their $m$ closest nodes in hyperbolic distance. The hyperbolic distance in the Poincare disk (Equation (1.2)) between two nodes at polar coordinates $(r_t, \theta_t)$ and $(r_s, \theta_s)$ is approximately equal to $x_{st} = r_s + r_t + \ln(\theta_{st}/2) = \ln(s \cdot t \cdot \theta_{st}/2)$. Therefore, the sets of nodes $\{s\}$ minimizing $x_{st}$ or $s \cdot \theta_{st}$ for each newcomer $t$ are identical.
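The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [9]: it fixes β = 1 (no popularity fading), and the names `grow_network`, `angular_distance`, `approx_hyperbolic_distance` and the parameter `seed` are hypothetical. Each newcomer t links to the m existing nodes with the smallest product s · θst.

```python
import math
import random


def angular_distance(a, b):
    """Angular distance between two angles, folded into [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def approx_hyperbolic_distance(s, t, theta_st):
    """x_st = ln(s * t * theta_st / 2), valid for r_s = ln s and r_t = ln t."""
    return math.log(s * t * theta_st / 2)


def grow_network(n, m, seed=0):
    """Grow an n-node network; assumes beta = 1 (no popularity fading)."""
    rng = random.Random(seed)
    theta = {}   # node -> angular coordinate
    edges = []   # (s, t) pairs with s < t
    for t in range(1, n + 1):
        theta[t] = rng.uniform(0.0, 2 * math.pi)
        # Step 2: rank existing nodes by the product s * theta_st.
        ranked = sorted(range(1, t),
                        key=lambda s: s * angular_distance(theta[s], theta[t]))
        edges.extend((s, t) for s in ranked[:m])
    return theta, edges


theta, edges = grow_network(n=50, m=2)
print(len(edges))  # node 2 adds one link, nodes 3..50 add two each
```

Because the logarithm is monotone, ranking the candidates by `approx_hyperbolic_distance` instead of by the product selects the same m nodes, which is the equivalence stated in the text.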
At the second step above, in order to reduce network clustering [23], the newcomer node $t$, instead of connecting with its $m$ closest nodes, may randomly select a node $s < t$ and form a connection with it with probability $p(x_{st}) = 1/[1 + e^{(x_{st} - R_t)/T}]$, where $T$ is a temperature parameter and $R_t$ is a threshold value. This step is repeated until $m$ nodes have been selected to connect to node $t$.

Here, the radial coordinate abstracts the popularity of a node. The smaller the radial coordinate of a node (the closer the node is to the center of the Poincare disk), the more popular it is and thus the more likely it is to attract new connections (we will elaborate more on this fact in Section 1.5; see also the hyperbolic distance functions in Section 1.3.1). The increase of the radial coordinate expresses the attenuation of a node’s popularity with time, which vanishes when $\beta = 1$. Note that in complex networks the time a node has been present in the network is strongly related to its popularity. Specifically, the scale-free structure of complex networks is mainly due to the preferential attachment of newcomers, as the network grows, to existing nodes with high degree. Thus, nodes of high degree continue to increase their connectivity, and these nodes are more likely to be older nodes, assuming that initially all nodes have the same degree. Therefore, in the above mapping of nodes to hyperbolic coordinates, the similarity characteristic is mapped to the angular coordinate (here assigned randomly), while the popularity characteristic
is mapped to the radial coordinate, and hyperbolic distance is used to predict/infer connections between pairs of nodes based on their characteristics. As a result, hyperbolic distance serves as a convenient single-metric representation of a combination of popularity (radial) and similarity (angular).

1.5 EMBEDDING OF NETWORKED DATA IN HYPERBOLIC SPACE AND APPLICATIONS

In this section, in order to perform data embedding, it is assumed that data items are already available in network form. Thus, the focus shifts to obtaining different embeddings into latent hyperbolic coordinates, in conjunction with several applications over complex large-scale networks, such as the computation of graph theoretic and SNA metrics (e.g., centrality metrics), the prediction of missing links, etc. Two types of embedding in low-dimensional hyperbolic space are presented. In the first (Subsection 1.5.1), the latent node coordinates in hyperbolic space are determined so that the hyperbolic distances between node pairs are approximately equal to their graph distances in the initial network. Towards this objective, multidimensional scaling (MDS) is applied [24]. With $n$ denoting the number of network nodes (data items), MDS has a running time of $O(n^3)$ and requires space $O(n^2)$ (the distance matrix between all node pairs). Since the complexity of MDS is prohibitive for large-scale networks, landmark-based MDS has been introduced [24], based on the graph and hyperbolic space distances between $k$ chosen landmarks and the rest of the nodes. With landmark-based MDS, the running time reduces to $O(kn)$ and the space to $O(dkn + k^3)$, where $d$ is the dimension of the hyperbolic space, and it should also hold that $d < k \ll n$. By considering $d$, $k$ as small constants, landmark-based MDS has a linear running time complexity.
The second type of embedding (Subsection 1.5.2) applies statistical learning methods to embed a complex network graph in hyperbolic space by constructing a new network graph that tries to mimic, with high probability, the initial graph structure [5]. Contrary to the first approach, the node pairs’ hyperbolic distances may differ significantly from their initial graph distances. The statistical learning techniques applied are based on maximum likelihood estimation for inferring the node coordinates; global (i.e., for the whole network) and local (i.e., for every node) likelihood functions are defined and maximized, where the local likelihood functions serve to approximate the global ones in order to reduce complexity.

1.5.1 Rigel Embedding in the Hyperboloid Model

Several complex and social network analysis problems, such as the computation of node centralities, community detection, etc., are based on node distances, which are hard to compute within large-scale graphs such as online social networks with millions of nodes. However, for marketing purposes, such computations become necessary or even critical for companies, e.g., to locate the most influential/central node for achieving efficient marketing. Therefore,
several works in the literature [24], [12] have proposed algorithms for network embeddings (e.g., in Euclidean or hyperbolic space) so that the inferred coordinates can be used for approximating node distances in the initial graph. We will focus on large-scale network embedding in hyperbolic space and specifically on the Rigel embedding proposed in [24], which achieves low distance distortion error and answers queries for node distances and shortest paths in microseconds, even for graphs of up to 43 million nodes, compared to the order of seconds required by a traditional breadth-first-search (BFS) algorithm. Importantly, Rigel allows for parallelization of its computations, which is a great advantage in the field of BDA. Experimental results in [25], [8], focused on embedding Internet distances in hyperbolic space, have shown less distortion with respect to the node distances in the initial graph, compared with other embeddings in Euclidean coordinates. This fact is also verified in [26] via empirical computation of distortion metrics for diverse coordinate systems, where it is shown that hyperbolic space achieves significantly more accurate results than Euclidean and spherical ones.

Let us assume that the network consists of $N$ nodes. Rigel employs the Hyperboloid model of hyperbolic space with distance function given by Equation (1.1). Rigel applies landmark-based MDS, where $L \ll N$ nodes are chosen as landmarks in the network graph. Landmarks may be chosen as high-degree nodes if the given network is scale-free; otherwise they can be chosen randomly. First, the hyperbolic coordinates of the landmarks are computed with the aid of a global optimization algorithm, so that the distances between the landmarks in the Hyperboloid are as close as possible to their matching path distances in the graph. This is the bootstrapping step of Rigel.
Then, the hyperbolic coordinates of the rest of the nodes are calibrated, so that each node’s distances to all landmarks in the Hyperboloid are very close to the corresponding actual path distances in the network graph. Note that the authors of [24] studied the accuracy of Rigel with respect to the dimension of the hyperbolic space and showed that accuracy increases as the dimension increases. However, the number of landmarks should be higher than the dimension of the embedding space, thus leading to a trade-off between accuracy and complexity [24].

Importantly for large-scale network graphs, a parallel version of Rigel is proposed in [24], offering great improvement in the complexity of Rigel, which increases linearly with the network size. Both steps of Rigel (bootstrapping and embedding in the Hyperboloid model) can be parallelized over a number of servers at most equal to the number of landmarks. One or more landmarks are assigned to each server, and the rest of the nodes are distributed in a balanced way across servers. It is shown that parallel Rigel performs similarly to Rigel with respect to accuracy.

Concerning the effectiveness and efficiency of Rigel in computing SNA [27] and graph analysis metrics, experiments and comparisons with existing schemes are performed in [24]. Regarding the graph analysis metrics of radius, diameter and average path length, which are applied in identifying the
small-world property of a network [6], [51], Rigel resulted in values extremely close to the ground truth. Note that distances in Rigel are given by Equation (1.1). Rigel’s performance in computing node centralities, which constitute an important SNA metric for industry, is also examined in [24]. Closeness centrality [27] is considered, according to which the most central node is the one that has the lowest average distance to all other nodes in the network. Rigel achieved high accuracy in identifying the node ranking with respect to closeness centrality and outperformed existing schemes.

1.5.2 HyperMap Embedding

This section uses statistical learning methods to embed a social graph in hyperbolic coordinates, focusing on the HyperMap embedding algorithm introduced in [5]. HyperMap leverages the emerging relation between complex network topologies and hyperbolic geometry [8]. Due to their scale-free property, complex networks exhibit a hierarchical, i.e., tree-like, structure [28], while hyperbolic geometry is the geometry of trees. More specifically, the similarity between an infinite tree graph and the hyperbolic space provides an intuition about the hidden hyperbolic structure of complex networks. The exponential scaling of the circumference of a circle and of the area of a disk in hyperbolic space (explained in Section 1.3.1) coincides with the scaling of the number of nodes with respect to their distance from the root of the tree in an “e-ary” tree [8]. To make this clearer, let us examine a b-ary tree, i.e., a tree with branching factor equal to $b$. The number of nodes located at distance exactly $R$ from the root of the tree is $(b + 1)b^{R-1} \sim b^R$, and the number of nodes at distance at most $R$ from the root of the tree is $\frac{(b+1)b^R - 2}{b - 1} \sim b^R$. As a result, hyperbolic space can be seen as a continuous version of a tree, a fact realized as the exponential expansion property of the hyperbolic space.
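The two counting formulas for the b-ary tree can be checked numerically. The sketch below assumes, as the formulas imply, that the root has b + 1 children and every other node has b children; the function names are hypothetical.

```python
def nodes_at_distance(b, R):
    """Nodes at distance exactly R from the root, counted level by level."""
    if R == 0:
        return 1
    level = b + 1              # the root has b + 1 children
    for _ in range(R - 1):
        level *= b             # every other node has b children
    return level


def nodes_within_distance(b, R):
    """Nodes at distance at most R from the root (root included)."""
    return sum(nodes_at_distance(b, r) for r in range(R + 1))


for b in (2, 3, 5):
    for R in range(1, 8):
        # Closed forms from the text, written without division.
        assert nodes_at_distance(b, R) == (b + 1) * b ** (R - 1)
        assert nodes_within_distance(b, R) * (b - 1) == (b + 1) * b ** R - 2
```

Both counts grow like $b^R$, which is the discrete counterpart of the exponential growth of circumference and area in hyperbolic space.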
Scale-free complex networks are characterized by heterogeneity regarding the node degree, where the majority of nodes have low degree (power-law degree distribution), implying a tree-like network organization that indicates the existence of a hidden hyperbolic metric space [28].

The example of the simultaneous embedding and creation of a growing random network provided in Subsection 1.4.1 leads to the formation of network graphs with the following two characteristics: (i) they appear to be highly clustered [23], since the links added between nodes that are close in hyperbolic distance lead to the formation of a large number of triangles, and (ii) they have a power-law degree distribution. These are two basic properties of complex networks’ structure. These statements further support the existence of an underlying hidden hyperbolic space in complex networks’ structure. On the one hand, a random network created over hyperbolic space as in Subsection 1.4.1 emerges to be scale-free, while on the other hand, a scale-free network is proven to have negative curvature [8] (similarly to hyperbolic metric spaces).

Based on these studies and observations, HyperMap aims at embedding a given complex (social) network in hyperbolic space in a way that is congruent
with the embedding of an extended version of the model of Subsection 1.4.1. The extension lies basically in providing the possibility to add links between existing nodes, while in Subsection 1.4.1 new links can be added only between a newcomer and an existing node. Precisely, HyperMap finds the nodes’ angular and radial coordinates such that the probability that the given complex network is produced by this extended model of Subsection 1.4.1 is maximized.

HyperMap assigns hyperbolic coordinates to the nodes inside the Poincare disk by maximizing, approximately but efficiently, a globally defined likelihood function over the node pairs’ hyperbolic distances (which are functions of the nodes’ hyperbolic coordinates), expressed in terms of the given complex network’s links. Specifically, in order to mimic the network creation/hyperbolic embedding of Subsection 1.4.1, it first performs a maximum likelihood estimation of the appearance (i.e., birth) times of the given network’s nodes (let $t$ denote their number). Then, after estimating the time sequence of the nodes’ arrivals, it replays the hyperbolic growth of the network roughly following the steps of the model of Subsection 1.4.1. The difference lies in the computation of the angular coordinates: HyperMap computes the angular coordinate $\theta_i$ of node $i$, i.e., the node with sequence number $i$, by maximizing a local likelihood function defined for node $i$, which is equivalent to maximizing the aforementioned global likelihood function with respect to $\theta_i$. Specifically, the HyperMap embedding algorithm receives as basic input the adjacency matrix of the given complex network and performs the following steps:

1. It sorts the nodes in decreasing order with respect to their degree in the given complex network, where node 1 corresponds to the one with the highest node degree.
Node 1 receives $r_1 = 0$ and a random angular coordinate $\theta_1 \in [0, 2\pi]$ (i.e., it is placed at the center of the Poincare disk model).

2. For $i = 1$ to $t$ do

(a) Node $i$ arrives (is born) and is assigned the radial coordinate $r_i = \frac{2}{\zeta} \ln i$, where $\zeta = |c|$ (Subsection 1.3.1) is the constant absolute curvature value of the hyperbolic space, provided as input to HyperMap. Usually, $\zeta = 1$. Every existing node $s < i$ increases its radial coordinate to $r_s(i) = \beta r_s + (1 - \beta) r_i$, $\beta \in [0, 1]$, where $r_s$ is the radial coordinate assigned to node $s$ at its birth and $\beta$ is provided as input to HyperMap.

(b) The angular coordinate $\theta_i$ is computed via maximizing a local likelihood function defined for node $i$.

The HyperMap embedding also makes it possible to predict missing links of the given complex network, efficiently and with high accuracy. Link prediction is a very important process in the study of large-scale networks, since topology measurements for inferring their structure may miss part of the links. In HyperMap, prediction is based on the aforementioned possibility of internal link addition, i.e., between pairs of existing nodes. Specifically, two
(non-neighboring) existing nodes $k$, $l$ are connected at time $t$ (i.e., prediction of a missing link in the initial complex network) with probability $p(x_{kl}) = 1/[1 + e^{\zeta(x_{kl} - R_t)/(2T)}]$. HyperMap’s performance in predicting missing links is evaluated according to diverse indices and shown to be very satisfactory; it outperforms several well-known classical link prediction methods such as Common-Neighbors, Katz Index, Hierarchical Random Graph Model, Degree-Product, Inverse Shortest Path, etc. [5].

1.6 GREEDY ROUTING OVER HYPERBOLIC COORDINATES AND APPLICATIONS WITHIN COMPLEX AND SOCIAL NETWORKS

This section mostly concerns the navigability of networks embedded in hyperbolic space [28]. A network embedded in a geometric space is navigable if one can perform efficient greedy routing on the network using the node coordinates in the underlying geometric space [5].

After embedding the network graph (or the correlated data) in the hyperbolic geometric space, greedy routing over hyperbolic coordinates can be used to navigate or route messages from source to destination. Specifically, each node forwards the message to the neighbor that is closest in hyperbolic distance to the destination. As a result, greedy routing uses only local information, i.e., each node’s necessary knowledge is limited to the hyperbolic coordinates of its neighbors and the destination. Due to this fact, greedy routing can be adapted and applied for performing efficient search and navigation in large data sets [24], [26], [29], while we foresee its application in the computation of SNA metrics and in recommender systems [30]. A disadvantage of greedy routing is that it fails to deliver a message to the destination when a node does not have a neighbor closer to the destination than itself (local minimum of distance).
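The forwarding rule and its failure mode can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: nodes carry native polar coordinates (r, θ) in a hyperbolic plane of curvature −1, distances are computed with the hyperbolic law of cosines rather than with the chapter's Equation (1.2), and the names `hyperbolic_distance`, `greedy_route`, `adj` and `coords` are hypothetical.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between p = (r, theta) and q = (r, theta) via the
    hyperbolic law of cosines (assumes curvature -1)."""
    (r1, t1), (r2, t2) = p, q
    dtheta = math.pi - abs(math.pi - abs(t1 - t2) % (2 * math.pi))
    c = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(c, 1.0))   # clamp guards against rounding below 1


def greedy_route(adj, coords, source, target):
    """Return the greedy path, or None if the message gets stuck."""
    path = [source]
    current = source
    while current != target:
        dist_here = hyperbolic_distance(coords[current], coords[target])
        best = min(adj[current],
                   key=lambda v: hyperbolic_distance(coords[v], coords[target]),
                   default=None)
        if best is None or hyperbolic_distance(coords[best], coords[target]) >= dist_here:
            return None              # local minimum: no neighbor is closer
        path.append(best)
        current = best
    return path


# A hub at the center with three leaves: routing passes through the hub.
coords = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (2.0, 2.0), 3: (2.0, 4.0)}
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(greedy_route(adj, coords, 1, 3))
```

Each hop strictly decreases the distance to the destination, so the loop always terminates, either at the destination or at a node where the message is blocked.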
When such a local minimum occurs, the message gets blocked at that node [11], with no further forwarding via greedy routing. With respect to networks with hidden hyperbolic structure (i.e., scale-free complex networks), greedy routing based on hyperbolic coordinates/distances achieves a very high success rate (close to 100%), as shown experimentally in the literature. Also, in this case the paths obtained via greedy routing are very close to the global shortest paths between the corresponding node pairs. Specifically, in [8], [9], the performance of greedy routing is studied over synthetic networks constructed similarly to the example of Subsection 1.4.1 (in a way congruent to the exponential expansion of hyperbolic space), and it is shown to achieve a success rate close to 100% and a stretch with respect to the shortest paths close to 1. This is a very important property showing the small-world navigability of this particular category of networks [31]. The success of greedy routing over hyperbolic space is strongly tied to the fact that hyperbolic space has a tightly connected core through which all paths between nodes pass. This is the reason why shortest paths in hyperbolic space can be found efficiently and with high accuracy [8]. In [5], the performance of greedy routing in the AS Internet graph is examined
when using the hyperbolic coordinates inferred by HyperMap (Section 1.5). Note that the AS Internet graph exhibits a scale-free structure [6], [23]. Due to the congruency between the scale-free network topology and hyperbolic geometry, the success of greedy routing over hyperbolic coordinates is much improved compared to the case when the real coordinates are used, while the length of the paths paved by greedy routing is roughly the same as that of the shortest paths. HyperMap actually estimates the node coordinates that best fit a given network.

“Greedy embeddings” in metric spaces other than Euclidean [11], [49] have been proposed to optimize greedy routing techniques. In the case of a greedy embedding of any network (not only scale-free) in hyperbolic space, the success rate of greedy routing becomes exactly 100%. In [11], a distributed implementation of a greedy embedding in two-dimensional hyperbolic space is proposed, which can also be applied in dynamic network conditions by assigning hyperbolic coordinates to new nodes without re-embedding the whole network. The greedy embedding is constructed by choosing a spanning tree of the graph of the initial network and then embedding the spanning tree into the hyperbolic space according to the algorithm of [11]. Following this algorithm, after hyperbolic coordinates have been assigned to the root of the tree inside a specific area of the Poincare disk model, each node computes its own coordinates using those of its parent, in such a way that the hyperbolic bisector of the embedded spanning tree edge between the node and its parent does not intersect any other embedded edge of the spanning tree. The greedy embedding of a spanning tree of a graph implies the greedy embedding of the whole graph. Importantly, it is proven that every graph has a greedy embedding in two-dimensional hyperbolic space [49].
For all these reasons, hyperbolic geometry is better suited than the Euclidean one for performing greedy routing. Note that a greedy embedding basically ensures the existence of at least one greedy path between each source-destination pair, and thus 100% success of greedy routing.

Greedy embeddings have been applied successfully in communications networks, e.g., [11], [32], [33]; however, in the case of large-scale networks their implementation may pose challenges due to the need for a spanning tree of the whole graph, thus opening new research directions in BDA. The average length of the paths paved by greedy routing is a crucial performance factor to evaluate. In the case of greedy hyperbolic embedding, different choices of the spanning tree and of its root (e.g., a shortest path spanning tree rooted at the node with the highest degree, or a spanning tree derived via a random walk) will lead to different routing paths and path lengths between pairs of sources and destinations [34].

Greedy routing can become node-degree aware by exploiting the node degree metric available in network graphs [26]. This enhancement may improve its performance, since, apart from being “more connected” to other nodes, high-degree nodes also tend to be embedded nearer to the core of the network (e.g., the center of the Poincare disk) than lower-degree nodes. Other enhancements of greedy routing (e.g., Gravity-Pressure Greedy Forwarding [11]) have also been proposed to improve its performance under dynamic network conditions, e.g., random node arrivals and departures [11]. Based on all the advantages of greedy routing techniques over hyperbolic coordinates, we envision their suitability and efficiency for the computation of SNA metrics that demand knowledge of paths between node pairs, e.g., betweenness centrality [27], often used for identifying the most influential nodes for information propagation purposes.

1.7 OPTIMIZATION TECHNIQUES OVER HYPERBOLIC SPACE FOR DECISION-MAKING IN BIG DATA

1.7.1 The Case of Advertisement Allocation over Online Social Networks

Analysis of big data leads to problems of large-scale optimization. Since optimization involving large data sets is not only expensive but also suffers from slow numerical rates of convergence, new approaches are required. Throughout this subsection, we describe and study the advertisement (ad) allocation problem and how it can be significantly simplified computationally by leveraging the properties of hyperbolic space for large-scale networks, following the approach of [35]. A common advertising mechanism used by, e.g., an online social network (OSN) platform for the distribution of advertisements over its users is auction-style: the advertisers place bids on users’ impressions (e.g., clicks) based on their budget constraints, while the platform’s owner seeks to maximize its revenue. In an OSN, users’ impressions are not ad hoc, since users get influenced by their acquaintances, and therefore social influence should be taken into consideration in the optimization. This is due to the fact that a user’s engagement may influence other users, depending on the influence strength of the former. According to [35], a fairness constraint should be added in the optimization problem so that “a similar users’ influence distribution becomes assigned to each advertiser”.
Initially, we review the conventional way to formulate an advertisement allocation problem over an OSN, which is the following Integer Programming (IP) problem (Equations (1.6)-(1.9)).

\[ \max_{S, I} \; \sum_{j=1}^{|A|} p_j \sum_{u_i \in S_j} I_{i,j}\, g(u_i) \tag{1.6} \]

subject to:

\[ p_j \sum_{u_i \in S_j} I_{i,j}\, g(u_i) \le b_j, \quad \forall a_j \in A \quad \text{(budget constraint)} \tag{1.7} \]

\[ \sum_{j : u_i \in S_j} I_{i,j} \le I_i, \quad \forall u_i \in V \quad \text{(impression constraint)} \tag{1.8} \]

\[ I_{i,j} \in \mathbb{N}, \quad (S, I) \in R_D \quad \text{(domain constraint)} \tag{1.9} \]

where $a_j$ is an advertisement (corresponding to an advertiser), $A$ is the set of all advertisements, $p_j$ is the bid of advertiser $j$, which is considered homogeneous over all users, and $u_i$ is a user of the OSN (node of the network) with the maximum number of impressions assigned to all advertisers ($\sum_{j : u_i \in S_j} I_{i,j}$) equal to $I_i$ and social influence given by $g(u_i)$. $R_D$ is a feasible set expressing domain constraints, e.g., fairness or priority constraints among advertisers. Furthermore, $S$, $I$ are the optimization variables, where $S = \{S_1, S_2, ..., S_{|A|}\}$ is the allocation strategy, i.e., the set of users assigned to each advertiser, and $I = \{I_{i,j} \mid u_i \in S_j, a_j \in A\}$ is the users’ impressions allocation strategy, i.e., the number of impressions of a user assigned to each advertiser, where $I_{i,j} = 0$ if $u_i \notin S_j$. Also, $V$ stands for the set of users. Note that the total number of impressions of a user is upper bounded due to the limited time that a user spends on OSNs daily.

The IP problem formulation has two significant disadvantages. Firstly, the decision variable $I$ has dimension of order $|A| \cdot |V|$, implying an extreme increase in dimensionality for modern OSNs consisting of billions of users. Secondly, the domain constraints mentioned above are hard to express in such an IP formulation setting. The most common and important domain constraint ($R_D$) is the fairness one, as it constitutes a requirement and business model of most OSN platforms. Besides fairness, several other kinds of domain constraints are described and handled in [35], such as the priority model and the hybrid model that combines fairness with priority. In this chapter, we focus only on the fairness constraint, as it is very representative for indicating the computational efficiency gained when utilizing the properties of hyperbolic space in the ad allocation problem for large-scale OSNs.

In [35], an alternative formulation of the advertisement allocation problem is proposed, based on the mapping of the OSN in hyperbolic space (performed as in Sections 1.4 and 1.5).
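For comparison with what follows, the conventional formulation (1.6)-(1.8) can be made concrete on a toy instance. The sketch below only evaluates the objective and checks the budget and impression constraints for a given allocation; it does not solve the IP. The data layout (one dictionary per advertiser mapping a user to its assigned impressions) and all names are assumptions made for illustration.

```python
def ip_revenue(bids, influence, impressions):
    """Objective (1.6). impressions[j] maps user i -> I_{i,j};
    users missing from impressions[j] are not in S_j (I_{i,j} = 0)."""
    return sum(
        bids[j] * sum(n * influence[i] for i, n in alloc.items())
        for j, alloc in enumerate(impressions)
    )


def ip_feasible(bids, budgets, influence, max_impressions, impressions):
    """Check the budget constraint (1.7) and impression constraint (1.8)."""
    for j, alloc in enumerate(impressions):
        spent = bids[j] * sum(n * influence[i] for i, n in alloc.items())
        if spent > budgets[j]:
            return False
    used = {}
    for alloc in impressions:
        for i, n in alloc.items():
            used[i] = used.get(i, 0) + n
    return all(used[i] <= max_impressions[i] for i in used)


# Two advertisers, two users (hypothetical numbers).
bids = [2.0, 1.0]
influence = {"u1": 3.0, "u2": 1.0}
max_impressions = {"u1": 2, "u2": 1}
impressions = [{"u1": 1}, {"u1": 1, "u2": 1}]
print(ip_revenue(bids, influence, impressions))
```

Even in this toy form the allocation holds one entry per (advertiser, user) pair, which is the $|A| \cdot |V|$ growth that the IP formulation suffers from.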
Following the new methodology, the disadvantages of the IP problem formulation are tackled to a significant degree, as (i) the discrete nature of the advertisement allocation problem (due to $I$, $S$) becomes continuous by leveraging region-wise integrals on the continuous hyperbolic geometric space, allowing for dimensionality reduction down to order $O(|A|)$, and (ii) in many cases the domain constraints can be efficiently represented and visualized. For the latter, and considering the fairness domain constraints, note that two fan (or pie) shapes on the Poincare disk indicate the same distribution of user influence, due to the properties of the mapping of a complex network (e.g., an OSN) in hyperbolic space (Section 1.5), which will also be pinpointed below.

For the network mapping in hyperbolic space, the HyperMap scheme (Section 1.5) is used. The mapping exhibits the important properties of OSNs, such as the power-law degree distribution (scale-free property), the community structure, and the efficient network navigability via greedy routing using local information (related to the small-world phenomenon, Section 1.6). One important aspect is that, after the mapping of the network on the Poincare disk, the expected node degree, $p_d(r)$, depends on the radial coordinate and is given by $p_d(r) \propto e^{-r/2}$, while the node density is expressed as $p_n(r) \propto e^{r}$. This means
that every circle on the Poincare disk has uniform node density, while the node degree and the node density vary exponentially along the radius. This exponential dependence of node degree and node density on the radius can be exploited for capturing the users’ influence factor discussed above, while the continuity of the hyperbolic space can be leveraged for approximating the sum over users of the advertisement allocation problem with integrals over the areas to which users are mapped. In this case, the advertisement allocation problem seeks an optimal allocation strategy that assigns to each advertiser a region of the population such that maximum revenue is achieved. Considering all the above, the advertisement allocation problem, after the mapping of the OSN in hyperbolic space, becomes:

\[ \max_{S, I} \; \sum_{j=1}^{|A|} p_j f_j(S, I) \quad \text{(volume assignment)} \tag{1.10} \]

subject to:

\[ p_j f_j(S, I) \le b_j, \quad \forall j \in \{1, ..., |A|\} \quad \text{(budget constraint)} \tag{1.11} \]

\[ \sum_{j=1}^{|A|} \sigma_i(S_j, I) \le I_i, \quad \forall u_i \in V \quad \text{(impression constraint)} \tag{1.12} \]

\[ I_{i,j} \in \mathbb{N}^{+}, \quad \forall u_i \in S_j : a_j \in A \tag{1.13} \]

\[ S \in R_D \quad \text{(domain constraint)} \tag{1.14} \]

where $f_j(S, I)$ is a function of the impressions assigned to the advertisement $a_j$, and $\sigma_i(S_j, I)$ is the amount of the impressions of user $u_i$ that are assigned to advertisement $a_j$. According to this meta-formulation, an allocation strategy or a shape design is given for $S$ (e.g., fan-shape for the fairness model, ring-shape for the priority model [35]), which also determines the function $f_j(S, I)$. The dependence of the functions $f_j$, $\sigma_i$ on $I$ is due to the multiple impressions that a user has and may assign to different advertisers. Therefore, the areas assigned to different advertisers over the hyperbolic space may overlap, complicating the optimization problem (Equations (1.10)–(1.14)).
However, in [35] this issue is resolved via a methodology denoted as Unit Impression Decomposition, which leads to a multi-stage optimization problem with unit impressions (and nonoverlapping areas among advertisers) at each stage. For simplicity, suppose that $I_i = 1$, $\forall u_i \in V$. Thus, $f_j$, $\sigma_i$ depend only on $S$.

In the following, we study the case of the fan-shape allocation strategy, which expresses fairness with respect to social influence in the allocation of users among advertisers. In this case, the allocation area $S_j$ for advertiser $j$ has a fan shape (or pie shape) of angle $\theta_j$ in the Poincare disk (as shown in Figure 1.5). Then, $f_j(S_j)$ is computed as follows:

\[ f_j(S_j) = f_j(\theta_j) = a \int_0^R e^{\tau} \int_0^{\theta_j} \big(1 + w \cdot \delta(\tau)\big)\, dl\, d\tau = q \cdot \theta_j, \tag{1.15} \]
where R 1 is the radius of the disk, inside the Poincaré disk, in which the embedded OSN lies, and $q$ is a constant appropriately determined after tedious computations. Also, the quantity $a(1 + w \cdot \delta(\tau))$ in the integral represents the profit that each node lying at radius $\tau$ contributes to its assigned advertiser, where $w$, $a$ are constants and $\delta(\tau)$ is the node degree, with $\delta(\tau) = g \cdot e^{\tau/2}$ and $g$ a constant.

FIGURE 1.5 An example of an OSN’s users’ allocation to six advertisers considering fairness with respect to the social influence (node degree). Each advertiser is assigned a pie-shaped area over the Poincaré disk, on which the users’ OSN is embedded.

Thus, the advertisement allocation problem (for one stage in case of non-unit users’ impressions) with fairness domain constraints attains a linear programming (LP) form as follows:

$$\max_{\Theta} \sum_{j=1}^{|A|} p_j q \theta_j \tag{1.16}$$

subject to:

$$p_j \cdot q \cdot \theta_j \le b_j, \quad \forall j \in \{1, \ldots, |A|\} \tag{1.17}$$

$$\theta_j \ge 0, \quad \forall j \in \{1, \ldots, |A|\} \tag{1.18}$$

$$\sum_{j=1}^{|A|} \theta_j \le 2\pi \tag{1.19}$$
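The LP of Equations (1.16)–(1.19) has only per-advertiser upper bounds plus one shared constraint on the total angle, so for positive $q$ and positive price bids it can be solved by handing out angle in decreasing order of $p_j$. The following is a minimal sketch under stated assumptions, not the method of [35]: the function names are hypothetical, and the closed form used for $q$ assumes that the inner integral of Equation (1.15) runs over the angle ($dl = d\theta$) with $\delta(\tau) = g \cdot e^{\tau/2}$, which gives $q = a\,[(e^{R} - 1) + \tfrac{2wg}{3}(e^{3R/2} - 1)]$.

```python
import math


def fan_profit_constant(a, w, g, R):
    """Closed form of q in f_j(theta_j) = q * theta_j.

    Assumption of this sketch: the inner integral in (1.15) is over the
    angle, and delta(tau) = g * exp(tau / 2).
    """
    return a * ((math.exp(R) - 1.0)
                + (2.0 * w * g / 3.0) * (math.exp(1.5 * R) - 1.0))


def solve_fan_allocation(prices, budgets, q):
    """Greedy solution of the LP (1.16)-(1.19).

    Each theta_j is capped at b_j / (p_j * q) by the budget constraint
    (1.17), and all angles share the total 2*pi (1.19). One unit of angle
    earns p_j * q, so angle is assigned in decreasing order of p_j.
    """
    if q <= 0 or any(p <= 0 for p in prices):
        raise ValueError("this sketch assumes q > 0 and positive price bids")
    thetas = [0.0] * len(prices)
    remaining = 2.0 * math.pi
    order = sorted(range(len(prices)), key=lambda k: prices[k], reverse=True)
    for j in order:
        cap = budgets[j] / (prices[j] * q)  # largest angle the budget allows
        thetas[j] = min(cap, remaining)
        remaining -= thetas[j]
    return thetas


def revenue(prices, thetas, q):
    """Objective (1.16) for a given angle assignment."""
    return sum(p * q * t for p, t in zip(prices, thetas))


if __name__ == "__main__":
    prices = [3.0, 2.0, 1.0]      # illustrative price bids p_j
    budgets = [6.0, 6.0, 6.0]     # illustrative budgets b_j
    thetas = solve_fan_allocation(prices, budgets, q=1.0)
    print(thetas, revenue(prices, thetas, q=1.0))
```

The work is dominated by sorting the |A| advertisers and does not depend on the number of users, which is the practical consequence of replacing per-user variables with one angle per advertiser.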
In this problem formulation (Equations (1.16)–(1.19)), the optimization variable is $\Theta = \{\theta_1, \ldots, \theta_{|A|}\} \in [0, 2\pi]^{|A|}$, which has only $|A|$ dimensions, a significant reduction from the $|A| \times |V|$ dimensions of the conventional problem formulation (note that $|V|$ is potentially in the order of billions). Each advertiser $a_j$ is assigned a sector of angle $\theta_j$ in the Poincaré disk. Note that the variable $S$ is no longer needed in the problem formulation. Also, it is a convex problem that can be solved efficiently [35]. Two more observations further support the efficiency of the last problem formulation. Since the regions can be packed tightly next to each other, all the users’ impressions will be utilized as long as the demand (budget of advertisers) is greater than or equal to the supply (users’ impressions). Also, due to this fact, all the stages of the unit impression decomposition (in the case of non-unit users’ impressions) can be performed in parallel to reduce the computation time [35], which is a very important advantage of this approach for big data analysis and computations.

1.7.2 The Case of File Allocation Optimization in Wireless Cellular Networks

In modern wireless cellular networks the shift from the reactive to the proactive networking paradigm is a common trend [36]. The need for a smarter network that incorporates proactive mechanisms is driven by the increasing mobile data traffic [37]. One type of mechanism for proactive network operation that has already been proposed in the literature [36, 38] is file/content caching at the edge of the network, i.e., at the evolved NodeBs (eNBs), small cell base stations (Home eNBs) or at the user equipment (UE) devices. Pushing content to the edge of the network relieves the network of redundant data traffic and serves users’ requests at lower transmission delays.
In this subsection, we focus on the problem of optimal file placement in different cache memories lying at various components of a mobile cellular network. This problem can be cast in a form similar to the problem of Subsection 1.7.1 for achieving efficiency, since it bears similar social and complex characteristics, as well as a similarly large-scale nature, as will be explained in the following. The size and especially the number of the available files become extremely large, and the number of connected devices is increasing [39]. In this subsection, we describe a formulation of an optimization problem for distributing files having a complex networked structure over a large number of heterogeneous caches in a fair way, aiming to reduce the system delay of file downloading. Fairness is meant in terms of the popularity of each file, e.g., a particular cache should not monopolize all popular files. For example, consider the WWW graph [23], where an edge represents a link from one webpage to another. A high (in-)degree [6] of a page implies high popularity, since this webpage is pointed to by many others and is thus more likely to be visited, i.e., requested. In this context, the following file placement optimization
problem is formulated (Equations (1.20)–(1.24)), aiming to determine in an optimal way the allocation of files in cache memories:

$$\max_{I, S} \sum_{f_i \in F,\, j \in M} \frac{l_i}{c_j} \cdot I_{i,j} \cdot g(f_i) \tag{1.20}$$

subject to:

$$\sum_{j=1}^{|M|} I_{i,j} \le I_i, \quad \forall i = 1, 2, \ldots, |F| \tag{1.21}$$

$$\frac{1}{c_j} \sum_{i=1}^{|F|} l_i \cdot I_{i,j} \cdot g(f_i) \le s_j, \quad \forall j = 1, 2, \ldots, |M| \tag{1.22}$$

$$I_{i,j} \in \{0, 1\}, \quad \forall i = 1, 2, \ldots, |F|, \ \forall j = 1, 2, \ldots, |M| \tag{1.23}$$

$$(S, I) \in R_D \quad \text{(domain constraint)} \tag{1.24}$$

where $c_j$ is the capacity of the transmission link between the memory cache $j$ and the provider of the file $f_i$, $l_i$ is the size of the file $f_i$, $M$ is the set of the memory caches, $F$ is the set of files, $I_{i,j}$ is the indicator variable of the placement of a file $f_i$ in memory cache $j$, and $g(f_i)$ is a social influence factor associated with file $f_i$. $I_{i,j}$ is 1 if the file $f_i$ is placed in memory cache $j$ and 0 otherwise, while $I_i$ stands for the maximum number of caches in which the file $f_i$ can be stored and $s_j$ relates to the capacity of cache $j$. Finally, $S = \{S_1, S_2, \ldots, S_{|M|}\}$ is the allocation strategy, i.e., the set of files assigned to each cache memory.

The placement of a file $f_i$ at the memory cache $j$ has a certain benefit for the network in terms of the average system delay improvement. Each placement of a file in a cache memory saves the network the time needed to download the file from the file/content provider. This benefit can be quantified on average by the term $l_i / c_j$. Thus, the above file allocation problem maximizes the total benefit in terms of the system delay improvement from the placement of certain files in the available cache memories. This problem is of integer programming form, thus being NP-hard, while also attaining large-scale characteristics, as mentioned before. Thus, alternative approaches need to be considered in order to tackle efficiently the large scale and discrete nature of this problem.
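To make the discrete nature of Equations (1.20)–(1.23) concrete, the sketch below enumerates every 0-1 placement matrix of a tiny instance and keeps the best feasible one. It is an illustration only: the function name and the toy numbers are hypothetical, the domain constraint (1.24) is not modeled, and the search visits $2^{|F||M|}$ candidates, so it is usable only for a handful of files and caches.

```python
from itertools import product


def best_placement(sizes, influence, link_caps, cache_caps, max_copies):
    """Exhaustively solve the 0-1 program (1.20)-(1.23) for a tiny instance.

    sizes[i]      -> l_i     file size
    influence[i]  -> g(f_i)  social influence factor of file i
    link_caps[j]  -> c_j     link capacity between cache j and the provider
    cache_caps[j] -> s_j     right-hand side of constraint (1.22)
    max_copies[i] -> I_i     maximum number of caches holding file i
    """
    n_files, n_caches = len(sizes), len(link_caps)
    # benefit[i][j] = (l_i / c_j) * g(f_i), the per-placement term of (1.20)
    benefit = [[sizes[i] / link_caps[j] * influence[i] for j in range(n_caches)]
               for i in range(n_files)]
    best_value = 0.0
    best_matrix = [[0] * n_caches for _ in range(n_files)]
    for bits in product((0, 1), repeat=n_files * n_caches):
        matrix = [list(bits[i * n_caches:(i + 1) * n_caches])
                  for i in range(n_files)]
        # (1.21): file i is stored in at most I_i caches
        if any(sum(matrix[i]) > max_copies[i] for i in range(n_files)):
            continue
        # (1.22): the weighted load of cache j must not exceed s_j
        loads = [sum(benefit[i][j] * matrix[i][j] for i in range(n_files))
                 for j in range(n_caches)]
        if any(loads[j] > cache_caps[j] for j in range(n_caches)):
            continue
        value = sum(loads)  # objective (1.20)
        if value > best_value:
            best_value, best_matrix = value, matrix
    return best_value, best_matrix


if __name__ == "__main__":
    value, matrix = best_placement(
        sizes=[4.0, 2.0], influence=[1.0, 3.0],
        link_caps=[2.0, 1.0], cache_caps=[3.0, 6.0], max_copies=[1, 1])
    print(value, matrix)
```

Already for 10 files and 5 caches the loop would visit 2^50 matrices, which is why the text turns to the hyperbolic reformulation instead.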
It can be observed from the mapping table (Table 1.2) that the file placement problem in memory caches is similar in nature to the advertisement allocation problem presented in Subsection 1.7.1. Following the arguments and analysis of
Subsection 1.7.1, the files’ network graph can be embedded in the hyperbolic space.

TABLE 1.2 Mapping of the file allocation problem to the advertisement allocation problem.

Advertisement allocation in users | File allocation in caches
Advertisement (A) | Cache Memory (M)
Users (u_i, V) | Files (f_i, F)
Price Bid (p_j) | The inverse of the capacity of the link between cache and the file provider (1/c_j)
Social factor g(u_i) | Social factor g(f_i)
Ad budget constraint (b_j) | Storage capacity constraint of the cache memory (s_j)

After this mapping the file allocation problem takes the following form:

$$\max_{I} \sum_{j \in M} \frac{1}{c_j} \cdot f_j(S, I) \quad \text{(volume assignment)} \tag{1.25}$$

subject to:

$$\frac{1}{c_j} \cdot f_j(S, I) \le s_j, \quad \forall j = 1, 2, \ldots, |M| \quad \text{(storage constraint)} \tag{1.26}$$

$$\sum_{j=1}^{|M|} \sigma_i(S_j, I) \le I_i, \quad \forall i = 1, 2, \ldots, |F| \tag{1.27}$$

$$I_{i,j} \in \{0, 1\}, \quad \forall i = 1, 2, \ldots, |F|, \ \forall j = 1, 2, \ldots, |M| \tag{1.28}$$

$$(S, I) \in R_D \quad \text{(domain constraint)} \tag{1.29}$$

where $f_j(S, I)$ is a function of the number (or size) of the files assigned to the memory cache $j$ and of their social influence. Following the lines of Section 1.7.1, this formulation leads to a significant reduction of the dimensionality from $O(|M||F|)$ to $O(|M|)$, and provides the flexibility of applying the desired fairness policy with respect to the social characteristics of the available files.

1.8 VISUALIZATION ANALYTICS IN HYPERBOLIC SPACE

Visual analytics consists of analytical reasoning facilitated by the visual interface, integrating the analytic capabilities of the computer and the abilities of the human analyst. The visual analytics approach relies on interactive and integrated visualizations for exploratory data analysis in order to identify unexpected trends, outliers or patterns. By putting a human back into the loop to guide the analysis, interactive data visualizations have an important role to play, e.g., as in [41].
Large datasets challenge the ability to visualize, navigate and understand relationships among data. In general, displaying large collections of data
(rolled out in many dimensions) within a limited display area requires caution to avoid missing the necessary details. Especially when data analytics yield graphs of nodes (data points) and edges (relations among data points), properly depicting such inter-relations is crucial for facilitating better analysis. Displays of large graphs (typically derived in BDA) in Euclidean spaces may not utilize the available space efficiently and impose limitations on the order of the graph that can be handled. In contrast, hyperbolic space offers significant advantages in this direction by allowing the display of an arbitrarily large structure within a bounded, finite space (e.g., the Poincaré disk model), while simultaneously providing the possibility of changing the focus to specific areas and retaining the whole picture of the data structure.

Hyperbolic-based visualization may significantly assist data analysis and corresponding decision making via a holistic rather than a focused view on the data structure and correlations. For example, it is possible to identify important/influential nodes, thus avoiding or reducing a significant amount of computations over large data sets, e.g., shortest paths for identifying node centralities (SNA metrics). The advantages of hyperbolic space with respect to data visualization and BDA can be summarized as follows:

i. The hyperbolic space grows exponentially with its radius around each point. This property is ideal for embedding hierarchical data represented as tree graphs, and consequently scale-free graphs often emerging in social network analysis and BDA (see Subsection 1.5).

ii. The Poincaré disk model of hyperbolic space exhibits a fish-eye property of dynamic focus, allowing real-time interactive navigation, e.g., via the mouse.

There are many visualization techniques that utilize hyperbolic space embedding. Most of them focus on hierarchical or tree-like graph embedding.
Generally, depending on the data representation, different techniques can be applied, as described in the following.

1.8.1 Adaptive Focus in Hyperbolic Space

The visualization of large datasets generally makes it difficult to show both focus and global context. Adjusting the focus is an important advantage of using hyperbolic space for data visualization. In order to change the focus point in the Poincaré disk, a translation operation can be applied that corresponds to a user’s mouse click and drag events. This translation is a Möbius transformation, symbolized by $T(z)$, where $z$ is a point of the Poincaré disk expressed as a complex number. In this case, the isometric Möbius transformation for a point $z$ can be written as [46]:

$$z' = T(z; c, b) = \frac{bz + c}{\bar{c}\,bz + 1}, \quad |b| = 1, \ |c| < 1. \tag{1.30}$$

The complex number $b$ describes a pure rotation of the Poincaré disk around the origin 0. The translation by $c$ maps the origin to $c$, and $-c$ becomes the new center 0 (if $b = 1$). In Figure 1.4 (c), the triangles mapped in the center
of the Poincaré disk can be seen in detail, a fact that does not hold for the triangles mapped close to the periphery, although all triangles are of equal size. Applying such Möbius transformations (Equation (1.30)) can transfer the focus to other triangles of interest by moving them to the center of the Poincaré disk.

1.8.2 Hierarchical (Tree) Graphs

Data visualization inside the Poincaré disk (in 2D) can be performed by using successive applications of the Möbius transformation given in Section 1.8.1, Equation (1.30) [40]. Each tree node receives a certain open space, called “pie segment”, where it chooses the locations of its siblings. This is denoted as a treemap [42]. Then, for all its siblings, it calls the layout routine recursively after applying the Möbius transformation. A similar visualization technique is developed in [43] in the 3D hyperbolic space, although navigation in 3D is more complex. Given a hierarchical structure of data (similar to a tree structure), large directed graphs can be efficiently visualized in 3D hyperbolic space since, owing to the exponential growth of the space, the same amount of room can be allocated to every embedded node no matter how deep it lies in the tree.

1.8.3 General Graphs

In this case, two basic visualization techniques in the 2D hyperbolic space can be identified. The combination of these two techniques in a hybrid scheme allows for a more efficient visualization.

Self-Organizing Map (SOM) in Hyperbolic Space (HSOM) [44]. Firstly, a feature map is built, composed of a lattice of nodes (neurons), with a reference vector (prototype vector) attached to each node. The position of a new data vector in the visualization is determined by the discrete (best-match) node in the lattice, chosen by minimizing the hyperbolic distance (Poincaré disk) of the new vector over all existing prototype vectors (nodes) in the lattice.
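The focus change of Equation (1.30) and the best-match step described for HSOM can both be sketched with Python's built-in complex numbers. This is a minimal illustration under stated assumptions rather than the implementation of [44] or [46]: the function names are hypothetical, the distance is the standard Poincaré disk metric, and the best-match search is simplified by treating both the data point and the prototypes as points of the disk.

```python
import math


def mobius(z, c, b=1 + 0j):
    """Equation (1.30): T(z; c, b) = (b*z + c) / (conj(c)*b*z + 1)."""
    if abs(c) >= 1 or not math.isclose(abs(b), 1.0):
        raise ValueError("need |b| = 1 and |c| < 1")
    return (b * z + c) / (c.conjugate() * b * z + 1)


def poincare_distance(z1, z2):
    """Hyperbolic distance between two points of the open unit disk."""
    num = 2 * abs(z1 - z2) ** 2
    den = (1 - abs(z1) ** 2) * (1 - abs(z2) ** 2)
    return math.acosh(1 + num / den)


def focus_on(points, target):
    """Move `target` to the center of the disk: with b = 1, use c = -target."""
    return [mobius(z, -target) for z in points]


def best_match(x, prototypes):
    """Index of the prototype closest to x in hyperbolic distance
    (a simplification of the HSOM best-match search)."""
    return min(range(len(prototypes)),
               key=lambda k: poincare_distance(x, prototypes[k]))


if __name__ == "__main__":
    pts = [0.1 + 0.2j, -0.3 + 0.4j, 0.85 + 0.0j]
    moved = focus_on(pts, pts[2])      # bring the peripheral point into focus
    print([round(abs(z), 3) for z in moved])
    print(best_match(0.1 + 0j, [0.8 + 0j, 0j, -0.5j]))
```

Because the transformation is an isometry of the disk, hyperbolic distances between points are unchanged by a focus change; only their Euclidean rendering on screen differs.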
Hyperbolic Multidimensional Scaling (HMS) [45]. This visualization technique suitably represents the proximity relations (dissimilarities) of N objects by distances between points in the Poincaré disk model of hyperbolic space. Therefore, comparing the spatial positions of two nodes on the Poincaré disk provides strong intuition for the similarity/dissimilarity of their corresponding features.

Hybrid Scheme [46]. Each one of the HSOM and HMS schemes has advantages and disadvantages. HSOM processes only vectorial data and scales linearly in the number of nodes, whereas HMS uses dissimilarity data and grows as the square of the number of nodes. Thus, HSOM may accommodate higher data quantities, while HMS accommodates a more general data form, i.e., the dissimilarity one. The proposed hybrid scheme in [46] exploits the advantages of each isolated visualization technique. It firstly creates a coarse-grain theme
map of the data via HSOM (which accommodates more data) and then uses HMS for detailed inspection of data subsets where data similarities are continuously reflected as spatial proximities. Importantly, the display paradigm employs in both cases the hyperbolic plane in order to profit from its focus and context technique (as explained above).

Finally, there are two existing applications for visualizing data in hyperbolic space, namely Hyperbolic Tree Viewer [47] and Hypertree [48]. Although many other examples exist, they all suffer from different shortcomings, in particular problems regarding the inclusion of additional data dimensions and the absence of a means to guide the user to those regions of the data that might be called “interesting”, which calls for novel approaches.

1.9 CONCLUSIONS

In this chapter, we developed a big data analytics (BDA) and exploitation framework for complex and social networks, leveraging significant properties of hyperbolic geometry in this field. Briefly, many scale-free complex and social networks are characterized by a hidden hyperbolic structure, and thus embedding their large-scale produced data in hyperbolic space emerges naturally, also allowing for their efficient handling, processing and exploitation via information extraction. In this context, our proposed framework collects methodologies over hyperbolic coordinates for several processes concerning complex and social networks, such as correlations and clustering, missing links’ inference, efficient SNA metrics’ computations, optimized resource allocation and visualization analytics. We envision that the proposed framework will revolutionize BDA in complex and social networks and will maximize the benefit from data analytics generated from the latter, for the latter.
ACKNOWLEDGMENT

This research is co-financed by the European Union (European Social Fund) and Hellenic national funds through the Operational Program ‘Education and Lifelong Learning’ (NSRF 2007-2013).

FURTHER READING

1. D. Puccinelli, M. Haenggi, “Wireless Sensor Networks: Applications and Challenges of Ubiquitous Sensing”, IEEE Circuits and Systems Magazine, Vol. 5, No. 3, pp. 19-31, 2005.
2. C.L.P. Chen, C.-Y. Zhang, “Data-intensive Applications, Challenges, Techniques and Technologies: A Survey on Big Data”, Elsevier Information Sciences, No. 275, pp. 314-347, 2014.
3. K.-C. Chen, M. Chiang, H.V. Poor, “From Technological Networks to Social Networks”, IEEE Journal on Selected Areas in Communications/Supplement (JSAC), Vol. 31, No. 9, pp. 548-572, September 2013.
4. J. W. Anderson, Hyperbolic Geometry, 2nd ed. Springer, 2007.
5. F. Papadopoulos, C. Psomas, D. Krioukov, “Network Mapping by Replaying Hyperbolic Growth”, IEEE/ACM Transactions on Networking, Vol. 23, No. 1, pp. 198-211, Feb. 2015.
6. V. Karyotis, E. Stai, S. Papavassiliou, Evolutionary Dynamics of Complex Communications Networks, CRC Press - Taylor & Francis Group, Boca Raton, FL, 2013.
7. L. Atzori, A. Iera, G. Morabito, M. Nitti, “The Social Internet of Things (SIoT) - When social networks meet the Internet of Things: Concept, Architecture and Network Characterization”, Computer Networks, Elsevier, Vol. 56, No. 16, pp. 3594-3608, 2012.
8. F. Papadopoulos, D. Krioukov, M. Boguñá, A. Vahdat, “Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces”, in Proc. of IEEE INFOCOM, pp. 14-19, March 2010.
9. F. Papadopoulos, M. Kitsak, M. A. Serrano, M. Boguñá, D. Krioukov, “Popularity vs. Similarity in Growing Networks”, Nature, Vol. 489, pp. 537-540, Sept. 2012.
10. F. S. Beckman, D. A. Quarles, “On Isometries of Euclidean Space”, Proc. Amer. Math. Soc., Vol. 4, pp. 810-815, 1953.
11. A. Cvetkovski, M. Crovella, “Hyperbolic Embedding and Routing for Dynamic Graphs”, IEEE INFOCOM, pp. 1647-1655, April 2009.
12. I. Benjamini, Y. Makarychev, “Dimension Reduction for Hyperbolic Space”, American Mathematical Society, Vol. 137, No. 2, pp. 695-698, Feb. 2009.
13. D. A. Tran, K. Vut, “Dimensionality Reduction in Hyperbolic Data Spaces: Bounding Reconstructed-Information Loss”, in Proc. of 7th IEEE/ACIS Int’l Conf. on Computer and Information Science, pp. 133-139, May 2008.
14. R. Lior, O. Maimon, “Clustering methods”, Data Mining and Knowledge Discovery Handbook, Springer US, pp. 321-352, 2005.
15. D. Yan, L. Huang, M. I. Jordan, “Fast Approximate Spectral Clustering”, in Proc. of the 15th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD), Paris, France, 2009.
16. Y. Koren, R. Bell, C. Volinsky, “Matrix Factorization Techniques for Recommender Systems”, Computer, Vol. 42, No. 8, pp. 30-37, August 2009.
17. A. K. Menon, C. Elkan, “Fast Algorithms for Approximating the Singular Value Decomposition”, ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 5, No. 2, Feb. 2011.
18. S.-H. Cha, “Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions”, Int’l Journal of Mathematical Models and Methods in Applied Sciences, Vol. 1, No. 4, pp. 300-307, 2007.
19. L. Lee, “Measures of Distributional Similarity”, 37th Annual Meeting of the Association for Computational Linguistics, pp. 25-32, 1999.
20. M.E.J. Newman, Networks: An Introduction, Oxford, UK: Oxford University Press, 2010.
21. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, 2008.
22. N. Ailon, B. Chazelle, “The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors”, SIAM J. Comput., Vol. 39, No. 1, pp. 302-322, 2009.
23. R. Albert, A.-L. Barabási, “Statistical Mechanics of Complex Networks”, Reviews of Modern Physics, Vol. 74, No. 1, pp. 47-97, Jan. 2002.
24. X. Zhao, A. Sala, H. Zheng, B. Y. Zhao, “Efficient Shortest Paths on Massive Social Graphs”, IEEE Collaborate Communic., pp. 77-86, 2011.
25. Y. Shavitt, T. Tankel, “Hyperbolic Embedding of Internet Graph for Distance Estimation and Overlay Construction”, IEEE/ACM Trans. on Networking, Vol. 16, No. 1, pp. 25-36, Feb. 2008.
26. X. Ban, J. Gao, A. van de Rijt, “Navigation in Real-World Complex Networks through Embedding in Latent Spaces”, ALENEX, pp. 138-148, 2010.
27. S. P. Borgatti, “Centrality and Network Flow”, Social Networks (Elsevier), pp. 55-71, 2004.
28. M. Boguñá, D. Krioukov, K. C. Claffy, “Navigability of Complex Networks”, Nature Physics, Vol. 5, pp. 74-80, 2009.
29. J. Zhang, “Greedy Forwarding for Mobile Social Networks Embedded in Hyperbolic Spaces”, in Proc. of the ACM SIGCOMM, New York, NY, pp. 555-556, 2013.
30. J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez, “Recommender Systems Survey”, Knowledge-Based Systems, Elsevier, Vol. 46, pp. 109-132, April 2013.
31. J. Kleinberg, “The Small-World Phenomenon: an Algorithmic Perspective”, in Proc. of the 32nd Annual ACM Symposium on Theory of Computing (STOC ’00), New York, NY, USA, pp. 163-170, 2000.
32. E. Stai, J. S. Baras, S. Papavassiliou, “Social Networks over Wireless Networks”, in Proc. of the 51st IEEE Conf. on Decision and Control (CDC), Hawaii, Dec. 2012.
33. E. Stai, S. Papavassiliou, J. S. Baras, “Performance-Aware Cross-Layer Design in Wireless Multihop Networks via a Weighted Backpressure Approach”, IEEE/ACM Transactions on Networking, DOI: 10.1109/TNET.2014.2360942, October 2014.
34. E. Stai, S. Papavassiliou, J. S. Baras, “A Coalitional Game Based Approach for Multi-Metric Optimal Routing in Wireless Networks”, in Proc. of the 24th Annual IEEE Int’l Symposium on Personal, Indoor and Mobile Radio Commun. (PIMRC), pp. 1935-1939, London, UK, Sept. 2013.
35. P. Gao, H. Miao, J. S. Baras, “Social Network Ad Allocation via Hyperbolic Embedding”, in Proc. of 53rd IEEE Conference on Decision and Control (CDC), pp. 4875-4880, December 2014.
36. E. Bastug, M. Bennis, M. Debbah, “Living on the Edge: The role of Proactive Caching in 5G Wireless Networks”, IEEE Communications Magazine, Vol. 52, No. 8, pp. 82-89, 2014.
37. Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update 2013-2018”, White Paper, [Online] http://goo.gl/l77HAJ, 2014.
38. F. Pantisano, M. Bennis, W. Saad, M. Debbah, “In-Network Caching and Content Placement in Cooperative Small Cell Networks”, 1st Int’l Conference on 5G for Ubiquitous Connectivity (5GU), 2014.
39. Ericsson, “5G Radio Access-Research and Vision”, White Paper, June 2013.
40. J. Lamping, R. Rao, P. Pirolli, “A Focus+Context Technique Based on Hyperbolic Geometry for Viewing Large Hierarchies”, ACM SIGCHI, pp. 401-408, 1995.
41. U. C. Turker, S. Balcisoy, “A Visualization Technique for Large Temporal Social Network Datasets in Hyperbolic Space”, Journal of Visual Languages and Computing, Vol. 25, pp. 227-242, 2014.
42. H.-C. Lam, I.D. Dinov, “Hyperbolic Wheel: A Novel Hyperbolic Space Graph Viewer for Hierarchical Information Content”, ISRN Computer Graphics, Volume 2012, article ID 609234, 2012.
43. T. Munzner, “H3: Laying out Large Directed Graphs in 3D Hyperbolic Space”, in Proc. of IEEE Symposium on Information Visualization, pp. 2-10, 1997.
44. H. Ritter, “Self-organizing Maps on non-Euclidean Spaces”, in Kohonen Maps, Elsevier, pp. 97-110, 1999.
45. J. Walter, H. Ritter, “On Interactive Visualization of High-Dimensional Data Using the Hyperbolic Plane”, in Proc. of ACM Int’l. Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 123-131, 2002.
46. J. Walter, J. Ontrup, D. Wessling, H. Ritter, “Interactive Visualization and Navigation in Large Data Collections Using the Hyperbolic Space”, in Proc. of the 3rd IEEE International Conference on Data Mining (ICDM), pp. 355-362, Nov. 2003.
47. J. Lamping, R. Rao, “Laying out and Visualizing Large Trees Using a Hyperbolic Space”, in Proc. ACM Symp User Interface Software and Technology, pp. 13-14, 1994.
48. J. Bingham, S. Sudarsanam, “Visualizing Large Hierarchical Clusters in Hyperbolic Space”, Bioinformatics, Vol. 16, No. 7, pp. 660-661, 2000.
49. R. Kleinberg, “Geographic Routing Using Hyperbolic Space”, in Proc. of IEEE INFOCOM, pp. 1902-1909, May 2007.
50. A.-L. Barabási, E. Bonabeau, “Scale-Free Networks”, Scientific American, pp. 50-59, May 2003.
51. D. J. Watts, S. H. Strogatz, “Collective Dynamics of ‘Small-World’ Networks”, Nature, Vol. 393, pp. 440-442, Jun. 1998.
CHAPTER 2

Scalable Query and Analysis for Social Networks: An Integrated High-Level Dataflow System with Pig and Harp

Tak-Lon (Stephen) Wu, Bingjing Zhang, Clayton Davis, Emilio Ferrara, Alessandro Flammini, Filippo Menczer and Judy Qiu

CONTENTS

2.1 Introduction
2.2 Apache High-Level Language, Syntax and its Common Features
2.2.1 Pig
2.2.2 Hive
2.2.3 Spark SQL/Shark
2.3 Pig, Hive and Spark SQL Comparison
2.4 Ad-hoc Queries: Truthy and Twitter Data
2.5 Iterative Scientific Applications
2.5.1 K-means Clustering and PageRank
2.6 Benchmarks
2.6.1 Performance of Ad-hoc Queries
2.6.2 Performance of Data Analysis
2.7 Conclusion
Bibliography

Every day, vast amounts of data are being collected from social network (e.g., Twitter) applications, and in response there is a growing need for analysis methods that can handle this terabyte-size input. To provide an effective and advanced data processing environment for various types of social data analysis such as political discourses, trending topics, evolution of user behavior, social bots detection and orchestrated campaigns, we need to support both query and complex analysis efficiently. Use of high-level scripting languages to solve big data problems has become a mainstream approach for sophisticated data mining and analysis. In particular, high-level interfaces such as Pig, Hive, and Spark SQL are being used on top of the Hadoop framework. This simplifies coding of complex tasks in MapReduce-style systems while improving the flexibility of database systems through user-defined aggregations. In this chapter we will compare different approaches of building high-level dataflow systems and propose an integrated solution with Pig and Harp (a plugin to Hadoop), along with extensive benchmarks. The results show that the Pig and Harp integration for sophisticated iterative applications runs 2 to 10 times faster than a Pig or Hive implementation executed on Hadoop.

2.1 INTRODUCTION

Social media represents a precious data source providing tremendous amounts of streaming information for analytics and research applications.
Many research projects are involved in performing intensive analysis on such data, and the outcome of this analysis is drawing the attention of various applications, including market sales analysts, societal studies (including political polarization [10], congressional elections [14, 13], protest events [12, 11], and the spread of misinformation [47, 37]) and information diffusion [24]. Compared to other problems in computing, social media analysis is “special”; it normally focuses on a subset of data related to a target social event within a specific time frame. To further investigate the inter-relationship of such subsets of data, various sophisticated algorithms and complex data transformations may be applied in a series of stages [19]. Therefore, a programmable solution for social media data must include features like expressiveness, ability for data extraction, reusability and interoperability with different computation runtimes. Apache high-level languages and the Apache Hadoop [1] ecosystem are among the existing building-block solutions that match the requirements for social network analysis.

The use of high-level language platforms is not just limited to social media data. Other fields of research such as workflow provenance [7], network traffic
analysis [26, 23], and geographic data analysis [6] have shown that adopting these solutions boosts and scales up their historical data analysis. However, the complex workflows characterizing existing platforms make it difficult for users to decide what language and low-level runtimes best match their needs. Motivated by these challenges, our goal is to provide a comprehensive survey of these high-level abstractions involving experiments with real social media data examples and common query and analysis applications.

The rest of the chapter is organized as follows. Section 2.2 gives an overview of Apache high-level languages, especially Pig [22], Hive [40] and Spark SQL [2, 44]. The first two build on Hadoop, while Harp [47] and Spark [46] are Apache iterative MapReduce frameworks offering support to complex parallel data systems. Section 2.3 provides a comparison of these languages’ features, especially the important user-defined functions that make MapReduce a simplified and scalable solution. Sections 2.4 and 2.5 introduce applications that are used for benchmarking later in the chapter. Section 2.4 introduces the Truthy project and the types of queries that it needs to run on top of Twitter data, while Section 2.5 discusses three data analytics use-cases and how to express them in high-level languages. Section 2.6 presents the performance evaluation of the applications presented in Sections 2.4 and 2.5, and the technologies of Section 2.2. Section 2.7 draws our conclusions.

2.2 APACHE HIGH-LEVEL LANGUAGE, SYNTAX AND ITS COMMON FEATURES

Programming languages have been developed for more than 50 years. Each language has its own compiler/interpreter and executes a physical plan on top of the low-level (operating) system.
Apache high-level languages share the common features of traditional programming languages; in many cases, a compiler built for such a language supports several fundamental functions and operations: a syntax parser, type and compile time semantic checking, logical plan generator and optimizer, and physical plan generator and executor. Here ANTLR (ANother Tool for Language Recognition) [34] is the general syntax parser for Pig, Hive, and Spark SQL. Each language has its own types and plan generator and optimizer, but all of them use YARN [42] as their resource management tool. The next sections will discuss details of Apache Pig, Apache Hive and Apache Spark SQL.

2.2.1 Pig

Pig is a high-level dataflow system that yields simple data transformations in pipeline for large amounts of semi-structured data stored in Hadoop compatible file storage. Applications such as massive system log analysis and traditional Extract, Transform, and Load (ETL) data processing are performed regularly. Pig was first introduced by Yahoo!, and became one of the most
  • 55. Random documents with unrelated content Scribd suggests to you:
  • 59. The Project Gutenberg eBook of Harry Harding —Messenger 45
  • 60. This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. Title: Harry Harding—Messenger 45 Author: Alfred Raymond Release date: July 15, 2016 [eBook #52578] Most recently updated: October 23, 2024 Language: English Credits: Produced by Donald Cummings and the Online Distributed Proofreading Team at http://www.pgdp.net *** START OF THE PROJECT GUTENBERG EBOOK HARRY HARDING —MESSENGER 45 ***
  • 61. Harry Harding —Messenger “45” By ALFRED RAYMOND Copyright 1917, by CUPPLES LEON COMPANY
CONTENTS
I. A Menace to the School
II. On the Trail of a Job
III. An Anxious Moment
IV. A Surprise and a Disappointment
V. Friends and Foes
VI. At the End of the Day
VII. Teddy Comes Into His Own
VIII. The Recruits to Company A
IX. The Bitterness of Injustice
X. Breakers Ahead for Harry
XI. Teddy Burke Distinguishes Himself
XII. A Disastrous Combat
XIII. The Measure of a Man
XIV. The Price of Honesty
XV. A Fateful Game of Catch
XVI. All in the Day’s Work
XVII. The Singer and the Song
XVIII. Confidences
XIX. The Belated Dawn
XX. Teddy’s Triumph
XXI. Getting Even with the Gobbler
XXII. A Disturbing Conversation
XXIII. Harry Pays His Debt
XXIV. Writing the Welcome Address
XXV. Commencement
CHAPTER I
A MENACE TO THE SCHOOL
“I will drown and no one shall help me,” announced Miss Alton defiantly. The first class in English accepted this remarkable statement in absolute silence, their eyes fixed on their teacher. As she stood high and dry on the platform, facing her class, there seemed little possibility of such a catastrophe overtaking her, therefore, they knitted their wise young brows, not in fear of her demise by drowning, but in puzzled worry over the intricacies of shall and will. “I will drown,” repeated Miss Alton firmly, “and no one——” “Oh-h-h!” a piercing shriek rent the grammar-laden air. As though about to prove her declaration, Miss Alton made a sudden dive off the platform that carried her half-way up an aisle toward the immediate vicinity of that anguished voice. The first class in grammar immediately forgot the uses of shall and will and twisted about on their benches to view their teacher’s hurried progress toward the scene of action. “It’s Teddy Burke,” muttered a boy to his nearest classmate. “Wonder what he’s done.” Miss Alton had now brought up between two seats at the rear of the room. In one of them sat a little girl, her head buried in her arms. Directly opposite her sat a red-haired boy. His thin face wore an expression of deep disgust, but his big black eyes were dancing with mischief. As the teacher approached, he made an ineffectual dive toward a grayish object on the floor. Miss Alton was too quick for him. She stooped, uttered a half-horrified exclamation, then
  • 66. gathered the object in. It was a most terrifying imitation of a snake, made of rubber, and coiled realistically. “Theodore Burke, what does this mean?” she demanded, holding out the snake and glaring at the offender. The little girl raised her head from her arms and eyed the culprit with reproachful horror. “He put it on my seat,” she accused. “I thought it was alive, and it scared me awful.” Her voice rose to a wail on the last word. “This is too much. You’ve gone just a little too far, young man. Come with me.” Miss Alton stood over the red-haired lad, looking like a grim figure of Justice. The boy shot a glance of withering scorn at his tearful victim, then rose from his seat. Grasping him none too gently by the arm, Miss Alton piloted him down the aisle and out of the door. It closed with a resounding bang. A buzz of conversation began in the big schoolroom. Two or three little girls left their seats and gathered about the heroine of the disquieting adventure, while half a dozen boys of the eighth grade of the West Park Grammar School put their heads together to discuss this latest bit of mischief on the part of their leader and idol, Teddy Burke. Meanwhile, Teddy, of the black eyes and Titian hair, was being marched rapidly toward the principal’s office. Miss Alton flung open the door and ushered him into the august presence of Mr. Waldron, the principal, with, “Here is an incorrigible boy, Mr. Waldron.” The principal, a short, stern-faced man, adjusted his eye-glasses and stared hard at Teddy. The boy hung his head, then raising his eyes regarded Mr. Waldron defiantly.
  • 67. “So you are here again, young man, for the third time in two weeks,” thundered the principal. “What has this bad boy done, Miss Alton?” Miss Alton began an indignant recital of Teddy’s latest misdeed. The principal frowned as he listened. When she had finished, he fixed Teddy with severe eyes. “Let me see. The last time you were here it was for interrupting the devotional exercises by putting a piece of ice inside the collar of one of your schoolmates. Aren’t you ashamed of yourself? How would you like to have your schoolmates play upon you the unkind pranks you are so fond of playing upon them?” “I wouldn’t care,” returned the boy, unabashed. “I wouldn’t make a fuss, either.” “Miss Alton is right,” snapped Mr. Waldron, his face reddening angrily at the boy’s retort. “You are, indeed, an incorrigible boy. I think I had better put your case before the Board of Education. There are special schools for bad boys like you. We don’t care to have such a boy among us. You are a menace to the school.” He continued to lecture Teddy sharply, ending with, “Take him back to your room for the day, Miss Alton, but make him remain after the others have gone home this afternoon. By that time I shall have decided what we had better do with him.” Teddy walked down the corridor ahead of Miss Alton with a sinking heart. Was he a menace to the school and could Mr. Waldron really put him in a school for bad boys? He had heard of such schools. He had heard, too, that sometimes the boys came out of them much worse than when they entered. The murmur of voices came to his ears as Miss Alton flung open the door and urged him into the schoolroom. The noise died a sudden death as she stepped over the threshold. “Go to your seat,” she ordered coldly. Teddy obeyed. The little girl, whose shriek had caused his downfall, eyed him with horror. Even in the midst of his troubles he
  • 68. could not resist giving her an impish grin. She promptly made a face at him and looked the other way. The smile vanished from Teddy’s face. Then he folded his hands on his desk and thought busily for the next five minutes. The class resumed its interrupted recitation. Suddenly the boy reached into his desk and began stealthily to take out his belongings. The books belonged to the school, but a pencil box, a knife, a box of marbles, a top, a dilapidated baseball, a magnet and a small, round mirror with which he delighted to cast white shadows on the books of the long-suffering eighth-grade girls, were treasures of his own. Stuffing them into his pockets he replaced the books; then he sat very still. It was almost time for the recess bell to ring. He hardly thought Miss Alton would order him to keep his seat. Such light punishments were not for him. To-night—but there would be no to-night in school for him. When recess came he would go outside and say good-bye to the fellows, then he would start out and hunt a job. He was almost sixteen, and the law said a boy could work when he was fourteen, if he had a certificate. Well, he would get that certificate. His mother would let him go to work if he wanted to. She was so busy with her own affairs she never cared much what he did. If he had a job, then Mr. Waldron couldn’t send him to a reform school. That was the place where incorrigible boys were sent. Teddy did not stop to consider that his mother might prove a match for Miss Alton and Mr. Waldron when it came to a question of her son’s incorrigibility. He thought only of putting himself beyond the reach of the school authorities by his own efforts. The recess bell rang at last and the pupils filed out in orderly rows to the big, grassy yard, at one side of the school building. Teddy was at once surrounded by half a dozen boys, his particular friends. The girls collected in little groups about the yard to comment on Teddy’s iniquity. 
They eyed him askance with curious, aloof glances. The boys, however, were deeply interested in the possible outcome of Teddy’s rash defiance.
  • 69. “You’re goin’ to get fired all right,” was the cheerful prophecy of one boy. “What’ll your mother say?” “She won’t say,” giggled a freckle-faced boy. “She’ll just take Ted across her knee and——” “Well, I guess not,” flung back Teddy. “I’m not going to wait to get fired, either. I’m going to beat it. When the recess bell rings I’m not going in with the rest of you. See here,” Teddy began pulling his various treasured belongings out of his pockets. “I brought all this stuff out to give you fellows. I sha’n’t want it. I’m going down to Martin Brothers’ Department Store and get a job. That’s what I’m going to do. Here’s my looking glass, Sam. Every time you cast a shadow with it, think of me. And you can have my marbles, Bob.” Teddy distributed his belongings rapidly about the little circle. The boys took them with some reluctance. They had far rather have Teddy Burke, ringleader of all their mischief, with them than his belongings. “Aw, why don’t you get your mother to come down here and fix it up with those old cranks?” demanded Sam Marvin regretfully. “It ain’t your stuff we want, Ted. It’s you. What’re we goin’ to do without you?” “Be good,” grinned Teddy. “I’m a menace to the school, you know.” “I wish I was goin’ to work,” said Bob Rayburn sadly. “Pa won’t let me, though.” “Honestly, won’t your mother lick you if she finds out about what happened to-day?” inquired Arthur Post, a tall, thin boy with a solemn face. “Lick nothing,” retorted Ted. “She isn’t going to find out about it. I’m going to tell her myself. She’ll say I can go to work if I feel like it.” His chums eyed him with mingled admiration and regret. To them Teddy was a hero.
  • 70. “There goes the bell. I’ve got to beat it. Don’t any of you start to go in till I get to the corner,” directed Ted. “Then she,” he jerked his thumb in Miss Alton’s direction, “won’t know I’ve skipped until it’s too late. I’ll let you know where I am as soon as I get that job. Good-bye, fellows. Be sure and do what smarty Alton tells you, and don’t go bringing any rubber snakes to school. You can have that one of mine if you can get it away from old Cross-patch.” With an air of gay bravado Teddy raised his hand in a kind of parting salute, then darted down the yard and through the gateway to the street. At the corner he waved his hand again, then swung out of sight, leaving a little knot of boys to gaze regretfully after him and wonder how they could possibly get along without wide-awake, mischievous Teddy Burke.
CHAPTER II
ON THE TRAIL OF A JOB
“I don’t know what we are going to do, Harry, if the cost of living goes any higher.” Mrs. Harding stared across the little center table at her sixteen-year-old son, an expression of deep worry looking out of her patient, brown eyes. “A dollar used to seem like quite a lot of money, but it doesn’t go far these days. I’ve spent every cent I dare this week for groceries, and we’ve still three days to go until I’ll have the money for this dress. I’ve got to sew every minute to get it done. Thank goodness, the rent’s paid for this month. But you must have a new pair of shoes and I don’t know where they are going to come from.” The little woman sighed, then attacked her sewing with fresh energy. “I can’t stop even to complain,” she added bravely. “You’ll just have to let me go to work, Mother.” Harry Harding laid the text-book he was studying on the table and regarded his mother with serious eyes. “But I don’t want to take you out of school, Harry,” she protested. “You are getting along so well. Why, next year you’ll be in high school.” “No, I won’t, Mother. Do you think that a great big boy like me is going to let his mother support him any longer? It’s time I went to work. Besides, I haven’t the money for clothes and books and all the other things high school fellows have to have. I’m past sixteen. Lots of boys have to go to work when they’re only fourteen. I guess it won’t hurt me any to begin now.” “But I want you to have an education, Harry. If your father had lived, he intended to let you go through high school and then to
  • 72. college.” Mrs. Harding’s voice trembled a little. The sudden death of her husband two years previous had been a shock from which she had never quite recovered. It was hard for her even to mention his name without shedding tears. “I’ll get an education, somehow, and work, too,” returned Harry confidently. “There are night schools where a fellow can go and learn things. Please let me quit school to-morrow and try,” he pleaded. “I can’t earn much at first, but even three dollars a week’ll help some. I’ve got to start some time, you know. If you won’t let me go to work I could sell papers after school.” “No, you couldn’t,” retorted his mother with decision. “I’d rather have you leave school than see you racing around the city streets selling papers. That’s one thing you sha’n’t do.” “Then let me go and hunt a job,” begged the boy. “I’ll think it over. Now go on studying your lesson and don’t tease me any more about it.” Harry took up his book obediently enough. His frequent pleading to leave school to go to work had always been promptly vetoed by his mother. She had struggled desperately to keep her son in school and was willing to go on with the struggle. It was Harry himself who had repeatedly begged her to allow him to take his place in the work-a-day world. She could never quite bring herself to the point of consenting to the boy’s plea. But, to-night, as she thought darkly of their poverty and of their continual fight against actual want she was nearer consent than she had ever been before. Perhaps Harry felt this, for it was not long until the book went down on the table again. “Do say you’ll let me try, Mother,” he implored earnestly. “You don’t know how much it means to me. It isn’t as if I’d stop trying to learn things as soon as I started to work. I’d study harder than ever. Just think how much the money would help us after I’d been working awhile. 
Why, some of the greatest men that ever lived had to quit school and go to work when they were lots younger than I. Benjamin Franklin did, and so did Abraham
  • 73. Lincoln. Just yesterday the teacher read us a story of how Lincoln earned his first dollar when he was a boy.” Mrs. Harding looked wistfully at her son’s eager face. “My little son, do you want to help mother so much?” she asked tenderly. Her voice trembled a little. “You know I do. Oh, Mother, may I try? Are you going to say ‘yes’ at last?” Harry sprang from his chair and going to his mother’s chair slipped his arm around her neck. “Well,” began the little woman reluctantly, “if you are so set on working, I guess you might as well try it. But remember, Harry, if you don’t like it, you can go back to school. We’ll get along some way.” “But I shall like it,” protested Harry. “I’ve always said I was going to be a business man when I grew up. If I start right now maybe I’ll be one in a few years.” “But where are you going to look for work, child?” asked Mrs. Harding. Now that she had given her son the longed-for sanction to make his own way, she began to feel something of his boyish enthusiasm. “I don’t know,” returned Harry thoughtfully. Then, seized with a sudden inspiration, “I guess I’ll look in the Journal. That always has a lot of advertisements.” Picking up the evening paper, which lay on the center table, Harry turned its leaves to the column of “Male Help Wanted,” and scanned it earnestly. “Here’s one, Mother. ‘Boy wanted for errands, good chance for advancement. Opportunity to learn business. 894 Tyler.’ That sounds good.” Taking the stub of a lead pencil from his pocket, Harry carefully marked it. “Oh, here’s another. ‘Bright boy for office work. 1684 Cameron.’” This advertisement was duly checked. Harry went eagerly down the column until he had marked six advertisements. “There, that will do to start with. If I don’t get a position at any of those places I’ll try again when to-morrow’s paper comes out. But surely some of them will have a chance for me. It’s
  • 74. nine o’clock. I guess I’ll go to bed right now, so as to be up bright and early in the morning.” Piling his books on one arm, Harry went over to his mother and kissed her good night. “You must keep thinking hard that I’m going to get one of those positions, Mother,” he said brightly. Then he went into the tiny room that was really half of his mother’s room, curtained off for his use. Harry was very proud of his little room. It was so small it held nothing but his cot bed, one chair, a small table and a bamboo book-case of two shelves, which he had bought in a second-hand store for a quarter. This held the few books he owned and was dear to his heart. After he had undressed and lay down on his bed he found that he was too much excited over the prospect of his new venture to sleep. Already he could see himself in a beautiful office, with soft rugs on the floor and shining oak furniture. He could imagine himself saying, “Yes, sir,” and “no, sir,” to his employer, and listening with alert respectfulness to his orders. He would prove himself so willing to work and perform whatever he was given to do so faithfully that in time he would be promoted to something better. His favorite story- book hero, Dick Reynolds, had begun work as an office boy and had done wonderful things. Why couldn’t the same things happen again to him? When at ten o’clock his mother stole into the room, as was her nightly custom before going to bed, for a last look at her son, she saw two bright, wide-awake eyes peering at her. “This will never do, little man,” she said, patting his cheek. “You must go to sleep, if you are anxious to be up early to-morrow morning.” “I’ll try, Mother,” sighed Harry, “but I just can’t help thinking about it.” After his mother had kissed him again and gone to her own room, Harry shut his eyes tightly and resolved to go to sleep. When finally the sandman did visit him, he dreamed that he was Dick Reynolds and had secured a position in a bank. 
He was the president’s office
  • 75. boy, and the president had sent him to the City Hall with a bag full of bank notes. He ran all the way from the bank to the Hall and was just going in the door when two boys leaped out from behind it and tried to take the bag away from him. He fought like a tiger, but he had to hang on to the bag with one hand while he knocked down the thieves with the other. As fast as he knocked them down they bobbed up again. Finally, one of them hit him over the head with an arithmetic. It was his own book. He recognized it by the green paper cover he had put on it. He wondered as he fought how the boy happened to have his arithmetic. Then the other boy suddenly took a long coil of rope from under his coat and lassoed him. He felt himself falling, falling. He struck the pavement with a terrible crash. Then—— “Why, Harry, what is the matter?” The City Hall, the money bag, even the robbers had faded away, and Harry found himself sitting on the bare floor, blinking up at his mother, who bent anxiously over him. “I guess I must have been asleep, Mother, and fell out of bed.” Harry eyed his mother sheepishly. “I dreamed I had a job in a bank and was fighting two fellows who tried to take a whole lot of money away from me. What time is it?” “It’s ten minutes to twelve. Now, go straight to sleep, or I won’t call you early.” Harry obediently climbed back into bed and was not heard from again that night. It seemed to him as though he had hardly gone to sleep before he heard his mother calling, “Six o’clock, Harry.” The boy was out of bed in an instant. He pattered to the window, rubbing the sleep out of his eyes as he went. The light of a perfect day in early October shone in as he raised the shade. If good weather were a happy omen, then surely he would obtain that which he was going forth so earnestly to seek. His mother had taken special pains with his breakfast that morning, and though he was quivering with excitement over what
  • 76. was to be his first venture into the busy world of trade, he tried to show his appreciation of her tender thoughtfulness by eating a hearty meal. In his neat, blue serge suit, he had put on his Sunday best, his well-shined shoes and his clean, white shirt with its immaculate collar, he was above reproach as far as attire went, and his bright, boyish face with its clear, blue eyes and clean-cut, resolute mouth made him a boy to be proud of. So his mother thought as she looked approvingly at him across the table. She stifled the sigh of regret that her boy must so early take his place among the bread-winners, and listened to his eager plan of what he intended to do with an encouraging smile. “Well, Mother, I’m off. That was a dandy breakfast. You know what I like, don’t you. I wish all the boys in the world had mothers like you. I don’t know when I’ll be back. If I don’t come home all day, you’ll know I’m working.” Reaching to the nail where he always hung his cap, Harry stood for an instant with it in his hand. Then he kissed his mother and went manfully down the two flights of stairs to the street. He had clipped from the paper the section of the want column with the advertisements he had marked. Now he studied it earnestly and set out for the Tyler Street address. It was at least fifteen squares from his home, but the clock on a nearby church had just chimed out the hour of seven. In his pocket reposed twenty cents in small change. He had earned it by doing errands after school. But he made up his mind that not a penny of it should go for carfare if he could help it. He had plenty of time to walk. He would very likely reach the place he had selected for his first call before the office was open. He wondered what sort of building it would be, and whether it was an office building or a factory. More than one person glanced in friendly fashion at the erect, manly lad as he hurried along. 
There was something in his earnest young face that commanded attention and instant approbation. “There it is,” he murmured as, after a half-hour’s brisk walk he came opposite a tall rather dingy-looking brick building. “That must
  • 77. be the office over there where the sign is hanging out.” Hurrying across the street the boy approached the door over which hung the sign, “The Knickerbocker Worsted Mills.” He read it aloud, then looked a trifle disappointed. This did not exactly accord with his ideas of a position. Then he laughed at his own mental hesitation. “What do you care if it is a mill office, Harry Harding,” he murmured. “It’s work you’re looking for, and you can’t expect to have everything just the way you want it.” Turning the knob on the door that bore a small sign of “Office,” the boy opened it and stepped inside a long room that had the shining oak furniture of his dreams. This room was divided off into many compartments by little oak fences with swinging gates. Near the door, at a little desk, sat a boy of about his own age. As he stepped into the room the boy rose to meet him. “Whada yuh want?” he asked superciliously. “Good morning,” said Harry politely. “I came in answer to your advertisement in the Journal for a boy. To whom do I go?” “Yuh don’t go unless I let yuh in,” declared the boy ill-naturedly. “Anyway, the position’s filled. The boss just hired a boy about ten minutes ago. That’s him over there.” He pointed to a black-haired lad, who had just emerged from a room adjoining the long office. “That’s the kid. Yuh better beat it. Nothin’ doin’ around here.” “Can’t I see the manager or—or—someone?” persisted Harry. “Naw, yuh can’t. Think I wanta get my head snapped off by buttin’ in where Mr. Warner’s openin’ his mail? Guess I know my business. Didn’t the boss just say, ‘Fred, if any more boys come here answerin’ our ad, tell ’em we’ve hired a boy?’ There’s nothin’ doin’, I tell yuh. Can’t yuh understand that?” “Yes, I can understand that,” retorted Harry with spirit. “What I can’t understand is how a big firm like this happens to have such a rude office boy. Good morning.”
  • 78. Harry walked away, his cheeks burning, eyes snapping, leaving the disagreeable boy to gaze after him in positive astonishment. Once outside the office, Harry paused and taking out the section of newspaper he had marked, scanned it earnestly. The next nearest place he had selected was at least a mile and a half from where he stood. It was twenty minutes to eight o’clock. “I guess I’d better ride,” mused Harry. “The earlier I reach a place, the better my chance will be to get something to do. I hope all the places won’t be like that mill. Why, I didn’t have a chance to talk to a soul except that smart office boy.” When, at a few minutes after eight o’clock, Harry climbed the steps of an imposing building of white stone, and was waved to a door on the right by a uniformed attendant, he entered a good-sized ante-room, only to find it filled with boys of anywhere from fourteen to eighteen years of age. They were not making so much noise as one might expect at least fifteen active boys to make, yet a distinct buzz of conversation was going on. Harry paused irresolutely. His eyes met those of a thin, red-haired, black-eyed boy with a mischievous face who stood just to the right of the door. The black-eyed boy grinned in friendly fashion. “Hullo,” he said. “Good-morning,” returned Harry, answering the grin with a pleasant smile. “Are all these boys looking for the same position?” “Yep,” nodded the black-eyed boy. “I guess the fellow that’s in the office now is going to get it. He’s been there quite a while.” He had hardly finished speaking when the door to the inner office opened and a tall, severe-looking man appeared. “We won’t need you, boys,” he said curtly. “The position is filled.” He waved his arm as though to shoo the waiting throng of lads out of the ante-room, then disappeared. The door closed after him with a reverberating bang that shattered the hopes of the fifteen waiting youngsters. 
“Huh,” ejaculated the black-eyed boy in disgust, “no more offices like this for me. I’ve been to two before this, and every time I’m too
  • 79. late. I guess these fellows that get the jobs get up in the middle of the night. Me for Martin’s Department Store. That’s where I ought to have gone in the first place.” “Do they need boys there?” asked Harry. He had walked beside his new acquaintance as far as the door. Here they paused. The attendant eyed them threateningly. “I hope so. Come on. Let’s get out of here. That man in the uniform will hurt his eyes tryin’ to look a hole through us.” The thin little boy urged Harry out of the building and down the steps to the street. “Say, what’s your name?” he asked curiously. “Harry Harding. What is yours?” “My name’s Theodore Burke, but everybody calls me Ted or Teddy, and I just quit school to find a job.” “I haven’t quit yet,” declared Harry, “but I’m going to as soon as I find work.” “Then you didn’t get fired?” “Oh, no. I am going to work to help my mother. I am obliged to find work.” “I had a fight with the teacher,” related Teddy, with unabashed candor. “She said I was a menace to the West Park School, and she was going to have me put in a school for tough kids. So I gave the fellows my stuff and beat it at recess. Ma was mad, but she got over it right away and said I could go to work if I wanted to.” “The teacher couldn’t put you in a school for tough boys, unless you did something pretty bad,” informed Harry. “I put a rubber snake in a girl’s seat,” confessed Ted, “and she hollered like anything.” His black eyes twinkled. Harry laughed. “Nobody could put you in a reform school for that,” he said wisely. “The teacher was trying to scare you. I guess you’re just full of mischief, that’s all.”