High-Performance Computing on the Intel® Xeon Phi™
How to Fully Exploit MIC Architectures

Endong Wang · Qing Zhang · Bo Shen
Guangyong Zhang · Xiaowei Lu
Qing Wu · Yajuan Wang
Endong Wang
Qing Zhang
Bo Shen
Guangyong Zhang
Xiaowei Lu
Qing Wu
Yajuan Wang
Inspur, Beijing, China
ISBN 978-3-319-06485-7 ISBN 978-3-319-06486-4 (eBook)
DOI 10.1007/978-3-319-06486-4
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014943522
© Springer International Publishing Switzerland 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts
in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being
entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication
of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from
Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center.
Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Copyright © 2012 by China Water & Power Press, Beijing, China
Title of the Chinese original: MIC 高性能计算编程指南
ISBN: 978-7-5170-0338-0
All rights reserved
Translators
Dave Yuen
Chuck Li
Larry Zheng
Sergei Zhang
Caroline Qian
Foreword by Dr. Rajeeb Hazra
Today high-performance computing (HPC), especially on the latest massively parallel supercomputers, is developing quickly in both computing capacity and capability. These developments are due to several innovations. The first is Moore's law, named after Intel co-founder Gordon Moore, which predicts that the number of transistors on a chip will double every 18–24 months. Following Moore's law, Intel continues to improve performance and shrink the size of transistors while also reducing power consumption, all at the same time. Another innovation is a series of CPU microarchitecture improvements, which raise single-thread performance alongside the parallelism available in each successive CPU generation.
The development of HPC plays an important role in society. Although people are inclined to pay more attention to great scientific achievements such as the search for the Higgs boson or cosmological models of cosmic expansion, the computing capability that everyone can now acquire is also impressive. A modern two-socket workstation based on Intel® Xeon series processors can match the performance of the top supercomputer from 15 years ago. In 1997, the fastest supercomputer in the world was ASCI Red, the first computing system to achieve over 1.0 teraflops. It had 9,298 Intel Pentium Pro processors and cost $55,000,000 per teraflop. By 2011, the cost per teraflop had been reduced to less than $1,000. This reduction in cost makes high-performance computing accessible to a much larger group of researchers.
To make full use of ever-improving CPU performance, the application itself must take advantage of the parallelism of today's microprocessors. Maximizing application performance involves much more than simply tuning the code. Modern parallel applications exploit many nested levels of parallelism, from message passing among processes down to the threading parameters within a node. On Intel CPUs, gains of more than ten times in performance can often be achieved by exploiting the CPU's parallelism.
The new Intel® Xeon Phi™ coprocessor is built on the same parallel programming principles as the Intel® Xeon processor. It integrates many low-power cores, and every core contains a 512-bit SIMD processing unit and many new vector instructions. The chip is also optimized for performance per watt. With a computing capability of over a teraflop (a trillion floating-point operations per second), the Intel® Xeon Phi™ delivers a supercomputer on a chip. This brand-new microarchitecture delivers ground-breaking performance per watt, but the delivered performance also relies on applications being sufficiently parallelized and scaled to utilize many cores, threads, and vectors. Intel took a new approach to unlocking this parallelism: it kept to the common programming languages (including C, C++, and Fortran) and to current standards. When readers and developers learn how to optimize with these languages, they are not forced to adopt nonstandard or hardware-dependent programming models. Furthermore, this standards-based approach maximizes code reuse, and it pays off by allowing portable, standardized code to be compiled for present and future compatible parallel platforms.
In 2011, Intel established a parallel computing lab with Inspur in Beijing. This new lab provided early access to Intel® Xeon processors and Intel® Xeon Phi™ coprocessors, along with a development environment, to the Inspur Group and to selected application developers. Much of the programming experience gained there can be found in this book. We hope to help developers produce more scientific discoveries and inventions, and to help the world find cleaner energy, make more accurate weather forecasts, cure diseases, develop more secure currency systems, and market products more effectively.
We hope you enjoy and learn from this book, the first ever published on how to use the Intel® Xeon Phi™ coprocessor.
Santa Clara, CA Rajeeb Hazra
Foreword by Prof. Dr. Rainer Spurzem
Textbooks and teaching material for my Chinese students are often written in English, and sometimes we try to find or produce a Chinese translation. In the case of this textbook on high-performance computing with the Intel MIC, we have a remarkable example of the opposite: the Chinese original appeared first, written by Chinese authors from Inspur Inc. led by Wang Endong, and only some time later can all of us English speakers enjoy and benefit from its valuable contents. This did not happen by chance: twice in the past several years a Chinese supercomputer has been certified as the fastest in the world by the official Top500 list (http://www.top500.org). Both times Chinese computational scientists found a special, innovative way to get to the top, once with NVIDIA GPU accelerators (Tianhe-1A in 2010) and now with Tianhe-2, which realizes its computing power through an enormous number of Intel Xeon Phi coprocessors, the topic of this book. China is ascending in supercomputing usage and technology at a much faster pace than the rest of the world. The new Intel Xeon Phi hardware, using the Intel MIC architecture, has had its first massive installation in China, and it has the potential to bring about yet another supercomputing revolution in the near future. The first revolution, in my opinion, was the transition from traditional mainframe supercomputers to Beowulf PC clusters, and the second was the acceleration and parallelization of computation by general-purpose computing on graphics processing units (GPGPU). Now the stage is open for, possibly, another revolution with the advent of the Intel MIC architecture. The past accelerator revolutions were a huge qualitative step toward a better price/performance ratio and better use of energy per floating-point operation. In some ways they democratized supercomputing by making it possible for small teams or institutes to assemble supercomputers from off-the-shelf components, and later (with GPGPU) even to obtain massively parallel computing in a single desktop. The impact of Intel Xeon Phi and Intel MIC
on the market and on scientific supercomputing has yet to be seen. However,
already a few things can be anticipated; and let me add that I write this from the
perspective of a current heavy user and provider of GPGPU capacity and capability.
GPGPU architecture, while it provides outstanding performance for a fair range of
applications, is still not as common as expected a few years ago. Intel MIC, if it
fulfills the promise of top-class performance together with compatibility to a couple
of standard programming paradigms (such as OpenMP as it works on standard Intel
CPUs, or MPI as it works on standard parallel computers) may quickly find a much
larger user community than GPU. I hope very much that this very fine book can help
students, staff, and faculty all over the world in achieving better results when
implementing and accelerating their tasks on this interesting new piece of hard-
ware, which will for sure appear on desktops, in institutional facilities, as well as in
numerous future supercomputers.
Beijing, China Rainer Spurzem
Foreword by Endong Wang
Currently, scientists and engineers everywhere are relentlessly seeking more computing power. High-performance computing capability has become an arena of competition among the most powerful countries in the world. After the "million millions flops" competition ended, the "trillion flops" contests began. Semiconductor technology limits processor frequency, so multi-processor and many-core processors have become more and more important. When the various kinds of many-core products came out, we found that although peak computing performance had increased greatly, application compatibility became worse and application development became more complicated. A lack of useful applications would render a supercomputer useless.
At the end of 2012, Intel Corporation brought out the Intel® Xeon Phi™ coprocessor, based on the Many Integrated Core (MIC) architecture. This product integrates more than 50 x86-based cores on one PCI Express interface card. It is a powerful supplement to the Intel® Xeon CPU and brings a new level of performance to highly parallelized workloads. It is easy to program, with almost no difference from traditional programming: code written for the Intel® Xeon Phi™ coprocessor can run on a traditional CPU-based platform without any modifications, which protects the user's software investment. It supplies hundreds of hardware threads, providing the high degree of parallelism that today's highly parallel workloads demand.
The Inspur-Intel China Parallel Computing Joint Lab was founded on August 24, 2011. The lab aims to promote innovation in trillion-flops supercomputing system architecture and applications, build the ecosystem for high-performance computing, and accelerate China's entry into the trillion-flops era of supercomputing. The research and innovation in the Inspur-Intel China Parallel Computing Joint Lab will have a positive impact on the development of supercomputing in China over the next ten years, especially as the trillion-flops era begins for the rest of the world. The lab contributed to the completion of the Intel® Xeon Phi™ coprocessor and made a tremendous effort to popularize it.
This book was written by several dedicated members of the Inspur-Intel China Parallel Computing Joint Lab. It introduces the relevant background on the Intel® Xeon Phi™ coprocessor, programming methods for it, program optimization, and two successful cases of applying the Intel® Xeon Phi™ coprocessor to practical high-performance computing. The book has a clear structure and is easy to understand, covering programming fundamentals, optimization, and specific development projects. Many figures, diagrams, and code segments are included to help readers understand the material. The authors have plenty of project experience and have added practical summaries of these projects, so the book not only introduces the theory but also connects closely to actual programming. It is also the first book to introduce the Intel® Xeon Phi™ coprocessor, and it embodies the achievement of its authors. We hope to see China accumulate great experience in the field of HPC. The authors and the members of the Inspur-Intel China Parallel Computing Joint Lab made great efforts to ensure that the book's publication coincided with the release of the Intel® Xeon Phi™ coprocessor, and they deserve respect for this.
We hope that readers will quickly grasp the full use of the Intel® Xeon Phi™ coprocessor after reading this book, and will achieve results in their own fields of HPC application by making use of it. The Inspur Group hopes to dedicate itself to HPC endeavors together with Intel Corporation.
Beijing, China Endong Wang
Preface
High-performance computing (HPC) is a relatively recent branch of computer science, and now of computational science. HPC can strengthen a country's might, improve its national defense science, and promote the rapid development of highly sophisticated weapons; it is one of the most important measures of a country's overall prowess and economic strength. With the rapid growth of the information society, people demand ever more powerful information-processing capability. HPC is used not only for oil exploration, weather prediction, space technology, national defense, and scientific research, but also in finance, government, education, business, online games, and other fields that demand more computing capability. The drive to reach the goal of "trillion flops" computing has begun, and people look forward to solving larger-scale and more complicated problems with trillion-flops supercomputers.
In this century, the Many Integrated Core (MIC) era has finally arrived. Today the HPC industry is going through a revolution, and parallel computing, a prominent hot spot of scientific research, will be the trend of the future. Mainstream systems have adopted a homogeneous CPU architecture in which dozens of cores per node is not unusual, and large-scale computing requires thousands of cores. Meanwhile, the homogeneous CPU architecture faces huge challenges because of its low performance-to-power ratio, low ratio of performance to memory access, and low parallel efficiency. In CPU+GPU heterogeneous computing, GPU acceleration technology is used; more and more developers have dedicated themselves to this field, but it also faces challenges such as fine-grained parallel algorithms, programming efficiency, and performance at large scale. This book focuses on the central issues of how to improve the efficiency of large-scale computing, how to shorten programming cycles while increasing software productivity, and how to reduce power consumption.
Intel Corporation introduced the Intel® Xeon Phi™ series of products, based on the MIC architecture, to solve highly parallel problems. The double-precision performance of this product has reached teraflop levels. It is based on the existing x86 architecture and supports OpenMP, pthreads, MPI, and many other parallel programming models. It also supports traditional C/C++, Intel® Cilk™ Plus, Fortran, and many other programming languages. It is easy to program, and many associated tools are supported. For applications that are hard to accelerate on a traditional CPU platform, the MIC platform can greatly improve performance, and the source code can be shared by the CPU and MIC platforms without any modifications. The combination of CPU and MIC on the x86 platform for heterogeneous computing provides HPC users with a new supercomputing solution.
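As a small illustration of this programming style (our own sketch, not an example taken from the book; the offload syntax is covered properly in Chap. 5), the following C fragment assumes the Intel compiler's offload pragma together with OpenMP. The marked block runs on the Intel® Xeon Phi™ coprocessor when one is present, and the same source can also be built to run entirely on the CPU:

#include <stdio.h>

#define N 1024

/* Mark the arrays so they are also available on the coprocessor. */
__attribute__((target(mic))) static float a[N], b[N], c[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    /* Copy a and b in, run the loop on the coprocessor, copy c back. */
    #pragma offload target(mic) in(a, b) out(c)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    printf("c[100] = %f\n", c[100]);
    return 0;
}

Compiled with the Intel compiler (e.g., icc -openmp), the same file also builds as a pure CPU program when offloading is disabled, which is the sense in which source code is shared between the two platforms.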
Since the Inspur-Intel China Parallel Computing Joint Lab was established on August 24, 2011, its members have dedicated themselves to HPC application programs on the MIC platform and have helped ensure that the Intel® Xeon Phi™ series of products was released smoothly. We have accumulated a large amount of experience in exploring the software and hardware of MIC. It is a great honor for us, as pioneers, to participate in this technological revolution in HPC and to introduce this book to readers. We hope more readers will make use of MIC technology and enjoy the benefits brought by the Intel® Xeon Phi™ series of products.
Target Audience
The basic aim of this book is to help developers learn how to use the Intel® Xeon Phi™ series of products efficiently, so that they can develop, port, and optimize their parallel programs. The book introduces the relevant programming syntax, programming techniques, and optimization methods for the MIC, and we also offer solutions, based on our optimization experience, to problems encountered during actual use.
We assume that readers already have some basic skills in parallel programming but little knowledge of MIC. This book does not intend to introduce the theory of parallel computing or algorithms, so we assume that readers already have this knowledge; nevertheless, when a parallel algorithm comes up, we still describe it in a simple way. We assume that readers are familiar with OpenMP, MPI, and other parallel models, but we also cover the basic syntax. We assume that readers can use any one of the C/C++/Fortran languages, with C/C++ preferred; however, the ideas and advice stated in this book also apply to other high-level languages. Moreover, when the Intel® Xeon Phi™ series of products supports other languages in the future, most of the optimization methods and application experience will still be effective. Generally speaking, this book is for three types of computing-oriented people:
Students, professional scientists, and engineers in colleges, universities, and research institutes, as well as developers engaged in parallel computing, multi-core, and many-core technology.
IT employees, especially those who develop HPC software, improve application performance through many-core processors, and pursue extreme performance in the HPC field.
HPC users in other fields, including oil exploration, biological genetics, medical imaging, finance, aerospace, meteorology, and materials chemistry. We hope to help them improve the performance of their existing CPU applications by means of MIC and ultimately increase productivity.
We wish to benefit more readers with this book. In the future we also hope to
engage more and more readers around the world.
About This Book
Because of the diverse characteristics of the MIC architecture, this book cannot be divided strictly into well-defined sections. It introduces MIC programming and the Intel® Xeon Phi™ series of products, and it also describes optimization for parallel programming. Through this book, we hope readers will fully understand MIC, and we expect them to make good use of MIC technology in future practice.
This book has three parts. The first covers MIC basics and includes Chaps. 1–7, in which fundamental knowledge about MIC technology is introduced.
In Chap. 1, the development of parallel computing is recalled briefly. The current
hardware characteristics of parallel computing are compared. Then MIC tech-
nology is introduced, and the advantages of MIC are stated.
In Chap. 2, the hardware and software architecture of the MIC is introduced. Although this background knowledge is not strictly required for programming the MIC, exploring the MIC architecture in depth will help make programs better adapted to it.
In Chap. 3, the characteristics of MIC programming are demonstrated directly to readers by computing the constant π. In addition, we introduce what happens behind the scenes when the program runs.
In Chap. 4, the background knowledge of MIC programming is discussed, including
the basic grammar of OpenMP and MPI. If you have had this basic training, you
can skip this chapter altogether.
In Chap. 5, the programming models, syntax, environment variables, and compilation options of MIC are introduced. You should be able to grasp how to write your own MIC program after this chapter.
In Chap. 6, some debugging and optimization tools and their usage are introduced.
These tools bring a great deal of convenience to debugging and optimization.
In Chap. 7, some Intel mathematical libraries that have been adapted to the MIC are discussed, including VML, FFT, and BLAS.
The second section covers performance optimization, and comprises Chaps. 8
and 9.
In Chap. 8, the basic principles and strategies of MIC optimization are introduced, followed by the methods and the circumstances in which they apply. The general methods of MIC optimization are covered; moreover, most of them are also applicable to the CPU platform, with a few exceptions.
In Chap. 9, the optimization measures are presented step by step through a classic example in parallel computing, the optimization of matrix multiplication, integrating theory with practice.
The third and last section covers project development, and includes Chaps. 10
and 11.
In Chap. 10, we propose a set of methods for applying parallel computing to real projects, summarizing our experience in developing and optimizing our own projects. We also discuss how to determine whether a serial or parallel CPU program is suitable for MIC, and how to port such a program to MIC.
In Chap. 11, we show, through two actual cases, how MIC technology is applied in real projects.
In the early stages, this book was initiated by Endong Wang, director of the State Key Laboratory of high-efficiency server and storage technology, director of the Inspur-Intel China Parallel Computing Joint Lab, and senior vice president of the Inspur Group Co., Ltd. Qing Zhang, the lead engineer of the Inspur-Intel China Parallel
Computing Joint Lab, formulated the plan, outline, structure, and content of every
chapter. Then, in the middle stage, Qing Zhang organized and led the team for this
book, checking and approving it regularly. He examined and verified the accuracy of
the content, the depth of the technology stated, and the readability of this book, and
gave feedback for revisions. This book was actually written by five engineers in the
Inspur-Intel China Parallel Computing Joint Lab: Bo Shen, Guangyong Zhang,
Xiaowei Lu, Qing Wu, and Yajuan Wang. The first chapter was written by Bo
Shen. The second chapter was written by Qing Wu and Bo Shen. The third through
fifth chapters were written by Bo Shen, and Yajuan Wang participated. The sixth
chapter was written by Qing Wu. The seventh chapter was written by Xiaowei
Lu. The eighth chapter was written by Guangyong Zhang, and Bo Shen and Yajuan
Wang participated. The ninth chapter was written by Guangyong Zhang. The tenth
chapter was written by Bo Shen. The eleventh chapter was written by Xiaowei Lu
and Guangyong Zhang. In the later stage, this book was finally approved by Endong
Wang, Qing Zhang, Dr. Warren from Intel, and Dr. Victor Lee.
The whole source code has been tested by the authors of this book, but because MIC technology is still in its initial stage, we cannot ensure that the code will work with the latest releases. Hence, if any updates come out for the MIC compiler or execution environment, please consult the corresponding Intel manuals.
Acknowledgments
The publication of this book is the result of group cooperation. We would like to
show our respect to the people who gave their full support to the composition and
publication.
We must express our heartfelt thanks to Inspur Group and Intel Corporation,
who gave us such a good platform and offered the working opportunity in the
Inspur-Intel China Parallel Computing Joint Lab. We are fortunate to be able to do
research on MIC technology.
We are grateful for the support of the leadership of Inspur Group, especially to
the director of the HPC Center, Inspur Group, Jun Liu, who supplied us with
financial support and solicitude.
We are grateful to Michael Casscles, Dr. Wanqing He, Hongchang Guo, Dr. David Scott, Xiaoping Duan, and Dr. Victor Lee for their technical support and resources for our daily work in the parallel computing joint lab. We especially cannot forget Wanqing, who supplied us with plenty of guidance from his experience before we wrote this book. We are also grateful to Dr. Raj Hazra, GM of Technical Computing at Intel Corporation, and Joe Curley, MD of Technical Computing at Intel Corporation, for their support of the Inspur-Intel China Parallel Computing Joint Lab.
We are grateful to our application users: BGP Inc., China National Petroleum
Corp, Institute of Biophysics, Chinese Academy of Sciences, Northwestern
Polytechnical University, Chinese Academy of Meteorological Sciences, and
Shandong University—especially Prof. Fei Sun and Dr. Kai Zhang from the
Institute of Biophysics Chinese Academy of Sciences—and Profs. Chengwen
Zhong and Qinjian Li from Northwestern Polytechnical University. The cases in
this book come from them.
We are grateful to Inspur Group and Intel Corporation for their support, espe-
cially the managers Yongchang Jiang and Ying Zhang from the High-Efficiency
Server Department, Inspur Group, who were able to save us a great deal of time.
We thank very much Dr. Haibo Xie and Xiaozhe Yang; we are unable to forget
this pleasant time.
We are grateful to the families of the authors for their consideration and
patience.
We thank the editors from China Water & Power Press, especially the editors Chunyuan Zhou and Yan Li, for their tolerance of our demands. This book could not possibly have been published without their hard work.
We are very grateful for the English translation made by Professor David A. Yuen and his team from the University of Minnesota, Twin Cities, and the China University of Geosciences, Wuhan, consisting of Qiang (Chuck) Li, Liang (Larry Beng) Zheng, Siqi (Sergei) Zhang, and Caroline Qian. Jed Brown and Karli Rupp from Argonne National Laboratory also gave very useful advice, and finally, Prof. Xiaowen Chu and Dr. Kayiyong Zhao from Hong Kong Baptist University are to be thanked for their help in proofreading the last few chapters.
Lastly, we are grateful to all the others whom we have not acknowledged.
MIC technology has just come out, so there are undoubtedly some mistakes to be found in this book. We apologize for this and look forward to any suggestions from our readers. This is the first book ever written in any language on MIC technology; it was published in the fall of 2012, and is to be contrasted with the newer books coming out of the USA in 2013 bearing the name of the Intel Xeon Phi coprocessor.
Beijing, China Qing Zhang
Contents
Part I Fundamental Concepts of MIC
1 High-Performance Computing with MIC . . . . . . . . . . . . . . . . . . . . 3
1.1 A History of the Development of Multi-core and Many-Core
Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 An Introduction to MIC Technology . . . . . . . . . . . . . . . . . . . . 7
1.3 Why Does One Choose MIC? . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 SMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.3 GPGPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 MIC Hardware and Software Architecture . . . . . . . . . . . . . . . . . . 13
2.1 MIC Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.2 Overview of MIC Hardware Architecture . . . . . . . . . . 14
2.1.3 The MIC Core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.4 Ring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.5 Clock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.6 Page Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1.7 System Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.1.8 Performance Monitoring Unit and Event Manager . . . . 38
2.1.9 Power Management . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2 Software Architecture of MIC . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.2 Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2.3 Linux Loader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2.4 μOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2.5 Symmetric Communication Interface . . . . . . . . . . . . . 45
2.2.6 Host Driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2.7 Sysfs Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.8 MIC Software Stack of MPI Applications . . . . . . . . . . 49
2.2.9 Application Programming Interfaces . . . . . . . . . . . . . . 56
3 The First MIC Example: Computing Π . . . . . . . . . . . . . . . . . . . . . 57
4 Fundamentals of OpenMP and MPI Programming . . . . . . . . . . . . 61
4.1 OpenMP Foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.1.1 A Brief Introduction to OpenMP . . . . . . . . . . . . . . . . . 62
4.1.2 OpenMP Programming Module . . . . . . . . . . . . . . . . . 62
4.1.3 Brief Introduction to OpenMP Grammar . . . . . . . . . . . 62
4.2 Message-Passing Interface Basics . . . . . . . . . . . . . . . . . . . . . . 67
4.2.1 Start and End MPI Library . . . . . . . . . . . . . . . . . . . . . 69
4.2.2 Getting Information About the Environment . . . . . . . . 70
4.2.3 Send and Receive Messages . . . . . . . . . . . . . . . . . . . . 70
5 Programming the MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1 MIC Programming Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Application Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2.1 CPU in Native Mode . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2.2 CPU Primary, MIC Secondary Mode . . . . . . . . . . . . . 76
5.2.3 CPU and MIC “Peer-to-Peer” Mode . . . . . . . . . . . . . . 77
5.2.4 MIC Primary, CPU Secondary Mode . . . . . . . . . . . . . 77
5.2.5 MIC-Native Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Basic Syntax of MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.1 Offload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.2 Declarations of Variables and Functions . . . . . . . . . . . 100
5.3.3 Header File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.4 Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.5 Compiling Options . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.3.6 Other Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4 MPI on MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4.1 MPI on MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4.2 MPI Programming on MIC . . . . . . . . . . . . . . . . . . . . . 106
5.4.3 MPI Environment Setting on MIC . . . . . . . . . . . . . . . 108
5.4.4 Compile and Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4.5 MPI Examples on MIC . . . . . . . . . . . . . . . . . . . . . . . 111
5.5 SCIF Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.5.1 What Is SCIF? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.5.2 Basic Concepts of SCIF . . . . . . . . . . . . . . . . . . . . . . . 114
5.5.3 Communication Principles of SCIF . . . . . . . . . . . . . . . 116
5.5.4 SCIF’s API Functions . . . . . . . . . . . . . . . . . . . . . . . . 118
6 Debugging and Profiling Tools for the MIC . . . . . . . . . . . . . . . . . . 123
6.1 Intel’s MIC-Supported Tool Chains . . . . . . . . . . . . . . . . . . . . . 123
6.2 MIC Debugging Tool IDB . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2.1 Overview of IDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2.2 IDB Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2.3 IDB Support and Requirements for MIC . . . . . . . . . . . 125
6.2.4 Debugging MIC Programs Using IDB . . . . . . . . . . . . . 125
6.3 MIC Profiling Tool VTune . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7 Intel Math Kernel Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
7.1 Introduction to the Intel Math Kernel Library . . . . . . . . . . . . . . 167
7.2 Using Intel MKL on MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.2.1 Compiler-Aided Offload . . . . . . . . . . . . . . . . . . . . . . . 169
7.2.2 Automatic Offload Mode . . . . . . . . . . . . . . . . . . . . . . 171
7.3 Using FFT on the MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.3.1 Introduction to FFT . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.3.2 A Method to Use FFT on the MIC . . . . . . . . . . . . . . . 175
7.3.3 Another Method to Use FFT on the MIC . . . . . . . . . . . 178
7.4 Use BLAS on the MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.4.1 A Brief Introduction to BLAS . . . . . . . . . . . . . . . . . . . 184
7.4.2 How to Call BLAS on the MIC . . . . . . . . . . . . . . . . . . 185
Part II Performance Optimization
8 Performance Optimization on MIC . . . . . . . . . . . . . . . . . . . . . . . . 191
8.1 MIC Performance Optimization Strategy . . . . . . . . . . . . . . . . . 191
8.2 MIC Optimization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.2.1 Parallelism Optimization . . . . . . . . . . . . . . . . . . . . . . 193
8.2.2 Memory Management Optimization . . . . . . . . . . . . . . 196
8.2.3 Data Transfer Optimization . . . . . . . . . . . . . . . . . . . . 199
8.2.4 Memory Access Optimization . . . . . . . . . . . . . . . . . . . 212
8.2.5 Vectorization Optimization . . . . . . . . . . . . . . . . . . . . . 216
8.2.6 Load Balance Optimization . . . . . . . . . . . . . . . . . . . . 225
8.2.7 Extensibility of MIC Threads Optimization . . . . . . . . . 228
9 MIC Optimization Example: Matrix Multiplication . . . . . . . . . . . . 231
9.1 Series Algorithm of Matrix Multiplication . . . . . . . . . . . . . . . . 231
9.2 Multi-thread Matrix Multiplication Based on OpenMP . . . . . . . 233
9.3 Multi-thread Matrix Multiplication Based on MIC . . . . . . . . . . 234
9.3.1 Basic Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
9.3.2 Vectorization Optimization . . . . . . . . . . . . . . . . . . . . . 235
9.3.3 SIMD Instruction Optimization . . . . . . . . . . . . . . . . . . 236
9.3.4 Block Matrix Multiplication . . . . . . . . . . . . . . . . . . . . 237
Part III Project Development
10 Developing HPC Applications Based on the MIC . . . . . . . . . . . . . . 259
10.1 Hotspot Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.1.1 Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.1.2 Hotspot Locating and Testing . . . . . . . . . . . . . . . . . . . 261
10.2 Program Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
10.2.1 Analysis of Program Port Modes . . . . . . . . . . . . . . . . . 264
10.2.2 Analysis of Size of the Computation . . . . . . . . . . . . . . 264
10.2.3 Characteristic Analysis . . . . . . . . . . . . . . . . . . . . . . . . 265
10.2.4 Parallel Analysis of Hotspots . . . . . . . . . . . . . . . . . . . 267
10.2.5 Vectorization Analysis . . . . . . . . . . . . . . . . . . . . . . . . 270
10.2.6 MIC Memory Analysis . . . . . . . . . . . . . . . . . . . . . . . . 270
10.2.7 Program Analysis Summary . . . . . . . . . . . . . . . . . . . . 271
10.3 MIC Program Development . . . . . . . . . . . . . . . . . . . . . . . . . . 271
10.3.1 OpenMP Parallelism Based on the CPU . . . . . . . . . . . 272
10.3.2 Thread Extension Based on MIC . . . . . . . . . . . . . . . . 273
10.3.3 Coordination Parallelism Based on Single-Node
CPU+MIC Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
10.3.4 MIC Cluster Parallelism . . . . . . . . . . . . . . . . . . . . . . . 274
11 HPC Applications Based on MIC . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.1 Parallel Algorithms of Electron Tomography Three-Dimensional
Reconstruction Based on Single-Node CPU+MIC Mode . . . . . . 278
11.1.1 Electron Tomography Three-Dimensional
Reconstruction Technology and Introduction of SIRT
Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
11.1.2 Analysis of the Sequential SIRT Program . . . . . . . . . . 281
11.1.3 Development of a Parallel SIRT Program Based on
OpenMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11.1.4 Development of Parallel SIRT Programs Based
on the MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
11.1.5 Design of the Heterogeneous and Hybrid Architecture
of CPU+MIC Mode Based on Single Nodes
and Multiple Cards . . . . . . . . . . . . . . . . . . . . . . . . . . 291
11.2 Parallel Algorithms of Large Eddy Simulation Based
on the Multi-node CPU+MIC Mode . . . . . . . . . . . . . . . . . . . . . 296
11.2.1 Large Eddy Simulation Based on the Lattice Boltzmann
Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
11.2.2 Analysis of Large Eddy Simulation Sequential (Serial)
Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
11.2.3 Parallel Algorithm of Large Eddy Simulation Based on
OpenMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
11.2.4 Parallel Algorithm of Large Eddy Simulation Based on
MIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
11.2.5 Parallel Algorithm of Large Eddy Simulation Based on
Multi-nodes and CPU+MIC Hybrid Platform . . . . . . . . 309
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Appendix: Installation and Environment Configuration of MIC . . . . . . 325
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
Introduction to the Authors
Endong Wang is Director and Professor of the Inspur-Intel China Parallel Computing Joint Lab in Beijing, China. He has received a special award from the China State Council and is a member of the national 863 expert group on advanced computing technology, the director of the State Key Laboratory for high-efficiency server and storage technology, Senior Vice President of the Inspur Group, the chairman of the Chinese Committee of the International Federation for Information Processing (IFIP), and Vice President of the China Computer Industry Association. He won the National Science and Technology Progress Award as the first inventor on three projects, won the Ho Leung Ho Lee Science and Technology Innovation Award in 2009, and holds 26 national invention patents.
Qing Zhang has a master’s degree in computer
science from Huazhong Technical University in
Wuhan and is now a chief engineer of the Inspur-
Intel China Parallel Computing Joint Lab. He is
manager of HPC application technology in
Inspur Group—which engages in HPC, parallel
computing, CPU multi-core, GPU, and MIC
technology—and is in charge of many heteroge-
neous parallel computing projects in life
sciences, petroleum, meteorology, and finance.
Bo Shen is a senior engineer of the Inspur-Intel
China Parallel Computing Joint Lab, and is
engaged in high-performance algorithms,
research, and application of software develop-
ment and optimization. He has many years of
experience concerning the development and opti-
mization in life sciences, petroleum, and
meteorology.
Guangyong Zhang has a master’s degree from
Inner Mongolia University, majoring in com-
puter architecture, and is now an R&D engineer
of the Inspur-Intel China Parallel Computing
Joint Lab, engaged in the development and opti-
mization of GPU/MIC HPC application software.
He has published many papers in key conference
proceedings and journals.
Xiaowei Lu received a master’s degree from
Dalian University of Technology, where he stud-
ied computer application technology, and is now
a senior engineer of the Inspur-Intel China Paral-
lel Computing Joint Lab, engaged in the algo-
rithm transplantation and optimization in many
fields. He is experienced in high-performance
heterogeneous coordinate computing
development.
Qing Wu has a master’s degree from Jilin Univer-
sity in Changchun and is now a senior engineer of
the Inspur-Intel China Parallel Computing Joint
Lab, engaged in high-performance parallel com-
puting algorithm and hardware architecture as
well as software development and optimization.
He led many transplantation and optimization
projects concerning the heterogeneous coordinate
computing platform in petroleum.
Yajuan Wang has a master’s degree from the
Catholic University of Louvain, majoring in arti-
ficial intelligence. She is now a senior engineer of
the Inspur-Intel China Parallel Computing Joint
Lab, and is heavily involved in artificial intelli-
gence and password cracking.
Part I
Fundamental Concepts of MIC
The fundamental concepts of MIC architecture will be introduced in this section,
including the development history of HPC, the software and hardware architecture
of MIC, the installation and configuration of the MIC system, and the grammar
of MIC.
After finishing this section, readers will have learned the background of MIC
architecture and how to write HPC programs on the MIC.
1 High-Performance Computing with MIC
In this chapter, we first review the history of the development of multi-core and many-core computers. Then we give a brief introduction to Intel MIC technology. Finally, we compare MIC with other HPC technologies, as background on MIC technology for the reader.
Chapter Objectives. From this chapter, you will learn about:
• The developmental history of parallel computing, multi-core, and many-core.
• A brief review of MIC technology.
• Features of MIC as compared to other multi-core and many-core technologies.
1.1 A History of the Development of Multi-core
and Many-Core Technology
Early computers had only one core and could run only a single program at a time. Batch processing, developed about 60 years ago, allowed multiple programs to be submitted at the same time; but they could only be submitted together, and when they ran on the CPU they were still processed sequentially. As computer hardware developed further, a single program often could not use up all the computational power, wasting valuable resources. So the concept of the process was born (and, based on processes, threads were later developed). Process switching was also developed, not only making full use of computing power but also defining the general meaning of "parallel": running different tasks at the same time. However, this "same time" holds only in a macroscopic sense: only one task runs in any given time slice.
From 1978, after the Intel 8086 processor was released, personal computers became cheaper and more popular. Then Intel launched the 8087 coprocessor, a milestone event (of great significance for programmers: the IEEE 754 floating-point standard was born out of the 8087 coprocessor). A coprocessor only assists the main processor, and it has to work together with the central processor.
The 8087 existed because, at that time, the central processor was designed to work with integers and was weak in floating-point support, and no more transistors could be put onto the chip. Thus, the 8087 coprocessor was built to assist with floating-point computation. Its significance is that the coprocessor was born: computation was no longer the exclusive domain of the CPU, which now had its first helper. And although later advances in manufacturing technology (e.g., the 486DX) allowed the coprocessor to be integrated into the CPU, the idea of the coprocessor never died.
Unfortunately, the growth of computational power never matches our requirements. After the idea of processes was proposed, computational power soon came up short again. After Intel announced the 8086 in 1978, CPUs from Intel, AMD, and other companies increased performance continuously, roughly following Moore's law: every 18–24 months the number of transistors doubled, which allowed a rapid increase in speed and computational power.
As manufacturing technology improved, the capability of the processing unit also increased. Technologies such as superscalar execution, super-pipelining, Very Long Instruction Word (VLIW), SIMD, hyper-threading, and branch prediction were applied to the CPU. These technologies provide instruction-level parallelism (ILP), the lowest level of parallelism: with hardware support, machine instructions can execute in parallel even on a single CPU. But this kind of parallelism is controlled mainly by hardware, so programmers can only passively benefit from the technology rather than control the whole process. They can, however, adjust their code or use special assembly instructions to influence CPU behavior indirectly, even though the final implementation is still controlled by hardware.
The pace of CPU development has slowed in recent years. The primary ways to improve single-core CPU performance are to increase the working frequency and to improve instruction-level parallelism. Both methods face problems: as manufacturing technology has improved, transistor sizes have approached the atomic level, making power leakage a serious concern. Power consumption and heat generation per unit area have grown larger and larger, making it difficult to raise the frequency as quickly as before; around 4 GHz has been the limit for most companies. On the other hand, there is not much instruction-level parallelism to exploit in general-purpose computing, so despite great design efforts, the performance increase is not proportional to the increase in the number of transistors.
While the single CPU can no longer improve very much, using multiple CPUs at
the same time has become the natural next idea for scientists. Using multiple CPUs
on a single motherboard is a cost-efficient solution. However, this solution is
bounded by cost, so it is only popular on servers, which are not so sensitive to
cost and power consumption. The idea of using multiple CPUs is still widely used in
the area of high-performance computing.
As early as 1966, Michael Flynn classified computer architectures by their instruction and data streams: single instruction stream, single data stream (SISD); single instruction stream, multiple data streams (SIMD); multiple instruction streams, single data stream (MISD); and multiple instruction streams, multiple data streams (MIMD). This is known as the Flynn classification. Among these classes, MISD is very rare, and SISD describes the early serial machine model. SIMD uses a single instruction controller to process different data streams with the same instruction. SIMD generally describes hardware; when the term is applied to software parallelism, the "single instruction" refers to one instruction in the program rather than to specific hardware. MIMD covers most parallel computers, which use different instruction streams to process different data. Among commercial parallel machines, MIMD is the most common and SIMD is second.
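To make the SIMD idea more concrete, here is a small sketch of our own (not an example from the original text) in C. A vectorizing compiler can map the loop below onto SIMD hardware so that one instruction adds several array elements at a time; compilers that do not recognize the hint simply ignore it:

#include <stdio.h>

#define N 1024

int main(void)
{
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {   /* two input data streams */
        a[i] = (float)i;
        b[i] = 0.5f * (float)i;
    }

    /* Single instruction stream, multiple data elements: the compiler
       may emit vector instructions that process several floats at once. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[10] = %f\n", c[10]);
    return 0;
}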
Based on the Flynn’s idea of classification for top-level computers, hardware
companies are now building supercomputers without taking cost into account.
“Supercomputers”, referring to those computers with performance in the leading
position (e.g., Top500), commonly have thousands of processors and a specially
designed memory and I/O system. Their architecture is quite different from per-
sonal computers, unlike some personal computers, which are connected to each
other. However, there is still a very strong bond between personal computers and
supercomputers. Just like high-level military and space technology, which can be
used later in normal life (e.g., the Internet), many new technologies used in
supercomputers can also be applied to the development of personal computers on
desktops. For example, some CPUs of supercomputers can be used directly on
personal computers, and some technologies, such as the vectorized unit or the
processor package, are already widely used on personal computers.
However, the cost of a supercomputer is too high for most ordinary research facilities to afford. As network technologies advanced, collaboration among multiple nodes became practical. Because each node is a fully functional computer, jobs can be sent to different nodes to achieve parallelism among them, using computational resources efficiently. Collaboration over a network gave rise to two derivative architectures: the computer cluster and distributed computing.
A computer cluster is a group of computers connected by a network to form a very tightly collaborating system. Distributed computing, the basis of the popular "cloud computing", cuts a huge computation job and its data into many small pieces, distributes them to many loosely connected computers, and collects the results. Although the performance is no match for a supercomputer, the cost is much lower.
While other hardware companies extended to different computer architectures, the central processor companies kept increasing CPU frequency and evolving the CPU architecture. However, limited by manufacturing technology, materials, and power consumption, CPU frequency hit a bottleneck after a period of fast development, and progress in raising processor frequency slowed. With the old approach reaching a dead end, processor companies like Intel and AMD are seeking other ways to increase performance while maintaining or improving the energy efficiency of processors.
The commercial logic of the CPU companies has not changed: if one CPU is not enough, use two. With better manufacturing technology, they could put more cores on a single chip, and thus multi-core CPUs were born. In 2005, Intel and AMD formally released dual-core CPUs into the market; in 2009, quad-core and octo-core CPUs were announced by Intel. Multi-core CPUs became so popular that nowadays even ordinary personal computers have multi-core CPUs, with performance matching an earlier multi-CPU server node. Even disregarding the increase in single-core performance, building multiple cores into one chip provides better connectivity between cores than a multi-CPU architecture connected over the mainboard bus. There are also many further improvements in today's multi-core CPUs, such as the shared L3 cache, so that collaboration between cores is much better.
Along with the transition from single-core to multi-core CPUs, programmers also
began to change their thinking, focusing more on multi-threading and parallel
programming. These ideas were developed in the 1970s and 1980s, when supercomputers
and clusters were already being built at the high end of the field. Because of the
cost, however, most programmers could only get their hands on single-core computers.
So when multi-core CPUs became popular, programmers turned to tools such as MPI and
OpenMP, which had lain dormant for quite a long time, and could finally make full
use of the available computational power.
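As a small illustration (a sketch added for this discussion, not from the original text), a single OpenMP directive is enough to divide the iterations of a loop among the cores of a multi-core CPU:

#include <stdio.h>

#define N 1000000

static double a[N], b[N], c[N];

int main(void)
{
    int i;

    for (i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* OpenMP divides these iterations among the available cores. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}

Built with an OpenMP-capable compiler (for example, icc -openmp or gcc -fopenmp) the loop uses all cores; built without OpenMP support, the directive is simply ignored and the loop runs serially.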
While the demand for computational power continues to grow relentlessly, CPU
performance is not the only problem; power consumption is another. So people
remembered a good “helper” of the CPU: the coprocessor. In 2007, the popularization
of general-purpose computing on graphics processing units (GPGPU) also marked the
return of the coprocessor. Although the GPU’s original job is display and image
processing, its powerful floating-point capability makes it a natural coprocessor.
And Intel, the developer of the 8087 coprocessor, has never forgotten coprocessors:
in 2012 it introduced the MIC as a new generation of coprocessor, one that will make
a great contribution to high-performance computing.
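As a foretaste of the programming model treated in the rest of this book, the sketch below (illustrative only; it assumes the Intel C/C++ compiler’s offload pragma and a hypothetical doubling kernel) marks a region of code to be executed on the Intel Xeon Phi coprocessor, with the input and output arrays transferred across the PCI Express bus by the listed clauses:

#include <stdio.h>

#define N 1024

int main(void)
{
    float a[N], b[N];
    int i;

    for (i = 0; i < N; i++)
        a[i] = (float)i;

    /* "a" is copied to the coprocessor, the block runs there,
       and "b" is copied back to the host when it finishes.    */
    #pragma offload target(mic) in(a) out(b)
    {
        int j;
        #pragma omp parallel for
        for (j = 0; j < N; j++)
            b[j] = 2.0f * a[j];
    }

    printf("b[N-1] = %f\n", b[N - 1]);
    return 0;
}

How such offload regions are written, built, and tuned, and how the same source code behaves when no coprocessor is present, is the subject of the following chapters.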
There are two key words in the development of the computer: need and
transmigration. Dire need has always been the original force pushing the development
of science and technology; as mentioned above, people needed a faster way to
calculate, which resulted in the creation of the computer. Then they needed to make
full use of the computer’s resources, so multi-processing and multi-threading
appeared; as calculation capacity grew, even more computation was demanded, so the
processing capacity of a single core increased. And because the demand for
computational power is endless while the processing capacity of one core is limited,
various multi-processor forms appeared, such as dual-CPU nodes, clusters, and
multi-core CPUs. Transmigration, in general terms, means recycling; in programmers’
jargon it is iteration, and it describes both the path of computer development and
the way to future development. For example, the CPU was originally in charge of
display, but image processing became more and more complex, until the CPU alone
could not handle it. Then the GPU was born, and as manufacturing technology
continued to advance, the development of the CPU and GPU became more mature. The
CPU and GPU reunited again, such as in the i-series CPUs of Intel, the APU of AMD, and NVIDIA is
  • 5. High-Performance Computing on the Intel®Xeon Phi™ EndongWang Qing Zhang · Bo Shen Guangyong Zhang Xiaowei Lu QingWu ·YajuanWang How to Fully Exploit MIC Architectures
  • 6. High-Performance Computing on the Intel ® Xeon PhiTM
  • 7. ThiS is a FM Blank Page
  • 8. Endong Wang • Qing Zhang • Bo Shen Guangyong Zhang • Xiaowei Lu Qing Wu • Yajuan Wang High-Performance Computing on the Intel® Xeon PhiTM How to Fully Exploit MIC Architectures
  • 9. Endong Wang Qing Zhang Bo Shen Guangyong Zhang Xiaowei Lu Qing Wu Yajuan Wang Inspur, Beijing, China ISBN 978-3-319-06485-7 ISBN 978-3-319-06486-4 (eBook) DOI 10.1007/978-3-319-06486-4 Springer Cham Heidelberg New York Dordrecht London Library of Congress Control Number: 2014943522 # Springer International Publishing Switzerland 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com) Copyright # 2012 by China Water & Power Press, Beijing, China Title of the Chinese original: MIC 高性能计算编程指南 ISBN: 978-7-5170-0338-0 All rights reserved Translators Dave Yuen Chuck Li Larry Zheng Sergei Zhang Caroline Qian
  • 10. Foreword by Dr. Rajeeb Hazra Today high-performance computing (HPC), especially the latest massively supercomputers, has developed quickly in computing capacity and capability. These developments are due to several innovations. Firstly, Moore’s law, named after the Intel founder Gordon Moore, predicts that the number of semiconductor transistors will double every 18–24 months. According to Moore’s law, Intel continues to improve performance and shrink the size of transistors as well as to reduce power consumption, all at the same time. Another innovation is a series of CPU-improving microstructures which ensure that the performance of a single thread coincides with the parallelism in each successive CPU generation. The development of HPC plays an important role in society. Although people are inclined to pay more attention to great scientific achievements such as the search for the Higg’s boson or the cosmological model of cosmic expansion, the computing capability that everyone can now acquire is also impressive. A modern two-socket workstation based on the Intel® Xeon series processors could show the same performance as the top supercomputer from 15 years ago. In 1997, the fastest supercomputer in the world was ASCI Red, which was the first computing system to achieve over 1.0 teraflops. It had 9298 Intel Pentium Pro processors, and it cost $55,000,000 per teraflop. In 2011, the cost per teraflop was reduced to less than $1000. This reduction in cost makes high-performance computing accessible to a much larger group of researchers. To sufficiently make use of the ever-improving CPU performance, the applica- tion itself must take advantage of the parallelism of today’s microprocessors. Maximizing the application performance includes much more than simply tuning the code. Current parallel applications make use of many complicated nesting functions, from the message communication among processors to the parameters in threads. With Intel CPUs, in many instances we could achieve gains of more than ten times performance by exploiting the CPU’s parallelism. The new Intel® Xeon Phi™ coprocessor is built on the parallelism programming principle of the Intel® Xeon processor. It integrates many low-power consumption cores, and every core contains a 512-bit SIMD processing unit and many new vector instructions. This new CPU is also optimized for performance per watt. Due to a computing capability of over one billion times per second, the Intel® Xeon Phi™ delivers a supercomputer on a chip. This brand new microstructure delivers ground-breaking performance value per watt, but the delivered performance also v
  • 11. relies on those applications being sufficiently parallelized and expanded to utilize many cores, threads, and vectors. Intel took a new measure to release this parallel- ism ability. Intel followed the common programming languages (including C, C++, and Fortran) and the current criteria. When readers and developers learn how to optimize and make use of these languages, they are not forced to adopt nonstandard or hardware-dependent programming modes. Furthermore, the method, based on the criteria, ensures the most code reuse, and could create the most rewards by compiling the transplantable, standardized language and applying it to present and future compatible parallel code. In 2011, Intel developed a parallel computing lab with Inspur in Beijing. This new lab supplied the prior use and development environment of the Intel® Xeon processor and Intel® Xeon Phi™ coprocessor to Inspur Group and to some excellent application developers. Much programming experience can be found in this book. We hope to help developers to produce more scientific discovery and creation, and help the world to find more clean energy, more accurate weather forecasts, cures for diseases, develop more secure currency systems, or help corporations to market their products effectively. We hope you enjoy and learn from this venerable book. This is the first book ever published on how to use the Intel® Xeon Phi™ coprocessor. Santa Clara, CA Rajeeb Hazra vi Foreword by Dr. Rajeeb Hazra
  • 12. Foreword by Prof. Dr. Rainer Spurzem Textbooks and teaching material for my Chinese students are often written in English, and sometimes we try to find or produce a Chinese translation. In the case of this textbook on high-performance computing with the Intel MIC, we now have a remarkable opposite example—the Chinese copy appeared first, by Chinese authors from Inspur Inc. led on by Wang Endong, and only some time later can all of us English speakers enjoy and benefit from its valuable contents. This occasion happens not by chance—two times in the past several years a Chinese supercom- puter has been the certified fastest supercomputer in the world by the official Top500 list (http://www.top500.org). Both times Chinese computational scientists have found a special, innovative way to get to the top—once with NVIDIA’s GPU accelerators (Tianhe-1A in 2010) and now, currently, with the Tianhe-2, which realizes its computing power through an enormous number of Intel Xeon Phi hardware, which is the topic of this book. China is rapidly ascending on the platform of supercomputing usage and technology at a much faster pace than the rest of the world. The new Intel Xeon Phi hardware, using the Intel MIC architec- ture, has its first massive installation in China, and it has the potential for yet another supercomputing revolution in the near future. The first revolution, in my opinion, has been the transition from traditional mainframe supercomputers to Beowulf PC clusters, and the second was the acceleration and parallelization of computations by general-purpose computing on graphical processing units (GPGPU). Now the stage is open for—possibly—another revolution by the advent of Intel MIC architecture. The past revolutions of accelerators comprised a huge qualitative step toward better price–performance ratio and better use of energy per floating point operation. In some ways they democratized supercomputing by making it possible for small teams or institutes to assemble supercomputers from off-the-shelf components, and later even (GPGPU) provide massively parallel computing in just a single desktop. The impact of Intel Xeon Phi and Intel MIC on the market and on scientific supercomputing has yet to be seen. However, already a few things can be anticipated; and let me add that I write this from the perspective of a current heavy user and provider of GPGPU capacity and capability. GPGPU architecture, while it provides outstanding performance for a fair range of applications, is still not as common as expected a few years ago. Intel MIC, if it fulfills the promise of top-class performance together with compatibility to a couple of standard programming paradigms (such as OpenMP as it works on standard Intel vii
  • 13. CPUs, or MPI as it works on standard parallel computers) may quickly find a much larger user community than GPU. I hope very much that this very fine book can help students, staff, and faculty all over the world in achieving better results when implementing and accelerating their tasks on this interesting new piece of hard- ware, which will for sure appear on desktops, in institutional facilities, as well as in numerous future supercomputers. Beijing, China Rainer Spurzem viii Foreword by Prof. Dr. Rainer Spurzem
Foreword by Endong Wang

Currently scientists and engineers everywhere are relentlessly seeking more computing power. High-performance computing capability has become an arena of competition among the most powerful countries in the world. After the "million millions of flops" competition ended, the "trillion flops" contest has begun. Semiconductor technology restricts processor frequency, so multiprocessors and many-integrated-core processors have become more and more important. When various kinds of many-core processors came out, we found that although peak computing performance had increased a great deal, application compatibility became worse and application development became more complicated. A lack of useful applications would render the supercomputer useless. At the end of 2012, Intel Corporation brought out the Intel® Xeon Phi™ coprocessor, based on the many-integrated-core (MIC) architecture. This product integrates more than 50 cores based on the x86 architecture into one PCI-Express interface card. It is a powerful supplement to the Intel® Xeon CPU, and brings a new level of performance to highly parallel workloads. It is easy to program, with almost no difference from traditional programming, and code written for the Intel® Xeon Phi™ coprocessor can run on a traditional CPU-based platform without any modification, which protects the user's software investment. It supplies hundreds of hardware threads, bringing the high degree of parallelism that current highly parallel workloads demand. The Inspur-Intel China Parallel Computing Joint Lab was founded on August 24, 2011. This lab aims to promote trillion-flops supercomputing system architecture and application innovation, establish a healthy ecosystem for high-performance computing, and accelerate China's entry into the trillion-flops era of supercomputing. The research and innovation in the Inspur-Intel China Parallel Computing Joint Lab will make a positive impact on the development of supercomputing in China in the next ten years, especially at the beginning of the trillion-flops era for the rest of the world. The Inspur-Intel China Parallel Computing Joint Lab contributed to the completion of the Intel® Xeon Phi™ coprocessor and made a tremendous effort to popularize it. This book was finished by several dedicated members of the Inspur-Intel China Parallel Computing Joint Lab. In this book, relevant knowledge about the Intel® Xeon Phi™ coprocessor, programming methods for using the Intel® Xeon Phi™
  • 15. coprocessor, optimizations for the program, and two successful cases of applying the Intel® Xeon Phi™ coprocessor in practical high-performance computing are introduced. This book has a clear structure and is easy to understand. It contains a programming basis, optimization, and specific development projects. At the same time, a lot of figures, diagrams, and segments of program were included to help readers understand the material. The authors of this book have plenty of project experience and have added their practical summaries of these projects. So this book not only introduces the theory, but it also connects more closely to actual program- ming. This book is also the first to introduce the Intel® Xeon Phi™ coprocessor and embodies the achievement of these authors. We hope to see China accumulate some great experience in the field of HPC. The authors and the members of the Inspur- Intel China Parallel Computing Joint Lab made great efforts to ensure the book publishing coincides with the Intel® Xeon Phi™ coprocessor, and they should be respected for this. We hope the readers will grasp the full use of the Intel® Xeon Phi™ coprocessor quickly after reading this book, and gain achievements in their own fields of HPC application by making use of the Intel® Xeon Phi™ coprocessor. The Inspur Group hopes to dedicate themselves to HPC endeavors together with Intel Corporation. Beijing, China Endong Wang x Foreword by Endong Wang
Preface

High-performance computing (HPC) is a rapidly developing technology within computer science, and now within computational science. HPC can strengthen a country's might, improve its national defense science, and promote the rapid development of highly sophisticated weapons; it is one of the most important measures of a country's overall prowess and economic strength. With the rapid growth of the information-based society, people are demanding more powerful capabilities in information processing. HPC is used not only for oil exploration, weather prediction, space technology, national defense, and scientific research, but also in finance, government, education, business, network games, and other fields that demand more computing capability. The drive to reach the goal of "trillion flops" computing has begun, and people look forward to solving larger-scale and more complicated problems with a trillion-flops supercomputer. In this century, the many-integrated core (MIC) era has finally arrived. Today, the HPC industry is going through a revolution, and parallel computing will be the trend of the future as a prominent hot spot of scientific research. Current mainstream research has adopted the CPU-homogeneous architecture, in which dozens of cores in one node is not unusual; in large-scale computing, thousands of cores will be needed. Meanwhile, the CPU-homogeneous architecture faces huge challenges because of its low performance-to-power ratio, low performance-to-memory-access ratio, and low parallel efficiency. Heterogeneous CPU+GPU computing, in turn, relies on the many-core acceleration capability of the GPU. More and more developers have become dedicated to this field, but it also faces challenges such as fine-grained parallel algorithms, programming efficiency, and performance at large scale. This book focuses on the central issues of how to improve the efficiency of large-scale computing, how to simultaneously shorten programming cycles and increase software productivity, and how to reduce power consumption. Intel Corporation introduced the Intel® Xeon Phi™ series products, which are based on the MIC architecture, to solve highly parallel problems. The double-precision performance of this product has reached the teraflop level. It is based on the current x86 architecture and supports OpenMP, pThread, MPI, and many other parallel programming models. It also supports traditional C/C++/Intel® Cilk™ Plus, Fortran, and many other programming languages. It is easy to program, and many associated tools are supported. For applications that are difficult to realize on
the traditional CPU platform, the MIC platform will greatly improve performance, and the source code can be shared by the CPU and the MIC platform without any modifications. The combination of CPU and MIC on the x86 platform in heterogeneous computing provides HPC users with a new supercomputing solution. Since the Inspur-Intel China Parallel Computing Joint Lab was established on August 24, 2011, its members have dedicated themselves to HPC application programs on the MIC platform, and have helped ensure that the Intel® Xeon Phi™ series products would be released smoothly. We have accumulated a large amount of experience in exploring the software and hardware of MIC. It is a great honor for us to participate in this technology revolution of HPC and to introduce this book to readers as pioneers. We hope more readers will make use of MIC technology and enjoy the benefits brought forth by the Intel® Xeon Phi™ series products.

Target Audience

The basic aim of this book is to help developers learn how to use the Intel® Xeon Phi™ series products efficiently, so that they can develop, transplant, and optimize their parallel programs. The book introduces the basic grammar, programming techniques, and optimization methods used with MIC, and we also offer solutions to problems encountered during actual use, based on our optimization experience. We assume that readers already have some basic skills in parallel programming but scant knowledge of MIC. This book does not intend to introduce the theory of parallel computing or algorithms, so we also assume that readers already have this knowledge. In spite of this, when a parallel algorithm comes up, we still describe it in a simple way. We assume that readers are familiar with OpenMP, MPI, and other parallel models, but we also state the basic grammar. We assume that readers can use any one of the C/C++/Fortran languages, with C/C++ preferred. However, the ideas and advice stated in this book also apply to other high-level languages. Moreover, when the Intel® Xeon Phi™ series of products supports other languages in the future, most of the optimization methods and application experience will still be effective. Generally speaking, this book is for three types of computing-oriented people:
• Students and professional scientists and engineers in colleges, universities, and research institutes, and developers engaged in parallel computing, multi-core, and many-integrated-core technology.
• IT employees, especially those who develop HPC software, improve application performance with many integrated cores, and pursue extreme performance in the HPC field.
• HPC users in other fields, including oil exploration, biological genetics, medical imaging, finance, aerospace, meteorology, and materials chemistry. We hope to help them improve the performance of their original CPU code by means of MIC and ultimately increase productivity.
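To make the portability claim made earlier in this preface concrete, the short program below is a minimal sketch of our own (it is not a listing from this book): a standard OpenMP loop that approximates pi by numerical integration. The same source can be compiled for the host CPU or, unchanged, as a native coprocessor executable; the compiler options mentioned in the comment (for example, the Intel compiler's -mmic switch) are an assumption about the tool chain of that generation, and the offload directives used in the CPU-primary modes are introduced separately in Chap. 5.

```c
#include <stdio.h>
#include <omp.h>

/* Approximate pi with the midpoint rule. The same source is meant to build
 * for the host (e.g., icc -openmp pi.c) or, unchanged, as a native MIC
 * binary (e.g., icc -mmic -openmp pi.c) -- an assumption about the Intel
 * tool chain of that era, used here only for illustration. */
int main(void)
{
    const long steps = 100000000L;
    const double width = 1.0 / (double)steps;
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < steps; i++) {
        double x = ((double)i + 0.5) * width;   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi ~= %.15f (up to %d OpenMP threads available)\n",
           sum * width, omp_get_max_threads());
    return 0;
}
```

On the coprocessor the very same loop simply sees far more hardware threads in the OpenMP runtime, which is the property that the optimization chapters later exploit.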
We wish to benefit more readers with this book. In the future we also hope to engage more and more readers around the world.

About This Book

Because of the diverse characteristics of the MIC architecture, this book cannot be sorted strictly into well-defined sections. It introduces MIC programming and the Intel® Xeon Phi™ series products, and it also describes optimization in parallel programming. Through this book, we hope readers will fully understand MIC, and we expect readers to make good use of MIC technology in future practice. This book includes three parts. The first covers MIC basics and includes Chaps. 1–7, in which fundamental knowledge about MIC technology is introduced.

In Chap. 1, the development of parallel computing is recalled briefly, the hardware characteristics of current parallel computing are compared, and then MIC technology is introduced and its advantages are stated.

In Chap. 2, the hardware and software architecture of MIC is introduced. Although programming on MIC is possible without this background knowledge, exploring the MIC architecture deeply will help our programs become better adapted to MIC.

In Chap. 3, by computing pi, the characteristics of MIC programming are demonstrated directly to readers. In addition, we introduce what the program does behind the scenes.

In Chap. 4, the background knowledge of MIC programming is discussed, including the basic grammar of OpenMP and MPI. If you have had this basic training, you can skip this chapter altogether.

In Chap. 5, the programming models, grammar, environment variables, and compilation options of MIC are introduced. From this chapter you should be able to grasp how to write your own MIC program.

In Chap. 6, some debugging and optimization tools and their usage are introduced. These tools bring a great deal of convenience to debugging and optimization.

In Chap. 7, some Intel mathematical libraries that have been adapted to MIC are discussed, including VML, FFT, and BLAS.

The second section covers performance optimization and comprises Chaps. 8 and 9.

In Chap. 8, the basic principles and strategy of MIC optimization are introduced, and then the methods and circumstances of MIC optimization are stated. The general methods of MIC optimization are covered; moreover, most of the methods are applicable to the CPU platform as well, with a few exceptions.
In Chap. 9, through a classical example in parallel computing—the optimization of matrix multiplication—the optimization measures are presented step by step, integrating theory with practice.

The third and last section covers project development and includes Chaps. 10 and 11.

In Chap. 10, we propose a set of methods for applying parallel computing to project applications, summarizing our experience in the development and optimization of our own projects. We also discuss how to determine whether a serial or parallel CPU program is suitable for MIC, and how to transplant the program onto MIC.

In Chap. 11, we show, through two actual cases, how MIC technology influences a real project.

In the early stages, this book was initiated by Endong Wang, the director of the State Key Laboratory of high-efficiency server and storage technology, director of the Inspur-Intel China Parallel Computing Joint Lab, and senior vice president of Inspur Group Co., Ltd. Qing Zhang, the lead engineer of the Inspur-Intel China Parallel Computing Joint Lab, formulated the plan, outline, structure, and content of every chapter. Then, in the middle stage, Qing Zhang organized and led the team for this book, checking and approving it regularly. He examined and verified the accuracy of the content, the depth of the technology stated, and the readability of this book, and gave feedback for revisions. This book was actually written by five engineers in the Inspur-Intel China Parallel Computing Joint Lab: Bo Shen, Guangyong Zhang, Xiaowei Lu, Qing Wu, and Yajuan Wang. The first chapter was written by Bo Shen. The second chapter was written by Qing Wu and Bo Shen. The third through fifth chapters were written by Bo Shen, with the participation of Yajuan Wang. The sixth chapter was written by Qing Wu. The seventh chapter was written by Xiaowei Lu. The eighth chapter was written by Guangyong Zhang, with the participation of Bo Shen and Yajuan Wang. The ninth chapter was written by Guangyong Zhang. The tenth chapter was written by Bo Shen. The eleventh chapter was written by Xiaowei Lu and Guangyong Zhang. In the later stage, this book was finally approved by Endong Wang, Qing Zhang, Dr. Warren from Intel, and Dr. Victor Lee. All the source code has been tested by the authors of this book, but because MIC technology is still at an early stage, we cannot guarantee that the code will be applicable in the latest release. Hence, if any updates come out for the compiler or the execution environment of MIC, please consult the corresponding version of the Intel manual.

Acknowledgments

The publication of this book is the result of group cooperation. We would like to show our respect to the people who gave their full support to its composition and publication. We must express our heartfelt thanks to Inspur Group and Intel Corporation, who gave us such a good platform and offered us the opportunity to work in the
  • 20. Inspur-Intel China Parallel Computing Joint Lab. We are fortunate to be able to do research on MIC technology. We are grateful for the support of the leadership of Inspur Group, especially to the director of the HPC Center, Inspur Group, Jun Liu, who supplied us with financial support and solicitude. We are grateful to Michael Casscles, Dr. Wanqing He, Hongchang Guo, Dr. David Scott, Xiaoping Duan, and Dr. Victor Lee for their support of the technology and resources for our daily work in the parallel computing joint lab. We especially can’t forget Wanqing! He supplied us with plenty of guidance from experience before writing this book. We are also grateful to Dr. Raj Hazra, GM Technical Computing in Intel Corporation, and Joe Curley, MD Technical Com- puting in Intel Corporation, for their support of the Inspur-Intel China Parallel Computing Joint Lab. We are grateful to our application users: BGP Inc., China National Petroleum Corp, Institute of Biophysics, Chinese Academy of Sciences, Northwestern Polytechnical University, Chinese Academy of Meteorological Sciences, and Shandong University—especially Prof. Fei Sun and Dr. Kai Zhang from the Institute of Biophysics Chinese Academy of Sciences—and Profs. Chengwen Zhong and Qinjian Li from Northwestern Polytechnical University. The cases in this book come from them. We are grateful to Inspur Group and Intel Corporation for their support, espe- cially the managers Yongchang Jiang and Ying Zhang from the High-Efficiency Server Department, Inspur Group, who were able to save us a great deal of time. We thank very much Dr. Haibo Xie and Xiaozhe Yang; we are unable to forget this pleasant time. We are grateful to the families of the authors for their consideration and patience. We thank the editors from China WaterPower Press, especially to the editor Chunyuan Zhou and editor Yan Li for their tolerance of our demands. This book could not possibly be published without their hard work. We are very grateful for the English translation made by Professor David A. Yuen and his team from the University of Minnesota, Twin Cities, China University of Geosciences, Wuhan, consisting of Qiang (Chuck) Li, Liang (Larry Beng) Zheng, Siqi (Sergei) Zhang, and Caroline Qian. Jed Brown and Karli Rupp from Argonne National Laboratory also gave very useful advice, and finally, Prof. Xiaowen Chu and Dr. Kayiyong Zhao from Hong Kong Baptist University are to be thanked for their help in proofreading of the last few chapters. Lastly, we are grateful to all the others whom we have not acknowledged. MIC technology has just come out, so there are undoubtedly some mistakes to be found in this book. We apologize for this and look forward to any suggestions from our readers. This is the first book ever written in any language on MIC technology; it was published in the fall of 2012, and is to be contrasted with the newer books coming out from the USA in 2013 bearing the names of Intel Xeon Phi coprocessor. Beijing, China Qing Zhang Preface xv
Contents

Part I Fundamental Concepts of MIC

1 High-Performance Computing with MIC
  1.1 A History of the Development of Multi-core and Many-Core Technology
  1.2 An Introduction to MIC Technology
  1.3 Why Does One Choose MIC?
    1.3.1 SMP
    1.3.2 Cluster
    1.3.3 GPGPU
2 MIC Hardware and Software Architecture
  2.1 MIC Hardware Architecture
    2.1.1 Definitions
    2.1.2 Overview of MIC Hardware Architecture
    2.1.3 The MIC Core
    2.1.4 Ring
    2.1.5 Clock
    2.1.6 Page Tables
    2.1.7 System Interface
    2.1.8 Performance Monitoring Unit and Event Manager
    2.1.9 Power Management
  2.2 Software Architecture of MIC
    2.2.1 Overview
    2.2.2 Bootstrap
    2.2.3 Linux Loader
    2.2.4 μOS
    2.2.5 Symmetric Communication Interface
    2.2.6 Host Driver
    2.2.7 Sysfs Node
    2.2.8 MIC Software Stack of MPI Applications
    2.2.9 Application Programming Interfaces
3 The First MIC Example: Computing Π
4 Fundamentals of OpenMP and MPI Programming
  4.1 OpenMP Foundation
    4.1.1 A Brief Introduction to OpenMP
    4.1.2 OpenMP Programming Module
    4.1.3 Brief Introduction to OpenMP Grammar
  4.2 Message-Passing Interface Basics
    4.2.1 Start and End MPI Library
    4.2.2 Getting Information About the Environment
    4.2.3 Send and Receive Messages
5 Programming the MIC
  5.1 MIC Programming Models
  5.2 Application Modes
    5.2.1 CPU in Native Mode
    5.2.2 CPU Primary, MIC Secondary Mode
    5.2.3 CPU and MIC "Peer-to-Peer" Mode
    5.2.4 MIC Primary, CPU Secondary Mode
    5.2.5 MIC-Native Mode
  5.3 Basic Syntax of MIC
    5.3.1 Offload
    5.3.2 Declarations of Variables and Functions
    5.3.3 Header File
    5.3.4 Environment Variables
    5.3.5 Compiling Options
    5.3.6 Other Questions
  5.4 MPI on MIC
    5.4.1 MPI on MIC
    5.4.2 MPI Programming on MIC
    5.4.3 MPI Environment Setting on MIC
    5.4.4 Compile and Run
    5.4.5 MPI Examples on MIC
  5.5 SCIF Programming
    5.5.1 What Is SCIF?
    5.5.2 Basic Concepts of SCIF
    5.5.3 Communication Principles of SCIF
    5.5.4 SCIF's API Functions
6 Debugging and Profiling Tools for the MIC
  6.1 Intel's MIC-Supported Tool Chains
  6.2 MIC Debugging Tool IDB
    6.2.1 Overview of IDB
    6.2.2 IDB Interface
    6.2.3 IDB Support and Requirements for MIC
    6.2.4 Debugging MIC Programs Using IDB
  6.3 MIC Profiling Tool VTune
7 Intel Math Kernel Library
  7.1 Introduction to the Intel Math Kernel Library
  7.2 Using Intel MKL on MIC
    7.2.1 Compiler-Aided Offload
    7.2.2 Automatic Offload Mode
  7.3 Using FFT on the MIC
    7.3.1 Introduction to FFT
    7.3.2 A Method to Use FFT on the MIC
    7.3.3 Another Method to Use FFT on the MIC
  7.4 Use BLAS on the MIC
    7.4.1 A Brief Introduction to BLAS
    7.4.2 How to Call BLAS on the MIC

Part II Performance Optimization

8 Performance Optimization on MIC
  8.1 MIC Performance Optimization Strategy
  8.2 MIC Optimization Methods
    8.2.1 Parallelism Optimization
    8.2.2 Memory Management Optimization
    8.2.3 Data Transfer Optimization
    8.2.4 Memory Access Optimization
    8.2.5 Vectorization Optimization
    8.2.6 Load Balance Optimization
    8.2.7 Extensibility of MIC Threads Optimization
9 MIC Optimization Example: Matrix Multiplication
  9.1 Series Algorithm of Matrix Multiplication
  9.2 Multi-thread Matrix Multiplication Based on OpenMP
  9.3 Multi-thread Matrix Multiplication Based on MIC
    9.3.1 Basic Version
    9.3.2 Vectorization Optimization
    9.3.3 SIMD Instruction Optimization
    9.3.4 Block Matrix Multiplication

Part III Project Development

10 Developing HPC Applications Based on the MIC
  10.1 Hotspot Testing
    10.1.1 Preparation
    10.1.2 Hotspot Locating and Testing
  10.2 Program Analysis
    10.2.1 Analysis of Program Port Modes
    10.2.2 Analysis of Size of the Computation
    10.2.3 Characteristic Analysis
    10.2.4 Parallel Analysis of Hotspots
    10.2.5 Vectorization Analysis
    10.2.6 MIC Memory Analysis
    10.2.7 Program Analysis Summary
  10.3 MIC Program Development
    10.3.1 OpenMP Parallelism Based on the CPU
    10.3.2 Thread Extension Based on MIC
    10.3.3 Coordination Parallelism Based on Single-Node CPU+MIC Mode
    10.3.4 MIC Cluster Parallelism
11 HPC Applications Based on MIC
  11.1 Parallel Algorithms of Electron Tomography Three-Dimensional Reconstruction Based on Single-Node CPU+MIC Mode
    11.1.1 Electron Tomography Three-Dimensional Reconstruction Technology and Introduction of SIRT Algorithms
    11.1.2 Analysis of the Sequential SIRT Program
    11.1.3 Development of a Parallel SIRT Program Based on OpenMP
    11.1.4 Development of Parallel SIRT Programs Based on the MIC
    11.1.5 Design of the Heterogeneous and Hybrid Architecture of CPU+MIC Mode Based on Single Nodes and Multiple Cards
  11.2 Parallel Algorithms of Large Eddy Simulation Based on the Multi-node CPU+MIC Mode
    11.2.1 Large Eddy Simulation Based on the Lattice Boltzmann Method
    11.2.2 Analysis of Large Eddy Simulation Sequential (Serial) Program
    11.2.3 Parallel Algorithm of Large Eddy Simulation Based on OpenMP
    11.2.4 Parallel Algorithm of Large Eddy Simulation Based on MIC
    11.2.5 Parallel Algorithm of Large Eddy Simulation Based on Multi-nodes and CPU+MIC Hybrid Platform

Further Reading
Appendix: Installation and Environment Configuration of MIC
Index
Introduction to the Authors

Endong Wang is both a Director and Professor of the Inspur-Intel China Parallel Computing Joint Lab in Beijing, China. He has received a special award from the China State Council, and is also a member of a national advanced computing technology group of 863 experts, the director of the State Key Laboratory for high-efficiency server and storage technology, Senior Vice President of the Inspur group, the chairman of the Chinese Committee of the International Federation for Information Processing (IFIP), and Vice President of the China Computer Industry Association. He is the winner of the National Scientific and Technology Progress Award as the first inventor in three projects, the winner of the Ho Leung Ho Lee Science and Technology innovation award in 2009, and has garnered 26 national invention patents.

Qing Zhang has a master's degree in computer science from Huazhong Technical University in Wuhan and is now a chief engineer of the Inspur-Intel China Parallel Computing Joint Lab. He is manager of HPC application technology in Inspur Group—which engages in HPC, parallel computing, CPU multi-core, GPU, and MIC technology—and is in charge of many heterogeneous parallel computing projects in life sciences, petroleum, meteorology, and finance.

Bo Shen is a senior engineer of the Inspur-Intel China Parallel Computing Joint Lab, and is engaged in high-performance algorithms, research, and application of software development and optimization. He has many years of experience concerning development and optimization in life sciences, petroleum, and meteorology.

Guangyong Zhang has a master's degree from Inner Mongolia University, majoring in computer architecture, and is now an R&D engineer of the Inspur-Intel China Parallel Computing Joint Lab, engaged in the development and optimization of GPU/MIC HPC application software. He has published many papers in key conference proceedings and journals.

Xiaowei Lu received a master's degree from Dalian University of Technology, where he studied computer application technology, and is now a senior engineer of the Inspur-Intel China Parallel Computing Joint Lab, engaged in algorithm transplantation and optimization in many fields. He is experienced in high-performance heterogeneous coordinate computing development.

Qing Wu has a master's degree from Jilin University in Changchun and is now a senior engineer of the Inspur-Intel China Parallel Computing Joint Lab, engaged in high-performance parallel computing algorithms and hardware architecture as well as software development and optimization. He led many transplantation and optimization projects concerning the heterogeneous coordinate computing platform in petroleum.

Yajuan Wang has a master's degree from the Catholic University of Louvain, majoring in artificial intelligence. She is now a senior engineer of the Inspur-Intel China Parallel Computing Joint Lab, and is heavily involved in artificial intelligence and password cracking.
Part I Fundamental Concepts of MIC

The fundamental concepts of MIC architecture will be introduced in this section, including the development history of HPC, the software and hardware architecture of MIC, the installation and configuration of the MIC system, and the grammar of MIC. After finishing this section, readers will have learned the background of MIC architecture and how to write HPC programs on the MIC.
1 High-Performance Computing with MIC

In this chapter, we will first review the history of the development of multi-core and many-core computers. Then we will give a brief introduction to Intel MIC technology. Finally, we will compare MIC with other HPC technologies, as background on MIC technology for the reader.

Chapter Objectives. From this chapter, you will learn about:
• The developmental history of parallel computing, multi-core, and many-core.
• A brief review of MIC technology.
• Features of MIC as compared to other multi-core and many-core technologies.

1.1 A History of the Development of Multi-core and Many-Core Technology

The computer of yore had only one core, and it could run only a single program at a time. Then batch processing was developed about 60 years ago, which allowed multiple programs to be submitted together; but although they were loaded at the same time, they still executed on the CPU one after another. As computer hardware developed further, a single program might not use up all the computational power, thus wasting valuable resources. So the concept of the process was born (and, based on processes, threads were later developed). Process switching was also developed, not only making full use of computing power, but also defining the general meaning of "parallel": running different tasks at the same time. However, this "same time" holds only in a macroscopic sense: within any single time slice only one task can run. From 1978, after the Intel 8086 processor was released, personal computers became cheaper and more popular. Then Intel launched the 8087 coprocessor, which was a milestone event (and of great significance to programmers: the IEEE 754 floating-point standard was born because of the 8087 coprocessor). A coprocessor only assists the main processor, and it has to work together with the central processor.
The 8087 coprocessor existed because, at that time, the central processor was designed to work with integers and was weak in floating-point support, and more transistors could not be squeezed into the chip. Thus, the 8087 coprocessor was built to assist with floating-point computation. The significance of this is that the coprocessor was born: computation was no longer the exclusive domain of the CPU, which now had its first helper. And although, after further advances in manufacturing technology (e.g., the 486DX), the coprocessor was built into the CPU, the idea of a coprocessor never died. Unfortunately, the growth of computational power never matches our requirements. After the idea of processes was proposed, computational power came up short again. After Intel announced the 8086 in 1978, CPUs from Intel, AMD, and other companies increased performance continuously, almost following Moore's law: every 18–24 months the number of transistors doubled, allowing a rapid increase in speed and computational power. As manufacturing technology improved, the capability of the processing unit also increased. Technologies such as superscalar execution, super-pipelining, Very Long Instruction Word (VLIW), SIMD, hyper-threading, and branch prediction were applied to the CPU simultaneously. These technologies bring instruction-level parallelism (ILP), the lowest level of parallelism: with CPU hardware support, binary instructions can execute in parallel even on a single CPU. But this kind of parallelism is commonly controlled by hardware. Programmers can only passively take advantage of technology development instead of controlling the whole process. However, programmers can always adjust their code or use special assembler instructions to influence CPU behavior indirectly, even though the final implementation is still controlled by hardware. The developmental speed of CPUs has slowed down in recent years. The primary ways to improve single-core CPU performance are now to increase the working frequency and improve instruction-level parallelism. Both of these methods face problems: as manufacturing technology has improved, the size of the transistor has approached the atomic level, which makes power leakage a serious concern. The power consumption and heat generation per unit area have become larger and larger, making it difficult to improve frequency as quickly as before; 4 GHz is the limit for most companies. On the other hand, there is not much instruction-level parallelism in general-purpose computing; despite great efforts in redesign, the performance increase is not proportional to the increase in the number of transistors. Since the single CPU could no longer improve very much, using multiple CPUs at the same time became the natural next idea for scientists. Using multiple CPUs on a single motherboard is a comparatively economical solution; even so, it is constrained by cost, so it is popular only in servers, which are not so sensitive to cost and power consumption. The idea of using multiple CPUs is still widely used in the area of high-performance computing. As early as 1966, Michael Flynn classified computer architecture by instruction and data streams: single instruction stream and single data stream (SISD), single instruction stream and multiple data stream (SIMD), multiple instruction stream and single data stream (MISD), and multiple instruction stream and multiple data stream (MIMD), known as the Flynn classification.
Within those classifications,
MISD is very rare, and SISD corresponds to the traditional serial machine model. SIMD uses a single instruction controller, dealing with different data streams using the same instructions. SIMD generally describes hardware; when the term is applied to software parallelism, the "single instruction" refers to one instruction in the program rather than to a specific hardware controller. MIMD covers most parallel computers, which use different instruction streams to process different data. In commercial parallel machines, MIMD is the most popular and SIMD is second. Guided by Flynn's classification and aiming at the top end of computing, hardware companies build supercomputers almost without taking cost into account. "Supercomputers", referring to those computers whose performance is in the leading position (e.g., in the Top500 list), commonly have thousands of processors and specially designed memory and I/O systems. Their architecture is quite different from that of personal computers; they are not simply personal computers connected to each other. However, there is still a very strong bond between personal computers and supercomputers. Just as high-level military and space technology can later be used in everyday life (e.g., the Internet), many new technologies used in supercomputers can also be applied to the development of desktop personal computers. For example, some supercomputer CPUs can be used directly in personal computers, and some technologies, such as vector units or processor packaging, are already widely used in personal computers. However, the cost of a supercomputer is so high that most ordinary research facilities cannot afford one. As network technologies continued to advance, the collaboration of multiple nodes became practical. Because each node is a fully functional computer, jobs can be sent to different nodes to achieve parallelism among them, thus using computational resources efficiently. Collaboration over a network gave rise to two architectures: the computer cluster and distributed computing. A computer cluster is a group of computers connected by a network to form a very tightly collaborating system. Distributed computing is the basis of the popular "cloud computing": it cuts a huge computational job and its data into many small pieces, distributes them to many loosely connected computers, and collects the results. Although the performance is no match for a supercomputer, the cost is much lower. While other hardware companies were branching out into different computer architectures, the central processor companies continuously increased CPU frequency and revised the CPU architecture. However, limited by manufacturing technology, materials, and power consumption, CPU frequency reached a bottleneck after a period of fast development, and progress in processor frequency slowed down. With current methods reaching a dead end, processor companies like Intel and AMD are seeking other ways to increase performance while maintaining or increasing the energy efficiency of processors.
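Flynn's SIMD category is easiest to picture at the level of a single loop. The fragment below is our own illustration rather than code from this book: the first function lets the compiler issue one scalar operation per element, while the second is annotated so that one vector instruction may be applied to several elements at once. The "#pragma omp simd" directive is an OpenMP 4.0 construct and is used here only as a convenient, standard way to express the idea; auto-vectorizing compilers can achieve the same effect without it.

```c
#include <stddef.h>

/* Scalar version: one add instruction per element (SISD-style execution). */
void add_scalar(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* SIMD version: the annotation asks the compiler to apply one vector add
 * to several elements at a time (a single instruction, multiple data). */
void add_simd(const float *restrict a, const float *restrict b,
              float *restrict c, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The wide vector units discussed later in this book execute exactly this kind of single-instruction, multiple-data operation.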
The commercial idea of the CPU companies has not changed: if one CPU is not enough, use two. With better manufacturing technology they could then put more cores on a single chip, and thus multi-core CPUs were born. In 2005, Intel and AMD formally released dual-core CPUs onto the market; in 2009, quad-core and octo-core CPUs were announced by Intel. Multi-core CPUs became so popular that nowadays even ordinary personal computers have multi-core CPUs, with performance matching that of an earlier multi-CPU server node. Even disregarding the increase in single-core performance, multi-core CPUs integrate multiple cores on one die, which gives better connectivity between cores than the multi-CPU architecture connected by a motherboard bus. There have also been many further improvements in multi-core CPUs, such as the shared L3 cache, so that collaboration between cores is much better. Along with the transition from single-core to multi-core CPUs, programmers also began to change their thinking, focusing more on multi-threading and parallel programming. These ideas had been developed in the 1970s and 1980s, when supercomputers and clusters were already being built at the high end of the field; but owing to cost, most programmers could get their hands only on single-core computers. So when multi-core CPUs became popular, programmers started to use tools like MPI and OpenMP, which had been dormant for quite a long time, and could enjoy the full use of the available computational power. While the demand for computational power continues to grow relentlessly, CPU performance is not the only problem; power consumption is another. So people remembered a good "helper" of the CPU: the coprocessor. In 2007, the popularization of general-purpose computing on graphics processing units (GPGPU) also meant the return of coprocessors. Although the GPU's original job concerns display and image processing, its powerful floating-point processing capability makes it a natural coprocessor. And as the original developer of the coprocessor, Intel never forgot the 8087, so in 2012 it introduced the MIC as a new generation of coprocessor, which will make a great contribution to high-performance computing. There are two key words concerning the development of the computer: need and transmigration. Dire need has always been the original force pushing the development of science and technology; as mentioned above, people needed a faster way to calculate, and this resulted in the creation of the computer. Then they needed to make full use of the computer's resources, so multi-processing and multi-threading came along; and because calculating capacity kept growing, more computation was demanded, so the processing capacity of a single core increased. And as the requirement for computational power is endless, but improving the processing capacity of one core is limited, different multi-processor forms arrived, such as dual-CPU nodes, clusters, and multi-core CPUs. Transmigration, in general terms, is to recycle; in programmers' jargon it is iteration. It describes the path of computer development and the way to future development. For example, originally the CPU was in charge of display, but image processing became more and more complex, so the CPU alone could not handle it; then the GPU was born. As manufacturing technology continued to advance, the development of the CPU and GPU became more mature, and the CPU and GPU were reunited again, as in the i-series CPUs of Intel and the APUs of AMD, and NVIDIA is
  • 48. But that was a question that could not be answered; at least off-hand. For when they went to Munson’s room, whither he had limped on his arrival at the ranch with the startling news, he was gone. Some bloody bandages on a chair seemed to indicate that he had dressed his wound again and gone. But where? The cook solved the mystery by reporting that, just before the arrival of the doctor, Munson had been seen riding away in the direction taken by the pursuing cowboys. “Well, he’s got grit, that’s what I say!” exclaimed the foreman. Jerry was made as comfortable as possible, and then they could only await the return of the cowboys from the chase to see how Munson fared. And when he came riding in with the others, showing little traces on his face of any pain or suffering, and heard the edict that the doctor was to come to him, or he to go to the doctor, he exclaimed: “Not much! It isn’t the first time I’ve been shot, and it may not be the last. I know how to doctor myself and I’m all right. I’ll be a little lame and stiff for a while and I’ll have to lie around the bunk, but that’ll be about all. No doctor for me!” and they could not persuade him otherwise. Then the talk turned to the results of the pursuit. “They got clean away!” declared Gimp, in disappointed tones. “Couldn’t find hide nor hair of ’em.” “Where was the last trace?” asked the foreman. “Same place as the others, near Horse Tail Gulch.” This, it appeared, was the name of the ravine near which the boys had made some observations. “We traced ’em to there,” explained the Parson, “and that was all we could do.” “Well, this sure is queer!” exclaimed Mr. Watson, banging his fist down on the table. “I never knew cattle raids to be carried on like this. They must give the beasts wings after they start to drive ’em away.”
  • 49. “It does seem so,” agreed Gimp. “What they do with ’em is a mystery to me.” “Could they mingle your cattle in with others from another ranch, so you wouldn’t notice them?” asked Ned. “Well, Son, they could do that if there was other herds with a different brand than ours near here,” admitted the foreman. “But there isn’t. I see your drift. You mean they’ll round up some of your dad’s steers and when they get to where some other rancher has his herds they’ll bunch ’em; is that it?” “Yes,” nodded Ned. “Well, I don’t hardly believe they’d do that. It would be too hard work to cut out our cattle, and besides, as soon as the rancher saw a new brand in with his beef he’d send word here. Our brand is registered all over. “Besides,” went on the foreman, “the thieves wouldn’t just cut out our cattle and drive them on, after they’d let ’em mingle; they’d take some of the other man’s, too. And we haven’t heard of any other ranch being robbed the way Square Z has—at least, I haven’t,” he concluded, looking at the cowboys. “No, they seem to be picking on just us,” said the Parson. “I guess my theory isn’t of much account,” admitted Ned. Then, as the two boys left the group of ranchers, going off by themselves, he added: “But we’ve got to do something—we’ve got to make good.” “That’s right!” declared Bob. “We got the folks to consent to let us try our hand at this rather than hire detectives, and they may call us off if we don’t show results.” The doctor came the next day and announced that Jerry was doing finely, saying he could be up and around in another day. Munson stuck to his decision not to have the physician look at the wounded leg, and to this the medical man, with a shrug of his shoulders, had to agree.
  • 50. “It’s healing fine,” the cattle buyer said. Jerry was able to be up the next day, and it was considered that the two “invalids” were doing well. Ned and Bob wanted to stay around the ranch to keep Jerry company, but he insisted that they do what they could to get some clue to the mystery. So they rode off each morning toward the gulch, but they were not successful in uncovering anything. Nor were the cowboys, though they could not devote much time to searching, since there was much work to be done about the ranch. Jerry had been questioned as to why he took Go Some in mistake for his own horse. “Why, I thought it was my own pony, that’s all,” he said. “The wild one was tethered where I’d left mine, and I’m not sharp enough about horses to tell one from another at a glance when they are as much alike as those two.” “Well, they are a bit alike,” admitted the foreman. “But someone changed the places of the ponies, and I’d like to know who did it.” The puzzle remained unsolved, however—at least for some time. “Well, I guess I’ll be able to go about enough to-morrow to start with Bob and Ned on a thorough search,” said Jerry to himself, about a week after his accident, while he was moving about the house to get the stiffness out of his muscles. “I’m feeling all right again.” Munson had not been active, either, his leg developing a stiffness that kept him to his room. He had been given an apartment to himself instead of bunking in with the cowboys. Ned, Bob and Jerry, too, as guests, had rooms to themselves in the same building. As Jerry, walking in the Indian moccasins which he wore while in the house, passed Munson’s room he was minded to go in and have a talk with him. But as he noiselessly approached, something he saw through the partially opened door caused him to pause. The cattle buyer was changing his clothes. Jerry had a glimpse of both his bare legs and on neither one was a trace of a bullet wound!
  • 52. CHAPTER XIX ANOTHER ATTEMPT “Well!” exclaimed Jerry to himself, “wouldn’t that make you wonder if you were seeing things?” For a moment he stood, fascinated by the thought of what it all might mean, and he did not realize that it was not exactly the proper thing to do. But Munson was without so much as a scar to show where the bullet had gone in and been cut out, as he had claimed it had been! “I wonder if he could have said his arm instead of his leg?” mused Jerry as he walked softly away, having given over his idea of speaking to the cattle buyer. “Did I misunderstand them when they told me about the shooting?” Jerry tried to reason it out. No, he was sure “leg” had been mentioned. Besides, he himself had seen the blood-stained trousers the man had worn. “And one doesn’t wear trousers on one’s arms. What does it all mean?” Jerry mused. He tried to think it out. Clearly, since there was no trace of a bullet wound there could have been no bullet. And, by the same process of reasoning, if there was no bullet there could have been no shot fired at Munson. “And if there wasn’t a shot there wasn’t the fight he described, and maybe—yes, there was a cattle theft all right.” Jerry was sure of that much, anyhow.
  • 53. “But why should he fake a wound?” Jerry asked himself. “What object could he have, unless he wanted to make himself out a hero. I guess that must be it. He wanted to prove that he wasn’t afraid of a gun. Well, maybe he isn’t. But this is a queer way to prove it. I give it up!” A little later as Jerry was sitting out in the sun Munson came limping toward him. “He’s keeping up the fake,” thought the tall lad. “And he does it well. Limps just about enough, and not as much as at first. He doesn’t forget, either. Must be a good actor. “How’s the leg?” the boy asked, just to see what would be said. “Oh, getting on fine!” was the enthusiastic answer. “I’ll be able to leave the bandages off in a couple of days now,” and he motioned to a bulge under his trousers where, evidently, he had wound some cloth, uselessly, as Jerry knew. “That’s good,” was Jerry’s comment. Then, just to see what the effect would be, he remarked, as though in surprise: “Oh, you were shot in the right leg, weren’t you?” He thought perhaps Munson might surmise that he had been suspected of faking, and would seem confused. But he was perfectly cool and replied in casual tones: “Sure it was the right leg. Did you think it was the left?” “I had an idea,” Jerry answered. “Yes, I’ll be in fine shape in a couple more days,” went on Munson, “and then I can help you boys look for those cattle rustlers. I’d like to get hold of the man who shot me.” “You never will,” thought the lad grimly, “for there wasn’t any such man. You’re a big faker; but what’s your game?” Jerry cared more for that than for anything else just then. Was Munson in with the thieves? If so, what would it benefit him to
  • 54. pretend to be wounded? Jerry’s brain was tired with trying to get a loose end of the tangle that he could follow. Ned and Bob, going off by themselves to look for traces of the thieves, were no more successful than the three chums had been together. They returned at the end of a long day, tired and disappointed. Their zeal was quickened, however, when Jerry told them of the queer discovery in regard to Munson. “Whew!” whistled Ned. “There’s something doing here, all right. He’s one of the cattle thieves as sure as guns! We’ve got to watch him close.” “I agree to that last part all right,” said Jerry. “But I’m not so sure he’s in with the rustlers.” “I am!” and Bob sided with Ned. “Well, that’s one end to work on, and another is to see what happened to your dad’s cattle,” said Jerry. “We’ll have another try at the gulch, I think.” “It’s only a waste of time,” declared Ned. “Bob and I have gone over every inch of the ground there.” “Well, I’m a bit freshened up by my rest,” insisted Jerry, “and I want to take another look. But have you fellows formed ideas at all?” “Half a dozen, and not one any good,” answered Bob. “Once I had an idea that they took the cattle away in a big automobile from the point where we lost trace of them.” “They couldn’t do that without leaving marks of the wheels,” put in Ned, “and we didn’t see any.” “Then I got a crazy notion that they floated them down a river on a raft,” went on Chunky. “Only,” and he grinned, “there isn’t any river near there.” “And then he sprang the tunnel theory,” laughed Ned.
  • 55. “What’s that?” Jerry demanded. “Oh, I had an idea there might be a secret underground passage somewhere near the gulch, and the rustlers could slip the cattle away through that. But we couldn’t find any tunnel.” “And so we’re about at the end of our guessing,” resumed Ned. “The only theories left are that the cattle sprout wings and jump over the mountain range, or else they’re carried up in an elevator, leaving no trace.” “Well, we’ll see what we can find,” said Jerry. “What with that, and keeping an eye on Munson, we’re going to have our hands full.” “And our eyes, too,” laughed Ned. “Want to take a spin in the airship?” asked Bob of Jerry. “Not quite yet,” he replied. “I feel a bit weak still, and I haven’t gotten back all my nerve. But you two go if you like.” Bob and Ned did take a little flight just before supper, to the delight and astonishment of the cowboys, who never wearied of watching the evolutions of the aircraft, though once it made considerable work for them, as in flying over a herd of cattle the animals stampeded, when some of them saw the shadow of the big wings hovering over them, and the cowboys had all they could do to quiet the steers. But, for all that, the plainsmen delighted to watch the boys sail aloft. Few of them would venture very near the craft, however, for fear, as one of them said, “she might turn around and chase us.” But the airship gained for the boys a certain respect and awe that had been lacking before. Hinkee Dee only remained hostile, but he was less open in his antagonism now. A day or two later the three boys were on their way to the baffling gulch, or defile. Jerry, Bob and Ned rode their ponies easily along the undulating grassy plains, Jerry having made sure this time that he had his own horse. The wild one had wandered off the day of the accident and had not come back to the ranch. Mr. Watson had told
  • 56. the men not to make a search for him, as he was “too ornery for anyone to own.” Professor Snodgrass had been invited to accompany the boys, but he said he was on the track of some new kind of moth, and its feeding ground was in the opposite direction from the gulch. “Well, see what you can find,” suggested Ned to Jerry, as the trio reached the place where all traces of the stolen cattle had been lost. “Bob and I have ridden all over the place, and we can’t find a crack big enough to let a sheep through, let alone a steer.” “We’ll see,” said Jerry. “Mind, I don’t say there is anything here, but I just want to satisfy myself.” They looked carefully in the vicinity of the entrance to the gulch, or defile. It was at the top of a long low slope that extended along the western boundary of Square Z ranch. This ridge was really the last of a line of hills which lay at the foot of the mountain slope. The ravine was a sort of V-shaped break in the mountain wall. At one time it might have been a pass through the mountains, but an upheaval of nature had closed it until now it was but a wedge-shaped cut, or gash, into the stony side of the mountain. Stony were the steep walls and also the floor, which was covered with shale and flat rocks. “There’ve been cattle along here,” declared Jerry, pausing at the entrance to the gulch. “Yes, everybody admits that,” conceded Ned. “And there’ve been cattle in the gulch, too. You can see traces of ’em. But the mystery is: how do they get out?” Jerry looked about without answering.
  • 57. CHAPTER XX THE PROFESSOR’S DILEMMA Ned, Bob and Jerry were perhaps better fitted to attempt to solve a mystery of this kind than most young men would have been. They had traveled considerably, and had been in strange situations. More than once they had had to do with secret passageways and queer tunnels which they had discovered only after long, tiresome search. “But I never saw anything quite so plain as this,” confessed Jerry, as he and his chums rode around the sides of the V-shaped gulch. It was shaped like a V in two ways. That is, the entrance was of that character and the sides sloped down from the top; though because of the width of the floor, as it might be called, of the gulch the outline of the elevation would better be represented by the letter U. The opening of the gulch was perhaps half a mile in width, and the two sides were a mile or more long. They came together, gradually converging, until they formed the inside of a sharp wedge. “Now the question,” said Jerry, “is whether or not there is an opening in this V; and, if so—where?” “Now you’ve said it!” exclaimed Ned. “Where? Beats any problem in geometry I ever tackled.” “Well, come on, let’s be systematic about this,” suggested Jerry. “There are three of us, and we can divide this gulch into three parts.” The tall lad indicated some natural landmarks on the rocky walls of the ravine. He would take from the entrance on the left to a third of the way down the side. From there, extending part way up the
  • 58. other side, and, of course, including the angle of the V, would be Bob’s portion. The remainder would be inspected by Ned. “But Bob and I have done it all before,” objected Ned. “We didn’t find a thing.” “And maybe we sha’n’t now,” admitted Jerry. “But it won’t be for lack of trying. Come on now, start.” “And you can both meet me at the end of the gulch,” suggested Bob. “Why meet you there?” Jerry asked. “So you can eat,” was the ready response. “I’ve got the grub, you know.” “Trust you for that,” laughed Ned. “But it’s a good idea all the same.” The search began. The boys were sure the cattle had been driven up to the entrance of the defile. In this they were supported by the cowboys who agreed to the same thing. But there was a division of opinion as to whether the steers had been driven into the gulch and held there for a time. There were objections to this theory on the ground that in some cases pursuit had been made so soon after the raid that had the cattle been held in the gulch they would have been seen. Of course, they might have been kept there for a little while, and then concealed, either further up the side of the mountain or among the low foothills. But searches in these places had failed to give any clue. “The cattle come into this gulch,” was Jerry’s decision, “and we’ve got to find out how they are taken out without being seen.” The boys searched the rocky sides of the gulch thoroughly. They even climbed part way up, but all to no purpose. When Jerry and Ned met with Bob in the angle, and began to eat, they were no nearer a solution of the mystery than at first.