IBM Systems and Technology
An IBM White Paper
February 2011
Watson – A System
Designed for Answers
The future of workload optimized systems design
Executive summary
Over the last century, IBM has achieved numerous scientific
breakthroughs through its commitment to research and its
tradition of Grand Challenges. These Grand Challenges, such as
Deep Blue®, which was designed to rival world chess champion
Garry Kasparov, work to push science in ways that weren’t
thought possible before. Watson is the latest IBM Research
Grand Challenge, designed to further the science of natural
language processing through advances in question-answering
technology.
Watson is a workload optimized system based on the IBM DeepQA
architecture running on a cluster of IBM® POWER7®
processor-based servers. After four years of intense research and
development by a team of IBM researchers, Watson competed
on Jeopardy! in February 2011, performing at the level of human
experts in terms of precision, confidence and speed against two
of the best-known and most successful Jeopardy! champions,
Ken Jennings and Brad Rutter. This white paper explains
Watson’s workload optimized system design, how it’s emblematic
of the future of systems design, and why this represents a new
computing paradigm.
Jeopardy! The IBM challenge
In 1997, Deep Blue, the computer chess-playing system
developed by IBM Research, captured worldwide attention by
competing successfully against world chess champion Garry Kasparov.
It was the culmination of a grand challenge to advance the
science of computing in a way that created great popular interest.
Today, with companies increasingly capturing critical business
information in natural language documentation, there is growing
interest in workload optimized systems that deeply analyze the
content of natural language questions to answer those questions
with precision. Advances in question answering (QA) technology
will increasingly help support professionals in critical and timely
decision making in areas such as health care, business
intelligence, knowledge discovery, enterprise knowledge management,
and customer support.
With QA in mind, IBM settled on a challenge to build a
computer system called “Watson” (after Thomas J. Watson, the
founder of IBM), which could compete at the human champion
level in real time on the American TV quiz show Jeopardy! The
program, which has been broadcast in the United States for
more than 25 years, pits three human contestants against one
another to answer rich natural language questions over a broad
range of topics, with penalties for wrong answers. In this
three-person competition, confidence, precision and answering speed
are of critical importance, as contestants usually come up with
their answers in the few seconds it takes for the host to read a
clue. To compete in this game at human-champion levels, a
computer system would need to answer roughly 70 percent of
the questions asked with greater than 80 percent precision in
three seconds or less.
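The target above couples two numbers: the share of clues a player attempts and the precision on the clues attempted. The following minimal Python sketch shows one way such a target could be checked, assuming each clue yields a confidence estimate and that only the most confident clues are attempted. The function names and the toy data are hypothetical illustrations, not Watson's evaluation code or results.

```python
from typing import List, Tuple


def precision_at_coverage(results: List[Tuple[float, bool]], coverage: float) -> float:
    """Precision when only the most confident `coverage` fraction of clues is attempted."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Rank clues from most to least confident, then keep the top slice.
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    attempted = ranked[: max(1, round(len(ranked) * coverage))]
    correct = sum(1 for _, is_correct in attempted if is_correct)
    return correct / len(attempted)


def meets_target(results: List[Tuple[float, bool]],
                 coverage: float = 0.70,
                 min_precision: float = 0.80) -> bool:
    """True if precision exceeds `min_precision` at the given coverage."""
    return precision_at_coverage(results, coverage) > min_precision


# Invented (confidence, was_correct) pairs for ten clues; illustration only.
toy_results = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.75, True),
    (0.70, True), (0.60, True), (0.40, False), (0.30, False), (0.20, True),
]

print(precision_at_coverage(toy_results, 0.70))
print(meets_target(toy_results))
```

Attempting fewer clues generally raises precision, which is why confidence estimation matters as much as raw accuracy in this setting.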
Watson represents an impressive leap forward in systems design
and analytics. It runs IBM’s DeepQA technology, a new kind of
analytics capability that can perform thousands of simultaneous
tasks in seconds to provide precise answers to questions.
Powered by IBM POWER7 processor technology, Watson is an
example of the complex analytics workloads that are becoming
increasingly common and critical to business success and
competitiveness in today’s data-intensive environment.
Watson competed against two of the best-known and most
successful Jeopardy! champions, Ken Jennings and Brad Rutter, in
a two-match contest aired over three consecutive nights
beginning on February 14, 2011.
IBM DeepQA
DeepQA is a massively parallel, probabilistic, evidence-based
architecture. For the Jeopardy! Challenge, more than 100
different techniques are used to analyze natural language, identify
sources, find and generate hypotheses, find and score evidence,
and merge and rank hypotheses. Far more important than any
particular technique is the way all these techniques are combined
in DeepQA such that overlapping approaches can bring their
strengths to bear and contribute to improvements in accuracy,
confidence, or speed.
DeepQA is an architecture with an accompanying methodology,
but it is not specific to the Jeopardy! Challenge. IBM has begun
adapting it to different business applications and additional
exploratory challenge problems including medicine, enterprise
search and gaming.
The overarching principles in DeepQA are:
1. Massive parallelism: Exploit massive parallelism in the
consideration of multiple interpretations and hypotheses.
2. Many experts: Facilitate the integration, application and
contextual evaluation of a wide range of loosely coupled
probabilistic question and content analytics.
3. Pervasive confidence estimation: No single component
commits to an answer; all components produce features and
associated confidences, scoring different question and content
interpretations. An underlying confidence processing substrate
learns how to stack and combine the scores.
4. Integrate shallow and deep knowledge: Balance the use of
strict semantics and shallow semantics, leveraging many
loosely formed ontologies.
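The third principle, pervasive confidence estimation, can be illustrated with a minimal Python sketch. Several independent scorers each contribute a feature value for a candidate answer, and a combining step turns those values into a single confidence used for ranking. The feature names, weights, and logistic combination below are assumptions chosen for illustration. The paper says only that the combination is learned and does not describe the model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights and bias. In a real system these would be learned
# from training data rather than set by hand.
WEIGHTS: Dict[str, float] = {"passage_match": 2.0, "type_match": 1.5, "popularity": 0.5}
BIAS = -2.0


def confidence(features: Dict[str, float]) -> float:
    """Combine per-scorer feature values into one confidence in (0, 1)."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return candidate answers sorted by descending confidence."""
    scored = [(answer, confidence(features)) for answer, features in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented feature values for two candidate answers.
candidates = {
    "candidate A": {"passage_match": 0.4, "type_match": 0.1, "popularity": 0.9},
    "candidate B": {"passage_match": 0.8, "type_match": 0.9, "popularity": 0.8},
}

for answer, score in rank(candidates):
    print(f"{answer}: {score:.3f}")
```

No single scorer decides the outcome here. Each one only contributes evidence, and the ranking emerges from the combined confidence.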
Speed and scale-out
DeepQA is developed using Apache UIMA, a framework
implementation of the Unstructured Information Management
Architecture. UIMA was designed to support interoperability
and scale-out of text and multimodal analysis applications. All of
the components in DeepQA are implemented as UIMA
annotators. These are components that analyze text and produce
annotations or assertions about the text. Over time Watson has
evolved so that the system now has hundreds of components.
UIMA facilitated rapid component integration, testing and
evaluation.
Early implementations of Watson ran on a single processor,
which required two hours to answer a single question. The
DeepQA computation is embarrassingly parallel, however, and so
it can be divided into a number of independent parts, each of
which can be executed by a separate processor. UIMA-AS, part
of Apache UIMA, enables the scale-out of UIMA applications
using asynchronous messaging. Watson uses UIMA-AS to scale
out across 2,880 POWER7 cores in a cluster of 90 IBM Power®
750 servers. UIMA-AS manages all of the inter-process
communication using the open Java Message Service (JMS) standard.
The UIMA-AS deployment on POWER7 enabled Watson to deliver
answers in one to six seconds.
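A rough check on the figures above: two hours is 7,200 seconds, and dividing that evenly over 2,880 cores gives 2.5 seconds, which falls inside the reported one-to-six-second range. This assumes ideal linear speedup and ignores messaging overhead, and the paper does not say what kind of processor the two-hour figure was measured on. The Python sketch below shows the general pattern of scoring independent hypotheses in parallel. It uses an in-process thread pool as a stand-in for UIMA-AS, which distributes work across machines through messaging, and the scoring function is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Figures taken from the text; the division assumes perfect linear scaling.
SINGLE_PROCESSOR_SECONDS = 2 * 3600
TOTAL_CORES = 90 * 32
ideal_seconds = SINGLE_PROCESSOR_SECONDS / TOTAL_CORES


def score_hypothesis(hypothesis: str) -> Tuple[str, float]:
    """Placeholder scorer: a real one would gather and weigh evidence."""
    distinct_letters = {c for c in hypothesis.lower() if c.isascii() and c.isalpha()}
    return hypothesis, len(distinct_letters) / 26.0


def score_all(hypotheses: List[str], workers: int = 4) -> List[Tuple[str, float]]:
    """Score each hypothesis independently, then merge into one ranked list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_hypothesis, hypotheses))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(ideal_seconds)
print(score_all(["alpha", "beta", "gamma"]))
```

Because no hypothesis depends on another, the only coordination point is the final merge, which is what makes the workload straightforward to spread across many cores.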
Watson has roughly 200 million pages of natural language
content (equivalent to reading 1 million books). Watson uses the
Apache Hadoop framework to facilitate preprocessing the large
volume of data in order to create in-memory datasets used at
run-time. Watson’s DeepQA UIMA annotators were deployed
as mappers in the Hadoop MapReduce framework, which
distributed them across processors in the cluster. Hadoop
contributes to optimal CPU utilization and also provides convenient
tools for deploying, managing, and monitoring the data
analysis process.
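The preprocessing pattern described above can be sketched in plain Python without any Hadoop API. A map step applies an annotator to each document independently, and a reduce step groups the emitted pairs into an in-memory lookup structure. The toy annotator, the inverted-index output, and the two-document corpus are assumptions made for illustration. The paper does not specify what Watson's annotators emitted or how its run-time datasets were organized.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def annotate(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map step: a toy annotator that emits (term, doc_id) pairs."""
    for token in text.lower().split():
        term = token.strip(".,;:!?\"'")
        if term:
            yield term, doc_id


def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reduce step: group document ids by term into an in-memory table."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return dict(index)


# Invented two-document corpus.
corpus = {"d1": "Deep Blue played chess.", "d2": "Watson played Jeopardy!"}
pairs = (pair for doc_id, text in corpus.items() for pair in annotate(doc_id, text))
index = build_index(pairs)

print(sorted(index["played"]))
```

Since each document is annotated independently, the map step can be distributed freely, and the expensive analysis is paid once before any question is asked.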
Harnessing POWER7
Watson harnesses the massively parallel processing performance
of its POWER7 processors to execute its thousands of
DeepQA tasks simultaneously on individual processor cores.
Each of Watson’s 90 clustered IBM Power 750 servers features
32 POWER7 cores running at 3.55 GHz. Running the Linux®
operating system, the servers are housed in 10 racks along with
associated I/O nodes and communications hubs. The system has
a combined total of 16 terabytes of memory and can operate at
over 80 teraflops (trillions of floating-point operations per second).
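The cluster figures above can be cross-checked with simple arithmetic. The short Python sketch below derives the total core count, the average memory per server, and the throughput per core per clock cycle implied by the stated totals. It treats a terabyte as 1,000 gigabytes and takes 80 teraflops as a lower bound, both of which are assumptions. The derived values are back-of-the-envelope estimates, not published specifications.

```python
# Figures stated in the text.
SERVERS = 90
CORES_PER_SERVER = 32
CLOCK_HZ = 3.55e9
TOTAL_MEMORY_TB = 16
TOTAL_FLOPS = 80e12  # "over 80 teraflops", used here as a lower bound

# Derived values (assumes 1 TB = 1,000 GB).
total_cores = SERVERS * CORES_PER_SERVER
memory_per_server_gb = TOTAL_MEMORY_TB * 1000 / SERVERS
flops_per_core = TOTAL_FLOPS / total_cores
flops_per_core_per_cycle = flops_per_core / CLOCK_HZ

print(f"total cores: {total_cores}")
print(f"average memory per server: {memory_per_server_gb:.0f} GB")
print(f"implied throughput per core: {flops_per_core / 1e9:.1f} gigaflops")
print(f"implied operations per core per cycle: {flops_per_core_per_cycle:.1f}")
```

The core count matches the 2,880 cores cited in the scale-out discussion, and the average memory per server sits well below the 512 GB maximum quoted for the Power 750.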
With its innovative, eight-core processor design, POWER7 is
ideally suited for massively parallel processing of Watson’s
analytics algorithms. POWER7 also features 500 gigabytes per
second of on-chip communications bandwidth, contributing to
exceptional efficiency of both memory and processor utilization.
And since each server packs 32 high-performance POWER7 cores
with up to 512 GB of memory, the Power 750 makes an ideal platform
for Watson’s processor- and memory-hungry Java processes.
Designing Watson on commercially available Power 750 servers
was a deliberate choice to ensure more rapid adoption of
optimized systems in industries such as healthcare and financial
services. That goal was a fundamental difference between Watson
and Deep Blue, which was a highly customized supercomputer.
Deep Blue was based on an earlier generation of Power
processor technology, featuring a 30-node RS/6000 SP system, with
each node containing a single 120 MHz POWER2 processor.
But in addition to the regular POWER2 processors, Deep Blue’s
performance was enhanced with 480 special-purpose chess
processor chips.
The same Power 750 server used by Watson is already
deployed today by thousands of organizations in optimized
systems that provide for both complex analytics and transaction
processing. Rice University in Houston, Texas, for example, uses
IBM Power 750 systems to accelerate the understanding of the
molecular basis of cancer through the application of genome
analysis technologies. POWER7 systems have given Rice more
flexibility and efficiency, enabling them to pursue a broader
range of research challenges on a single system than was possible
before. And GHY International, a customs brokerage firm in
Canada, migrated to a new Power 750 running Power AIX®,
Power i and Power Linux to better support their clients’
increased engagement in international trading. With
PowerVM™ virtualization, GHY is now able to deploy new
capabilities in as little as five minutes to support their clients’
changing needs.
A system designed for answers
After four years of intense research and development by a team
of IBM researchers, Watson has demonstrated its ability to
compete on Jeopardy! against champion players, performing at
human-expert levels in terms of precision, confidence and speed.
The project has advanced the fields of unstructured data
analytics, natural language processing, and the design of
workload optimized systems. Beyond Jeopardy!, the technology
behind Watson can be adapted to solve business and societal
problems (for example, diagnosing disease, handling online
technical support questions, and parsing vast tracts of legal
documents) and to drive progress across industries.
Watson’s ability to understand the meaning and context of
human language, and rapidly process information to find precise
answers to complex questions, holds enormous potential to
transform how computers can help people accomplish tasks in
business and their personal lives.
For more information
To learn more about Watson, POWER7 and workload
optimized systems, please contact your IBM marketing
representative or IBM Business Partner, or visit the following websites:
● ibm.com/systems/power/advantages/watson
● ibm.com/systems/power
© Copyright IBM Corporation 2011
IBM Systems and Technology Group
Route 100
Somers, NY 10589
Produced in the United States of America
February 2011
All Rights Reserved
IBM, the IBM logo, ibm.com, Power, POWER7 and DEEP BLUE are
trademarks of International Business Machines Corporation in the
United States, other countries or both. If these and other IBM trademarked
terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was
published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is
available on the web at “Copyright and trademark information” at
ibm.com/legal/copytrade.shtml
Other company, product or service names may be trademarks or service
marks of others.
POW03061-USEN-00
