Artificial General
Intelligence 1
Bob Marcus
robert.marcus@et-strategies.com
Part 1 of 4 parts: Artificial Intelligence and Machine Learning
This is a first cut.
More details will be added later.
Part 1: Artificial Intelligence (AI)
Part 2: Natural Intelligence (NI)
Part 3: Artificial General Intelligence (AI + NI)
Part 4: Networked AGI Layer on top of Gaia and Human Society
Four Slide Sets on Artificial General Intelligence
AI = Artificial Intelligence (Task)
AGI = Artificial Mind (Simulation)
AB = Artificial Brain (Emulation)
AC = Artificial Consciousness (Synthetic)
AI < AGI <? AB < AC (Is a partial brain emulation needed to create a mind?)
Mind is not required for task proficiency
Full Natural Brain architecture is not required for a mind
Consciousness is not required for a natural brain architecture
Philosophical Musings 10/2022
Focused Artificial Intelligence (AI) will get better at specific tasks
Specific AI implementations will probably exceed human performance in most tasks
Some will attain superhuman abilities in a wide range of tasks
“Common Sense” = low-level experiential broad knowledge could be an exception
Some AIs could use brain-inspired architectures to improve complex task performance
This is not equivalent to human or artificial general intelligence (AGI)
However, networking task-centric AIs could provide a first step towards AGI
This is similar to the way human society achieves power from communication
The combination of the networked AIs could be the foundation of an artificial mind
In a similar fashion, human society can accomplish complex tasks without being conscious
Distributed division of labor enables tasks to be assigned to the most competent element
Networked humans and AIs could cooperate through brain-machine interfaces
In the brain, consciousness provides direction to the mind
In large societies, governments perform the role of conscious direction
With networked AIs, a “conscious operating system” could play a similar role.
This would probably have to be initially programmed by humans.
If the AI network included sensors, actuators, and robots it could be aware of the world
The AI network could form a grid managing society, biology, and geology layers
A conscious AI network could develop its own goals beyond efficient management
Humans in the loop could be valuable in providing common sense and protective oversight
Outline
Classical AI
Knowledge Representation
Agents
Classical Machine Learning
Deep Learning
Deep Learning Models
Deep Learning Hardware
Reinforcement Learning
Google Research
Computing and Sensing Architecture
IoT and Deep Learning
DeepMind
Deep Learning 2020
Causal Reasoning and Deep Learning
References
Classical AI
Classical Paper Awards 1999-2022
Top 100 AI Start-ups
From https://singularityhub.com/2020/03/30/the-top-100-ai-startups-out-there-now-and-what-theyre-working-on/
Classical AI Tools
Lisp
https://en.wikipedia.org/wiki/Lisp_(programming_language)
Prolog
https://www.geeksforgeeks.org/prolog-an-introduction/
Knowledge Representation
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
Decision Trees
https://en.wikipedia.org/wiki/Decision_tree
Forward and Backward Chaining
https://www.section.io/engineering-education/forward-and-backward-chaining-in-ai/
Constraint Satisfaction
https://en.wikipedia.org/wiki/Constraint_satisfaction
OPS5
https://en.wikipedia.org/wiki/OPS5
Classical AI Systems
CYC
https://en.wikipedia.org/wiki/Cyc
Expert Systems
https://en.wikipedia.org/wiki/Expert_system
XCON
https://en.wikipedia.org/wiki/Xcon
MYCIN
https://en.wikipedia.org/wiki/Mycin
MYCON
https://www.slideshare.net/bobmarcus/1986-multilevel-constraintbased-configuration-article
https://www.slideshare.net/bobmarcus/1986-mycon-multilevel-constraint-based-configuration
Knowledge Representation
Stored Knowledge Base
From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
Pre-defined Models
From https://intelligence.org/2015/07/27/miris-approach/
Agents
AI Agents
From https://www.geeksforgeeks.org/agents-artificial-intelligence/
Intelligent Agents
From https://en.wikipedia.org/wiki/Intelligent_agent
In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve
its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human
being, as is any system that meets the definition, such as a firm, a state, or a biome.[1]
Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of
intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1]
An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize
the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired
behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4]
Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science,
ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations.
Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called
abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of
human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
Node in Real-Time Control System (RCS) by Albus
From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
Intelligent Agents for Network Management
From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
Intelligent Agents on the Web
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf
Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover,
they can evaluate information obtained online from heterogeneous sources and present information tailored to an
individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also
exploring new opportunities and trends for intelligent agents.
IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) to intelligent
mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The
requirements for IAs remain open for discussion. An agent should be able to:
• interact with humans and other agents
• anticipate user needs for information
• adapt to changes in user needs and the environment
• cope with heterogeneity of information and other agents.
The following attributes characterize an IA-based system’s main capabilities:
• Intelligence. The method an agent uses to develop its intelligence includes using the agent’s own software
content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks.
• Continuity. An agent is a continuously running process that can detect changes in its environment, modify its
behavior, and update its knowledge base (which describes the environment).
• Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users
directly by using appropriate interfaces.
• Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and
monitored profiles.
• Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the
agent is transferred to a distant system, to a situation in which the agent creates new agents, dies, or executes partially during migration.
Smart Agents 2022 Comparison
From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html
When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon,
there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types
of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and
frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves.
There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than
gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that
many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition.
Accessibility and trends
Ease of setup
Voice recognition
Success of queries and ability to understand context
Bottom line
None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual
assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google
Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so
close!).
Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware
preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience
and what hardware you already have.
IFTTT = "if this, then that," is a service that lets you connect apps, services, and smart home devices.
Amazon Alexa
From https://en.wikipedia.org/wiki/Amazon_Alexa
Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser
named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo
Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making
to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time
information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users
are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other
settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition,
natural language processing, and other forms of weak AI to perform these tasks.[6]
Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices
(such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate
Alexa's listening mode, although some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
Google Assistant
From https://en.wikipedia.org/wiki/Google_Assistant
Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home
automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the
company's previous virtual assistant, Google Now.
Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home.
After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February
2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on
the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has
been further extended to support a large variety of devices, including cars and third-party smart home appliances. The
functionality of the Assistant can also be enhanced by third-party developers.
Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is
able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the
user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and
gather visual information through the device's camera, and support purchasing products and sending money.
Apple Siri
https://en.wikipedia.org/wiki/Siri
Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1]
[2] It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make
recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual
language usages, searches and preferences, returning individualized results.
Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided
by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and
Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app
for iOS in February 2010. Two months later, Apple acquired it and integrated it into the iPhone 4S at its release on October 4, 2011, removing the
separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware
devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod.
Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and
reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with
iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging
apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support
for follow-up questions, language translation, and additional third-party actions.
Microsoft Cortana
From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant)
Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders
and answering questions for the user.
Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending
on the software platform and region in which it is used.[8]
Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split
from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on
March 31, 2021, the Cortana mobile app was shut down globally.[13]
Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's
Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for
websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana
integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing
and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30]
In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium
speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features
currently available with Cortana.[42]
Classical Machine Learning
Machine Learning Types
From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
Perceptron
From https://deepai.org/machine-learning-glossary-and-terms/perceptron
How does a Perceptron work?
The process begins by taking all the input values and multiplying them by their weights. Then, all of these
multiplied values are added together to create the weighted sum. The weighted sum is then applied to the
activation function, producing the perceptron's output. The activation function plays the integral role of
ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that
the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the
ability to shift the activation function curve up or down.
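As a concrete illustration, a minimal Python/NumPy sketch of the forward pass just described (the weights and bias are hypothetical, chosen so the unit behaves like a logical AND gate):

import numpy as np

def perceptron_output(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    weighted_sum = np.dot(w, x) + b      # multiply inputs by their weights, add the bias
    return 1 if weighted_sum > 0 else 0  # step activation maps the sum to {0, 1}

w = np.array([1.0, 1.0])  # hypothetical weights (input strengths)
b = -1.5                  # hypothetical bias shifts the decision boundary
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, b))  # prints 0, 0, 0, 1 (an AND gate)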
Ensemble Machine Learning
From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/
Ensemble learning is a general meta approach to machine learning that seeks better predictive
performance by combining the predictions from multiple models.
Although there are a seemingly unlimited number of ensembles that you can develop for your predictive
modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that
rather than algorithms per se, each is a field of study that has spawned many more specialized methods.
The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is
important to both have a detailed understanding of each method and to consider them on your predictive
modeling project.
But, before that, you need a gentle introduction to these approaches and the key ideas behind each method
prior to layering on math and code.
In this tutorial, you will discover the three standard ensemble learning techniques for machine learning.
After completing this tutorial, you will know:
• Bagging involves fitting many decision trees on different samples of the same dataset and averaging
the predictions.
• Stacking involves fitting many different model types on the same data and using another model to
learn how to best combine the predictions.
• Boosting involves adding ensemble members sequentially that correct the predictions made by prior
models and outputs a weighted average of the predictions.
Bagging
From https://en.wikipedia.org/wiki/Bootstrap_aggregating
Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble
meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in
statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it
is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special
case of the model averaging approach.
Given a standard training set D of size n, bagging generates m new training sets Di, each of size nʹ, by
sampling from D uniformly and with replacement. By sampling with replacement, some observations may
be repeated in each Di. If nʹ = n, then for large n the set Di is expected to have the fraction (1 - 1/e) (≈63.2%) of
the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample.
Sampling with replacement ensures each bootstrap is independent from its peers, as it does not depend on
previous chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples
and combined by averaging the output (for regression) or voting (for classification).
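A minimal bagging sketch with scikit-learn (assuming scikit-learn is available; the dataset and parameters are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is fitted on a bootstrap sample (drawn with replacement) of the
# training set; predictions are combined by majority vote.
bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           max_samples=1.0, bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)
print("bagging accuracy:", bagger.score(X_test, y_test))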
Boosting
From https://www.ibm.com/cloud/learn/boosting and
https://en.wikipedia.org/wiki/Boosting_(machine_learning)
In machine learning, boosting is an ensemble meta-algorithm primarily for reducing bias, and also variance[1]
in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones.
Bagging vs Boosting
Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study,
the main difference between these learning methods is the way in which they are trained.
In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of
models are constructed and with each new model iteration, the weights of the misclassified data in the previous
model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus
on to improve its performance. AdaBoost, which stands for “adaptive boosting,” is one of the most
popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include
XGBoost, GradientBoost, and BrownBoost.
Another difference between bagging and boosting is in how they are used. For example, bagging methods are
typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged
when low variance and high bias are observed. While bagging can be used to avoid overfitting, boosting methods
can be more prone to it, although this really depends on the dataset. However, parameter
tuning can help avoid the issue.
As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for
loan approval processes and statistical genomics while boosting has been used more within image recognition
apps and search engines.
Boosting is an ensemble learning method that combines a set of weak learners into a strong learner
to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and
then trained sequentially—that is, each model tries to compensate for the weaknesses of its
predecessor. With each iteration, the weak rules from each individual classifier are combined to form
one, strong prediction rule.
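A minimal AdaBoost sketch with scikit-learn, illustrating the sequential re-weighting described above (dataset and parameters are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Depth-1 trees ("stumps") act as weak learners; each round up-weights the
# training points that the previous stump misclassified.
booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, learning_rate=0.5, random_state=0)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", booster.score(X_test, y_test))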
Stacking
From https://www.geeksforgeeks.org/stacking-in-machine-learning/
Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble
models; the most widely known are bagging and boosting. Bagging averages multiple similar models with high
variance to decrease variance. Boosting builds multiple incremental models to decrease the bias, while
keeping variance small.
Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a
space of different models for the same problem. The idea is that you can attack a learning problem with different
types of models, each capable of learning some part of the problem but not the whole problem space. So, you
can build multiple different learners and you use them to build an intermediate prediction, one prediction for each
learned model. Then you add a new model which learns the same target from the intermediate predictions.
This final model is said to be stacked on the top of the others, hence the name. Thus, you might improve your overall
performance, and often you end up with a model which is better than any individual intermediate model. Notice
however, that it does not give you any guarantee, as is often the case with any machine learning technique.
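A minimal stacking sketch with scikit-learn (the base learners and the logistic-regression meta-learner are illustrative choices, not recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two different base learners produce intermediate predictions; a logistic
# regression "meta-learner" is trained (via cross-validation) to combine them.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))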
Gradient Boosting
From https://en.wikipedia.org/wiki/Gradient_boosting
Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a
prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a
decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random
forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes
the other methods by allowing optimization of an arbitrary differentiable loss function.
Introduction to XGBoost
From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
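A minimal gradient-boosted-trees sketch using scikit-learn's GradientBoostingClassifier (dataset and parameters are illustrative; xgboost's XGBClassifier exposes a similar scikit-learn-compatible interface and could be swapped in):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added stage-wise; each new tree is fitted to the gradient of the
# loss with respect to the current ensemble's predictions.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
print("gradient boosting accuracy:", gbm.score(X_test, y_test))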
Terminology
・SoftMax https://en.wikipedia.org/wiki/Softmax_function
・SoftPlus https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Softplus
・Logit https://en.wikipedia.org/wiki/Logit
・Sigmoid https://en.wikipedia.org/wiki/Sigmoid_function
・Logistic Function https://en.wikipedia.org/wiki/Logistic_function
・Tanh https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
・ReLu https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
・Maxpool Selects the maximum within each subset (pooling window) of a convolutional neural network layer
・
Relationships (reconstructed from the table on this slide)
・Sigmoid = Logistic: 1/(1 + e^-z); range (0, 1); equal to the first component of SoftMax(z, 0)
・Tanh: range (-1, 1); equal to the first component minus the second component of SoftMax(z, -z)
・SoftPlus: log(1 + e^z); its derivative is the Sigmoid
・Logit: log(p/(1 - p)); range (-∞, +∞); the inverse of the Sigmoid, with log(SoftMax(z1, z2) first component / SoftMax(z1, z2) second component) = z1 - z2
・ReLu: max(0, x); output is either 0 or x
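A short NumPy check of the relationships listed above (the function definitions follow the standard formulas):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract the max for numerical stability
    return e / e.sum()

sigmoid  = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic function
softplus = lambda z: np.log1p(np.exp(z))        # smooth approximation of ReLU
logit    = lambda p: np.log(p / (1.0 - p))      # inverse of the sigmoid

z, h = 0.7, 1e-5
print(np.isclose(sigmoid(z), softmax(np.array([z, 0.0]))[0]))        # sigmoid = SoftMax(z, 0) first component
print(np.isclose(np.tanh(z), softmax(np.array([z, -z]))[0]
                             - softmax(np.array([z, -z]))[1]))       # tanh from SoftMax(z, -z)
print(np.isclose(logit(sigmoid(z)), z))                              # logit inverts the sigmoid
print(np.isclose((softplus(z + h) - softplus(z - h)) / (2 * h),
                 sigmoid(z)))                                        # derivative of softplus is the sigmoid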
Terminology (continued)
・Heteroscedastic https://en.wiktionary.org/wiki/scedasticity
・Maxout https://stats.stackexchange.com/questions/129698/what-is-maxout-in-neural-network/298705
・Cross-Entropy https://en.wikipedia.org/wiki/Cross_entropy H(P, Q) = -Ep(log q)
・Joint Entropy https://en.wikipedia.org/wiki/Joint_entropy H(X, Y) = -Ep(x,y)(log p(x, y))
・KL Divergence https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence KL(P, Q) = Ep(log p - log q)
・H(P, Q) = H(P) + KL(P, Q), i.e. -Ep(log q) = -Ep(log p) + {Ep(log p) - Ep(log q)}
・Mutual Information https://en.wikipedia.org/wiki/Mutual_information I(X; Y) = KL(p(x, y), p(x)p(y))
・Ridge Regression and Lasso Regression
https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece
・Logistic Regression https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf
・Dropout https://en.wikipedia.org/wiki/Dropout_(neural_networks)
・RMSProp and AdaGrad and AdaDelta and Adam
https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM
・Pooling https://www.quora.com/Is-pooling-indispensable-in-deep-learning
・Boltzmann Machine https://en.wikipedia.org/wiki/Boltzmann_machine
・Hyperparameters
・
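A short NumPy check of the cross-entropy identity listed above (the two distributions are arbitrary examples):

import numpy as np

def entropy(p):            # H(P) = -Ep(log p)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):   # H(P, Q) = -Ep(log q)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):   # KL(P, Q) = Ep(log p - log q)
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
# Identity: H(P, Q) = H(P) + KL(P, Q)
print(np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))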
Reinforcement Learning Book
From https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
Acumos Shared Model Process Flow
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
Distributed AI
From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence
Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence[1] is a subfield of artificial intelligence research dedicated to the
development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems.
The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence,
especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires:
• A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled
• Coordination of the actions and communication of the nodes
• Subsamples of large data sets and online machine learning
There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the
following:
• Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters
of computers can be used to speed up calculation.
• Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an
abstraction for developing DPS systems. See below for further details.
• Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at
macro level but also at micro level, as it is in many social simulation scenarios.
Swarm Intelligence
From https://en.wikipedia.org/wiki/Swarm_intelligence
Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work
on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1]
SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The
inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control
structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to
the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant
colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence.
The application of swarm principles to robots is called swarm robotics while swarm intelligence refers to the more general set of algorithms. Swarm
prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered
for genetically modified organisms in synthetic collective intelligence.[4]
• 1 Models of swarm behavior
◦ 1.1 Boids (Reynolds 1987)
◦ 1.2 Self-propelled particles (Vicsek et al. 1995)
• 2 Metaheuristics
◦ 2.1 Stochastic diffusion search (Bishop 1989)
◦ 2.2 Ant colony optimization (Dorigo 1992)
◦ 2.3 Particle swarm optimization (Kennedy, Eberhart & Shi 1995)
◦ 2.4 Artificial Swarm Intelligence (2015)
• 3 Applications
◦ 3.1 Ant-based routing
◦ 3.2 Crowd simulation
▪ 3.2.1 Instances
◦ 3.3 Human swarming
◦ 3.4 Swarm grammars
◦ 3.5 Swarmic art
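As a concrete example of one of the metaheuristics listed above, a hedged NumPy sketch of particle swarm optimization minimizing a toy function (the coefficients are typical textbook defaults, not taken from any cited source):

import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))      # particle positions
    v = np.zeros_like(x)                            # particle velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Each particle is pulled toward its own best and the swarm's best position
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

print(pso(lambda p: np.sum(p ** 2)))  # the swarm should converge near the origin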
IBM Watson
From https://en.wikipedia.org/wiki/IBM_Watson
IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's
DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO,
industrialist Thomas J. Watson.[4][5]
Software -Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system
was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop
framework to provide distributed computing.[12][13][14]
Hardware -The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate
hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-
core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson
can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost
at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18]
According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human
Jeopardy champions.[16]
Data -The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases,
taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries,
encyclopedias and other reference material, that it could use to build its knowledge.[20]
From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
Deep Learning
Three Types of Deep Learning
From https://www.slideshare.net/TerryTaewoongUm/introduction-to-deep-learning-with-tensorflow
Convolutional Neural Networks
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
Convolutional Neural Nets Comparison (2016)
From https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae
Reference: https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
Recurrent Neural Networks
From https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
From colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks and Long Short Term Memory
Dynamical System View on Recurrent Neural Networks
From https://openreview.net/pdf?id=ryxepo0cFX
From https://arxiv.org/pdf/1412.3555v1.pdf
Gated Recurrent Units vs Long Short Term Memory
Deep Learning Models
From https://arxiv.org/pdf/1712.04301.pdf
Neural Net Models
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
Neural Net Models (cont)
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
TensorFlow
From https://en.wikipedia.org/wiki/TensorFlow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of
tasks but has a particular focus on training and inference of deep neural networks.[4][5]
TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version
was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0,
in September 2019.[10]
TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as Javascript, C++, and Java.[11]
This flexibility lends itself to a range of applications in many different sectors.
Keras
From https://en.wikipedia.org/wiki/Keras
Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts
as an interface for the TensorFlow library.
Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit,
Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast
experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was
developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot
Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also
the author of the Xception deep neural network model.[5]
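A minimal sketch of defining and compiling a network with the Keras Sequential API on top of TensorFlow (the layer sizes and the MNIST-style 784-dimensional input are illustrative assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),                     # regularization by dropout
    tf.keras.layers.Dense(10, activation="softmax"),  # 10-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5) would then train it on flattened image data.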
Comparison of Deep Learning Frameworks
From https://arxiv.org/pdf/1903.00102.pdf
Popularity of Deep Learning Frameworks
From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
Acronyms in Deep Learning
• RBM - Restricted Boltzmann Machines
• MLP - Multi-layer Perceptron
• DBN - Deep Belief Network
• CNN - Convolutional Neural Network
• RNN - Recurrent Neural Network
• SGD - Stochastic Gradient Descent
• XOR - Exclusive Or
• SVM - Support Vector Machine
• ReLu - Rectified Linear Unit
• MNIST - Modified National Institute of Standards and Technology
• RBF - Radial Basis Function
• HMM - Hidden Markov Model
• MAP - Maximum A Posteriori
• MLE - Maximum Likelihood Estimate
• Adam - Adaptive Moment Estimation
• LSTM - Long Short Term Memory
• GRU - Gated Recurrent Unit
Concerns for Deep Learning by Gary Marcus
From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
Deep Learning thus far:
• Is data hungry
• Is shallow and has limited capacity for transfer
• Has no natural way to deal with hierarchical structure
• Has struggled with open-ended inference
• Is not sufficiently transparent
• Has not been well integrated with prior knowledge
• Cannot inherently distinguish causation from correlation
• Presumes a largely stable world, in ways that may be problematic
• Works well as an approximation, but answers often can’t be fully trusted
• Is difficult to engineer with
Watson Architecture
From https://seekingalpha.com/article/4087604-much-artificial-intelligence-ibm-watson
How transferable are features in deep neural networks?
From http://cs231n.github.io/transfer-learning/
Transfer Learning
From https://www.mathematik.hu-berlin.de/~perkowsk/files/thesis.pdf
More Transfer Learning
From https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
More Transfer Learning
From http://ruder.io/transfer-learning/
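A hedged Keras sketch of feature-extraction transfer learning: reuse an ImageNet-pretrained backbone and train only a new classification head (the MobileNetV2 backbone, the 160x160 input size, and the 5-class head are illustrative assumptions):

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False   # freeze the pretrained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # hypothetical 5-class target task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)
# Optionally unfreeze the top layers of `base` afterwards and fine-tune at a low learning rate.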
Bayesian Deep Learning
From https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
Bayesian Learning via Stochastic Gradient Langevin Dynamics
From https://tinyurl.com/22xayz76
In this paper we propose a new framework for learning from large scale
datasets based on iterative learning from small minibatches. By adding the
right amount of noise to a standard stochastic gradient optimization
algorithm we show that the iterates will converge to samples from the true
posterior distribution as we anneal the stepsize. This seamless transition
between optimization and Bayesian posterior sampling provides an in-
built protection against overfitting. We also propose a practical method for
Monte Carlo estimates of posterior statistics which monitors a “sampling
threshold” and collects samples after it has been surpassed. We apply the
method to three models: a mixture of Gaussians, logistic regression and
ICA with natural gradients.
Our method combines Robbins-Monro type algorithms which stochastically
optimize a likelihood, with Langevin dynamics which injects noise into the
parameter updates in such a way that the trajectory of the parameters will
converge to the full posterior distribution rather than just the maximum a
posteriori mode. The resulting algorithm starts off being similar to stochastic
optimization, then automatically transitions to one that simulates samples from
the posterior using Langevin dynamics.
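A minimal NumPy sketch of a single SGLD update along the lines described above (the function names and signatures are assumptions for illustration, not the paper's code):

import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, N, eps, rng):
    """One Stochastic Gradient Langevin Dynamics update of the parameters theta."""
    n = len(minibatch)
    # Stochastic gradient of the log posterior, rescaled from the minibatch to the full dataset
    grad = grad_log_prior(theta) + (N / n) * sum(grad_log_lik(theta, x) for x in minibatch)
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)  # injected Gaussian noise, variance eps
    return theta + 0.5 * eps * grad + noise

# Annealing the stepsize eps toward zero over iterations moves the iterates from
# stochastic optimization toward (approximate) samples from the posterior.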
Deterministic Variational Inference for Robust Bayesian NNs
From https://openreview.net/pdf?id=B1l08oAct7
Bayesian Deep Learning Survey
From https://arxiv.org/pdf/1604.01662.pdf
Conclusion and Future Research
In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep
learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by
organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several
algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and
stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity
both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks
involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal
and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic
models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link
prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that
need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural
networks (BNN), BDL with BNN as an important component is expected to be more and more scalable.
Ensemble Methods for Deep Learning
From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Comparing Loss Functions
From Neural Networks and Deep Learning Book
Seed Reinforcement Learning from Google
From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html
The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in
part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In
the very recent history, focus on massive scale has been key to solve a number of complicated games such as
AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017).
The sheer amount of environment data needed to solve tasks trivial to humans makes distributed machine
learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks:
running environments, model inference, model training, replay buffer, etc. and current state-of-the-art distributed
algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of
resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are
scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources.
We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently
utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast
streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve
state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/
tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. The cost of running SEED is analyzed
against IMPALA (Espeholt et al., 2018) which is a commonly used state-of-the-art distributed RL algorithm (Veeriah
et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et
al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being
significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally,
the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix
A.4 for details), making it easy to reproduce results and try novel ideas.
Designing Neural Nets through Neuroevolution
From tinyurl.com/mykhb52y
Much of recent machine learning has focused on deep learning, in which neural network weights are trained through
variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses
evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of
an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based
approaches, including learning neural network building blocks (for example activation functions), hyperparameters,
architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep
reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive
parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-
based neural network research, it has developed many unique and effective techniques that should be effective in other
machine learning areas too.
This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty
and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope
is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight
how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep
reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical
tool in the long-term pursuit of artificial general intelligence.
Illuminating Search Spaces by Mapping Elites
From https://arxiv.org/pdf/1504.04909.pdf
From https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/#implementationjump
Reinforcement Learning with Prediction-based Rewards
From https://arxiv.org/pdf/1412.3555v1.pdf
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily
in the fields of natural language processing (NLP)[1] and computer vision (CV).[2]
Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation
and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input
sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than
RNNs and therefore reduces training times.[1]
Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short-
term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional
Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus
and Common Crawl, and can be fine-tuned for specific tasks.[4][5]
Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to
a learned measure of relevance, providing relevant information about far-away tokens. When added to RNNs, attention mechanisms increase performance. The development of
the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the
quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel,
which leads to improved training speed.
Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively
one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings
that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the
opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes
use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an
additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. Both the encoder
and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.
Transformers
From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
Transformers
Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added
attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that
provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1]
Sequential processing
Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To
process the n-th token, the model combines the state representing the sentence up to token n - 1 with the information of the new token to create a new
state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at
every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient
problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of
token computations on results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware.
This can make the training of RNNs inefficient.
Self-Attention
These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the
sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant
information about far-away tokens.
A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an
English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input.
However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing
the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary
knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the
decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to
attend to each English input state vector.
When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention
mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs
with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed
for all tokens in parallel, which leads to improved training speed.
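A minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation described above (shapes and random projection matrices are illustrative only):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))        # one token embedding per row

Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv               # query, key and value projections

scores = Q @ K.T / np.sqrt(d_k)                # every token scores every other token
weights = softmax(scores, axis=-1)             # learned relevance; each row sums to 1
output = weights @ V                           # context-mixed token representations
print(output.shape)                            # (seq_len, d_k)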
From https://en.wikipedia.org/wiki/GPT-3
GPT-3
Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to
produce human-like text.
The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long
context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is
trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San
Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning
parameters. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020,[3] is part of a trend in natural language
processing (NLP) systems of pre-trained language representations.[1]
The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human,
which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper
introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David
Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5]
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive
output, but only Microsoft has access to GPT-3's underlying model.[6]
An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency
equivalent to that of a human.[7]
OpenAI
From https://openai.com/
Recent Research
Efficient Training of Language Models to Fill in the Middle
Hierarchical Text-Conditional Image Generation with CLIP Latents
Formal Mathematics Statement Curriculum Learning
Training language models to follow instructions with human feedback
Text and Code Embeddings by Contrastive Pre-Training
WebGPT: Browser-assisted question-answering with human feedback
Training Verifiers to Solve Math Word Problems
Recursively Summarizing Books with Human Feedback
Evaluating Large Language Models Trained on Code
Process for Adapting Language Models to
Society (PALMS) with Values-Targeted Datasets
Multimodal Neurons in Artificial Neural Networks
Learning Transferable Visual Models From Natural Language Supervision
Zero-Shot Text-to-Image Generation
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
From https://deepbrainai.io/?www.deepbrainai.io=
Deep Brain
Reservoir Computing
From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/
Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN),
independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines
(LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in
leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping, it is only necessary to use a
simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture
should be capable of any computation [3]. The intuition was born from the observation that in training RNNs, most of the time the weights showing the most
change were the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static
nature of this implementation, ESNs can usually yield faster results, and in some cases better ones, in particular when dealing with chaotic time series predictions [5].
But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the
system to different regions of the configuration space [3]. This is called the separability condition. Furthermore, an important property for the reservoirs of
ESNs is the Echo State property, which states that inputs to the reservoir echo in the system forever, or until they dissipate. A more formal definition of this
property can be found in [6].
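A minimal echo state network sketch in Python/NumPy, following the description above: the random reservoir is fixed, and only the linear readout is trained (here by ridge regression). The task, reservoir size, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 300

# Fixed random reservoir (never trained); scaling the spectral radius below 1
# makes inputs "echo" and eventually dissipate (echo state property).
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave
u = np.sin(np.linspace(0, 20 * np.pi, 2000))
X, y = run_reservoir(u[:-1]), u[1:]

# Train only the linear readout (ridge regression)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```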
Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series
data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the
algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be
optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices,
fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing
benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing.
A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical
goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these
algorithms tend to be data hungry, requiring long observation times and substantial computational resources.
Reservoir computing1,2 is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic3 or complex
spatiotemporal behaviors4, which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease.
From https://www.nature.com/articles/s41467-021-25801-2
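The nonlinear vector autoregression view mentioned in the abstract above can be sketched by replacing the random reservoir with explicit time-delay and quadratic features feeding the same linear (ridge) readout. This simplified single-variable sketch is an assumption-laden illustration, not the paper's exact feature construction.

```python
import numpy as np

def nvar_features(u, k=2):
    """Linear part: k delayed copies of the input; nonlinear part: their unique pairwise products."""
    rows = []
    for t in range(k, len(u)):
        lin = u[t - k:t]                                    # delay embedding
        quad = np.outer(lin, lin)[np.triu_indices(k)]       # unique quadratic monomials
        rows.append(np.concatenate(([1.0], lin, quad)))     # constant + linear + quadratic
    return np.array(rows)

# One-step-ahead prediction of a noisy sine, as a stand-in for a dynamical system
rng = np.random.default_rng(1)
u = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.01 * rng.standard_normal(2000)
k = 4
X, y = nvar_features(u, k), u[k:]

# Same linear readout as the reservoir case, but no random matrices are needed
ridge = 1e-6
W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
print("train MSE:", np.mean((X @ W - y) ** 2))
```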
Reservoir Computing Trends
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.514&rep=rep1&type=pdf
Brain Connectivity meets Reservoir Computing
From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1
The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs).
Can the wiring of actual brains help improve ANNs architectures? Can we learn from ANNs about what network features support
computation in the brain when solving a task?
ANNs’ architectures are carefully engineered and have crucial importance in many recent performance improvements. On the
other hand, BNNs exhibit complex emergent connectivity patterns. At the individual level, BNN connectivity results from brain
development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role
in shaping connectivity.
Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform
concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain
connectivity features only on abstract dynamical properties, while the implications of real brain network topologies for
machine learning or cognitive tasks have barely been explored.
Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks,
which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain
connectivity patterns on task solving.
We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo
state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework,
bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show
the crucial importance of the diversity of interareal connectivity patterns, stressing the role of stochastic processes in
determining neural network connectivity in general.
Deep Learning Models
Sharing Models
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
Summary of Deep Learning Models: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning Acronyms
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning Hardware
From https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d
Deep Learning MIT
From https://deeplearning.mit.edu/
ONNX
From http://onnx.ai/
GitHub ONNX Models
From https://github.com/onnx/models
HPC vs Big Data Ecosystems
From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
HPC and ML
From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf
• HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided machine learning), which are then used to understand experimental data or simulations.
• MLforHPC: Using ML to enhance HPC applications and systems.
This categorization is related to Jeff Dean’s ”Machine Learning for Systems and Systems for Machine Learning” [6] and Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as:
• HPCrunsML: Using HPC to execute ML with high performance.
• SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations.
We also subdivide MLforHPC as:
• MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Already, autotuning with systems like ATLAS is hugely successful and gives an initial view of MLautotuning. As well as choosing block sizes to improve cache use and vectorization, MLautotuning can also be used for simulation mesh sizes [8] and in big data problems for configuring databases and complex systems like Hadoop and Spark [9], [10].
• MLafterHPC: ML analyzing results of HPC, as in trajectory analysis and structure identification in biomolecular simulations.
• MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations. The same ML wrapper can also learn configurations as well as results. This differs from SimulationTrainedML, where typically a learnt network is used to redirect observation, whereas in MLaroundHPC we are using the ML to improve the HPC performance.
• MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11]. Here the simulation surrogates are very valuable to allow real-time predictions.
Designing Neural Nets through Neuroevolution
From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
Go Explore Algorithm
From http://www.evolvingai.org/files/1901.10995.pdf
Deep Density Destructors
From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf
We propose a unified framework for deep density models by formally defining density
destructors. A density destructor is an invertible function that transforms a given density to
the uniform density—essentially destroying any structure in the original density. This
destructive transformation generalizes Gaussianization via ICA and more recent
autoregressive models such as MAF and Real NVP. Informally, this transformation can be
seen as a generalized whitening procedure or a multivariate generalization of the univariate
CDF function. Unlike Gaussianization, our destructive transformation has the elegant
property that the density function is equal to the absolute value of the Jacobian determinant.
Thus, each layer of a deep density can be seen as a shallow density—uncovering a
fundamental connection between shallow and deep densities. In addition, our framework
provides a common interface for all previous methods enabling them to be systematically
combined, evaluated and improved. Leveraging the connection to shallow densities, we also
propose a novel tree destructor based on tree densities and an image-specific destructor based
on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
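In one dimension, the "destructor" idea above reduces to the probability integral transform: a fitted CDF maps the data to the uniform density, and the model density is the absolute Jacobian (the CDF's derivative). A minimal sketch, assuming SciPy is available and using a simple Gaussian fit as the destructor:

```python
import numpy as np
from scipy import stats

# Data drawn from some unknown 1-D density
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=5000)

# Fit a simple (Gaussian) model and use its CDF as the destructor
mu, sigma = x.mean(), x.std()
u = stats.norm.cdf(x, loc=mu, scale=sigma)      # destructive transform: u is ~Uniform(0, 1)

# The density the model assigns to x equals |du/dx|, the Jacobian of the destructor,
# which here is just the fitted Gaussian pdf.
jacobian = stats.norm.pdf(x, loc=mu, scale=sigma)
print("u range:", u.min(), u.max())             # roughly covers (0, 1)
print("model density at x[0]:", jacobian[0])
```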
Predictive Perception
From https://www.quantamagazine.org/to-make-sense-of-the-present-brains-may-predict-the-future-20180710/
Sci-Kit Learning Decision Tree
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
Imitation Learning
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
Imitation Learning
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
Generative Adversarial Networks (GANs)
From https://skymind.ai/wiki/generative-adversarial-network-gan
Deep Generative Network-based Activation Management (DGN-AMs)
From https://arxiv.org/pdf/1605.09304.pdf
Paired Open Ended Trailblazer (POET)
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
One Model to Learn Them All
From https://arxiv.org/pdf/1706.05137.pdf
Self-modifying NNs With Differentiable Neuromodulated Plasticity
From https://arxiv.org/pdf/1706.05137.pdf
Stein Variational Gradient Descent
From https://arxiv.org/pdf/1706.05137.pdf
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Deep Learning Hardware
Graphical Processing Units (GPU)
From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html
Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest
graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more.
What Does a GPU Do?
The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal
and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and
video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in
creative production and artificial intelligence (AI).
GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and
programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and
realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to
dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.
GPU and CPU: Working Together
The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance
increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate
computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most
of both.
GPU vs. Graphics Card: What’s the Difference?
While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms.
Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the
raft of components required to both allow the GPU to function and connect to the rest of the system.
GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead
embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI
Express slot.
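In deep learning frameworks, the CPU/GPU division of labor described above is exposed directly; a minimal PyTorch-style sketch (assuming PyTorch is installed) that picks whichever device is present:

```python
import torch

# Use the discrete/integrated GPU if the framework can see one, else fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# The same matrix multiply runs on either device; on a GPU it executes across
# thousands of parallel cores, which is what makes GPUs attractive for deep learning.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.shape)
```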
NVidia Graphical Processing Units (GPU)
From https://en.wikipedia.org/wiki/Nvidia
Nvidia Corporation[note 1][note 2] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in
Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units
(GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system on a chip
units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and
software.[3][4] Its professional line of GPUs are used in workstations for applications in such fields as architecture, engineering and
construction, media and entertainment, automotive, scientific research, and manufacturing design.[5]
In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs
which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the
mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation
and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator
companies such as Graphcore.
Nvidia's GPUs are used for edge-to-cloud computing and in supercomputers: Nvidia provides the accelerators (GPUs) for
many of them, including a previous fastest system, although the current fastest and most power-efficient systems are
powered by AMD GPUs and CPUs. Nvidia has also expanded its presence in the gaming industry with its handheld game consoles
Shield Portable, Shield Tablet, and Shield Android TV and its cloud gaming service GeForce Now.
Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of
US$40 billion in stock and cash, which would be the largest semiconductor acquisition to date. SoftBank Group will acquire
slightly less than a 10% stake in Nvidia, and Arm would maintain its headquarters in Cambridge.[15][16][17][18]
Tesla unveils new Dojo Supercomputer
From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/
Tesla has unveiled its latest version of its Dojo supercomputer and it’s apparently so powerful that it tripped the power grid in Palo
Alto. Dojo is Tesla’s own custom supercomputer platform built from the ground up for AI machine learning and more specifically
for video training using the video data coming from its fleet of vehicles.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new
Dojo custom-built computer is using chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is
expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to its computer vision technology
powering its self-driving effort.
Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but the company was still ramping up its effort at the
time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster or
“Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night.
Why does Tesla need the Dojo supercomputer?
It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s
not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk
said it makes sense to offer Dojo as a service, perhaps to take on his buddy Jeff Bezos’s Amazon AWS, calling it a “service
that you can use that’s available online where you can train your models way faster and for less money.”
But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving
system. Tesla realized that its approach to developing a self-driving system, using neural nets trained on millions of videos coming
from its customer fleet, requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power.
That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward as it has big ambitions to
develop other artificial intelligence programs.
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Reinforcement Learning
Introduction to Deep Reinforcement Learning
From https://skymind.ai/wiki/deep-reinforcement-learning
Many RL references at this site
Model-based Reinforcement Learning
From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
Hierarchical Deep Reinforcement Learning
From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
Meta Learning Shared Hierarchy
From https://skymind.ai/wiki/deep-reinforcement-learning
Learning with Hierarchical Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture
that integrates deep learning models with structured hierarchical Bayesian (HB) models.
Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the
activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-
DBM model learns to learn novel concepts from very few training examples by learning low-
level generic features, high-level features that capture correlations among low-level features,
and a category hierarchy for sharing priors over the high-level features that are typical of
different kinds of concepts. We present efficient learning and inference algorithms for the
HDP-DBM model and show that it is able to learn new concepts from very few examples on
CIFAR-100 object recognition, handwritten character recognition, and human motion capture
datasets.
Transfer Learning
From http://cs231n.github.io/transfer-learning/
Convolutional Deep Belief Networks for Scalable
Unsupervised Learning of Hierarchical Representations
From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to
full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a
hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-
down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher
layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from
unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our
model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical
models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn
hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could
support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could
resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units
arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been
several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the
deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the
units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn
high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al.,
2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely
unsupervised manner.
This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our
approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically
sound way. To the best of our knowledge, ours is the first translation invariant hierarchical generative model which supports both top-down and
bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object
parts, and objects respectively. We show that these representations achieve excellent performance on several visual recognition tasks and allow
“hidden” object parts to be inferred from high-level object information.
Learning with Hierarchical-Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured
hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the
top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training
examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for
sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for
the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten
character recognition, and human motion capture datasets
The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision,
natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a
new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often
sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong
but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract
knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones.
We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical
nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with
complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines
(DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature
representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct
sophisticated domain-specific features without the need to rely on precise human-crafted input representations, increasingly important with the
proliferation of datasets and application domains.
Reinforcement Learning: Fast and Slow
From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0
Meta-RL: Speeding up Deep RL by Learning to Learn
As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental
updating, is weak inductive bias. As formalized in the idea of the bias–variance tradeoff, fast learning
requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the
patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as
foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains
the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if
the specific biases the learner adopts happen to fit with the material to be learned. As a result of this, a
new learning problem arises: how can the learner know what inductive biases to adopt?
Episodic Deep RL: Fast Learning through Episodic Memory
If incremental parameter adjustment is one source of slowness in deep RL, then one way to
learn faster might be to avoid such incremental updating. Naively increasing the learning rate
governing gradient descent optimization leads to the problem of catastrophic interference.
However, recent research shows that there is another way to accomplish the same goal, which
is to keep an explicit record of past events, and use this record directly as a point of reference
in making new decisions. This idea, referred to as episodic RL parallels ‘non-parametric’
approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of
learning in psychology When a new situation is encountered and a decision must be made
concerning what action to take, the procedure is to compare an internal representation of the
current situation with stored representations of past situations. The action chosen is then the
one associated with the highest value, based on the outcomes of the past situations that are
most similar to the present. When the internal state representation is computed by a multilayer
neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
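A minimal sketch of the episodic idea described above (a toy nearest-neighbour memory, not DeepMind's exact algorithm): store past state embeddings with their returns, and value a new state by averaging the returns of its most similar stored neighbours. All names and numbers here are illustrative.

```python
import numpy as np

class EpisodicMemory:
    """Toy episodic controller: value = mean return of the k most similar past states."""
    def __init__(self, k=5):
        self.k = k
        self.keys, self.returns = [], []

    def store(self, embedding, ret):
        self.keys.append(np.asarray(embedding, dtype=float))
        self.returns.append(float(ret))

    def value(self, embedding):
        if not self.keys:
            return 0.0
        dists = np.linalg.norm(np.array(self.keys) - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.array(self.returns)[nearest]))

# Usage: after each episode, store (state embedding, observed return); at decision
# time, choose the action whose predicted next-state value is highest.
memory = EpisodicMemory(k=3)
rng = np.random.default_rng(0)
for _ in range(100):
    memory.store(rng.standard_normal(8), rng.uniform(0, 1))
print(memory.value(rng.standard_normal(8)))
```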
Google Research featuring Jeff Dean
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Embedding for Sparse Inputs (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Efficient Vector Representation of Words (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Deep Convolution Neural Nets and Gaussian Processes
From https://ai.google/research/pubs/pub47671
Deep Convolution Neural Nets and Gaussian Processes (cont)
From https://ai.google/research/pubs/pub47671
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Computing and Sensing Architecture
[Figure: Hierarchical C4ISR Flow Model from Bob Marcus. Input devices and field processors feed preprocessing and simple event processing, which update the world model through complex event processing; responses range from simple responses in field operations to complex responses, plan updates, and new goals and plans in HQ operations, directed back through sensor and effects management to actuator devices. The stages span Data → Structured Data → Information → Knowledge → Wisdom (Devices → Awareness → Decision).]
Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
Bio-Inspired Distributed Intelligence
From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002
More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This
distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized
intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who
is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this
fall.
She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of
designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical
tool or for search-and-rescue missions.
“The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish,
and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression
of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the
real world more efficiently. But this additional softness also entails new system-theoretic challenges.”
In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor
commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain.
Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using
decentralized sensory information sharing.
“When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might
be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing
sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are
beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is
fascinating as compared to what we can achieve with human-made machines.”
IoT and Deep Learning
From https://cse.buffalo.edu/~lusu/papers/Computer2018.pdf
Deep Learning for IoT
Deep Learning for IoT Overview: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning for IoT Overview: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Standardized IoT Data Sets: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Standardized IoT Data Sets: Survey
From https://arxiv.org/pdf/1712.04301.pdf
DeepMind
DeepMind Website
DeepMind Home page
https://deepmind.com/
DeepMind Research
https://deepmind.com/research/
https://deepmind.com/research/publications/
DeepMind Blog
https://deepmind.com/blog
DeepMind Applied
https://deepmind.com/applied
DeepMind Featured Research Publications
From https://deepmind.com/research
AlphaGo
https://www.deepmind.com/research/highlighted-research/alphago
Deep Reinforcement Learning
https://deepmind.com/research/dqn/
A Dual Approach to Scalable Verification of Deep Networks
http://auai.org/uai2018/proceedings/papers/204.pdf
https://www.youtube.com/watch?v=SV05j3GM0LI
Learning to reinforcement learn
https://arxiv.org/abs/1611.05763
Neural Programmer - Interpreters
https://arxiv.org/pdf/1511.06279v3.pdf
Dueling Network Architectures for Deep Reinforcement Learning
https://arxiv.org/pdf/1511.06581.pdf
DeepMind Research over 400 publications
https://deepmind.com/research/publications/
DeepMind Applied
From https://deepmind.com/applied/
DeepMind Health
https://deepmind.com/applied/deepmind-health/
DeepMind for Google
https://deepmind.com/applied/deepmind-google/
DeepMind Ethics and Society
https://deepmind.com/applied/deepmind-ethics-society/
AlphaGo and AlphaGoZero
From https://www.deepmind.com/research/highlighted-research/alphago
We created AlphaGo, a computer program that combines an advanced tree search with deep neural
networks. These neural networks take a description of the Go board as an input and process it
through a number of different network layers containing millions of neuron-like connections.
One neural network, the “policy network”, selects the next move to play. The other neural network,
the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur
games to help it develop an understanding of reasonable human play. Then we had it play against
different versions of itself thousands of times, each time learning from its mistakes.
Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-
making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world
champions in different global arenas and arguably became the greatest Go player of all time.
Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by
playing thousands of matches with amateur and professional players, AlphaGo Zero
learnt by playing against itself, starting from completely random play.
This powerful technique is no longer constrained by the limits of human knowledge. Instead,
the computer program accumulated thousands of years of human knowledge during a period of
just a few days and learned to play Go from the strongest player in the world, AlphaGo.
AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new
knowledge, developing unconventional strategies and creative new moves, including those which
beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence
that AI can be used as a positive multiplier for human ingenuity.
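The interaction of the two networks during search can be sketched with the PUCT selection rule used in AlphaGo-style programs. This is a schematic illustration only: the node layout and numbers below are invented, and in the real system the priors come from the policy network and the values from the value network, refined by repeated simulation and backup.

```python
import math

def puct_select(node, c_puct=1.5):
    """Pick the child move balancing value estimates (exploitation) against
    policy-network priors and visit counts (exploration)."""
    total_visits = sum(child["visits"] for child in node["children"].values())
    best_move, best_score = None, -float("inf")
    for move, child in node["children"].items():
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        u = c_puct * child["prior"] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

# Toy node: priors from the "policy network", accumulated values from the "value network".
# The full search loop repeatedly selects, evaluates, and backs up value_sum/visits.
node = {"children": {
    "A": {"prior": 0.6, "visits": 10, "value_sum": 5.5},
    "B": {"prior": 0.3, "visits": 2, "value_sum": 1.4},
    "C": {"prior": 0.1, "visits": 0, "value_sum": 0.0},
}}
print(puct_select(node))
```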
AlphaZero
From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the
games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were
excited by the preliminary results and thrilled to see the response from members of the chess community,
who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that
differed from any chess playing engine that came before it.
Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open
Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly
learns each game to become the strongest player in history for each, despite starting its training from random play,
with no in-built domain knowledge but the basic rules of the game.
This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive,
unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s
International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their
forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess
engine.” It’s like discovering the secret notebooks of some great player from the past,” says Matthew.
Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-
breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try
to account for every eventuality in a game. Shogi programs are also game specific, using similar search
engines and algorithms to chess programs.
AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network
and general purpose algorithms that know nothing about the game beyond the basic rules.
AlphaTensor
From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor
First extension of AlphaZero to mathematics unlocks new possibilities for research
Algorithms have helped mathematicians perform fundamental operations for thousands of years. The ancient Egyptians created an
algorithm to multiply two numbers without requiring a multiplication table, and Greek mathematician Euclid described an algorithm
to compute the greatest common divisor, which is still in use today.
During the Islamic Golden Age, Persian mathematician Muhammad ibn Musa al-Khwarizmi designed new algorithms to solve linear
and quadratic equations. In fact, al-Khwarizmi’s name, translated into Latin as Algoritmi, led to the term algorithm. But, despite the
familiarity with algorithms today – used throughout society from classroom algebra to cutting edge scientific research – the process
of discovering new algorithms is incredibly difficult, and an example of the amazing reasoning abilities of the human mind.
In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel,
efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open
question in mathematics about finding the fastest way to multiply two matrices.
This paper is a stepping stone in DeepMind’s mission to advance science and unlock the most fundamental problems using AI. Our
system, AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games, like chess, Go and
shogi, and this work shows the journey of AlphaZero from playing games to tackling unsolved mathematical problems for the first
time
Matrix multiplication
Matrix multiplication is one of the simplest operations in algebra, commonly taught in high school maths classes. But outside the
classroom, this humble mathematical operation has enormous influence in the contemporary digital world and is ubiquitous in
modern computing.
AlphaTensor (cont)
From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor
First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is
a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves,
corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so,
this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps
taken to zero out the tensor.
This game is incredibly challenging – the number of possible algorithms to consider is much greater than the number of atoms in the
universe, even for small cases of matrix multiplication. Compared to the game of Go, which remained a challenge for AI for decades, the
number of possible moves at each step of our game is 30 orders of magnitude larger (above 10³³ for one of the settings we consider).
Essentially, to play this game well, one needs to identify the tiniest of needles in a gigantic haystack of possibilities. To tackle the challenges
of this domain, which significantly departs from traditional games, we developed multiple crucial components including a novel neural
network architecture that incorporates problem-specific inductive biases, a procedure to generate useful synthetic data, and a recipe to
leverage symmetries of the problem.
We then trained an AlphaTensor agent using reinforcement learning to play the game, starting without any knowledge about existing
matrix multiplication algorithms. Through learning, AlphaTensor gradually improves over time, re-discovering historical fast matrix
multiplication algorithms such as Strassen’s, eventually surpassing the realm of human intuition and discovering algorithms faster
than previously known.
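As a concrete instance of the "historical fast matrix multiplication algorithms such as Strassen's" mentioned above, Strassen's classic construction multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; a short NumPy check of the recipe:

```python
import numpy as np

def strassen_2x2(A, B):
    """Strassen's algorithm: 7 multiplications (m1..m7) instead of 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
print(np.allclose(strassen_2x2(A, B), A @ B))  # True: same product, fewer multiplications
```

Applied recursively to matrix blocks, this trade of multiplications for additions is what makes such algorithms asymptotically faster, and it is exactly the kind of recipe AlphaTensor searches for.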
Detailed Article in Nature
AlphaTensor
From https://www.nature.com/articles/s41586-022-05172-4
Complex Cooperative Agents
From https://deepmind.com/blog/capture-the-flag-science/
From https://science.sciencemag.org/content/364/6443/859 5/19
Complex Cooperative Agents (cont)
From https://science.sciencemag.org/content/364/6443/859 5/19
Complex Cooperative Agents (cont)
From https://science.sciencemag.org/content/364/6443/859 5/19
Unsupervised Learning
From https://deepmind.com/blog/unsupervised-learning/
Unsupervised learning is a paradigm designed to create autonomous intelligence by
rewarding agents (that is, computer programs) for learning about the data they observe
without a particular task in mind. In other words, the agent learns for the sake of learning.
A key motivation for unsupervised learning is that, while the data passed to learning
algorithms is extremely rich in internal structure (e.g., images, videos and text), the targets
and rewards used for training are typically very sparse (e.g., the label ‘dog’ referring to that
particularly protean species, or a single one or zero to denote success or failure in a game).
This suggests that the bulk of what is learned by an algorithm must consist of understanding
the data itself, rather than applying that understanding to particular tasks.
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
These results resonate with our intuitions about the human mind. Our ability to learn about the
world without explicit supervision is fundamental to what we regard as intelligence. On a train
ride we might listlessly gaze through the window, drag our fingers over the velvet of the seat,
regard the passengers sitting across from us. We have no agenda in these studies: we
almost can’t help but gather information, our brains ceaselessly working to understand the
world around us, and our place within it.
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Decoding the elements of vision
2012 was a landmark year for deep learning, when AlexNet (named after its lead architect Alex Krizhnevsky) swept the
ImageNet classification competition. AlexNet’s abilities to recognize images were unprecedented, but even more
striking is what was happening under the hood. When researchers analysed what AlexNet was doing, they discovered that
it interprets images by building increasingly complex internal representations of its inputs. Low-level features, such as
textures and edges, are represented in the bottom layers, and these are then combined to form high-level concepts such
as wheels and dogs in higher layers.
This is remarkably similar to how information is processed in our brains, where simple edges and textures in primary
sensory processing areas are assembled into complex objects like faces in higher areas. The representation of a complex
scene can therefore be built out of visual primitives, in much the same way that meaning emerges from the individual
words comprising a sentence. Without explicit guidance to do so, the layers of AlexNet had discovered a fundamental
‘vocabulary’ of vision in order to solve its task. In a sense, it had learned to play what Wittgenstein called a ‘language
game’ that iteratively translates from pixels to labels.
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Transfer learning
From the perspective of general intelligence, the most interesting thing about AlexNet’s vocabulary is that it can be reused,
or transferred, to visual tasks other than the one it was trained on, such as recognising whole scenes rather than
individual objects. Transfer is essential in an ever-changing world, and humans excel at it: we are able to rapidly adapt
the skills and understanding we’ve gleaned from our experiences (our ‘world model’) to whatever situation is at hand. For
example, a classically-trained pianist can pick up jazz piano with relative ease. Artificial agents that form the right internal
representations of the world, the reasoning goes, should be able to do similarly.
Nonetheless, the representations learned by classifiers such as AlexNet have limitations. In particular, as the network was
only trained to label images with a single class (cat, dog, car, volcano), any information not required to infer the label—no
matter how useful it might be for other tasks—is liable to be ignored. For example, the representations may fail to capture
the background of the image if the label always refers to the foreground. A possible solution is to provide more
comprehensive training signals, like detailed captions describing the images: not just “dog,” but “A Corgi catching a
frisbee in a sunny park.” However, such targets are laborious to provide, especially at scale, and still may be insufficient to
capture all the information needed to complete a task. The basic premise of unsupervised learning is that the best way to
learn rich, broadly transferable representations is to attempt to learn everything that can be learned about the data.
If the notion of transfer through representation learning seems too abstract, consider a child who has learned to draw
people as stick figures. She has discovered a representation of the human form that is both highly compact and rapidly
adaptable. By augmenting each stick figure with specifics, she can create portraits of all her classmates: glasses for her
best friend, her deskmate in his favorite red tee-shirt. And she has developed this skill not in order to complete a specific
task or receive a reward, but rather in response to her basic urge to reflect the world around her.
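The transfer recipe described above is routine in practice: freeze a backbone pretrained on ImageNet and retrain only a new output layer for the new task. A minimal PyTorch/torchvision sketch; the model choice, class count, and dummy data are placeholders, not anything specific to AlexNet or this blog post.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its learned representations.
# (weights=... is the torchvision >= 0.13 API; older versions use pretrained=True.)
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final classification layer for a new task with, say, 10 classes.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head is optimized; the transferred features are reused as-is.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

dummy_images = torch.randn(4, 3, 224, 224)
dummy_labels = torch.randint(0, 10, (4,))
loss = criterion(backbone(dummy_images), dummy_labels)
loss.backward()
optimizer.step()
print(float(loss))
```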
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Learning by creating: generative models
Perhaps the simplest objective for unsupervised learning is to train an algorithm to generate its own instances of data. So-
called generative models should not simply reproduce the data they are trained on (an uninteresting act of memorisation),
but rather build a model of the underlying class from which that data was drawn: not a particular photograph of a horse or
a rainbow, but the set of all photographs of horses and rainbows; not a specific utterance from a specific speaker, but the
general distribution of spoken utterances. The guiding principle of generative models is that being able to construct a
convincing example of the data is the strongest evidence of having understood it: as Richard Feynman put it, "what I
cannot create, I do not understand.”
For images, the most successful generative model so far has been the Generative Adversarial Network (GAN for short),
in which two networks—a generator and a discriminator—engage in a contest of discernment akin to that of an artistic
forger and a detective. The generator produces images with the goal of tricking the discriminator into believing they are
real; the discriminator, meanwhile, is rewarded for spotting the fakes. The generated images, first messy and random, are
refined over many iterations, and the ongoing dynamic between the networks leads to ever-more realistic images that are
in many cases indistinguishable from real photographs. Generative adversarial networks can also dream details of
landscapes defined by the rough sketches of users.
A glance at the images below is enough to convince us that the network has learned to represent many of the key features
of the photographs they were trained on, such as the structure of animals’ bodies, the texture of grass, and detailed effects
of light and shade (even when refracted through a soap bubble). Close inspection reveals slight anomalies, such as the
white dog’s apparent extra leg and the oddly right-angled flow of one of the jets in the fountain. While the creators of
generative models strive to avoid such imperfections, their visibility highlights one of the benefits of recreating familiar data
such as images: by inspecting the samples, researchers can infer what the model has and hasn’t learned.
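The forger-and-detective dynamic described above comes down to two coupled training steps; a heavily simplified PyTorch sketch on 1-D toy data (not an image GAN), with invented network sizes and targets:

```python
import torch
import torch.nn as nn

# Toy GAN: "real" data is 1-D samples from N(3, 1); the generator maps noise to scalars.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 3.0
    fake = G(torch.randn(64, 8))

    # Discriminator ("detective"): rewarded for spotting the fakes.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator ("forger"): tries to make the discriminator call fakes real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

print("generated mean ~", float(G(torch.randn(1000, 8)).mean()))  # drifts toward 3
```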
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Creating by predicting
Another notable family within unsupervised learning are autoregressive models, in which the data is split into a
sequence of small pieces, each of which is predicted in turn. Such models can be used to generate data by successively
guessing what will come next, feeding in a guess as input and guessing again. Language models, where each word is
predicted from the words before it, are perhaps the best known example: these models power the text predictions that pop
up on some email and messaging apps. Recent advances in language modelling have enabled the generation of strikingly
plausible passages, such as the one shown below from OpenAI’s GPT-2.
By controlling the input sequence used to condition the output predictions, autoregressive models can also be used to
transform one sequence into another. This demo uses a conditional autoregressive model to transform text into realistic
handwriting. WaveNet transforms text into natural sounding speech, and is now used to generate voices for Google
Assistant. A similar process of conditioning and autoregressive generation can be used to translate from one language
to another.
Autoregressive models learn about data by attempting to predict each piece of it in a particular order. A more general
class of unsupervised learning algorithms can be built by predicting any part of the data from any other. For example, this
could mean removing a word from a sentence, and attempting to predict it from whatever remains. By learning to make
lots of localised predictions, the system is forced to learn about the data as a whole.
One concern around generative models is their potential for misuse. While manipulating evidence with photo, video, and
audio editing has been possible for a long time, generative models could make it even easier to edit media with malicious
intent. We have already seen demonstrations of so-called ‘deepfakes’—for instance, this fabricated video footage of
President Obama. It’s encouraging to see that several major efforts to address these challenges are already underway,
including using statistical techniques to help detect synthetic media and verify authentic media, raising public
awareness, and discussions around limiting the availability of trained generative models.
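The "guess, feed the guess back in, guess again" loop described above is all there is to autoregressive generation; a toy character-level sketch using bigram counts as a stand-in for a trained language model (corpus and smoothing are invented for the example):

```python
import numpy as np

corpus = "the cat sat on the mat. the cat ate the rat."
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}

# "Train" a tiny autoregressive model: bigram counts over characters (add-one smoothing).
counts = np.ones((len(chars), len(chars)))
for a, b in zip(corpus, corpus[1:]):
    counts[idx[a], idx[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Generate by repeatedly predicting the next token and feeding the guess back in.
rng = np.random.default_rng(0)
out = "t"
for _ in range(60):
    nxt = rng.choice(len(chars), p=probs[idx[out[-1]]])
    out += chars[nxt]
print(out)
```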
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Re-imagining intelligence
Generative models are fascinating in their own right, but our principal interest in them at DeepMind is as a stepping stone
towards general intelligence. Endowing an agent with the ability to generate data is a way of giving it an imagination, and
hence the ability to plan and reason about the future. Even without explicit generation, our studies show that learning to
predict different aspects of the environment enriches the agent’s world model, and thereby improves its ability to solve
problems.
These results resonate with our intuitions about the human mind. Our ability to learn about the world without explicit
supervision is fundamental to what we regard as intelligence. On a train ride we might listlessly gaze through the window,
drag our fingers over the velvet of the seat, regard the passengers sitting across from us. We have no agenda in these
studies: we almost can’t help but gather information, our brains ceaselessly working to understand the world around us,
and our place within it.
Towards Robust and Verified AI
From https://deepmind.com/blog/robust-and-verified-ai/
Bugs and software have gone hand in hand since the beginning of computer programming. Over time, software developers
have established a set of best practices for testing and debugging before deployment, but these practices are not suited for
modern deep learning systems. Today, the prevailing practice in machine learning is to train a system on a training data set,
and then test it on another set. While this reveals the average-case performance of models, it is also crucial to ensure
robustness, or acceptably high performance even in the worst case. In this article, we describe three approaches for rigorously identifying and
eliminating bugs in learned predictive models: adversarial testing, robust learning, and formal verification.
This is not an entirely new problem. Computer programs have always had bugs. Over decades, software engineers have assembled an
impressive toolkit of techniques, ranging from unit testing to formal verification. These methods work well on traditional software, but
adapting these approaches to rigorously test machine learning models like neural networks is extremely challenging due to the scale and
lack of structure in these models, which may contain hundreds of millions of parameters. This necessitates novel
approaches for ensuring that machine learning systems are robust at deployment.
From a programmer’s perspective, a bug is any behaviour that is inconsistent with the specification, i.e. the intended functionality, of a
system. As part of our mission of solving intelligence, we conduct research into techniques for evaluating whether machine learning
systems are consistent not only with the train and test set, but also with a list of specifications describing desirable properties of a system.
Such properties might include robustness to sufficiently small perturbations in inputs, safety constraints to avoid catastrophic failures, or
producing predictions consistent with the laws of physics.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
In this article, we discuss three important technical challenges for the machine learning community to take on, as we collectively work
towards rigorous development and deployment of machine learning systems that are reliably consistent with desired specifications:
• Testing consistency with specifications efficiently. We explore efficient ways to test that machine learning systems are
consistent with properties (such as invariance or robustness) desired by the designer and users of the system. One approach to
uncover cases where the model might be inconsistent with the desired behaviour is to systematically search for worst-case outcomes
during evaluation.
• Training machine learning models to be specification-consistent. Even with copious training data, standard machine learning
algorithms can produce predictive models that make predictions inconsistent with desirable specifications like robustness or fairness.
This requires us to reconsider training algorithms that produce models that not only fit training data well, but are also consistent with a list of
specifications.
• Formally proving that machine learning models are specification-consistent. There is a need for algorithms that can verify
that the model predictions are provably consistent with a specification of interest for all possible inputs. While the field of formal verification has
studied such algorithms for several decades, these approaches do not easily scale to modern deep learning systems despite
impressive progress.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
Testing consistency with specifications efficiently
Robustness to adversarial examples is a relatively well-studied problem in deep learning. One major theme that has come out of this
work is the importance of evaluating against strong attacks, and designing transparent models which can be efficiently analysed.
Alongside other researchers from the community, we have found that many models appear robust when evaluated against weak
adversaries. However, they show essentially 0% adversarial accuracy when evaluated against stronger adversaries (Athalye et al.,
2018, Uesato et al., 2018, Carlini and Wagner, 2017).
While most work has focused on rare failures in the context of supervised learning (largely image classification), there is a need to
extend these ideas to other settings. In recent work on adversarial approaches for uncovering catastrophic failures, we apply these
ideas towards testing reinforcement learning agents intended for use in safety-critical settings. One challenge in developing
autonomous systems is that because a single mistake may have large consequences, very small failure probabilities are unacceptable.
Our objective is to design an “adversary” to allow us to detect such failures in advance (e.g., in a controlled environment). If the
adversary can efficiently identify the worst-case input for a given model, this allows us to catch rare failure cases before deploying a
model. As with image classifiers, evaluating against a weak adversary provides a false sense of security during deployment. This is
similar to the software practice of red-teaming, though it extends beyond failures caused by malicious adversaries and also includes
failures which arise naturally, for example due to lack of generalization.
We developed two complementary approaches for adversarial testing of RL agents. In the first, we use a derivative-free optimisation to
directly minimise the expected reward of an agent. In the second, we learn an adversarial value function which predicts from
experience which situations are most likely to cause failures for the agent. We then use this learned function for optimisation to focus
the evaluation on the most problematic inputs. These approaches form only a small part of a rich, growing space of potential
algorithms, and we are excited about future development in rigorous evaluation of agents.
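A schematic Python sketch of the first, derivative-free approach is given below; the environment parameters, the run_episode placeholder, and the random-search loop are hypothetical stand-ins for a real agent evaluation, not DeepMind's implementation.

# Schematic derivative-free adversarial search: perturb environment
# parameters at random and keep the setting that gives the agent the
# lowest reward. run_episode and the parameter ranges are hypothetical.
import random

def run_episode(agent, env_params):
    # Placeholder environment: the agent is assumed to do worse as the
    # "difficulty" parameters grow. Replace with a real rollout.
    return 100.0 - sum(env_params.values()) + random.gauss(0, 1)

def adversarial_search(agent, n_iters=200):
    worst_params, worst_reward = None, float("inf")
    for _ in range(n_iters):
        params = {"wall_density": random.uniform(0, 10),
                  "goal_distance": random.uniform(0, 10)}
        reward = run_episode(agent, params)
        if reward < worst_reward:   # keep the most damaging setting found so far
            worst_params, worst_reward = params, reward
    return worst_params, worst_reward

print(adversarial_search(agent=None))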
Already, both approaches result in large improvements over random testing. Using our method, failures that would have taken days to
uncover, or even gone undetected entirely, can be detected in minutes (Uesato et al., 2018b). We also found that adversarial testing
may uncover qualitatively different behaviour in our agents from what might be expected from evaluation on a random test set. In
particular, using adversarial environment construction we found that agents performing a 3D navigation task, which match human-level
performance on average, still failed to find the goal completely on surprisingly simple mazes (Ruderman et al., 2018). Our work also
highlights that we need to design systems that are secure against natural failures, not only against adversaries.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
Training machine learning models to be specification-consistent
Adversarial testing aims to find a counterexample that violates specifications. As such, it often leads to overestimating the
consistency of models with respect to these specifications. Mathematically, a specification is some relationship that has to
hold between the inputs and outputs of a neural network. This can take the form of upper and lower bounds on certain key
input and output parameters.
Motivated by this observation, several researchers (Raghunathan et al., 2018; Wong et al., 2018; Mirman et al., 2018;
Wang et al., 2018) including our team at DeepMind (Dvijotham et al., 2018; Gowal et al., 2018), have worked on
algorithms that are agnostic to the adversarial testing procedure (used to assess consistency with the specification). This
can be understood geometrically - we can bound (e.g., using interval bound propagation; Ehlers 2017, Katz et al. 2017,
Mirman et al., 2018) the worst violation of a specification by bounding the space of outputs given a set of inputs. If this
bound is differentiable with respect to network parameters and can be computed quickly, it can be used during training.
The original bounding box can then be propagated through each layer of the network.
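The bound propagation idea can be sketched in a few lines of NumPy; the random weights, single layer, and perturbation size below are illustrative, and real interval bound propagation applies the same interval arithmetic layer by layer through the whole network.

# Minimal interval bound propagation (IBP) sketch: given an elementwise
# input interval [lower, upper], propagate bounds through a linear layer
# and a ReLU. Weights here are random stand-ins.
import numpy as np

def linear_bounds(W, b, lower, upper):
    mid, rad = (lower + upper) / 2.0, (upper - lower) / 2.0
    out_mid = W @ mid + b
    out_rad = np.abs(W) @ rad          # worst-case spread through the layer
    return out_mid - out_rad, out_mid + out_rad

def relu_bounds(lower, upper):
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
x = rng.normal(size=4)
eps = 0.1                              # allowed input perturbation
low, up = x - eps, x + eps
low, up = relu_bounds(*linear_bounds(W1, b1, low, up))
print(low, up)  # every output reachable from the perturbed input lies in [low, up]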
We show that interval bound propagation is fast, efficient, and — contrary to prior belief — can achieve strong results
(Gowal et al., 2018). In particular, we demonstrate that it can decrease the provable error rate (i.e., maximal error rate
achievable by any adversary) over state-of-the-art in image classification on both MNIST and CIFAR-10 datasets.
Going forward, the next frontier will be to learn the right geometric abstractions to compute tighter overapproximations of
the space of outputs. We also want to train networks to be consistent with more complex specifications capturing desirable
behavior, such as the above-mentioned invariances and consistency with physical laws.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
Formally proving that machine learning models are specification-consistent
Rigorous testing and training can go a long way towards building robust machine learning systems. However, no amount of
testing can formally guarantee that a system will behave as we want. In large-scale models, enumerating all possible outputs
for a given set of inputs (for example, infinitesimal perturbations to an image) is intractable due to the astronomical number of
choices for the input perturbation. However, as in the case of training, we can find more efficient approaches by setting
geometric bounds on the set of outputs. Formal verification is a subject of ongoing research at DeepMind.
The machine learning community has developed several interesting ideas on how to compute precise geometric bounds on
the space of outputs of the network (Katz et al. 2017, Weng et al., 2018; Singh et al., 2018). Our approach (Dvijotham et al.,
2018), based on optimisation and duality, consists of formulating the verification problem as an optimisation problem that tries
to find the largest violation of the property being verified. By using ideas from duality in optimisation, the problem becomes
computationally tractable. This results in additional constraints that refine the bounding boxes computed by interval bound
propagation, using so-called cutting planes. This approach is sound but incomplete: there may be cases where the property of
interest is true, but the bound computed by this algorithm is not tight enough to prove the property. However, once we obtain a
bound, this formally guarantees that there can be no violation of the property. The figure below graphically illustrates the
approach.
This approach enables us to extend the applicability of verification algorithms to more general networks (activation functions,
architectures), general specifications and more sophisticated deep learning models (generative models, neural processes,
etc.) and specifications beyond adversarial robustness (Qin, 2018).
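As a minimal sketch of how such bounds give a sound-but-incomplete check, the Python function below declares a classification verified only if the lower bound of the true-class margin is positive over the whole input region; the example bounds are made up, and in practice they would come from a method such as interval bound propagation or the duality-based refinement described above.

# Sound-but-incomplete verification sketch: if the lower bound of the
# margin (true-class logit minus every other logit) is positive over the
# whole input region, the property "classification never changes" is
# proved; otherwise the check is inconclusive.
import numpy as np

def verified_robust(logit_lower, logit_upper, true_class):
    margins_lower = logit_lower[true_class] - np.delete(logit_upper, true_class)
    return bool(np.all(margins_lower > 0))   # True => formally verified

low = np.array([2.0, -1.0, 0.5])
up = np.array([3.0, 0.0, 1.5])
print(verified_robust(low, up, true_class=0))  # True: class 0's lower bound beats the others' upper bounds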
From https://deepmind.com/blog/robust-and-verified-ai/
Outlook
Deployment of machine learning in high-stakes situations presents unique challenges, and requires the
development of evaluation techniques that reliably detect unlikely failure modes. More broadly, we believe that
learning consistency with specifications can provide large efficiency improvements over approaches where
specifications only arise implicitly from training data. We are excited about ongoing research into adversarial
evaluation, learning robust models, and verification of formal specifications.
Much more work is needed to build automated tools for ensuring that AI systems in the real world will do the
“right thing”. In particular, we are excited about progress in the following directions:
• Learning for adversarial evaluation and verification: As AI systems scale and become more
complex, it will become increasingly difficult to design adversarial evaluation and verification algorithms
that are well-adapted to the AI model. If we can leverage the power of AI to facilitate evaluation and
verification, this process can be bootstrapped to scale.
• Development of publicly-available tools for adversarial evaluation and verification: It is important
to provide AI engineers and practitioners with easy-to-use tools that shed light on the possible failure
modes of the AI system before it leads to widespread negative impact. This would require some degree of
standardisation of adversarial evaluation and verification algorithms.
• Broadening the scope of adversarial examples: To date, most work on adversarial examples has
focused on model invariances to small perturbations, typically of images. This has provided an excellent
testbed for developing approaches to adversarial evaluation, robust learning, and verification. We have
begun to explore alternate specifications for properties directly relevant in the real world, and are excited
by future research in this direction.
• Learning specifications: Specifications that capture “correct” behavior in AI systems are often
difficult to precisely state. Building systems that can use partial human specifications and learn further
specifications from evaluative feedback would be required as we build increasingly intelligent agents
capable of exhibiting complex behaviors and acting in unstructured environments.
Towards Robust and Verified AI (cont)
TF-Replicator: Distributed Machine Learning for Researchers
From https://deepmind.com/blog/tf-replicator-distributed-machine-learning/
At DeepMind, the Research Platform Team builds infrastructure to empower and accelerate our AI research.
Today, we are excited to share how we developed TF-Replicator, a software library that helps researchers
deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience
with distributed systems. TF-Replicator’s programming model has now been open sourced as part of
TensorFlow’s tf.distribute.Strategy. This blog post gives an overview of the ideas and technical challenges
underlying TF-Replicator. For a more comprehensive description, please read our arXiv paper.
A recurring theme in recent AI breakthroughs -- from AlphaFold to BigGAN to AlphaStar -- is the need for effortless
and reliable scalability. Increasing amounts of computational capacity allow researchers to train ever-larger neural
networks with new capabilities. To address this, the Research Platform Team developed TF-Replicator, which allows
researchers to target different hardware accelerators for Machine Learning, scale up workloads to many devices, and
seamlessly switch between different types of accelerators. While it was initially developed as a library on top of
TensorFlow, TF-Replicator’s API has since been integrated into TensorFlow 2.0’s new tf.distribute.Strategy.
While TensorFlow provides direct support for CPU, GPU, and TPU (Tensor Processing Unit) devices, switching
between targets requires substantial effort from the user. This typically involves specialising code for a particular
hardware target, constraining research ideas to the capabilities of that platform. Some existing frameworks built on
top of TensorFlow, e.g. Estimators, seek to address this problem. However, they are typically targeted at production
use cases and lack the expressivity and flexibility required for rapid iteration of research ideas.
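A minimal sketch of the programming model as it now appears in TensorFlow's tf.distribute.Strategy is shown below; the tiny Keras model and random data are illustrative only, and the strategy object replicates the computation across whatever accelerators are visible (falling back to CPU if none).

# Build and train the model inside the strategy scope; the library
# handles replication and per-replica batching.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = np.random.randn(1024, 32).astype("float32")
y = np.random.randint(0, 10, size=1024)
model.fit(x, y, batch_size=128, epochs=1)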
AlphaFold Protein Folding
From https://deepmind.com/blog/alphafold/
AlphaFold Protein Folding (cont)
From https://deepmind.com/blog/alphafold/
Google Streams for NHS
From https://deepmind.com/applied/deepmind-health/working-partners/how-were-helping-today
Open Sourcing TRFL
From https://deepmind.com/blog/trfl/
Open Sourcing TRFL (cont)
From https://deepmind.com/blog/trfl/
Multi-Task Learning (e.g. Atari)
From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
Multi-Task Learning (cont)
From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
Measuring Abstract Reasoning in Neural Nets
From http://proceedings.mlr.press/v80/santoro18a/santoro18a.pdf
Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model's ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.
One of the long-standing goals of artificial intelligence is to develop machines with abstract reasoning capabilities that equal or
better those of humans. Though there has also been substantial progress in both reasoning and abstract representation learning in neural nets (Botvinick et al., 2017; LeCun et al., 2015; Higgins et al., 2016; 2017), the extent to which these models exhibit anything like general abstract reasoning is the subject of much debate (Garnelo et al., 2016; Lake & Baroni, 2017; Marcus, 2018). The research presented here was therefore motivated by two main goals. (1) To understand whether, and (2) to understand how, deep neural networks might be able to solve abstract visual reasoning problems.
Our answer to (1) is that, with important caveats, neural networks can indeed learn to infer and apply abstract reasoning principles. Our best performing model learned to solve complex visual reasoning questions, and to do so, it needed to induce and detect from raw pixel input the presence of abstract notions such as logical operations and arithmetic progressions, and apply these principles to never-before observed stimuli. Importantly, we found that the architecture of the model made a critical difference to its ability to learn and execute such processes. While standard visual-processing models such as CNNs and ResNets performed poorly, a model that promoted the representation of, and comparison between, parts of the stimuli performed very well. We found ways to improve this performance via additional supervision: the training outcomes and the model's ability to generalise were improved if it was required to decode its representations into symbols corresponding to the reason behind the correct answer.
Learning to Navigate Cities without a Map
From https://arxiv.org/abs/1804.00168
Navigating through unstructured environments is a basic capability of intelligent
creatures, and thus is of fundamental interest in the study and development of artificial
intelligence. Long-range navigation is a complex cognitive task that relies on developing
an internal representation of space, grounded by recognisable landmarks and robust
visual processing, that can simultaneously support continuous self-localisation ("I am
here") and a representation of the goal ("I am going there"). Building upon recent
research that applies deep reinforcement learning to maze navigation problems, we
present an end-to-end deep reinforcement learning approach that can be applied on a
city scale. Recognising that successful navigation relies on integration of general policies
with locale-specific knowledge, we propose a dual pathway architecture that allows
locale-specific features to be encapsulated, while still enabling transfer to multiple cities.
We present an interactive navigation environment that uses Google StreetView for its
photographic content and worldwide coverage, and demonstrate that our learning
method allows agents to learn to navigate multiple cities and to traverse to target
destinations that may be kilometres away. The project webpage this http URL contains a
video summarising our research and showing the trained agent in diverse city
environments and on the transfer task, the form to request the StreetLearn dataset and
links to further resources. The StreetLearn environment code is available at this https
URL
Learning to Generate Images
From https://deepmind.com/blog/learning-to-generate-images/
Advances in deep generative networks have led to impressive results in recent years.
Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator's output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, OMNIGLOT, CELEBA) and synthetic 3D datasets. A video of the agent can be found at https://youtu.be/iSyvwAwa7vk.
Neuron Deletion
From https://deepmind.com/blog/understanding-deep-learning-through-neuron-deletion/
We measured the performance impact of damaging the network by deleting individual neurons as
well as groups of neurons. Our experiments led to two surprising findings:
• Although many previous studies have focused on understanding easily interpretable
individual neurons (e.g. “cat neurons”, or neurons in the hidden layers of deep networks
which are only active in response to images of cats), we found that these interpretable
neurons are no more important than confusing neurons with difficult-to-interpret activity.
• Networks which correctly classify unseen images are more resilient to neuron deletion
than networks which can only classify images they have seen before. In other words,
networks which generalise well are much less reliant on single directions than those which
memorise.
To evaluate neuron importance, we measured how network performance on image classification
tasks changes when a neuron is deleted. If a neuron is very important, deleting it should be
highly damaging and substantially decrease network performance, while the deletion of an
unimportant neuron should have little impact. Neuroscientists routinely perform similar
experiments, although they cannot achieve the fine-grained precision which is necessary for
these experiments and readily available in artificial neural networks.
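The ablation experiment can be sketched in a few lines of PyTorch: zero out ("delete") one hidden unit with a forward hook and record the resulting change in accuracy. The model, data, and labels below are toy stand-ins, not the networks studied in the paper.

# Measure a unit's importance as the accuracy drop caused by deleting it.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
x = torch.randn(256, 20)
labels = torch.randint(0, 10, (256,))

def accuracy():
    with torch.no_grad():
        return (model(x).argmax(dim=1) == labels).float().mean().item()

def accuracy_without_unit(unit):
    def zero_unit(module, inputs, output):
        output = output.clone()
        output[:, unit] = 0.0          # delete the neuron's activation
        return output
    handle = model[1].register_forward_hook(zero_unit)  # hook after the ReLU
    acc = accuracy()
    handle.remove()
    return acc

baseline = accuracy()
importance = {u: baseline - accuracy_without_unit(u) for u in range(50)}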
Surprisingly, we found that there was little relationship between selectivity and importance. In
other words, “cat neurons” were no more important than confusing neurons. This finding echoes
recent work in neuroscience which has demonstrated that confusing neurons can actually be
quite informative, and suggests that we must look beyond the most easily interpretable neurons in
order to understand deep neural networks.
Learning by Playing
From https://arxiv.org/abs/1802.10567
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of
Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in
the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of
general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind
our method is that active (learned) scheduling and execution of auxiliary policies allows the agent
to efficiently explore its environment – enabling it to excel at sparse reward RL. Our experiments
in several challenging robotic manipulation settings demonstrate the power of our approach. A
video of the rich set of learned behaviors can be found at https://youtu.be/mPKyvocNe M.
This paper introduces SAC-X, a method that simultaneously learns intention policies on a set of
auxiliary tasks, and actively schedules and executes these to explore its observation space - in
search of sparse rewards of externally defined target tasks. Utilizing simple auxiliary tasks enables
SAC-X to learn complicated target tasks from rewards defined in a ’pure’, sparse, manner: only the
end goal is specified, but not the solution path.
We demonstrated the power of SAC-X on several challenging robotics tasks in simulation, using a
common set of simple and sparse auxiliary tasks and on a real robot. The learned intentions are
highly reactive, reliable, and exhibit a rich and robust behavior. We consider this as an important
step towards the goal of applying RL to real world domains.
Scalable Distributed DeepRL
From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
Deep Reinforcement Learning (DeepRL) has achieved remarkable success in a
range of tasks, from continuous control problems in robotics to playing games
like Go and Atari. The improvements seen in these domains have so far been
limited to individual tasks where a separate agent has been tuned and trained for
each task.
In our most recent work, we explore the challenge of training a single agent on many
tasks.
Today we are releasing DMLab-30, a set of new tasks that span a large variety of
challenges in a visually unified environment with a common action space. Training an
agent to perform well on many tasks requires massive throughput and making efficient
use of every data point. To this end, we have developed a new, highly scalable agent
architecture for distributed training called Importance Weighted Actor-Learner
Architecture that uses a new off-policy correction algorithm called V-trace.
DMLab-30 is a collection of new levels designed using our open source RL
environment DeepMind Lab. These environments enable any DeepRL researcher to
test systems on a large spectrum of interesting tasks either individually or in a multi-
task setting.
Scalable Distributed DeepRL (cont)
From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
In order to tackle the challenging DMLab-30 suite, we developed a new distributed agent
called Importance Weighted Actor-Learner Architecture that maximises data throughput using an
efficient distributed architecture with TensorFlow.
Importance Weighted Actor-Learner Architecture is inspired by the popular A3C architecture which
uses multiple distributed actors to learn the agent’s parameters. In models like this, each of the
actors uses a clone of the policy parameters to act in the environment. Periodically, actors pause
their exploration to share the gradients they have computed with a central parameter server that
applies updates.
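A schematic, single-process Python illustration of that actor/parameter-server split is shown below; the "gradient" computation is a toy stand-in for acting in an environment and computing a policy gradient, and nothing here reflects the real distributed implementation.

# Each actor works from a copy of the central parameters, computes a
# gradient from its own experience, and the central "parameter server"
# applies the update.
import numpy as np

central_params = np.zeros(4)

def actor_gradient(params_copy, rng):
    # Stand-in for acting in the environment and computing a policy gradient.
    target = np.array([1.0, -2.0, 0.5, 3.0])
    return (params_copy - target) + rng.normal(scale=0.1, size=params_copy.shape)

rngs = [np.random.default_rng(i) for i in range(8)]   # eight "actors"
lr = 0.05
for step in range(200):
    for rng in rngs:
        params_copy = central_params.copy()           # actor clones the parameters
        grad = actor_gradient(params_copy, rng)
        central_params -= lr * grad                   # server applies the update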
Learning Explanatory Rules from Noisy Data
From https://deepmind.com/blog/learning-explanatory-rules-noisy-data/
The distinction is interesting to us because these two types of thinking correspond to two different approaches
to machine learning: deep learning and symbolic program synthesis. Deep learning concentrates on intuitive
perceptual thinking whereas symbolic program synthesis focuses on conceptual, rule-based thinking. Each
system has different merits - deep learning systems are robust to noisy data but are difficult to interpret and
require large amounts of data to train, whereas symbolic systems are much easier to interpret and require less
training data but struggle with noisy data. While human cognition seamlessly combines these two distinct
ways of thinking, it is much less clear whether or how it is possible to replicate this in a single AI system.
Our new paper, recently published in JAIR, demonstrates it is possible for systems to combine intuitive
perceptual with conceptual interpretable reasoning. The system we describe, ∂ILP, is robust to noise, data-
efficient, and produces interpretable rules.
Scalable Distributed DeepRL (cont)
From https://arxiv.org/abs/1802.01561
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a
single set of parameters. A key challenge is to handle the increased amount of data and extended training
time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture)
that not only uses resources more efficiently in single-machine training but also scales to thousands of
machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high
throughput by combining decoupled acting and learning with a novel off-policy correction method called V-
trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set
of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games
in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve
better performance than previous agents with less data, and crucially exhibits positive transfer between tasks
as a result of its multi-task approach.
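A hedged NumPy sketch of the V-trace target computation, following the published definition in the IMPALA paper (Espeholt et al., 2018), is shown below; the clipping thresholds, array shapes, and toy inputs are illustrative, and this is a simplified single-trajectory version rather than the real batched implementation.

# V-trace targets: truncated importance weights correct for the gap
# between the behaviour policy that generated the data and the current
# target policy.
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    # rewards, values, rhos: arrays of length T; rhos = pi/mu for the taken actions.
    T = len(rewards)
    clipped_rho = np.minimum(rho_bar, rhos)
    clipped_c = np.minimum(c_bar, rhos)
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rho * (rewards + gamma * values_tp1 - values)

    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):                  # backward recursion
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v                    # the v_s targets

targets = vtrace_targets(rewards=np.array([0., 0., 1.]),
                         values=np.array([0.2, 0.3, 0.5]),
                         bootstrap_value=0.0,
                         rhos=np.array([0.9, 1.2, 1.0]))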
DeepMind Lab
From https://deepmind.com/blog/open-sourcing-deepmind-lab/
The development of innovative agents goes hand in hand with the careful design and implementation of
rationally selected, flexible and well-maintained environments. To that end, we at DeepMind have invested
considerable effort toward building rich simulated environments to serve as “laboratories” for AI research.
Now we are open-sourcing our flagship platform, DeepMind Lab, so the broader research community can
make use of it.
DeepMind Lab is a fully 3D game-like platform tailored for agent-based AI research. It is observed from a
first-person viewpoint, through the eyes of the simulated agent. Scenes are rendered with rich science
fiction-style visuals. The available actions allow agents to look around and move in 3D. The agent’s “body”
is a floating orb. It levitates and moves by activating thrusters opposite its desired direction of movement,
and it has a camera that moves around the main sphere as a ball-in-socket joint tracking the rotational look
actions. Example tasks include collecting fruit, navigating in mazes, traversing dangerous passages while
avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, playing
laser tag, and quickly learning and remembering random procedurally generated environments. An
illustration of how agents in DeepMind Lab perceive and interact with the world can be seen below:
Game Theory for Asymmetric Players
From https://deepmind.com/blog/game-theory-insights-asymmetric-multi-agent-games/
As AI systems start to play an increasing role in the real world it is important to understand how different systems will
interact with one another. In our latest paper, published in the journal Scientific Reports, we use a branch of game
theory to shed light on this problem. In particular, we examine how two intelligent systems behave and respond in a
particular type of situation known as an asymmetric game, which include Leduc poker and various board games such
as Scotland Yard. Asymmetric games also naturally model certain real-world scenarios such as automated auctions
where buyers and sellers operate with different motivations. Our results give us new insights into these situations and
reveal a surprisingly simple way to analyse them. While our interest is in how this theory applies to the interaction of
multiple AI systems, we believe the results could also be of use in economics, evolutionary biology and empirical game
theory, among others.
Game theory is a field of mathematics that is used to analyse the strategies used by decision makers in competitive
situations. It can apply to humans, animals, and computers in various situations but is commonly used in AI research to
study “multi-agent” environments where there is more than one system, for example several household robots
cooperating to clean the house. Traditionally, the evolutionary dynamics of multi-agent systems have been analysed
using simple, symmetric games, such as the classic Prisoner’s Dilemma, where each player has access to the same
set of actions. Although these games can provide useful insights into how multi-agent systems work and tell us how to
achieve a desirable outcome for all players - known as the Nash equilibrium - they cannot model all situations.
Our new technique allows us to quickly and easily identify the strategies used to find the Nash equilibrium in more
complex asymmetric games - characterised as games where each player has different strategies, goals and rewards.
These games - and the new technique we use to understand them - can be illustrated using an example from ‘Battle of
the Sexes’, a coordination game commonly used in game theory research.
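As a small concrete illustration (not the decomposition method introduced in the paper), the Python snippet below enumerates the pure Nash equilibria of the standard Battle of the Sexes bimatrix game by checking best responses; the payoff numbers are the usual textbook ones.

# Battle of the Sexes: both players prefer coordinating, but each prefers
# a different coordinated outcome. Rows: player 1's action (Opera,
# Football); columns: player 2's action.
import numpy as np

A = np.array([[3, 0],   # player 1's payoffs
              [0, 2]])
B = np.array([[2, 0],   # player 2's payoffs
              [0, 3]])

pure_nash = []
for i in range(2):
    for j in range(2):
        best_row = A[i, j] >= A[:, j].max()   # player 1 cannot gain by deviating
        best_col = B[i, j] >= B[i, :].max()   # player 2 cannot gain by deviating
        if best_row and best_col:
            pure_nash.append((i, j))
print(pure_nash)   # [(0, 0), (1, 1)]: both coordinated outcomes are pure equilibria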
UPDATE 20/03/18: Our latest paper, forthcoming at the Autonomous Agents and Multi-Agent Systems conference
(AAMAS), builds on the Scientific Reports paper outlined above. A Generalised Method for Empirical Game
Theoretic Analysis introduces a general method to perform empirical analysis of multi-agent interactions, both in
symmetric and asymmetric games. The method allows us to understand how multi-agent strategies interact, what the
attractors are and what the basins of attraction look like, giving an intuitive understanding for the strength of the involved
strategies. Furthermore, it explains how many data samples to consider in order to guarantee that the equilibria of the
approximating game are sufficiently reliable. We apply the method to several domains, including AlphaGo, Colonel
Blotto and Leduc poker.
A Generalised Method for Empirical Game Theoretic Analysis
From https://arxiv.org/abs/1803.06376
This paper provides theoretical bounds for empirical game theoretical analysis of
complex multi-agent interactions. We provide insights in the empirical meta game
showing that a Nash equilibrium of the meta-game is an approximate Nash
equilibrium of the true underlying game. We investigate and show how many data
samples are required to obtain a close enough approximation of the underlying game.
Additionally, we extend the meta-game analysis methodology to asymmetric games.
The state-of-the-art has only considered empirical games in which agents have
access to the same strategy sets and the payoff structure is symmetric, implying that
agents are interchangeable. Finally, we carry out an empirical illustration of the
generalised method in several domains, illustrating the theory and evolutionary
dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of
the Colonel Blotto game played by human players on Facebook (symmetric), and an
example of a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-
agent learning algorithm.
DeepMind 2017 Review
From https://deepmind.com/blog/2017-deepminds-year-review/
The approach we take at DeepMind is inspired by neuroscience, helping to make progress in
critical areas such as imagination, reasoning, memory and learning. Take imagination, for
example: this distinctively human ability plays a crucial part in our daily lives, allowing us to plan and
reason about the future, but is hugely challenging for computers. We continue to work hard on this
problem, this year introducing imagination-augmented agents that are able to extract relevant
information from an environment in order to plan what to do in the future.
Separately, we made progress in the field of generative models. Just over a year ago we presented WaveNet, a
deep neural network for generating raw audio waveforms that was capable of producing better and more
realistic-sounding speech than existing techniques. At that time, the model was a research prototype and was too
computationally intensive to work in consumer products. Over the last 12 months, our teams managed to create a
new model that was 1000x faster. In October, we revealed that this new Parallel WaveNet is now being used in
the real world, generating the Google Assistant voices for US English and Japanese.
This is an example of the effort we invest in making it easier to build, train and optimise AI systems. Other
techniques we worked on this year, such as distributional reinforcement learning, population based training
for neural networks and new neural architecture search methods, promise to make systems easier to build,
more accurate and quicker to optimise. We have also dedicated significant time to creating new and challenging
environments in which to test our systems, including our work with Blizzard to open up StarCraft II for research.
But we know that technology is not value neutral. We cannot simply make progress in fundamental research
without also taking responsibility for the ethical and social impact of our work. This drives our research in critical
areas such as interpretability, where we have been exploring novel methods to understand and explain how our
systems work. It’s also why we have an established technical safety team that continued to develop practical
ways to ensure that we can depend on future systems and that they remain under meaningful human control.
Population Based Training of Neural Networks
From https://arxiv.org/abs/1711.09846
Neural networks dominate the modern machine learning landscape, but their
training and success still suffer from sensitivity to empirical choices of
hyperparameters such as model architecture, loss function, and optimisation
algorithm. In this work we present Population Based Training (PBT), a
simple asynchronous optimisation algorithm which effectively utilises a fixed
computational budget to jointly optimise a population of models and their
hyperparameters to maximise performance. Importantly, PBT discovers a
schedule of hyperparameter settings rather than following the generally sub-
optimal strategy of trying to find a single fixed set to use for the whole course
of training. With just a small modification to a typical distributed
hyperparameter training framework, our method allows robust and reliable
training of models. We demonstrate the effectiveness of PBT on deep
reinforcement learning problems, showing faster wall-clock convergence and
higher final performance of agents by optimising over a suite of
hyperparameters. In addition, we show the same method can be applied to
supervised learning for machine translation, where PBT is used to maximise
the BLEU score directly, and also to training of Generative Adversarial Networks
to maximise the Inception score of generated images. In all cases PBT results
in the automatic discovery of hyperparameter schedules and model selection
which results in stable training and better final performance.
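A minimal Python sketch of the exploit/explore loop behind PBT is given below; the "training" function, population size, and perturbation factors are toy stand-ins, and in a real system the quantity copied between members would be the model weights rather than a scalar score.

# Train a population in parallel "steps"; periodically poor performers copy
# a better member (exploit) and randomly perturb its hyperparameters (explore).
import copy
import random

def train_step(member):
    # Stand-in for a chunk of training; this toy score rewards a moderate lr.
    member["score"] += member["lr"] * (1.0 - member["lr"]) + random.gauss(0, 0.01)

population = [{"lr": random.uniform(0.01, 1.0), "score": 0.0} for _ in range(10)]

for generation in range(20):
    for member in population:
        train_step(member)
    population.sort(key=lambda m: m["score"], reverse=True)
    top, bottom = population[:2], population[-2:]
    for loser in bottom:
        winner = random.choice(top)
        loser.update(copy.deepcopy(winner))            # exploit: copy weights and hypers
        loser["lr"] *= random.choice([0.8, 1.2])       # explore: perturb hyperparameters

print(max(m["score"] for m in population), sorted(m["lr"] for m in population))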
Neuroscience Inspired Artificial Intelligence
From https://www.cell.com/neuron/fulltext/S0896-6273(17)30509-3
The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent
times, however, communication and collaboration between the two fields has become less commonplace. In
this article, we argue that better understanding biological brains could play a vital role in building intelligent
machines. We survey historical interactions between the AI and neuroscience fields and emphasize current
advances in AI that have been inspired by the study of neural computation in humans and other animals. We
conclude by highlighting shared themes that may be key for advancing future research in both fields.
In this perspective, we have reviewed some of the many ways in which neuroscience has made fundamental
contributions to advancing AI research, and argued for its increasingly important relevance. In strategizing for
the future exchange between the two fields, it is important to appreciate that the past contributions of
neuroscience to AI have rarely involved a simple transfer of full-fledged solutions that could be directly re-
implemented in machines. Rather, neuroscience has typically been useful in a subtler way, stimulating
algorithmic-level questions about facets of animal learning and intelligence of interest to AI researchers and
providing initial leads toward relevant mechanisms. As such, our view is that leveraging insights gained from
neuroscience research will expedite progress in AI research, and this will be most effective if AI researchers
actively initiate collaborations with neuroscientists to highlight key questions that could be addressed by
empirical work.
The successful transfer of insights gained from neuroscience to the development of AI algorithms is critically
dependent on the interaction between researchers working in both these fields, with insights often developing
through a continual handing back and forth of ideas between fields. In the future, we hope that greater
collaboration between researchers in neuroscience and AI, and the identification of a common language
between the two fields (Marblestone et al., 2016), will permit a virtuous circle whereby research is accelerated
through shared theoretical insights and common empirical advances. We believe that the quest to develop AI
will ultimately also lead to a better understanding of our own minds and thought processes. Distilling
intelligence into an algorithmic construct and comparing it to the human brain might yield insights into some of
the deepest and the most enduring mysteries of the mind, such as the nature of creativity, dreams, and
perhaps one day, even consciousness.
Toward an Integration of Deep Learning and Neuroscience
From https://www.frontiersin.org/articles/10.3389/fncom.2016.00094/full
Neuroscience has focused on the detailed implementation of computation, studying neural
codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to
eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a
cost function, often using simple and relatively uniform initial architectures. Two recent
developments have emerged within machine learning that create an opportunity to connect these
seemingly divergent perspectives.
First, structured architectures are used, including dedicated systems for attention, recursion and
various forms of short- and long-term memory storage.
Second, cost functions and training procedures have become more complex and are varied across
layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that
(1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain
locations and over development, and (3) optimization operates within a pre-structured
architecture matched to the computational problems posed by behavior.
In support of these hypotheses, we argue that a range of implementations of credit assignment
through multiple layers of neurons are compatible with our current knowledge of neural
circuitry, and that the brain's specialized systems can be interpreted as enabling efficient
optimization for specific problem classes. Such a heterogeneously optimized system, enabled by
a series of interacting cost functions, serves to make learning data-efficient and precisely
targeted to the needs of the organism. We suggest directions by which neuroscience could seek to
refine and test these hypotheses.
Hippocampus Predictive Map
From https://deepmind.com/blog/hippocampus-predictive-map/
In our new paper, in Nature Neuroscience, we apply a neuroscience lens to a longstanding mathematical
theory from machine learning to provide new insights into the nature of learning and memory. Specifically,
we propose that the area of the brain known as the hippocampus offers a unique solution to this problem by
compactly summarising future events using what we call a “predictive map.”
The hippocampus has traditionally been thought to only represent an animal’s current state, particularly in
spatial tasks, such as navigating a maze. This view gained significant traction with the discovery of “place
cells” in the rodent hippocampus, which fire selectively when the animal is in specific locations. While this
theory accounts for many neurophysiological findings, it does not fully explain why the hippocampus is also
involved in other functions, such as memory, relational reasoning, and decision making.
Our new theory thinks about navigation as part of the more general problem of computing plans that maximise
future reward. Our insights were derived from reinforcement learning, the subdiscipline of AI research that
focuses on systems that learn by trial and error. The key computational idea we drew on is that to estimate
future reward, an agent must first estimate how much immediate reward it expects to receive in each state,
and then weight this expected reward by how often it expects to visit that state in the future. By summing up
this weighted reward across all possible states, the agent obtains an estimate of future reward.
Similarly, we argue that the hippocampus represents every situation - or state - in terms of the future states
which it predicts. For example, if you are leaving work (your current state) your hippocampus might represent
this by predicting that you will likely soon be on your commute, picking up your kids from school or, more
distantly, at home. By representing each current state in terms of its anticipated successor states, the
hippocampus conveys a compact summary of future events, known formally as the “successor
representation”. We suggest that this specific form of predictive map allows the brain to adapt rapidly in
environments with changing rewards, but without having to run expensive simulations of the future.
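For a fixed policy with state-transition matrix T, the successor representation can be written in closed form as M = (I - gamma*T)^-1, and values follow by weighting immediate rewards by expected future occupancy, V = M R. A toy NumPy sketch with a three-state work -> commute -> home chain is shown below; the chain and rewards are illustrative only.

# Successor representation: each state is represented by the discounted
# expected future occupancy of every other state.
import numpy as np

gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],    # work -> commute
              [0.0, 0.0, 1.0],    # commute -> home
              [0.0, 0.0, 1.0]])   # home -> home
R = np.array([0.0, 0.0, 1.0])     # reward only at home

M = np.linalg.inv(np.eye(3) - gamma * T)   # successor representation
V = M @ R                                  # expected discounted future reward
print(M.round(2), V.round(2))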
Going Beyond Average for Neural Learning
From https://deepmind.com/blog/going-beyond-average-reinforcement-learning/
Randomness is something we encounter every day, and it has a profound effect on how we
experience the world. The same is true in reinforcement learning (RL) applications, systems
that learn by trial and error and are motivated by rewards. Typically, an RL algorithm predicts
the average reward it receives from multiple attempts at a task, and uses this prediction to
decide how to act. But random perturbations in the environment can alter its behaviour by
changing the exact amount of reward the system receives.
In a new paper, we show it is possible to model not only the average but also the full variation
of this reward, what we call the value distribution. This results in RL systems that are more
accurate and faster to train than previous models, and more importantly opens up the
possibility of rethinking the whole of reinforcement learning.
From https://arxiv.org/abs/1707.06887
In this paper we argue for the fundamental importance of the value distribution: the
distribution of the random return received by a reinforcement learning agent. This is in
contrast to the common approach to reinforcement learning which models the
expectation of this return, or value. Although there is an established body of literature
studying the value distribution, thus far it has always been used for a specific purpose
such as implementing risk-aware behaviour. We begin with theoretical results in both
the policy evaluation and control settings, exposing a significant distributional
instability in the latter. We then use the distributional perspective to design a new
algorithm which applies Bellman's equation to the learning of approximate value
distributions. We evaluate our algorithm using the suite of games from the Arcade
Learning Environment. We obtain both state-of-the-art results and anecdotal evidence
demonstrating the importance of the value distribution in approximate reinforcement
learning. Finally, we combine theoretical and empirical evidence to highlight the ways
in which the value distribution impacts learning in the approximate setting.
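A tiny Python illustration of why the full distribution matters is shown below: a state whose return is usually zero but occasionally large has a modest mean that hides the spread a value distribution would capture. The two-outcome environment is made up, and real distributional agents learn a parametric return distribution rather than sampling it.

# Compare the expected value with a few quantiles of the return distribution.
import numpy as np

rng = np.random.default_rng(0)
returns = np.where(rng.random(10_000) < 0.1, 10.0, 0.0)   # rare large reward

expected_value = returns.mean()                    # what standard RL predicts
quantiles = np.quantile(returns, [0.1, 0.5, 0.9])  # what a value distribution captures
print(expected_value, quantiles)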
Agents that Imagine and Plan
From https://deepmind.com/blog/agents-imagine-and-plan/
In two new papers, we describe a new family of approaches for imagination-based planning.
We also introduce architectures which provide new ways for agents to learn and construct
plans to maximise the efficiency of a task. These architectures are efficient, robust to complex
and imperfect models, and can adopt flexible strategies for exploiting their imagination.
Imagination-augmented agents
The agents we introduce benefit from an ‘imagination encoder’ - a neural network which learns
to extract any information useful for the agent’s future decisions, but ignore that which is not
relevant. These agents have a number of distinct features:
• they learn to interpret their internal simulations. This allows them to use models which
coarsely capture the environmental dynamics, even when those dynamics are not perfect.
• they use their imagination efficiently. They do this by adapting the number of imagined
trajectories to suit the problem. Efficiency is also enhanced by the encoder, which is able
to extract additional information from imagination beyond rewards - these trajectories may
contain useful clues even if they do not necessarily result in high reward.
• they can learn different strategies to construct plans. They do this by choosing between
continuing a current imagined trajectory or restarting from scratch. Alternatively, they can
use different imagination models, with different accuracies and computational costs. This
offers them a broad spectrum of effective planning strategies, rather than being restricted
to a one-size-fits-all approach which might limit adaptability in imperfect environments.
Agents that Imagine and Plan
From https://arxiv.org/abs/1707.06203
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning
combining model-free and model-based aspects. In contrast to most existing model-based reinforcement
learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to
interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using
the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance,
and robustness to model misspecification compared to several baselines.
From https://arxiv.org/abs/1707.06170
Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It
is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not
prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based,
sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it
can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating
it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan
context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing
out alternative imagined actions, chaining sequences of actions together, or building a more complex
"imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our
agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated
with using its imagination. We show that our architecture can learn to solve a challenging continuous control
problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new
direction toward learning the components of a model-based planning system and how to use them.
Creating New Visual Concepts
From https://deepmind.com/blog/imagine-creating-new-visual-concepts-recombining-familiar-ones/
In our new paper, we propose a novel theoretical approach to address this problem. We also demonstrate a
new neural network component called the Symbol-Concept Association Network (SCAN), that can, for the
first time, learn a grounded visual concept hierarchy in a way that mimics human vision and word
acquisition, enabling it to imagine novel concepts guided by language instructions.
Our approach can be summarised as follows:
• The SCAN model experiences the visual world in the same way as a young baby might during the first
few months of life. This is the period when the baby’s eyes are still unable to focus on anything more
than an arm’s length away, and the baby essentially spends all her time observing various objects
coming into view, moving and rotating in front of her. To emulate this process, we placed SCAN in a
simulated 3D world of DeepMind Lab, where, like a baby in a cot, it could not move, but it could rotate
its head and observe one of three possible objects presented to it against various coloured
backgrounds - a hat, a suitcase or an ice lolly. Like the baby’s visual system, our model learns the basic
structure of the visual world and how to represent objects in terms of interpretable visual “primitives”.
For example, when looking at an apple, the model will learn to represent it in terms of its colour, shape,
size, position or lighting.
From https://arxiv.org/abs/1707.03389
The seemingly infinite diversity of the natural world arises from a relatively small set of coherent rules, such as
the laws of physics or chemistry. We conjecture that these rules give rise to regularities that can be discovered
through primarily unsupervised experiences and represented as abstract concepts. If such representations are
compositional and hierarchical, they can be recombined into an exponentially large set of new concepts. This
paper describes SCAN (Symbol-Concept Association Network), a new framework for learning such abstractions in
the visual domain. SCAN learns concepts through fast symbol association, grounding them in disentangled visual
primitives that are discovered in an unsupervised manner. Unlike state of the art multimodal generative model
baselines, our approach requires very few pairings between symbols and images and makes no assumptions
about the form of symbol representations. Once trained, SCAN is capable of multimodal bi-directional inference,
generating a diverse set of image samples from symbolic descriptions and vice versa. It also allows for traversal
and manipulation of the implicit hierarchy of visual concepts through symbolic instructions and learnt logical
recombination operations. Such manipulations enable SCAN to break away from its training data distribution and
imagine novel visual concepts through symbolically instructed recombination of previously learnt concepts.
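As a rough illustration (not the SCAN code), the snippet below shows the kind of grounding term such a model can use: a KL divergence that ties a symbol-conditioned latent distribution to the latent produced by a disentangled visual encoder. The example latents and the direction of the KL are assumptions made for illustration; see the paper for the exact objective.

import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, the kind of term SCAN-style models
    use to ground a symbol-conditioned latent in a visual latent."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Hypothetical latents: a visual encoder posterior and a symbol encoder posterior
# over the same disentangled factors (colour, shape, size, ...).
mu_vis, logvar_vis = np.array([0.8, -0.2, 0.1]), np.full(3, -2.0)
mu_sym, logvar_sym = np.array([0.7, 0.0, 0.0]), np.zeros(3)

# Training would minimise a term like this so that the symbol posterior covers
# the visual posterior for the factors the symbol describes (the direction of
# the KL shown here is a modelling choice, not necessarily the paper's).
grounding_loss = kl_diag_gaussians(mu_vis, logvar_vis, mu_sym, logvar_sym)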
Producing Flexible Behaviors in Simulation Environments
From https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/
True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a
range of complex environments. Existing attempts to control physically simulated humanoid bodies come
from diverse fields, including computer animation and biomechanics. A trend has been to use hand-
crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this
may require considerable engineering effort, and can result in restricted behaviours or behaviours that
may be difficult to repurpose for new tasks.
In three new papers, we seek ways to produce flexible and natural behaviours that can be reused and
adapted to solve tasks.
Read:
Emergence of locomotion behaviours in rich environments
Learning human behaviours from motion capture by adversarial imitation
Robust imitation of diverse behaviours
Achieving flexible and adaptive control of simulated bodies is a key element of AI research. Our work
aims to develop flexible systems which learn and adapt skills to solve motor control tasks while reducing
the manual engineering required to achieve this goal. Future work could extend these approaches to
enable coordination of a greater range of behaviours in more complex situations.
Producing Flexible Behaviors in Simulation Environments
Emergence of locomotion behaviours in rich environments
The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple
reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a
particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help
to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and
find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We
demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward.
We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward
function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our
agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance.
A visual depiction of highlights of the learned behavior can be viewed via the link in the paper's arXiv abstract.
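For illustration, a reward of the kind described above ("a simple reward function based on forward progress") might look like the following; the alive bonus, the control-cost term, and their coefficients are assumptions, not taken from the paper.

def locomotion_reward(x_before, x_after, dt, ctrl, alive_bonus=1.0, ctrl_cost=1e-3):
    """Illustrative forward-progress reward: reward velocity along the track,
    lightly penalise control effort, and add a small bonus for staying upright.
    The coefficients here are made up for the example."""
    forward_velocity = (x_after - x_before) / dt
    return forward_velocity + alive_bonus - ctrl_cost * sum(u * u for u in ctrl)

# Example: the agent moved 0.3 m in 0.05 s while applying a small torque vector.
r = locomotion_reward(x_before=1.0, x_after=1.3, dt=0.05, ctrl=[0.2, -0.1, 0.05])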
Learning human behaviours from motion capture by adversarial imitation
Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-
dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions
tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative
adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement
patterns from limited demonstrations consisting only of partially observed state features, without access to actions,
even when the demonstrations come from a body with different and unknown physical parameters. We leverage this
approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when
controlled by a higher level controller.
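A minimal sketch of the adversarial imitation idea behind this work (GAIL-style, not the paper's code): a discriminator is trained to separate demonstration state features from policy-generated ones, and its output is turned into a reward for the reinforcement learner.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def imitation_reward(discriminator_logit):
    """GAIL-style reward: the more 'demonstration-like' the discriminator finds
    a state (or partial state features, as in the paper above), the higher the
    reward passed to the reinforcement learner."""
    d = sigmoid(discriminator_logit)
    return -np.log(1.0 - d + 1e-8)

def discriminator_loss(logits_demo, logits_policy):
    """Binary cross-entropy: label demonstrations 1 and policy rollouts 0."""
    d_demo, d_pol = sigmoid(logits_demo), sigmoid(logits_policy)
    return -np.mean(np.log(d_demo + 1e-8)) - np.mean(np.log(1.0 - d_pol + 1e-8))

# Toy usage with made-up discriminator outputs.
print(imitation_reward(2.0), discriminator_loss(np.array([2.0, 1.5]), np.array([-1.0, 0.3])))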
Robust imitation of diverse behaviours
Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data,
even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures
when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative
Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently
mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two
approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns
semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching
tasks, and then smoothly interpolated with a resulting smooth interpolation of reaching behavior. Leveraging these
policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised
controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when
GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped
and a 62 DoF 3D humanoid in the MuJoCo physics environment.
DQN - Deep Reinforcement Learning
From https://deepmind.com/research/dqn
Nature Paper
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
DQN - Deep Reinforcement Learning Paper
From https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
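The DQN paper linked above combines Q-learning with experience replay and a periodically updated target network. The sketch below shows only the target computation y = r + gamma * max_a' Q_target(s', a') on a toy value table; the real system uses a convolutional network over Atari frames.

import numpy as np

def dqn_targets(batch, q_target, gamma=0.99):
    """Compute DQN regression targets for a batch of (s, a, r, s', done)
    transitions sampled from replay memory. q_target maps a state index to a
    vector of action values (a stand-in for the target network)."""
    targets = []
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * np.max(q_target[s_next])
        targets.append((s, a, y))
    return targets

# Toy example with a 3-state, 2-action value table as the "target network".
q_target = np.array([[0.0, 1.0], [0.5, 0.2], [2.0, 0.1]])
batch = [(0, 1, 1.0, 2, False), (1, 0, 0.0, 0, True)]
print(dqn_targets(batch, q_target))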
Reward Tampering Problems and Solutions in Reinforcement Learning
From https://arxiv.org/pdf/1908.04734.pdf
PathNet from Google DeepMind
From https://arxiv.org/pdf/1701.08734.pdf
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network,
permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural
network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the
network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of
parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm.
During learning, a tournament selection genetic algorithm is used to select pathways through the neural network
for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost
function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and
re-evolving a new population of paths for task B allows task B to be learned faster than it could be learned from
scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive
transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set
of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural
network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a
parallel asynchronous reinforcement learning algorithm.
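A toy sketch of the tournament-selection loop described in the abstract (an illustration, not the PathNet code): pathways are lists of module indices, two pathways are compared, and the loser is overwritten by a mutated copy of the winner. The fitness function here is a stand-in for training and evaluating the parameters selected by a pathway.

import random

def mutate(path, n_modules_per_layer, p=0.1):
    """Randomly reassign some modules in the copied pathway (genotype mutation)."""
    return [m if random.random() > p else random.randrange(n_modules_per_layer) for m in path]

def tournament_step(population, fitness, n_modules_per_layer=10):
    """One tournament: pick two pathways, evaluate them, and overwrite the
    loser with a mutated copy of the winner. fitness(path) would train and
    evaluate the parameters selected by that pathway."""
    i, j = random.sample(range(len(population)), 2)
    winner, loser = (i, j) if fitness(population[i]) >= fitness(population[j]) else (j, i)
    population[loser] = mutate(list(population[winner]), n_modules_per_layer)
    return population

# Toy usage: pathways pick one module per layer of a 3-layer network;
# the fitness function below is just a placeholder for task performance.
population = [[random.randrange(10) for _ in range(3)] for _ in range(8)]
population = tournament_step(population, fitness=lambda path: -sum(path))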
Pathways
Deep Learning 2020s
2020 References
• Future of Deep Learning
https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/
• Turing Award Winners Video
https://www.youtube.com/watch?v=UX8OubxsY8w
• MIT Deep Learning Video
https://www.youtube.com/watch?v=0VH1Lim8gL8
Three Challenges of Deep Learning from Yann LeCun
From https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/
1. First, we need to develop AI systems that learn with fewer samples or fewer trials. “My
suggestion is to use unsupervised learning, or I prefer to call it self-supervised learning because
the algorithms we use are really akin to supervised learning, which is basically learning to fill in
the blanks,” LeCun says. “Basically, it’s the idea of learning to represent the world before
learning a task. This is what babies and animals do. We run about the world, we learn how it
works before we learn any task. Once we have good representations of the world, learning a
task requires few trials and few samples.” (A toy illustration of this “fill in the blanks” idea
follows this list.)
2. The second challenge is creating deep learning systems that can reason. Current deep
learning systems are notoriously bad at reasoning and abstraction, which is why they need huge
amounts of data to learn simple tasks. “The question is, how do we go beyond feed-forward
computation and system 1? How do we make reasoning compatible with gradient-based
learning? How do we make reasoning differentiable? That’s the bottom line,” LeCun said.
System 1 is the kind of learning task that doesn’t require active thinking, such as navigating a
known area or making small calculations. System 2 is the more active kind of thinking, which
requires reasoning. Symbolic artificial intelligence, the classic approach to AI, has proven to be
much better at reasoning and abstraction.
3. The third challenge is to create deep learning systems that can learn and plan complex action
sequences, and decompose tasks into subtasks. Deep learning systems are good at providing
end-to-end solutions to problems but very bad at breaking them down into specific
interpretable and modifiable steps. There have been advances in creating learning-based AI
systems that can decompose images, speech, and text. Capsule networks, invented by Geoffrey
Hinton, address some of these challenges. But learning to reason about complex tasks is
beyond today’s AI. “We have no idea how to do this,” LeCun admits.
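Below is a toy, hypothetical illustration (not from the article) of the "fill in the blanks" self-supervised idea mentioned in item 1: hide part of the input and score a model on reconstructing it from the rest, so the data provides its own supervision. The neighbour_mean "model" is a simple stand-in for a learned predictor.

import numpy as np

def masked_reconstruction_loss(x, predict_fn, mask_idx):
    """Self-supervised 'fill in the blanks': hide one element of the input and
    score the model on reconstructing it from the rest. No labels are needed;
    the data itself provides the target."""
    visible = x.copy()
    visible[mask_idx] = 0.0                      # blank out one position
    prediction = predict_fn(visible, mask_idx)   # model predicts the missing value
    return (prediction - x[mask_idx]) ** 2

# Toy "model": predict the masked value as the mean of its visible neighbours.
def neighbour_mean(visible, idx):
    lo, hi = max(0, idx - 1), min(len(visible), idx + 2)
    neighbours = [visible[k] for k in range(lo, hi) if k != idx]
    return float(np.mean(neighbours))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(masked_reconstruction_loss(x, neighbour_mean, mask_idx=2))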
Foundation Models
From https://research.ibm.com/blog/what-are-foundation-models
In recent years, we’ve managed to build AI systems that can learn from thousands, or millions, of examples to help us better
understand our world, or find new solutions to difficult problems. These large-scale models have led to systems that can
understand when we talk or write, such as the natural-language processing and understanding programs we use every day,
from digital assistants to speech-to-text programs. Other systems, trained on things like the entire work of famous artists, or
every chemistry textbook in existence, have allowed us to build generative models that can create new works of art based on
those styles, or new compound ideas based on the history of chemical research.
While many new AI systems are helping solve all sorts of real-world problems, creating and deploying each new system
often requires a considerable amount of time and resources. For each new application, you need to ensure that there’s a large,
well-labelled dataset for the specific task you want to tackle. If a dataset didn’t exist, you’d have to have people spend
hundreds or thousands of hours finding and labelling appropriate images, text, or graphs for the dataset. Then the AI model
has to learn to recognize everything in the dataset, and then it can be applied to the use case you have, from recognizing
language to generating new molecules for drug discovery. And training one large natural-language processing model, for
example, has roughly the same carbon footprint as running five cars over their lifetime.
The next wave in AI looks to replace the task-specific models that have dominated the AI landscape to date. The future is
models that are trained on a broad set of unlabeled data that can be used for different tasks, with minimal fine-tuning. These
are called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence.
We’ve seen the first glimmers of the potential of foundation models in the worlds of imagery and language. Early examples
of models, like GPT-3, BERT, or DALL-E 2, have shown what’s possible. Input a short prompt, and the system generates an
entire essay, or a complex image, based on your parameters, even if it wasn’t specifically trained on how to execute that
exact argument or generate an image in that way.
What makes these new systems foundation models is that they, as the name suggests, can be the foundation for many
applications of the AI model. Using self-supervised learning and transfer learning, the model can apply information it’s learnt
about one situation to another. While the amount of data is considerably more than the average person needs to transfer
understanding from one task to another, the end result is relatively similar: You learn to drive on one car, for example, and
without too much effort, you can drive most other cars — or even a truck or a bus.
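As a concrete but purely illustrative companion to the paragraph above, the snippet below shows the "one pretrained model, many tasks with minimal adaptation" pattern, assuming the Hugging Face transformers library and its public gpt2 and facebook/bart-large-mnli checkpoints are available (none of these are named in the text above, and running it downloads the models).

# One pretrained model, many tasks, with minimal task-specific work.
from transformers import pipeline

# Task 1: free-form generation from a short prompt with a pretrained language model.
generator = pipeline("text-generation", model="gpt2")
print(generator("Foundation models are", max_length=30)[0]["generated_text"])

# Task 2: zero-shot classification with a different pretrained checkpoint,
# using no task-specific training data at all.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(classifier("The patient reports chest pain.",
                 candidate_labels=["medical", "legal", "finance"]))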
Challenges and Risks of Foundation Models
From https://arxiv.org/pdf/2108.07258.pdf
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad
data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to
underscore their critically central yet incomplete character. This report provides a thorough account of the
opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics,
reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data,
systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact
(e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation
models are based on standard deep learning and transfer learning, their scale results in new emergent
capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides
powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted
models downstream. Despite the impending widespread deployment of foundation models, we currently lack a
clear understanding of how they work, when they fail, and what they are even capable of due to their emergent
properties. To tackle these questions, we believe much of the critical research on foundation models will require
deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
This report investigates an emerging paradigm for building artificial intelligence (AI) systems
based on a general class of models which we term foundation models. A foundation model is any
model that is trained on broad data (generally using self-supervision at scale) that can be adapted
(e.g., fine-tuned) to a wide range of downstream tasks; current examples include BERT [Devlin et al.
2019], GPT-3 [Brown et al. 2020], and CLIP [Radford et al. 2021]. From a technological point of view,
foundation models are not new — they are based on deep neural networks and self-supervised
learning, both of which have existed for decades. However, the sheer scale and scope of foundation
models from the last few years have stretched our imagination of what is possible; for example,
GPT-3 has 175 billion parameters and can be adapted via natural language prompts to do a passable
job on a wide range of tasks despite not being trained explicitly to do many of those tasks [Brown
et al. 2020]. At the same time, existing foundation models have the potential to accentuate harms,
and their characteristics are in general poorly understood. Given their impending widespread
deployment, they have become a topic of intense scrutiny [Bender et al. 2021].
Capsule Neural Nets
From https://en.wikipedia.org/wiki/Capsule_neural_network
A Capsule Neural Network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that
can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural
organization.[1]
The idea is to add structures called “capsules” to a convolutional neural network (CNN), and to reuse output from several
of those capsules to form more stable (with respect to various perturbations) representations for higher capsules.[2] The
output is a vector consisting of the probability of an observation, and a pose for that observation. This vector is similar to
what is done for example when doing classification with localization in CNNs.
Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts
but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched).
For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they
have linear effects at the part/object level.[3] This can be compared to inverting the rendering of an object of multiple parts.
[4]
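For reference, below is the commonly used capsule "squash" non-linearity (from Sabour, Frosst and Hinton's dynamic routing work), which produces exactly the kind of output vector described above: the orientation encodes the pose and the length encodes the probability of the observation.

import numpy as np

def squash(s, eps=1e-8):
    """Capsule 'squash' non-linearity: keeps the vector's orientation (the pose)
    but maps its length into (0, 1) so it can act as the probability that the
    entity the capsule represents is present."""
    norm = np.linalg.norm(s)
    return (norm ** 2 / (1.0 + norm ** 2)) * (s / (norm + eps))

v = squash(np.array([3.0, 4.0]))        # length 5 -> squashed length of about 0.96
print(v, np.linalg.norm(v))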
Capsules
From https://www.youtube.com/watch?v=UX8OubxsY8w
DALL-E 2
From https://www.nytimes.com/2022/08/24/technology/ai-technology-progress.html
For the past few days, I’ve been playing around with DALL-E 2, an app developed by the San
Francisco company OpenAI that turns text descriptions into hyper-realistic images.
What’s impressive about DALL-E 2 isn’t just the art it generates. It’s how it generates art. These
aren’t composites made out of existing internet images — they’re wholly new creations made
through a complex A.I. process known as “diffusion,” which starts with a random series of pixels
and refines it repeatedly until it matches a given text description. And it’s improving quickly —
DALL-E 2’s images are four times as detailed as the images generated by the original DALL-E,
which was introduced only last year.
DALL-E 2 got a lot of attention when it was announced this year, and rightfully so. It’s an
impressive piece of technology with big implications for anyone who makes a living working with
images — illustrators, graphic designers, photographers and so on. It also raises important
questions about what all of this A.I.-generated art will be used for, and whether we need to worry
about a surge in synthetic propaganda, hyper-realistic deepfakes or even nonconsensual
pornography.
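A toy version of the refinement process described above (start from random pixels and refine repeatedly): the denoise_step placeholder below simply pulls the image toward a flat target, whereas a real diffusion model would use a trained, text-conditioned denoising network.

import numpy as np

def diffusion_style_refinement(denoise_step, shape=(8, 8), n_steps=50, seed=0):
    """Toy sketch of the process described above: start from random pixels and
    refine them repeatedly. In a real diffusion model, denoise_step would be a
    trained neural network conditioned on the text prompt."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=shape)           # pure noise to begin with
    for t in range(n_steps, 0, -1):
        x = denoise_step(x, t)           # each step removes a little noise
    return x

# Placeholder "denoiser": pull the image toward a flat grey target.
target = np.full((8, 8), 0.5)
denoise_step = lambda x, t: x + 0.1 * (target - x)
image = diffusion_style_refinement(denoise_step)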
DALL-E 2 Available to All
If you've been itching to try OpenAI's image synthesis tool but have been stymied by the lack of an
invitation, now's your chance. Today, OpenAI announced that it removed the waitlist for its DALL-E AI
image generator service. That means anyone can sign up and use it.
DALL-E is a deep learning image synthesis model that has been trained on hundreds of millions of images
pulled from the Internet. It uses a technique called latent diffusion to learn associations between words and
images. As a result, DALL-E users can type in a text description—called a prompt—and see it rendered
visually as a 1024×1024 pixel image in almost any artistic style.
Make-a-Video
From https://makeavideo.studio/
Make-A-Video research builds on the recent progress made in text-to-image generation technology built
to enable text-to-video generation. The system uses images with descriptions to learn what the world
looks like and how it is often described. It also uses unlabeled videos to learn how the world moves.
With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-
kind videos with just a few words or lines of text.
From Make-a-Video Paper
We propose Make-A-Video – an approach for directly translating the tremendous
recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our
intuition is simple: learn what the world looks like and how it is described from
paired text-image data, and learn how the world moves from unsupervised video
footage. Make-A-Video has three advantages: (1) it accelerates training of the
T2V model (it does not need to learn visual and multimodal representations from
scratch), (2) it does not require paired text-video data, and (3) the generated
videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.)
of today’s image generation models. We design a simple yet effective way to
build on T2I models with novel and effective spatial-temporal modules. First, we
decompose the full temporal U-Net and attention tensors and approximate them
in space and time. Second, we design a spatial temporal pipeline to generate
high resolution and frame rate videos with a video decoder, interpolation model
and two super resolution models that can enable various applications besides
T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and
quality, Make-A-Video sets the new state-of-the-art in text-to-video generation,
as determined by both qualitative and quantitative measures
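A rough sketch of the space/time factorisation idea in the abstract above: rather than one full 3D operation over (time, height, width), apply a cheap per-frame spatial step and then a cheap per-pixel temporal step. Real Make-A-Video modules are learned convolutions and attention; here both steps are simple box filters, used only to show the decomposition.

import numpy as np

def factorised_spatiotemporal_smooth(video):
    """Apply a spatial operation frame by frame, then a temporal operation
    across frames, instead of one joint 3D operation. Both 'operations' here
    are simple averaging filters for illustration."""
    t, h, w = video.shape
    spatial = np.empty_like(video)
    for i in range(t):                                   # spatial pass, frame by frame
        frame = video[i]
        spatial[i] = (frame
                      + np.roll(frame, 1, axis=0) + np.roll(frame, -1, axis=0)
                      + np.roll(frame, 1, axis=1) + np.roll(frame, -1, axis=1)) / 5.0
    # temporal pass across frames, pixel by pixel
    temporal = (spatial + np.roll(spatial, 1, axis=0) + np.roll(spatial, -1, axis=0)) / 3.0
    return temporal

video = np.random.default_rng(0).normal(size=(4, 16, 16))   # (frames, height, width)
out = factorised_spatiotemporal_smooth(video)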
Concerns about Deep Learning from Gary Marcus
From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
Deep Learning thus far:
• Is data hungry
• Is shallow and has limited capacity for transfer
• Has no natural way to deal with hierarchical structure
• Has struggled with open-ended inference
• Is not sufficiently transparent
• Has not been well integrated with prior knowledge
• Cannot inherently distinguish causation from correlation
• Presumes a largely stable world, in ways that may be problematic
• Works well as an approximation, but answers often can’t be fully trusted
• Is difficult to engineer with
Causal Reasoning and
Deep Learning (Advanced)
Causal Reasoning and Transfer Learning
From A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
We propose to meta-learn causal structures based on how fast a learner adapts to
new distributions arising from sparse distributional changes, e.g. due to
interventions, actions of agents and other sources of non-stationarities. We show
that under this assumption, the correct causal structural choices lead to faster
adaptation to modified distributions because the changes are concentrated in one
or just a few mechanisms when the learned knowledge is modularized appropriately.
This leads to sparse expected gradients and a lower effective number of degrees of
freedom needing to be relearned while adapting to the change. It motivates using
the speed of adaptation to a modified distribution as a meta-learning objective. We
demonstrate how this can be used to determine the cause-effect relationship
between two observed variables. The distributional changes do not need to
correspond to standard interventions (clamping a variable), and the learner has no
direct knowledge of these interventions. We show that causal structures can be
parameterized via continuous variables and learned end-to-end. We then explore
how these ideas could be used to also learn an encoder that would map low-level
observed variables to unobserved causal variables leading to faster adaptation out-
of-distribution, learning a representation space where one can satisfy the
assumptions of independent mechanisms and of small and sparse changes in these
mechanisms due to actions and non-stationarities.
Causal Deep Learning from Bengio
From https://www.wired.com/story/ai-pioneer-algorithms-understand-why/
From https://arxiv.org/abs/1901.10912
Causal Reasoning and Transfer Learning
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
From https://arxiv.org/abs/1901.10912
Proposition 1. The expected gradient over the transfer distribution of the regret
(accumulated negative log-likelihood during the adaptation episode) with respect to the
module parameters is zero for the parameters of the modules that (a) were correctly
learned in the training phase, and (b) have the correct set of causal parents, corresponding
to the ground truth causal graph, if (c) the corresponding ground truth conditional
distributions did not change from the training distribution to the transfer distribution.
(Figure caption) Adaptation to the transfer distribution, as more transfer distribution examples are seen by
the learner (horizontal axis), in terms of the log-likelihood on the transfer distribution (on a
large test set from the transfer distribution, tested after each update of the parameters).
Here the model is discrete, with N = 10. Curves are the median over 10,000 runs, with
25-75% quantile intervals, for both the correct causal model (blue, top) and the incorrect
one (red, bottom). We see that the correct causal model adapts faster (smaller regret), and
that the most informative part of the trajectory (where the two models generalize the most
differently) is in the first 10-20 examples.
Causal Reasoning and Transfer Learning
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
From https://arxiv.org/abs/1901.10912
Equation (2): R = −log[ sigmoid(γ) · L_A→B + (1 − sigmoid(γ)) · L_B→A ]
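A small numerical sketch of how Equation (2) can be used as a meta-objective (an illustration, not the authors' code): L_A→B and L_B→A are taken to be the likelihoods each candidate causal model accumulates while adapting to the transfer distribution, and the structural parameter γ is nudged toward the model that adapts faster.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_regret(gamma, lik_A_to_B, lik_B_to_A):
    """Equation (2): R = -log[ sigmoid(gamma)*L_A->B + (1 - sigmoid(gamma))*L_B->A ].
    Here the L terms are interpreted as the likelihoods each candidate causal
    model accumulates while adapting to the transfer distribution, so the model
    that adapts faster contributes the larger likelihood."""
    p = sigmoid(gamma)
    return -np.log(p * lik_A_to_B + (1.0 - p) * lik_B_to_A)

# Toy meta-update on gamma via a finite-difference gradient: if the A->B model
# adapts faster (higher likelihood), gamma is pushed up, i.e. toward "A causes B".
gamma, lr, eps = 0.0, 1.0, 1e-4
lik_ab, lik_ba = 0.8, 0.3                       # made-up adaptation likelihoods
grad = (meta_regret(gamma + eps, lik_ab, lik_ba) - meta_regret(gamma - eps, lik_ab, lik_ba)) / (2 * eps)
gamma -= lr * grad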
References
Causal Reasoning and Deep Learning References
http://causality.cs.ucla.edu/blog/
http://causality.cs.ucla.edu/
https://www.google.com/search?client=firefox-b-1-d&q=deep+learning+causal+analysis
https://arxiv.org/search/?query=causal&searchtype=title&source=header
https://arxiv.org/abs/1901.10912
https://www.ericsson.com/en/blog/2020/2/causal-inference-machine-learning
https://towardsdatascience.com/introduction-to-causality-in-machine-learning-4cee9467f06f
References
• Neural Networks and Deep Learning: A Textbook
• Deep Learning (Adaptive Computation and Machine Learning series)
• The Deep Learning Revolution (The MIT Press)
• Introduction to Deep Learning (The MIT Press)
• Deep Learning with Python
• An Introduction to Deep Reinforcement Learning
• World Models
• Learning and Querying Fast Generative Models for Reinforcement Learning
• Imagination-Augmented Agents for Deep Reinforcement Learning
• Neural Networks and Deep Learning: A Textbook
• Google Brain
• Convolutional Neural Nets (Detailed introduction)
• Future of Deep Learning
References (cont)
• Recurrent Neural Networks
• Guide to LSTM and Recurrent Neural Networks
• Enterprise Deep Learning
• 6 AI Trends for 2019
• Designing Neural Nets through Neural Evolution
• Compositional Pattern Producing Networks
• Deep Generator Networks
• Deep Reinforcement Learning Course
• N-Grams
• A Beginners Guide to Deep Reinforcement Learning with many links
• Verifiable AI from Specifications
• Amazon Deep Learning Containers
• A Deep Dive in to Deep Learning
Google AI References
• https://ai.google/research/pubs/?area=AlgorithmsandTheory
• https://ai.google/research/pubs/?area=DistributedSystemsandParallelComputing
• https://ai.google/research/pubs/?area=MachineTranslation
• https://ai.google/research/pubs/?area=MachineIntelligence
• https://ai.google/research/pubs/?area=MachinePerception
• https://ai.google/research/pubs/?area=DataManagement
• https://ai.google/research/pubs/?area=InformationRetrievalandtheWeb
• https://ai.google/research/pubs/?area=NaturalLanguageProcessing
• https://ai.google/research/pubs/?area=SpeechProcessing
• Deep Mind Publications
Deep Mind References
DeepMind Home page
https://deepmind.com/
DeepMind Research
https://deepmind.com/research/
https://deepmind.com/research/publications/
DeepMind Blog
https://deepmind.com/blog
DeepMind Applied
https://deepmind.com/applied
Deep Compressed Sensing
https://arxiv.org/pdf/1905.06723.pdf
Deep Mind NIPS Papers
https://deepmind.com/blog/deepmind-papers-nips-2017/
DeepMind Papers at ICML 2018
https://deepmind.com/blog/deepmind-papers-icml-2018/
DeepMind Papers at ICLR 2018
https://deepmind.com/blog/deepmind-papers-iclr-2018/
Proceedings of ICML Program 2018
http://proceedings.mlr.press/v97/
References (cont)
• OpenAI
• OpenAI Blog
• OpenAI Research
• Deep Learning Book Lecture Notes
• Deep Learning Course Lecture Notes
• Bayesian Deep Learning Resources
• Gradient Boosting Algorithms
• Deep Mind Research
• David Inouye Papers
• Jeff Clune’s Research
• Jeff Hawkins Books
• Numenta
• Reinforcement Learning Book

  • 1. Artificial General Intelligence 1 Bob Marcus robert.marcus@et-strategies.com Part 1 of 4 parts: Artificial Intelligence and Machine Learning
  • 2. This is a first cut. More details will be added later.
  • 3. Part 1: Artificial Intelligence (AI) Part 2: Natural Intelligence(NI) Part 3: Artificial General Intelligence (AI + NI) Part 4: Networked AGI Layer on top or Gaia and Human Society Four Slide Sets on Artificial General Intelligence AI = Artificial Intelligence (Task) AGI = Artificial Mind (Simulation) AB = Artificial Brain (Emulation) AC = Artificial Consciousness (Synthetic) AI < AGI < ? AB <AC (Is a partial brain emulation needed to create a mind?) Mind is not required for task proficiency Full Natural Brain architecture is not required for a mind Consciousness is not required for a natural brain architecture
  • 4. Philosophical Musings 10/2022 Focused Artifical Intelligence (AI) will get better at specific tasks Specific AI implementations will probably exceed human performance in most tasks Some will attain superhuman abilities is a wide range of tasks “Common Sense” = low-level experiential broad knowledge could be an exception Some AIs could use brain inspired architectures to improve complex ask performance This is not equivalent to human or artificial general intelligence (AGI) However networking task-centric AIs could provide a first step towards AGI This is similar to the way human society achieves power from communication The combination of the networked AIs could be the foundation of an artificial mind In a similar fashion, human society can accomplish complex tasks without being conscious Distributed division of labor enable tasks to be assigned to the most competent element Networked humans and AIs could cooperate through brain-machine interfaces In the brain, consciousness provides direction to the mind In large societies, governments perform the role of conscious direction With networked AIs, a “conscious operating system”could play a similar role. This would probably have to be initially programmed by humans. If the AI network included sensors, actuators, and robots it could be aware of the world The AI network could form a grid managing society, biology, and geology layers A conscious AI network could develop its own goals beyond efficient management Humans in the loop could be valuable in providing common sense and protective oversight
  • 5. Outline Classical AI Knowledge Representation Agents Classical Machine Learning Deep Learning Deep Learning Models Deep Learning Hardware Reinforcement Learning Google Research Computing and Sensing Architecture IoT and Deep Learning DeepMind Deep Learning 2020 Causal Reasoning and Deep Learning References
  • 6. Classical AI Classical Paper Awards 1999-2022
  • 7. Top 100 AI Start-ups From https://singularityhub.com/2020/03/30/the-top-100-ai-startups-out-there-now-and-what-theyre-working-on/
  • 8. Classical AI Tools Lisp https://en.wikipedia.org/wiki/Lisp_(programming_language) Prolog https://www.geeksforgeeks.org/prolog-an-introduction/ Knowledge Representation https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning Decision Trees https://en.wikipedia.org/wiki/Decision_tree Forward and Backward Chaining https://www.section.io/engineering-education/forward-and-backward-chaining-in-ai/ Constraint Satisfaction https://en.wikipedia.org/wiki/Constraint_satisfaction OPS5 https://en.wikipedia.org/wiki/OPS5
  • 9. Classical AI Systems CYC https://en.wikipedia.org/wiki/Cyc Expert Systems https://en.wikipedia.org/wiki/Expert_system XCON https://en.wikipedia.org/wiki/Xcon MYCIN https://en.wikipedia.org/wiki/Mycin MYCON https://www.slideshare.net/bobmarcus/1986-multilevel-constraintbased-configuration-article https://www.slideshare.net/bobmarcus/1986-mycon-multilevel-constraint-based-configuration
  • 11. Stored Knowledge Base From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
  • 15. Intelligent Agents From https://en.wikipedia.org/wiki/Intelligent_agent In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.[1] Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1] An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4] Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science, ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations. Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
  • 16. Node in Real-Time Control System (RCS) by Albus From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
  • 17. Intelligent Agents for Network Management From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
  • 18. Intelligent Agents on the Web From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover, they can evaluate information obtained online from heterogeneoussources and present information tailored to an individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also exploring new opportunities and trends for intelligent agents. IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) tointelligent mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The requirements for IAs remain open for discussion. An agent should be able to: • interact with humans and other agents • anticipate user needs for information • adapt to changes in user needs and the environment • cope with heterogeneity of information and other agents. The following attributes characterize an IA-based systems’ main capabilities: • Intelligence. The method an agent uses to de-velop its intelligence includes using the agent’sown software content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks. • Continuity. An agent is a continuously running process that can detect changes in its environment, modify its behavior, and update its knowledge base (which describes the environment). • Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users directly by using appropriate interfaces. • Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and monitored profiles. • Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the agent is transferred from a distant system, to a situation in which the agent creates new agents, dies, or executes partially during migratiion
  • 19. Smart Agents 2022 Comparison From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon, there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves. There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition. Accessibility and trends Ease of setup Voice recognition Success of queries and ability to understand context Bottom line None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so close!). Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience and what hardware you already have IFTTT = "if this, then that," is a service that lets you connect apps, services, and smart home devices.
  • 20. Amazon Alexa From https://en.wikipedia.org/wiki/Amazon_Alexa Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition, natural language processing, and other forms of weak AI to perform these tasks.[6] Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices (such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate Alexa's listening mode, although, some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
  • 21. Google Assistant From https://en.wikipedia.org/wiki/Google_Assistant Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the company's previous virtual assistant, Google Now. Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home. After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February 2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has been further extended to support a large variety of devices, including cars and third-party smart home appliances. The functionality of the Assistant can also be enhanced by third-party developers. Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and gather visual information through the device's camera, and support purchasing products and sending money.
  • 22. Apple Siri https://en.wikipedia.org/wiki/Siri Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1] [2] It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual language usages, searches and preferences, returning individualized results. Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app for iOS in February 2010. Two months later, Apple acquired it and integrated into iPhone 4S at its release on 4 October, 2011, removing the separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod. Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support for follow-up questions, language translation, and additional third-party actions.
  • 23. Microsoft Cortana From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant) Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders and answering questions for the user. Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending on the software platform and region in which it is used.[8] Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on March 31, 2021, the Cortana mobile app was shut down globally.[13] Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30] In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features currently available with Cortana.[42]
  • 25. Machine Learning Types From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
  • 26. Perceptron From https://deepai.org/machine-learning-glossary-and-terms/perceptron How does a Perceptron work? The process begins by taking all the input values and multiplying them by their weights. Then, all of these multiplied values are added together to create the weighted sum. The weighted sum is then applied to the activation function, producing the perceptron's output. The activation function plays the integral role of ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.
  • 27. Ensemble Machine Learning From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/ Ensemble learning is a general meta approach to machine learning that seeks better predictive performance by combining the predictions from multiple models. Although there are a seemingly unlimited number of ensembles that you can develop for your predictive modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that rather than algorithms per se, each is a field of study that has spawned many more specialized methods. The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is important to both have a detailed understanding of each method and to consider them on your predictive modeling project. But, before that, you need a gentle introduction to these approaches and the key ideas behind each method prior to layering on math and code. In this tutorial, you will discover the three standard ensemble learning techniques for machine learning. After completing this tutorial, you will know: • Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions. • Stacking involves fitting many different models types on the same data and using another model to learn how to best combine the predictions. • Boosting involves adding ensemble members sequentially that correct the predictions made by prior models and outputs a weighted average of the predictions.
  • 28. Bagging From https://en.wikipedia.org/wiki/Bootstrap_aggregating Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach. Given a standard training set of size n, bagging generates m new training sets , each of size nʹ, by sampling from D uniformly and with replacement. By sampling with replacement, some observations may be repeated in each . If nʹ=n, then for large n the set is expected to have the fraction (1 - 1/e) (≈63.2%) of the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample. Sampling with replacement ensures each bootstrap is independent from its peers, as it does not depend on previous chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
  • 29. Boosting From https://www.ibm.com/cloud/learn/boosting and https://en.wikipedia.org/wiki/Boosting_(machine_learning) In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance[1] in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones Bagging vs Boosting Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study (PDF, 242 KB) (link resides outside IBM), the main difference between these learning methods is the way in which they are trained. In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of models are constructed and with each new model iteration, the weights of the misclassified data in the previous model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus on to improve its performance. AdaBoost, which stands for “adaptative boosting algorithm,” is one of the most popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include XGBoost, GradientBoost, and BrownBoost. Another difference between bagging and boosting is in how they are used. For example, bagging methods are typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged when low variance and high bias is observed. While bagging can be used to avoid overfitting, boosting methods can be more prone to this (link resides outside IBM) although it really depends on the dataset. However, parameter tuning can help avoid the issue. As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for loan approval processes and statistical genomics while boosting has been used more within image recognition apps and search engines. Boosting is an ensemble learning method that combines a set of weak learners into a strong learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and then trained sequentially—that is, each model tries to compensate for the weaknesses of its predecessor. With each iteration, the weak rules from each individual classifier are combined to form one, strong prediction rule.
  • 30. Stacking From https://www.geeksforgeeks.org/stacking-in-machine-learning/ Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble models; the widely known ones are bagging and boosting. Bagging allows multiple similar models with high variance to be averaged to decrease variance. Boosting builds multiple incremental models to decrease the bias, while keeping variance small. Stacking (sometimes called stacked generalization) is a different paradigm. The point of stacking is to explore a space of different models for the same problem. The idea is that you can attack a learning problem with different types of models which are capable of learning some part of the problem, but not the whole space of the problem. So, you build multiple different learners and use them to build an intermediate prediction, one prediction for each learned model. Then you add a new model which learns the same target from the intermediate predictions. This final model is said to be stacked on top of the others, hence the name. Thus, you might improve your overall performance, and often you end up with a model which is better than any individual intermediate model. Notice, however, that it does not give you any guarantee, as is often the case with any machine learning technique.
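A minimal stacking sketch using scikit-learn's StackingClassifier: two heterogeneous base learners produce intermediate predictions, and a logistic regression is trained on top of them (all choices illustrative):

```python
# Stacking sketch: heterogeneous base learners produce intermediate predictions,
# and a final (meta) model is trained on those predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=2)),
                ("svm", SVC(probability=True, random_state=2))],
    final_estimator=LogisticRegression(),  # learns from the intermediate predictions
    cv=5,
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```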
  • 31. Gradient Boosting From https://en.wikipedia.org/wiki/Gradient_boosting Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.
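A minimal gradient-boosted trees sketch using scikit-learn's GradientBoostingRegressor, where shallow trees are added stage-wise to reduce a differentiable loss (synthetic data and hyperparameters are illustrative):

```python
# Gradient boosting sketch: shallow regression trees added stage-wise,
# each fit to the gradient of the loss w.r.t. the current ensemble prediction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(3)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(500)          # noisy sine to regress
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
gbr.fit(X_train, y_train)
print("R^2 on held-out data:", gbr.score(X_test, y_test))
```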
  • 32. Introduction to XGBoost From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
  • 33. Terminology ・SoftMax https://en.wikipedia.org/wiki/Softmax_function ・SoftPlus https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Softplus ・Logit https://en.wikipedia.org/wiki/Logit ・Sigmoid https://en.wikipedia.org/wiki/Sigmoid_function ・Logistic Function https://en.wikipedia.org/wiki/Logistic_function ・Tanh https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/ ・ReLu https://en.wikipedia.org/wiki/Rectifier_(neural_networks) ・Maxpool Selects the maximum value within each subregion of a convolutional neural network layer
  • 34. Relationships The sigmoid is the logistic function, with range (0, 1); its inverse is the logit, with range (-∞, +∞). Tanh, with range (-1, 1), is a rescaled sigmoid. SoftPlus has the sigmoid as its derivative, and ReLu (output (0, x), i.e. max(0, x)) is its piecewise-linear counterpart. The sigmoid equals the first component of SoftMax(z, 0); tanh equals the first minus the second component of SoftMax(z, -z); and the log-ratio of the first to the second component of SoftMax(z1, z2) equals z1 - z2, i.e. the logit of the first component's probability.
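These definitions and relationships can be checked numerically; a short NumPy sketch (input values chosen arbitrarily):

```python
# NumPy definitions of the activations above, plus numeric checks of the
# relationships (sigmoid as 2-class softmax, tanh as rescaled sigmoid,
# logit as the inverse of the sigmoid, softplus' = sigmoid).
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract max for numerical stability
    return e / e.sum()

def softplus(x):  return np.log1p(np.exp(x))
def sigmoid(x):   return 1.0 / (1.0 + np.exp(-x))
def logit(p):     return np.log(p / (1.0 - p))
def relu(x):      return np.maximum(0.0, x)

x = 0.7
assert np.isclose(sigmoid(x), softmax(np.array([x, 0.0]))[0])   # sigmoid from softmax
assert np.isclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)     # tanh is a rescaled sigmoid
assert np.isclose(logit(sigmoid(x)), x)                         # logit inverts the sigmoid
eps = 1e-6
assert np.isclose((softplus(x + eps) - softplus(x)) / eps,      # softplus' = sigmoid
                  sigmoid(x), atol=1e-4)
print("all relationships hold numerically")
```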
  • 35. Terminology (continued) ・Heteroscedastic https://en.wiktionary.org/wiki/scedasticity ・Maxout https://stats.stackexchange.com/questions/129698/what-is-maxout-in-neural-network/298705 ・Cross-Entropy https://en.wikipedia.org/wiki/Cross_entropy H(P,Q) = -Ep(log q) ・Joint Entropy https://en.wikipedia.org/wiki/Joint_entropy -Ep(X,Y)(log p(X,Y)) ・KL Divergence https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence ・H(P,Q) = H(P) + KL(P,Q), i.e. -Ep(log q) = -Ep(log p) + {Ep(log p) - Ep(log q)} ・Mutual Information https://en.wikipedia.org/wiki/Mutual_information KL(p(x,y), p(x)p(y)) ・Ridge Regression and Lasso Regression https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece ・Logistic Regression https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf ・Dropout https://en.wikipedia.org/wiki/Dropout_(neural_networks) ・RMSProp and AdaGrad and AdaDelta and Adam https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM ・Pooling https://www.quora.com/Is-pooling-indispensable-in-deep-learning ・Boltzmann Machine https://en.wikipedia.org/wiki/Boltzmann_machine ・Hyperparameters
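A short NumPy sketch of the entropy identities above, checking H(P,Q) = H(P) + KL(P||Q) and computing mutual information as a KL divergence (the example distributions are arbitrary):

```python
# Cross-entropy, entropy, KL divergence, and mutual information for discrete
# distributions, verifying the identity H(P, Q) = H(P) + KL(P || Q).
import numpy as np

def entropy(p):            return -np.sum(p * np.log(p))
def cross_entropy(p, q):   return -np.sum(p * np.log(q))
def kl_divergence(p, q):   return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
assert np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))

# Mutual information as KL(p(x, y) || p(x) p(y)) for a small joint table.
pxy = np.array([[0.3, 0.1],
                [0.2, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
mi = kl_divergence(pxy.ravel(), np.outer(px, py).ravel())
print("H(P,Q) =", cross_entropy(p, q), " MI =", mi)
```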
  • 36. Reinforcement Learning Book From https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
  • 37. Acumos Shared Model Process Flow From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
  • 38. Distributed AI From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence[1] is a subfield of artificial intelligence research dedicated to the development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems. The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: • A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled • Coordination of the actions and communication of the nodes • Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: • Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. • Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. • Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios.
  • 39. Swarm Intelligence From https://en.wikipedia.org/wiki/Swarm_intelligence Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1] SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence. The application of swarm principles to robots is called swarm robotics, while swarm intelligence refers to the more general set of algorithms. Swarm prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered for genetically modified organisms in synthetic collective intelligence.[4] Models of swarm behavior include Boids (Reynolds, 1987) and self-propelled particles (Vicsek et al., 1995). Swarm metaheuristics include stochastic diffusion search (Bishop, 1989), ant colony optimization (Dorigo, 1992), particle swarm optimization (Kennedy, Eberhart & Shi, 1995), and Artificial Swarm Intelligence (2015). Applications include ant-based routing, crowd simulation, human swarming, swarm grammars, and swarmic art.
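A minimal particle swarm optimization sketch illustrating the swarm principle: simple local rules (each particle follows its own best and the swarm's best position) produce useful global behavior. The objective function and coefficients are illustrative:

```python
# Minimal particle swarm optimization sketch: particles follow their own best
# and the swarm's best positions, collectively minimizing a function.
import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))    # positions
    v = np.zeros_like(x)                               # velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_val)]                    # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)]
    return g, f(g)

best, val = pso(lambda p: np.sum(p ** 2))              # minimize a simple bowl function
print("best point:", best, "value:", val)
```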
  • 40. IBM Watson From https://en.wikipedia.org/wiki/IBM_Watson IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO, industrialist Thomas J. Watson.[4][5] Software -Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop framework to provide distributed computing.[12][13][14] Hardware -The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight- core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18] According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human Jeopardy champions.[16] Data -The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases, taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias and other reference material, that it could use to build its knowledge.[20] From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
  • 42. Three Types of Deep Learning From https://www.slideshare.net/TerryTaewoongUm/introduction-to-deep-learning-with-tensorflow
  • 44. Convolutional Neural Nets Comparison (2016) From https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae Reference: https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
  • 45. Recurrent Neural Networks From https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
  • 47. Dynamical System View on Recurrent Neural Networks From https://openreview.net/pdf?id=ryxepo0cFX
  • 49. Deep Learning Models From https://arxiv.org/pdf/1712.04301.pdf
  • 50. Neural Net Models From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 51. Neural Net Models (cont) From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 52. TensorFlow From https://en.wikipedia.org/wiki/TensorFlow TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.[4][5] TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0, in September 2019.[10] TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as JavaScript, C++, and Java.[11] This flexibility lends itself to a range of applications in many different sectors.
  • 53. Keras From https://en.wikipedia.org/wiki/Keras Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also the author of the Xception deep neural network model.[5]
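A minimal tf.keras sketch in the spirit of the standard tutorials: a small fully connected classifier on MNIST, showing Keras as the high-level interface to TensorFlow (architecture and training settings are illustrative):

```python
# Minimal Keras (tf.keras) sketch: define, compile, train, and evaluate a small
# fully connected classifier on MNIST.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0     # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```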
  • 54. Comparison of Deep Learning Frameworks From https://arxiv.org/pdf/1903.00102.pdf
  • 55. Popularity of Deep Learning Frameworks From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
  • 56. Acronyms in Deep Learning • RBM - Restricted Boltzmann Machine • MLP - Multi-layer Perceptron • DBN - Deep Belief Network • CNN - Convolutional Neural Network • RNN - Recurrent Neural Network • SGD - Stochastic Gradient Descent • XOR - Exclusive Or • SVM - Support Vector Machine • ReLu - Rectified Linear Unit • MNIST - Modified National Institute of Standards and Technology • RBF - Radial Basis Function • HMM - Hidden Markov Model • MAP - Maximum A Posteriori • MLE - Maximum Likelihood Estimate • Adam - Adaptive Moment Estimation • LSTM - Long Short-Term Memory • GRU - Gated Recurrent Unit
  • 57. Concerns for Deep Learning by Gary Marcus From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf Deep Learning thus far: • Is data hungry • Is shallow and has limited capacity for transfer • Has no natural way to deal with hierarchical structure • Has struggled with open-ended inference • Is not sufficiently transparent • Has not been well integrated with prior knowledge • Cannot inherently distinguish causation from correlation • Presumes a largely stable world, in ways that may be problematic • Works well as an approximation, but answers often can’t be fully trusted • Is difficult to engineer with
  • 59. How transferable are features in deep neural networks? From http://cs231n.github.io/transfer-learning/
  • 61. More Transfer Learning From https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
  • 62. More Transfer Learning From http://ruder.io/transfer-learning/
  • 63. Bayesian Deep Learning From https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
  • 64. Bayesian Learning via Stochastic Gradient Langevin Dynamics From https://tinyurl.com/22xayz76 In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small minibatches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a “sampling threshold” and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression, and ICA with natural gradients. Our method combines Robbins-Monro type algorithms, which stochastically optimize a likelihood, with Langevin dynamics, which injects noise into the parameter updates in such a way that the trajectory of the parameters will converge to the full posterior distribution rather than just the maximum a posteriori mode. The resulting algorithm starts off being similar to stochastic optimization, then automatically transitions to one that simulates samples from the posterior using Langevin dynamics.
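A toy sketch of the SGLD update described above, applied to inferring the mean of a Gaussian from minibatches; the model, prior, and step-size schedule are illustrative choices, not from the paper:

```python
# Stochastic Gradient Langevin Dynamics sketch (toy example): sample the
# posterior over a Gaussian mean using minibatch gradients plus Gaussian noise
# whose variance matches the (annealed) step size.
import numpy as np

rng = np.random.default_rng(0)
N, true_mean = 10_000, 2.5
data = rng.normal(true_mean, 1.0, size=N)          # x_i ~ N(theta, 1)

theta, batch_size, samples = 0.0, 100, []
for t in range(1, 5001):
    eps = 1e-4 * t ** -0.55                        # annealed step size (illustrative)
    batch = rng.choice(data, size=batch_size, replace=False)
    grad_log_prior = -theta / 10.0                 # prior theta ~ N(0, 10)
    grad_log_lik = (N / batch_size) * np.sum(batch - theta)   # rescaled minibatch gradient
    noise = rng.normal(0.0, np.sqrt(eps))          # injected Langevin noise
    theta += 0.5 * eps * (grad_log_prior + grad_log_lik) + noise
    if t > 1000:                                   # discard burn-in, then collect samples
        samples.append(theta)

print("sample mean:", np.mean(samples), " sample std:", np.std(samples))
```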
  • 65. Deterministic Variational Inference for Robust Bayesian NNs From https://openreview.net/pdf?id=B1l08oAct7
  • 66. Bayesian Deep Learning Survey From https://arxiv.org/pdf/1604.01662.pdf Conclusion and Future Research In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural networks (BNN), BDL with BNN as an important component is expected to be more and more scalable
  • 67. Ensemble Methods for Deep Learning From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
  • 68. Comparing Loss Functions From Neural Networks and Deep Learning Book
  • 69. SEED Reinforcement Learning from Google From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In very recent history, a focus on massive scale has been key to solving a number of complicated games such as AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017). The sheer amount of environment data needed to solve tasks trivial to humans makes distributed machine learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks: running environments, model inference, model training, replay buffer, etc., and current state-of-the-art distributed algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources. We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. The cost of running SEED is analyzed against IMPALA (Espeholt et al., 2018), which is a commonly used state-of-the-art distributed RL algorithm (Veeriah et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally, the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix A.4 for details), making it easy to reproduce results and try novel ideas.
  • 70. Designing Neural Nets through Neuroevolution From tinyurl.com/mykhb52y Much of recent machine learning has focused on deep learning, in which neural network weights are trained through variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based approaches, including learning neural network building blocks (for example activation functions), hyperparameters, architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-based neural network research, it has developed many unique and effective techniques that should be effective in other machine learning areas too. This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical tool in the long-term pursuit of artificial general intelligence.
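A toy neuroevolution sketch in this spirit: the weights of a tiny fixed-topology network are evolved with mutation and truncation selection (no gradients) to solve XOR. The task, topology, and evolutionary parameters are illustrative:

```python
# Minimal neuroevolution sketch: evolve the weights of a tiny fixed-topology
# network (2-4-1, tanh hidden layer) to solve XOR, using only mutation and
# truncation selection -- no gradient descent anywhere.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def unpack(genome):                        # flat genome -> (W1, b1, W2, b2)
    W1 = genome[:8].reshape(2, 4); b1 = genome[8:12]
    W2 = genome[12:16].reshape(4, 1); b2 = genome[16]
    return W1, b1, W2, b2

def forward(genome, X):
    W1, b1, W2, b2 = unpack(genome)
    h = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

def fitness(genome):                       # negative squared error on XOR
    return -np.sum((forward(genome, X) - y) ** 2)

pop = rng.normal(0, 1, size=(100, 17))     # population of weight vectors
for gen in range(500):
    scores = np.array([fitness(g) for g in pop])
    elite = pop[np.argsort(scores)[-20:]]                          # keep the best 20
    children = elite[rng.integers(0, 20, size=80)] + rng.normal(0, 0.2, size=(80, 17))
    pop = np.vstack([elite, children])                             # elitism + mutated offspring

best = pop[np.argmax([fitness(g) for g in pop])]
print("XOR predictions:", np.round(forward(best, X), 2))
```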
  • 71. Illuminating Search Spaces by Mapping Elites From https://arxiv.org/pdf/1504.04909.pdf
  • 73. From https://arxiv.org/pdf/1412.3555v1.pdf Transformers A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily in the fields of natural language processing (NLP)[1] and computer vision (CV).[2] Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces training times.[1] Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short-term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl, and can be fine-tuned for specific tasks.[4][5] Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. Both the encoder and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.
  • 74. From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model) Transformers Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1] Sequential processing Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To process the n-th token, the model combines the state representing the sentence up to token n - 1 with the information of the new token to create a new state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of token computations on results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware. This can make the training of RNNs inefficient. Self-Attention These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant information about far-away tokens. A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input. However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to attend to each English input state vector. When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel, which leads to improved training speed.
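A minimal NumPy sketch of scaled dot-product self-attention, the core operation described above; dimensions and projection matrices are illustrative, and real transformers add multiple heads, residual connections, layer normalization, and feed-forward sublayers:

```python
# Scaled dot-product self-attention sketch (single head): every token attends
# to every other token in parallel via learned query/key/value projections.
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # relevance of every token to every other
    weights = softmax(scores, axis=-1)          # attention weights; each row sums to 1
    return weights @ V, weights                 # contextualized token representations

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))         # 5 token embeddings (random stand-ins)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                    # (5, 8) (5, 5)
```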
  • 75. From https://en.wikipedia.org/wiki/GPT-3 GPT-3 Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to produce human-like text. The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning parameters. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020,[3] is part of a trend in natural language processing (NLP) systems of pre-trained language representations.[1] The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5] Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.[6] An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency equivalent to that of a human.[7]
  • 76. OpenAI From https://openai.com/ Recent Research Efficient Training of Language Models to Fill in the Middle Hierarchical Text-Conditional Image Generation with CLIP Latents Formal Mathematics Statement Curriculum Learning Training language models to follow instructions with human feedback Text and Code Embeddings by Contrastive Pre-Training WebGPT: Browser-assisted question-answering with human feedback Training Verifiers to Solve Math Word Problems Recursively Summarizing Books with Human Feedback Evaluating Large Language Models Trained on Code Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets Multimodal Neurons in Artificial Neural Networks Learning Transferable Visual Models From Natural Language Supervision Zero-Shot Text-to-Image Generation Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
  • 78. Reservoir Computing From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/ Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN), independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines (LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping it is only necessary to use a simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture should be capable of any computation [3]. The intuition was born from the fact that in training RNNs, most of the time the weights showing the most change were the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static nature of this implementation, ESNs can usually yield faster results, and in some cases even better ones, in particular when dealing with chaotic time series predictions [5]. But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the system to different regions of the configuration space [3]. This is called the separability condition. Furthermore, an important property for the reservoirs of ESNs is the Echo State property, which states that inputs to the reservoir echo in the system forever, or until they dissipate. A more formal definition of this property can be found in [6]. Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices, fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing. A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these algorithms tend to be data hungry, requiring long observation times and substantial computational resources. Reservoir computing [1,2] is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic [3] or complex spatiotemporal behaviors [4], which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease. From https://www.nature.com/articles/s41467-021-25801-2
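A minimal Echo State Network sketch along these lines, assuming a simple next-step prediction task on a sine wave; the reservoir size, spectral radius, and ridge penalty are illustrative hyperparameters:

```python
# Echo State Network sketch: a fixed random recurrent reservoir, with only the
# linear readout trained (by ridge regression) to predict the next value of a
# time series.
import numpy as np

rng = np.random.default_rng(0)
T, train_T = 2000, 1500
u = np.sin(np.arange(T) * 0.1)                               # simple periodic input signal

n_res, spectral_radius, washout, ridge = 300, 0.9, 100, 1e-6
W_in = rng.uniform(-0.5, 0.5, size=n_res)                    # fixed input weights
W = rng.normal(size=(n_res, n_res))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # scale toward the echo state property

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in * u[t] + W @ x)                         # fixed, untrained reservoir dynamics
    states[t] = x

# Train only the linear readout to map the state at time t to u[t+1].
S, Y = states[washout:train_T], u[washout + 1:train_T + 1]
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)

pred = states[train_T:-1] @ W_out                            # held-out next-step predictions
print("held-out next-step MSE:", np.mean((pred - u[train_T + 1:]) ** 2))
```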
  • 79. Reservoir Computing Trends From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.514&rep=rep1&type=pdf
  • 80. Brain Connectivity meets Reservoir Computing From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1 The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs). Can the wiring of actual brains help improve ANN architectures? Can we learn from ANNs about what network features support computation in the brain when solving a task? ANN architectures are carefully engineered and have crucial importance in many recent performance improvements. On the other hand, BNNs exhibit complex emergent connectivity patterns. At the individual level, BNNs’ connectivity results from brain development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role in shaping connectivity. Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain connectivity features only on abstract dynamical properties, although the implications of real brain network topologies on machine learning or cognitive tasks have been barely explored. Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks, which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain connectivity patterns on task solving. We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework, bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show the crucial importance of the diversity of interareal connectivity patterns, stressing the importance of stochastic processes determining neural network connectivity in general.
  • 83. Summary of Deep Learning Models: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 84. Deep Learning Acronyms From https://arxiv.org/pdf/1712.04301.pdf
  • 85. Deep Learning Hardware From https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d
  • 86. Deep Learning MIT From https://deeplearning.mit.edu/
  • 88. GitHub ONNX Models From https://github.com/onnx/models
  • 89. HPC vs Big Data Ecosystems From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
  • 90. HPC and ML From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf • HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided machine learning), which are then used to understand experimental data or simulations. • MLforHPC: Using ML to enhance HPC applications and systems. This categorization is related to Jeff Dean’s “Machine Learning for Systems and Systems for Machine Learning” [6] and Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as • HPCrunsML: Using HPC to execute ML with high performance. • SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations. We also subdivide MLforHPC as • MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Already, autotuning with systems like ATLAS is hugely successful and gives an initial view of MLautotuning. As well as choosing block sizes to improve cache use and vectorization, MLautotuning can also be used for simulation mesh sizes [8] and in big data problems for configuring databases and complex systems like Hadoop and Spark [9], [10]. • MLafterHPC: ML analyzing results of HPC, as in trajectory analysis and structure identification in biomolecular simulations. • MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations. The same ML wrapper can also learn configurations as well as results. This differs from SimulationTrainedML: there, typically a learnt network is used to redirect observation, whereas in MLaroundHPC we are using the ML to improve the HPC performance. • MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11]. Here the simulation surrogates are very valuable to allow real-time predictions.
  • 91. Designing Neural Nets through Neuroevolution From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
  • 92. Go Explore Algorithm From http://www.evolvingai.org/files/1901.10995.pdf
  • 93. Deep Density Destructors From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf We propose a unified framework for deep density models by formally defining density destructors. A density destructor is an invertible function that transforms a given density to the uniform density—essentially destroying any structure in the original density. This destructive transformation generalizes Gaussianization via ICA and more recent autoregressive models such as MAF and Real NVP. Informally, this transformation can be seen as a generalized whitening procedure or a multivariate generalization of the univariate CDF function. Unlike Gaussianization, our destructive transformation has the elegant property that the density function is equal to the absolute value of the Jacobian determinant. Thus, each layer of a deep density can be seen as a shallow density—uncovering a fundamental connection between shallow and deep densities. In addition, our framework provides a common interface for all previous methods enabling them to be systematically combined, evaluated and improved. Leveraging the connection to shallow densities, we also propose a novel tree destructor based on tree densities and an image-specific destructor based on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
  • 95. Scikit-Learn Decision Tree From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 96. Imitation Learning From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 97. Imitation Learning From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 98. Generative Adversarial Networks (GANs) From https://skymind.ai/wiki/generative-adversarial-network-gan
  • 99. Deep Generative Network-based Activation Management (DGN-AMs) From https://arxiv.org/pdf/1605.09304.pdf
  • 100. Paired Open Ended Trailblazer (POET) From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 101. One Model to Learn Them All From https://arxiv.org/pdf/1706.05137.pdf
  • 102. Self-modifying NNs With Differentiable Neuromodulated Plasticity From https://arxiv.org/pdf/1706.05137.pdf
  • 103. Stein Variational Gradient Descent From https://arxiv.org/pdf/1706.05137.pdf
  • 104. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 105. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 107. Graphical Processing Units (GPU) From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more. What Does a GPU Do? The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in creative production and artificial intelligence (AI). GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more. GPU and CPU: Working Together The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most of both. GPU vs. Graphics Card: What’s the Difference? While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms. Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the raft of components required to both allow the GPU to function and connect to the rest of the system. GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI Express slot.
  • 108. Nvidia Graphical Processing Units (GPU) From https://en.wikipedia.org/wiki/Nvidia Nvidia Corporation[note 1][note 2] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units (GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system-on-a-chip units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and software.[3][4] Its professional line of GPUs is used in workstations for applications in such fields as architecture, engineering and construction, media and entertainment, automotive, scientific research, and manufacturing design.[5] In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator companies such as Graphcore. Nvidia's GPUs are used for edge-to-cloud computing and supercomputers: Nvidia provides the accelerators (the GPUs) for many of them, including a former fastest system, although the current fastest and most power-efficient systems are powered by AMD GPUs and CPUs. Nvidia has also expanded its presence in the gaming industry with its handheld game consoles Shield Portable, Shield Tablet, and Shield Android TV and its cloud gaming service GeForce Now. Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of US$40 billion in stock and cash, which would be the largest semiconductor acquisition to date. SoftBank Group will acquire slightly less than a 10% stake in Nvidia, and Arm would maintain its headquarters in Cambridge.[15][16][17][18]
  • 109. Tesla unveils new Dojo Supercomputer From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/ Tesla has unveiled the latest version of its Dojo supercomputer, and it’s apparently so powerful that it tripped the power grid in Palo Alto. Dojo is Tesla’s own custom supercomputer platform built from the ground up for AI machine learning and, more specifically, for video training using the video data coming from its fleet of vehicles. The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new Dojo custom-built computer uses chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to its computer vision technology powering its self-driving effort. Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but the company was still ramping up its effort at the time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster, or “Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night. Why does Tesla need the Dojo supercomputer? It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk said it makes sense to offer Dojo as a service, perhaps to take on his buddy Jeff Bezos’s Amazon AWS, calling it a “service that you can use that’s available online where you can train your models way faster and for less money.” But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving system. Tesla realized that its approach to developing a self-driving system using neural nets trained on millions of videos coming from its customer fleet requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power. That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward, as it has big ambitions to develop other artificial intelligence programs.
  • 110. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 112. Introduction to Deep Reinforcement Learning From https://skymind.ai/wiki/deep-reinforcement-learning Many RL references at this site
  • 113. Model-based Reinforcement Learning From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
  • 114. Hierarchical Deep Reinforcement Learning From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
  • 115. Meta Learning Shared Hierarchy From https://skymind.ai/wiki/deep-reinforcement-learning
  • 116. Learning with Hierarchical Deep Models From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
  • 118. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images. The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al., 2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely unsupervised manner. This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically sound way. To the best of our knowledge, ours is the first translation-invariant hierarchical generative model which supports both top-down and bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object parts, and objects respectively.
We show that these representations achieve excellent performance on several visual recognition tasks and allow “hidden” object parts to be inferred from high-level object information.
  • 119. Learning with Hierarchical-Deep Models From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision, natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones. We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines (DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct sophisticated domain-specific features without the need to rely on precise human-crafted input representations, which is increasingly important with the proliferation of datasets and application domains.
  • 120. Reinforcement Learning: Fast and Slow From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0 Meta-RL: Speeding up Deep RL by Learning to Learn As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental updating, is weak inductive bias. As formalized in the idea of the bias–variance tradeoff, fast learning requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if the specific biases the learner adopts happen to fit with the material to be learned. As a result of this, a new learning problem arises: how can the learner know what inductive biases to adopt? Episodic Deep RL: Fast Learning through Episodic Memory If incremental parameter adjustment is one source of slowness in deep RL, then one way to learn faster might be to avoid such incremental updating. Naively increasing the learning rate governing gradient descent optimization leads to the problem of catastrophic interference. However, recent research shows that there is another way to accomplish the same goal, which is to keep an explicit record of past events, and use this record directly as a point of reference in making new decisions. This idea, referred to as episodic RL, parallels ‘non-parametric’ approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of learning in psychology. When a new situation is encountered and a decision must be made concerning what action to take, the procedure is to compare an internal representation of the current situation with stored representations of past situations. The action chosen is then the one associated with the highest value, based on the outcomes of the past situations that are most similar to the present. When the internal state representation is computed by a multilayer neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
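A toy sketch of the episodic RL recipe described above: store an embedding of each past situation together with the action taken and the return obtained, then value a new situation per action by averaging the returns of its k nearest stored neighbours. The embedding, memory layout, and choice of k are illustrative assumptions rather than any specific algorithm reviewed in the paper.

```python
import numpy as np

class EpisodicMemory:
    """Non-parametric value estimates from stored (embedding, action, return) triples."""

    def __init__(self, n_actions, k=5):
        self.n_actions = n_actions
        self.k = k
        self.keys = [[] for _ in range(n_actions)]     # situation embeddings, per action
        self.returns = [[] for _ in range(n_actions)]  # observed returns, per action

    def store(self, embedding, action, episodic_return):
        self.keys[action].append(np.asarray(embedding, dtype=float))
        self.returns[action].append(float(episodic_return))

    def value(self, embedding, action):
        """Average return of the k most similar stored situations for this action."""
        if not self.keys[action]:
            return 0.0
        keys = np.stack(self.keys[action])
        dists = np.linalg.norm(keys - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.array(self.returns[action])[nearest]))

    def act(self, embedding):
        """Greedy action choice based on the episodic value estimates."""
        values = [self.value(embedding, a) for a in range(self.n_actions)]
        return int(np.argmax(values))

# usage: embeddings would normally come from a neural-network encoder
memory = EpisodicMemory(n_actions=3)
memory.store(embedding=[0.1, 0.9], action=2, episodic_return=+1.0)
memory.store(embedding=[0.8, 0.2], action=0, episodic_return=-1.0)
print(memory.act(np.array([0.15, 0.85])))   # -> 2, the action that paid off in similar situations
```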
  • 122. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 123. Embedding for Sparse Inputs (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 124. Efficient Vector Representation of Words (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 125. Deep Convolution Neural Nets and Gaussian Processes From https://ai.google/research/pubs/pub47671
  • 126. Deep Convolution Neural Nets and Gaussian Processes(cont) From https://ai.google/research/pubs/pub47671
  • 127. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 128. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 129. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 130. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 131. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 132. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 133. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 134. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 135. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 136. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 137. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 138. Computing and Sensing Architecture
  • 139. Simple Event Processing and Complex Event Processing: Hierarchical C4ISR Flow Model from Bob Marcus (diagram). The figure shows measurements from input devices and field processors flowing upward through simple and complex event processing, turning data into structured data, information, knowledge, and wisdom, and updating a world model; responses flow back down as simple responses, complex responses, updated plans, and new goals and plans, issued through HQ operations and field operations to actuator devices via sensor and effects management. Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
  • 140. Computing and Sensing Architectures From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
  • 141. Computing and Sensing Architectures From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
  • 142. Bio-Inspired Distributed Intelligence From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002 More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this fall. She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical tool or for search-and-rescue missions. “The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish, and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the real world more efficiently. But this additional softness also entails new system-theoretic challenges.” In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain. Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using decentralized sensory information sharing. “When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is fascinating as compared to what we can achieve with human-made machines.”
  • 143. IoT and Deep Learning
  • 145. Deep Learning for IoT Overview: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 146. Deep Learning for IoT Overview: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 147. Standardized IoT Data Sets: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 148. Standardized IoT Data Sets: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 150. DeepMind Website DeepMind Home page https://deepmind.com/ DeepMind Research https://deepmind.com/research/ https://deepmind.com/research/publications/ DeepMind Blog https://deepmind.com/blog DeepMind Applied https://deepmind.com/applied
  • 151. DeepMind Featured Research Publications From https://deepmind.com/research AlphaGo https://www.deepmind.com/research/highlighted-research/alphago Deep Reinforcement Learning https://deepmind.com/research/dqn/ A Dual Approach to Scalable Verification of Deep Networks http://auai.org/uai2018/proceedings/papers/204.pdf https://www.youtube.com/watch?v=SV05j3GM0LI Learning to reinforcement learn https://arxiv.org/abs/1611.05763 Neural Programmer - Interpreters https://arxiv.org/pdf/1511.06279v3.pdf Dueling Network Architectures for Deep Reinforcement Learning https://arxiv.org/pdf/1511.06581.pdf DeepMind Research over 400 publications https://deepmind.com/research/publications/
  • 152. DeepMind Applied From https://deepmind.com/applied/ DeepMind Health https://deepmind.com/applied/deepmind-health/ DeepMind for Google https://deepmind.com/applied/deepmind-google/ DeepMind Ethics and Society https://deepmind.com/applied/deepmind-ethics-society/
  • 153. AlphaGo and AlphaGoZero From https://www.deepmind.com/research/highlighted-research/alphago We created AlphaGo, a computer program that combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as an input and process it through a number of different network layers containing millions of neuron-like connections. One neural network, the “policy network”, selects the next move to play. The other neural network, the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur games to help it develop an understanding of reasonable human play. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world champions in different global arenas and arguably became the greatest Go player of all time. Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by playing thousands of matches with amateur and professional players, AlphaGo Zero learnt by playing against itself, starting from completely random play. This powerful technique is no longer constrained by the limits of human knowledge. Instead, the computer program accumulated thousands of years of human knowledge during a period of just a few days and learned to play Go from the strongest player in the world, AlphaGo. AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new knowledge, developing unconventional strategies and creative new moves, including those which beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence that AI can be used as a positive multiplier for human ingenuity.
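The policy and value networks described above can be sketched as a single network with a shared trunk and two heads, which is roughly how later versions of the system are usually described. The PyTorch sketch below uses made-up layer sizes and a 19x19-plus-pass move space purely for illustration; it is not DeepMind's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Shared convolutional trunk with a policy head and a value head."""

    def __init__(self, board_size=19, in_planes=17, channels=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        n_cells = board_size * board_size
        # policy head: a probability for every board point (+1 for "pass")
        self.policy_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * n_cells, n_cells + 1))
        # value head: a single scalar in [-1, 1] predicting the winner
        self.value_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * n_cells, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())

    def forward(self, board_planes):
        features = self.trunk(board_planes)
        move_logits = self.policy_head(features)
        value = self.value_head(features)
        return F.log_softmax(move_logits, dim=-1), value

net = PolicyValueNet()
dummy_position = torch.zeros(1, 17, 19, 19)   # batch of one encoded board position
log_policy, value = net(dummy_position)
print(log_policy.shape, value.shape)          # torch.Size([1, 362]) torch.Size([1, 1])
```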
  • 154. AlphaZero From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were excited by the preliminary results and thrilled to see the response from members of the chess community, who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that differed from any chess playing engine that came before it. Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly learns each game to become the strongest player in history for each, despite starting its training from random play, with no in-built domain knowledge but the basic rules of the game. This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive, unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess engine. “It’s like discovering the secret notebooks of some great player from the past,” says Matthew. Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try to account for every eventuality in a game. Shogi programs are also game specific, using similar search engines and algorithms to chess programs. AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network and general purpose algorithms that know nothing about the game beyond the basic rules.
  • 155. AlphaTensor From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor First extension of AlphaZero to mathematics unlocks new possibilities for research Algorithms have helped mathematicians perform fundamental operations for thousands of years. The ancient Egyptians created an algorithm to multiply two numbers without requiring a multiplication table, and Greek mathematician Euclid described an algorithm to compute the greatest common divisor, which is still in use today. During the Islamic Golden Age, Persian mathematician Muhammad ibn Musa al-Khwarizmi designed new algorithms to solve linear and quadratic equations. In fact, al-Khwarizmi’s name, translated into Latin as Algoritmi, led to the term algorithm. But, despite the familiarity with algorithms today – used throughout society from classroom algebra to cutting-edge scientific research – the process of discovering new algorithms is incredibly difficult, and an example of the amazing reasoning abilities of the human mind. In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices. This paper is a stepping stone in DeepMind’s mission to advance science and unlock the most fundamental problems using AI. Our system, AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games, like chess, Go and shogi, and this work shows the journey of AlphaZero from playing games to tackling unsolved mathematical problems for the first time. Matrix multiplication is one of the simplest operations in algebra, commonly taught in high school maths classes. But outside the classroom, this humble mathematical operation has enormous influence in the contemporary digital world and is ubiquitous in modern computing.
  • 156. AlphaTensor (cont) From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves, corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so, this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps taken to zero out the tensor. This game is incredibly challenging – the number of possible algorithms to consider is much greater than the number of atoms in the universe, even for small cases of matrix multiplication. Compared to the game of Go, which remained a challenge for AI for decades, the number of possible moves at each step of our game is 30 orders of magnitude larger (above 10^33 for one of the settings we consider). Essentially, to play this game well, one needs to identify the tiniest of needles in a gigantic haystack of possibilities. To tackle the challenges of this domain, which significantly departs from traditional games, we developed multiple crucial components including a novel neural network architecture that incorporates problem-specific inductive biases, a procedure to generate useful synthetic data, and a recipe to leverage symmetries of the problem. We then trained an AlphaTensor agent using reinforcement learning to play the game, starting without any knowledge about existing matrix multiplication algorithms. Through learning, AlphaTensor gradually improves over time, re-discovering historical fast matrix multiplication algorithms such as Strassen’s, eventually surpassing the realm of human intuition and discovering algorithms faster than previously known. Detailed Article in Nature
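To ground the tensor-game framing, the sketch below checks Strassen's classical decomposition, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; each of the 7 products corresponds to one rank-1 term in the decomposition of the matrix multiplication tensor. This is a well-known algorithm used purely for illustration, not AlphaTensor's code or one of its newly discovered algorithms.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen, 1969)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # recombine the 7 products into the 4 entries of the result
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
assert np.allclose(strassen_2x2(A, B), A @ B)   # 7 multiplications reproduce the product
print("Strassen's 7-multiplication algorithm matches the naive result.")
```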
  • 158. Complex Cooperative Agents From https://deepmind.com/blog/capture-the-flag-science/ From https://science.sciencemag.org/content/364/6443/859 5/19
  • 159. Complex Cooperative Agents (cont) From https://science.sciencemag.org/content/364/6443/859 5/19
  • 160. Complex Cooperative Agents (cont) From https://science.sciencemag.org/content/364/6443/859 5/19
  • 161. Unsupervised Learning From https://deepmind.com/blog/unsupervised-learning/ Unsupervised learning is a paradigm designed to create autonomous intelligence by rewarding agents (that is, computer programs) for learning about the data they observe without a particular task in mind. In other words, the agent learns for the sake of learning. A key motivation for unsupervised learning is that, while the data passed to learning algorithms is extremely rich in internal structure (e.g., images, videos and text), the targets and rewards used for training are typically very sparse (e.g., the label ‘dog’ referring to that particularly protean species, or a single one or zero to denote success or failure in a game). This suggests that the bulk of what is learned by an algorithm must consist of understanding the data itself, rather than applying that understanding to particular tasks.
  • 162. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Unsupervised learning is a paradigm designed to create autonomous intelligence by rewarding agents (that is, computer programs) for learning about the data they observe without a particular task in mind. In other words, the agent learns for the sake of learning. A key motivation for unsupervised learning is that, while the data passed to learning algorithms is extremely rich in internal structure (e.g., images, videos and text), the targets and rewards used for training are typically very sparse (e.g., the label ‘dog’ referring to that particularly protean species, or a single one or zero to denote success or failure in a game). This suggests that the bulk of what is learned by an algorithm must consist of understanding the data itself, rather than applying that understanding to particular tasks. These results resonate with our intuitions about the human mind. Our ability to learn about the world without explicit supervision is fundamental to what we regard as intelligence. On a train ride we might listlessly gaze through the window, drag our fingers over the velvet of the seat, regard the passengers sitting across from us. We have no agenda in these studies: we almost can’t help but gather information, our brains ceaselessly working to understand the world around us, and our place within it.
  • 163. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Decoding the elements of vision 2012 was a landmark year for deep learning, when AlexNet (named after its lead architect Alex Krizhevsky) swept the ImageNet classification competition. AlexNet’s abilities to recognize images were unprecedented, but even more striking is what was happening under the hood. When researchers analysed what AlexNet was doing, they discovered that it interprets images by building increasingly complex internal representations of its inputs. Low-level features, such as textures and edges, are represented in the bottom layers, and these are then combined to form high-level concepts such as wheels and dogs in higher layers. This is remarkably similar to how information is processed in our brains, where simple edges and textures in primary sensory processing areas are assembled into complex objects like faces in higher areas. The representation of a complex scene can therefore be built out of visual primitives, in much the same way that meaning emerges from the individual words comprising a sentence. Without explicit guidance to do so, the layers of AlexNet had discovered a fundamental ‘vocabulary’ of vision in order to solve its task. In a sense, it had learned to play what Wittgenstein called a ‘language game’ that iteratively translates from pixels to labels.
  • 164. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Transfer learning From the perspective of general intelligence, the most interesting thing about AlexNet’s vocabulary is that it can be reused, or transferred, to visual tasks other than the one it was trained on, such as recognising whole scenes rather than individual objects. Transfer is essential in an ever-changing world, and humans excel at it: we are able to rapidly adapt the skills and understanding we’ve gleaned from our experiences (our ‘world model’) to whatever situation is at hand. For example, a classically-trained pianist can pick up jazz piano with relative ease. Artificial agents that form the right internal representations of the world, the reasoning goes, should be able to do similarly. Nonetheless, the representations learned by classifiers such as AlexNet have limitations. In particular, as the network was only trained to label images with a single class (cat, dog, car, volcano), any information not required to infer the label—no matter how useful it might be for other tasks—is liable to be ignored. For example, the representations may fail to capture the background of the image if the label always refers to the foreground. A possible solution is to provide more comprehensive training signals, like detailed captions describing the images: not just “dog,” but “A Corgi catching a frisbee in a sunny park.” However, such targets are laborious to provide, especially at scale, and still may be insufficient to capture all the information needed to complete a task. The basic premise of unsupervised learning is that the best way to learn rich, broadly transferable representations is to attempt to learn everything that can be learned about the data. If the notion of transfer through representation learning seems too abstract, consider a child who has learned to draw people as stick figures. She has discovered a representation of the human form that is both highly compact and rapidly adaptable. By augmenting each stick figure with specifics, she can create portraits of all her classmates: glasses for her best friend, her deskmate in his favorite red tee-shirt. And she has developed this skill not in order to complete a specific task or receive a reward, but rather in response to her basic urge to reflect the world around her.
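A minimal PyTorch sketch of the transfer idea discussed above: freeze a previously trained feature extractor and train only a small new head for the new task. The tiny random networks and synthetic data below are placeholders standing in for something like AlexNet's convolutional layers and a real downstream dataset.

```python
import torch
import torch.nn as nn

# stand-in for a feature extractor that was trained on a previous task
pretrained_features = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
for param in pretrained_features.parameters():
    param.requires_grad = False          # freeze: reuse the learned representation as-is

new_head = nn.Linear(32, 5)              # the new 5-class task gets its own classifier
optimizer = torch.optim.Adam(new_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# placeholder data for the new task
inputs = torch.randn(256, 128)
labels = torch.randint(0, 5, (256,))

for step in range(100):
    with torch.no_grad():                # no gradients flow into the frozen trunk
        features = pretrained_features(inputs)
    logits = new_head(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final loss on the new task: {loss.item():.3f}")
```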
  • 165. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Learning by creating: generative models Perhaps the simplest objective for unsupervised learning is to train an algorithm to generate its own instances of data. So-called generative models should not simply reproduce the data they are trained on (an uninteresting act of memorisation), but rather build a model of the underlying class from which that data was drawn: not a particular photograph of a horse or a rainbow, but the set of all photographs of horses and rainbows; not a specific utterance from a specific speaker, but the general distribution of spoken utterances. The guiding principle of generative models is that being able to construct a convincing example of the data is the strongest evidence of having understood it: as Richard Feynman put it, “what I cannot create, I do not understand.” For images, the most successful generative model so far has been the Generative Adversarial Network (GAN for short), in which two networks (a generator and a discriminator) engage in a contest of discernment akin to that of an artistic forger and a detective. The generator produces images with the goal of tricking the discriminator into believing they are real; the discriminator, meanwhile, is rewarded for spotting the fakes. The generated images, first messy and random, are refined over many iterations, and the ongoing dynamic between the networks leads to ever-more realistic images that are in many cases indistinguishable from real photographs. Generative adversarial networks can also dream up details of landscapes defined by the rough sketches of users. A glance at the images below is enough to convince us that the network has learned to represent many of the key features of the photographs they were trained on, such as the structure of animals’ bodies, the texture of grass, and detailed effects of light and shade (even when refracted through a soap bubble). Close inspection reveals slight anomalies, such as the white dog’s apparent extra leg and the oddly right-angled flow of one of the jets in the fountain. While the creators of generative models strive to avoid such imperfections, their visibility highlights one of the benefits of recreating familiar data such as images: by inspecting the samples, researchers can infer what the model has and hasn’t learned.
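A compact sketch of the generator/discriminator contest described above, run on a toy one-dimensional data distribution instead of images so that it stays self-contained; the architectures, learning rates, and target distribution are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 8

generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(n=64):
    # the "true" data distribution the generator must learn: N(4, 0.5)
    return 4.0 + 0.5 * torch.randn(n, 1)

for step in range(2000):
    # discriminator update: tell real samples from generated ones
    real = real_batch()
    fake = generator(torch.randn(64, latent_dim)).detach()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # generator update: try to fool the discriminator
    fake = generator(torch.randn(64, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

samples = generator(torch.randn(1000, latent_dim))
# ideally the generated samples drift toward mean 4 and std 0.5
print(f"generated mean ~ {samples.mean():.2f}, std ~ {samples.std():.2f}")
```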
  • 166. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Creating by predicting Another notable family within unsupervised learning are autoregressive models, in which the data is split into a sequence of small pieces, each of which is predicted in turn. Such models can be used to generate data by successively guessing what will come next, feeding in a guess as input and guessing again. Language models, where each word is predicted from the words before it, are perhaps the best known example: these models power the text predictions that pop up on some email and messaging apps. Recent advances in language modelling have enabled the generation of strikingly plausible passages, such as the one shown below from OpenAI’s GPT-2. By controlling the input sequence used to condition the output predictions, autoregressive models can also be used to transform one sequence into another. This demo uses a conditional autoregressive model to transform text into realistic handwriting. WaveNet transforms text into natural sounding speech, and is now used to generate voices for Google Assistant. A similar process of conditioning and autoregressive generation can be used to translate from one language to another. Autoregressive models learn about data by attempting to predict each piece of it in a particular order. A more general class of unsupervised learning algorithms can be built by predicting any part of the data from any other. For example, this could mean removing a word from a sentence, and attempting to predict it from whatever remains. By learning to make lots of localised predictions, the system is forced to learn about the data as a whole. One concern around generative models is their potential for misuse. While manipulating evidence with photo, video, and audio editing has been possible for a long time, generative models could make it even easier to edit media with malicious intent. We have already seen demonstrations of so-called ‘deepfakes’, for instance this fabricated video footage of President Obama. It’s encouraging to see that several major efforts to address these challenges are already underway, including using statistical techniques to help detect synthetic media and verify authentic media, raising public awareness, and discussions around limiting the availability of trained generative models.
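The "guess the next piece, feed the guess back in" loop can be shown with something far simpler than a modern language model: a character-level bigram model fit by counting. The toy corpus and order-1 model below are made up purely for illustration.

```python
import numpy as np

corpus = "the cat sat on the mat and the dog sat on the log "
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}

# autoregressive model of order 1: P(next char | current char), estimated by counting
counts = np.ones((len(chars), len(chars)))          # add-one smoothing
for prev, nxt in zip(corpus, corpus[1:]):
    counts[idx[prev], idx[nxt]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def generate(seed="t", length=40, rng=np.random.default_rng(0)):
    out = seed
    for _ in range(length):
        nxt = rng.choice(len(chars), p=probs[idx[out[-1]]])  # guess the next piece...
        out += chars[nxt]                                    # ...and feed the guess back in
    return out

print(generate())
```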
  • 167. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Re-imagining intelligence Generative models are fascinating in their own right, but our principal interest in them at DeepMind is as a stepping stone towards general intelligence. Endowing an agent with the ability to generate data is a way of giving it an imagination, and hence the ability to plan and reason about the future. Even without explicit generation, our studies show that learning to predict different aspects of the environment enriches the agent’s world model, and thereby improves its ability to solve problems. These results resonate with our intuitions about the human mind. Our ability to learn about the world without explicit supervision is fundamental to what we regard as intelligence. On a train ride we might listlessly gaze through the window, drag our fingers over the velvet of the seat, regard the passengers sitting across from us. We have no agenda in these studies: we almost can’t help but gather information, our brains ceaselessly working to understand the world around us, and our place within it.
  • 168. Towards Robust and Verified AI From https://deepmind.com/blog/robust-and-verified-ai/ Bugs and software have gone hand in hand since the beginning of computer programming. Over time, software developers have established a set of best practices for testing and debugging before deployment, but these practices are not suited for modern deep learning systems. Today, the prevailing practice in machine learning is to train a system on a training data set, and then test it on another set. While this reveals the average-case performance of models, it is also crucial to ensure robustness, or acceptably high performance even in the worst case. In this article, we describe three approaches for rigorously identifying and eliminating bugs in learned predictive models: adversarial testing, robust learning, and formal verification. This is not an entirely new problem. Computer programs have always had bugs. Over decades, software engineers have assembled an impressive toolkit of techniques, ranging from unit testing to formal verification. These methods work well on traditional software, but adapting these approaches to rigorously test machine learning models like neural networks is extremely challenging due to the scale and lack of structure in these models, which may contain hundreds of millions of parameters. This creates the need for novel approaches to ensure that machine learning systems are robust at deployment. From a programmer’s perspective, a bug is any behaviour that is inconsistent with the specification, i.e. the intended functionality, of a system. As part of our mission of solving intelligence, we conduct research into techniques for evaluating whether machine learning systems are consistent not only with the train and test set, but also with a list of specifications describing desirable properties of a system. Such properties might include robustness to sufficiently small perturbations in inputs, safety constraints to avoid catastrophic failures, or producing predictions consistent with the laws of physics.
  • 169. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ In this article, we discuss three important technical challenges for the machine learning community to take on, as we collectively work towards rigorous development and deployment of machine learning systems that are reliably consistent with desired specifications: • Testing consistency with specifications efficiently. We explore efficient ways to test that machine learning systems are consistent with properties (such as invariance or robustness) desired by the designer and users of the system. One approach to uncover cases where the model might be inconsistent with the desired behaviour is to systematically search for worst-case outcomes during evaluation. • Training machine learning models to be specification-consistent. Even with copious training data, standard machine learning algorithms can produce predictive models that make predictions inconsistent with desirable specifications like robustness or fairness. This requires us to reconsider training algorithms that produce models that not only fit training data well, but are also consistent with a list of specifications. • Formally proving that machine learning models are specification-consistent. There is a need for algorithms that can verify that the model’s predictions are provably consistent with a specification of interest for all possible inputs. While the field of formal verification has studied such algorithms for several decades, these approaches do not easily scale to modern deep learning systems despite impressive progress.
  • 170. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ Testing consistency with specifications efficiently Robustness to adversarial examples is a relatively well-studied problem in deep learning. One major theme that has come out of this work is the importance of evaluating against strong attacks, and designing transparent models which can be efficiently analysed. Alongside other researchers from the community, we have found that many models appear robust when evaluated against weak adversaries. However, they show essentially 0% adversarial accuracy when evaluated against stronger adversaries (Athalye et al., 2018, Uesato et al., 2018, Carlini and Wagner, 2017). While most work has focused on rare failures in the context of supervised learning (largely image classification), there is a need to extend these ideas to other settings. In recent work on adversarial approaches for uncovering catastrophic failures, we apply these ideas towards testing reinforcement learning agents intended for use in safety-critical settings. One challenge in developing autonomous systems is that because a single mistake may have large consequences, very small failure probabilities are unacceptable. Our objective is to design an “adversary” to allow us to detect such failures in advance (e.g., in a controlled environment). If the adversary can efficiently identify the worst-case input for a given model, this allows us to catch rare failure cases before deploying a model. As with image classifiers, evaluating against a weak adversary provides a false sense of security during deployment. This is similar to the software practice of red-teaming, though it extends beyond failures caused by malicious adversaries, and also includes failures which arise naturally, for example due to lack of generalization. We developed two complementary approaches for adversarial testing of RL agents. In the first, we use derivative-free optimisation to directly minimise the expected reward of an agent. In the second, we learn an adversarial value function which predicts from experience which situations are most likely to cause failures for the agent. We then use this learned function for optimisation to focus the evaluation on the most problematic inputs. These approaches form only a small part of a rich, growing space of potential algorithms, and we are excited about future development in rigorous evaluation of agents. Already, both approaches result in large improvements over random testing. Using our method, failures that would have taken days to uncover, or even gone undetected entirely, can be detected in minutes (Uesato et al., 2018b). We also found that adversarial testing may uncover qualitatively different behaviour in our agents from what might be expected from evaluation on a random test set. In particular, using adversarial environment construction we found that agents performing a 3D navigation task, which match human-level performance on average, still failed to find the goal completely on surprisingly simple mazes (Ruderman et al., 2018). Our work also highlights that we need to design systems that are secure against natural failures, not only against adversaries.
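One widely used way to search for worst-case inputs to a differentiable classifier is the fast gradient sign method (FGSM), sketched below on a toy model. This is a generic illustration of adversarial testing, not the derivative-free or learned-adversary approaches the post describes for RL agents.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # toy classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10)                 # a clean input
y = torch.tensor([1])                  # its (assumed) correct label
epsilon = 0.1                          # allowed perturbation size

x_adv = x.clone().requires_grad_(True)
loss = loss_fn(model(x_adv), y)
loss.backward()
# step in the direction that increases the loss the fastest, within an L-infinity ball
x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```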
  • 171. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ Training machine learning models to be specification-consistent Adversarial testing aims to find a counter example that violates specifications. As such, it often leads to overestimating the consistency of models with respect to these specifications. Mathematically, a specification is some relationship that has to hold between the inputs and outputs of a neural network. This can take the form of upper and lower bounds on certain key input and output parameters. Motivated by this observation, several researchers (Raghunathan et al., 2018; Wong et al., 2018; Mirman et al., 2018; Wang et al., 2018) including our team at DeepMind (Dvijotham et al., 2018; Gowal et al., 2018), have worked on algorithms that are agnostic to the adversarial testing procedure (used to assess consistency with the specification). This can be understood geometrically: we can bound (e.g., using interval bound propagation; Ehlers 2017, Katz et al. 2017, Mirman et al., 2018) the worst violation of a specification by bounding the space of outputs given a set of inputs. If this bound is differentiable with respect to network parameters and can be computed quickly, it can be used during training. The original bounding box can then be propagated through each layer of the network. We show that interval bound propagation is fast, efficient, and, contrary to prior belief, can achieve strong results (Gowal et al., 2018). In particular, we demonstrate that it can decrease the provable error rate (i.e., maximal error rate achievable by any adversary) over state-of-the-art in image classification on both MNIST and CIFAR-10 datasets. Going forward, the next frontier will be to learn the right geometric abstractions to compute tighter overapproximations of the space of outputs. We also want to train networks to be consistent with more complex specifications capturing desirable behaviour, such as the above-mentioned invariances and consistency with physical laws.
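A minimal NumPy sketch of interval bound propagation through one affine layer followed by a ReLU: given elementwise lower and upper bounds on the input, it returns valid bounds on the output, the basic step that gets chained through every layer and differentiated during training. The layer sizes, weights, and perturbation budget are random placeholders.

```python
import numpy as np

def ibp_affine(lower, upper, W, b):
    """Propagate elementwise input bounds through y = W x + b."""
    mid = (upper + lower) / 2.0          # centre of the input box
    rad = (upper - lower) / 2.0          # half-width of the input box
    mid_out = W @ mid + b
    rad_out = np.abs(W) @ rad            # the worst case grows with |W|
    return mid_out - rad_out, mid_out + rad_out

def ibp_relu(lower, upper):
    """ReLU is monotone, so bounds pass straight through it."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
x = rng.normal(size=3)
eps = 0.1                                # input perturbation budget

l, u = ibp_affine(x - eps, x + eps, W, b)
l, u = ibp_relu(l, u)

# sanity check: a random perturbed input must land inside the certified box
x_pert = x + rng.uniform(-eps, eps, size=3)
y_pert = np.maximum(W @ x_pert + b, 0.0)
assert np.all(y_pert >= l - 1e-9) and np.all(y_pert <= u + 1e-9)
print("output lower bounds:", np.round(l, 3))
print("output upper bounds:", np.round(u, 3))
```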
  • 172. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ Formally proving that machine learning models are specification-consistent Rigorous testing and training can go a long way towards building robust machine learning systems. However, no amount of testing can formally guarantee that a system will behave as we want. In large-scale models, enumerating all possible outputs for a given set of inputs (for example, infinitesimal perturbations to an image) is intractable due to the astronomical number of choices for the input perturbation. However, as in the case of training, we can find more efficient approaches by setting geometric bounds on the set of outputs. Formal verification is a subject of ongoing research at DeepMind. The machine learning community has developed several interesting ideas on how to compute precise geometric bounds on the space of outputs of the network (Katz et al. 2017, Weng et al., 2018; Singh et al., 2018). Our approach (Dvijotham et al., 2018), based on optimisation and duality, consists of formulating the verification problem as an optimisation problem that tries to find the largest violation of the property being verified. By using ideas from duality in optimisation, the problem becomes computationally tractable. This results in additional constraints that refine the bounding boxes computed by interval bound propagation, using so-called cutting planes. This approach is sound but incomplete: there may be cases where the property of interest is true, but the bound computed by this algorithm is not tight enough to prove the property. However, once we obtain a bound, this formally guarantees that there can be no violation of the property. The figure below graphically illustrates the approach. This approach enables us to extend the applicability of verification algorithms to more general networks (activation functions, architectures), general specifications and more sophisticated deep learning models (generative models, neural processes, etc.) and specifications beyond adversarial robustness (Qin, 2018).
  • 173. From https://deepmind.com/blog/robust-and-verified-ai/ Outlook Deployment of machine learning in high-stakes situations presents unique challenges, and requires the development of evaluation techniques that reliably detect unlikely failure modes. More broadly, we believe that learning consistency with specifications can provide large efficiency improvements over approaches where specifications only arise implicitly from training data. We are excited about ongoing research into adversarial evaluation, learning robust models, and verification of formal specifications. Much more work is needed to build automated tools for ensuring that AI systems in the real world will do the “right thing”. In particular, we are excited about progress in the following directions: • Learning for adversarial evaluation and verification: As AI systems scale and become more complex, it will become increasingly difficult to design adversarial evaluation and verification algorithms that are well-adapted to the AI model. If we can leverage the power of AI to facilitate evaluation and verification, this process can be bootstrapped to scale. • Development of publicly-available tools for adversarial evaluation and verification: It is important to provide AI engineers and practitioners with easy-to-use tools that shed light on the possible failure modes of the AI system before it leads to widespread negative impact. This would require some degree of standardisation of adversarial evaluation and verification algorithms. • Broadening the scope of adversarial examples: To date, most work on adversarial examples has focused on model invariances to small perturbations, typically of images. This has provided an excellent testbed for developing approaches to adversarial evaluation, robust learning, and verification. We have begun to explore alternate specifications for properties directly relevant in the real world, and are excited by future research in this direction. • Learning specifications: Specifications that capture “correct” behavior in AI systems are often difficult to precisely state. Building systems that can use partial human specifications and learn further specifications from evaluative feedback would be required as we build increasingly intelligent agents capable of exhibiting complex behaviors and acting in unstructured environments. Towards Robust and Verified AI (cont)
  • 174. TF-Replicator: Distributed Machine Learning for Researchers From https://deepmind.com/blog/tf-replicator-distributed-machine-learning/ At DeepMind, the Research Platform Team builds infrastructure to empower and accelerate our AI research. Today, we are excited to share how we developed TF-Replicator, a software library that helps researchers deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience with distributed systems. TF-Replicator’s programming model has now been open sourced as part of TensorFlow’s tf.distribute.Strategy. This blog post gives an overview of the ideas and technical challenges underlying TF-Replicator. For a more comprehensive description, please read our arXiv paper. A recurring theme in recent AI breakthroughs -- from AlphaFold to BigGAN to AlphaStar -- is the need for effortless and reliable scalability. Increasing amounts of computational capacity allow researchers to train ever-larger neural networks with new capabilities. To address this, the Research Platform Team developed TF-Replicator, which allows researchers to target different hardware accelerators for Machine Learning, scale up workloads to many devices, and seamlessly switch between different types of accelerators. While it was initially developed as a library on top of TensorFlow, TF-Replicator’s API has since been integrated into TensorFlow 2.0’s new tf.distribute.Strategy. While TensorFlow provides direct support for CPU, GPU, and TPU (Tensor Processing Unit) devices, switching between targets requires substantial effort from the user. This typically involves specialising code for a particular hardware target, constraining research ideas to the capabilities of that platform. Some existing frameworks built on top of TensorFlow, e.g. Estimators, seek to address this problem. However, they are typically targeted at production use cases and lack the expressivity and flexibility required for rapid iteration of research ideas.
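Since TF-Replicator's programming model was folded into TensorFlow's tf.distribute.Strategy, the flavour of that API can be sketched as below. This assumes TensorFlow 2.x with the Keras API and synthetic data; it is a generic tf.distribute example, not TF-Replicator itself.

```python
import numpy as np
import tensorflow as tf

# replicate the model across all locally visible GPUs (falls back to CPU if none)
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # variables created inside the scope are mirrored on every device
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# synthetic data; gradients from each replica are averaged automatically
x = np.random.randn(1024, 32).astype("float32")
y = np.random.randint(0, 10, size=(1024,))
model.fit(x, y, batch_size=128, epochs=1)
```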
  • 175. AlphaFold Protein Folding From https://deepmind.com/blog/alphafold/
  • 176. AlphaFold Protein Folding (cont) From https://deepmind.com/blog/alphafold/
  • 177. Google Streams for NHS From https://deepmind.com/applied/deepmind-health/working-partners/how-were-helping-today
  • 178. Open Sourcing TRFL From https://deepmind.com/blog/trfl/
  • 179. Open Sourcing TRFL (cont) From https://deepmind.com/blog/trfl/
  • 180. Multi-Task Learning (e.g.Atari) From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
  • 181. Multi-Task Learning (cont) From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
  • 182. Measuring Abstract Reasoning in Neural Nets From http://proceedings.mlr.press/v80/santoro18a/santoro18a.pdf Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation ‘regimes’ in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model’s ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction. One of the long-standing goals of artificial intelligence is to develop machines with abstract reasoning capabilities that equal or better those of humans. Though there has also been substantial progress in both reasoning and abstract representation learning in neural nets (Botvinick et al., 2017; LeCun et al., 2015; Higgins et al., 2016; 2017), the extent to which these models exhibit anything like general abstract reasoning is the subject of much debate (Garnelo et al., 2016; Lake & Baroni, 2017; Marcus, 2018). The research presented here was therefore motivated by two main goals. (1) To understand whether, and (2) to understand how, deep neural networks might be able to solve abstract visual reasoning problems. Our answer to (1) is that, with important caveats, neural networks can indeed learn to infer and apply abstract reasoning principles. Our best performing model learned to solve complex visual reasoning questions, and to do so, it needed to induce and detect from raw pixel input the presence of abstract notions such as logical operations and arithmetic progressions, and apply these principles to never-before observed stimuli. Importantly, we found that the architecture of the model made a critical difference to its ability to learn and execute such processes. While standard visual-processing models such as CNNs and ResNets performed poorly, a model that promoted the representation of, and comparison between, parts of the stimuli performed very well. We found ways to improve this performance via additional supervision: the training outcomes and the model’s ability to generalise were improved if it was required to decode its representations into symbols corresponding to the reason behind the correct answer.
  • 183. Learning to Navigate Cities without a Map From https://arxiv.org/abs/1804.00168 Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage this http URL contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at this https URL
  • 184. Learning to Generate Images From https://deepmind.com/blog/learning-to-generate-images/ Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, OMNIGLOT, CELEBA) and synthetic 3D datasets. A video of the agent can be found at https://youtu.be/iSyvwAwa7vk.
  • 185. Neuron Deletion From https://deepmind.com/blog/understanding-deep-learning-through-neuron-deletion/ We measured the performance impact of damaging the network by deleting individual neurons as well as groups of neurons. Our experiments led to two surprising findings: • Although many previous studies have focused on understanding easily interpretable individual neurons (e.g. “cat neurons”, or neurons in the hidden layers of deep networks which are only active in response to images of cats), we found that these interpretable neurons are no more important than confusing neurons with difficult-to-interpret activity. • Networks which correctly classify unseen images are more resilient to neuron deletion than networks which can only classify images they have seen before. In other words, networks which generalise well are much less reliant on single directions than those which memorise. To evaluate neuron importance, we measured how network performance on image classification tasks changes when a neuron is deleted. If a neuron is very important, deleting it should be highly damaging and substantially decrease network performance, while the deletion of an unimportant neuron should have little impact. Neuroscientists routinely perform similar experiments, although they cannot achieve the fine-grained precision which is necessary for these experiments and readily available in artificial neural networks. Surprisingly, we found that there was little relationship between selectivity and importance. In other words, “cat neurons” were no more important than confusing neurons. This finding echoes recent work in neuroscience which has demonstrated that confusing neurons can actually be quite informative, and suggests that we must look beyond the most easily interpretable neurons in order to understand deep neural networks.
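The deletion experiment itself is easy to mimic: zero out one hidden unit (or a group of units) and measure how much task performance drops. Below is a toy PyTorch version on random data; the small network, synthetic labels, and accuracy-drop measure are illustrative stand-ins for the image classifiers studied in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy "trained" network and evaluation set (random stand-ins)
hidden = 16
model = nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 4))
x = torch.randn(512, 20)
y = torch.randint(0, 4, (512,))

def accuracy():
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

baseline = accuracy()
first_layer = model[0]

for unit in range(hidden):
    saved_w = first_layer.weight[unit].clone()
    saved_b = first_layer.bias[unit].clone()
    with torch.no_grad():
        first_layer.weight[unit].zero_()     # "delete" the neuron: its activation becomes 0
        first_layer.bias[unit].zero_()
    drop = baseline - accuracy()             # importance = performance lost when deleted
    with torch.no_grad():
        first_layer.weight[unit] = saved_w   # restore before testing the next unit
        first_layer.bias[unit] = saved_b
    print(f"unit {unit:2d}: accuracy drop {drop:+.3f}")
```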
  • 186. Learning by Playing From https://arxiv.org/abs/1802.10567 We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment – enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach. A video of the rich set of learned behaviors can be found at https://youtu.be/mPKyvocNe M. This paper introduces SAC-X, a method that simultaneously learns intention policies on a set of auxiliary tasks, and actively schedules and executes these to explore its observation space - in search for sparse rewards of externally defined target tasks. Utilizing simple auxiliary tasks enables SAC-X to learn complicated target tasks from rewards defined in a ’pure’, sparse, manner: only the end goal is specified, but not the solution path. We demonstrated the power of SAC-X on several challenging robotics tasks in simulation, using a common set of simple and sparse auxiliary tasks and on a real robot. The learned intentions are highly reactive, reliable, and exhibit a rich and robust behavior. We consider this as an important step towards the goal of applying RL to real world domains.
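The scheduler in SAC-X decides which intention (auxiliary or main task) to execute next so that the sparse main-task reward is eventually found. Below is a minimal bandit-style stand-in for that scheduler, not the learned SAC-Q scheduler from the paper; the intention names and the made-up returns are illustrative assumptions.

```python
import numpy as np

class SimpleScheduler:
    """Pick which intention to execute next, preferring intentions whose past
    executions yielded main-task reward (a Boltzmann bandit, used here as a
    simplified stand-in for the learned scheduler in SAC-X)."""

    def __init__(self, intentions, temperature=0.2):
        self.intentions = intentions
        self.temperature = temperature
        self.returns = {name: [0.0] for name in intentions}   # observed main-task returns

    def choose(self, rng):
        means = np.array([np.mean(self.returns[n]) for n in self.intentions])
        logits = means / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(self.intentions, p=probs)

    def update(self, intention, main_task_return):
        self.returns[intention].append(main_task_return)

# Usage with invented intention names and made-up returns:
rng = np.random.default_rng(0)
sched = SimpleScheduler(["reach", "grasp", "lift", "main_stack"])
for episode in range(20):
    intention = sched.choose(rng)
    # ... run off-policy RL on `intention`, then observe the sparse main-task return ...
    fake_return = {"reach": 0.0, "grasp": 0.1, "lift": 0.3, "main_stack": 0.5}[str(intention)]
    sched.update(str(intention), fake_return + 0.05 * rng.standard_normal())
print({n: round(float(np.mean(r)), 2) for n, r in sched.returns.items()})
```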
  • 187. Scalable Distributed DeepRL From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/ Deep Reinforcement Learning (DeepRL) has achieved remarkable success in a range of tasks, from continuous control problems in robotics to playing games like Go and Atari. The improvements seen in these domains have so far been limited to individual tasks where a separate agent has been tuned and trained for each task. In our most recent work, we explore the challenge of training a single agent on many tasks. Today we are releasing DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space. Training an agent to perform well on many tasks requires massive throughput and making efficient use of every data point. To this end, we have developed a new, highly scalable agent architecture for distributed training called Importance Weighted Actor-Learner Architecture that uses a new off-policy correction algorithm called V-trace. DMLab-30 is a collection of new levels designed using our open source RL environment DeepMind Lab. These environments enable any DeepRL researcher to test systems on a large spectrum of interesting tasks either individually or in a multi-task setting.
  • 188. Scalable Distributed DeepRL (cont) From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/ In order to tackle the challenging DMLab-30 suite, we developed a new distributed agent called Importance Weighted Actor-Learner Architecture that maximises data throughput using an efficient distributed architecture with TensorFlow. Importance Weighted Actor-Learner Architecture is inspired by the popular A3C architecture which uses multiple distributed actors to learn the agent’s parameters. In models like this, each of the actors uses a clone of the policy parameters to act in the environment. Periodically, actors pause their exploration to share the gradients they have computed with a central parameter server that applies updates.
  • 189. Learning Explanatory Rules from Noisy Data From https://deepmind.com/blog/learning-explanatory-rules-noisy-data/ The distinction is interesting to us because these two types of thinking correspond to two different approaches to machine learning: deep learning and symbolic program synthesis. Deep learning concentrates on intuitive perceptual thinking whereas symbolic program synthesis focuses on conceptual, rule-based thinking. Each system has different merits - deep learning systems are robust to noisy data but are difficult to interpret and require large amounts of data to train, whereas symbolic systems are much easier to interpret and require less training data but struggle with noisy data. While human cognition seamlessly combines these two distinct ways of thinking, it is much less clear whether or how it is possible to replicate this in a single AI system. Our new paper, recently published in JAIR, demonstrates it is possible for systems to combine intuitive perceptual thinking with conceptual, interpretable reasoning. The system we describe, ∂ILP, is robust to noise, data-efficient, and produces interpretable rules.
  • 190. Learning Explanatory Rules from Noisy Data (cont) From https://arxiv.org/abs/1802.01561 In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
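V-trace corrects for the lag between the actors' behaviour policy and the learner's target policy by clipping importance-sampling weights. A minimal NumPy sketch of the target computation is below (single trajectory, no episode boundaries); the clipping constants follow the paper's defaults, and the random inputs are placeholders.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value,
                   behaviour_logp, target_logp,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s for one trajectory. Terminal states,
    which the full algorithm handles via per-step discounts, are omitted here."""
    T = len(rewards)
    ratios = np.exp(target_logp - behaviour_logp)      # pi(a|x) / mu(a|x)
    rhos = np.minimum(rho_bar, ratios)                  # clipped weights for the TD error
    cs = np.minimum(c_bar, ratios)                      # clipped "trace" weights
    values_next = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_next - values)

    vs_minus_v = np.zeros(T + 1)                        # v_s - V(x_s), filled backwards
    for t in reversed(range(T)):
        vs_minus_v[t] = deltas[t] + gamma * cs[t] * vs_minus_v[t + 1]
    return values + vs_minus_v[:T]

# Placeholder trajectory of length 5:
rng = np.random.default_rng(0)
T = 5
vs = vtrace_targets(rewards=rng.random(T), values=rng.random(T), bootstrap_value=0.5,
                    behaviour_logp=np.log(rng.uniform(0.1, 1.0, T)),
                    target_logp=np.log(rng.uniform(0.1, 1.0, T)))
print(vs)
```

In IMPALA these targets replace the on-policy returns used by A3C, which is what lets actors keep generating experience while the central learner's policy moves ahead of them.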
  • 191. DeepMind Lab From https://deepmind.com/blog/open-sourcing-deepmind-lab/ The development of innovative agents goes hand in hand with the careful design and implementation of rationally selected, flexible and well-maintained environments. To that end, we at DeepMind have invested considerable effort toward building rich simulated environments to serve as “laboratories” for AI research. Now we are open-sourcing our flagship platform, DeepMind Lab, so the broader research community can make use of it. DeepMind Lab is a fully 3D game-like platform tailored for agent-based AI research. It is observed from a first-person viewpoint, through the eyes of the simulated agent. Scenes are rendered with rich science fiction-style visuals. The available actions allow agents to look around and move in 3D. The agent’s “body” is a floating orb. It levitates and moves by activating thrusters opposite its desired direction of movement, and it has a camera that moves around the main sphere as a ball-in-socket joint tracking the rotational look actions. Example tasks include collecting fruit, navigating in mazes, traversing dangerous passages while avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, playing laser tag, and quickly learning and remembering random procedurally generated environments. An illustration of how agents in DeepMind Lab perceive and interact with the world can be seen below:
  • 192. Game Theory for Asymmetric Players From https://deepmind.com/blog/game-theory-insights-asymmetric-multi-agent-games/ As AI systems start to play an increasing role in the real world it is important to understand how different systems will interact with one another. In our latest paper, published in the journal Scientific Reports, we use a branch of game theory to shed light on this problem. In particular, we examine how two intelligent systems behave and respond in a particular type of situation known as an asymmetric game; examples include Leduc poker and various board games such as Scotland Yard. Asymmetric games also naturally model certain real-world scenarios such as automated auctions where buyers and sellers operate with different motivations. Our results give us new insights into these situations and reveal a surprisingly simple way to analyse them. While our interest is in how this theory applies to the interaction of multiple AI systems, we believe the results could also be of use in economics, evolutionary biology and empirical game theory among others. Game theory is a field of mathematics that is used to analyse the strategies used by decision makers in competitive situations. It can apply to humans, animals, and computers in various situations but is commonly used in AI research to study “multi-agent” environments where there is more than one system, for example several household robots cooperating to clean the house. Traditionally, the evolutionary dynamics of multi-agent systems have been analysed using simple, symmetric games, such as the classic Prisoner’s Dilemma, where each player has access to the same set of actions. Although these games can provide useful insights into how multi-agent systems work and tell us how to achieve a desirable outcome for all players - known as the Nash equilibrium - they cannot model all situations. Our new technique allows us to quickly and easily identify the strategies used to find the Nash equilibrium in more complex asymmetric games - characterised as games where each player has different strategies, goals and rewards. These games - and the new technique we use to understand them - can be illustrated using an example from ‘Battle of the Sexes’, a coordination game commonly used in game theory research. UPDATE 20/03/18: Our latest paper, forthcoming at the Autonomous Agents and Multi-Agent Systems conference (AAMAS), builds on the Scientific Reports paper outlined above. A Generalised Method for Empirical Game Theoretic Analysis introduces a general method to perform empirical analysis of multi-agent interactions, both in symmetric and asymmetric games. The method makes it possible to understand how multi-agent strategies interact, what the attractors are and what the basins of attraction look like, giving an intuitive understanding for the strength of the involved strategies. Furthermore, it explains how many data samples to consider in order to guarantee that the equilibria of the approximating game are sufficiently reliable. We apply the method to several domains, including AlphaGo, Colonel Blotto and Leduc poker.
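As a concrete anchor for the Battle of the Sexes example, the sketch below enumerates the pure-strategy Nash equilibria of a 2x2 asymmetric (bimatrix) game by checking best responses. The payoff numbers are the usual textbook ones and purely illustrative; the paper's actual contribution is an evolutionary-dynamics decomposition of such games, not this enumeration.

```python
import numpy as np

# Battle of the Sexes: the row player prefers Opera, the column player prefers Football.
# Payoff matrices (A for the row player, B for the column player); 0 = Opera, 1 = Football.
A = np.array([[3, 0],
              [0, 2]])
B = np.array([[2, 0],
              [0, 3]])

def pure_nash(A, B):
    """Return all pure-strategy profiles (i, j) where neither player can gain
    by unilaterally deviating."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_best = A[i, j] >= A[:, j].max()   # i is a best response to j
            col_best = B[i, j] >= B[i, :].max()   # j is a best response to i
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

print(pure_nash(A, B))   # [(0, 0), (1, 1)]: both coordinate on Opera or both on Football
```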
  • 193. A Generalised Method for Empirical Game Theoretic Analysis From https://arxiv.org/abs/1803.06376 This paper provides theoretical bounds for empirical game theoretical analysis of complex multi-agent interactions. We provide insights into the empirical meta-game showing that a Nash equilibrium of the meta-game is an approximate Nash equilibrium of the true underlying game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the meta-game analysis methodology to asymmetric games. The state-of-the-art has only considered empirical games in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-agent learning algorithm.
  • 194. DeepMind 2017 Review From https://deepmind.com/blog/2017-deepminds-year-review/ The approach we take at DeepMind is inspired by neuroscience, helping to make progress in critical areas such as imagination, reasoning, memory and learning. Take imagination, for example: this distinctively human ability plays a crucial part in our daily lives, allowing us to plan and reason about the future, but is hugely challenging for computers. We continue to work hard on this problem, this year introducing imagination-augmented agents that are able to extract relevant information from an environment in order to plan what to do in the future. Separately, we made progress in the field of generative models. Just over a year ago we presented WaveNet, a deep neural network for generating raw audio waveforms that was capable of producing better and more realistic-sounding speech than existing techniques. At that time, the model was a research prototype and was too computationally intensive to work in consumer products. Over the last 12 months, our teams managed to create a new model that was 1000x faster. In October, we revealed that this new Parallel WaveNet is now being used in the real world, generating the Google Assistant voices for US English and Japanese. This is an example of the effort we invest in making it easier to build, train and optimise AI systems. Other techniques we worked on this year, such as distributional reinforcement learning, population based training for neural networks and new neural architecture search methods, promise to make systems easier to build, more accurate and quicker to optimise. We have also dedicated significant time to creating new and challenging environments in which to test our systems, including our work with Blizzard to open up StarCraft II for research. But we know that technology is not value neutral. We cannot simply make progress in fundamental research without also taking responsibility for the ethical and social impact of our work. This drives our research in critical areas such as interpretability, where we have been exploring novel methods to understand and explain how our systems work. It’s also why we have an established technical safety team that continued to develop practical ways to ensure that we can depend on future systems and that they remain under meaningful human control.
  • 195. Population Based Training of Neural Networks From https://arxiv.org/abs/1711.09846 Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present Population Based Training (PBT), a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
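A toy exploit/explore loop captures the PBT idea: periodically copy the weights and hyperparameters of the best population members into the worst ones, then perturb the copied hyperparameters. The quadratic "training task", the quartile cutoffs and the 20% perturbation factor below are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
POP, STEPS, INTERVAL = 8, 200, 20

# Each member: a parameter vector theta and a hyperparameter (learning rate).
population = [{"theta": rng.normal(size=2) * 5.0, "lr": 10 ** rng.uniform(-4, 0)}
              for _ in range(POP)]

def train_step(m):
    # Toy task: minimise ||theta||^2 by gradient descent with the member's own lr.
    m["theta"] -= m["lr"] * 2.0 * m["theta"]

def evaluate(m):
    return -float(np.sum(m["theta"] ** 2))      # higher is better

for step in range(1, STEPS + 1):
    for m in population:
        train_step(m)
    if step % INTERVAL == 0:                    # exploit/explore phase
        ranked = sorted(population, key=evaluate, reverse=True)
        top, bottom = ranked[:POP // 4], ranked[-(POP // 4):]
        for loser in bottom:
            winner = top[rng.integers(len(top))]
            loser["theta"] = winner["theta"].copy()              # exploit: copy weights
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])  # explore: perturb hyperparams

best = max(population, key=evaluate)
print("best score:", evaluate(best), "final lr:", best["lr"])
```

The learning-rate copies plus perturbations are what produce the hyperparameter schedule the abstract refers to, rather than a single fixed setting.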
  • 196. Neuroscience Inspired Artificial Intelligence From https://www.cell.com/neuron/fulltext/S0896-6273(17)30509-3 The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields has become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields. In this perspective, we have reviewed some of the many ways in which neuroscience has made fundamental contributions to advancing AI research, and argued for its increasingly important relevance. In strategizing for the future exchange between the two fields, it is important to appreciate that the past contributions of neuroscience to AI have rarely involved a simple transfer of full-fledged solutions that could be directly re-implemented in machines. Rather, neuroscience has typically been useful in a subtler way, stimulating algorithmic-level questions about facets of animal learning and intelligence of interest to AI researchers and providing initial leads toward relevant mechanisms. As such, our view is that leveraging insights gained from neuroscience research will expedite progress in AI research, and this will be most effective if AI researchers actively initiate collaborations with neuroscientists to highlight key questions that could be addressed by empirical work. The successful transfer of insights gained from neuroscience to the development of AI algorithms is critically dependent on the interaction between researchers working in both these fields, with insights often developing through a continual handing back and forth of ideas between fields. In the future, we hope that greater collaboration between researchers in neuroscience and AI, and the identification of a common language between the two fields (Marblestone et al., 2016), will permit a virtuous circle whereby research is accelerated through shared theoretical insights and common empirical advances. We believe that the quest to develop AI will ultimately also lead to a better understanding of our own minds and thought processes. Distilling intelligence into an algorithmic construct and comparing it to the human brain might yield insights into some of the deepest and the most enduring mysteries of the mind, such as the nature of creativity, dreams, and perhaps one day, even consciousness.
  • 197. Toward an Integration of Deep Learning and Neuroscience From https://www.frontiersin.org/articles/10.3389/fncom.2016.00094/full Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
  • 198. Hippocampus Predictive Map From https://deepmind.com/blog/hippocampus-predictive-map/ In our new paper, in Nature Neuroscience, we apply a neuroscience lens to a longstanding mathematical theory from machine learning to provide new insights into the nature of learning and memory. Specifically, we propose that the area of the brain known as the hippocampus offers a unique solution to this problem by compactly summarising future events using what we call a “predictive map.” The hippocampus has traditionally been thought to only represent an animal’s current state, particularly in spatial tasks, such as navigating a maze. This view gained significant traction with the discovery of “place cells” in the rodent hippocampus, which fire selectively when the animal is in specific locations. While this theory accounts for many neurophysiological findings, it does not fully explain why the hippocampus is also involved in other functions, such as memory, relational reasoning, and decision making. Our new theory thinks about navigation as part of the more general problem of computing plans that maximise future reward. Our insights were derived from reinforcement learning, the subdiscipline of AI research that focuses on systems that learn by trial and error. The key computational idea we drew on is that to estimate future reward, an agent must first estimate how much immediate reward it expects to receive in each state, and then weight this expected reward by how often it expects to visit that state in the future. By summing up this weighted reward across all possible states, the agent obtains an estimate of future reward. Similarly, we argue that the hippocampus represents every situation - or state - in terms of the future states which it predicts. For example, if you are leaving work (your current state) your hippocampus might represent this by predicting that you will likely soon be on your commute, picking up your kids from school or, more distantly, at home. By representing each current state in terms of its anticipated successor states, the hippocampus conveys a compact summary of future events, known formally as the “successor representation”. We suggest that this specific form of predictive map allows the brain to adapt rapidly in environments with changing rewards, but without having to run expensive simulations of the future.
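The "successor representation" mentioned above has a compact closed form for a fixed policy: M = (I − γT)^(-1), where T is the policy's state-to-state transition matrix, and the value of every state is then simply M·r. The 4-state chain below is an invented toy environment; it also shows the adaptation point from the slide, since a change in reward only requires re-multiplying by M, not re-learning it.

```python
import numpy as np

gamma = 0.95
# Invented 4-state chain; under the policy the agent mostly moves right and
# loops back from state 3 to state 0.
T = np.array([[0.1, 0.9, 0.0, 0.0],
              [0.0, 0.1, 0.9, 0.0],
              [0.0, 0.0, 0.1, 0.9],
              [0.9, 0.0, 0.0, 0.1]])
r = np.array([0.0, 0.0, 0.0, 1.0])           # reward only in state 3

# Successor representation: expected discounted future occupancy of each state.
M = np.linalg.inv(np.eye(4) - gamma * T)
V = M @ r                                     # state values follow immediately

print(np.round(M, 2))
print("values:", np.round(V, 2))
# If the reward moves (say to state 1), only r changes and M is reused:
print("values after reward change:", np.round(M @ np.array([0.0, 1.0, 0.0, 0.0]), 2))
```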
  • 199. Going Beyond Average for Neural Learning From https://deepmind.com/blog/going-beyond-average-reinforcement-learning/ Randomness is something we encounter everyday and has a profound effect on how we experience the world. The same is true in reinforcement learning (RL) applications, systems that learn by trial and error and are motivated by rewards. Typically, an RL algorithm predicts the average reward it receives from multiple attempts at a task, and uses this prediction to decide how to act. But random perturbations in the environment can alter its behaviour by changing the exact amount of reward the system receives. In a new paper, we show it is possible to model not only the average but also the full variation of this reward, what we call the value distribution. This results in RL systems that are more accurate and faster to train than previous models, and more importantly opens up the possibility of rethinking the whole of reinforcement learning. From https://arxiv.org/abs/1707.06887 In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
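A core operation in the distributional (C51) algorithm from the arXiv paper is projecting the shifted and scaled return distribution r + γZ back onto a fixed support of atoms. A minimal NumPy sketch for a single transition is below; the support bounds and atom count follow the paper's Atari settings, and the uniform next-state distribution is a placeholder.

```python
import numpy as np

def categorical_projection(p_next, r, gamma, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the distribution of r + gamma*Z (Z supported on fixed atoms z)
    back onto the same atoms, spreading each shifted atom's mass over its
    two nearest neighbours."""
    z = np.linspace(v_min, v_max, n_atoms)
    dz = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(r + gamma * z, v_min, v_max)        # shifted atom locations
    b = (tz - v_min) / dz                            # fractional index on the support
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros(n_atoms)
    for j in range(n_atoms):
        if lo[j] == hi[j]:
            m[lo[j]] += p_next[j]
        else:
            m[lo[j]] += p_next[j] * (hi[j] - b[j])
            m[hi[j]] += p_next[j] * (b[j] - lo[j])
    return z, m

p_next = np.full(51, 1.0 / 51)                       # placeholder next-state distribution
z, m = categorical_projection(p_next, r=1.0, gamma=0.99)
print("projected mean:", round(float((z * m).sum()), 3), "total mass:", round(float(m.sum()), 3))
```

The projected distribution m is the target the network's predicted distribution is trained towards (via cross-entropy), instead of a single expected value.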
  • 200. Agents that Imagine and Plan From https://deepmind.com/blog/agents-imagine-and-plan/ In two new papers, we describe a new family of approaches for imagination-based planning. We also introduce architectures which provide new ways for agents to learn and construct plans to maximise the efficiency of a task. These architectures are efficient, robust to complex and imperfect models, and can adopt flexible strategies for exploiting their imagination. Imagination-augmented agents: The agents we introduce benefit from an ‘imagination encoder’ - a neural network which learns to extract any information useful for the agent’s future decisions, but ignore that which is not relevant. These agents have a number of distinct features: • they learn to interpret their internal simulations. This allows them to use models which coarsely capture the environmental dynamics, even when those dynamics are not perfect. • they use their imagination efficiently. They do this by adapting the number of imagined trajectories to suit the problem. Efficiency is also enhanced by the encoder, which is able to extract additional information from imagination beyond rewards - these trajectories may contain useful clues even if they do not necessarily result in high reward. • they can learn different strategies to construct plans. They do this by choosing between continuing a current imagined trajectory or restarting from scratch. Alternatively, they can use different imagination models, with different accuracies and computational costs. This offers them a broad spectrum of effective planning strategies, rather than being restricted to a one-size-fits-all approach which might limit adaptability in imperfect environments.
  • 201. Agents that Imagine and Plan From https://arxiv.org/abs/1707.06203 We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines. From https://arxiv.org/abs/1707.06170 Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex "imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.
  • 202. Creating New Visual Concepts From https://deepmind.com/blog/imagine-creating-new-visual-concepts-recombining-familiar-ones/ In our new paper, we propose a novel theoretical approach to address this problem. We also demonstrate a new neural network component called the Symbol-Concept Association Network (SCAN), that can, for the first time, learn a grounded visual concept hierarchy in a way that mimics human vision and word acquisition, enabling it to imagine novel concepts guided by language instructions. Our approach can be summarised as follows: • The SCAN model experiences the visual world in the same way as a young baby might during the first few months of life. This is the period when the baby’s eyes are still unable to focus on anything more than an arm’s length away, and the baby essentially spends all her time observing various objects coming into view, moving and rotating in front of her. To emulate this process, we placed SCAN in a simulated 3D world of DeepMind Lab, where, like a baby in a cot, it could not move, but it could rotate its head and observe one of three possible objects presented to it against various coloured backgrounds - a hat, a suitcase or an ice lolly. Like the baby’s visual system, our model learns the basic structure of the visual world and how to represent objects in terms of interpretable visual “primitives”. For example, when looking at an apple, the model will learn to represent it in terms of its colour, shape, size, position or lighting. From https://arxiv.org/abs/1707.03389 The seemingly infinite diversity of the natural world arises from a relatively small set of coherent rules, such as the laws of physics or chemistry. We conjecture that these rules give rise to regularities that can be discovered through primarily unsupervised experiences and represented as abstract concepts. If such representations are compositional and hierarchical, they can be recombined into an exponentially large set of new concepts. This paper describes SCAN (Symbol-Concept Association Network), a new framework for learning such abstractions in the visual domain. SCAN learns concepts through fast symbol association, grounding them in disentangled visual primitives that are discovered in an unsupervised manner. Unlike state of the art multimodal generative model baselines, our approach requires very few pairings between symbols and images and makes no assumptions about the form of symbol representations. Once trained, SCAN is capable of multimodal bi-directional inference, generating a diverse set of image samples from symbolic descriptions and vice versa. It also allows for traversal and manipulation of the implicit hierarchy of visual concepts through symbolic instructions and learnt logical recombination operations. Such manipulations enable SCAN to break away from its training data distribution and imagine novel visual concepts through symbolically instructed recombination of previously learnt concepts.
  • 203. Producing Flexible Behaviors in Simulation Environments From https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/ True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics. A trend has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this may require considerable engineering effort, and can result in restricted behaviours or behaviours that may be difficult to repurpose for new tasks. In three new papers, we seek ways to produce flexible and natural behaviours that can be reused and adapted to solve tasks. Read: 'Emergence of locomotion behaviours in rich environments', 'Learning human behaviours from motion capture by adversarial imitation', and 'Robust imitation of diverse behaviours'. Achieving flexible and adaptive control of simulated bodies is a key element of AI research. Our work aims to develop flexible systems which learn and adapt skills to solve motor control tasks while reducing the manual engineering required to achieve this goal. Future work could extend these approaches to enable coordination of a greater range of behaviours in more complex situations.
  • 204. Producing Flexible Behaviors in Simulation Environments. Emergence of locomotion behaviours in rich environments: The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed following this https URL. Learning human behaviours from motion capture by adversarial imitation: Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement patterns from limited demonstrations consisting only of partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters. We leverage this approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when controlled by a higher level controller. Robust imitation of diverse behaviours: Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and then smoothly interpolated with a resulting smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.
  • 205. DQN - Deep Reinforcement Learning From https://deepmind.com/research/dqn Nature Paper https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
  • 206. DQN - Deep Reinforcement Learning Paper From https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
  • 207. Reward Tampering Problems and Solutions in Reinforcement Learning From https://arxiv.org/pdf/1908.04734.pdf
  • 208. PathNet from Google DeepMind From https://arxiv.org/pdf/1701.08734.pdf For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm.
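A stripped-down sketch of the PathNet selection loop: pathways are binary masks over a grid of modules, pairs of pathways are compared in tournaments, and the loser is overwritten by a mutated copy of the winner. The module grid size, the fixed "usefulness" stand-in for trained-path fitness, and the mutation rate below are toy placeholders, not the paper's networks.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYERS, MODULES, ACTIVE, POP = 3, 10, 3, 16

def random_pathway():
    # A pathway selects ACTIVE of the MODULES modules in each layer (a binary mask).
    path = np.zeros((LAYERS, MODULES), dtype=bool)
    for l in range(LAYERS):
        path[l, rng.choice(MODULES, size=ACTIVE, replace=False)] = True
    return path

def fitness(path):
    # Placeholder for "train the parameters along this path and measure task
    # performance": here each module has a fixed hidden usefulness score.
    usefulness = np.linspace(0.0, 1.0, MODULES)
    return float(sum(usefulness[path[l]].sum() for l in range(LAYERS)))

def mutate(path, p=0.1):
    child = path.copy()
    for l in range(LAYERS):
        for m in np.flatnonzero(child[l]):
            if rng.random() < p:                              # move this module choice
                child[l, m] = False
                child[l, rng.choice(np.flatnonzero(~child[l]))] = True
    return child

population = [random_pathway() for _ in range(POP)]
for generation in range(200):
    i, j = rng.choice(POP, size=2, replace=False)              # binary tournament
    winner, loser = (i, j) if fitness(population[i]) >= fitness(population[j]) else (j, i)
    population[loser] = mutate(population[winner])             # overwrite loser with mutated copy

print("best pathway fitness:", round(max(fitness(p) for p in population), 2))
```

In the real system the fitness evaluation is the expensive part (training the path's parameters on the task), and the parameters along a winning path are frozen before the next task is evolved, which is what yields the transfer effect described in the abstract.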
  • 210. 2020 References • Future of Deep Learning https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/ • Turing Award Winners Video https://www.youtube.com/watch?v=UX8OubxsY8w • MIT Deep Learning Video https://www.youtube.com/watch?v=0VH1Lim8gL8
  • 211. Three Challenges of Deep Learning from Yann LeCun From https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/ 1. First, we need to develop AI systems that learn with fewer samples or fewer trials. “My suggestion is to use unsupervised learning, or I prefer to call it self-supervised learning because the algorithms we use are really akin to supervised learning, which is basically learning to fill in the blanks,” LeCun says. “Basically, it’s the idea of learning to represent the world before learning a task. This is what babies and animals do. We run about the world, we learn how it works before we learn any task. Once we have good representations of the world, learning a task requires few trials and few samples.” 2. The second challenge is creating deep learning systems that can reason. Current deep learning systems are notoriously bad at reasoning and abstraction, which is why they need huge amounts of data to learn simple tasks. “The question is, how do we go beyond feed-forward computation and system 1? How do we make reasoning compatible with gradient-based learning? How do we make reasoning differentiable? That’s the bottom line,” LeCun said. System 1 refers to the kind of tasks that don’t require active thinking, such as navigating a known area or making small calculations. System 2 is the more active kind of thinking, which requires reasoning. Symbolic artificial intelligence, the classic approach to AI, has proven to be much better at reasoning and abstraction. 3. The third challenge is to create deep learning systems that can learn and plan complex action sequences, and decompose tasks into subtasks. Deep learning systems are good at providing end-to-end solutions to problems but very bad at breaking them down into specific interpretable and modifiable steps. There have been advances in creating learning-based AI systems that can decompose images, speech, and text. Capsule networks, invented by Geoffrey Hinton, address some of these challenges. But learning to reason about complex tasks is beyond today’s AI. “We have no idea how to do this,” LeCun admits.
  • 212. Foundation Models From https://research.ibm.com/blog/what-are-foundation-models In recent years, we’ve managed to build AI systems that can learn from thousands, or millions, of examples to help us better understand our world, or find new solutions to difficult problems. These large-scale models have led to systems that can understand when we talk or write, such as the natural-language processing and understanding programs we use every day, from digital assistants to speech-to-text programs. Other systems, trained on things like the entire work of famous artists, or every chemistry textbook in existence, have allowed us to build generative models that can create new works of art based on those styles, or new compound ideas based on the history of chemical research. While many new AI systems are helping solve all sorts of real-world problems, creating and deploying each new system often requires a considerable amount of time and resources. For each new application, you need to ensure that there’s a large, well-labelled dataset for the specific task you want to tackle. If a dataset didn’t exist, you’d have to have people spend hundreds or thousands of hours finding and labelling appropriate images, text, or graphs for the dataset. Then the AI model has to learn to recognize everything in the dataset, and then it can be applied to the use case you have, from recognizing language to generating new molecules for drug discovery. And training one large natural-language processing model, for example, has roughly the same carbon footprint as running five cars over their lifetime. The next wave in AI looks to replace the task-specific models that have dominated the AI landscape to date. The future is models that are trained on a broad set of unlabeled data that can be used for different tasks, with minimal fine-tuning. These are called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence. We’ve seen the first glimmers of the potential of foundation models in the worlds of imagery and language. Early examples of models, like GPT-3, BERT, or DALL-E 2, have shown what’s possible. Input a short prompt, and the system generates an entire essay, or a complex image, based on your parameters, even if it wasn’t specifically trained on how to execute that exact argument or generate an image in that way. What makes these new systems foundation models is that they, as the name suggests, can be the foundation for many applications of the AI model. Using self-supervised learning and transfer learning, the model can apply information it’s learnt about one situation to another. While the amount of data is considerably more than the average person needs to transfer understanding from one task to another, the end result is relatively similar: You learn to drive on one car, for example, and without too much effort, you can drive most other cars — or even a truck or a bus.
  • 213. Challenges and Risks of Foundation Models From https://arxiv.org/pdf/2108.07258.pdf AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature. This report investigates an emerging paradigm for building artificial intelligence (AI) systems based on a general class of models which we term foundation models. A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks; current examples include BERT [Devlin et al. 2019], GPT-3 [Brown et al. 2020], and CLIP [Radford et al. 2021]. From a technological point of view, foundation models are not new — they are based on deep neural networks and self-supervised learning, both of which have existed for decades. However, the sheer scale and scope of foundation models from the last few years have stretched our imagination of what is possible; for example, GPT-3 has 175 billion parameters and can be adapted via natural language prompts to do a passable job on a wide range of tasks despite not being trained explicitly to do many of those tasks [Brown et al. 2020]. At the same time, existing foundation models have the potential to accentuate harms, and their characteristics are in general poorly understood. Given their impending widespread deployment, they have become a topic of intense scrutiny [Bender et al. 2021]
  • 214. Capsule Neural Nets From https://en.wikipedia.org/wiki/Capsule_neural_network A Capsule Neural Network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.[1] The idea is to add structures called “capsules” to a convolutional neural network (CNN), and to reuse output from several of those capsules to form more stable (with respect to various perturbations) representations for higher capsules.[2] The output is a vector consisting of the probability of an observation, and a pose for that observation. This vector is similar to what is done for example when doing classification with localization in CNNs. Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched). For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they have linear effects at the part/object level.[3] This can be compared to inverting the rendering of an object of multiple parts. [4]
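The two pieces that distinguish a capsule layer from an ordinary dense layer are the "squash" non-linearity (so that the length of a capsule's output vector behaves like a probability of presence) and routing-by-agreement between lower and higher capsules. A minimal NumPy sketch of both is below, with random prediction vectors standing in for the learned transformation outputs; the capsule counts and dimensions are arbitrary.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: predictions from each of I lower capsules for each of J higher capsules,
    shape (I, J, D). Returns the J higher-capsule output vectors, shape (J, D)."""
    I, J, _ = u_hat.shape
    b = np.zeros((I, J))                                        # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # softmax over higher capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)                   # weighted sum of predictions
        v = squash(s)                                           # higher-capsule outputs
        b += np.einsum("ijd,jd->ij", u_hat, v)                  # agreement updates the routing
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))        # 8 lower capsules, 3 higher capsules, 4-D poses
v = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=-1))          # each length in [0, 1): the capsule's "presence"
```

Routing-by-agreement is what enforces the part/whole consistency discussed above: a lower capsule only sends its output to a higher capsule whose pose it agrees with, which is why a face with a misplaced eye and mouth is not accepted as a face.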
  • 217. DALL-E 2 From https://www.nytimes.com/2022/08/24/technology/ai-technology-progress.html For the past few days, I’ve been playing around with DALL-E 2, an app developed by the San Francisco company OpenAI that turns text descriptions into hyper-realistic images. What’s impressive about DALL-E 2 isn’t just the art it generates. It’s how it generates art. These aren’t composites made out of existing internet images — they’re wholly new creations made through a complex A.I. process known as “diffusion,” which starts with a random series of pixels and refines it repeatedly until it matches a given text description. And it’s improving quickly — DALL-E 2’s images are four times as detailed as the images generated by the original DALL-E, which was introduced only last year. DALL-E 2 got a lot of attention when it was announced this year, and rightfully so. It’s an impressive piece of technology with big implications for anyone who makes a living working with images — illustrators, graphic designers, photographers and so on. It also raises important questions about what all of this A.I.-generated art will be used for, and whether we need to worry about a surge in synthetic propaganda, hyper-realistic deepfakes or even nonconsensual pornography. DALL-E 2 available to all If you've been itching to try OpenAI's image synthesis tool but have been stymied by the lack of an invitation, now's your chance. Today, OpenAI announced that it removed the waitlist for its DALL-E AI image generator service. That means anyone can sign up and use it. DALL-E is a deep learning image synthesis model that has been trained on hundreds of millions of images pulled from the Internet. It uses a technique called latent diffusion to learn associations between words and images. As a result, DALL-E users can type in a text description—called a prompt—and see it rendered visually as a 1024×1024 pixel image in almost any artistic style.
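The "start from random pixels and refine repeatedly" description corresponds to the reverse (sampling) loop of a diffusion model. The DDPM-style sketch below is a generic illustration with a dummy noise-prediction function standing in for DALL-E 2's large, text-conditioned network; the schedule constants are common defaults from the diffusion literature, not OpenAI's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)                 # noise schedule (a common default)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t, prompt):
    """Stand-in for the learned denoiser epsilon(x_t, t, text). A real model would be
    a large neural network conditioned on the text prompt."""
    return 0.1 * x                                  # dummy: gently shrink toward zero

def sample(shape, prompt="a corgi playing a trumpet"):
    x = rng.standard_normal(shape)                  # start from pure noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t, prompt)
        # Reverse step: subtract the predicted noise contribution and rescale.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)   # re-inject a little noise
    return x

img = sample((8, 8, 3))                             # tiny "image" purely for illustration
print(img.shape, round(float(img.mean()), 3))
```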
  • 218. Make-A-Video From https://makeavideo.studio/ Make-A-Video research builds on the recent progress made in text-to-image generation technology built to enable text-to-video generation. The system uses images with descriptions to learn what the world looks like and how it is often described. It also uses unlabeled videos to learn how the world moves. With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-kind videos with just a few words or lines of text. From the Make-A-Video paper: We propose Make-A-Video – an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today’s image generation models. We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules. First, we decompose the full temporal U-Net and attention tensors and approximate them in space and time. Second, we design a spatial temporal pipeline to generate high resolution and frame rate videos with a video decoder, interpolation model and two super resolution models that can enable various applications besides T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures.
  • 219. Concerns for Deep Learning by Gary Marcus From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf Deep Learning thus far: • Is data hungry • Is shallow and has limited capacity for transfer • Has no natural way to deal with hierarchical structure • Has struggled with open-ended inference • Is not sufficiently transparent • Has not been well integrated with prior knowledge • Cannot inherently distinguish causation from correlation • Presumes a largely stable world, in ways that may be problematic • Works well as an approximation, but answers often can’t be fully trusted • Is difficult to engineer with
  • 220. Causal Reasoning and Deep Learning (Advanced)
  • 221. Causal Reasoning and Transfer Learning From 'A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms': We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities. Causal Deep Learning from Bengio From https://www.wired.com/story/ai-pioneer-algorithms-understand-why/ From https://arxiv.org/abs/1901.10912
  • 222. Causal Reasoning and Transfer Learning A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms From https://arxiv.org/abs/1901.10912 Proposition 1. The expected gradient over the transfer distribution of the regret (accumulated negative log-likelihood during the adaptation episode) with respect to the module parameters is zero for the parameters of the modules that (a) were correctly learned in the training phase, and (b) have the correct set of causal parents, corresponding to the ground truth causal graph, if (c) the corresponding ground truth conditional distributions did not change from the training distribution to the transfer distribution. Figure: Adaptation to the transfer distribution, as more transfer distribution examples are seen by the learner (horizontal axis), in terms of the log-likelihood on the transfer distribution (on a large test set from the transfer distribution, tested after each update of the parameters). Here the model is discrete, with N = 10. Curves are the median over 10 000 runs, with 25-75% quantile intervals, for both the correct causal model (blue, top) and the incorrect one (red, bottom). We see that the correct causal model adapts faster (smaller regret), and that the most informative part of the trajectory (where the two models generalize the most differently) is in the first 10-20 examples.
  • 223. Causal Reasoning and Transfer Learning A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms From https://arxiv.org/abs/1901.10912 Equation (2): R = − log [ sigmoid(γ) · L_{A→B} + (1 − sigmoid(γ)) · L_{B→A} ]
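Equation (2) mixes the two candidate causal models' online adaptation likelihoods with a structural weight sigmoid(γ); γ is then updated by gradient descent on the regret R, so the hypothesis that adapts faster wins. The sketch below computes R and its gradient and runs the update with made-up likelihoods for the two hypotheses; the specific numbers and learning rate are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regret_and_grad(gamma, logL_AB, logL_BA):
    """R = -log[ sigmoid(gamma) * L_AB + (1 - sigmoid(gamma)) * L_BA ], where L_* are
    the online likelihoods accumulated while adapting each hypothesis to the transfer
    distribution. Returns R and dR/dgamma."""
    p = sigmoid(gamma)
    L_AB, L_BA = np.exp(logL_AB), np.exp(logL_BA)
    mix = p * L_AB + (1 - p) * L_BA
    R = -np.log(mix)
    dR_dgamma = -p * (1 - p) * (L_AB - L_BA) / mix
    return R, dR_dgamma

# Made-up numbers: the A->B model adapts faster, so its adaptation likelihood is higher.
gamma, lr = 0.0, 0.5
for episode in range(100):
    R, g = regret_and_grad(gamma, logL_AB=-2.0, logL_BA=-4.0)
    gamma -= lr * g                      # gradient step pushes gamma toward the faster adapter
print("P(A causes B) =", round(float(sigmoid(gamma)), 3))
```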
  • 224. Causal Reasoning and Transfer Learning A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms From https://arxiv.org/abs/1901.10912
  • 226. Causal Reasoning and Deep Learning References http://causality.cs.ucla.edu/blog/ http://causality.cs.ucla.edu/ https://www.google.com/search?client=firefox-b-1-d&q=deep+learning+causal+analysis https://arxiv.org/search/?query=causal&searchtype=title&source=header https://arxiv.org/abs/1901.10912 https://www.ericsson.com/en/blog/2020/2/causal-inference-machine-learning https://towardsdatascience.com/introduction-to-causality-in-machine-learning-4cee9467f06f
  • 227. References • Neural Networks and Deep Learning: A Textbook • Deep Learning (Adaptive Computation and Machine Learning series) • The Deep Learning Revolution (The MIT Press) • Introduction to Deep Learning (The MIT Press) • Deep Learning with Python • An Introduction to Deep Reinforcement Learning • World Models • Learning and Querying Fast Generative Models for Reinforcement Learning • Imagination-Augmented Agents for Deep Reinforcement Learning • Neural Networks and Deep Learning: A Textbook • Google Brain • Convolutional Neural Nets (Detailed introduction) • Future of Deep Learning
  • 228. References (cont) • Recurrent Neural Networks • Guide to LSTM and Recurrent Neural Networks • Enterprise Deep Learning • 6 AI Trends for 2019 • Designing Neural Nets through Neural Evolution • Compositional Pattern Producing Networks • Deep Generator Networks • Deep Reinforcement Learning Course • N-Grams • A Beginners Guide to Deep Reinforcement Learning with many links • Verifiable AI from Specifications • Amazon Deep Learning Containers • A Deep Dive in to Deep Learning
  • 229. Google AI References • https://ai.google/research/pubs/?area=AlgorithmsandTheory • https://ai.google/research/pubs/?area=DistributedSystemsandParallelComputing • https://ai.google/research/pubs/?area=MachineTranslation • https://ai.google/research/pubs/?area=MachineIntelligence • https://ai.google/research/pubs/?area=MachinePerception • https://ai.google/research/pubs/?area=DataManagement • https://ai.google/research/pubs/?area=InformationRetrievalandtheWeb • https://ai.google/research/pubs/?area=NaturalLanguageProcessing • https://ai.google/research/pubs/?area=SpeechProcessing • Deep Mind Publications
  • 230. Deep Mind References DeepMind Home page https://deepmind.com/ DeepMind Research https://deepmind.com/research/ https://deepmind.com/research/publications/ DeepMind Blog https://deepmind.com/blog DeepMind Applied https://deepmind.com/applied Deep Compressed Sensing https://arxiv.org/pdf/1905.06723.pdf Deep Mind NIPS Papers https://deepmind.com/blog/deepmind-papers-nips-2017/ DeepMind Papers at ICML 2018 https://deepmind.com/blog/deepmind-papers-icml-2018/ DeepMind Papers at ICLR 2018 https://deepmind.com/blog/deepmind-papers-iclr-2018/ Proceedings of ICML Program 2018 http://proceedings.mlr.press/v97/
  • 231. References (cont) • OpenAI • OpenAI Blog • OpenAI Research • Deep Learning Book Lecture Notes • Deep Learning Course Lecture Notes • Bayesian Deep Learning Resources • Gradient Boosting Algorithms • Deep Mind Research • David Inouye Papers • Jeff Klune’s Research • Jeff Hawkins Books • Numenta • Reinforcement Learning Book