Artificial General
Intelligence 1
Bob Marcus
robert.marcus@et-strategies.com
Part 1 of 4 parts: Artificial Intelligence and Machine Learning
This is a first cut.
More details will be added later.
Part 1: Artificial Intelligence (AI)
Part 2: Natural Intelligence (NI)
Part 3: Artificial General Intelligence (AI + NI)
Part 4: Networked AGI Layer on top of Gaia and Human Society
Four Slide Sets on Artificial General Intelligence
AI = Artificial Intelligence (Task)
AGI = Artificial Mind (Simulation)
AB = Artificial Brain (Emulation)
AC = Artificial Consciousness (Synthetic)
AI < AGI <? AB < AC (Is a partial brain emulation needed to create a mind?)
Mind is not required for task proficiency
Full Natural Brain architecture is not required for a mind
Consciousness is not required for a natural brain architecture
Philosophical Musings 10/2022
Focused Artificial Intelligence (AI) will get better at specific tasks
Specific AI implementations will probably exceed human performance in most tasks
Some will attain superhuman abilities in a wide range of tasks
“Common Sense” = low-level experiential broad knowledge could be an exception
Some AIs could use brain-inspired architectures to improve complex task performance
This is not equivalent to human or artificial general intelligence (AGI)
However, networking task-centric AIs could provide a first step towards AGI
This is similar to the way human society achieves power from communication
The combination of the networked AIs could be the foundation of an artificial mind
In a similar fashion, human society can accomplish complex tasks without being conscious
Distributed division of labor enables tasks to be assigned to the most competent element
Networked humans and AIs could cooperate through brain-machine interfaces
In the brain, consciousness provides direction to the mind
In large societies, governments perform the role of conscious direction
With networked AIs, a “conscious operating system” could play a similar role.
This would probably have to be initially programmed by humans.
If the AI network included sensors, actuators, and robots it could be aware of the world
The AI network could form a grid managing society, biology, and geology layers
A conscious AI network could develop its own goals beyond efficient management
Humans in the loop could be valuable in providing common sense and protective oversight
Outline
Classical AI
Knowledge Representation
Agents
Classical Machine Learning
Deep Learning
Deep Learning Models
Deep Learning Hardware
Reinforcement Learning
Google Research
Computing and Sensing Architecture
IoT and Deep Learning
DeepMind
Deep Learning 2020
Causal Reasoning and Deep Learning
References
Classical AI
Classical Paper Awards 1999-2022
Top 100 AI Start-ups
From https://singularityhub.com/2020/03/30/the-top-100-ai-startups-out-there-now-and-what-theyre-working-on/
Classical AI Tools
Lisp
https://en.wikipedia.org/wiki/Lisp_(programming_language)
Prolog
https://www.geeksforgeeks.org/prolog-an-introduction/
Knowledge Representation
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
Decision Trees
https://en.wikipedia.org/wiki/Decision_tree
Forward and Backward Chaining
https://www.section.io/engineering-education/forward-and-backward-chaining-in-ai/
Constraint Satisfaction
https://en.wikipedia.org/wiki/Constraint_satisfaction
OPS5
https://en.wikipedia.org/wiki/OPS5
Classical AI Systems
CYC
https://en.wikipedia.org/wiki/Cyc
Expert Systems
https://en.wikipedia.org/wiki/Expert_system
XCON
https://en.wikipedia.org/wiki/Xcon
MYCIN
https://en.wikipedia.org/wiki/Mycin
MYCON
https://www.slideshare.net/bobmarcus/1986-multilevel-constraintbased-configuration-article
https://www.slideshare.net/bobmarcus/1986-mycon-multilevel-constraint-based-configuration
Knowledge Representation
Stored Knowledge Base
From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
Pre-defined Models
From https://intelligence.org/2015/07/27/miris-approach/
Agents
AI Agents
From https://www.geeksforgeeks.org/agents-artificial-intelligence/
Intelligent Agents
From https://en.wikipedia.org/wiki/Intelligent_agent
In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve
its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human
being, as is any system that meets the definition, such as a firm, a state, or a biome.[1]
Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of
intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1]
An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize
the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired
behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4]
Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science,
ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations.
Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called
abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of
human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
Node in Real-Time Control System (RCS) by Albus
From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
Intelligent Agents for Network Management
From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
Intelligent Agents on the Web
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf
Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover,
they can evaluate information obtained online from heterogeneous sources and present information tailored to an
individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also
exploring new opportunities and trends for intelligent agents.
IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) to intelligent
mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The
requirements for IAs remain open for discussion. An agent should be able to:
• interact with humans and other agents
• anticipate user needs for information
• adapt to changes in user needs and the environment
• cope with heterogeneity of information and other agents.
The following attributes characterize an IA-based system’s main capabilities:
• Intelligence. The method an agent uses to develop its intelligence includes using the agent’s own software
content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks.
• Continuity. An agent is a continuously running process that can detect changes in its environment, modify its
behavior, and update its knowledge base (which describes the environment).
• Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users
directly by using appropriate interfaces.
• Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and
monitored profiles.
• Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the
agent is transferred to a distant system, to a situation in which the agent creates new agents, dies, or executes partially during migration.
Smart Agents 2022 Comparison
From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html
When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon,
there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types
of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and
frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves.
There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than
gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that
many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition.
Accessibility and trends
Ease of setup
Voice recognition
Success of queries and ability to understand context
Bottom line
None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual
assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google
Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so
close!).
Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware
preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience
and what hardware you already have.
IFTTT = "if this, then that," is a service that lets you connect apps, services, and smart home devices.
Amazon Alexa
From https://en.wikipedia.org/wiki/Amazon_Alexa
Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser
named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo
Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making
to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time
information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users
are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other
settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition,
natural language processing, and other forms of weak AI to perform these tasks.[6]
Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices
(such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate
Alexa's listening mode, although some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
Google Assistant
From https://en.wikipedia.org/wiki/Google_Assistant
Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home
automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the
company's previous virtual assistant, Google Now.
Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home.
After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February
2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on
the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has
been further extended to support a large variety of devices, including cars and third-party smart home appliances. The
functionality of the Assistant can also be enhanced by third-party developers.
Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is
able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the
user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and
gather visual information through the device's camera, and support purchasing products and sending money.
Apple Siri
https://en.wikipedia.org/wiki/Siri
Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1]
[2] It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make
recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual
language usages, searches and preferences, returning individualized results.
Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided
by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and
Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app
for iOS in February 2010. Two months later, Apple acquired it and integrated it into the iPhone 4S at its release on October 4, 2011, removing the
separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware
devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod.
Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and
reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with
iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging
apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support
for follow-up questions, language translation, and additional third-party actions.
Microsoft Cortana
From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant)
Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders
and answering questions for the user.
Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending
on the software platform and region in which it is used.[8]
Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split
from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on
March 31, 2021, the Cortana mobile app was shut down globally.[13]
Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's
Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for
websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana
integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing
and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30]
In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium
speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features
currently available with Cortana.[42]
Classical Machine Learning
Machine Learning Types
From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
Perceptron
From https://deepai.org/machine-learning-glossary-and-terms/perceptron
How does a Perceptron work?
The process begins by taking all the input values and multiplying them by their weights. Then, all of these
multiplied values are added together to create the weighted sum. The weighted sum is then applied to the
activation function, producing the perceptron's output. The activation function plays the integral role of
ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that
the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the
ability to shift the activation function curve up or down.
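As a concrete illustration, a minimal Python/NumPy sketch of the forward pass just described (the weights and bias are hypothetical, chosen so the unit behaves like a logical AND gate):

import numpy as np

def perceptron_output(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    weighted_sum = np.dot(w, x) + b      # multiply inputs by their weights, add the bias
    return 1 if weighted_sum > 0 else 0  # step activation maps the sum to {0, 1}

w = np.array([1.0, 1.0])  # hypothetical weights (input strengths)
b = -1.5                  # hypothetical bias shifts the decision boundary
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, b))  # prints 0, 0, 0, 1 (an AND gate)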
Ensemble Machine Learning
From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/
Ensemble learning is a general meta approach to machine learning that seeks better predictive
performance by combining the predictions from multiple models.
Although there are a seemingly unlimited number of ensembles that you can develop for your predictive
modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that
rather than algorithms per se, each is a field of study that has spawned many more specialized methods.
The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is
important to both have a detailed understanding of each method and to consider them on your predictive
modeling project.
But, before that, you need a gentle introduction to these approaches and the key ideas behind each method
prior to layering on math and code.
In this tutorial, you will discover the three standard ensemble learning techniques for machine learning.
After completing this tutorial, you will know:
• Bagging involves fitting many decision trees on different samples of the same dataset and averaging
the predictions.
• Stacking involves fitting many different model types on the same data and using another model to
learn how to best combine the predictions.
• Boosting involves adding ensemble members sequentially that correct the predictions made by prior
models and outputs a weighted average of the predictions.
Bagging
From https://en.wikipedia.org/wiki/Bootstrap_aggregating
Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble
meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in
statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it
is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special
case of the model averaging approach.
Given a standard training set D of size n, bagging generates m new training sets Di, each of size nʹ, by
sampling from D uniformly and with replacement. By sampling with replacement, some observations may
be repeated in each Di. If nʹ = n, then for large n the set Di is expected to have the fraction (1 - 1/e) (≈63.2%) of
the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample.
Sampling with replacement ensures each bootstrap is independent from its peers, as it does not depend on
previous chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples
and combined by averaging the output (for regression) or voting (for classification).
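A minimal bagging sketch with scikit-learn (assuming scikit-learn is available; the dataset and parameters are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is fitted on a bootstrap sample (drawn with replacement) of the
# training set; predictions are combined by majority vote.
bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           max_samples=1.0, bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)
print("bagging accuracy:", bagger.score(X_test, y_test))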
Boosting
From https://www.ibm.com/cloud/learn/boosting and
https://en.wikipedia.org/wiki/Boosting_(machine_learning)
In machine learning, boosting is an ensemble meta-algorithm primarily for reducing bias, and also variance[1]
in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones.
Bagging vs Boosting
Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study,
the main difference between these learning methods is the way in which they are trained.
In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of
models are constructed and with each new model iteration, the weights of the misclassified data in the previous
model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus
on to improve its performance. AdaBoost, which stands for “adaptive boosting,” is one of the most
popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include
XGBoost, GradientBoost, and BrownBoost.
Another difference between bagging and boosting is in how they are used. For example, bagging methods are
typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged
when low variance and high bias are observed. While bagging can be used to avoid overfitting, boosting methods
can be more prone to it, although this really depends on the dataset. However, parameter
tuning can help avoid the issue.
As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for
loan approval processes and statistical genomics while boosting has been used more within image recognition
apps and search engines.
Boosting is an ensemble learning method that combines a set of weak learners into a strong learner
to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and
then trained sequentially—that is, each model tries to compensate for the weaknesses of its
predecessor. With each iteration, the weak rules from each individual classifier are combined to form
one, strong prediction rule.
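A minimal AdaBoost sketch with scikit-learn, illustrating the sequential re-weighting described above (dataset and parameters are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Depth-1 trees ("stumps") act as weak learners; each round up-weights the
# training points that the previous stump misclassified.
booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, learning_rate=0.5, random_state=0)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", booster.score(X_test, y_test))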
Stacking
From https://www.geeksforgeeks.org/stacking-in-machine-learning/
Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble
models; the most widely known are bagging and boosting. Bagging averages multiple similar models with high
variance to decrease variance. Boosting builds multiple incremental models to decrease the bias, while
keeping variance small.
Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a
space of different models for the same problem. The idea is that you can attack a learning problem with different
types of models, each capable of learning some part of the problem but not the whole problem space. So, you
can build multiple different learners and you use them to build an intermediate prediction, one prediction for each
learned model. Then you add a new model which learns the same target from the intermediate predictions.
This final model is said to be stacked on the top of the others, hence the name. Thus, you might improve your overall
performance, and often you end up with a model which is better than any individual intermediate model. Notice
however, that it does not give you any guarantee, as is often the case with any machine learning technique.
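A minimal stacking sketch with scikit-learn (the base learners and the logistic-regression meta-learner are illustrative choices, not recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two different base learners produce intermediate predictions; a logistic
# regression "meta-learner" is trained (via cross-validation) to combine them.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))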
Gradient Boosting
From https://en.wikipedia.org/wiki/Gradient_boosting
Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a
prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a
decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random
forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes
the other methods by allowing optimization of an arbitrary differentiable loss function.
Introduction to XGBoost
From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
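A minimal gradient-boosted-trees sketch using scikit-learn's GradientBoostingClassifier (dataset and parameters are illustrative; xgboost's XGBClassifier exposes a similar scikit-learn-compatible interface and could be swapped in):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added stage-wise; each new tree is fitted to the gradient of the
# loss with respect to the current ensemble's predictions.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
print("gradient boosting accuracy:", gbm.score(X_test, y_test))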
Terminology
・SoftMax https://en.wikipedia.org/wiki/Softmax_function
・SoftPlus https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Softplus
・Logit https://en.wikipedia.org/wiki/Logit
・Sigmoid https://en.wikipedia.org/wiki/Sigmoid_function
・Logistic Function https://en.wikipedia.org/wiki/Logistic_function
・Tanh https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
・ReLu https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
・Maxpool Selects the maximum within each subset (pooling window) of a convolutional neural network layer
・
Relationships (reconstructed from the table on this slide)
・Sigmoid = Logistic: 1/(1 + e^-z); range (0, 1); equal to the first component of SoftMax(z, 0)
・Tanh: range (-1, 1); equal to the first component minus the second component of SoftMax(z, -z)
・SoftPlus: log(1 + e^z); its derivative is the Sigmoid
・Logit: log(p/(1 - p)); range (-∞, +∞); the inverse of the Sigmoid, with log(SoftMax(z1, z2) first component / SoftMax(z1, z2) second component) = z1 - z2
・ReLu: max(0, x); output is either 0 or x
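A short NumPy check of the relationships listed above (the function definitions follow the standard formulas):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract the max for numerical stability
    return e / e.sum()

sigmoid  = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic function
softplus = lambda z: np.log1p(np.exp(z))        # smooth approximation of ReLU
logit    = lambda p: np.log(p / (1.0 - p))      # inverse of the sigmoid

z, h = 0.7, 1e-5
print(np.isclose(sigmoid(z), softmax(np.array([z, 0.0]))[0]))        # sigmoid = SoftMax(z, 0) first component
print(np.isclose(np.tanh(z), softmax(np.array([z, -z]))[0]
                             - softmax(np.array([z, -z]))[1]))       # tanh from SoftMax(z, -z)
print(np.isclose(logit(sigmoid(z)), z))                              # logit inverts the sigmoid
print(np.isclose((softplus(z + h) - softplus(z - h)) / (2 * h),
                 sigmoid(z)))                                        # derivative of softplus is the sigmoid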
Terminology (continued)
・Heteroscedastic https://en.wiktionary.org/wiki/scedasticity
・Maxout https://stats.stackexchange.com/questions/129698/what-is-maxout-in-neural-network/298705
・Cross-Entropy https://en.wikipedia.org/wiki/Cross_entropy H(P, Q) = -Ep(log q)
・Joint Entropy https://en.wikipedia.org/wiki/Joint_entropy H(X, Y) = -Ep(x,y)(log p(x, y))
・KL Divergence https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence KL(P, Q) = Ep(log p - log q)
・H(P, Q) = H(P) + KL(P, Q), i.e. -Ep(log q) = -Ep(log p) + {Ep(log p) - Ep(log q)}
・Mutual Information https://en.wikipedia.org/wiki/Mutual_information I(X; Y) = KL(p(x, y), p(x)p(y))
・Ridge Regression and Lasso Regression
https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece
・Logistic Regression https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf
・Dropout https://en.wikipedia.org/wiki/Dropout_(neural_networks)
・RMSProp and AdaGrad and AdaDelta and Adam
https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM
・Pooling https://www.quora.com/Is-pooling-indispensable-in-deep-learning
・Boltzmann Machine https://en.wikipedia.org/wiki/Boltzmann_machine
・Hyperparameters
・
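A short NumPy check of the cross-entropy identity listed above (the two distributions are arbitrary examples):

import numpy as np

def entropy(p):            # H(P) = -Ep(log p)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):   # H(P, Q) = -Ep(log q)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):   # KL(P, Q) = Ep(log p - log q)
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
# Identity: H(P, Q) = H(P) + KL(P, Q)
print(np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))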
Reinforcement Learning Book
From https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
Acumos Shared Model Process Flow
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
Distributed AI
From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence
Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence[1] is a subfield of artificial intelligence research dedicated to the
development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems.
The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence,
especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires:
• A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled
• Coordination of the actions and communication of the nodes
• Subsamples of large data sets and online machine learning
There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the
following:
• Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters
of computers can be used to speed up calculation.
• Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an
abstraction for developing DPS systems. See below for further details.
• Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at
macro level but also at micro level, as it is in many social simulation scenarios.
Swarm Intelligence
From https://en.wikipedia.org/wiki/Swarm_intelligence
Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work
on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1]
SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The
inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control
structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to
the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant
colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence.
The application of swarm principles to robots is called swarm robotics while swarm intelligence refers to the more general set of algorithms. Swarm
prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered
for genetically modified organisms in synthetic collective intelligence.[4]
• 1 Models of swarm behavior
◦ 1.1 Boids (Reynolds 1987)
◦ 1.2 Self-propelled particles (Vicsek et al. 1995)
• 2 Metaheuristics
◦ 2.1 Stochastic diffusion search (Bishop 1989)
◦ 2.2 Ant colony optimization (Dorigo 1992)
◦ 2.3 Particle swarm optimization (Kennedy, Eberhart & Shi 1995)
◦ 2.4 Artificial Swarm Intelligence (2015)
• 3 Applications
◦ 3.1 Ant-based routing
◦ 3.2 Crowd simulation
▪ 3.2.1 Instances
◦ 3.3 Human swarming
◦ 3.4 Swarm grammars
◦ 3.5 Swarmic art
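As a concrete example of one of the metaheuristics listed above, a hedged NumPy sketch of particle swarm optimization minimizing a toy function (the coefficients are typical textbook defaults, not taken from any cited source):

import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))      # particle positions
    v = np.zeros_like(x)                            # particle velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Each particle is pulled toward its own best and the swarm's best position
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

print(pso(lambda p: np.sum(p ** 2)))  # the swarm should converge near the origin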
IBM Watson
From https://en.wikipedia.org/wiki/IBM_Watson
IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's
DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO,
industrialist Thomas J. Watson.[4][5]
Software -Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system
was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop
framework to provide distributed computing.[12][13][14]
Hardware -The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate
hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-
core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson
can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost
at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18]
According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human
Jeopardy champions.[16]
Data -The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases,
taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries,
encyclopedias and other reference material, that it could use to build its knowledge.[20]
From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
Deep Learning
Three Types of Deep Learning
From https://www.slideshare.net/TerryTaewoongUm/introduction-to-deep-learning-with-tensorflow
Convolutional Neural Networks
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
Convolutional Neural Nets Comparison (2016)
From https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae
Reference: https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
Recurrent Neural Networks
From https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
From colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks and Long Short Term Memory
Dynamical System View on Recurrent Neural Networks
From https://openreview.net/pdf?id=ryxepo0cFX
From https://arxiv.org/pdf/1412.3555v1.pdf
Gated Recurrent Units vs Long Short Term Memory
Deep Learning Models
From https://arxiv.org/pdf/1712.04301.pdf
Neural Net Models
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
Neural Net Models (cont)
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
TensorFlow
From https://en.wikipedia.org/wiki/TensorFlow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of
tasks but has a particular focus on training and inference of deep neural networks.[4][5]
TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version
was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0,
in September 2019.[10]
TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as Javascript, C++, and Java.[11]
This flexibility lends itself to a range of applications in many different sectors.
Keras
From https://en.wikipedia.org/wiki/Keras
Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts
as an interface for the TensorFlow library.
Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit,
Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast
experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was
developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot
Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also
the author of the Xception deep neural network model.[5]
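A minimal sketch of defining and compiling a network with the Keras Sequential API on top of TensorFlow (the layer sizes and the MNIST-style 784-dimensional input are illustrative assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),                     # regularization by dropout
    tf.keras.layers.Dense(10, activation="softmax"),  # 10-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5) would then train it on flattened image data.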
Comparison of Deep Learning Frameworks
From https://arxiv.org/pdf/1903.00102.pdf
Popularity of Deep Learning Frameworks
From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
Acronyms in Deep Learning
• RBM - Restricted Boltzmann Machines
• MLP - Multi-layer Perceptron
• DBN - Deep Belief Network
• CNN - Convolutional Neural Network
• RNN - Recurrent Neural Network
• SGD - Stochastic Gradient Descent
• XOR - Exclusive Or
• SVM - Support Vector Machine
• ReLu - Rectified Linear Unit
• MNIST - Modified National Institute of Standards and Technology
• RBF - Radial Basis Function
• HMM - Hidden Markov Model
• MAP - Maximum A Posteriori
• MLE - Maximum Likelihood Estimate
• Adam - Adaptive Moment Estimation
• LSTM - Long Short Term Memory
• GRU - Gated Recurrent Unit
Concerns for Deep Learning by Gary Marcus
From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
Deep Learning thus far:
• Is data hungry
• Is shallow and has limited capacity for transfer
• Has no natural way to deal with hierarchical structure
• Has struggled with open-ended inference
• Is not sufficiently transparent
• Has not been well integrated with prior knowledge
• Cannot inherently distinguish causation from correlation
• Presumes a largely stable world, in ways that may be problematic
• Works well as an approximation, but answers often can’t be fully trusted
• Is difficult to engineer with
Watson Architecture
From https://seekingalpha.com/article/4087604-much-artificial-intelligence-ibm-watson
How transferable are features in deep neural networks?
From http://cs231n.github.io/transfer-learning/
Transfer Learning
From https://www.mathematik.hu-berlin.de/~perkowsk/files/thesis.pdf
More Transfer Learning
From https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
More Transfer Learning
From http://ruder.io/transfer-learning/
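A hedged Keras sketch of feature-extraction transfer learning: reuse an ImageNet-pretrained backbone and train only a new classification head (the MobileNetV2 backbone, the 160x160 input size, and the 5-class head are illustrative assumptions):

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False   # freeze the pretrained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # hypothetical 5-class target task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)
# Optionally unfreeze the top layers of `base` afterwards and fine-tune at a low learning rate.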
Bayesian Deep Learning
From https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
Bayesian Learning via Stochastic Gradient Langevin Dynamics
From https://tinyurl.com/22xayz76
In this paper we propose a new framework for learning from large scale
datasets based on iterative learning from small minibatches. By adding the
right amount of noise to a standard stochastic gradient optimization
algorithm we show that the iterates will converge to samples from the true
posterior distribution as we anneal the stepsize. This seamless transition
between optimization and Bayesian posterior sampling provides an in-
built protection against overfitting. We also propose a practical method for
Monte Carlo estimates of posterior statistics which monitors a “sampling
threshold” and collects samples after it has been surpassed. We apply the
method to three models: a mixture of Gaussians, logistic regression and
ICA with natural gradients.
Our method combines Robbins-Monro type algorithms which stochastically
optimize a likelihood, with Langevin dynamics which injects noise into the
parameter updates in such a way that the trajectory of the parameters will
converge to the full posterior distribution rather than just the maximum a
posteriori mode. The resulting algorithm starts off being similar to stochastic
optimization, then automatically transitions to one that simulates samples from
the posterior using Langevin dynamics.
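A minimal NumPy sketch of a single SGLD update along the lines described above (the function names and signatures are assumptions for illustration, not the paper's code):

import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, N, eps, rng):
    """One Stochastic Gradient Langevin Dynamics update of the parameters theta."""
    n = len(minibatch)
    # Stochastic gradient of the log posterior, rescaled from the minibatch to the full dataset
    grad = grad_log_prior(theta) + (N / n) * sum(grad_log_lik(theta, x) for x in minibatch)
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)  # injected Gaussian noise, variance eps
    return theta + 0.5 * eps * grad + noise

# Annealing the stepsize eps toward zero over iterations moves the iterates from
# stochastic optimization toward (approximate) samples from the posterior.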
Deterministic Variational Inference for Robust Bayesian NNs
From https://openreview.net/pdf?id=B1l08oAct7
Bayesian Deep Learning Survey
From https://arxiv.org/pdf/1604.01662.pdf
Conclusion and Future Research
In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep
learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by
organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several
algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and
stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity
both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks
involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal
and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic
models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link
prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that
need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural
networks (BNN), BDL with BNN as an important component is expected to be more and more scalable.
Ensemble Methods for Deep Learning
From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Comparing Loss Functions
From Neural Networks and Deep Learning Book
Seed Reinforcement Learning from Google
From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html
The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in
part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In
the very recent history, focus on massive scale has been key to solve a number of complicated games such as
AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017).
The sheer amount of environment data needed to solve tasks trivial to humans makes distributed machine
learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks:
running environments, model inference, model training, replay buffer, etc. and current state-of-the-art distributed
algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of
resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are
scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources.
We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently
utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast
streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve
state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/
tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. The cost of running SEED is analyzed
against IMPALA (Espeholt et al., 2018) which is a commonly used state-of-the-art distributed RL algorithm (Veeriah
et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et
al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being
significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally,
the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix
A.4 for details), making it easy to reproduce results and try novel ideas.
Designing Neural Nets through Neuroevolution
From tinyurl.com/mykhb52y
Much of recent machine learning has focused on deep learning, in which neural network weights are trained through
variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses
evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of
an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based
approaches, including learning neural network building blocks (for example activation functions), hyperparameters,
architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep
reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive
parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-
based neural network research, it has developed many unique and effective techniques that should be effective in other
machine learning areas too.
This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty
and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope
is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight
how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep
reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical
tool in the long-term pursuit of artificial general intelligence.
Illuminating Search Spaces by Mapping Elites
From https://arxiv.org/pdf/1504.04909.pdf
From https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/#implementationjump
Reinforcement Learning with Prediction-based Rewards
From https://arxiv.org/pdf/1412.3555v1.pdf
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily
in the fields of natural language processing (NLP)[1] and computer vision (CV).[2]
Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation
and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input
sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than
RNNs and therefore reduces training times.[1]
Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short-
term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional
Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus
and Common Crawl, and can be fine-tuned for specific tasks.[4][5]
Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to
a learned measure of relevance, providing relevant information about far-away tokens. When added to RNNs, attention mechanisms increase performance. The development of
the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the
quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel,
which leads to improved training speed.
Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively
one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings
that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the
opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes
use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an
additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. Both the encoder
and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.
Transformers
From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
Transformers
Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added
attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that
provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1]
Sequential processing
Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To
process the n-th token, the model combines the state representing the sentence up to token n - 1 with the information of the new token to create a new
state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at
every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient
problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of
token computations on results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware.
This can make the training of RNNs inefficient.
Self-Attention
These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the
sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant
information about far-away tokens.
A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an
English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input.
However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing
the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary
knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the
decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to
attend to each English input state vector.
When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention
mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs
with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed
for all tokens in parallel, which leads to improved training speed.
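A minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation described above (shapes and random projection matrices are illustrative only):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))        # one token embedding per row

Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv               # query, key and value projections

scores = Q @ K.T / np.sqrt(d_k)                # every token scores every other token
weights = softmax(scores, axis=-1)             # learned relevance; each row sums to 1
output = weights @ V                           # context-mixed token representations
print(output.shape)                            # (seq_len, d_k)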
From https://en.wikipedia.org/wiki/GPT-3
GPT-3
Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to
produce human-like text.
The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long
context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is
trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San
Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning
parameters. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020,[3] is part of a trend in natural language
processing (NLP) systems of pre-trained language representations.[1]
The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human,
which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper
introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David
Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5]
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive
output, but only Microsoft has access to GPT-3's underlying model.[6]
An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency
equivalent to that of a human.[7]
OpenAI
From https://openai.com/
Recent Research
Efficient Training of Language Models to Fill in the Middle
Hierarchical Text-Conditional Image Generation with CLIP Latents
Formal Mathematics Statement Curriculum Learning
Training language models to follow instructions with human feedback
Text and Code Embeddings by Contrastive Pre-Training
WebGPT: Browser-assisted question-answering with human feedback
Training Verifiers to Solve Math Word Problems
Recursively Summarizing Books with Human Feedback
Evaluating Large Language Models Trained on Code
Process for Adapting Language Models to
Society (PALMS) with Values-Targeted Datasets
Multimodal Neurons in Artificial Neural Networks
Learning Transferable Visual Models From Natural Language Supervision
Zero-Shot Text-to-Image Generation
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
From https://deepbrainai.io/?www.deepbrainai.io=
Deep Brain
Reservoir Computing
From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/
Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN),
independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines
(LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in
leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping, it is only necessary to use a
simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture
should be capable of any computation [3]. The intuition was born from the observation that in training RNNs, most of the time the weights showing the most
change were the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static
nature of this implementation, ESNs can usually yield faster results, and in some cases better ones, in particular when dealing with chaotic time series predictions [5].
But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the
system to different regions of the configuration space [3]. This is called the separability condition. Furthermore, an important property for the reservoirs of
ESNs is the Echo State property, which states that inputs to the reservoir echo in the system forever, or until they dissipate. A more formal definition of this
property can be found in [6].
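A minimal echo state network sketch in Python/NumPy, following the description above: the random reservoir is fixed, and only the linear readout is trained (here by ridge regression). The task, reservoir size, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 300

# Fixed random reservoir (never trained); scaling the spectral radius below 1
# makes inputs "echo" and eventually dissipate (echo state property).
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave
u = np.sin(np.linspace(0, 20 * np.pi, 2000))
X, y = run_reservoir(u[:-1]), u[1:]

# Train only the linear readout (ridge regression)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```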
Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series
data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the
algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be
optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices,
fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing
benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing.
A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical
goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these
algorithms tend to be data hungry, requiring long observation times and substantial computational resources.
Reservoir computing1,2 is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic3 or complex
spatiotemporal behaviors4, which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease.
From https://www.nature.com/articles/s41467-021-25801-2
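The nonlinear vector autoregression view mentioned in the abstract above can be sketched by replacing the random reservoir with explicit time-delay and quadratic features feeding the same linear (ridge) readout. This simplified single-variable sketch is an assumption-laden illustration, not the paper's exact feature construction.

```python
import numpy as np

def nvar_features(u, k=2):
    """Linear part: k delayed copies of the input; nonlinear part: their unique pairwise products."""
    rows = []
    for t in range(k, len(u)):
        lin = u[t - k:t]                                    # delay embedding
        quad = np.outer(lin, lin)[np.triu_indices(k)]       # unique quadratic monomials
        rows.append(np.concatenate(([1.0], lin, quad)))     # constant + linear + quadratic
    return np.array(rows)

# One-step-ahead prediction of a noisy sine, as a stand-in for a dynamical system
rng = np.random.default_rng(1)
u = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.01 * rng.standard_normal(2000)
k = 4
X, y = nvar_features(u, k), u[k:]

# Same linear readout as the reservoir case, but no random matrices are needed
ridge = 1e-6
W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
print("train MSE:", np.mean((X @ W - y) ** 2))
```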
Reservoir Computing Trends
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.514&rep=rep1&type=pdf
Brain Connectivity meets Reservoir Computing
From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1
The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs).
Can the wiring of actual brains help improve ANNs architectures? Can we learn from ANNs about what network features support
computation in the brain when solving a task?
ANNs’ architectures are carefully engineered and have crucial importance in many recent performance improvements. On the
other hand, BNNs exhibit complex emergent connectivity patterns. At the individual level, BNN connectivity results from brain
development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role
in shaping connectivity.
Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform
concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain
connectivity features only on abstract dynamical properties, while the implications of real brain network topologies for
machine learning or cognitive tasks have barely been explored.
Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks,
which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain
connectivity patterns on task solving.
We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo
state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework,
bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show
the crucial importance of the diversity of interareal connectivity patterns, stressing the role of stochastic processes in
determining neural network connectivity in general.
Deep Learning Models
Sharing Models
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
Summary of Deep Learning Models: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning Acronyms
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning Hardware
From https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d
Deep Learning MIT
From https://deeplearning.mit.edu/
ONNX
From http://onnx.ai/
GitHub ONNX Models
From https://github.com/onnx/models
HPC vs Big Data Ecosystems
From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
HPC and ML
From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf
• HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided machine learning), which are then used to understand experimental data or simulations.
• MLforHPC: Using ML to enhance HPC applications and systems.
This categorization is related to Jeff Dean’s ”Machine Learning for Systems and Systems for Machine Learning” [6] and Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as:
• HPCrunsML: Using HPC to execute ML with high performance.
• SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations.
We also subdivide MLforHPC as:
• MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Already, autotuning with systems like ATLAS is hugely successful and gives an initial view of MLautotuning. As well as choosing block sizes to improve cache use and vectorization, MLautotuning can also be used for simulation mesh sizes [8] and in big data problems for configuring databases and complex systems like Hadoop and Spark [9], [10].
• MLafterHPC: ML analyzing results of HPC, as in trajectory analysis and structure identification in biomolecular simulations.
• MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations. The same ML wrapper can also learn configurations as well as results. This differs from SimulationTrainedML, where typically a learnt network is used to redirect observation, whereas in MLaroundHPC we are using the ML to improve the HPC performance.
• MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11]. Here the simulation surrogates are very valuable to allow real-time predictions.
Designing Neural Nets through Neuroevolution
From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
Go Explore Algorithm
From http://www.evolvingai.org/files/1901.10995.pdf
Deep Density Destructors
From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf
We propose a unified framework for deep density models by formally defining density
destructors. A density destructor is an invertible function that transforms a given density to
the uniform density—essentially destroying any structure in the original density. This
destructive transformation generalizes Gaussianization via ICA and more recent
autoregressive models such as MAF and Real NVP. Informally, this transformation can be
seen as a generalized whitening procedure or a multivariate generalization of the univariate
CDF function. Unlike Gaussianization, our destructive transformation has the elegant
property that the density function is equal to the absolute value of the Jacobian determinant.
Thus, each layer of a deep density can be seen as a shallow density—uncovering a
fundamental connection between shallow and deep densities. In addition, our framework
provides a common interface for all previous methods enabling them to be systematically
combined, evaluated and improved. Leveraging the connection to shallow densities, we also
propose a novel tree destructor based on tree densities and an image-specific destructor based
on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
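In one dimension, the "destructor" idea above reduces to the probability integral transform: a fitted CDF maps the data to the uniform density, and the model density is the absolute Jacobian (the CDF's derivative). A minimal sketch, assuming SciPy is available and using a simple Gaussian fit as the destructor:

```python
import numpy as np
from scipy import stats

# Data drawn from some unknown 1-D density
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=5000)

# Fit a simple (Gaussian) model and use its CDF as the destructor
mu, sigma = x.mean(), x.std()
u = stats.norm.cdf(x, loc=mu, scale=sigma)      # destructive transform: u is ~Uniform(0, 1)

# The density the model assigns to x equals |du/dx|, the Jacobian of the destructor,
# which here is just the fitted Gaussian pdf.
jacobian = stats.norm.pdf(x, loc=mu, scale=sigma)
print("u range:", u.min(), u.max())             # roughly covers (0, 1)
print("model density at x[0]:", jacobian[0])
```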
Predictive Perception
From https://www.quantamagazine.org/to-make-sense-of-the-present-brains-may-predict-the-future-20180710/
Sci-Kit Learning Decision Tree
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
Imitation Learning
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
Imitation Learning
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
Generative Adversarial Networks (GANs)
From https://skymind.ai/wiki/generative-adversarial-network-gan
Deep Generative Network-based Activation Management (DGN-AMs)
From https://arxiv.org/pdf/1605.09304.pdf
Paired Open Ended Trailblazer (POET)
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
One Model to Learn Them All
From https://arxiv.org/pdf/1706.05137.pdf
Self-modifying NNs With Differentiable Neuromodulated Plasticity
From https://arxiv.org/pdf/1706.05137.pdf
Stein Variational Gradient Descent
From https://arxiv.org/pdf/1706.05137.pdf
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Deep Learning Hardware
Graphical Processing Units (GPU)
From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html
Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest
graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more.
What Does a GPU Do?
The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal
and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and
video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in
creative production and artificial intelligence (AI).
GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and
programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and
realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to
dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.
GPU and CPU: Working Together
The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance
increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate
computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most
of both.
GPU vs. Graphics Card: What’s the Difference?
While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms.
Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the
raft of components required to both allow the GPU to function and connect to the rest of the system.
GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead
embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI
Express slot.
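In deep learning frameworks, the CPU/GPU division of labor described above is exposed directly; a minimal PyTorch-style sketch (assuming PyTorch is installed) that picks whichever device is present:

```python
import torch

# Use the discrete/integrated GPU if the framework can see one, else fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# The same matrix multiply runs on either device; on a GPU it executes across
# thousands of parallel cores, which is what makes GPUs attractive for deep learning.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.shape)
```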
NVidia Graphical Processing Units (GPU)
From https://en.wikipedia.org/wiki/Nvidia
Nvidia Corporation[note 1][note 2] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in
Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units
(GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system on a chip
units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and
software.[3][4] Its professional line of GPUs are used in workstations for applications in such fields as architecture, engineering and
construction, media and entertainment, automotive, scientific research, and manufacturing design.[5]
In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs
which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the
mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation
and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator
companies such as Graphcore.
Nvidia's GPUs are used for edge-to-cloud computing and in supercomputers: Nvidia provides the accelerators (GPUs) for
many of them, including a previous fastest system, although the current fastest and most power-efficient systems are
powered by AMD GPUs and CPUs. Nvidia has also expanded its presence in the gaming industry with its handheld game consoles
Shield Portable, Shield Tablet, and Shield Android TV and its cloud gaming service GeForce Now.
Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of
US$40 billion in stock and cash, which would be the largest semiconductor acquisition to date. SoftBank Group will acquire
slightly less than a 10% stake in Nvidia, and Arm would maintain its headquarters in Cambridge.[15][16][17][18]
Tesla unveils new Dojo Supercomputer
From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/
Tesla has unveiled its latest version of its Dojo supercomputer and it’s apparently so powerful that it tripped the power grid in Palo
Alto. Dojo is Tesla’s own custom supercomputer platform built from the ground up for AI machine learning and more specifically
for video training using the video data coming from its fleet of vehicles.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new
Dojo custom-built computer is using chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is
expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to its computer vision technology
powering its self-driving effort.
Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but the company was still ramping up its effort at the
time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster or
“Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night.
Why does Tesla need the Dojo supercomputer?
It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s
not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk
said it makes sense to offer Dojo as a service, perhaps to take on his buddy Jeff Bezos’s Amazon AWS, calling it a “service
that you can use that’s available online where you can train your models way faster and for less money.”
But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving
system. Tesla realized that its approach to developing a self-driving system, using neural nets trained on millions of videos coming
from its customer fleet, requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power.
That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward as it has big ambitions to
develop other artificial intelligence programs.
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Reinforcement Learning
Introduction to Deep Reinforcement Learning
From https://skymind.ai/wiki/deep-reinforcement-learning
Many RL references at this site
Model-based Reinforcement Learning
From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
Hierarchical Deep Reinforcement Learning
From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
Meta Learning Shared Hierarchy
From https://skymind.ai/wiki/deep-reinforcement-learning
Learning with Hierarchical Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture
that integrates deep learning models with structured hierarchical Bayesian (HB) models.
Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the
activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-
DBM model learns to learn novel concepts from very few training examples by learning low-
level generic features, high-level features that capture correlations among low-level features,
and a category hierarchy for sharing priors over the high-level features that are typical of
different kinds of concepts. We present efficient learning and inference algorithms for the
HDP-DBM model and show that it is able to learn new concepts from very few examples on
CIFAR-100 object recognition, handwritten character recognition, and human motion capture
datasets.
Transfer Learning
From http://cs231n.github.io/transfer-learning/
Convolutional Deep Belief Networks for Scalable
Unsupervised Learning of Hierarchical Representations
From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to
full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a
hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-
down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher
layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from
unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our
model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical
models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn
hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could
support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could
resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units
arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been
several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the
deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the
units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn
high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al.,
2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely
unsupervised manner.
This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our
approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically
sound way. To the best of our knowledge, ours is the first translation invariant hierarchical generative model which supports both top-down and
bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object
parts, and objects respectively. We show that these representations achieve excellent performance on several visual recognition tasks and allow
“hidden” object parts to be inferred from high-level object information.
Learning with Hierarchical-Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured
hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the
top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training
examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for
sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for
the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten
character recognition, and human motion capture datasets
The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision,
natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a
new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often
sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong
but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract
knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones.
We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical
nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with
complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines
(DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature
representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct
sophisticated domain-specific features without the need to rely on precise human-crafted input representations, increasingly important with the
proliferation of datasets and application domains.
Reinforcement Learning: Fast and Slow
From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0
Meta-RL: Speeding up Deep RL by Learning to Learn
As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental
updating, is weak inductive bias. As formalized in the idea of the bias–variance tradeoff, fast learning
requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the
patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as
foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains
the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if
the specific biases the learner adopts happen to fit with the material to be learned. As a result of this, a
new learning problem arises: how can the learner know what inductive biases to adopt?
Episodic Deep RL: Fast Learning through Episodic Memory
If incremental parameter adjustment is one source of slowness in deep RL, then one way to
learn faster might be to avoid such incremental updating. Naively increasing the learning rate
governing gradient descent optimization leads to the problem of catastrophic interference.
However, recent research shows that there is another way to accomplish the same goal, which
is to keep an explicit record of past events, and use this record directly as a point of reference
in making new decisions. This idea, referred to as episodic RL parallels ‘non-parametric’
approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of
learning in psychology When a new situation is encountered and a decision must be made
concerning what action to take, the procedure is to compare an internal representation of the
current situation with stored representations of past situations. The action chosen is then the
one associated with the highest value, based on the outcomes of the past situations that are
most similar to the present. When the internal state representation is computed by a multilayer
neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
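A minimal sketch of the episodic idea described above (a toy nearest-neighbour memory, not DeepMind's exact algorithm): store past state embeddings with their returns, and value a new state by averaging the returns of its most similar stored neighbours. All names and numbers here are illustrative.

```python
import numpy as np

class EpisodicMemory:
    """Toy episodic controller: value = mean return of the k most similar past states."""
    def __init__(self, k=5):
        self.k = k
        self.keys, self.returns = [], []

    def store(self, embedding, ret):
        self.keys.append(np.asarray(embedding, dtype=float))
        self.returns.append(float(ret))

    def value(self, embedding):
        if not self.keys:
            return 0.0
        dists = np.linalg.norm(np.array(self.keys) - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.array(self.returns)[nearest]))

# Usage: after each episode, store (state embedding, observed return); at decision
# time, choose the action whose predicted next-state value is highest.
memory = EpisodicMemory(k=3)
rng = np.random.default_rng(0)
for _ in range(100):
    memory.store(rng.standard_normal(8), rng.uniform(0, 1))
print(memory.value(rng.standard_normal(8)))
```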
Google Research featuring Jeff Dean
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Embedding for Sparse Inputs (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Efficient Vector Representation of Words (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Deep Convolution Neural Nets and Gaussian Processes
From https://ai.google/research/pubs/pub47671
Deep Convolution Neural Nets and Gaussian Processes (cont)
From https://ai.google/research/pubs/pub47671
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Computing and Sensing Architecture
[Figure: Hierarchical C4ISR Flow Model from Bob Marcus. Input devices and field processors feed preprocessing and simple event processing, which update the world model through complex event processing; responses range from simple responses in field operations to complex responses, plan updates, and new goals and plans in HQ operations, directed back through sensor and effects management to actuator devices. The stages span Data → Structured Data → Information → Knowledge → Wisdom (Devices → Awareness → Decision).]
Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
Bio-Inspired Distributed Intelligence
From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002
More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This
distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized
intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who
is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this
fall.
She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of
designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical
tool or for search-and-rescue missions.
“The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish,
and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression
of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the
real world more efficiently. But this additional softness also entails new system-theoretic challenges.”
In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor
commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain.
Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using
decentralized sensory information sharing.
“When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might
be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing
sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are
beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is
fascinating as compared to what we can achieve with human-made machines.”
IoT and Deep Learning
From https://cse.buffalo.edu/~lusu/papers/Computer2018.pdf
Deep Learning for IoT
Deep Learning for IoT Overview: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning for IoT Overview: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Standardized IoT Data Sets: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Standardized IoT Data Sets: Survey
From https://arxiv.org/pdf/1712.04301.pdf
DeepMind
DeepMind Website
DeepMind Home page
https://deepmind.com/
DeepMind Research
https://deepmind.com/research/
https://deepmind.com/research/publications/
DeepMind Blog
https://deepmind.com/blog
DeepMind Applied
https://deepmind.com/applied
DeepMind Featured Research Publications
From https://deepmind.com/research
AlphaGo
https://www.deepmind.com/research/highlighted-research/alphago
Deep Reinforcement Learning
https://deepmind.com/research/dqn/
A Dual Approach to Scalable Verification of Deep Networks
http://auai.org/uai2018/proceedings/papers/204.pdf
https://www.youtube.com/watch?v=SV05j3GM0LI
Learning to reinforcement learn
https://arxiv.org/abs/1611.05763
Neural Programmer - Interpreters
https://arxiv.org/pdf/1511.06279v3.pdf
Dueling Network Architectures for Deep Reinforcement Learning
https://arxiv.org/pdf/1511.06581.pdf
DeepMind Research over 400 publications
https://deepmind.com/research/publications/
DeepMind Applied
From https://deepmind.com/applied/
DeepMind Health
https://deepmind.com/applied/deepmind-health/
DeepMind for Google
https://deepmind.com/applied/deepmind-google/
DeepMind Ethics and Society
https://deepmind.com/applied/deepmind-ethics-society/
AlphaGo and AlphaGoZero
From https://www.deepmind.com/research/highlighted-research/alphago
We created AlphaGo, a computer program that combines an advanced tree search with deep neural
networks. These neural networks take a description of the Go board as an input and process it
through a number of different network layers containing millions of neuron-like connections.
One neural network, the “policy network”, selects the next move to play. The other neural network,
the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur
games to help it develop an understanding of reasonable human play. Then we had it play against
different versions of itself thousands of times, each time learning from its mistakes.
Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-
making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world
champions in different global arenas and arguably became the greatest Go player of all time.
Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by
playing thousands of matches with amateur and professional players, AlphaGo Zero
learnt by playing against itself, starting from completely random play.
This powerful technique is no longer constrained by the limits of human knowledge. Instead,
the computer program accumulated thousands of years of human knowledge during a period of
just a few days and learned to play Go from the strongest player in the world, AlphaGo.
AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new
knowledge, developing unconventional strategies and creative new moves, including those which
beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence
that AI can be used as a positive multiplier for human ingenuity.
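The interaction of the two networks during search can be sketched with the PUCT selection rule used in AlphaGo-style programs. This is a schematic illustration only: the node layout and numbers below are invented, and in the real system the priors come from the policy network and the values from the value network, refined by repeated simulation and backup.

```python
import math

def puct_select(node, c_puct=1.5):
    """Pick the child move balancing value estimates (exploitation) against
    policy-network priors and visit counts (exploration)."""
    total_visits = sum(child["visits"] for child in node["children"].values())
    best_move, best_score = None, -float("inf")
    for move, child in node["children"].items():
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        u = c_puct * child["prior"] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

# Toy node: priors from the "policy network", accumulated values from the "value network".
# The full search loop repeatedly selects, evaluates, and backs up value_sum/visits.
node = {"children": {
    "A": {"prior": 0.6, "visits": 10, "value_sum": 5.5},
    "B": {"prior": 0.3, "visits": 2, "value_sum": 1.4},
    "C": {"prior": 0.1, "visits": 0, "value_sum": 0.0},
}}
print(puct_select(node))
```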
AlphaZero
From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the
games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were
excited by the preliminary results and thrilled to see the response from members of the chess community,
who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that
differed from any chess playing engine that came before it.
Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open
Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly
learns each game to become the strongest player in history for each, despite starting its training from random play,
with no in-built domain knowledge but the basic rules of the game.
This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive,
unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s
International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their
forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess
engine.” It’s like discovering the secret notebooks of some great player from the past,” says Matthew.
Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-
breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try
to account for every eventuality in a game. Shogi programs are also game specific, using similar search
engines and algorithms to chess programs.
AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network
and general purpose algorithms that know nothing about the game beyond the basic rules.
AlphaTensor
From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor
First extension of AlphaZero to mathematics unlocks new possibilities for research
Algorithms have helped mathematicians perform fundamental operations for thousands of years. The ancient Egyptians created an
algorithm to multiply two numbers without requiring a multiplication table, and Greek mathematician Euclid described an algorithm
to compute the greatest common divisor, which is still in use today.
During the Islamic Golden Age, Persian mathematician Muhammad ibn Musa al-Khwarizmi designed new algorithms to solve linear
and quadratic equations. In fact, al-Khwarizmi’s name, translated into Latin as Algoritmi, led to the term algorithm. But, despite the
familiarity with algorithms today – used throughout society from classroom algebra to cutting edge scientific research – the process
of discovering new algorithms is incredibly difficult, and an example of the amazing reasoning abilities of the human mind.
In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel,
efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open
question in mathematics about finding the fastest way to multiply two matrices.
This paper is a stepping stone in DeepMind’s mission to advance science and unlock the most fundamental problems using AI. Our
system, AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games, like chess, Go and
shogi, and this work shows the journey of AlphaZero from playing games to tackling unsolved mathematical problems for the first
time
Matrix multiplication
Matrix multiplication is one of the simplest operations in algebra, commonly taught in high school maths classes. But outside the
classroom, this humble mathematical operation has enormous influence in the contemporary digital world and is ubiquitous in
modern computing.
AlphaTensor (cont)
From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor
First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is
a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves,
corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so,
this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps
taken to zero out the tensor.
This game is incredibly challenging – the number of possible algorithms to consider is much greater than the number of atoms in the
universe, even for small cases of matrix multiplication. Compared to the game of Go, which remained a challenge for AI for decades, the
number of possible moves at each step of our game is 30 orders of magnitude larger (above 10³³ for one of the settings we consider).
Essentially, to play this game well, one needs to identify the tiniest of needles in a gigantic haystack of possibilities. To tackle the challenges
of this domain, which significantly departs from traditional games, we developed multiple crucial components including a novel neural
network architecture that incorporates problem-specific inductive biases, a procedure to generate useful synthetic data, and a recipe to
leverage symmetries of the problem.
We then trained an AlphaTensor agent using reinforcement learning to play the game, starting without any knowledge about existing
matrix multiplication algorithms. Through learning, AlphaTensor gradually improves over time, re-discovering historical fast matrix
multiplication algorithms such as Strassen’s, eventually surpassing the realm of human intuition and discovering algorithms faster
than previously known.
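As a concrete instance of the "historical fast matrix multiplication algorithms such as Strassen's" mentioned above, Strassen's classic construction multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; a short NumPy check of the recipe:

```python
import numpy as np

def strassen_2x2(A, B):
    """Strassen's algorithm: 7 multiplications (m1..m7) instead of 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
print(np.allclose(strassen_2x2(A, B), A @ B))  # True: same product, fewer multiplications
```

Applied recursively to matrix blocks, this trade of multiplications for additions is what makes such algorithms asymptotically faster, and it is exactly the kind of recipe AlphaTensor searches for.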
Detailed Article in Nature
AlphaTensor
From https://www.nature.com/articles/s41586-022-05172-4
Complex Cooperative Agents
From https://deepmind.com/blog/capture-the-flag-science/
From https://science.sciencemag.org/content/364/6443/859 5/19
Complex Cooperative Agents (cont)
From https://science.sciencemag.org/content/364/6443/859 5/19
Complex Cooperative Agents (cont)
From https://science.sciencemag.org/content/364/6443/859 5/19
Unsupervised Learning
From https://deepmind.com/blog/unsupervised-learning/
Unsupervised learning is a paradigm designed to create autonomous intelligence by
rewarding agents (that is, computer programs) for learning about the data they observe
without a particular task in mind. In other words, the agent learns for the sake of learning.
A key motivation for unsupervised learning is that, while the data passed to learning
algorithms is extremely rich in internal structure (e.g., images, videos and text), the targets
and rewards used for training are typically very sparse (e.g., the label ‘dog’ referring to that
particularly protean species, or a single one or zero to denote success or failure in a game).
This suggests that the bulk of what is learned by an algorithm must consist of understanding
the data itself, rather than applying that understanding to particular tasks.
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
These results resonate with our intuitions about the human mind. Our ability to learn about the
world without explicit supervision is fundamental to what we regard as intelligence. On a train
ride we might listlessly gaze through the window, drag our fingers over the velvet of the seat,
regard the passengers sitting across from us. We have no agenda in these studies: we
almost can’t help but gather information, our brains ceaselessly working to understand the
world around us, and our place within it.
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Decoding the elements of vision
2012 was a landmark year for deep learning, when AlexNet (named after its lead architect Alex Krizhnevsky) swept the
ImageNet classification competition. AlexNet’s abilities to recognize images were unprecedented, but even more
striking is what was happening under the hood. When researchers analysed what AlexNet was doing, they discovered that
it interprets images by building increasingly complex internal representations of its inputs. Low-level features, such as
textures and edges, are represented in the bottom layers, and these are then combined to form high-level concepts such
as wheels and dogs in higher layers.
This is remarkably similar to how information is processed in our brains, where simple edges and textures in primary
sensory processing areas are assembled into complex objects like faces in higher areas. The representation of a complex
scene can therefore be built out of visual primitives, in much the same way that meaning emerges from the individual
words comprising a sentence. Without explicit guidance to do so, the layers of AlexNet had discovered a fundamental
‘vocabulary’ of vision in order to solve its task. In a sense, it had learned to play what Wittgenstein called a ‘language
game’ that iteratively translates from pixels to labels.
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Transfer learning
From the perspective of general intelligence, the most interesting thing about AlexNet’s vocabulary is that it can be reused,
or transferred, to visual tasks other than the one it was trained on, such as recognising whole scenes rather than
individual objects. Transfer is essential in an ever-changing world, and humans excel at it: we are able to rapidly adapt
the skills and understanding we’ve gleaned from our experiences (our ‘world model’) to whatever situation is at hand. For
example, a classically-trained pianist can pick up jazz piano with relative ease. Artificial agents that form the right internal
representations of the world, the reasoning goes, should be able to do similarly.
Nonetheless, the representations learned by classifiers such as AlexNet have limitations. In particular, as the network was
only trained to label images with a single class (cat, dog, car, volcano), any information not required to infer the label—no
matter how useful it might be for other tasks—is liable to be ignored. For example, the representations may fail to capture
the background of the image if the label always refers to the foreground. A possible solution is to provide more
comprehensive training signals, like detailed captions describing the images: not just “dog,” but “A Corgi catching a
frisbee in a sunny park.” However, such targets are laborious to provide, especially at scale, and still may be insufficient to
capture all the information needed to complete a task. The basic premise of unsupervised learning is that the best way to
learn rich, broadly transferable representations is to attempt to learn everything that can be learned about the data.
If the notion of transfer through representation learning seems too abstract, consider a child who has learned to draw
people as stick figures. She has discovered a representation of the human form that is both highly compact and rapidly
adaptable. By augmenting each stick figure with specifics, she can create portraits of all her classmates: glasses for her
best friend, her deskmate in his favorite red tee-shirt. And she has developed this skill not in order to complete a specific
task or receive a reward, but rather in response to her basic urge to reflect the world around her.
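The transfer recipe described above is routine in practice: freeze a backbone pretrained on ImageNet and retrain only a new output layer for the new task. A minimal PyTorch/torchvision sketch; the model choice, class count, and dummy data are placeholders, not anything specific to AlexNet or this blog post.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its learned representations.
# (weights=... is the torchvision >= 0.13 API; older versions use pretrained=True.)
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final classification layer for a new task with, say, 10 classes.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head is optimized; the transferred features are reused as-is.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

dummy_images = torch.randn(4, 3, 224, 224)
dummy_labels = torch.randint(0, 10, (4,))
loss = criterion(backbone(dummy_images), dummy_labels)
loss.backward()
optimizer.step()
print(float(loss))
```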
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Learning by creating: generative models
Perhaps the simplest objective for unsupervised learning is to train an algorithm to generate its own instances of data. So-
called generative models should not simply reproduce the data they are trained on (an uninteresting act of memorisation),
but rather build a model of the underlying class from which that data was drawn: not a particular photograph of a horse or
a rainbow, but the set of all photographs of horses and rainbows; not a specific utterance from a specific speaker, but the
general distribution of spoken utterances. The guiding principle of generative models is that being able to construct a
convincing example of the data is the strongest evidence of having understood it: as Richard Feynman put it, "what I
cannot create, I do not understand.”
For images, the most successful generative model so far has been the Generative Adversarial Network (GAN for short),
in which two networks—a generator and a discriminator—engage in a contest of discernment akin to that of an artistic
forger and a detective. The generator produces images with the goal of tricking the discriminator into believing they are
real; the discriminator, meanwhile, is rewarded for spotting the fakes. The generated images, first messy and random, are
refined over many iterations, and the ongoing dynamic between the networks leads to ever-more realistic images that are
in many cases indistinguishable from real photographs. Generative adversarial networks can also dream details of
landscapes defined by the rough sketches of users.
A glance at the images below is enough to convince us that the network has learned to represent many of the key features
of the photographs they were trained on, such as the structure of animals’ bodies, the texture of grass, and detailed effects
of light and shade (even when refracted through a soap bubble). Close inspection reveals slight anomalies, such as the
white dog’s apparent extra leg and the oddly right-angled flow of one of the jets in the fountain. While the creators of
generative models strive to avoid such imperfections, their visibility highlights one of the benefits of recreating familiar data
such as images: by inspecting the samples, researchers can infer what the model has and hasn’t learned.
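The forger-and-detective dynamic described above comes down to two coupled training steps; a heavily simplified PyTorch sketch on 1-D toy data (not an image GAN), with invented network sizes and targets:

```python
import torch
import torch.nn as nn

# Toy GAN: "real" data is 1-D samples from N(3, 1); the generator maps noise to scalars.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 3.0
    fake = G(torch.randn(64, 8))

    # Discriminator ("detective"): rewarded for spotting the fakes.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator ("forger"): tries to make the discriminator call fakes real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

print("generated mean ~", float(G(torch.randn(1000, 8)).mean()))  # drifts toward 3
```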
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Creating by predicting
Another notable family within unsupervised learning are autoregressive models, in which the data is split into a
sequence of small pieces, each of which is predicted in turn. Such models can be used to generate data by successively
guessing what will come next, feeding in a guess as input and guessing again. Language models, where each word is
predicted from the words before it, are perhaps the best known example: these models power the text predictions that pop
up on some email and messaging apps. Recent advances in language modelling have enabled the generation of strikingly
plausible passages, such as the one shown below from OpenAI’s GPT-2.
By controlling the input sequence used to condition the output predictions, autoregressive models can also be used to
transform one sequence into another. This demo uses a conditional autoregressive model to transform text into realistic
handwriting. WaveNet transforms text into natural sounding speech, and is now used to generate voices for Google
Assistant. A similar process of conditioning and autoregressive generation can be used to translate from one language
to another.
Autoregressive models learn about data by attempting to predict each piece of it in a particular order. A more general
class of unsupervised learning algorithms can be built by predicting any part of the data from any other. For example, this
could mean removing a word from a sentence, and attempting to predict it from whatever remains. By learning to make
lots of localised predictions, the system is forced to learn about the data as a whole.
One concern around generative models is their potential for misuse. While manipulating evidence with photo, video, and
audio editing has been possible for a long time, generative models could make it even easier to edit media with malicious
intent. We have already seen demonstrations of so-called ‘deepfakes’—for instance, this fabricated video footage of
President Obama. It’s encouraging to see that several major efforts to address these challenges are already underway,
including using statistical techniques to help detect synthetic media and verify authentic media, raising public
awareness, and discussions around limiting the availability of trained generative models.
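The "guess, feed the guess back in, guess again" loop described above is all there is to autoregressive generation; a toy character-level sketch using bigram counts as a stand-in for a trained language model (corpus and smoothing are invented for the example):

```python
import numpy as np

corpus = "the cat sat on the mat. the cat ate the rat."
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}

# "Train" a tiny autoregressive model: bigram counts over characters (add-one smoothing).
counts = np.ones((len(chars), len(chars)))
for a, b in zip(corpus, corpus[1:]):
    counts[idx[a], idx[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Generate by repeatedly predicting the next token and feeding the guess back in.
rng = np.random.default_rng(0)
out = "t"
for _ in range(60):
    nxt = rng.choice(len(chars), p=probs[idx[out[-1]]])
    out += chars[nxt]
print(out)
```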
Unsupervised Learning (cont)
From https://deepmind.com/blog/unsupervised-learning/
Re-imagining intelligence
Generative models are fascinating in their own right, but our principal interest in them at DeepMind is as a stepping stone
towards general intelligence. Endowing an agent with the ability to generate data is a way of giving it an imagination, and
hence the ability to plan and reason about the future. Even without explicit generation, our studies show that learning to
predict different aspects of the environment enriches the agent’s world model, and thereby improves its ability to solve
problems.
These results resonate with our intuitions about the human mind. Our ability to learn about the world without explicit
supervision is fundamental to what we regard as intelligence. On a train ride we might listlessly gaze through the window,
drag our fingers over the velvet of the seat, regard the passengers sitting across from us. We have no agenda in these
studies: we almost can’t help but gather information, our brains ceaselessly working to understand the world around us,
and our place within it.
Towards Robust and Verified AI
From https://deepmind.com/blog/robust-and-verified-ai/
Bugs and software have gone hand in hand since the beginning of computer programming. Over time, software developers
have established a set of best practices for testing and debugging before deployment, but these practices are not suited for
modern deep learning systems. Today, the prevailing practice in machine learning is to train a system on a training data set,
and then test it on another set. While this reveals the average-case performance of models, it is also crucial to ensure
robustness, or acceptably high performance even in the worst case. In this article, we describe three approaches for rigorously identifying and
eliminating bugs in learned predictive models: adversarial testing, robust learning, and formal verification.
This is not an entirely new problem. Computer programs have always had bugs. Over decades, software engineers have assembled an
impressive toolkit of techniques, ranging from unit testing to formal verification. These methods work well on traditional software, but
adapting these approaches to rigorously test machine learning models like neural networks is extremely challenging due to the scale and
lack of structure in these models, which may contain hundreds of millions of parameters. This necessitates novel
approaches for ensuring that machine learning systems are robust at deployment.
From a programmer’s perspective, a bug is any behaviour that is inconsistent with the specification, i.e. the intended functionality, of a
system. As part of our mission of solving intelligence, we conduct research into techniques for evaluating whether machine learning
systems are consistent not only with the train and test set, but also with a list of specifications describing desirable properties of a system.
Such properties might include robustness to sufficiently small perturbations in inputs, safety constraints to avoid catastrophic failures, or
producing predictions consistent with the laws of physics.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
In this article, we discuss three important technical challenges for the machine learning community to take on, as we collectively work
towards rigorous development and deployment of machine learning systems that are reliably consistent with desired specifications:
• Testing consistency with specifications efficiently. We explore efficient ways to test that machine learning systems are
consistent with properties (such as invariance or robustness) desired by the designer and users of the system. One approach to
uncover cases where the model might be inconsistent with the desired behaviour is to systematically search for worst-case outcomes
during evaluation.
• Training machine learning models to be specification-consistent. Even with copious training data, standard machine learning
algorithms can produce predictive models that make predictions inconsistent with desirable specifications like robustness or fairness.
This requires us to reconsider training algorithms that produce models that not only fit training data well, but are also consistent with a list of
specifications.
• Formally proving that machine learning models are specification-consistent. There is a need for algorithms that can verify
that the model predictions are provably consistent with a specification of interest for all possible inputs. While the field of formal verification has
studied such algorithms for several decades, these approaches do not easily scale to modern deep learning systems despite
impressive progress.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
Testing consistency with specifications efficiently
Robustness to adversarial examples is a relatively well-studied problem in deep learning. One major theme that has come out of this
work is the importance of evaluating against strong attacks, and designing transparent models which can be efficiently analysed.
Alongside other researchers from the community, we have found that many models appear robust when evaluated against weak
adversaries. However, they show essentially 0% adversarial accuracy when evaluated against stronger adversaries (Athalye et al.,
2018, Uesato et al., 2018, Carlini and Wagner, 2017).
While most work has focused on rare failures in the context of supervised learning (largely image classification), there is a need to
extend these ideas to other settings. In recent work on adversarial approaches for uncovering catastrophic failures, we apply these
ideas towards testing reinforcement learning agents intended for use in safety-critical settings. One challenge in developing
autonomous systems is that because a single mistake may have large consequences, very small failure probabilities are unacceptable.
Our objective is to design an “adversary” to allow us to detect such failures in advance (e.g., in a controlled environment). If the
adversary can efficiently identify the worst-case input for a given model, this allows us to catch rare failure cases before deploying a
model. As with image classifiers, evaluating against a weak adversary provides a false sense of security during deployment. This is
similar to the software practice of red-teaming, though it extends beyond failures caused by malicious adversaries and also includes
failures which arise naturally, for example due to lack of generalization.
We developed two complementary approaches for adversarial testing of RL agents. In the first, we use a derivative-free optimisation to
directly minimise the expected reward of an agent. In the second, we learn an adversarial value function which predicts from
experience which situations are most likely to cause failures for the agent. We then use this learned function for optimisation to focus
the evaluation on the most problematic inputs. These approaches form only a small part of a rich, growing space of potential
algorithms, and we are excited about future development in rigorous evaluation of agents.
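A schematic Python sketch of the first, derivative-free approach is given below; the environment parameters, the run_episode placeholder, and the random-search loop are hypothetical stand-ins for a real agent evaluation, not DeepMind's implementation.

# Schematic derivative-free adversarial search: perturb environment
# parameters at random and keep the setting that gives the agent the
# lowest reward. run_episode and the parameter ranges are hypothetical.
import random

def run_episode(agent, env_params):
    # Placeholder environment: the agent is assumed to do worse as the
    # "difficulty" parameters grow. Replace with a real rollout.
    return 100.0 - sum(env_params.values()) + random.gauss(0, 1)

def adversarial_search(agent, n_iters=200):
    worst_params, worst_reward = None, float("inf")
    for _ in range(n_iters):
        params = {"wall_density": random.uniform(0, 10),
                  "goal_distance": random.uniform(0, 10)}
        reward = run_episode(agent, params)
        if reward < worst_reward:   # keep the most damaging setting found so far
            worst_params, worst_reward = params, reward
    return worst_params, worst_reward

print(adversarial_search(agent=None))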
Already, both approaches result in large improvements over random testing. Using our method, failures that would have taken days to
uncover, or even gone undetected entirely, can be detected in minutes (Uesato et al., 2018b). We also found that adversarial testing
may uncover qualitatively different behaviour in our agents from what might be expected from evaluation on a random test set. In
particular, using adversarial environment construction we found that agents performing a 3D navigation task, which match human-level
performance on average, still failed to find the goal completely on surprisingly simple mazes (Ruderman et al., 2018). Our work also
highlights that we need to design systems that are secure against natural failures, not only against adversaries.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
Training machine learning models to be specification-consistent
Adversarial testing aims to find a counterexample that violates specifications. As such, it often leads to overestimating the
consistency of models with respect to these specifications. Mathematically, a specification is some relationship that has to
hold between the inputs and outputs of a neural network. This can take the form of upper and lower bounds on certain key
input and output parameters.
Motivated by this observation, several researchers (Raghunathan et al., 2018; Wong et al., 2018; Mirman et al., 2018;
Wang et al., 2018) including our team at DeepMind (Dvijotham et al., 2018; Gowal et al., 2018), have worked on
algorithms that are agnostic to the adversarial testing procedure (used to assess consistency with the specification). This
can be understood geometrically - we can bound (e.g., using interval bound propagation; Ehlers 2017, Katz et al. 2017,
Mirman et al., 2018) the worst violation of a specification by bounding the space of outputs given a set of inputs. If this
bound is differentiable with respect to network parameters and can be computed quickly, it can be used during training.
The original bounding box can then be propagated through each layer of the network.
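The bound propagation idea can be sketched in a few lines of NumPy; the random weights, single layer, and perturbation size below are illustrative, and real interval bound propagation applies the same interval arithmetic layer by layer through the whole network.

# Minimal interval bound propagation (IBP) sketch: given an elementwise
# input interval [lower, upper], propagate bounds through a linear layer
# and a ReLU. Weights here are random stand-ins.
import numpy as np

def linear_bounds(W, b, lower, upper):
    mid, rad = (lower + upper) / 2.0, (upper - lower) / 2.0
    out_mid = W @ mid + b
    out_rad = np.abs(W) @ rad          # worst-case spread through the layer
    return out_mid - out_rad, out_mid + out_rad

def relu_bounds(lower, upper):
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
x = rng.normal(size=4)
eps = 0.1                              # allowed input perturbation
low, up = x - eps, x + eps
low, up = relu_bounds(*linear_bounds(W1, b1, low, up))
print(low, up)  # every output reachable from the perturbed input lies in [low, up]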
We show that interval bound propagation is fast, efficient, and — contrary to prior belief — can achieve strong results
(Gowal et al., 2018). In particular, we demonstrate that it can decrease the provable error rate (i.e., maximal error rate
achievable by any adversary) over state-of-the-art in image classification on both MNIST and CIFAR-10 datasets.
Going forward, the next frontier will be to learn the right geometric abstractions to compute tighter overapproximations of
the space of outputs. We also want to train networks to be consistent with more complex specifications capturing desirable
behavior, such as the above-mentioned invariances and consistency with physical laws.
Towards Robust and Verified AI (cont)
From https://deepmind.com/blog/robust-and-verified-ai/
Formally proving that machine learning models are specification-consistent
Rigorous testing and training can go a long way towards building robust machine learning systems. However, no amount of
testing can formally guarantee that a system will behave as we want. In large-scale models, enumerating all possible outputs
for a given set of inputs (for example, infinitesimal perturbations to an image) is intractable due to the astronomical number of
choices for the input perturbation. However, as in the case of training, we can find more efficient approaches by setting
geometric bounds on the set of outputs. Formal verification is a subject of ongoing research at DeepMind.
The machine learning community has developed several interesting ideas on how to compute precise geometric bounds on
the space of outputs of the network (Katz et al. 2017, Weng et al., 2018; Singh et al., 2018). Our approach (Dvijotham et al.,
2018), based on optimisation and duality, consists of formulating the verification problem as an optimisation problem that tries
to find the largest violation of the property being verified. By using ideas from duality in optimisation, the problem becomes
computationally tractable. This results in additional constraints that refine the bounding boxes computed by interval bound
propagation, using so-called cutting planes. This approach is sound but incomplete: there may be cases where the property of
interest is true, but the bound computed by this algorithm is not tight enough to prove the property. However, once we obtain a
bound, this formally guarantees that there can be no violation of the property. The figure below graphically illustrates the
approach.
This approach enables us to extend the applicability of verification algorithms to more general networks (activation functions,
architectures), general specifications and more sophisticated deep learning models (generative models, neural processes,
etc.) and specifications beyond adversarial robustness (Qin, 2018).
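As a minimal sketch of how such bounds give a sound-but-incomplete check, the Python function below declares a classification verified only if the lower bound of the true-class margin is positive over the whole input region; the example bounds are made up, and in practice they would come from a method such as interval bound propagation or the duality-based refinement described above.

# Sound-but-incomplete verification sketch: if the lower bound of the
# margin (true-class logit minus every other logit) is positive over the
# whole input region, the property "classification never changes" is
# proved; otherwise the check is inconclusive.
import numpy as np

def verified_robust(logit_lower, logit_upper, true_class):
    margins_lower = logit_lower[true_class] - np.delete(logit_upper, true_class)
    return bool(np.all(margins_lower > 0))   # True => formally verified

low = np.array([2.0, -1.0, 0.5])
up = np.array([3.0, 0.0, 1.5])
print(verified_robust(low, up, true_class=0))  # True: class 0's lower bound beats the others' upper bounds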
From https://deepmind.com/blog/robust-and-verified-ai/
Outlook
Deployment of machine learning in high-stakes situations presents unique challenges, and requires the
development of evaluation techniques that reliably detect unlikely failure modes. More broadly, we believe that
learning consistency with specifications can provide large efficiency improvements over approaches where
specifications only arise implicitly from training data. We are excited about ongoing research into adversarial
evaluation, learning robust models, and verification of formal specifications.
Much more work is needed to build automated tools for ensuring that AI systems in the real world will do the
“right thing”. In particular, we are excited about progress in the following directions:
• Learning for adversarial evaluation and verification: As AI systems scale and become more
complex, it will become increasingly difficult to design adversarial evaluation and verification algorithms
that are well-adapted to the AI model. If we can leverage the power of AI to facilitate evaluation and
verification, this process can be bootstrapped to scale.
• Development of publicly-available tools for adversarial evaluation and verification: It is important
to provide AI engineers and practitioners with easy-to-use tools that shed light on the possible failure
modes of the AI system before it leads to widespread negative impact. This would require some degree of
standardisation of adversarial evaluation and verification algorithms.
• Broadening the scope of adversarial examples: To date, most work on adversarial examples has
focused on model invariances to small perturbations, typically of images. This has provided an excellent
testbed for developing approaches to adversarial evaluation, robust learning, and verification. We have
begun to explore alternate specifications for properties directly relevant in the real world, and are excited
by future research in this direction.
• Learning specifications: Specifications that capture “correct” behavior in AI systems are often
difficult to precisely state. Building systems that can use partial human specifications and learn further
specifications from evaluative feedback would be required as we build increasingly intelligent agents
capable of exhibiting complex behaviors and acting in unstructured environments.
Towards Robust and Verified AI (cont)
TF-Replicator: Distributed Machine Learning for Researchers
From https://deepmind.com/blog/tf-replicator-distributed-machine-learning/
At DeepMind, the Research Platform Team builds infrastructure to empower and accelerate our AI research.
Today, we are excited to share how we developed TF-Replicator, a software library that helps researchers
deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience
with distributed systems. TF-Replicator’s programming model has now been open sourced as part of
TensorFlow’s tf.distribute.Strategy. This blog post gives an overview of the ideas and technical challenges
underlying TF-Replicator. For a more comprehensive description, please read our arXiv paper.
A recurring theme in recent AI breakthroughs -- from AlphaFold to BigGAN to AlphaStar -- is the need for effortless
and reliable scalability. Increasing amounts of computational capacity allow researchers to train ever-larger neural
networks with new capabilities. To address this, the Research Platform Team developed TF-Replicator, which allows
researchers to target different hardware accelerators for Machine Learning, scale up workloads to many devices, and
seamlessly switch between different types of accelerators. While it was initially developed as a library on top of
TensorFlow, TF-Replicator’s API has since been integrated into TensorFlow 2.0’s new tf.distribute.Strategy.
While TensorFlow provides direct support for CPU, GPU, and TPU (Tensor Processing Unit) devices, switching
between targets requires substantial effort from the user. This typically involves specialising code for a particular
hardware target, constraining research ideas to the capabilities of that platform. Some existing frameworks built on
top of TensorFlow, e.g. Estimators, seek to address this problem. However, they are typically targeted at production
use cases and lack the expressivity and flexibility required for rapid iteration of research ideas.
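A minimal sketch of the programming model as it now appears in TensorFlow's tf.distribute.Strategy is shown below; the tiny Keras model and random data are illustrative only, and the strategy object replicates the computation across whatever accelerators are visible (falling back to CPU if none).

# Build and train the model inside the strategy scope; the library
# handles replication and per-replica batching.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = np.random.randn(1024, 32).astype("float32")
y = np.random.randint(0, 10, size=1024)
model.fit(x, y, batch_size=128, epochs=1)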
AlphaFold Protein Folding
From https://deepmind.com/blog/alphafold/
AlphaFold Protein Folding (cont)
From https://deepmind.com/blog/alphafold/
Google Streams for NHS
From https://deepmind.com/applied/deepmind-health/working-partners/how-were-helping-today
Open Sourcing TRFL
From https://deepmind.com/blog/trfl/
Open Sourcing TRFL (cont)
From https://deepmind.com/blog/trfl/
Multi-Task Learning (e.g. Atari)
From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
Multi-Task Learning (cont)
From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
Measuring Abstract Reasoning in Neural Nets
From http://proceedings.mlr.press/v80/santoro18a/santoro18a.pdf
Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model's ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.
One of the long-standing goals of artificial intelligence is to develop machines with abstract reasoning capabilities that equal or
better those of humans. Though there has also been substantial progress in both reasoning and abstract representation learning in neural nets (Botvinick et al., 2017; LeCun et al., 2015; Higgins et al., 2016; 2017), the extent to which these models exhibit anything like general abstract reasoning is the subject of much debate (Garnelo et al., 2016; Lake & Baroni, 2017; Marcus, 2018). The research presented here was therefore motivated by two main goals. (1) To understand whether, and (2) to understand how, deep neural networks might be able to solve abstract visual reasoning problems.
Our answer to (1) is that, with important caveats, neural networks can indeed learn to infer and apply abstract reasoning principles. Our best performing model learned to solve complex visual reasoning questions, and to do so, it needed to induce and detect from raw pixel input the presence of abstract notions such as logical operations and arithmetic progressions, and apply these principles to never-before observed stimuli. Importantly, we found that the architecture of the model made a critical difference to its ability to learn and execute such processes. While standard visual-processing models such as CNNs and ResNets performed poorly, a model that promoted the representation of, and comparison between, parts of the stimuli performed very well. We found ways to improve this performance via additional supervision: the training outcomes and the model's ability to generalise were improved if it was required to decode its representations into symbols corresponding to the reason behind the correct answer.
Learning to Navigate Cities without a Map
From https://arxiv.org/abs/1804.00168
Navigating through unstructured environments is a basic capability of intelligent
creatures, and thus is of fundamental interest in the study and development of artificial
intelligence. Long-range navigation is a complex cognitive task that relies on developing
an internal representation of space, grounded by recognisable landmarks and robust
visual processing, that can simultaneously support continuous self-localisation ("I am
here") and a representation of the goal ("I am going there"). Building upon recent
research that applies deep reinforcement learning to maze navigation problems, we
present an end-to-end deep reinforcement learning approach that can be applied on a
city scale. Recognising that successful navigation relies on integration of general policies
with locale-specific knowledge, we propose a dual pathway architecture that allows
locale-specific features to be encapsulated, while still enabling transfer to multiple cities.
We present an interactive navigation environment that uses Google StreetView for its
photographic content and worldwide coverage, and demonstrate that our learning
method allows agents to learn to navigate multiple cities and to traverse to target
destinations that may be kilometres away. The project webpage this http URL contains a
video summarising our research and showing the trained agent in diverse city
environments and on the transfer task, the form to request the StreetLearn dataset and
links to further resources. The StreetLearn environment code is available at this https
URL
Learning to Generate Images
From https://deepmind.com/blog/learning-to-generate-images/
Advances in deep generative networks have led to impressive results in recent years.
Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator's output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, OMNIGLOT, CELEBA) and synthetic 3D datasets. A video of the agent can be found at https://youtu.be/iSyvwAwa7vk.
Neuron Deletion
From https://deepmind.com/blog/understanding-deep-learning-through-neuron-deletion/
We measured the performance impact of damaging the network by deleting individual neurons as
well as groups of neurons. Our experiments led to two surprising findings:
• Although many previous studies have focused on understanding easily interpretable
individual neurons (e.g. “cat neurons”, or neurons in the hidden layers of deep networks
which are only active in response to images of cats), we found that these interpretable
neurons are no more important than confusing neurons with difficult-to-interpret activity.
• Networks which correctly classify unseen images are more resilient to neuron deletion
than networks which can only classify images they have seen before. In other words,
networks which generalise well are much less reliant on single directions than those which
memorise.
To evaluate neuron importance, we measured how network performance on image classification
tasks changes when a neuron is deleted. If a neuron is very important, deleting it should be
highly damaging and substantially decrease network performance, while the deletion of an
unimportant neuron should have little impact. Neuroscientists routinely perform similar
experiments, although they cannot achieve the fine-grained precision which is necessary for
these experiments and readily available in artificial neural networks.
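The ablation experiment can be sketched in a few lines of PyTorch: zero out ("delete") one hidden unit with a forward hook and record the resulting change in accuracy. The model, data, and labels below are toy stand-ins, not the networks studied in the paper.

# Measure a unit's importance as the accuracy drop caused by deleting it.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
x = torch.randn(256, 20)
labels = torch.randint(0, 10, (256,))

def accuracy():
    with torch.no_grad():
        return (model(x).argmax(dim=1) == labels).float().mean().item()

def accuracy_without_unit(unit):
    def zero_unit(module, inputs, output):
        output = output.clone()
        output[:, unit] = 0.0          # delete the neuron's activation
        return output
    handle = model[1].register_forward_hook(zero_unit)  # hook after the ReLU
    acc = accuracy()
    handle.remove()
    return acc

baseline = accuracy()
importance = {u: baseline - accuracy_without_unit(u) for u in range(50)}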
Surprisingly, we found that there was little relationship between selectivity and importance. In
other words, “cat neurons” were no more important than confusing neurons. This finding echoes
recent work in neuroscience which has demonstrated that confusing neurons can actually be
quite informative, and suggests that we must look beyond the most easily interpretable neurons in
order to understand deep neural networks.
Learning by Playing
From https://arxiv.org/abs/1802.10567
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of
Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in
the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of
general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind
our method is that active (learned) scheduling and execution of auxiliary policies allows the agent
to efficiently explore its environment – enabling it to excel at sparse reward RL. Our experiments
in several challenging robotic manipulation settings demonstrate the power of our approach. A
video of the rich set of learned behaviors can be found at https://youtu.be/mPKyvocNe M.
This paper introduces SAC-X, a method that simultaneously learns intention policies on a set of
auxiliary tasks, and actively schedules and executes these to explore its observation space - in
search of sparse rewards of externally defined target tasks. Utilizing simple auxiliary tasks enables
SAC-X to learn complicated target tasks from rewards defined in a ’pure’, sparse, manner: only the
end goal is specified, but not the solution path.
We demonstrated the power of SAC-X on several challenging robotics tasks in simulation, using a
common set of simple and sparse auxiliary tasks and on a real robot. The learned intentions are
highly reactive, reliable, and exhibit a rich and robust behavior. We consider this as an important
step towards the goal of applying RL to real world domains.
Scalable Distributed DeepRL
From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
Deep Reinforcement Learning (DeepRL) has achieved remarkable success in a
range of tasks, from continuous control problems in robotics to playing games
like Go and Atari. The improvements seen in these domains have so far been
limited to individual tasks where a separate agent has been tuned and trained for
each task.
In our most recent work, we explore the challenge of training a single agent on many
tasks.
Today we are releasing DMLab-30, a set of new tasks that span a large variety of
challenges in a visually unified environment with a common action space. Training an
agent to perform well on many tasks requires massive throughput and making efficient
use of every data point. To this end, we have developed a new, highly scalable agent
architecture for distributed training called Importance Weighted Actor-Learner
Architecture that uses a new off-policy correction algorithm called V-trace.
DMLab-30 is a collection of new levels designed using our open source RL
environment DeepMind Lab. These environments enable any DeepRL researcher to
test systems on a large spectrum of interesting tasks either individually or in a multi-
task setting.
Scalable Distributed DeepRL (cont)
From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
In order to tackle the challenging DMLab-30 suite, we developed a new distributed agent
called Importance Weighted Actor-Learner Architecture that maximises data throughput using an
efficient distributed architecture with TensorFlow.
Importance Weighted Actor-Learner Architecture is inspired by the popular A3C architecture which
uses multiple distributed actors to learn the agent’s parameters. In models like this, each of the
actors uses a clone of the policy parameters to act in the environment. Periodically, actors pause
their exploration to share the gradients they have computed with a central parameter server that
applies updates.
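A schematic, single-process Python illustration of that actor/parameter-server split is shown below; the "gradient" computation is a toy stand-in for acting in an environment and computing a policy gradient, and nothing here reflects the real distributed implementation.

# Each actor works from a copy of the central parameters, computes a
# gradient from its own experience, and the central "parameter server"
# applies the update.
import numpy as np

central_params = np.zeros(4)

def actor_gradient(params_copy, rng):
    # Stand-in for acting in the environment and computing a policy gradient.
    target = np.array([1.0, -2.0, 0.5, 3.0])
    return (params_copy - target) + rng.normal(scale=0.1, size=params_copy.shape)

rngs = [np.random.default_rng(i) for i in range(8)]   # eight "actors"
lr = 0.05
for step in range(200):
    for rng in rngs:
        params_copy = central_params.copy()           # actor clones the parameters
        grad = actor_gradient(params_copy, rng)
        central_params -= lr * grad                   # server applies the update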
Learning Explanatory Rules from Noisy Data
From https://deepmind.com/blog/learning-explanatory-rules-noisy-data/
The distinction is interesting to us because these two types of thinking correspond to two different approaches
to machine learning: deep learning and symbolic program synthesis. Deep learning concentrates on intuitive
perceptual thinking whereas symbolic program synthesis focuses on conceptual, rule-based thinking. Each
system has different merits - deep learning systems are robust to noisy data but are difficult to interpret and
require large amounts of data to train, whereas symbolic systems are much easier to interpret and require less
training data but struggle with noisy data. While human cognition seamlessly combines these two distinct
ways of thinking, it is much less clear whether or how it is possible to replicate this in a single AI system.
Our new paper, recently published in JAIR, demonstrates it is possible for systems to combine intuitive
perceptual with conceptual interpretable reasoning. The system we describe, ∂ILP, is robust to noise, data-
efficient, and produces interpretable rules.
Scalable Distributed DeepRL (cont)
From https://arxiv.org/abs/1802.01561
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a
single set of parameters. A key challenge is to handle the increased amount of data and extended training
time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture)
that not only uses resources more efficiently in single-machine training but also scales to thousands of
machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high
throughput by combining decoupled acting and learning with a novel off-policy correction method called V-
trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set
of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games
in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve
better performance than previous agents with less data, and crucially exhibits positive transfer between tasks
as a result of its multi-task approach.
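A hedged NumPy sketch of the V-trace target computation, following the published definition in the IMPALA paper (Espeholt et al., 2018), is shown below; the clipping thresholds, array shapes, and toy inputs are illustrative, and this is a simplified single-trajectory version rather than the real batched implementation.

# V-trace targets: truncated importance weights correct for the gap
# between the behaviour policy that generated the data and the current
# target policy.
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    # rewards, values, rhos: arrays of length T; rhos = pi/mu for the taken actions.
    T = len(rewards)
    clipped_rho = np.minimum(rho_bar, rhos)
    clipped_c = np.minimum(c_bar, rhos)
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rho * (rewards + gamma * values_tp1 - values)

    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):                  # backward recursion
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v                    # the v_s targets

targets = vtrace_targets(rewards=np.array([0., 0., 1.]),
                         values=np.array([0.2, 0.3, 0.5]),
                         bootstrap_value=0.0,
                         rhos=np.array([0.9, 1.2, 1.0]))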
DeepMind Lab
From https://deepmind.com/blog/open-sourcing-deepmind-lab/
The development of innovative agents goes hand in hand with the careful design and implementation of
rationally selected, flexible and well-maintained environments. To that end, we at DeepMind have invested
considerable effort toward building rich simulated environments to serve as “laboratories” for AI research.
Now we are open-sourcing our flagship platform, DeepMind Lab, so the broader research community can
make use of it.
DeepMind Lab is a fully 3D game-like platform tailored for agent-based AI research. It is observed from a
first-person viewpoint, through the eyes of the simulated agent. Scenes are rendered with rich science
fiction-style visuals. The available actions allow agents to look around and move in 3D. The agent’s “body”
is a floating orb. It levitates and moves by activating thrusters opposite its desired direction of movement,
and it has a camera that moves around the main sphere as a ball-in-socket joint tracking the rotational look
actions. Example tasks include collecting fruit, navigating in mazes, traversing dangerous passages while
avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, playing
laser tag, and quickly learning and remembering random procedurally generated environments. An
illustration of how agents in DeepMind Lab perceive and interact with the world can be seen below:
Game Theory for Asymmetric Players
From https://deepmind.com/blog/game-theory-insights-asymmetric-multi-agent-games/
As AI systems start to play an increasing role in the real world it is important to understand how different systems will
interact with one another. In our latest paper, published in the journal Scientific Reports, we use a branch of game
theory to shed light on this problem. In particular, we examine how two intelligent systems behave and respond in a
particular type of situation known as an asymmetric game, which include Leduc poker and various board games such
as Scotland Yard. Asymmetric games also naturally model certain real-world scenarios such as automated auctions
where buyers and sellers operate with different motivations. Our results give us new insights into these situations and
reveal a surprisingly simple way to analyse them. While our interest is in how this theory applies to the interaction of
multiple AI systems, we believe the results could also be of use in economics, evolutionary biology and empirical game
theory, among others.
Game theory is a field of mathematics that is used to analyse the strategies used by decision makers in competitive
situations. It can apply to humans, animals, and computers in various situations but is commonly used in AI research to
study “multi-agent” environments where there is more than one system, for example several household robots
cooperating to clean the house. Traditionally, the evolutionary dynamics of multi-agent systems have been analysed
using simple, symmetric games, such as the classic Prisoner’s Dilemma, where each player has access to the same
set of actions. Although these games can provide useful insights into how multi-agent systems work and tell us how to
achieve a desirable outcome for all players - known as the Nash equilibrium - they cannot model all situations.
Our new technique allows us to quickly and easily identify the strategies used to find the Nash equilibrium in more
complex asymmetric games - characterised as games where each player has different strategies, goals and rewards.
These games - and the new technique we use to understand them - can be illustrated using an example from ‘Battle of
the Sexes’, a coordination game commonly used in game theory research.
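As a small concrete illustration (not the decomposition method introduced in the paper), the Python snippet below enumerates the pure Nash equilibria of the standard Battle of the Sexes bimatrix game by checking best responses; the payoff numbers are the usual textbook ones.

# Battle of the Sexes: both players prefer coordinating, but each prefers
# a different coordinated outcome. Rows: player 1's action (Opera,
# Football); columns: player 2's action.
import numpy as np

A = np.array([[3, 0],   # player 1's payoffs
              [0, 2]])
B = np.array([[2, 0],   # player 2's payoffs
              [0, 3]])

pure_nash = []
for i in range(2):
    for j in range(2):
        best_row = A[i, j] >= A[:, j].max()   # player 1 cannot gain by deviating
        best_col = B[i, j] >= B[i, :].max()   # player 2 cannot gain by deviating
        if best_row and best_col:
            pure_nash.append((i, j))
print(pure_nash)   # [(0, 0), (1, 1)]: both coordinated outcomes are pure equilibria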
UPDATE 20/03/18: Our latest paper, forthcoming at the Autonomous Agents and Multi-Agent Systems conference
(AAMAS), builds on the Scientific Reports paper outlined above. A Generalised Method for Empirical Game
Theoretic Analysis introduces a general method to perform empirical analysis of multi-agent interactions, both in
symmetric and asymmetric games. The method allows us to understand how multi-agent strategies interact, what the
attractors are and what the basins of attraction look like, giving an intuitive understanding for the strength of the involved
strategies. Furthermore, it explains how many data samples to consider in order to guarantee that the equilibria of the
approximating game are sufficiently reliable. We apply the method to several domains, including AlphaGo, Colonel
Blotto and Leduc poker.
A Generalised Method for Empirical Game Theoretic Analysis
From https://arxiv.org/abs/1803.06376
This paper provides theoretical bounds for empirical game theoretical analysis of
complex multi-agent interactions. We provide insights in the empirical meta game
showing that a Nash equilibrium of the meta-game is an approximate Nash
equilibrium of the true underlying game. We investigate and show how many data
samples are required to obtain a close enough approximation of the underlying game.
Additionally, we extend the meta-game analysis methodology to asymmetric games.
The state-of-the-art has only considered empirical games in which agents have
access to the same strategy sets and the payoff structure is symmetric, implying that
agents are interchangeable. Finally, we carry out an empirical illustration of the
generalised method in several domains, illustrating the theory and evolutionary
dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of
the Colonel Blotto game played by human players on Facebook (symmetric), and an
example of a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-
agent learning algorithm.
DeepMind 2017 Review
From https://deepmind.com/blog/2017-deepminds-year-review/
The approach we take at DeepMind is inspired by neuroscience, helping to make progress in
critical areas such as imagination, reasoning, memory and learning. Take imagination, for
example: this distinctively human ability plays a crucial part in our daily lives, allowing us to plan and
reason about the future, but is hugely challenging for computers. We continue to work hard on this
problem, this year introducing imagination-augmented agents that are able to extract relevant
information from an environment in order to plan what to do in the future.
Separately, we made progress in the field of generative models. Just over a year ago we presented WaveNet, a
deep neural network for generating raw audio waveforms that was capable of producing better and more
realistic-sounding speech than existing techniques. At that time, the model was a research prototype and was too
computationally intensive to work in consumer products. Over the last 12 months, our teams managed to create a
new model that was 1000x faster. In October, we revealed that this new Parallel WaveNet is now being used in
the real world, generating the Google Assistant voices for US English and Japanese.
This is an example of the effort we invest in making it easier to build, train and optimise AI systems. Other
techniques we worked on this year, such as distributional reinforcement learning, population based training
for neural networks and new neural architecture search methods, promise to make systems easier to build,
more accurate and quicker to optimise. We have also dedicated significant time to creating new and challenging
environments in which to test our systems, including our work with Blizzard to open up StarCraft II for research.
But we know that technology is not value neutral. We cannot simply make progress in fundamental research
without also taking responsibility for the ethical and social impact of our work. This drives our research in critical
areas such as interpretability, where we have been exploring novel methods to understand and explain how our
systems work. It’s also why we have an established technical safety team that continued to develop practical
ways to ensure that we can depend on future systems and that they remain under meaningful human control.
Population Based Training of Neural Networks
From https://arxiv.org/abs/1711.09846
Neural networks dominate the modern machine learning landscape, but their
training and success still suffer from sensitivity to empirical choices of
hyperparameters such as model architecture, loss function, and optimisation
algorithm. In this work we present Population Based Training (PBT), a
simple asynchronous optimisation algorithm which effectively utilises a fixed
computational budget to jointly optimise a population of models and their
hyperparameters to maximise performance. Importantly, PBT discovers a
schedule of hyperparameter settings rather than following the generally sub-
optimal strategy of trying to find a single fixed set to use for the whole course
of training. With just a small modification to a typical distributed
hyperparameter training framework, our method allows robust and reliable
training of models. We demonstrate the effectiveness of PBT on deep
reinforcement learning problems, showing faster wall-clock convergence and
higher final performance of agents by optimising over a suite of
hyperparameters. In addition, we show the same method can be applied to
supervised learning for machine translation, where PBT is used to maximise
the BLEU score directly, and also to training of Generative Adversarial Networks
to maximise the Inception score of generated images. In all cases PBT results
in the automatic discovery of hyperparameter schedules and model selection
which results in stable training and better final performance.
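A minimal Python sketch of the exploit/explore loop behind PBT is given below; the "training" function, population size, and perturbation factors are toy stand-ins, and in a real system the quantity copied between members would be the model weights rather than a scalar score.

# Train a population in parallel "steps"; periodically poor performers copy
# a better member (exploit) and randomly perturb its hyperparameters (explore).
import copy
import random

def train_step(member):
    # Stand-in for a chunk of training; this toy score rewards a moderate lr.
    member["score"] += member["lr"] * (1.0 - member["lr"]) + random.gauss(0, 0.01)

population = [{"lr": random.uniform(0.01, 1.0), "score": 0.0} for _ in range(10)]

for generation in range(20):
    for member in population:
        train_step(member)
    population.sort(key=lambda m: m["score"], reverse=True)
    top, bottom = population[:2], population[-2:]
    for loser in bottom:
        winner = random.choice(top)
        loser.update(copy.deepcopy(winner))            # exploit: copy weights and hypers
        loser["lr"] *= random.choice([0.8, 1.2])       # explore: perturb hyperparameters

print(max(m["score"] for m in population), sorted(m["lr"] for m in population))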
Neuroscience Inspired Artificial Intelligence
From https://www.cell.com/neuron/fulltext/S0896-6273(17)30509-3
The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent
times, however, communication and collaboration between the two fields has become less commonplace. In
this article, we argue that better understanding biological brains could play a vital role in building intelligent
machines. We survey historical interactions between the AI and neuroscience fields and emphasize current
advances in AI that have been inspired by the study of neural computation in humans and other animals. We
conclude by highlighting shared themes that may be key for advancing future research in both fields.
In this perspective, we have reviewed some of the many ways in which neuroscience has made fundamental
contributions to advancing AI research, and argued for its increasingly important relevance. In strategizing for
the future exchange between the two fields, it is important to appreciate that the past contributions of
neuroscience to AI have rarely involved a simple transfer of full-fledged solutions that could be directly re-
implemented in machines. Rather, neuroscience has typically been useful in a subtler way, stimulating
algorithmic-level questions about facets of animal learning and intelligence of interest to AI researchers and
providing initial leads toward relevant mechanisms. As such, our view is that leveraging insights gained from
neuroscience research will expedite progress in AI research, and this will be most effective if AI researchers
actively initiate collaborations with neuroscientists to highlight key questions that could be addressed by
empirical work.
The successful transfer of insights gained from neuroscience to the development of AI algorithms is critically
dependent on the interaction between researchers working in both these fields, with insights often developing
through a continual handing back and forth of ideas between fields. In the future, we hope that greater
collaboration between researchers in neuroscience and AI, and the identification of a common language
between the two fields (Marblestone et al., 2016), will permit a virtuous circle whereby research is accelerated
through shared theoretical insights and common empirical advances. We believe that the quest to develop AI
will ultimately also lead to a better understanding of our own minds and thought processes. Distilling
intelligence into an algorithmic construct and comparing it to the human brain might yield insights into some of
the deepest and the most enduring mysteries of the mind, such as the nature of creativity, dreams, and
perhaps one day, even consciousness.
Toward an Integration of Deep Learning and Neuroscience
From https://www.frontiersin.org/articles/10.3389/fncom.2016.00094/full
Neuroscience has focused on the detailed implementation of computation, studying neural
codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to
eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a
cost function, often using simple and relatively uniform initial architectures. Two recent
developments have emerged within machine learning that create an opportunity to connect these
seemingly divergent perspectives.
First, structured architectures are used, including dedicated systems for attention, recursion and
various forms of short- and long-term memory storage.
Second, cost functions and training procedures have become more complex and are varied across
layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that
(1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain
locations and over development, and (3) optimization operates within a pre-structured
architecture matched to the computational problems posed by behavior.
In support of these hypotheses, we argue that a range of implementations of credit assignment
through multiple layers of neurons are compatible with our current knowledge of neural
circuitry, and that the brain's specialized systems can be interpreted as enabling efficient
optimization for specific problem classes. Such a heterogeneously optimized system, enabled by
a series of interacting cost functions, serves to make learning data-efficient and precisely
targeted to the needs of the organism. We suggest directions by which neuroscience could seek to
refine and test these hypotheses.
Hippocampus Predictive Map
From https://deepmind.com/blog/hippocampus-predictive-map/
In our new paper, in Nature Neuroscience, we apply a neuroscience lens to a longstanding mathematical
theory from machine learning to provide new insights into the nature of learning and memory. Specifically,
we propose that the area of the brain known as the hippocampus offers a unique solution to this problem by
compactly summarising future events using what we call a “predictive map.”
The hippocampus has traditionally been thought to only represent an animal’s current state, particularly in
spatial tasks, such as navigating a maze. This view gained significant traction with the discovery of “place
cells” in the rodent hippocampus, which fire selectively when the animal is in specific locations. While this
theory accounts for many neurophysiological findings, it does not fully explain why the hippocampus is also
involved in other functions, such as memory, relational reasoning, and decision making.
Our new theory thinks about navigation as part of the more general problem of computing plans that maximise
future reward. Our insights were derived from reinforcement learning, the subdiscipline of AI research that
focuses on systems that learn by trial and error. The key computational idea we drew on is that to estimate
future reward, an agent must first estimate how much immediate reward it expects to receive in each state,
and then weight this expected reward by how often it expects to visit that state in the future. By summing up
this weighted reward across all possible states, the agent obtains an estimate of future reward.
Similarly, we argue that the hippocampus represents every situation - or state - in terms of the future states
which it predicts. For example, if you are leaving work (your current state) your hippocampus might represent
this by predicting that you will likely soon be on your commute, picking up your kids from school or, more
distantly, at home. By representing each current state in terms of its anticipated successor states, the
hippocampus conveys a compact summary of future events, known formally as the “successor
representation”. We suggest that this specific form of predictive map allows the brain to adapt rapidly in
environments with changing rewards, but without having to run expensive simulations of the future.
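For a fixed policy with state-transition matrix T, the successor representation can be written in closed form as M = (I - gamma*T)^-1, and values follow by weighting immediate rewards by expected future occupancy, V = M R. A toy NumPy sketch with a three-state work -> commute -> home chain is shown below; the chain and rewards are illustrative only.

# Successor representation: each state is represented by the discounted
# expected future occupancy of every other state.
import numpy as np

gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],    # work -> commute
              [0.0, 0.0, 1.0],    # commute -> home
              [0.0, 0.0, 1.0]])   # home -> home
R = np.array([0.0, 0.0, 1.0])     # reward only at home

M = np.linalg.inv(np.eye(3) - gamma * T)   # successor representation
V = M @ R                                  # expected discounted future reward
print(M.round(2), V.round(2))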
Going Beyond Average for Neural Learning
From https://deepmind.com/blog/going-beyond-average-reinforcement-learning/
Randomness is something we encounter every day, and it has a profound effect on how we
experience the world. The same is true in reinforcement learning (RL) applications, systems
that learn by trial and error and are motivated by rewards. Typically, an RL algorithm predicts
the average reward it receives from multiple attempts at a task, and uses this prediction to
decide how to act. But random perturbations in the environment can alter its behaviour by
changing the exact amount of reward the system receives.
In a new paper, we show it is possible to model not only the average but also the full variation
of this reward, what we call the value distribution. This results in RL systems that are more
accurate and faster to train than previous models, and more importantly opens up the
possibility of rethinking the whole of reinforcement learning.
From https://arxiv.org/abs/1707.06887
In this paper we argue for the fundamental importance of the value distribution: the
distribution of the random return received by a reinforcement learning agent. This is in
contrast to the common approach to reinforcement learning which models the
expectation of this return, or value. Although there is an established body of literature
studying the value distribution, thus far it has always been used for a specific purpose
such as implementing risk-aware behaviour. We begin with theoretical results in both
the policy evaluation and control settings, exposing a significant distributional
instability in the latter. We then use the distributional perspective to design a new
algorithm which applies Bellman's equation to the learning of approximate value
distributions. We evaluate our algorithm using the suite of games from the Arcade
Learning Environment. We obtain both state-of-the-art results and anecdotal evidence
demonstrating the importance of the value distribution in approximate reinforcement
learning. Finally, we combine theoretical and empirical evidence to highlight the ways
in which the value distribution impacts learning in the approximate setting.
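A tiny Python illustration of why the full distribution matters is shown below: a state whose return is usually zero but occasionally large has a modest mean that hides the spread a value distribution would capture. The two-outcome environment is made up, and real distributional agents learn a parametric return distribution rather than sampling it.

# Compare the expected value with a few quantiles of the return distribution.
import numpy as np

rng = np.random.default_rng(0)
returns = np.where(rng.random(10_000) < 0.1, 10.0, 0.0)   # rare large reward

expected_value = returns.mean()                    # what standard RL predicts
quantiles = np.quantile(returns, [0.1, 0.5, 0.9])  # what a value distribution captures
print(expected_value, quantiles)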
Agents that Imagine and Plan
From https://deepmind.com/blog/agents-imagine-and-plan/
In two new papers, we describe a new family of approaches for imagination-based planning.
We also introduce architectures which provide new ways for agents to learn and construct
plans to maximise the efficiency of a task. These architectures are efficient, robust to complex
and imperfect models, and can adopt flexible strategies for exploiting their imagination.
Imagination-augmented agents
The agents we introduce benefit from an ‘imagination encoder’ - a neural network which learns
to extract any information useful for the agent’s future decisions, but ignore that which is not
relevant. These agents have a number of distinct features:
• they learn to interpret their internal simulations. This allows them to use models which
coarsely capture the environmental dynamics, even when those dynamics are not perfect.
• they use their imagination efficiently. They do this by adapting the number of imagined
trajectories to suit the problem. Efficiency is also enhanced by the encoder, which is able
to extract additional information from imagination beyond rewards - these trajectories may
contain useful clues even if they do not necessarily result in high reward.
• they can learn different strategies to construct plans. They do this by choosing between
continuing a current imagined trajectory or restarting from scratch. Alternatively, they can
use different imagination models, with different accuracies and computational costs. This
offers them a broad spectrum of effective planning strategies, rather than being restricted
to a one-size-fits-all approach which might limit adaptability in imperfect environments.
Agents that Imagine and Plan
From https://arxiv.org/abs/1707.06203
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning
combining model-free and model-based aspects. In contrast to most existing model-based reinforcement
learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to
interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using
the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance,
and robustness to model misspecification compared to several baselines.
From https://arxiv.org/abs/1707.06170
Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It
is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not
prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based,
sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it
can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating
it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan
context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing
out alternative imagined actions, chaining sequences of actions together, or building a more complex
"imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our
agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated
with using its imagination. We show that our architecture can learn to solve a challenging continuous control
problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new
direction toward learning the components of a model-based planning system and how to use them.
Creating New Visual Concepts
From https://deepmind.com/blog/imagine-creating-new-visual-concepts-recombining-familiar-ones/
In our new paper, we propose a novel theoretical approach to address this problem. We also demonstrate a
new neural network component called the Symbol-Concept Association Network (SCAN), that can, for the
first time, learn a grounded visual concept hierarchy in a way that mimics human vision and word
acquisition, enabling it to imagine novel concepts guided by language instructions.
Our approach can be summarised as follows:
• The SCAN model experiences the visual world in the same way as a young baby might during the first
few months of life. This is the period when the baby’s eyes are still unable to focus on anything more
than an arm’s length away, and the baby essentially spends all her time observing various objects
coming into view, moving and rotating in front of her. To emulate this process, we placed SCAN in a
simulated 3D world of DeepMind Lab, where, like a baby in a cot, it could not move, but it could rotate
its head and observe one of three possible objects presented to it against various coloured
backgrounds - a hat, a suitcase or an ice lolly. Like the baby’s visual system, our model learns the basic
structure of the visual world and how to represent objects in terms of interpretable visual “primitives”.
For example, when looking at an apple, the model will learn to represent it in terms of its colour, shape,
size, position or lighting.
From https://arxiv.org/abs/1707.03389
The seemingly infinite diversity of the natural world arises from a relatively small set of coherent rules, such as
the laws of physics or chemistry. We conjecture that these rules give rise to regularities that can be discovered
through primarily unsupervised experiences and represented as abstract concepts. If such representations are
compositional and hierarchical, they can be recombined into an exponentially large set of new concepts. This
paper describes SCAN (Symbol-Concept Association Network), a new framework for learning such abstractions in
the visual domain. SCAN learns concepts through fast symbol association, grounding them in disentangled visual
primitives that are discovered in an unsupervised manner. Unlike state of the art multimodal generative model
baselines, our approach requires very few pairings between symbols and images and makes no assumptions
about the form of symbol representations. Once trained, SCAN is capable of multimodal bi-directional inference,
generating a diverse set of image samples from symbolic descriptions and vice versa. It also allows for traversal
and manipulation of the implicit hierarchy of visual concepts through symbolic instructions and learnt logical
recombination operations. Such manipulations enable SCAN to break away from its training data distribution and
imagine novel visual concepts through symbolically instructed recombination of previously learnt concepts.
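As a rough illustration (not the SCAN code), the snippet below shows the kind of grounding term such a model can use: a KL divergence that ties a symbol-conditioned latent distribution to the latent produced by a disentangled visual encoder. The example latents and the direction of the KL are assumptions made for illustration; see the paper for the exact objective.

import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, the kind of term SCAN-style models
    use to ground a symbol-conditioned latent in a visual latent."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Hypothetical latents: a visual encoder posterior and a symbol encoder posterior
# over the same disentangled factors (colour, shape, size, ...).
mu_vis, logvar_vis = np.array([0.8, -0.2, 0.1]), np.full(3, -2.0)
mu_sym, logvar_sym = np.array([0.7, 0.0, 0.0]), np.zeros(3)

# Training would minimise a term like this so that the symbol posterior covers
# the visual posterior for the factors the symbol describes (the direction of
# the KL shown here is a modelling choice, not necessarily the paper's).
grounding_loss = kl_diag_gaussians(mu_vis, logvar_vis, mu_sym, logvar_sym)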
Producing Flexible Behaviors in Simulation Environments
From https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/
True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a
range of complex environments. Existing attempts to control physically simulated humanoid bodies come
from diverse fields, including computer animation and biomechanics. A trend has been to use hand-
crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this
may require considerable engineering effort, and can result in restricted behaviours or behaviours that
may be difficult to repurpose for new tasks.
In three new papers, we seek ways to produce flexible and natural behaviours that can be reused and
adapted to solve tasks.
Read:
Emergence of locomotion behaviours in rich environments
Learning human behaviours from motion capture by adversarial imitation
Robust imitation of diverse behaviours
Achieving flexible and adaptive control of simulated bodies is a key element of AI research. Our work
aims to develop flexible systems which learn and adapt skills to solve motor control tasks while reducing
the manual engineering required to achieve this goal. Future work could extend these approaches to
enable coordination of a greater range of behaviours in more complex situations.
Producing Flexible Behaviors in Simulation Environments
Emergence of locomotion behaviours in rich environments
The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple
reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a
particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help
to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and
find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We
demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward.
We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward
function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our
agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance.
A visual depiction of highlights of the learned behavior can be viewed via the link in the paper's arXiv abstract.
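For illustration, a reward of the kind described above ("a simple reward function based on forward progress") might look like the following; the alive bonus, the control-cost term, and their coefficients are assumptions, not taken from the paper.

def locomotion_reward(x_before, x_after, dt, ctrl, alive_bonus=1.0, ctrl_cost=1e-3):
    """Illustrative forward-progress reward: reward velocity along the track,
    lightly penalise control effort, and add a small bonus for staying upright.
    The coefficients here are made up for the example."""
    forward_velocity = (x_after - x_before) / dt
    return forward_velocity + alive_bonus - ctrl_cost * sum(u * u for u in ctrl)

# Example: the agent moved 0.3 m in 0.05 s while applying a small torque vector.
r = locomotion_reward(x_before=1.0, x_after=1.3, dt=0.05, ctrl=[0.2, -0.1, 0.05])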
Learning human behaviours from motion capture by adversarial imitation
Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-
dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions
tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative
adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement
patterns from limited demonstrations consisting only of partially observed state features, without access to actions,
even when the demonstrations come from a body with different and unknown physical parameters. We leverage this
approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when
controlled by a higher level controller.
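A minimal sketch of the adversarial imitation idea behind this work (GAIL-style, not the paper's code): a discriminator is trained to separate demonstration state features from policy-generated ones, and its output is turned into a reward for the reinforcement learner.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def imitation_reward(discriminator_logit):
    """GAIL-style reward: the more 'demonstration-like' the discriminator finds
    a state (or partial state features, as in the paper above), the higher the
    reward passed to the reinforcement learner."""
    d = sigmoid(discriminator_logit)
    return -np.log(1.0 - d + 1e-8)

def discriminator_loss(logits_demo, logits_policy):
    """Binary cross-entropy: label demonstrations 1 and policy rollouts 0."""
    d_demo, d_pol = sigmoid(logits_demo), sigmoid(logits_policy)
    return -np.mean(np.log(d_demo + 1e-8)) - np.mean(np.log(1.0 - d_pol + 1e-8))

# Toy usage with made-up discriminator outputs.
print(imitation_reward(2.0), discriminator_loss(np.array([2.0, 1.5]), np.array([-1.0, 0.3])))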
Robust imitation of diverse behaviours
Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data,
even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures
when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative
Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently
mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two
approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns
semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching
tasks, and then smoothly interpolated with a resulting smooth interpolation of reaching behavior. Leveraging these
policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised
controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when
GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped
and a 62 DoF 3D humanoid in the MuJoCo physics environment.
DQN - Deep Reinforcement Learning
From https://deepmind.com/research/dqn
Nature Paper
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
DQN - Deep Reinforcement Learning Paper
From https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
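The DQN paper linked above combines Q-learning with experience replay and a periodically updated target network. The sketch below shows only the target computation y = r + gamma * max_a' Q_target(s', a') on a toy value table; the real system uses a convolutional network over Atari frames.

import numpy as np

def dqn_targets(batch, q_target, gamma=0.99):
    """Compute DQN regression targets for a batch of (s, a, r, s', done)
    transitions sampled from replay memory. q_target maps a state index to a
    vector of action values (a stand-in for the target network)."""
    targets = []
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * np.max(q_target[s_next])
        targets.append((s, a, y))
    return targets

# Toy example with a 3-state, 2-action value table as the "target network".
q_target = np.array([[0.0, 1.0], [0.5, 0.2], [2.0, 0.1]])
batch = [(0, 1, 1.0, 2, False), (1, 0, 0.0, 0, True)]
print(dqn_targets(batch, q_target))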
Reward Tampering Problems and Solutions in Reinforcement Learning
From https://arxiv.org/pdf/1908.04734.pdf
PathNet from Google DeepMind
From https://arxiv.org/pdf/1701.08734.pdf
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network,
permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural
network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the
network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of
parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm.
During learning, a tournament selection genetic algorithm is used to select pathways through the neural network
for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost
function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and
re-evolving a new population of paths for task B allows task B to be learned faster than it could be learned from
scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive
transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set
of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural
network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a
parallel asynchronous reinforcement learning algorithm.
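A toy sketch of the tournament-selection loop described in the abstract (an illustration, not the PathNet code): pathways are lists of module indices, two pathways are compared, and the loser is overwritten by a mutated copy of the winner. The fitness function here is a stand-in for training and evaluating the parameters selected by a pathway.

import random

def mutate(path, n_modules_per_layer, p=0.1):
    """Randomly reassign some modules in the copied pathway (genotype mutation)."""
    return [m if random.random() > p else random.randrange(n_modules_per_layer) for m in path]

def tournament_step(population, fitness, n_modules_per_layer=10):
    """One tournament: pick two pathways, evaluate them, and overwrite the
    loser with a mutated copy of the winner. fitness(path) would train and
    evaluate the parameters selected by that pathway."""
    i, j = random.sample(range(len(population)), 2)
    winner, loser = (i, j) if fitness(population[i]) >= fitness(population[j]) else (j, i)
    population[loser] = mutate(list(population[winner]), n_modules_per_layer)
    return population

# Toy usage: pathways pick one module per layer of a 3-layer network;
# the fitness function below is just a placeholder for task performance.
population = [[random.randrange(10) for _ in range(3)] for _ in range(8)]
population = tournament_step(population, fitness=lambda path: -sum(path))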
Pathways
Deep Learning 2020s
2020 References
• Future of Deep Learning
https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/
• Turing Award Winners Video
https://www.youtube.com/watch?v=UX8OubxsY8w
• MIT Deep Learning Video
https://www.youtube.com/watch?v=0VH1Lim8gL8
Three Challenges of Deep Learning from Yann LeCun
From https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/
1. First, we need to develop AI systems that learn with fewer samples or fewer trials. “My
suggestion is to use unsupervised learning, or I prefer to call it self-supervised learning because
the algorithms we use are really akin to supervised learning, which is basically learning to fill in
the blanks,” LeCun says. “Basically, it’s the idea of learning to represent the world before
learning a task. This is what babies and animals do. We run about the world, we learn how it
works before we learn any task. Once we have good representations of the world, learning a
task requires few trials and few samples.” (A toy illustration of this “fill in the blanks” idea
follows this list.)
2. The second challenge is creating deep learning systems that can reason. Current deep
learning systems are notoriously bad at reasoning and abstraction, which is why they need huge
amounts of data to learn simple tasks. “The question is, how do we go beyond feed-forward
computation and system 1? How do we make reasoning compatible with gradient-based
learning? How do we make reasoning differentiable? That’s the bottom line,” LeCun said.
System 1 is the kind of learning task that doesn’t require active thinking, such as navigating a
known area or making small calculations. System 2 is the more active kind of thinking, which
requires reasoning. Symbolic artificial intelligence, the classic approach to AI, has proven to be
much better at reasoning and abstraction.
3. The third challenge is to create deep learning systems that can learn and plan complex action
sequences, and decompose tasks into subtasks. Deep learning systems are good at providing
end-to-end solutions to problems but very bad at breaking them down into specific
interpretable and modifiable steps. There have been advances in creating learning-based AI
systems that can decompose images, speech, and text. Capsule networks, invented by Geoffrey
Hinton, address some of these challenges. But learning to reason about complex tasks is
beyond today’s AI. “We have no idea how to do this,” LeCun admits.
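Below is a toy, hypothetical illustration (not from the article) of the "fill in the blanks" self-supervised idea mentioned in item 1: hide part of the input and score a model on reconstructing it from the rest, so the data provides its own supervision. The neighbour_mean "model" is a simple stand-in for a learned predictor.

import numpy as np

def masked_reconstruction_loss(x, predict_fn, mask_idx):
    """Self-supervised 'fill in the blanks': hide one element of the input and
    score the model on reconstructing it from the rest. No labels are needed;
    the data itself provides the target."""
    visible = x.copy()
    visible[mask_idx] = 0.0                      # blank out one position
    prediction = predict_fn(visible, mask_idx)   # model predicts the missing value
    return (prediction - x[mask_idx]) ** 2

# Toy "model": predict the masked value as the mean of its visible neighbours.
def neighbour_mean(visible, idx):
    lo, hi = max(0, idx - 1), min(len(visible), idx + 2)
    neighbours = [visible[k] for k in range(lo, hi) if k != idx]
    return float(np.mean(neighbours))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(masked_reconstruction_loss(x, neighbour_mean, mask_idx=2))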
Foundation Models
From https://research.ibm.com/blog/what-are-foundation-models
In recent years, we’ve managed to build AI systems that can learn from thousands, or millions, of examples to help us better
understand our world, or find new solutions to difficult problems. These large-scale models have led to systems that can
understand when we talk or write, such as the natural-language processing and understanding programs we use every day,
from digital assistants to speech-to-text programs. Other systems, trained on things like the entire work of famous artists, or
every chemistry textbook in existence, have allowed us to build generative models that can create new works of art based on
those styles, or new compound ideas based on the history of chemical research.
While many new AI systems are helping solve all sorts of real-world problems, creating and deploying each new system
often requires a considerable amount of time and resources. For each new application, you need to ensure that there’s a large,
well-labelled dataset for the specific task you want to tackle. If a dataset didn’t exist, you’d have to have people spend
hundreds or thousands of hours finding and labelling appropriate images, text, or graphs for the dataset. Then the AI model
has to learn to recognize everything in the dataset, and then it can be applied to the use case you have, from recognizing
language to generating new molecules for drug discovery. And training one large natural-language processing model, for
example, has roughly the same carbon footprint as running five cars over their lifetime.
The next wave in AI looks to replace the task-specific models that have dominated the AI landscape to date. The future is
models that are trained on a broad set of unlabeled data that can be used for different tasks, with minimal fine-tuning. These
are called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence.
We’ve seen the first glimmers of the potential of foundation models in the worlds of imagery and language. Early examples
of models, like GPT-3, BERT, or DALL-E 2, have shown what’s possible. Input a short prompt, and the system generates an
entire essay, or a complex image, based on your parameters, even if it wasn’t specifically trained on how to execute that
exact argument or generate an image in that way.
What makes these new systems foundation models is that they, as the name suggests, can be the foundation for many
applications of the AI model. Using self-supervised learning and transfer learning, the model can apply information it’s learnt
about one situation to another. While the amount of data is considerably more than the average person needs to transfer
understanding from one task to another, the end result is relatively similar: You learn to drive on one car, for example, and
without too much effort, you can drive most other cars — or even a truck or a bus.
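As a concrete but purely illustrative companion to the paragraph above, the snippet below shows the "one pretrained model, many tasks with minimal adaptation" pattern, assuming the Hugging Face transformers library and its public gpt2 and facebook/bart-large-mnli checkpoints are available (none of these are named in the text above, and running it downloads the models).

# One pretrained model, many tasks, with minimal task-specific work.
from transformers import pipeline

# Task 1: free-form generation from a short prompt with a pretrained language model.
generator = pipeline("text-generation", model="gpt2")
print(generator("Foundation models are", max_length=30)[0]["generated_text"])

# Task 2: zero-shot classification with a different pretrained checkpoint,
# using no task-specific training data at all.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(classifier("The patient reports chest pain.",
                 candidate_labels=["medical", "legal", "finance"]))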
Challenges and Risks of Foundation Models
From https://arxiv.org/pdf/2108.07258.pdf
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad
data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to
underscore their critically central yet incomplete character. This report provides a thorough account of the
opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics,
reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data,
systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact
(e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation
models are based on standard deep learning and transfer learning, their scale results in new emergent
capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides
powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted
models downstream. Despite the impending widespread deployment of foundation models, we currently lack a
clear understanding of how they work, when they fail, and what they are even capable of due to their emergent
properties. To tackle these questions, we believe much of the critical research on foundation models will require
deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
This report investigates an emerging paradigm for building artificial intelligence (AI) systems
based on a general class of models which we term foundation models. A foundation model is any
model that is trained on broad data (generally using self-supervision at scale) that can be adapted
(e.g., fine-tuned) to a wide range of downstream tasks; current examples include BERT [Devlin et al.
2019], GPT-3 [Brown et al. 2020], and CLIP [Radford et al. 2021]. From a technological point of view,
foundation models are not new — they are based on deep neural networks and self-supervised
learning, both of which have existed for decades. However, the sheer scale and scope of foundation
models from the last few years have stretched our imagination of what is possible; for example,
GPT-3 has 175 billion parameters and can be adapted via natural language prompts to do a passable
job on a wide range of tasks despite not being trained explicitly to do many of those tasks [Brown
et al. 2020]. At the same time, existing foundation models have the potential to accentuate harms,
and their characteristics are in general poorly understood. Given their impending widespread
deployment, they have become a topic of intense scrutiny [Bender et al. 2021].
Capsule Neural Nets
From https://en.wikipedia.org/wiki/Capsule_neural_network
A Capsule Neural Network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that
can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural
organization.[1]
The idea is to add structures called “capsules” to a convolutional neural network (CNN), and to reuse output from several
of those capsules to form more stable (with respect to various perturbations) representations for higher capsules.[2] The
output is a vector consisting of the probability of an observation, and a pose for that observation. This vector is similar to
what is done for example when doing classification with localization in CNNs.
Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts
but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched).
For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they
have linear effects at the part/object level.[3] This can be compared to inverting the rendering of an object of multiple parts.
[4]
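For reference, below is the commonly used capsule "squash" non-linearity (from Sabour, Frosst and Hinton's dynamic routing work), which produces exactly the kind of output vector described above: the orientation encodes the pose and the length encodes the probability of the observation.

import numpy as np

def squash(s, eps=1e-8):
    """Capsule 'squash' non-linearity: keeps the vector's orientation (the pose)
    but maps its length into (0, 1) so it can act as the probability that the
    entity the capsule represents is present."""
    norm = np.linalg.norm(s)
    return (norm ** 2 / (1.0 + norm ** 2)) * (s / (norm + eps))

v = squash(np.array([3.0, 4.0]))        # length 5 -> squashed length of about 0.96
print(v, np.linalg.norm(v))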
Capsules
From https://www.youtube.com/watch?v=UX8OubxsY8w
DALL-E 2
From https://www.nytimes.com/2022/08/24/technology/ai-technology-progress.html
For the past few days, I’ve been playing around with DALL-E 2, an app developed by the San
Francisco company OpenAI that turns text descriptions into hyper-realistic images.
What’s impressive about DALL-E 2 isn’t just the art it generates. It’s how it generates art. These
aren’t composites made out of existing internet images — they’re wholly new creations made
through a complex A.I. process known as “diffusion,” which starts with a random series of pixels
and refines it repeatedly until it matches a given text description. And it’s improving quickly —
DALL-E 2’s images are four times as detailed as the images generated by the original DALL-E,
which was introduced only last year.
DALL-E 2 got a lot of attention when it was announced this year, and rightfully so. It’s an
impressive piece of technology with big implications for anyone who makes a living working with
images — illustrators, graphic designers, photographers and so on. It also raises important
questions about what all of this A.I.-generated art will be used for, and whether we need to worry
about a surge in synthetic propaganda, hyper-realistic deepfakes or even nonconsensual
pornography.
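A toy version of the refinement process described above (start from random pixels and refine repeatedly): the denoise_step placeholder below simply pulls the image toward a flat target, whereas a real diffusion model would use a trained, text-conditioned denoising network.

import numpy as np

def diffusion_style_refinement(denoise_step, shape=(8, 8), n_steps=50, seed=0):
    """Toy sketch of the process described above: start from random pixels and
    refine them repeatedly. In a real diffusion model, denoise_step would be a
    trained neural network conditioned on the text prompt."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=shape)           # pure noise to begin with
    for t in range(n_steps, 0, -1):
        x = denoise_step(x, t)           # each step removes a little noise
    return x

# Placeholder "denoiser": pull the image toward a flat grey target.
target = np.full((8, 8), 0.5)
denoise_step = lambda x, t: x + 0.1 * (target - x)
image = diffusion_style_refinement(denoise_step)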
DALL-E 2 Available to All
If you've been itching to try OpenAI's image synthesis tool but have been stymied by the lack of an
invitation, now's your chance. Today, OpenAI announced that it removed the waitlist for its DALL-E AI
image generator service. That means anyone can sign up and use it.
DALL-E is a deep learning image synthesis model that has been trained on hundreds of millions of images
pulled from the Internet. It uses a technique called latent diffusion to learn associations between words and
images. As a result, DALL-E users can type in a text description—called a prompt—and see it rendered
visually as a 1024×1024 pixel image in almost any artistic style.
Make-a-Video
From https://makeavideo.studio/
Make-A-Video research builds on the recent progress made in text-to-image generation technology built
to enable text-to-video generation. The system uses images with descriptions to learn what the world
looks like and how it is often described. It also uses unlabeled videos to learn how the world moves.
With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-
kind videos with just a few words or lines of text.
From Make-a-Video Paper
We propose Make-A-Video – an approach for directly translating the tremendous
recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our
intuition is simple: learn what the world looks like and how it is described from
paired text-image data, and learn how the world moves from unsupervised video
footage. Make-A-Video has three advantages: (1) it accelerates training of the
T2V model (it does not need to learn visual and multimodal representations from
scratch), (2) it does not require paired text-video data, and (3) the generated
videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.)
of today’s image generation models. We design a simple yet effective way to
build on T2I models with novel and effective spatial-temporal modules. First, we
decompose the full temporal U-Net and attention tensors and approximate them
in space and time. Second, we design a spatial temporal pipeline to generate
high resolution and frame rate videos with a video decoder, interpolation model
and two super resolution models that can enable various applications besides
T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and
quality, Make-A-Video sets the new state-of-the-art in text-to-video generation,
as determined by both qualitative and quantitative measures
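A rough sketch of the space/time factorisation idea in the abstract above: rather than one full 3D operation over (time, height, width), apply a cheap per-frame spatial step and then a cheap per-pixel temporal step. Real Make-A-Video modules are learned convolutions and attention; here both steps are simple box filters, used only to show the decomposition.

import numpy as np

def factorised_spatiotemporal_smooth(video):
    """Apply a spatial operation frame by frame, then a temporal operation
    across frames, instead of one joint 3D operation. Both 'operations' here
    are simple averaging filters for illustration."""
    t, h, w = video.shape
    spatial = np.empty_like(video)
    for i in range(t):                                   # spatial pass, frame by frame
        frame = video[i]
        spatial[i] = (frame
                      + np.roll(frame, 1, axis=0) + np.roll(frame, -1, axis=0)
                      + np.roll(frame, 1, axis=1) + np.roll(frame, -1, axis=1)) / 5.0
    # temporal pass across frames, pixel by pixel
    temporal = (spatial + np.roll(spatial, 1, axis=0) + np.roll(spatial, -1, axis=0)) / 3.0
    return temporal

video = np.random.default_rng(0).normal(size=(4, 16, 16))   # (frames, height, width)
out = factorised_spatiotemporal_smooth(video)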
Concerns about Deep Learning from Gary Marcus
From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
Deep Learning thus far:
• Is data hungry
• Is shallow and has limited capacity for transfer
• Has no natural way to deal with hierarchical structure
• Has struggled with open-ended inference
• Is not sufficiently transparent
• Has not been well integrated with prior knowledge
• Cannot inherently distinguish causation from correlation
• Presumes a largely stable world, in ways that may be problematic
• Works well as an approximation, but answers often can’t be fully trusted
• Is difficult to engineer with
Causal Reasoning and
Deep Learning (Advanced)
Causal Reasoning and Transfer Learning
From A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
We propose to meta-learn causal structures based on how fast a learner adapts to
new distributions arising from sparse distributional changes, e.g. due to
interventions, actions of agents and other sources of non-stationarities. We show
that under this assumption, the correct causal structural choices lead to faster
adaptation to modified distributions because the changes are concentrated in one
or just a few mechanisms when the learned knowledge is modularized appropriately.
This leads to sparse expected gradients and a lower effective number of degrees of
freedom needing to be relearned while adapting to the change. It motivates using
the speed of adaptation to a modified distribution as a meta-learning objective. We
demonstrate how this can be used to determine the cause-effect relationship
between two observed variables. The distributional changes do not need to
correspond to standard interventions (clamping a variable), and the learner has no
direct knowledge of these interventions. We show that causal structures can be
parameterized via continuous variables and learned end-to-end. We then explore
how these ideas could be used to also learn an encoder that would map low-level
observed variables to unobserved causal variables leading to faster adaptation out-
of-distribution, learning a representation space where one can satisfy the
assumptions of independent mechanisms and of small and sparse changes in these
mechanisms due to actions and non-stationarities.
Causal Deep Learning from Bengio
From https://www.wired.com/story/ai-pioneer-algorithms-understand-why/
From https://arxiv.org/abs/1901.10912
Causal Reasoning and Transfer Learning
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
From https://arxiv.org/abs/1901.10912
Proposition 1. The expected gradient over the transfer distribution of the regret
(accumulated negative log-likelihood during the adaptation episode) with respect to the
module parameters is zero for the parameters of the modules that (a) were correctly
learned in the training phase, and (b) have the correct set of causal parents, corresponding
to the ground truth causal graph, if (c) the corresponding ground truth conditional
distributions did not change from the training distribution to the transfer distribution.
(Figure caption) Adaptation to the transfer distribution, as more transfer distribution examples are seen by
the learner (horizontal axis), in terms of the log-likelihood on the transfer distribution (on a
large test set from the transfer distribution, tested after each update of the parameters).
Here the model is discrete, with N = 10. Curves are the median over 10,000 runs, with
25-75% quantile intervals, for both the correct causal model (blue, top) and the incorrect
one (red, bottom). We see that the correct causal model adapts faster (smaller regret), and
that the most informative part of the trajectory (where the two models generalize the most
differently) is in the first 10-20 examples.
Causal Reasoning and Transfer Learning
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
From https://arxiv.org/abs/1901.10912
Equation (2): R = −log[ sigmoid(γ) · L_A→B + (1 − sigmoid(γ)) · L_B→A ]
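A small numerical sketch of how Equation (2) can be used as a meta-objective (an illustration, not the authors' code): L_A→B and L_B→A are taken to be the likelihoods each candidate causal model accumulates while adapting to the transfer distribution, and the structural parameter γ is nudged toward the model that adapts faster.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_regret(gamma, lik_A_to_B, lik_B_to_A):
    """Equation (2): R = -log[ sigmoid(gamma)*L_A->B + (1 - sigmoid(gamma))*L_B->A ].
    Here the L terms are interpreted as the likelihoods each candidate causal
    model accumulates while adapting to the transfer distribution, so the model
    that adapts faster contributes the larger likelihood."""
    p = sigmoid(gamma)
    return -np.log(p * lik_A_to_B + (1.0 - p) * lik_B_to_A)

# Toy meta-update on gamma via a finite-difference gradient: if the A->B model
# adapts faster (higher likelihood), gamma is pushed up, i.e. toward "A causes B".
gamma, lr, eps = 0.0, 1.0, 1e-4
lik_ab, lik_ba = 0.8, 0.3                       # made-up adaptation likelihoods
grad = (meta_regret(gamma + eps, lik_ab, lik_ba) - meta_regret(gamma - eps, lik_ab, lik_ba)) / (2 * eps)
gamma -= lr * grad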
References
Causal Reasoning and Deep Learning References
http://causality.cs.ucla.edu/blog/
http://causality.cs.ucla.edu/
https://www.google.com/search?client=firefox-b-1-d&q=deep+learning+causal+analysis
https://arxiv.org/search/?query=causal&searchtype=title&source=header
https://arxiv.org/abs/1901.10912
https://www.ericsson.com/en/blog/2020/2/causal-inference-machine-learning
https://towardsdatascience.com/introduction-to-causality-in-machine-learning-4cee9467f06f
References
• Neural Networks and Deep Learning: A Textbook
• Deep Learning (Adaptive Computation and Machine Learning series)
• The Deep Learning Revolution (The MIT Press)
• Introduction to Deep Learning (The MIT Press)
• Deep Learning with Python
• An Introduction to Deep Reinforcement Learning
• World Models
• Learning and Querying Fast Generative Models for Reinforcement Learning
• Imagination-Augmented Agents for Deep Reinforcement Learning
• Neural Networks and Deep Learning: A Textbook
• Google Brain
• Convolutional Neural Nets (Detailed introduction)
• Future of Deep Learning
References (cont)
• Recurrent Neural Networks
• Guide to LSTM and Recurrent Neural Networks
• Enterprise Deep Learning
• 6 AI Trends for 2019
• Designing Neural Nets through Neural Evolution
• Compositional Pattern Producing Networks
• Deep Generator Networks
• Deep Reinforcement Learning Course
• N-Grams
• A Beginners Guide to Deep Reinforcement Learning with many links
• Verifiable AI from Specifications
• Amazon Deep Learning Containers
• A Deep Dive in to Deep Learning
Google AI References
• https://ai.google/research/pubs/?area=AlgorithmsandTheory
• https://ai.google/research/pubs/?area=DistributedSystemsandParallelComputing
• https://ai.google/research/pubs/?area=MachineTranslation
• https://ai.google/research/pubs/?area=MachineIntelligence
• https://ai.google/research/pubs/?area=MachinePerception
• https://ai.google/research/pubs/?area=DataManagement
• https://ai.google/research/pubs/?area=InformationRetrievalandtheWeb
• https://ai.google/research/pubs/?area=NaturalLanguageProcessing
• https://ai.google/research/pubs/?area=SpeechProcessing
• Deep Mind Publications
Deep Mind References
DeepMind Home page
https://deepmind.com/
DeepMind Research
https://deepmind.com/research/
https://deepmind.com/research/publications/
DeepMind Blog
https://deepmind.com/blog
DeepMind Applied
https://deepmind.com/applied
Deep Compressed Sensing
https://arxiv.org/pdf/1905.06723.pdf
Deep Mind NIPS Papers
https://deepmind.com/blog/deepmind-papers-nips-2017/
DeepMind Papers at ICML 2018
https://deepmind.com/blog/deepmind-papers-icml-2018/
DeepMind Papers at ICLR 2018
https://deepmind.com/blog/deepmind-papers-iclr-2018/
Proceedings of ICML Program 2018
http://proceedings.mlr.press/v97/
References (cont)
• OpenAI
• OpenAI Blog
• OpenAI Research
• Deep Learning Book Lecture Notes
• Deep Learning Course Lecture Notes
• Bayesian Deep Learning Resources
• Gradient Boosting Algorithms
• Deep Mind Research
• David Inouye Papers
• Jeff Clune’s Research
• Jeff Hawkins Books
• Numenta
• Reinforcement Learning Book

  • 1. Artificial General Intelligence 1 Bob Marcus robert.marcus@et-strategies.com Part 1 of 4 parts: Artificial Intelligence and Machine Learning
  • 2. This is a first cut. More details will be added later.
  • 3. Part 1: Artificial Intelligence (AI) Part 2: Natural Intelligence(NI) Part 3: Artificial General Intelligence (AI + NI) Part 4: Networked AGI Layer on top or Gaia and Human Society Four Slide Sets on Artificial General Intelligence AI = Artificial Intelligence (Task) AGI = Artificial Mind (Simulation) AB = Artificial Brain (Emulation) AC = Artificial Consciousness (Synthetic) AI < AGI < ? AB <AC (Is a partial brain emulation needed to create a mind?) Mind is not required for task proficiency Full Natural Brain architecture is not required for a mind Consciousness is not required for a natural brain architecture
  • 4. Philosophical Musings 10/2022 Focused Artifical Intelligence (AI) will get better at specific tasks Specific AI implementations will probably exceed human performance in most tasks Some will attain superhuman abilities is a wide range of tasks “Common Sense” = low-level experiential broad knowledge could be an exception Some AIs could use brain inspired architectures to improve complex ask performance This is not equivalent to human or artificial general intelligence (AGI) However networking task-centric AIs could provide a first step towards AGI This is similar to the way human society achieves power from communication The combination of the networked AIs could be the foundation of an artificial mind In a similar fashion, human society can accomplish complex tasks without being conscious Distributed division of labor enable tasks to be assigned to the most competent element Networked humans and AIs could cooperate through brain-machine interfaces In the brain, consciousness provides direction to the mind In large societies, governments perform the role of conscious direction With networked AIs, a “conscious operating system”could play a similar role. This would probably have to be initially programmed by humans. If the AI network included sensors, actuators, and robots it could be aware of the world The AI network could form a grid managing society, biology, and geology layers A conscious AI network could develop its own goals beyond efficient management Humans in the loop could be valuable in providing common sense and protective oversight
  • 5. Outline Classical AI Knowledge Representation Agents Classical Machine Learning Deep Learning Deep Learning Models Deep Learning Hardware Reinforcement Learning Google Research Computing and Sensing Architecture IoT and Deep Learning DeepMind Deep Learning 2020 Causal Reasoning and Deep Learning References
  • 6. Classical AI Classical Paper Awards 1999-2022
  • 7. Top 100 AI Start-ups From https://singularityhub.com/2020/03/30/the-top-100-ai-startups-out-there-now-and-what-theyre-working-on/
  • 8. Classical AI Tools Lisp https://en.wikipedia.org/wiki/Lisp_(programming_language) Prolog https://www.geeksforgeeks.org/prolog-an-introduction/ Knowledge Representation https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning Decision Trees https://en.wikipedia.org/wiki/Decision_tree Forward and Backward Chaining https://www.section.io/engineering-education/forward-and-backward-chaining-in-ai/ Constraint Satisfaction https://en.wikipedia.org/wiki/Constraint_satisfaction OPS5 https://en.wikipedia.org/wiki/OPS5
  • 9. Classical AI Systems CYC https://en.wikipedia.org/wiki/Cyc Expert Systems https://en.wikipedia.org/wiki/Expert_system XCON https://en.wikipedia.org/wiki/Xcon MYCIN https://en.wikipedia.org/wiki/Mycin MYCON https://www.slideshare.net/bobmarcus/1986-multilevel-constraintbased-configuration-article https://www.slideshare.net/bobmarcus/1986-mycon-multilevel-constraint-based-configuration
  • 11. Stored Knowledge Base From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
  • 15. Intelligent Agents From https://en.wikipedia.org/wiki/Intelligent_agent In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.[1] Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1] An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4] Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science, ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations. Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
  • 16. Node in Real-Time Control System (RCS) by Albus From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
  • 17. Intelligent Agents for Network Management From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
  • 18. Intelligent Agents on the Web From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover, they can evaluate information obtained online from heterogeneoussources and present information tailored to an individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also exploring new opportunities and trends for intelligent agents. IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) tointelligent mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The requirements for IAs remain open for discussion. An agent should be able to: • interact with humans and other agents • anticipate user needs for information • adapt to changes in user needs and the environment • cope with heterogeneity of information and other agents. The following attributes characterize an IA-based systems’ main capabilities: • Intelligence. The method an agent uses to de-velop its intelligence includes using the agent’sown software content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks. • Continuity. An agent is a continuously running process that can detect changes in its environment, modify its behavior, and update its knowledge base (which describes the environment). • Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users directly by using appropriate interfaces. • Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and monitored profiles. • Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the agent is transferred from a distant system, to a situation in which the agent creates new agents, dies, or executes partially during migratiion
  • 19. Smart Agents 2022 Comparison From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon, there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves. There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition. Accessibility and trends Ease of setup Voice recognition Success of queries and ability to understand context Bottom line None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so close!). Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience and what hardware you already have IFTTT = "if this, then that," is a service that lets you connect apps, services, and smart home devices.
  • 20. Amazon Alexa From https://en.wikipedia.org/wiki/Amazon_Alexa Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition, natural language processing, and other forms of weak AI to perform these tasks.[6] Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices (such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate Alexa's listening mode, although, some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
  • 21. Google Assistant From https://en.wikipedia.org/wiki/Google_Assistant Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the company's previous virtual assistant, Google Now. Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home. After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February 2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has been further extended to support a large variety of devices, including cars and third-party smart home appliances. The functionality of the Assistant can also be enhanced by third-party developers. Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and gather visual information through the device's camera, and support purchasing products and sending money.
  • 22. Apple Siri https://en.wikipedia.org/wiki/Siri Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1] [2] It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual language usages, searches and preferences, returning individualized results. Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app for iOS in February 2010. Two months later, Apple acquired it and integrated into iPhone 4S at its release on 4 October, 2011, removing the separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod. Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support for follow-up questions, language translation, and additional third-party actions.
  • 23. Microsoft Cortana From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant) Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders and answering questions for the user. Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending on the software platform and region in which it is used.[8] Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on March 31, 2021, the Cortana mobile app was shut down globally.[13] Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30] In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features currently available with Cortana.[42]
  • 25. Machine Learning Types From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
  • 26. Perceptron From https://deepai.org/machine-learning-glossary-and-terms/perceptron How does a Perceptron work? The process begins by taking all the input values and multiplying them by their weights. Then, all of these multiplied values are added together to create the weighted sum. The weighted sum is then applied to the activation function, producing the perceptron's output. The activation function plays the integral role of ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.
  • 27. Ensemble Machine Learning From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/ Ensemble learning is a general meta approach to machine learning that seeks better predictive performance by combining the predictions from multiple models. Although there are a seemingly unlimited number of ensembles that you can develop for your predictive modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that rather than algorithms per se, each is a field of study that has spawned many more specialized methods. The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is important to both have a detailed understanding of each method and to consider them on your predictive modeling project. But, before that, you need a gentle introduction to these approaches and the key ideas behind each method prior to layering on math and code. In this tutorial, you will discover the three standard ensemble learning techniques for machine learning. After completing this tutorial, you will know: • Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions. • Stacking involves fitting many different models types on the same data and using another model to learn how to best combine the predictions. • Boosting involves adding ensemble members sequentially that correct the predictions made by prior models and outputs a weighted average of the predictions.
  • 28. Bagging From https://en.wikipedia.org/wiki/Bootstrap_aggregating Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach. Given a standard training set of size n, bagging generates m new training sets , each of size nʹ, by sampling from D uniformly and with replacement. By sampling with replacement, some observations may be repeated in each . If nʹ=n, then for large n the set is expected to have the fraction (1 - 1/e) (≈63.2%) of the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample. Sampling with replacement ensures each bootstrap is independent from its peers, as it does not depend on previous chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
  • 29. Boosting From https://www.ibm.com/cloud/learn/boosting and https://en.wikipedia.org/wiki/Boosting_(machine_learning) In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance[1] in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones Bagging vs Boosting Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study (PDF, 242 KB) (link resides outside IBM), the main difference between these learning methods is the way in which they are trained. In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of models are constructed and with each new model iteration, the weights of the misclassified data in the previous model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus on to improve its performance. AdaBoost, which stands for “adaptative boosting algorithm,” is one of the most popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include XGBoost, GradientBoost, and BrownBoost. Another difference between bagging and boosting is in how they are used. For example, bagging methods are typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged when low variance and high bias is observed. While bagging can be used to avoid overfitting, boosting methods can be more prone to this (link resides outside IBM) although it really depends on the dataset. However, parameter tuning can help avoid the issue. As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for loan approval processes and statistical genomics while boosting has been used more within image recognition apps and search engines. Boosting is an ensemble learning method that combines a set of weak learners into a strong learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and then trained sequentially—that is, each model tries to compensate for the weaknesses of its predecessor. With each iteration, the weak rules from each individual classifier are combined to form one, strong prediction rule.
  • 30. Stacking From https://www.geeksforgeeks.org/stacking-in-machine-learning/ Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble models; the widely known ones are bagging and boosting. Bagging allows multiple similar models with high variance to be averaged to decrease variance. Boosting builds multiple incremental models to decrease the bias, while keeping variance small. Stacking (sometimes called stacked generalization) is a different paradigm. The point of stacking is to explore a space of different models for the same problem. The idea is that you can attack a learning problem with different types of models which are capable of learning some part of the problem, but not the whole space of the problem. So, you build multiple different learners and use them to build an intermediate prediction, one prediction for each learned model. Then you add a new model which learns the same target from the intermediate predictions. This final model is said to be stacked on top of the others, hence the name. Thus, you might improve your overall performance, and often you end up with a model which is better than any individual intermediate model. Notice, however, that it does not give you any guarantee, as is often the case with any machine learning technique.
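A minimal stacking sketch using scikit-learn's StackingClassifier: two heterogeneous base learners produce intermediate predictions, and a logistic regression is trained on top of them (all choices illustrative):

```python
# Stacking sketch: heterogeneous base learners produce intermediate predictions,
# and a final (meta) model is trained on those predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=2)),
                ("svm", SVC(probability=True, random_state=2))],
    final_estimator=LogisticRegression(),  # learns from the intermediate predictions
    cv=5,
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```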
  • 31. Gradient Boosting From https://en.wikipedia.org/wiki/Gradient_boosting Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.
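A minimal gradient-boosted trees sketch using scikit-learn's GradientBoostingRegressor, where shallow trees are added stage-wise to reduce a differentiable loss (synthetic data and hyperparameters are illustrative):

```python
# Gradient boosting sketch: shallow regression trees added stage-wise,
# each fit to the gradient of the loss w.r.t. the current ensemble prediction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(3)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(500)          # noisy sine to regress
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
gbr.fit(X_train, y_train)
print("R^2 on held-out data:", gbr.score(X_test, y_test))
```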
  • 32. Introduction to XGBoost From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
  • 33. Terminology ・SoftMax https://en.wikipedia.org/wiki/Softmax_function ・SoftPlus https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Softplus ・Logit https://en.wikipedia.org/wiki/Logit ・Sigmoid https://en.wikipedia.org/wiki/Sigmoid_function ・Logistic Function https://en.wikipedia.org/wiki/Logistic_function ・Tanh https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/ ・ReLu https://en.wikipedia.org/wiki/Rectifier_(neural_networks) ・Maxpool Selects the maximum value within each subregion of a convolutional neural network layer
  • 34. Relationships The sigmoid is the logistic function, with range (0, 1); its inverse is the logit, with range (-∞, +∞). Tanh, with range (-1, 1), is a rescaled sigmoid. SoftPlus has the sigmoid as its derivative, and ReLu (output (0, x), i.e. max(0, x)) is its piecewise-linear counterpart. The sigmoid equals the first component of SoftMax(z, 0); tanh equals the first minus the second component of SoftMax(z, -z); and the log-ratio of the first to the second component of SoftMax(z1, z2) equals z1 - z2, i.e. the logit of the first component's probability.
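These definitions and relationships can be checked numerically; a short NumPy sketch (input values chosen arbitrarily):

```python
# NumPy definitions of the activations above, plus numeric checks of the
# relationships (sigmoid as 2-class softmax, tanh as rescaled sigmoid,
# logit as the inverse of the sigmoid, softplus' = sigmoid).
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract max for numerical stability
    return e / e.sum()

def softplus(x):  return np.log1p(np.exp(x))
def sigmoid(x):   return 1.0 / (1.0 + np.exp(-x))
def logit(p):     return np.log(p / (1.0 - p))
def relu(x):      return np.maximum(0.0, x)

x = 0.7
assert np.isclose(sigmoid(x), softmax(np.array([x, 0.0]))[0])   # sigmoid from softmax
assert np.isclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)     # tanh is a rescaled sigmoid
assert np.isclose(logit(sigmoid(x)), x)                         # logit inverts the sigmoid
eps = 1e-6
assert np.isclose((softplus(x + eps) - softplus(x)) / eps,      # softplus' = sigmoid
                  sigmoid(x), atol=1e-4)
print("all relationships hold numerically")
```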
  • 35. Terminology (continued) ・Heteroscedastic https://en.wiktionary.org/wiki/scedasticity ・Maxout https://stats.stackexchange.com/questions/129698/what-is-maxout-in-neural-network/298705 ・Cross-Entropy https://en.wikipedia.org/wiki/Cross_entropy H(P,Q) = -Ep(log q) ・Joint Entropy https://en.wikipedia.org/wiki/Joint_entropy -Ep(X,Y)(log p(X,Y)) ・KL Divergence https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence ・H(P,Q) = H(P) + KL(P,Q), i.e. -Ep(log q) = -Ep(log p) + {Ep(log p) - Ep(log q)} ・Mutual Information https://en.wikipedia.org/wiki/Mutual_information KL(p(x,y), p(x)p(y)) ・Ridge Regression and Lasso Regression https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece ・Logistic Regression https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf ・Dropout https://en.wikipedia.org/wiki/Dropout_(neural_networks) ・RMSProp and AdaGrad and AdaDelta and Adam https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM ・Pooling https://www.quora.com/Is-pooling-indispensable-in-deep-learning ・Boltzmann Machine https://en.wikipedia.org/wiki/Boltzmann_machine ・Hyperparameters
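A short NumPy sketch of the entropy identities above, checking H(P,Q) = H(P) + KL(P||Q) and computing mutual information as a KL divergence (the example distributions are arbitrary):

```python
# Cross-entropy, entropy, KL divergence, and mutual information for discrete
# distributions, verifying the identity H(P, Q) = H(P) + KL(P || Q).
import numpy as np

def entropy(p):            return -np.sum(p * np.log(p))
def cross_entropy(p, q):   return -np.sum(p * np.log(q))
def kl_divergence(p, q):   return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
assert np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))

# Mutual information as KL(p(x, y) || p(x) p(y)) for a small joint table.
pxy = np.array([[0.3, 0.1],
                [0.2, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
mi = kl_divergence(pxy.ravel(), np.outer(px, py).ravel())
print("H(P,Q) =", cross_entropy(p, q), " MI =", mi)
```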
  • 36. Reinforcement Learning Book From https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
  • 37. Acumos Shared Model Process Flow From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
  • 38. Distributed AI From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence[1] is a subfield of artificial intelligence research dedicated to the development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems. The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: • A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled • Coordination of the actions and communication of the nodes • Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: • Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. • Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. • Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios.
  • 39. Swarm Intelligence From https://en.wikipedia.org/wiki/Swarm_intelligence Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1] SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence. The application of swarm principles to robots is called swarm robotics, while swarm intelligence refers to the more general set of algorithms. Swarm prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered for genetically modified organisms in synthetic collective intelligence.[4] Models of swarm behavior include Boids (Reynolds, 1987) and self-propelled particles (Vicsek et al., 1995). Swarm metaheuristics include stochastic diffusion search (Bishop, 1989), ant colony optimization (Dorigo, 1992), particle swarm optimization (Kennedy, Eberhart & Shi, 1995), and Artificial Swarm Intelligence (2015). Applications include ant-based routing, crowd simulation, human swarming, swarm grammars, and swarmic art.
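A minimal particle swarm optimization sketch illustrating the swarm principle: simple local rules (each particle follows its own best and the swarm's best position) produce useful global behavior. The objective function and coefficients are illustrative:

```python
# Minimal particle swarm optimization sketch: particles follow their own best
# and the swarm's best positions, collectively minimizing a function.
import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))    # positions
    v = np.zeros_like(x)                               # velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_val)]                    # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)]
    return g, f(g)

best, val = pso(lambda p: np.sum(p ** 2))              # minimize a simple bowl function
print("best point:", best, "value:", val)
```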
  • 40. IBM Watson From https://en.wikipedia.org/wiki/IBM_Watson IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO, industrialist Thomas J. Watson.[4][5] Software -Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop framework to provide distributed computing.[12][13][14] Hardware -The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight- core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18] According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human Jeopardy champions.[16] Data -The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases, taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias and other reference material, that it could use to build its knowledge.[20] From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
  • 42. Three Types of Deep Learning From https://www.slideshare.net/TerryTaewoongUm/introduction-to-deep-learning-with-tensorflow
  • 44. Convolutional Neural Nets Comparison (2016) From https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae Reference: https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
  • 45. Recurrent Neural Networks From https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
  • 47. Dynamical System View on Recurrent Neural Networks From https://openreview.net/pdf?id=ryxepo0cFX
  • 49. Deep Learning Models From https://arxiv.org/pdf/1712.04301.pdf
  • 50. Neural Net Models From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 51. Neural Net Models (cont) From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 52. TensorFlow From https://en.wikipedia.org/wiki/TensorFlow TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.[4][5] TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0, in September 2019.[10] TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as JavaScript, C++, and Java.[11] This flexibility lends itself to a range of applications in many different sectors.
  • 53. Keras From https://en.wikipedia.org/wiki/Keras Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also the author of the Xception deep neural network model.[5]
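A minimal tf.keras sketch in the spirit of the standard tutorials: a small fully connected classifier on MNIST, showing Keras as the high-level interface to TensorFlow (architecture and training settings are illustrative):

```python
# Minimal Keras (tf.keras) sketch: define, compile, train, and evaluate a small
# fully connected classifier on MNIST.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0     # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```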
  • 54. Comparison of Deep Learning Frameworks From https://arxiv.org/pdf/1903.00102.pdf
  • 55. Popularity of Deep Learning Frameworks From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
  • 56. Acronyms in Deep Learning • RBM - Restricted Boltzmann Machine • MLP - Multi-layer Perceptron • DBN - Deep Belief Network • CNN - Convolutional Neural Network • RNN - Recurrent Neural Network • SGD - Stochastic Gradient Descent • XOR - Exclusive Or • SVM - Support Vector Machine • ReLu - Rectified Linear Unit • MNIST - Modified National Institute of Standards and Technology • RBF - Radial Basis Function • HMM - Hidden Markov Model • MAP - Maximum A Posteriori • MLE - Maximum Likelihood Estimate • Adam - Adaptive Moment Estimation • LSTM - Long Short-Term Memory • GRU - Gated Recurrent Unit
  • 57. Concerns for Deep Learning by Gary Marcus From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf Deep Learning thus far: • Is data hungry • Is shallow and has limited capacity for transfer • Has no natural way to deal with hierarchical structure • Has struggled with open-ended inference • Is not sufficiently transparent • Has not been well integrated with prior knowledge • Cannot inherently distinguish causation from correlation • Presumes a largely stable world, in ways that may be problematic • Works well as an approximation, but answers often can’t be fully trusted • Is difficult to engineer with
  • 59. How transferable are features in deep neural networks? From http://cs231n.github.io/transfer-learning/
  • 61. More Transfer Learning From https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
  • 62. More Transfer Learning From http://ruder.io/transfer-learning/
  • 63. Bayesian Deep Learning From https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
  • 64. Bayesian Learning via Stochastic Gradient Langevin Dynamics From https://tinyurl.com/22xayz76 In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small minibatches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a “sampling threshold” and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression, and ICA with natural gradients. Our method combines Robbins-Monro type algorithms, which stochastically optimize a likelihood, with Langevin dynamics, which injects noise into the parameter updates in such a way that the trajectory of the parameters will converge to the full posterior distribution rather than just the maximum a posteriori mode. The resulting algorithm starts off being similar to stochastic optimization, then automatically transitions to one that simulates samples from the posterior using Langevin dynamics.
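A toy sketch of the SGLD update described above, applied to inferring the mean of a Gaussian from minibatches; the model, prior, and step-size schedule are illustrative choices, not from the paper:

```python
# Stochastic Gradient Langevin Dynamics sketch (toy example): sample the
# posterior over a Gaussian mean using minibatch gradients plus Gaussian noise
# whose variance matches the (annealed) step size.
import numpy as np

rng = np.random.default_rng(0)
N, true_mean = 10_000, 2.5
data = rng.normal(true_mean, 1.0, size=N)          # x_i ~ N(theta, 1)

theta, batch_size, samples = 0.0, 100, []
for t in range(1, 5001):
    eps = 1e-4 * t ** -0.55                        # annealed step size (illustrative)
    batch = rng.choice(data, size=batch_size, replace=False)
    grad_log_prior = -theta / 10.0                 # prior theta ~ N(0, 10)
    grad_log_lik = (N / batch_size) * np.sum(batch - theta)   # rescaled minibatch gradient
    noise = rng.normal(0.0, np.sqrt(eps))          # injected Langevin noise
    theta += 0.5 * eps * (grad_log_prior + grad_log_lik) + noise
    if t > 1000:                                   # discard burn-in, then collect samples
        samples.append(theta)

print("sample mean:", np.mean(samples), " sample std:", np.std(samples))
```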
  • 65. Deterministic Variational Inference for Robust Bayesian NNs From https://openreview.net/pdf?id=B1l08oAct7
  • 66. Bayesian Deep Learning Survey From https://arxiv.org/pdf/1604.01662.pdf Conclusion and Future Research In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural networks (BNN), BDL with BNN as an important component is expected to be more and more scalable
  • 67. Ensemble Methods for Deep Learning From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
  • 68. Comparing Loss Functions From Neural Networks and Deep Learning Book
  • 69. SEED Reinforcement Learning from Google From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In very recent history, a focus on massive scale has been key to solving a number of complicated games such as AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017). The sheer amount of environment data needed to solve tasks trivial to humans makes distributed machine learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks: running environments, model inference, model training, replay buffer, etc., and current state-of-the-art distributed algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources. We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. The cost of running SEED is analyzed against IMPALA (Espeholt et al., 2018), which is a commonly used state-of-the-art distributed RL algorithm (Veeriah et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally, the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix A.4 for details), making it easy to reproduce results and try novel ideas.
  • 70. Designing Neural Nets through Neuroevolution From tinyurl.com/mykhb52y Much of recent machine learning has focused on deep learning, in which neural network weights are trained through variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based approaches, including learning neural network building blocks (for example activation functions), hyperparameters, architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-based neural network research, it has developed many unique and effective techniques that should be effective in other machine learning areas too. This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical tool in the long-term pursuit of artificial general intelligence.
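A toy neuroevolution sketch in this spirit: the weights of a tiny fixed-topology network are evolved with mutation and truncation selection (no gradients) to solve XOR. The task, topology, and evolutionary parameters are illustrative:

```python
# Minimal neuroevolution sketch: evolve the weights of a tiny fixed-topology
# network (2-4-1, tanh hidden layer) to solve XOR, using only mutation and
# truncation selection -- no gradient descent anywhere.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def unpack(genome):                        # flat genome -> (W1, b1, W2, b2)
    W1 = genome[:8].reshape(2, 4); b1 = genome[8:12]
    W2 = genome[12:16].reshape(4, 1); b2 = genome[16]
    return W1, b1, W2, b2

def forward(genome, X):
    W1, b1, W2, b2 = unpack(genome)
    h = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

def fitness(genome):                       # negative squared error on XOR
    return -np.sum((forward(genome, X) - y) ** 2)

pop = rng.normal(0, 1, size=(100, 17))     # population of weight vectors
for gen in range(500):
    scores = np.array([fitness(g) for g in pop])
    elite = pop[np.argsort(scores)[-20:]]                          # keep the best 20
    children = elite[rng.integers(0, 20, size=80)] + rng.normal(0, 0.2, size=(80, 17))
    pop = np.vstack([elite, children])                             # elitism + mutated offspring

best = pop[np.argmax([fitness(g) for g in pop])]
print("XOR predictions:", np.round(forward(best, X), 2))
```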
  • 71. Illuminating Search Spaces by Mapping Elites From https://arxiv.org/pdf/1504.04909.pdf
  • 73. From https://arxiv.org/pdf/1412.3555v1.pdf Transformers A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily in the fields of natural language processing (NLP)[1] and computer vision (CV).[2] Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces training times.[1] Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short-term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl, and can be fine-tuned for specific tasks.[4][5] Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. Both the encoder and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.
  • 74. From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model) Transformers Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1] Sequential processing Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To process the n-th token, the model combines the state representing the sentence up to token n - 1 with the information of the new token to create a new state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of token computations on results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware. This can make the training of RNNs inefficient. Self-Attention These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant information about far-away tokens. A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input. However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to attend to each English input state vector. When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel, which leads to improved training speed.
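A minimal NumPy sketch of scaled dot-product self-attention, the core operation described above; dimensions and projection matrices are illustrative, and real transformers add multiple heads, residual connections, layer normalization, and feed-forward sublayers:

```python
# Scaled dot-product self-attention sketch (single head): every token attends
# to every other token in parallel via learned query/key/value projections.
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # relevance of every token to every other
    weights = softmax(scores, axis=-1)          # attention weights; each row sums to 1
    return weights @ V, weights                 # contextualized token representations

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))         # 5 token embeddings (random stand-ins)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                    # (5, 8) (5, 5)
```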
  • 75. From https://en.wikipedia.org/wiki/GPT-3 GPT-3 Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to produce human-like text. The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning parameters. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020,[3] is part of a trend in natural language processing (NLP) systems of pre-trained language representations.[1] The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5] Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.[6] An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency equivalent to that of a human.[7]
  • 76. OpenAI From https://openai.com/ Recent Research Efficient Training of Language Models to Fill in the Middle Hierarchical Text-Conditional Image Generation with CLIP Latents Formal Mathematics Statement Curriculum Learning Training language models to follow instructions with human feedback Text and Code Embeddings by Contrastive Pre-Training WebGPT: Browser-assisted question-answering with human feedback Training Verifiers to Solve Math Word Problems Recursively Summarizing Books with Human Feedback Evaluating Large Language Models Trained on Code Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets Multimodal Neurons in Artificial Neural Networks Learning Transferable Visual Models From Natural Language Supervision Zero-Shot Text-to-Image Generation Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
  • 78. Reservoir Computing From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/ Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN), independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines (LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping it is only necessary to use a simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture should be capable of any computation [3]. The intuition was born from the fact that in training RNNs, most of the time the weights showing the most change were the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static nature of this implementation, ESNs can usually yield faster results, and in some cases even better ones, in particular when dealing with chaotic time series predictions [5]. But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the system to different regions of the configuration space [3]. This is called the separability condition. Furthermore, an important property for the reservoirs of ESNs is the Echo State property, which states that inputs to the reservoir echo in the system forever, or until they dissipate. A more formal definition of this property can be found in [6]. Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices, fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing. A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these algorithms tend to be data hungry, requiring long observation times and substantial computational resources. Reservoir computing [1,2] is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic [3] or complex spatiotemporal behaviors [4], which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease. From https://www.nature.com/articles/s41467-021-25801-2
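A minimal Echo State Network sketch along these lines, assuming a simple next-step prediction task on a sine wave; the reservoir size, spectral radius, and ridge penalty are illustrative hyperparameters:

```python
# Echo State Network sketch: a fixed random recurrent reservoir, with only the
# linear readout trained (by ridge regression) to predict the next value of a
# time series.
import numpy as np

rng = np.random.default_rng(0)
T, train_T = 2000, 1500
u = np.sin(np.arange(T) * 0.1)                               # simple periodic input signal

n_res, spectral_radius, washout, ridge = 300, 0.9, 100, 1e-6
W_in = rng.uniform(-0.5, 0.5, size=n_res)                    # fixed input weights
W = rng.normal(size=(n_res, n_res))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # scale toward the echo state property

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in * u[t] + W @ x)                         # fixed, untrained reservoir dynamics
    states[t] = x

# Train only the linear readout to map the state at time t to u[t+1].
S, Y = states[washout:train_T], u[washout + 1:train_T + 1]
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)

pred = states[train_T:-1] @ W_out                            # held-out next-step predictions
print("held-out next-step MSE:", np.mean((pred - u[train_T + 1:]) ** 2))
```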
  • 79. Reservoir Computing Trends From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.514&rep=rep1&type=pdf
  • 80. Brain Connectivity meets Reservoir Computing From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1 The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs). Can the wiring of actual brains help improve ANN architectures? Can we learn from ANNs about what network features support computation in the brain when solving a task? ANN architectures are carefully engineered and have crucial importance in many recent performance improvements. On the other hand, BNNs exhibit complex emergent connectivity patterns. At the individual level, BNNs’ connectivity results from brain development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role in shaping connectivity. Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain connectivity features only on abstract dynamical properties, although the implications of real brain network topologies on machine learning or cognitive tasks have been barely explored. Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks, which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain connectivity patterns on task solving. We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework, bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show the crucial importance of the diversity of interareal connectivity patterns, stressing the importance of stochastic processes determining neural network connectivity in general.
  • 83. Summary of Deep Learning Models: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 84. Deep Learning Acronyms From https://arxiv.org/pdf/1712.04301.pdf
  • 85. Deep Learning Hardware From https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d
  • 86. Deep Learning MIT From https://deeplearning.mit.edu/
  • 88. GitHub ONNX Models From https://github.com/onnx/models
  • 89. HPC vs Big Data Ecosystems From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
  • 90. HPC and ML From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf • HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided machine learning), which are then used to understand experimental data or simulations. • MLforHPC: Using ML to enhance HPC applications and systems. This categorization is related to Jeff Dean’s “Machine Learning for Systems and Systems for Machine Learning” [6] and Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as • HPCrunsML: Using HPC to execute ML with high performance. • SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations. We also subdivide MLforHPC as • MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Already, autotuning with systems like ATLAS is hugely successful and gives an initial view of MLautotuning. As well as choosing block sizes to improve cache use and vectorization, MLautotuning can also be used for simulation mesh sizes [8] and in big data problems for configuring databases and complex systems like Hadoop and Spark [9], [10]. • MLafterHPC: ML analyzing results of HPC, as in trajectory analysis and structure identification in biomolecular simulations. • MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations. The same ML wrapper can also learn configurations as well as results. This differs from SimulationTrainedML: there, typically a learnt network is used to redirect observation, whereas in MLaroundHPC we are using the ML to improve the HPC performance. • MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11]. Here the simulation surrogates are very valuable to allow real-time predictions.
  • 91. Designing Neural Nets through Neuroevolution From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
  • 92. Go Explore Algorithm From http://www.evolvingai.org/files/1901.10995.pdf
  • 93. Deep Density Destructors From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf We propose a unified framework for deep density models by formally defining density destructors. A density destructor is an invertible function that transforms a given density to the uniform density—essentially destroying any structure in the original density. This destructive transformation generalizes Gaussianization via ICA and more recent autoregressive models such as MAF and Real NVP. Informally, this transformation can be seen as a generalized whitening procedure or a multivariate generalization of the univariate CDF function. Unlike Gaussianization, our destructive transformation has the elegant property that the density function is equal to the absolute value of the Jacobian determinant. Thus, each layer of a deep density can be seen as a shallow density—uncovering a fundamental connection between shallow and deep densities. In addition, our framework provides a common interface for all previous methods enabling them to be systematically combined, evaluated and improved. Leveraging the connection to shallow densities, we also propose a novel tree destructor based on tree densities and an image-specific destructor based on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
  • 95. Scikit-Learn Decision Tree From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 96. Imitation Learning From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 97. Imitation Learning From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 98. Generative Adversarial Networks (GANs) From https://skymind.ai/wiki/generative-adversarial-network-gan
  • 99. Deep Generative Network-based Activation Management (DGN-AMs) From https://arxiv.org/pdf/1605.09304.pdf
  • 100. Paired Open Ended Trailblazer (POET) From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 101. One Model to Learn Them All From https://arxiv.org/pdf/1706.05137.pdf
  • 102. Self-modifying NNs With Differentiable Neuromodulated Plasticity From https://arxiv.org/pdf/1706.05137.pdf
  • 103. Stein Variational Gradient Descent From https://arxiv.org/pdf/1706.05137.pdf
  • 104. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 105. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 107. Graphical Processing Units (GPU) From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more. What Does a GPU Do? The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in creative production and artificial intelligence (AI). GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more. GPU and CPU: Working Together The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most of both. GPU vs. Graphics Card: What’s the Difference? While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms. Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the raft of components required to both allow the GPU to function and connect to the rest of the system. GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI Express slot.
  • 108. Nvidia Graphical Processing Units (GPU) From https://en.wikipedia.org/wiki/Nvidia Nvidia Corporation[note 1][note 2] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units (GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system-on-a-chip units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and software.[3][4] Its professional line of GPUs is used in workstations for applications in such fields as architecture, engineering and construction, media and entertainment, automotive, scientific research, and manufacturing design.[5] In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator companies such as Graphcore. Nvidia's GPUs are used for edge-to-cloud computing and supercomputers: Nvidia provides the accelerators (the GPUs) for many of them, including a former fastest system, although the current fastest and most power-efficient systems are powered by AMD GPUs and CPUs. Nvidia has also expanded its presence in the gaming industry with its handheld game consoles Shield Portable, Shield Tablet, and Shield Android TV and its cloud gaming service GeForce Now. Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of US$40 billion in stock and cash, which would be the largest semiconductor acquisition to date. SoftBank Group will acquire slightly less than a 10% stake in Nvidia, and Arm would maintain its headquarters in Cambridge.[15][16][17][18]
  • 109. Tesla unveils new Dojo Supercomputer From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/ Tesla has unveiled the latest version of its Dojo supercomputer, and it’s apparently so powerful that it tripped the power grid in Palo Alto. Dojo is Tesla’s own custom supercomputer platform built from the ground up for AI machine learning and, more specifically, for video training using the video data coming from its fleet of vehicles. The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new Dojo custom-built computer uses chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to its computer vision technology powering its self-driving effort. Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but the company was still ramping up its effort at the time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster, or “Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night. Why does Tesla need the Dojo supercomputer? It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk said it makes sense to offer Dojo as a service, perhaps to take on his buddy Jeff Bezos’s Amazon AWS, calling it a “service that you can use that’s available online where you can train your models way faster and for less money.” But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving system. Tesla realized that its approach to developing a self-driving system using neural nets trained on millions of videos coming from its customer fleet requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power. That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward, as it has big ambitions to develop other artificial intelligence programs.
  • 110. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 112. Introduction to Deep Reinforcement Learning From https://skymind.ai/wiki/deep-reinforcement-learning Many RL references at this site
  • 113. Model-based Reinforcement Learning From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
  • 114. Hierarchical Deep Reinforcement Learning From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
  • 115. Meta Learning Shared Hierarchy From https://skymind.ai/wiki/deep-reinforcement-learning
  • 116. Learning with Hierarchical Deep Models From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
  • 118. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images. The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al., 2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely unsupervised manner. This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically sound way. To the best of our knowledge, ours is the first translation-invariant hierarchical generative model which supports both top-down and bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object parts, and objects respectively.
We show that these representations achieve excellent performance on several visual recognition tasks and allow “hidden” object parts to be inferred from high-level object information.
  • 119. Learning with Hierarchical-Deep Models From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision, natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones. We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines (DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct sophisticated domain-specific features without the need to rely on precise human-crafted input representations, which is increasingly important with the proliferation of datasets and application domains.
  • 120. Reinforcement Learning: Fast and Slow From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0 Meta-RL: Speeding up Deep RL by Learning to Learn As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental updating, is weak inductive bias. As formalized in the idea of the bias–variance tradeoff, fast learning requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if the specific biases the learner adopts happen to fit with the material to be learned. As a result of this, a new learning problem arises: how can the learner know what inductive biases to adopt? Episodic Deep RL: Fast Learning through Episodic Memory If incremental parameter adjustment is one source of slowness in deep RL, then one way to learn faster might be to avoid such incremental updating. Naively increasing the learning rate governing gradient descent optimization leads to the problem of catastrophic interference. However, recent research shows that there is another way to accomplish the same goal, which is to keep an explicit record of past events, and use this record directly as a point of reference in making new decisions. This idea, referred to as episodic RL, parallels ‘non-parametric’ approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of learning in psychology. When a new situation is encountered and a decision must be made concerning what action to take, the procedure is to compare an internal representation of the current situation with stored representations of past situations. The action chosen is then the one associated with the highest value, based on the outcomes of the past situations that are most similar to the present. When the internal state representation is computed by a multilayer neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
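A toy sketch of the episodic RL recipe described above: store an embedding of each past situation together with the action taken and the return obtained, then value a new situation per action by averaging the returns of its k nearest stored neighbours. The embedding, memory layout, and choice of k are illustrative assumptions rather than any specific algorithm reviewed in the paper.

```python
import numpy as np

class EpisodicMemory:
    """Non-parametric value estimates from stored (embedding, action, return) triples."""

    def __init__(self, n_actions, k=5):
        self.n_actions = n_actions
        self.k = k
        self.keys = [[] for _ in range(n_actions)]     # situation embeddings, per action
        self.returns = [[] for _ in range(n_actions)]  # observed returns, per action

    def store(self, embedding, action, episodic_return):
        self.keys[action].append(np.asarray(embedding, dtype=float))
        self.returns[action].append(float(episodic_return))

    def value(self, embedding, action):
        """Average return of the k most similar stored situations for this action."""
        if not self.keys[action]:
            return 0.0
        keys = np.stack(self.keys[action])
        dists = np.linalg.norm(keys - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.array(self.returns[action])[nearest]))

    def act(self, embedding):
        """Greedy action choice based on the episodic value estimates."""
        values = [self.value(embedding, a) for a in range(self.n_actions)]
        return int(np.argmax(values))

# usage: embeddings would normally come from a neural-network encoder
memory = EpisodicMemory(n_actions=3)
memory.store(embedding=[0.1, 0.9], action=2, episodic_return=+1.0)
memory.store(embedding=[0.8, 0.2], action=0, episodic_return=-1.0)
print(memory.act(np.array([0.15, 0.85])))   # -> 2, the action that paid off in similar situations
```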
  • 122. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 123. Embedding for Sparse Inputs (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 124. Efficient Vector Representation of Words (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 125. Deep Convolution Neural Nets and Gaussian Processes From https://ai.google/research/pubs/pub47671
  • 126. Deep Convolution Neural Nets and Gaussian Processes(cont) From https://ai.google/research/pubs/pub47671
  • 127. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 128. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 129. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 130. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 131. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 132. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 133. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 134. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 135. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 136. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 137. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 138. Computing and Sensing Architecture
  • 139. Simple Event Processing and Complex Event Processing: Hierarchical C4ISR Flow Model from Bob Marcus (diagram). The figure shows measurements from input devices and field processors flowing upward through simple and complex event processing, turning data into structured data, information, knowledge, and wisdom, and updating a world model; responses flow back down as simple responses, complex responses, updated plans, and new goals and plans, issued through HQ operations and field operations to actuator devices via sensor and effects management. Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
  • 140. Computing and Sensing Architectures From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
  • 141. Computing and Sensing Architectures From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
  • 142. Bio-Inspired Distributed Intelligence From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002 More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this fall. She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical tool or for search-and-rescue missions. “The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish, and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the real world more efficiently. But this additional softness also entails new system-theoretic challenges.” In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain. Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using decentralized sensory information sharing. “When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is fascinating as compared to what we can achieve with human-made machines.”
  • 143. IoT and Deep Learning
  • 145. Deep Learning for IoT Overview: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 146. Deep Learning for IoT Overview: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 147. Standardized IoT Data Sets: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 148. Standardized IoT Data Sets: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 150. DeepMind Website DeepMind Home page https://deepmind.com/ DeepMind Research https://deepmind.com/research/ https://deepmind.com/research/publications/ DeepMind Blog https://deepmind.com/blog DeepMind Applied https://deepmind.com/applied
  • 151. DeepMind Featured Research Publications From https://deepmind.com/research AlphaGo https://www.deepmind.com/research/highlighted-research/alphago Deep Reinforcement Learning https://deepmind.com/research/dqn/ A Dual Approach to Scalable Verification of Deep Networks http://auai.org/uai2018/proceedings/papers/204.pdf https://www.youtube.com/watch?v=SV05j3GM0LI Learning to reinforcement learn https://arxiv.org/abs/1611.05763 Neural Programmer - Interpreters https://arxiv.org/pdf/1511.06279v3.pdf Dueling Network Architectures for Deep Reinforcement Learning https://arxiv.org/pdf/1511.06581.pdf DeepMind Research over 400 publications https://deepmind.com/research/publications/
  • 152. DeepMind Applied From https://deepmind.com/applied/ DeepMind Health https://deepmind.com/applied/deepmind-health/ DeepMind for Google https://deepmind.com/applied/deepmind-google/ DeepMind Ethics and Society https://deepmind.com/applied/deepmind-ethics-society/
  • 153. AlphaGo and AlphaGoZero From https://www.deepmind.com/research/highlighted-research/alphago We created AlphaGo, a computer program that combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as an input and process it through a number of different network layers containing millions of neuron-like connections. One neural network, the “policy network”, selects the next move to play. The other neural network, the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur games to help it develop an understanding of reasonable human play. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world champions in different global arenas and arguably became the greatest Go player of all time. Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by playing thousands of matches with amateur and professional players, AlphaGo Zero learnt by playing against itself, starting from completely random play. This powerful technique is no longer constrained by the limits of human knowledge. Instead, the computer program accumulated thousands of years of human knowledge during a period of just a few days and learned to play Go from the strongest player in the world, AlphaGo. AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new knowledge, developing unconventional strategies and creative new moves, including those which beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence that AI can be used as a positive multiplier for human ingenuity.
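The policy and value networks described above can be sketched as a single network with a shared trunk and two heads, which is roughly how later versions of the system are usually described. The PyTorch sketch below uses made-up layer sizes and a 19x19-plus-pass move space purely for illustration; it is not DeepMind's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Shared convolutional trunk with a policy head and a value head."""

    def __init__(self, board_size=19, in_planes=17, channels=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        n_cells = board_size * board_size
        # policy head: a probability for every board point (+1 for "pass")
        self.policy_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * n_cells, n_cells + 1))
        # value head: a single scalar in [-1, 1] predicting the winner
        self.value_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * n_cells, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())

    def forward(self, board_planes):
        features = self.trunk(board_planes)
        move_logits = self.policy_head(features)
        value = self.value_head(features)
        return F.log_softmax(move_logits, dim=-1), value

net = PolicyValueNet()
dummy_position = torch.zeros(1, 17, 19, 19)   # batch of one encoded board position
log_policy, value = net(dummy_position)
print(log_policy.shape, value.shape)          # torch.Size([1, 362]) torch.Size([1, 1])
```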
  • 154. AlphaZero From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were excited by the preliminary results and thrilled to see the response from members of the chess community, who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that differed from any chess playing engine that came before it. Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly learns each game to become the strongest player in history for each, despite starting its training from random play, with no in-built domain knowledge but the basic rules of the game. This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive, unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess engine. “It’s like discovering the secret notebooks of some great player from the past,” says Matthew. Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try to account for every eventuality in a game. Shogi programs are also game specific, using similar search engines and algorithms to chess programs. AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network and general purpose algorithms that know nothing about the game beyond the basic rules.
  • 155. AlphaTensor From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor First extension of AlphaZero to mathematics unlocks new possibilities for research Algorithms have helped mathematicians perform fundamental operations for thousands of years. The ancient Egyptians created an algorithm to multiply two numbers without requiring a multiplication table, and Greek mathematician Euclid described an algorithm to compute the greatest common divisor, which is still in use today. During the Islamic Golden Age, Persian mathematician Muhammad ibn Musa al-Khwarizmi designed new algorithms to solve linear and quadratic equations. In fact, al-Khwarizmi’s name, translated into Latin as Algoritmi, led to the term algorithm. But, despite the familiarity with algorithms today – used throughout society from classroom algebra to cutting-edge scientific research – the process of discovering new algorithms is incredibly difficult, and an example of the amazing reasoning abilities of the human mind. In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices. This paper is a stepping stone in DeepMind’s mission to advance science and unlock the most fundamental problems using AI. Our system, AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games, like chess, Go and shogi, and this work shows the journey of AlphaZero from playing games to tackling unsolved mathematical problems for the first time. Matrix multiplication is one of the simplest operations in algebra, commonly taught in high school maths classes. But outside the classroom, this humble mathematical operation has enormous influence in the contemporary digital world and is ubiquitous in modern computing.
  • 156. AlphaTensor (cont) From https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves, corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so, this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps taken to zero out the tensor. This game is incredibly challenging – the number of possible algorithms to consider is much greater than the number of atoms in the universe, even for small cases of matrix multiplication. Compared to the game of Go, which remained a challenge for AI for decades, the number of possible moves at each step of our game is 30 orders of magnitude larger (above 10^33 for one of the settings we consider). Essentially, to play this game well, one needs to identify the tiniest of needles in a gigantic haystack of possibilities. To tackle the challenges of this domain, which significantly departs from traditional games, we developed multiple crucial components including a novel neural network architecture that incorporates problem-specific inductive biases, a procedure to generate useful synthetic data, and a recipe to leverage symmetries of the problem. We then trained an AlphaTensor agent using reinforcement learning to play the game, starting without any knowledge about existing matrix multiplication algorithms. Through learning, AlphaTensor gradually improves over time, re-discovering historical fast matrix multiplication algorithms such as Strassen’s, eventually surpassing the realm of human intuition and discovering algorithms faster than previously known. Detailed Article in Nature
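To ground the tensor-game framing, the sketch below checks Strassen's classical decomposition, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; each of the 7 products corresponds to one rank-1 term in the decomposition of the matrix multiplication tensor. This is a well-known algorithm used purely for illustration, not AlphaTensor's code or one of its newly discovered algorithms.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen, 1969)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # recombine the 7 products into the 4 entries of the result
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
assert np.allclose(strassen_2x2(A, B), A @ B)   # 7 multiplications reproduce the product
print("Strassen's 7-multiplication algorithm matches the naive result.")
```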
  • 158. Complex Cooperative Agents From https://deepmind.com/blog/capture-the-flag-science/ From https://science.sciencemag.org/content/364/6443/859 5/19
  • 159. Complex Cooperative Agents (cont) From https://science.sciencemag.org/content/364/6443/859 5/19
  • 160. Complex Cooperative Agents (cont) From https://science.sciencemag.org/content/364/6443/859 5/19
  • 161. Unsupervised Learning From https://deepmind.com/blog/unsupervised-learning/ Unsupervised learning is a paradigm designed to create autonomous intelligence by rewarding agents (that is, computer programs) for learning about the data they observe without a particular task in mind. In other words, the agent learns for the sake of learning. A key motivation for unsupervised learning is that, while the data passed to learning algorithms is extremely rich in internal structure (e.g., images, videos and text), the targets and rewards used for training are typically very sparse (e.g., the label ‘dog’ referring to that particularly protean species, or a single one or zero to denote success or failure in a game). This suggests that the bulk of what is learned by an algorithm must consist of understanding the data itself, rather than applying that understanding to particular tasks.
  • 162. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Unsupervised learning is a paradigm designed to create autonomous intelligence by rewarding agents (that is, computer programs) for learning about the data they observe without a particular task in mind. In other words, the agent learns for the sake of learning. A key motivation for unsupervised learning is that, while the data passed to learning algorithms is extremely rich in internal structure (e.g., images, videos and text), the targets and rewards used for training are typically very sparse (e.g., the label ‘dog’ referring to that particularly protean species, or a single one or zero to denote success or failure in a game). This suggests that the bulk of what is learned by an algorithm must consist of understanding the data itself, rather than applying that understanding to particular tasks. These results resonate with our intuitions about the human mind. Our ability to learn about the world without explicit supervision is fundamental to what we regard as intelligence. On a train ride we might listlessly gaze through the window, drag our fingers over the velvet of the seat, regard the passengers sitting across from us. We have no agenda in these studies: we almost can’t help but gather information, our brains ceaselessly working to understand the world around us, and our place within it.
  • 163. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Decoding the elements of vision 2012 was a landmark year for deep learning, when AlexNet (named after its lead architect Alex Krizhevsky) swept the ImageNet classification competition. AlexNet’s abilities to recognize images were unprecedented, but even more striking is what was happening under the hood. When researchers analysed what AlexNet was doing, they discovered that it interprets images by building increasingly complex internal representations of its inputs. Low-level features, such as textures and edges, are represented in the bottom layers, and these are then combined to form high-level concepts such as wheels and dogs in higher layers. This is remarkably similar to how information is processed in our brains, where simple edges and textures in primary sensory processing areas are assembled into complex objects like faces in higher areas. The representation of a complex scene can therefore be built out of visual primitives, in much the same way that meaning emerges from the individual words comprising a sentence. Without explicit guidance to do so, the layers of AlexNet had discovered a fundamental ‘vocabulary’ of vision in order to solve its task. In a sense, it had learned to play what Wittgenstein called a ‘language game’ that iteratively translates from pixels to labels.
  • 164. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Transfer learning From the perspective of general intelligence, the most interesting thing about AlexNet’s vocabulary is that it can be reused, or transferred, to visual tasks other than the one it was trained on, such as recognising whole scenes rather than individual objects. Transfer is essential in an ever-changing world, and humans excel at it: we are able to rapidly adapt the skills and understanding we’ve gleaned from our experiences (our ‘world model’) to whatever situation is at hand. For example, a classically-trained pianist can pick up jazz piano with relative ease. Artificial agents that form the right internal representations of the world, the reasoning goes, should be able to do similarly. Nonetheless, the representations learned by classifiers such as AlexNet have limitations. In particular, as the network was only trained to label images with a single class (cat, dog, car, volcano), any information not required to infer the label—no matter how useful it might be for other tasks—is liable to be ignored. For example, the representations may fail to capture the background of the image if the label always refers to the foreground. A possible solution is to provide more comprehensive training signals, like detailed captions describing the images: not just “dog,” but “A Corgi catching a frisbee in a sunny park.” However, such targets are laborious to provide, especially at scale, and still may be insufficient to capture all the information needed to complete a task. The basic premise of unsupervised learning is that the best way to learn rich, broadly transferable representations is to attempt to learn everything that can be learned about the data. If the notion of transfer through representation learning seems too abstract, consider a child who has learned to draw people as stick figures. She has discovered a representation of the human form that is both highly compact and rapidly adaptable. By augmenting each stick figure with specifics, she can create portraits of all her classmates: glasses for her best friend, her deskmate in his favorite red tee-shirt. And she has developed this skill not in order to complete a specific task or receive a reward, but rather in response to her basic urge to reflect the world around her.
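A minimal PyTorch sketch of the transfer idea discussed above: freeze a previously trained feature extractor and train only a small new head for the new task. The tiny random networks and synthetic data below are placeholders standing in for something like AlexNet's convolutional layers and a real downstream dataset.

```python
import torch
import torch.nn as nn

# stand-in for a feature extractor that was trained on a previous task
pretrained_features = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
for param in pretrained_features.parameters():
    param.requires_grad = False          # freeze: reuse the learned representation as-is

new_head = nn.Linear(32, 5)              # the new 5-class task gets its own classifier
optimizer = torch.optim.Adam(new_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# placeholder data for the new task
inputs = torch.randn(256, 128)
labels = torch.randint(0, 5, (256,))

for step in range(100):
    with torch.no_grad():                # no gradients flow into the frozen trunk
        features = pretrained_features(inputs)
    logits = new_head(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final loss on the new task: {loss.item():.3f}")
```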
  • 165. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Learning by creating: generative models Perhaps the simplest objective for unsupervised learning is to train an algorithm to generate its own instances of data. So-called generative models should not simply reproduce the data they are trained on (an uninteresting act of memorisation), but rather build a model of the underlying class from which that data was drawn: not a particular photograph of a horse or a rainbow, but the set of all photographs of horses and rainbows; not a specific utterance from a specific speaker, but the general distribution of spoken utterances. The guiding principle of generative models is that being able to construct a convincing example of the data is the strongest evidence of having understood it: as Richard Feynman put it, “what I cannot create, I do not understand.” For images, the most successful generative model so far has been the Generative Adversarial Network (GAN for short), in which two networks (a generator and a discriminator) engage in a contest of discernment akin to that of an artistic forger and a detective. The generator produces images with the goal of tricking the discriminator into believing they are real; the discriminator, meanwhile, is rewarded for spotting the fakes. The generated images, first messy and random, are refined over many iterations, and the ongoing dynamic between the networks leads to ever-more realistic images that are in many cases indistinguishable from real photographs. Generative adversarial networks can also dream up details of landscapes defined by the rough sketches of users. A glance at the images below is enough to convince us that the network has learned to represent many of the key features of the photographs they were trained on, such as the structure of animals’ bodies, the texture of grass, and detailed effects of light and shade (even when refracted through a soap bubble). Close inspection reveals slight anomalies, such as the white dog’s apparent extra leg and the oddly right-angled flow of one of the jets in the fountain. While the creators of generative models strive to avoid such imperfections, their visibility highlights one of the benefits of recreating familiar data such as images: by inspecting the samples, researchers can infer what the model has and hasn’t learned.
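A compact sketch of the generator/discriminator contest described above, run on a toy one-dimensional data distribution instead of images so that it stays self-contained; the architectures, learning rates, and target distribution are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 8

generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(n=64):
    # the "true" data distribution the generator must learn: N(4, 0.5)
    return 4.0 + 0.5 * torch.randn(n, 1)

for step in range(2000):
    # discriminator update: tell real samples from generated ones
    real = real_batch()
    fake = generator(torch.randn(64, latent_dim)).detach()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # generator update: try to fool the discriminator
    fake = generator(torch.randn(64, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

samples = generator(torch.randn(1000, latent_dim))
# ideally the generated samples drift toward mean 4 and std 0.5
print(f"generated mean ~ {samples.mean():.2f}, std ~ {samples.std():.2f}")
```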
  • 166. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Creating by predicting Another notable family within unsupervised learning are autoregressive models, in which the data is split into a sequence of small pieces, each of which is predicted in turn. Such models can be used to generate data by successively guessing what will come next, feeding in a guess as input and guessing again. Language models, where each word is predicted from the words before it, are perhaps the best known example: these models power the text predictions that pop up on some email and messaging apps. Recent advances in language modelling have enabled the generation of strikingly plausible passages, such as the one shown below from OpenAI’s GPT-2. By controlling the input sequence used to condition the output predictions, autoregressive models can also be used to transform one sequence into another. This demo uses a conditional autoregressive model to transform text into realistic handwriting. WaveNet transforms text into natural sounding speech, and is now used to generate voices for Google Assistant. A similar process of conditioning and autoregressive generation can be used to translate from one language to another. Autoregressive models learn about data by attempting to predict each piece of it in a particular order. A more general class of unsupervised learning algorithms can be built by predicting any part of the data from any other. For example, this could mean removing a word from a sentence, and attempting to predict it from whatever remains. By learning to make lots of localised predictions, the system is forced to learn about the data as a whole. One concern around generative models is their potential for misuse. While manipulating evidence with photo, video, and audio editing has been possible for a long time, generative models could make it even easier to edit media with malicious intent. We have already seen demonstrations of so-called ‘deepfakes’, for instance this fabricated video footage of President Obama. It’s encouraging to see that several major efforts to address these challenges are already underway, including using statistical techniques to help detect synthetic media and verify authentic media, raising public awareness, and discussions around limiting the availability of trained generative models.
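The "guess the next piece, feed the guess back in" loop can be shown with something far simpler than a modern language model: a character-level bigram model fit by counting. The toy corpus and order-1 model below are made up purely for illustration.

```python
import numpy as np

corpus = "the cat sat on the mat and the dog sat on the log "
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}

# autoregressive model of order 1: P(next char | current char), estimated by counting
counts = np.ones((len(chars), len(chars)))          # add-one smoothing
for prev, nxt in zip(corpus, corpus[1:]):
    counts[idx[prev], idx[nxt]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def generate(seed="t", length=40, rng=np.random.default_rng(0)):
    out = seed
    for _ in range(length):
        nxt = rng.choice(len(chars), p=probs[idx[out[-1]]])  # guess the next piece...
        out += chars[nxt]                                    # ...and feed the guess back in
    return out

print(generate())
```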
  • 167. Unsupervised Learning (cont) From https://deepmind.com/blog/unsupervised-learning/ Re-imagining intelligence Generative models are fascinating in their own right, but our principal interest in them at DeepMind is as a stepping stone towards general intelligence. Endowing an agent with the ability to generate data is a way of giving it an imagination, and hence the ability to plan and reason about the future. Even without explicit generation, our studies show that learning to predict different aspects of the environment enriches the agent’s world model, and thereby improves its ability to solve problems. These results resonate with our intuitions about the human mind. Our ability to learn about the world without explicit supervision is fundamental to what we regard as intelligence. On a train ride we might listlessly gaze through the window, drag our fingers over the velvet of the seat, regard the passengers sitting across from us. We have no agenda in these studies: we almost can’t help but gather information, our brains ceaselessly working to understand the world around us, and our place within it.
  • 168. Towards Robust and Verified AI From https://deepmind.com/blog/robust-and-verified-ai/ Bugs and software have gone hand in hand since the beginning of computer programming. Over time, software developers have established a set of best practices for testing and debugging before deployment, but these practices are not suited for modern deep learning systems. Today, the prevailing practice in machine learning is to train a system on a training data set, and then test it on another set. While this reveals the average-case performance of models, it is also crucial to ensure robustness, or acceptably high performance even in the worst case. In this article, we describe three approaches for rigorously identifying and eliminating bugs in learned predictive models: adversarial testing, robust learning, and formal verification. This is not an entirely new problem. Computer programs have always had bugs. Over decades, software engineers have assembled an impressive toolkit of techniques, ranging from unit testing to formal verification. These methods work well on traditional software, but adapting these approaches to rigorously test machine learning models like neural networks is extremely challenging due to the scale and lack of structure in these models, which may contain hundreds of millions of parameters. This creates the need for novel approaches to ensure that machine learning systems are robust at deployment. From a programmer’s perspective, a bug is any behaviour that is inconsistent with the specification, i.e. the intended functionality, of a system. As part of our mission of solving intelligence, we conduct research into techniques for evaluating whether machine learning systems are consistent not only with the train and test set, but also with a list of specifications describing desirable properties of a system. Such properties might include robustness to sufficiently small perturbations in inputs, safety constraints to avoid catastrophic failures, or producing predictions consistent with the laws of physics.
  • 169. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ In this article, we discuss three important technical challenges for the machine learning community to take on, as we collectively work towards rigorous development and deployment of machine learning systems that are reliably consistent with desired specifications: • Testing consistency with specifications efficiently. We explore efficient ways to test that machine learning systems are consistent with properties (such as invariance or robustness) desired by the designer and users of the system. One approach to uncover cases where the model might be inconsistent with the desired behaviour is to systematically search for worst-case outcomes during evaluation. • Training machine learning models to be specification-consistent. Even with copious training data, standard machine learning algorithms can produce predictive models that make predictions inconsistent with desirable specifications like robustness or fairness. This requires us to reconsider training algorithms that produce models that not only fit training data well, but are also consistent with a list of specifications. • Formally proving that machine learning models are specification-consistent. There is a need for algorithms that can verify that the model’s predictions are provably consistent with a specification of interest for all possible inputs. While the field of formal verification has studied such algorithms for several decades, these approaches do not easily scale to modern deep learning systems despite impressive progress.
  • 170. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ Testing consistency with specifications efficiently Robustness to adversarial examples is a relatively well-studied problem in deep learning. One major theme that has come out of this work is the importance of evaluating against strong attacks, and designing transparent models which can be efficiently analysed. Alongside other researchers from the community, we have found that many models appear robust when evaluated against weak adversaries. However, they show essentially 0% adversarial accuracy when evaluated against stronger adversaries (Athalye et al., 2018, Uesato et al., 2018, Carlini and Wagner, 2017). While most work has focused on rare failures in the context of supervised learning (largely image classification), there is a need to extend these ideas to other settings. In recent work on adversarial approaches for uncovering catastrophic failures, we apply these ideas towards testing reinforcement learning agents intended for use in safety-critical settings. One challenge in developing autonomous systems is that because a single mistake may have large consequences, very small failure probabilities are unacceptable. Our objective is to design an “adversary” to allow us to detect such failures in advance (e.g., in a controlled environment). If the adversary can efficiently identify the worst-case input for a given model, this allows us to catch rare failure cases before deploying a model. As with image classifiers, evaluating against a weak adversary provides a false sense of security during deployment. This is similar to the software practice of red-teaming, though it extends beyond failures caused by malicious adversaries, and also includes failures which arise naturally, for example due to lack of generalization. We developed two complementary approaches for adversarial testing of RL agents. In the first, we use derivative-free optimisation to directly minimise the expected reward of an agent. In the second, we learn an adversarial value function which predicts from experience which situations are most likely to cause failures for the agent. We then use this learned function for optimisation to focus the evaluation on the most problematic inputs. These approaches form only a small part of a rich, growing space of potential algorithms, and we are excited about future development in rigorous evaluation of agents. Already, both approaches result in large improvements over random testing. Using our method, failures that would have taken days to uncover, or even gone undetected entirely, can be detected in minutes (Uesato et al., 2018b). We also found that adversarial testing may uncover qualitatively different behaviour in our agents from what might be expected from evaluation on a random test set. In particular, using adversarial environment construction we found that agents performing a 3D navigation task, which match human-level performance on average, still failed to find the goal completely on surprisingly simple mazes (Ruderman et al., 2018). Our work also highlights that we need to design systems that are secure against natural failures, not only against adversaries.
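One widely used way to search for worst-case inputs to a differentiable classifier is the fast gradient sign method (FGSM), sketched below on a toy model. This is a generic illustration of adversarial testing, not the derivative-free or learned-adversary approaches the post describes for RL agents.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # toy classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10)                 # a clean input
y = torch.tensor([1])                  # its (assumed) correct label
epsilon = 0.1                          # allowed perturbation size

x_adv = x.clone().requires_grad_(True)
loss = loss_fn(model(x_adv), y)
loss.backward()
# step in the direction that increases the loss the fastest, within an L-infinity ball
x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```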
  • 171. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ Training machine learning models to be specification-consistent Adversarial testing aims to find a counter example that violates specifications. As such, it often leads to overestimating the consistency of models with respect to these specifications. Mathematically, a specification is some relationship that has to hold between the inputs and outputs of a neural network. This can take the form of upper and lower bounds on certain key input and output parameters. Motivated by this observation, several researchers (Raghunathan et al., 2018; Wong et al., 2018; Mirman et al., 2018; Wang et al., 2018) including our team at DeepMind (Dvijotham et al., 2018; Gowal et al., 2018), have worked on algorithms that are agnostic to the adversarial testing procedure (used to assess consistency with the specification). This can be understood geometrically: we can bound (e.g., using interval bound propagation; Ehlers 2017, Katz et al. 2017, Mirman et al., 2018) the worst violation of a specification by bounding the space of outputs given a set of inputs. If this bound is differentiable with respect to network parameters and can be computed quickly, it can be used during training. The original bounding box can then be propagated through each layer of the network. We show that interval bound propagation is fast, efficient, and, contrary to prior belief, can achieve strong results (Gowal et al., 2018). In particular, we demonstrate that it can decrease the provable error rate (i.e., maximal error rate achievable by any adversary) over state-of-the-art in image classification on both MNIST and CIFAR-10 datasets. Going forward, the next frontier will be to learn the right geometric abstractions to compute tighter overapproximations of the space of outputs. We also want to train networks to be consistent with more complex specifications capturing desirable behaviour, such as the above-mentioned invariances and consistency with physical laws.
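A minimal NumPy sketch of interval bound propagation through one affine layer followed by a ReLU: given elementwise lower and upper bounds on the input, it returns valid bounds on the output, the basic step that gets chained through every layer and differentiated during training. The layer sizes, weights, and perturbation budget are random placeholders.

```python
import numpy as np

def ibp_affine(lower, upper, W, b):
    """Propagate elementwise input bounds through y = W x + b."""
    mid = (upper + lower) / 2.0          # centre of the input box
    rad = (upper - lower) / 2.0          # half-width of the input box
    mid_out = W @ mid + b
    rad_out = np.abs(W) @ rad            # the worst case grows with |W|
    return mid_out - rad_out, mid_out + rad_out

def ibp_relu(lower, upper):
    """ReLU is monotone, so bounds pass straight through it."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
x = rng.normal(size=3)
eps = 0.1                                # input perturbation budget

l, u = ibp_affine(x - eps, x + eps, W, b)
l, u = ibp_relu(l, u)

# sanity check: a random perturbed input must land inside the certified box
x_pert = x + rng.uniform(-eps, eps, size=3)
y_pert = np.maximum(W @ x_pert + b, 0.0)
assert np.all(y_pert >= l - 1e-9) and np.all(y_pert <= u + 1e-9)
print("output lower bounds:", np.round(l, 3))
print("output upper bounds:", np.round(u, 3))
```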
  • 172. Towards Robust and Verified AI (cont) From https://deepmind.com/blog/robust-and-verified-ai/ Formally proving that machine learning models are specification-consistent Rigorous testing and training can go a long way towards building robust machine learning systems. However, no amount of testing can formally guarantee that a system will behave as we want. In large-scale models, enumerating all possible outputs for a given set of inputs (for example, infinitesimal perturbations to an image) is intractable due to the astronomical number of choices for the input perturbation. However, as in the case of training, we can find more efficient approaches by setting geometric bounds on the set of outputs. Formal verification is a subject of ongoing research at DeepMind. The machine learning community has developed several interesting ideas on how to compute precise geometric bounds on the space of outputs of the network (Katz et al. 2017, Weng et al., 2018; Singh et al., 2018). Our approach (Dvijotham et al., 2018), based on optimisation and duality, consists of formulating the verification problem as an optimisation problem that tries to find the largest violation of the property being verified. By using ideas from duality in optimisation, the problem becomes computationally tractable. This results in additional constraints that refine the bounding boxes computed by interval bound propagation, using so-called cutting planes. This approach is sound but incomplete: there may be cases where the property of interest is true, but the bound computed by this algorithm is not tight enough to prove the property. However, once we obtain a bound, this formally guarantees that there can be no violation of the property. The figure below graphically illustrates the approach. This approach enables us to extend the applicability of verification algorithms to more general networks (activation functions, architectures), general specifications and more sophisticated deep learning models (generative models, neural processes, etc.) and specifications beyond adversarial robustness (Qin, 2018).
  • 173. From https://deepmind.com/blog/robust-and-verified-ai/ Outlook Deployment of machine learning in high-stakes situations presents unique challenges, and requires the development of evaluation techniques that reliably detect unlikely failure modes. More broadly, we believe that learning consistency with specifications can provide large efficiency improvements over approaches where specifications only arise implicitly from training data. We are excited about ongoing research into adversarial evaluation, learning robust models, and verification of formal specifications. Much more work is needed to build automated tools for ensuring that AI systems in the real world will do the “right thing”. In particular, we are excited about progress in the following directions: • Learning for adversarial evaluation and verification: As AI systems scale and become more complex, it will become increasingly difficult to design adversarial evaluation and verification algorithms that are well-adapted to the AI model. If we can leverage the power of AI to facilitate evaluation and verification, this process can be bootstrapped to scale. • Development of publicly-available tools for adversarial evaluation and verification: It is important to provide AI engineers and practitioners with easy-to-use tools that shed light on the possible failure modes of the AI system before it leads to widespread negative impact. This would require some degree of standardisation of adversarial evaluation and verification algorithms. • Broadening the scope of adversarial examples: To date, most work on adversarial examples has focused on model invariances to small perturbations, typically of images. This has provided an excellent testbed for developing approaches to adversarial evaluation, robust learning, and verification. We have begun to explore alternate specifications for properties directly relevant in the real world, and are excited by future research in this direction. • Learning specifications: Specifications that capture “correct” behavior in AI systems are often difficult to precisely state. Building systems that can use partial human specifications and learn further specifications from evaluative feedback would be required as we build increasingly intelligent agents capable of exhibiting complex behaviors and acting in unstructured environments. Towards Robust and Verified AI (cont)
  • 174. TF-Replicator: Distributed Machine Learning for Researchers From https://deepmind.com/blog/tf-replicator-distributed-machine-learning/ At DeepMind, the Research Platform Team builds infrastructure to empower and accelerate our AI research. Today, we are excited to share how we developed TF-Replicator, a software library that helps researchers deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience with distributed systems. TF-Replicator’s programming model has now been open sourced as part of TensorFlow’s tf.distribute.Strategy. This blog post gives an overview of the ideas and technical challenges underlying TF-Replicator. For a more comprehensive description, please read our arXiv paper. A recurring theme in recent AI breakthroughs -- from AlphaFold to BigGAN to AlphaStar -- is the need for effortless and reliable scalability. Increasing amounts of computational capacity allow researchers to train ever-larger neural networks with new capabilities. To address this, the Research Platform Team developed TF-Replicator, which allows researchers to target different hardware accelerators for Machine Learning, scale up workloads to many devices, and seamlessly switch between different types of accelerators. While it was initially developed as a library on top of TensorFlow, TF-Replicator’s API has since been integrated into TensorFlow 2.0’s new tf.distribute.Strategy. While TensorFlow provides direct support for CPU, GPU, and TPU (Tensor Processing Unit) devices, switching between targets requires substantial effort from the user. This typically involves specialising code for a particular hardware target, constraining research ideas to the capabilities of that platform. Some existing frameworks built on top of TensorFlow, e.g. Estimators, seek to address this problem. However, they are typically targeted at production use cases and lack the expressivity and flexibility required for rapid iteration of research ideas.
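Since TF-Replicator's programming model was folded into TensorFlow's tf.distribute.Strategy, the flavour of that API can be sketched as below. This assumes TensorFlow 2.x with the Keras API and synthetic data; it is a generic tf.distribute example, not TF-Replicator itself.

```python
import numpy as np
import tensorflow as tf

# replicate the model across all locally visible GPUs (falls back to CPU if none)
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # variables created inside the scope are mirrored on every device
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# synthetic data; gradients from each replica are averaged automatically
x = np.random.randn(1024, 32).astype("float32")
y = np.random.randint(0, 10, size=(1024,))
model.fit(x, y, batch_size=128, epochs=1)
```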
  • 175. AlphaFold Protein Folding From https://deepmind.com/blog/alphafold/
  • 176. AlphaFold Protein Folding (cont) From https://deepmind.com/blog/alphafold/
  • 177. Google Streams for NHS From https://deepmind.com/applied/deepmind-health/working-partners/how-were-helping-today
  • 178. Open Sourcing TRFL From https://deepmind.com/blog/trfl/
  • 179. Open Sourcing TRFL (cont) From https://deepmind.com/blog/trfl/
  • 180. Multi-Task Learning (e.g.Atari) From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
  • 181. Multi-Task Learning (cont) From https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
  • 182. Measuring Abstract Reasoning in Neural Nets From http://proceedings.mlr.press/v80/santoro18a/santoro18a.pdf Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation ‘regimes’ in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model’s ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction. One of the long-standing goals of artificial intelligence is to develop machines with abstract reasoning capabilities that equal or better those of humans. Though there has also been substantial progress in both reasoning and abstract representation learning in neural nets (Botvinick et al., 2017; LeCun et al., 2015; Higgins et al., 2016; 2017), the extent to which these models exhibit anything like general abstract reasoning is the subject of much debate (Garnelo et al., 2016; Lake & Baroni, 2017; Marcus, 2018). The research presented here was therefore motivated by two main goals. (1) To understand whether, and (2) to understand how, deep neural networks might be able to solve abstract visual reasoning problems. Our answer to (1) is that, with important caveats, neural networks can indeed learn to infer and apply abstract reasoning principles. Our best performing model learned to solve complex visual reasoning questions, and to do so, it needed to induce and detect from raw pixel input the presence of abstract notions such as logical operations and arithmetic progressions, and apply these principles to never-before observed stimuli. Importantly, we found that the architecture of the model made a critical difference to its ability to learn and execute such processes. While standard visual-processing models such as CNNs and ResNets performed poorly, a model that promoted the representation of, and comparison between, parts of the stimuli performed very well. We found ways to improve this performance via additional supervision: the training outcomes and the model’s ability to generalise were improved if it was required to decode its representations into symbols corresponding to the reason behind the correct answer.
  • 183. Learning to Navigate Cities without a Map From https://arxiv.org/abs/1804.00168 Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage this http URL contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at this https URL
  • 184. Learning to Generate Images From https://deepmind.com/blog/learning-to-generate-images/ Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, OMNIGLOT, CELEBA) and synthetic 3D datasets. A video of the agent can be found at https://youtu.be/iSyvwAwa7vk.
  • 185. Neuron Deletion From https://deepmind.com/blog/understanding-deep-learning-through-neuron-deletion/ We measured the performance impact of damaging the network by deleting individual neurons as well as groups of neurons. Our experiments led to two surprising findings: • Although many previous studies have focused on understanding easily interpretable individual neurons (e.g. “cat neurons”, or neurons in the hidden layers of deep networks which are only active in response to images of cats), we found that these interpretable neurons are no more important than confusing neurons with difficult-to-interpret activity. • Networks which correctly classify unseen images are more resilient to neuron deletion than networks which can only classify images they have seen before. In other words, networks which generalise well are much less reliant on single directions than those which memorise. To evaluate neuron importance, we measured how network performance on image classification tasks changes when a neuron is deleted. If a neuron is very important, deleting it should be highly damaging and substantially decrease network performance, while the deletion of an unimportant neuron should have little impact. Neuroscientists routinely perform similar experiments, although they cannot achieve the fine-grained precision which is necessary for these experiments and readily available in artificial neural networks. Surprisingly, we found that there was little relationship between selectivity and importance. In other words, “cat neurons” were no more important than confusing neurons. This finding echoes recent work in neuroscience which has demonstrated that confusing neurons can actually be quite informative, and suggests that we must look beyond the most easily interpretable neurons in order to understand deep neural networks.
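The deletion experiment itself is easy to mimic: zero out one hidden unit (or a group of units) and measure how much task performance drops. Below is a toy PyTorch version on random data; the small network, synthetic labels, and accuracy-drop measure are illustrative stand-ins for the image classifiers studied in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy "trained" network and evaluation set (random stand-ins)
hidden = 16
model = nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 4))
x = torch.randn(512, 20)
y = torch.randint(0, 4, (512,))

def accuracy():
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

baseline = accuracy()
first_layer = model[0]

for unit in range(hidden):
    saved_w = first_layer.weight[unit].clone()
    saved_b = first_layer.bias[unit].clone()
    with torch.no_grad():
        first_layer.weight[unit].zero_()     # "delete" the neuron: its activation becomes 0
        first_layer.bias[unit].zero_()
    drop = baseline - accuracy()             # importance = performance lost when deleted
    with torch.no_grad():
        first_layer.weight[unit] = saved_w   # restore before testing the next unit
        first_layer.bias[unit] = saved_b
    print(f"unit {unit:2d}: accuracy drop {drop:+.3f}")
```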
  • 186. Learning by Playing From https://arxiv.org/abs/1802.10567 We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment – enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach. A video of the rich set of learned behaviors can be found at https://youtu.be/mPKyvocNe M. This paper introduces SAC-X, a method that simultaneously learns intention policies on a set of auxiliary tasks, and actively schedules and executes these to explore its observation space - in search for sparse rewards of externally defined target tasks. Utilizing simple auxiliary tasks enables SAC-X to learn complicated target tasks from rewards defined in a ’pure’, sparse, manner: only the end goal is specified, but not the solution path. We demonstrated the power of SAC-X on several challenging robotics tasks in simulation, using a common set of simple and sparse auxiliary tasks and on a real robot. The learned intentions are highly reactive, reliable, and exhibit a rich and robust behavior. We consider this as an important step towards the goal of applying RL to real world domains.
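The scheduler in SAC-X decides which intention (auxiliary or main task) to execute next so that the sparse main-task reward is eventually found. Below is a minimal bandit-style stand-in for that scheduler, not the learned SAC-Q scheduler from the paper; the intention names and the made-up returns are illustrative assumptions.

```python
import numpy as np

class SimpleScheduler:
    """Pick which intention to execute next, preferring intentions whose past
    executions yielded main-task reward (a Boltzmann bandit, used here as a
    simplified stand-in for the learned scheduler in SAC-X)."""

    def __init__(self, intentions, temperature=0.2):
        self.intentions = intentions
        self.temperature = temperature
        self.returns = {name: [0.0] for name in intentions}   # observed main-task returns

    def choose(self, rng):
        means = np.array([np.mean(self.returns[n]) for n in self.intentions])
        logits = means / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(self.intentions, p=probs)

    def update(self, intention, main_task_return):
        self.returns[intention].append(main_task_return)

# Usage with invented intention names and made-up returns:
rng = np.random.default_rng(0)
sched = SimpleScheduler(["reach", "grasp", "lift", "main_stack"])
for episode in range(20):
    intention = sched.choose(rng)
    # ... run off-policy RL on `intention`, then observe the sparse main-task return ...
    fake_return = {"reach": 0.0, "grasp": 0.1, "lift": 0.3, "main_stack": 0.5}[str(intention)]
    sched.update(str(intention), fake_return + 0.05 * rng.standard_normal())
print({n: round(float(np.mean(r)), 2) for n, r in sched.returns.items()})
```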
  • 187. Scalable Distributed DeepRL From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/ Deep Reinforcement Learning (DeepRL) has achieved remarkable success in a range of tasks, from continuous control problems in robotics to playing games like Go and Atari. The improvements seen in these domains have so far been limited to individual tasks where a separate agent has been tuned and trained for each task. In our most recent work, we explore the challenge of training a single agent on many tasks. Today we are releasing DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space. Training an agent to perform well on many tasks requires massive throughput and making efficient use of every data point. To this end, we have developed a new, highly scalable agent architecture for distributed training called Importance Weighted Actor-Learner Architecture that uses a new off-policy correction algorithm called V-trace. DMLab-30 is a collection of new levels designed using our open source RL environment DeepMind Lab. These environments enable any DeepRL researcher to test systems on a large spectrum of interesting tasks either individually or in a multi-task setting.
  • 188. Scalable Distributed DeepRL (cont) From https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/ In order to tackle the challenging DMLab-30 suite, we developed a new distributed agent called Importance Weighted Actor-Learner Architecture that maximises data throughput using an efficient distributed architecture with TensorFlow. Importance Weighted Actor-Learner Architecture is inspired by the popular A3C architecture which uses multiple distributed actors to learn the agent’s parameters. In models like this, each of the actors uses a clone of the policy parameters to act in the environment. Periodically, actors pause their exploration to share the gradients they have computed with a central parameter server that applies updates.
  • 189. Learning Explanatory Rules from Noisy Data From https://deepmind.com/blog/learning-explanatory-rules-noisy-data/ The distinction is interesting to us because these two types of thinking correspond to two different approaches to machine learning: deep learning and symbolic program synthesis. Deep learning concentrates on intuitive perceptual thinking whereas symbolic program synthesis focuses on conceptual, rule-based thinking. Each system has different merits - deep learning systems are robust to noisy data but are difficult to interpret and require large amounts of data to train, whereas symbolic systems are much easier to interpret and require less training data but struggle with noisy data. While human cognition seamlessly combines these two distinct ways of thinking, it is much less clear whether or how it is possible to replicate this in a single AI system. Our new paper, recently published in JAIR, demonstrates it is possible for systems to combine intuitive perceptual thinking with conceptual, interpretable reasoning. The system we describe, ∂ILP, is robust to noise, data-efficient, and produces interpretable rules.
  • 190. Learning Explanatory Rules from Noisy Data (cont) From https://arxiv.org/abs/1802.01561 In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
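V-trace corrects for the lag between the actors' behaviour policy and the learner's target policy by clipping importance-sampling weights. A minimal NumPy sketch of the target computation is below (single trajectory, no episode boundaries); the clipping constants follow the paper's defaults, and the random inputs are placeholders.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value,
                   behaviour_logp, target_logp,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s for one trajectory. Terminal states,
    which the full algorithm handles via per-step discounts, are omitted here."""
    T = len(rewards)
    ratios = np.exp(target_logp - behaviour_logp)      # pi(a|x) / mu(a|x)
    rhos = np.minimum(rho_bar, ratios)                  # clipped weights for the TD error
    cs = np.minimum(c_bar, ratios)                      # clipped "trace" weights
    values_next = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_next - values)

    vs_minus_v = np.zeros(T + 1)                        # v_s - V(x_s), filled backwards
    for t in reversed(range(T)):
        vs_minus_v[t] = deltas[t] + gamma * cs[t] * vs_minus_v[t + 1]
    return values + vs_minus_v[:T]

# Placeholder trajectory of length 5:
rng = np.random.default_rng(0)
T = 5
vs = vtrace_targets(rewards=rng.random(T), values=rng.random(T), bootstrap_value=0.5,
                    behaviour_logp=np.log(rng.uniform(0.1, 1.0, T)),
                    target_logp=np.log(rng.uniform(0.1, 1.0, T)))
print(vs)
```

In IMPALA these targets replace the on-policy returns used by A3C, which is what lets actors keep generating experience while the central learner's policy moves ahead of them.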
  • 191. DeepMind Lab From https://deepmind.com/blog/open-sourcing-deepmind-lab/ The development of innovative agents goes hand in hand with the careful design and implementation of rationally selected, flexible and well-maintained environments. To that end, we at DeepMind have invested considerable effort toward building rich simulated environments to serve as “laboratories” for AI research. Now we are open-sourcing our flagship platform, DeepMind Lab, so the broader research community can make use of it. DeepMind Lab is a fully 3D game-like platform tailored for agent-based AI research. It is observed from a first-person viewpoint, through the eyes of the simulated agent. Scenes are rendered with rich science fiction-style visuals. The available actions allow agents to look around and move in 3D. The agent’s “body” is a floating orb. It levitates and moves by activating thrusters opposite its desired direction of movement, and it has a camera that moves around the main sphere as a ball-in-socket joint tracking the rotational look actions. Example tasks include collecting fruit, navigating in mazes, traversing dangerous passages while avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, playing laser tag, and quickly learning and remembering random procedurally generated environments. An illustration of how agents in DeepMind Lab perceive and interact with the world can be seen below:
  • 192. Game Theory for Asymmetric Players From https://deepmind.com/blog/game-theory-insights-asymmetric-multi-agent-games/ As AI systems start to play an increasing role in the real world it is important to understand how different systems will interact with one another. In our latest paper, published in the journal Scientific Reports, we use a branch of game theory to shed light on this problem. In particular, we examine how two intelligent systems behave and respond in a particular type of situation known as an asymmetric game; examples include Leduc poker and various board games such as Scotland Yard. Asymmetric games also naturally model certain real-world scenarios such as automated auctions where buyers and sellers operate with different motivations. Our results give us new insights into these situations and reveal a surprisingly simple way to analyse them. While our interest is in how this theory applies to the interaction of multiple AI systems, we believe the results could also be of use in economics, evolutionary biology and empirical game theory among others. Game theory is a field of mathematics that is used to analyse the strategies used by decision makers in competitive situations. It can apply to humans, animals, and computers in various situations but is commonly used in AI research to study “multi-agent” environments where there is more than one system, for example several household robots cooperating to clean the house. Traditionally, the evolutionary dynamics of multi-agent systems have been analysed using simple, symmetric games, such as the classic Prisoner’s Dilemma, where each player has access to the same set of actions. Although these games can provide useful insights into how multi-agent systems work and tell us how to achieve a desirable outcome for all players - known as the Nash equilibrium - they cannot model all situations. Our new technique allows us to quickly and easily identify the strategies used to find the Nash equilibrium in more complex asymmetric games - characterised as games where each player has different strategies, goals and rewards. These games - and the new technique we use to understand them - can be illustrated using an example from ‘Battle of the Sexes’, a coordination game commonly used in game theory research. UPDATE 20/03/18: Our latest paper, forthcoming at the Autonomous Agents and Multi-Agent Systems conference (AAMAS), builds on the Scientific Reports paper outlined above. A Generalised Method for Empirical Game Theoretic Analysis introduces a general method to perform empirical analysis of multi-agent interactions, both in symmetric and asymmetric games. The method makes it possible to understand how multi-agent strategies interact, what the attractors are and what the basins of attraction look like, giving an intuitive understanding for the strength of the involved strategies. Furthermore, it explains how many data samples to consider in order to guarantee that the equilibria of the approximating game are sufficiently reliable. We apply the method to several domains, including AlphaGo, Colonel Blotto and Leduc poker.
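As a concrete anchor for the Battle of the Sexes example, the sketch below enumerates the pure-strategy Nash equilibria of a 2x2 asymmetric (bimatrix) game by checking best responses. The payoff numbers are the usual textbook ones and purely illustrative; the paper's actual contribution is an evolutionary-dynamics decomposition of such games, not this enumeration.

```python
import numpy as np

# Battle of the Sexes: the row player prefers Opera, the column player prefers Football.
# Payoff matrices (A for the row player, B for the column player); 0 = Opera, 1 = Football.
A = np.array([[3, 0],
              [0, 2]])
B = np.array([[2, 0],
              [0, 3]])

def pure_nash(A, B):
    """Return all pure-strategy profiles (i, j) where neither player can gain
    by unilaterally deviating."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_best = A[i, j] >= A[:, j].max()   # i is a best response to j
            col_best = B[i, j] >= B[i, :].max()   # j is a best response to i
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

print(pure_nash(A, B))   # [(0, 0), (1, 1)]: both coordinate on Opera or both on Football
```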
  • 193. A Generalised Method for Empirical Game Theoretic Analysis From https://arxiv.org/abs/1803.06376 This paper provides theoretical bounds for empirical game theoretical analysis of complex multi-agent interactions. We provide insights into the empirical meta-game showing that a Nash equilibrium of the meta-game is an approximate Nash equilibrium of the true underlying game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the meta-game analysis methodology to asymmetric games. The state-of-the-art has only considered empirical games in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-agent learning algorithm.
  • 194. DeepMind 2017 Review From https://deepmind.com/blog/2017-deepminds-year-review/ The approach we take at DeepMind is inspired by neuroscience, helping to make progress in critical areas such as imagination, reasoning, memory and learning. Take imagination, for example: this distinctively human ability plays a crucial part in our daily lives, allowing us to plan and reason about the future, but is hugely challenging for computers. We continue to work hard on this problem, this year introducing imagination-augmented agents that are able to extract relevant information from an environment in order to plan what to do in the future. Separately, we made progress in the field of generative models. Just over a year ago we presented WaveNet, a deep neural network for generating raw audio waveforms that was capable of producing better and more realistic-sounding speech than existing techniques. At that time, the model was a research prototype and was too computationally intensive to work in consumer products. Over the last 12 months, our teams managed to create a new model that was 1000x faster. In October, we revealed that this new Parallel WaveNet is now being used in the real world, generating the Google Assistant voices for US English and Japanese. This is an example of the effort we invest in making it easier to build, train and optimise AI systems. Other techniques we worked on this year, such as distributional reinforcement learning, population based training for neural networks and new neural architecture search methods, promise to make systems easier to build, more accurate and quicker to optimise. We have also dedicated significant time to creating new and challenging environments in which to test our systems, including our work with Blizzard to open up StarCraft II for research. But we know that technology is not value neutral. We cannot simply make progress in fundamental research without also taking responsibility for the ethical and social impact of our work. This drives our research in critical areas such as interpretability, where we have been exploring novel methods to understand and explain how our systems work. It’s also why we have an established technical safety team that continued to develop practical ways to ensure that we can depend on future systems and that they remain under meaningful human control.
  • 195. Population Based Training of Neural Networks From https://arxiv.org/abs/1711.09846 Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present Population Based Training (PBT), a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
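A toy exploit/explore loop captures the PBT idea: periodically copy the weights and hyperparameters of the best population members into the worst ones, then perturb the copied hyperparameters. The quadratic "training task", the quartile cutoffs and the 20% perturbation factor below are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
POP, STEPS, INTERVAL = 8, 200, 20

# Each member: a parameter vector theta and a hyperparameter (learning rate).
population = [{"theta": rng.normal(size=2) * 5.0, "lr": 10 ** rng.uniform(-4, 0)}
              for _ in range(POP)]

def train_step(m):
    # Toy task: minimise ||theta||^2 by gradient descent with the member's own lr.
    m["theta"] -= m["lr"] * 2.0 * m["theta"]

def evaluate(m):
    return -float(np.sum(m["theta"] ** 2))      # higher is better

for step in range(1, STEPS + 1):
    for m in population:
        train_step(m)
    if step % INTERVAL == 0:                    # exploit/explore phase
        ranked = sorted(population, key=evaluate, reverse=True)
        top, bottom = ranked[:POP // 4], ranked[-(POP // 4):]
        for loser in bottom:
            winner = top[rng.integers(len(top))]
            loser["theta"] = winner["theta"].copy()              # exploit: copy weights
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])  # explore: perturb hyperparams

best = max(population, key=evaluate)
print("best score:", evaluate(best), "final lr:", best["lr"])
```

The learning-rate copies plus perturbations are what produce the hyperparameter schedule the abstract refers to, rather than a single fixed setting.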
  • 196. Neuroscience Inspired Artificial Intelligence From https://www.cell.com/neuron/fulltext/S0896-6273(17)30509-3 The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields has become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields. In this perspective, we have reviewed some of the many ways in which neuroscience has made fundamental contributions to advancing AI research, and argued for its increasingly important relevance. In strategizing for the future exchange between the two fields, it is important to appreciate that the past contributions of neuroscience to AI have rarely involved a simple transfer of full-fledged solutions that could be directly re-implemented in machines. Rather, neuroscience has typically been useful in a subtler way, stimulating algorithmic-level questions about facets of animal learning and intelligence of interest to AI researchers and providing initial leads toward relevant mechanisms. As such, our view is that leveraging insights gained from neuroscience research will expedite progress in AI research, and this will be most effective if AI researchers actively initiate collaborations with neuroscientists to highlight key questions that could be addressed by empirical work. The successful transfer of insights gained from neuroscience to the development of AI algorithms is critically dependent on the interaction between researchers working in both these fields, with insights often developing through a continual handing back and forth of ideas between fields. In the future, we hope that greater collaboration between researchers in neuroscience and AI, and the identification of a common language between the two fields (Marblestone et al., 2016), will permit a virtuous circle whereby research is accelerated through shared theoretical insights and common empirical advances. We believe that the quest to develop AI will ultimately also lead to a better understanding of our own minds and thought processes. Distilling intelligence into an algorithmic construct and comparing it to the human brain might yield insights into some of the deepest and the most enduring mysteries of the mind, such as the nature of creativity, dreams, and perhaps one day, even consciousness.
  • 197. Toward an Integration of Deep Learning and Neuroscience From https://www.frontiersin.org/articles/10.3389/fncom.2016.00094/full Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
  • 198. Hippocampus Predictive Map From https://deepmind.com/blog/hippocampus-predictive-map/ In our new paper, in Nature Neuroscience, we apply a neuroscience lens to a longstanding mathematical theory from machine learning to provide new insights into the nature of learning and memory. Specifically, we propose that the area of the brain known as the hippocampus offers a unique solution to this problem by compactly summarising future events using what we call a “predictive map.” The hippocampus has traditionally been thought to only represent an animal’s current state, particularly in spatial tasks, such as navigating a maze. This view gained significant traction with the discovery of “place cells” in the rodent hippocampus, which fire selectively when the animal is in specific locations. While this theory accounts for many neurophysiological findings, it does not fully explain why the hippocampus is also involved in other functions, such as memory, relational reasoning, and decision making. Our new theory thinks about navigation as part of the more general problem of computing plans that maximise future reward. Our insights were derived from reinforcement learning, the subdiscipline of AI research that focuses on systems that learn by trial and error. The key computational idea we drew on is that to estimate future reward, an agent must first estimate how much immediate reward it expects to receive in each state, and then weight this expected reward by how often it expects to visit that state in the future. By summing up this weighted reward across all possible states, the agent obtains an estimate of future reward. Similarly, we argue that the hippocampus represents every situation - or state - in terms of the future states which it predicts. For example, if you are leaving work (your current state) your hippocampus might represent this by predicting that you will likely soon be on your commute, picking up your kids from school or, more distantly, at home. By representing each current state in terms of its anticipated successor states, the hippocampus conveys a compact summary of future events, known formally as the “successor representation”. We suggest that this specific form of predictive map allows the brain to adapt rapidly in environments with changing rewards, but without having to run expensive simulations of the future.
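The "successor representation" mentioned above has a compact closed form for a fixed policy: M = (I − γT)^(-1), where T is the policy's state-to-state transition matrix, and the value of every state is then simply M·r. The 4-state chain below is an invented toy environment; it also shows the adaptation point from the slide, since a change in reward only requires re-multiplying by M, not re-learning it.

```python
import numpy as np

gamma = 0.95
# Invented 4-state chain; under the policy the agent mostly moves right and
# loops back from state 3 to state 0.
T = np.array([[0.1, 0.9, 0.0, 0.0],
              [0.0, 0.1, 0.9, 0.0],
              [0.0, 0.0, 0.1, 0.9],
              [0.9, 0.0, 0.0, 0.1]])
r = np.array([0.0, 0.0, 0.0, 1.0])           # reward only in state 3

# Successor representation: expected discounted future occupancy of each state.
M = np.linalg.inv(np.eye(4) - gamma * T)
V = M @ r                                     # state values follow immediately

print(np.round(M, 2))
print("values:", np.round(V, 2))
# If the reward moves (say to state 1), only r changes and M is reused:
print("values after reward change:", np.round(M @ np.array([0.0, 1.0, 0.0, 0.0]), 2))
```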
  • 199. Going Beyond Average for Neural Learning From https://deepmind.com/blog/going-beyond-average-reinforcement-learning/ Randomness is something we encounter everyday and has a profound effect on how we experience the world. The same is true in reinforcement learning (RL) applications, systems that learn by trial and error and are motivated by rewards. Typically, an RL algorithm predicts the average reward it receives from multiple attempts at a task, and uses this prediction to decide how to act. But random perturbations in the environment can alter its behaviour by changing the exact amount of reward the system receives. In a new paper, we show it is possible to model not only the average but also the full variation of this reward, what we call the value distribution. This results in RL systems that are more accurate and faster to train than previous models, and more importantly opens up the possibility of rethinking the whole of reinforcement learning. From https://arxiv.org/abs/1707.06887 In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
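A core operation in the distributional (C51) algorithm from the arXiv paper is projecting the shifted and scaled return distribution r + γZ back onto a fixed support of atoms. A minimal NumPy sketch for a single transition is below; the support bounds and atom count follow the paper's Atari settings, and the uniform next-state distribution is a placeholder.

```python
import numpy as np

def categorical_projection(p_next, r, gamma, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the distribution of r + gamma*Z (Z supported on fixed atoms z)
    back onto the same atoms, spreading each shifted atom's mass over its
    two nearest neighbours."""
    z = np.linspace(v_min, v_max, n_atoms)
    dz = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(r + gamma * z, v_min, v_max)        # shifted atom locations
    b = (tz - v_min) / dz                            # fractional index on the support
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros(n_atoms)
    for j in range(n_atoms):
        if lo[j] == hi[j]:
            m[lo[j]] += p_next[j]
        else:
            m[lo[j]] += p_next[j] * (hi[j] - b[j])
            m[hi[j]] += p_next[j] * (b[j] - lo[j])
    return z, m

p_next = np.full(51, 1.0 / 51)                       # placeholder next-state distribution
z, m = categorical_projection(p_next, r=1.0, gamma=0.99)
print("projected mean:", round(float((z * m).sum()), 3), "total mass:", round(float(m.sum()), 3))
```

The projected distribution m is the target the network's predicted distribution is trained towards (via cross-entropy), instead of a single expected value.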
  • 200. Agents that Imagine and Plan From https://deepmind.com/blog/agents-imagine-and-plan/ In two new papers, we describe a new family of approaches for imagination-based planning. We also introduce architectures which provide new ways for agents to learn and construct plans to maximise the efficiency of a task. These architectures are efficient, robust to complex and imperfect models, and can adopt flexible strategies for exploiting their imagination. Imagination-augmented agents: The agents we introduce benefit from an ‘imagination encoder’ - a neural network which learns to extract any information useful for the agent’s future decisions, but ignore that which is not relevant. These agents have a number of distinct features: • they learn to interpret their internal simulations. This allows them to use models which coarsely capture the environmental dynamics, even when those dynamics are not perfect. • they use their imagination efficiently. They do this by adapting the number of imagined trajectories to suit the problem. Efficiency is also enhanced by the encoder, which is able to extract additional information from imagination beyond rewards - these trajectories may contain useful clues even if they do not necessarily result in high reward. • they can learn different strategies to construct plans. They do this by choosing between continuing a current imagined trajectory or restarting from scratch. Alternatively, they can use different imagination models, with different accuracies and computational costs. This offers them a broad spectrum of effective planning strategies, rather than being restricted to a one-size-fits-all approach which might limit adaptability in imperfect environments.
  • 201. Agents that Imagine and Plan From https://arxiv.org/abs/1707.06203 We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines. From https://arxiv.org/abs/1707.06170 Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex "imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.
  • 202. Creating New Visual Concepts From https://deepmind.com/blog/imagine-creating-new-visual-concepts-recombining-familiar-ones/ In our new paper, we propose a novel theoretical approach to address this problem. We also demonstrate a new neural network component called the Symbol-Concept Association Network (SCAN), that can, for the first time, learn a grounded visual concept hierarchy in a way that mimics human vision and word acquisition, enabling it to imagine novel concepts guided by language instructions. Our approach can be summarised as follows: • The SCAN model experiences the visual world in the same way as a young baby might during the first few months of life. This is the period when the baby’s eyes are still unable to focus on anything more than an arm’s length away, and the baby essentially spends all her time observing various objects coming into view, moving and rotating in front of her. To emulate this process, we placed SCAN in a simulated 3D world of DeepMind Lab, where, like a baby in a cot, it could not move, but it could rotate its head and observe one of three possible objects presented to it against various coloured backgrounds - a hat, a suitcase or an ice lolly. Like the baby’s visual system, our model learns the basic structure of the visual world and how to represent objects in terms of interpretable visual “primitives”. For example, when looking at an apple, the model will learn to represent it in terms of its colour, shape, size, position or lighting. From https://arxiv.org/abs/1707.03389 The seemingly infinite diversity of the natural world arises from a relatively small set of coherent rules, such as the laws of physics or chemistry. We conjecture that these rules give rise to regularities that can be discovered through primarily unsupervised experiences and represented as abstract concepts. If such representations are compositional and hierarchical, they can be recombined into an exponentially large set of new concepts. This paper describes SCAN (Symbol-Concept Association Network), a new framework for learning such abstractions in the visual domain. SCAN learns concepts through fast symbol association, grounding them in disentangled visual primitives that are discovered in an unsupervised manner. Unlike state of the art multimodal generative model baselines, our approach requires very few pairings between symbols and images and makes no assumptions about the form of symbol representations. Once trained, SCAN is capable of multimodal bi-directional inference, generating a diverse set of image samples from symbolic descriptions and vice versa. It also allows for traversal and manipulation of the implicit hierarchy of visual concepts through symbolic instructions and learnt logical recombination operations. Such manipulations enable SCAN to break away from its training data distribution and imagine novel visual concepts through symbolically instructed recombination of previously learnt concepts.
  • 203. Producing Flexible Behaviors in Simulation Environments From https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/ True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics. A trend has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this may require considerable engineering effort, and can result in restricted behaviours or behaviours that may be difficult to repurpose for new tasks. In three new papers, we seek ways to produce flexible and natural behaviours that can be reused and adapted to solve tasks. Read: 'Emergence of locomotion behaviours in rich environments', 'Learning human behaviours from motion capture by adversarial imitation', and 'Robust imitation of diverse behaviours'. Achieving flexible and adaptive control of simulated bodies is a key element of AI research. Our work aims to develop flexible systems which learn and adapt skills to solve motor control tasks while reducing the manual engineering required to achieve this goal. Future work could extend these approaches to enable coordination of a greater range of behaviours in more complex situations.
  • 204. Producing Flexible Behaviors in Simulation Environments. Emergence of locomotion behaviours in rich environments: The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed following this https URL. Learning human behaviours from motion capture by adversarial imitation: Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement patterns from limited demonstrations consisting only of partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters. We leverage this approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when controlled by a higher level controller. Robust imitation of diverse behaviours: Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and then smoothly interpolated with a resulting smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.
  • 205. DQN - Deep Reinforcement Learning From https://deepmind.com/research/dqn Nature Paper https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
  • 206. DQN - Deep Reinforcement Learning Paper From https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
  • 207. Reward Tampering Problems and Solutions in Reinforcement Learning From https://arxiv.org/pdf/1908.04734.pdf
  • 208. PathNet from Google DeepMind From https://arxiv.org/pdf/1701.08734.pdf For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm.
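A stripped-down sketch of the PathNet selection loop: pathways are binary masks over a grid of modules, pairs of pathways are compared in tournaments, and the loser is overwritten by a mutated copy of the winner. The module grid size, the fixed "usefulness" stand-in for trained-path fitness, and the mutation rate below are toy placeholders, not the paper's networks.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYERS, MODULES, ACTIVE, POP = 3, 10, 3, 16

def random_pathway():
    # A pathway selects ACTIVE of the MODULES modules in each layer (a binary mask).
    path = np.zeros((LAYERS, MODULES), dtype=bool)
    for l in range(LAYERS):
        path[l, rng.choice(MODULES, size=ACTIVE, replace=False)] = True
    return path

def fitness(path):
    # Placeholder for "train the parameters along this path and measure task
    # performance": here each module has a fixed hidden usefulness score.
    usefulness = np.linspace(0.0, 1.0, MODULES)
    return float(sum(usefulness[path[l]].sum() for l in range(LAYERS)))

def mutate(path, p=0.1):
    child = path.copy()
    for l in range(LAYERS):
        for m in np.flatnonzero(child[l]):
            if rng.random() < p:                              # move this module choice
                child[l, m] = False
                child[l, rng.choice(np.flatnonzero(~child[l]))] = True
    return child

population = [random_pathway() for _ in range(POP)]
for generation in range(200):
    i, j = rng.choice(POP, size=2, replace=False)              # binary tournament
    winner, loser = (i, j) if fitness(population[i]) >= fitness(population[j]) else (j, i)
    population[loser] = mutate(population[winner])             # overwrite loser with mutated copy

print("best pathway fitness:", round(max(fitness(p) for p in population), 2))
```

In the real system the fitness evaluation is the expensive part (training the path's parameters on the task), and the parameters along a winning path are frozen before the next task is evolved, which is what yields the transfer effect described in the abstract.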
  • 210. 2020 References • Future of Deep Learning https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/ • Turing Award Winners Video https://www.youtube.com/watch?v=UX8OubxsY8w • MIT Deep Learning Video https://www.youtube.com/watch?v=0VH1Lim8gL8
  • 211. Three Challenges of Deep Learning from Yann LeCun From https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/ 1. First, we need to develop AI systems that learn with fewer samples or fewer trials. “My suggestion is to use unsupervised learning, or I prefer to call it self-supervised learning because the algorithms we use are really akin to supervised learning, which is basically learning to fill in the blanks,” LeCun says. “Basically, it’s the idea of learning to represent the world before learning a task. This is what babies and animals do. We run about the world, we learn how it works before we learn any task. Once we have good representations of the world, learning a task requires few trials and few samples.” 2. The second challenge is creating deep learning systems that can reason. Current deep learning systems are notoriously bad at reasoning and abstraction, which is why they need huge amounts of data to learn simple tasks. “The question is, how do we go beyond feed-forward computation and system 1? How do we make reasoning compatible with gradient-based learning? How do we make reasoning differentiable? That’s the bottom line,” LeCun said. System 1 refers to the kind of tasks that don’t require active thinking, such as navigating a known area or making small calculations. System 2 is the more active kind of thinking, which requires reasoning. Symbolic artificial intelligence, the classic approach to AI, has proven to be much better at reasoning and abstraction. 3. The third challenge is to create deep learning systems that can learn and plan complex action sequences, and decompose tasks into subtasks. Deep learning systems are good at providing end-to-end solutions to problems but very bad at breaking them down into specific interpretable and modifiable steps. There have been advances in creating learning-based AI systems that can decompose images, speech, and text. Capsule networks, invented by Geoffrey Hinton, address some of these challenges. But learning to reason about complex tasks is beyond today’s AI. “We have no idea how to do this,” LeCun admits.
  • 212. Foundation Models From https://research.ibm.com/blog/what-are-foundation-models In recent years, we’ve managed to build AI systems that can learn from thousands, or millions, of examples to help us better understand our world, or find new solutions to difficult problems. These large-scale models have led to systems that can understand when we talk or write, such as the natural-language processing and understanding programs we use every day, from digital assistants to speech-to-text programs. Other systems, trained on things like the entire work of famous artists, or every chemistry textbook in existence, have allowed us to build generative models that can create new works of art based on those styles, or new compound ideas based on the history of chemical research. While many new AI systems are helping solve all sorts of real-world problems, creating and deploying each new system often requires a considerable amount of time and resources. For each new application, you need to ensure that there’s a large, well-labelled dataset for the specific task you want to tackle. If a dataset didn’t exist, you’d have to have people spend hundreds or thousands of hours finding and labelling appropriate images, text, or graphs for the dataset. Then the AI model has to learn to recognize everything in the dataset, and then it can be applied to the use case you have, from recognizing language to generating new molecules for drug discovery. And training one large natural-language processing model, for example, has roughly the same carbon footprint as running five cars over their lifetime. The next wave in AI looks to replace the task-specific models that have dominated the AI landscape to date. The future is models that are trained on a broad set of unlabeled data that can be used for different tasks, with minimal fine-tuning. These are called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence. We’ve seen the first glimmers of the potential of foundation models in the worlds of imagery and language. Early examples of models, like GPT-3, BERT, or DALL-E 2, have shown what’s possible. Input a short prompt, and the system generates an entire essay, or a complex image, based on your parameters, even if it wasn’t specifically trained on how to execute that exact argument or generate an image in that way. What makes these new systems foundation models is that they, as the name suggests, can be the foundation for many applications of the AI model. Using self-supervised learning and transfer learning, the model can apply information it’s learnt about one situation to another. While the amount of data is considerably more than the average person needs to transfer understanding from one task to another, the end result is relatively similar: You learn to drive on one car, for example, and without too much effort, you can drive most other cars — or even a truck or a bus.
  • 213. Challenges and Risks of Foundation Models From https://arxiv.org/pdf/2108.07258.pdf AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature. This report investigates an emerging paradigm for building artificial intelligence (AI) systems based on a general class of models which we term foundation models. A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks; current examples include BERT [Devlin et al. 2019], GPT-3 [Brown et al. 2020], and CLIP [Radford et al. 2021]. From a technological point of view, foundation models are not new — they are based on deep neural networks and self-supervised learning, both of which have existed for decades. However, the sheer scale and scope of foundation models from the last few years have stretched our imagination of what is possible; for example, GPT-3 has 175 billion parameters and can be adapted via natural language prompts to do a passable job on a wide range of tasks despite not being trained explicitly to do many of those tasks [Brown et al. 2020]. At the same time, existing foundation models have the potential to accentuate harms, and their characteristics are in general poorly understood. Given their impending widespread deployment, they have become a topic of intense scrutiny [Bender et al. 2021]
  • 214. Capsule Neural Nets From https://en.wikipedia.org/wiki/Capsule_neural_network A Capsule Neural Network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.[1] The idea is to add structures called “capsules” to a convolutional neural network (CNN), and to reuse output from several of those capsules to form more stable (with respect to various perturbations) representations for higher capsules.[2] The output is a vector consisting of the probability of an observation, and a pose for that observation. This vector is similar to what is done for example when doing classification with localization in CNNs. Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched). For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they have linear effects at the part/object level.[3] This can be compared to inverting the rendering of an object of multiple parts. [4]
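The two pieces that distinguish a capsule layer from an ordinary dense layer are the "squash" non-linearity (so that the length of a capsule's output vector behaves like a probability of presence) and routing-by-agreement between lower and higher capsules. A minimal NumPy sketch of both is below, with random prediction vectors standing in for the learned transformation outputs; the capsule counts and dimensions are arbitrary.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: predictions from each of I lower capsules for each of J higher capsules,
    shape (I, J, D). Returns the J higher-capsule output vectors, shape (J, D)."""
    I, J, _ = u_hat.shape
    b = np.zeros((I, J))                                        # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # softmax over higher capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)                   # weighted sum of predictions
        v = squash(s)                                           # higher-capsule outputs
        b += np.einsum("ijd,jd->ij", u_hat, v)                  # agreement updates the routing
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))        # 8 lower capsules, 3 higher capsules, 4-D poses
v = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=-1))          # each length in [0, 1): the capsule's "presence"
```

Routing-by-agreement is what enforces the part/whole consistency discussed above: a lower capsule only sends its output to a higher capsule whose pose it agrees with, which is why a face with a misplaced eye and mouth is not accepted as a face.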
  • 217. DALL-E 2 From https://www.nytimes.com/2022/08/24/technology/ai-technology-progress.html For the past few days, I’ve been playing around with DALL-E 2, an app developed by the San Francisco company OpenAI that turns text descriptions into hyper-realistic images. What’s impressive about DALL-E 2 isn’t just the art it generates. It’s how it generates art. These aren’t composites made out of existing internet images — they’re wholly new creations made through a complex A.I. process known as “diffusion,” which starts with a random series of pixels and refines it repeatedly until it matches a given text description. And it’s improving quickly — DALL-E 2’s images are four times as detailed as the images generated by the original DALL-E, which was introduced only last year. DALL-E 2 got a lot of attention when it was announced this year, and rightfully so. It’s an impressive piece of technology with big implications for anyone who makes a living working with images — illustrators, graphic designers, photographers and so on. It also raises important questions about what all of this A.I.-generated art will be used for, and whether we need to worry about a surge in synthetic propaganda, hyper-realistic deepfakes or even nonconsensual pornography. DALL-E 2 available to all If you've been itching to try OpenAI's image synthesis tool but have been stymied by the lack of an invitation, now's your chance. Today, OpenAI announced that it removed the waitlist for its DALL-E AI image generator service. That means anyone can sign up and use it. DALL-E is a deep learning image synthesis model that has been trained on hundreds of millions of images pulled from the Internet. It uses a technique called latent diffusion to learn associations between words and images. As a result, DALL-E users can type in a text description—called a prompt—and see it rendered visually as a 1024×1024 pixel image in almost any artistic style.
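The "start from random pixels and refine repeatedly" description corresponds to the reverse (sampling) loop of a diffusion model. The DDPM-style sketch below is a generic illustration with a dummy noise-prediction function standing in for DALL-E 2's large, text-conditioned network; the schedule constants are common defaults from the diffusion literature, not OpenAI's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)                 # noise schedule (a common default)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t, prompt):
    """Stand-in for the learned denoiser epsilon(x_t, t, text). A real model would be
    a large neural network conditioned on the text prompt."""
    return 0.1 * x                                  # dummy: gently shrink toward zero

def sample(shape, prompt="a corgi playing a trumpet"):
    x = rng.standard_normal(shape)                  # start from pure noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t, prompt)
        # Reverse step: subtract the predicted noise contribution and rescale.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)   # re-inject a little noise
    return x

img = sample((8, 8, 3))                             # tiny "image" purely for illustration
print(img.shape, round(float(img.mean()), 3))
```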
  • 218. Make-A-Video From https://makeavideo.studio/ Make-A-Video research builds on the recent progress made in text-to-image generation technology built to enable text-to-video generation. The system uses images with descriptions to learn what the world looks like and how it is often described. It also uses unlabeled videos to learn how the world moves. With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-kind videos with just a few words or lines of text. From the Make-A-Video paper: We propose Make-A-Video – an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today’s image generation models. We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules. First, we decompose the full temporal U-Net and attention tensors and approximate them in space and time. Second, we design a spatial temporal pipeline to generate high resolution and frame rate videos with a video decoder, interpolation model and two super resolution models that can enable various applications besides T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures.
  • 219. Concerns for Deep Learning by Gary Marcus From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf Deep Learning thus far: • Is data hungry • Is shallow and has limited capacity for transfer • Has no natural way to deal with hierarchical structure • Has struggled with open-ended inference • Is not sufficiently transparent • Has not been well integrated with prior knowledge • Cannot inherently distinguish causation from correlation • Presumes a largely stable world, in ways that may be problematic • Works well as an approximation, but answers often can’t be fully trusted • Is difficult to engineer with
  • 220. Causal Reasoning and Deep Learning (Advanced)
  • 221. Causal Reasoning and Transfer Learning From 'A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms': We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities. Causal Deep Learning from Bengio From https://www.wired.com/story/ai-pioneer-algorithms-understand-why/ From https://arxiv.org/abs/1901.10912
  • 222. Causal Reasoning and Transfer Learning A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms From https://arxiv.org/abs/1901.10912 Proposition 1. The expected gradient over the transfer distribution of the regret (accumulated negative log-likelihood during the adaptation episode) with respect to the module parameters is zero for the parameters of the modules that (a) were correctly learned in the training phase, and (b) have the correct set of causal parents, corresponding to the ground truth causal graph, if (c) the corresponding ground truth conditional distributions did not change from the training distribution to the transfer distribution. Figure: Adaptation to the transfer distribution, as more transfer distribution examples are seen by the learner (horizontal axis), in terms of the log-likelihood on the transfer distribution (on a large test set from the transfer distribution, tested after each update of the parameters). Here the model is discrete, with N = 10. Curves are the median over 10 000 runs, with 25-75% quantile intervals, for both the correct causal model (blue, top) and the incorrect one (red, bottom). We see that the correct causal model adapts faster (smaller regret), and that the most informative part of the trajectory (where the two models generalize the most differently) is in the first 10-20 examples.
  • 223. Causal Reasoning and Transfer Learning A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms From https://arxiv.org/abs/1901.10912 Equation (2): R = − log [ sigmoid(γ) · L_{A→B} + (1 − sigmoid(γ)) · L_{B→A} ]
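Equation (2) mixes the two candidate causal models' online adaptation likelihoods with a structural weight sigmoid(γ); γ is then updated by gradient descent on the regret R, so the hypothesis that adapts faster wins. The sketch below computes R and its gradient and runs the update with made-up likelihoods for the two hypotheses; the specific numbers and learning rate are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regret_and_grad(gamma, logL_AB, logL_BA):
    """R = -log[ sigmoid(gamma) * L_AB + (1 - sigmoid(gamma)) * L_BA ], where L_* are
    the online likelihoods accumulated while adapting each hypothesis to the transfer
    distribution. Returns R and dR/dgamma."""
    p = sigmoid(gamma)
    L_AB, L_BA = np.exp(logL_AB), np.exp(logL_BA)
    mix = p * L_AB + (1 - p) * L_BA
    R = -np.log(mix)
    dR_dgamma = -p * (1 - p) * (L_AB - L_BA) / mix
    return R, dR_dgamma

# Made-up numbers: the A->B model adapts faster, so its adaptation likelihood is higher.
gamma, lr = 0.0, 0.5
for episode in range(100):
    R, g = regret_and_grad(gamma, logL_AB=-2.0, logL_BA=-4.0)
    gamma -= lr * g                      # gradient step pushes gamma toward the faster adapter
print("P(A causes B) =", round(float(sigmoid(gamma)), 3))
```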
  • 224. Causal Reasoning and Transfer Learning A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms From https://arxiv.org/abs/1901.10912
  • 226. Causal Reasoning and Deep Learning References http://causality.cs.ucla.edu/blog/ http://causality.cs.ucla.edu/ https://www.google.com/search?client=firefox-b-1-d&q=deep+learning+causal+analysis https://arxiv.org/search/?query=causal&searchtype=title&source=header https://arxiv.org/abs/1901.10912 https://www.ericsson.com/en/blog/2020/2/causal-inference-machine-learning https://towardsdatascience.com/introduction-to-causality-in-machine-learning-4cee9467f06f
  • 227. References • Neural Networks and Deep Learning: A Textbook • Deep Learning (Adaptive Computation and Machine Learning series) • The Deep Learning Revolution (The MIT Press) • Introduction to Deep Learning (The MIT Press) • Deep Learning with Python • An Introduction to Deep Reinforcement Learning • World Models • Learning and Querying Fast Generative Models for Reinforcement Learning • Imagination-Augmented Agents for Deep Reinforcement Learning • Neural Networks and Deep Learning: A Textbook • Google Brain • Convolutional Neural Nets (Detailed introduction) • Future of Deep Learning
  • 228. References (cont) • Recurrent Neural Networks • Guide to LSTM and Recurrent Neural Networks • Enterprise Deep Learning • 6 AI Trends for 2019 • Designing Neural Nets through Neural Evolution • Compositional Pattern Producing Networks • Deep Generator Networks • Deep Reinforcement Learning Course • N-Grams • A Beginners Guide to Deep Reinforcement Learning with many links • Verifiable AI from Specifications • Amazon Deep Learning Containers • A Deep Dive in to Deep Learning
  • 229. Google AI References • https://ai.google/research/pubs/?area=AlgorithmsandTheory • https://ai.google/research/pubs/?area=DistributedSystemsandParallelComputing • https://ai.google/research/pubs/?area=MachineTranslation • https://ai.google/research/pubs/?area=MachineIntelligence • https://ai.google/research/pubs/?area=MachinePerception • https://ai.google/research/pubs/?area=DataManagement • https://ai.google/research/pubs/?area=InformationRetrievalandtheWeb • https://ai.google/research/pubs/?area=NaturalLanguageProcessing • https://ai.google/research/pubs/?area=SpeechProcessing • Deep Mind Publications
  • 230. Deep Mind References DeepMind Home page https://deepmind.com/ DeepMind Research https://deepmind.com/research/ https://deepmind.com/research/publications/ DeepMind Blog https://deepmind.com/blog DeepMind Applied https://deepmind.com/applied Deep Compressed Sensing https://arxiv.org/pdf/1905.06723.pdf Deep Mind NIPS Papers https://deepmind.com/blog/deepmind-papers-nips-2017/ DeepMind Papers at ICML 2018 https://deepmind.com/blog/deepmind-papers-icml-2018/ DeepMind Papers at ICLR 2018 https://deepmind.com/blog/deepmind-papers-iclr-2018/ Proceedings of ICML Program 2018 http://proceedings.mlr.press/v97/
  • 231. References (cont) • OpenAI • OpenAI Blog • OpenAI Research • Deep Learning Book Lecture Notes • Deep Learning Course Lecture Notes • Bayesian Deep Learning Resources • Gradient Boosting Algorithms • Deep Mind Research • David Inouye Papers • Jeff Klune’s Research • Jeff Hawkins Books • Numenta • Reinforcement Learning Book