Machine Learning for Chemistry: Representing and Intervening

1. Machine Learning for Chemistry: Representing and Intervening Ichigaku Takigawa takigawa@icredd.hokudai.ac.jp Apr 26, 2021 @ Hokkaido University Joint Symposium of Engineering & Information Science & WPI-ICReDD

2. I am a graduate of the School of Engineering and IST! • 1995-2005 (10 years) Hokkaido Univ: School of Engineering, Grad School of Engineering, Grad School of Info Sci & Tech; B.Eng (1999), M.Eng (2001), PhD (2004), Postdoc (2004-2005) • 2012-2019 (7 years) Hokkaido Univ: Grad School of Info Sci & Tech; Tenure Track (2012-2014), Assoc Prof (2014-2019) • KUDO Mineichi, TANAKA Yuzuru, SHIMBO Masaru, MINATO Shinichi, TANAKA Yuzuru, IMAI Hideyuki

3. 2005-2011 (7 years) Kyoto Univ 2019-present (2 years) The “Cross-Appointment System” But when I stepped outside Physically I’m at Kyoto

4. Things go interdisciplinary… • Bioinformatics Center Institute for Chemical Research • Grad School of Pharmaceutical Sci • Medical-risk Avoidance based on iPS Cells Team • Institute for Chemical Reaction Design and Discovery Assist Prof (2005-2011) 2005-2011 (7 years) Kyoto Univ 2019-present (2 years) The “Cross-Appointment System”

5. This talk: Machine Learning (ML) for Chemistry • Why is it needed? • What is exciting for computer scientists?

6. It’s a hot topic in Chemistry

7. But also in Machine Learning! NeurIPS 2020, ICML 2020, ICLR 2020: • Self-Supervised Graph Transformer on Large-Scale Molecular Data • RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist • Reinforced Molecular Optimization with Neighborhood-Controlled Grammars • Autofocused Oracles for Model-based Design • Barking Up the Right Tree: an Approach to Search over Molecule Synthesis DAGs • On the Equivalence of Molecular Graph Convolution and Molecular Wave Function with Poor Basis Set • CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models • A Graph to Graphs Framework for Retrosynthesis Prediction • Hierarchical Generation of Molecular Graphs using Structural Motifs • Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning • Reinforcement Learning for Molecular Design Guided by Quantum Mechanics • Multi-Objective Molecule Generation using Interpretable Substructures • Improving Molecular Design by Stochastic Iterative Target Augmentation • A Generative Model for Molecular Distance Geometry • Directional Message Passing for Molecular Graphs • GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation • Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space • A Fair Comparison of Graph Neural Networks for Graph Classification

8. Mixed feelings of curiosity, optimism, skepticism?

9. Inseparably linked to automation “These illustrate how rapid advancements in hardware automation and machine learning continue to transform the nature of experimentation and modeling.” Automation is the use of technology to perform tasks with reduced human involvement or human labor.

10. Towards machine autonomy in discovery Automation has been profoundly changing our daily lives and society, as well as scientific experiments and computations. • Organic synthesis in a modular robotic system. Science 363 (2019) • A mobile robotic chemist. Nature 583 (2020) • Automating drug discovery. Nature Reviews Drug Discovery 17 (2018)

11. This talk: Machine Learning (ML) for Chemistry • Why is it needed? • What is exciting for computer scientists? I’ll briefly cover these from two aspects: 1. Representation • What are good ML-readable representations for chemistry? • What information should be recorded and given to ML? 2. (Experimental) Intervention • What is essential to make real chemical discoveries? • Are there any principled ways for data acquisition and experimental design?

12. Two pillars for scientific discovery? In essence, ML for chemistry is metascience (the science of how to do science), unexpectedly hitting age-old unsolved questions in the philosophy of natural science.

13. Machine Learning (ML) The AI frenzy: hope & hype … “Let’s face it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.” https://www.forbes.com/sites/forbestechcouncil/2020/02/19/in-praise-of-boring-ai-a-k-a-machine-learning/

14. Machine Learning (ML) The AI frenzy: hope & hype From AAAI-20 Oxford-Style Debate … “Let’s face it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.” https://www.forbes.com/sites/forbestechcouncil/2020/02/19/in-praise-of-boring-ai-a-k-a-machine-learning/

15. Machine Learning (ML) All about statistical and algorithmic techniques for surface-model fitting to data points by adjusting model parameters. Random Forest, Neural Networks, SVR, Kernel Ridge. “Predictive Modeling”: the fitted surface is used for making predictions on unseen data points. [Figure: fitted surface over Variable 1 ($x_1$) and Variable 2 ($x_2$)]
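
To make "surface-model fitting by adjusting model parameters" concrete, here is a minimal sketch that is not from the talk. It assumes the simplest possible surface (a plane over two variables), a made-up data set, and plain gradient descent on the squared error; the models named on the slide fit far more flexible surfaces, but the idea is similar.

```python
# Minimal sketch: fit a surface y = w0 + w1*x1 + w2*x2 to data points
# by adjusting the parameters (w0, w1, w2) with gradient descent.
# The data and hyperparameters below are made up for illustration.

def predict(w, x1, x2):
    """Evaluate the fitted surface at a point (x1, x2)."""
    return w[0] + w[1] * x1 + w[2] * x2

def fit_plane(points, targets, lr=0.05, steps=5000):
    """Minimize the mean squared error between surface and targets."""
    w = [0.0, 0.0, 0.0]
    n = len(points)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(points, targets):
            err = predict(w, x1, x2) - y
            grad[0] += 2 * err / n
            grad[1] += 2 * err * x1 / n
            grad[2] += 2 * err * x2 / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical training data generated from y = 1 + 2*x1 - 3*x2.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
targets = [1 + 2 * x1 - 3 * x2 for x1, x2 in points]

w = fit_plane(points, targets)
unseen = predict(w, 3, 3)  # prediction on an unseen data point
```

After fitting, `w` should be close to the generating coefficients, and `predict` can be called on points that were not in the training set, which is the "predictive modeling" use of the fitted surface.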

16. Modern aspects of ML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)

17. Modern aspects of ML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array) 2. Multiformity and multimodality: Data take many forms + modes Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.

18. Modern aspects of ML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array) 2. Multiformity and multimodality: Data take many forms + modes Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc. 3. Overrepresentation: ML models can have many parameters. ResNet50: 26 million params ResNet101: 45 million params EfficientNet-B7: 66 million params VGG19: 144 million params 12-layer, 12-heads BERT: 110 million params 24-layer, 16-heads BERT: 336 million params GPT-2 XL: 1558 million params GPT-3: 175 billion params

19. Modern aspects of ML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array) 2. Multiformity and multimodality: Data take many forms + modes Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc. 3. Overrepresentation: ML models can have many parameters. ResNet50: 26 million params ResNet101: 45 million params EfficientNet-B7: 66 million params VGG19: 144 million params 12-layer, 12-heads BERT: 110 million params 24-layer, 16-heads BERT: 336 million params GPT-2 XL: 1558 million params GPT-3: 175 billion params Can you imagine what would happen if we tried to fit a surface model with 175 billion parameters to 100 million data points in 10 thousand dimensions?
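
The arithmetic behind the numbers above can be checked in a few lines. This is only an illustrative sketch: the image is a placeholder filled with zeros, and the parameter and data-point counts are the ones quoted on the slide.

```python
# Worked arithmetic for the numbers quoted on the slide.

height, width = 100, 100
image = [[0.0] * width for _ in range(height)]    # 100x100 grayscale image
flat = [pixel for row in image for pixel in row]  # flatten to one vector
n_inputs = len(flat)

n_params = 175_000_000_000   # GPT-3 parameter count quoted above
n_points = 100_000_000       # 100 million data points
params_per_point = n_params / n_points

print(n_inputs)          # 10000
print(params_per_point)  # 1750.0
```

In that hypothetical setting the model would have 1750 adjustable parameters for every data point, which is why the question on the slide is a pointed one.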

20. Modern aspects of ML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets. [Figure: input variables ($x_1, x_2, x_3, \dots$), surface model (classifier or regressor), prediction]

21. Modern aspects of ML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets. [Figure: surface model split into two blocks. Input variables ($x_1, x_2, x_3, \dots$), feature learning (variable transformation), latent variables, classifier or regressor, prediction]

27. Modern aspects of ML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets. [Figure: surface model split into two blocks. Input variables ($x_1, x_2, x_3, \dots$), feature learning (variable transformation), latent variables, linear classifier or regressor, prediction]
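
The split into a "feature learning" block followed by a linear classifier or regressor can be sketched on a toy problem. The example below is an assumption made for illustration, not the talk's model: the target is XOR, which no linear model of the raw inputs can represent, and the weights of the feature block are hand-picked rather than learned.

```python
def relu(z):
    return max(0.0, z)

def feature_block(x1, x2):
    """Variable transformation: inputs -> latent variables (h1, h2).
    Weights are hand-picked here; in practice they would be learned."""
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1, h2

def linear_head(h1, h2):
    """Linear regressor acting on the latent variables only."""
    return 1.0 * h1 - 2.0 * h2

def model(x1, x2):
    return linear_head(*feature_block(x1, x2))

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [model(x1, x2) for x1, x2 in inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```

Once the inputs are transformed into suitable latent variables, a purely linear final stage is enough. Pre-training would correspond to obtaining `feature_block` from a different, larger dataset and reusing it.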

28. Needs and excitement around ML for Chemistry Can we somehow externalize the “experience and intuition” of experienced chemists to rationalize and accelerate discoveries? [Diagram: Discovery as a loop of Representation (Prior Info: observational data, reported facts, textbook knowledge; Model (Belief)) and Intervention (Hypothesis; New Info), with the steps • Identify relevant variables • Set design choices • Set experiments • Interpret results]

30. Representation [Diagram: Molecules, Materials, Reactions • Observational data • Reported facts • Textbook knowledge → ? → ML computer programs] Identifying relevant factors and establishing necessary and sufficient computer-readable representations are unavoidable prerequisites, but this is far from trivial and quite paradoxical, since we do not yet understand the target. Any rationalized “real” discovery only comes from understanding and discovery of the causal relations between relevant factors.

31. Representation Levels of Theory/Model Abstraction; First Principle and Simulation (Quantum Chemistry); Spatio-Temporal Flexibility, Variations, Dynamics, and Interactions [Figure: molecular structure drawing labeled with $\theta_i$ and $\theta_j$]

32. Representation Molecules, Materials, Reactions → Graphs (of different size) with node features and edge features, e.g. CC1CCNO1, NCc1ccoc1.S=(Cl)Cl>>[RX_5]S=C=NCc1ccoc1 … → Graph Neural Networks (GNNs) → Latent variables (representation learning) → Classifier or Regressor → Diverse Downstream Tasks. Combinatorial aspects: Modular Hierarchy; Amide, Proline, Oxazoline; Compositionality; Substituents: Phenyl, Carboxyl, Methyl, Ethyl, Tert-butyl, Isopropyl, Trifluoromethyl, Benzyl; Graph Coarsening
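
As a rough illustration of "graphs with node features" feeding a GNN, the sketch below hand-builds the heavy-atom graph for the SMILES string CC1CCNO1 from the slide and runs one round of sum-aggregation message passing. Everything beyond the SMILES string is an assumption for illustration: the one-hot element features, the unweighted update rule, and the sum readout. A real GNN would use learned weights, nonlinearities, and edge features, and a toolkit would parse the SMILES instead of a hand-written bond list.

```python
# Hand-built graph for the SMILES string CC1CCNO1 (hydrogens omitted).
# Atom order follows the string: C0 C1 C2 C3 N4 O5; "1" closes the ring O5-C1.
atoms = ["C", "C", "C", "C", "N", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]  # all single bonds

ELEMENTS = ["C", "N", "O"]

def one_hot(symbol):
    """Node feature: one-hot encoding of the element type."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]

def neighbors(n_atoms, bonds):
    """Adjacency lists of the undirected molecular graph."""
    adj = [[] for _ in range(n_atoms)]
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def message_passing_step(h, adj):
    """Each node adds the sum of its neighbors' features to its own."""
    new_h = []
    for v, h_v in enumerate(h):
        agg = list(h_v)
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, h[u])]
        new_h.append(agg)
    return new_h

def readout(h):
    """Sum over nodes: a fixed-size vector for graphs of any size."""
    return [sum(col) for col in zip(*h)]

adj = neighbors(len(atoms), bonds)
h0 = [one_hot(a) for a in atoms]
h1 = message_passing_step(h0, adj)
graph_vector = readout(h1)
```

The readout turns graphs of different sizes into vectors of the same length, which is what lets a downstream classifier or regressor consume them.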

33. Representation Pretraining with self-supervised pretext tasks has transformed NLP. Transformer Core: (Multihead) Self-attention, Add + LayerNorm, Feed-forward NN, Add + LayerNorm. $o = \sum_i \alpha_i(x)\, f_i(x; \theta)$ NB: Transformers can be considered a special case of GNNs, and many Transformer-type GNNs have also been developed. Effective pretraining is a crucial open problem because, in practice, we can only access limited data for each specific problem.
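
The weighted-sum form of the output, with weights $\alpha_i(x)$ that depend on the input, can be sketched as single-head dot-product attention. This is a simplified reading, not the slide's exact model: the weights are assumed to be a softmax over scaled dot products, and the projections are taken as identity, so the terms $f_i$ are just the value vectors and there are no learned parameters $\theta$.

```python
import math

def softmax(scores):
    """Normalize scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """o = sum_i alpha_i * v_i with alpha = softmax(q . k_i / sqrt(d))."""
    d = len(query)
    alphas = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    out = [sum(a * v[j] for a, v in zip(alphas, values))
           for j in range(len(values[0]))]
    return out, alphas

def self_attention(xs):
    """Single head with identity projections: queries = keys = values = xs."""
    return [attend(x, xs, xs)[0] for x in xs]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, alphas = attend(tokens[0], tokens, tokens)
outputs = self_attention(tokens)
```

Because every token attends to every other token, this can be read as message passing on a fully connected graph, which is one way to see the connection to GNNs mentioned above.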

34. Needs and excitement around ML for Chemistry Can we somehow externalize the “experience and intuition” of experienced chemists to rationalize and accelerate discoveries? [Diagram: Discovery as a loop of Representation (Prior Info: observational data, reported facts, textbook knowledge; Model (Belief)) and Intervention (Hypothesis; New Info), with the steps • Identify relevant variables • Set design choices • Set experiments • Interpret results. New Info is highlighted]

36. (Experimental) Intervention [Diagram: ML computer programs • Observational data • Reported facts • Textbook knowledge → Hypothesis → ? (Automation) → Molecules, Materials, Reactions → New Info] Any rationalized “real” discovery only comes from understanding and discovery of the causal relations between relevant factors. Information about causal relations can be acquired by passive observation and active intervention. Correlation does not imply causation.

37. (Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about causal structure of targets.

38. (Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about causal structure of targets. • Correlation vs Causation ML models trained over passive observational data can be trapped by spurious correlations between variables, being totally ignorant of the underlying causality.

39. (Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about the causal structure of targets. • Correlation vs Causation: ML models trained on passive observational data can be trapped by spurious correlations between variables, being totally ignorant of the underlying causality. • Garbage In, Garbage Out (GIGO): ML models merely reflect the data they are given. If the data have any bias, ML predictions can be badly misleading.

40. (Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about the causal structure of targets. • Correlation vs Causation: ML models trained on passive observational data can be trapped by spurious correlations between variables, being totally ignorant of the underlying causality. • Garbage In, Garbage Out (GIGO): ML models merely reflect the data they are given. If the data have any bias, ML predictions can be badly misleading. • Unavoidable Human-Caused Biases: Always remember that “most chemical experiments are planned by human scientists and therefore are subject to a variety of human cognitive biases, heuristics and social influences.” (Jia, X., Lynch, A., Huang, Y. et al. Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature 573, 251–255 (2019).)
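
The difference between passive observation and active intervention can be illustrated with a small simulation. The setup is entirely hypothetical: a hidden factor `z` drives the outcome `y`, the variable `x` has no causal effect on `y`, and the noise levels are arbitrary. In the observational regime `x` follows `z`; in the interventional regime the experimenter sets `x` independently of `z`.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n, intervene, rng):
    """Hidden factor z drives y. x has NO causal effect on y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        if intervene:
            x = rng.gauss(0.0, 1.0)        # experimenter sets x, ignoring z
        else:
            x = z + rng.gauss(0.0, 0.5)    # passively observed x follows z
        y = z + rng.gauss(0.0, 0.5)
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)
r_observed = pearson(*simulate(5000, intervene=False, rng=rng))
r_intervened = pearson(*simulate(5000, intervene=True, rng=rng))
```

Under these assumptions `r_observed` should come out strongly positive even though `x` does nothing to `y`, while `r_intervened` should be close to zero. A model trained only on the observational data would happily use `x` to predict `y`.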

41. (Experimental) Intervention https://www.chemistryworld.com/news/dispute-over-reaction-prediction-puts-machine-learnings-pitfalls-in-spotlight/3009912.article • Main paper https://doi.org/10.1126/science.aar5169 • Erratum https://doi.org/10.1126/science.aat7648 • Negative comment paper https://doi.org/10.1126/science.aat8603 • Author's response https://doi.org/10.1126/science.aat8763

42. (Experimental) Intervention Current ML is too data-hungry and vulnerable to any data bias, but acquisition of clean representative data is often quite impractical. Keys: fusing modern ML with first principles, simulations, and domain knowledge, and working collaboratively with experimental experts. • Deep learning techniques thus far have proven to be data hungry, shallow, brittle, and limited in their ability to generalize (Marcus, 2018) • Current machine learning techniques are data-hungry and brittle—they can only make sense of patterns they've seen before. (Chollet, 2020) • A growing body of evidence shows that state-of-the-art models learn to exploit spurious statistical patterns in datasets... instead of learning meaning in the flexible and generalizable way that humans do. (Nie et al., 2019) • Current machine learning methods seem weak when they are required to generalize beyond the training distribution, which is what is often needed in practice. (Bengio et al., 2019)

43. (Experimental) Intervention AlphaGo (Nature, 2016), AlphaGo Zero (Nature, 2017), AlphaZero (Science, 2018), MuZero (Nature, 2020). This has reignited the old war between induction and deduction, and we’re re-encountering the long-standing problems in AI. • Knowledge acquisition / Principled data acquisition: experimental design, model-based optimization, evolutionary computation • Reconciliation between inductive and deductive ML: hybrid models of causal/logical/algorithmic ML and deep learning • Balancing exploitation and exploration: model-based reinforcement learning or search in a combinatorial space
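
The exploitation/exploration trade-off can be sketched with one of the simplest strategies, epsilon-greedy selection on a multi-armed bandit. This is a swapped-in textbook technique chosen for brevity, not the model-based reinforcement learning or combinatorial search named above. The three "experimental conditions" and their success probabilities are hypothetical.

```python
import random

def epsilon_greedy(true_yields, steps=5000, epsilon=0.1, seed=0):
    """Choose among conditions with unknown success probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_yields)
    counts = [0] * n_arms     # how often each condition was tried
    means = [0.0] * n_arms    # running estimate of each success rate
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < true_yields[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
    return counts, means

# Hypothetical success probabilities of three experimental conditions.
counts, means = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(means)), key=lambda a: means[a])
```

With `epsilon=0` the loop can lock onto whichever condition first produced a success; the occasional random trial is what lets it discover a better one. Choosing experiments in chemistry faces the same tension, but over a vastly larger combinatorial space.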

44. ML for Chemistry to me (an ML researcher) An exciting “real” test bench for the long-standing, unsolved but attractive fundamental problems in “AI for automating discovery”, involving many fascinating technical topics of modern ML. [Diagram: Discovery as a loop of Representation (Prior Info: observational data, reported facts, textbook knowledge; Model (Belief)) and Intervention (Hypothesis; New Info), with the steps • Identify relevant variables • Set design choices • Set experiments • Interpret results]

45. Summary: Machine Learning (ML) for Chemistry • Why is it needed? • What is exciting for computer scientists? Two aspects: 1. Representation • What are good ML-readable representations for chemistry? • What information should be recorded and given to ML? 2. (Experimental) Intervention • What is essential to make real chemical discoveries? • Are there any principled ways for data acquisition and experimental design?
