Alexander Borzunov
How to do research at a large IT company
2
Who am I?
Alexander Borzunov
• Researcher at Yandex
• NEERC ICPC 2017 prize winner
• Bachelor’s at Ural FU
• Master’s at HSE University + Yandex School of Data Analysis
3
Plan
• Why do companies need research?
• What do researchers do?
• How to get there?
4
Why do companies need research?
Product development:
• Developers address user feedback/business needs
• No time to dive deeply into a problem (e.g., to invent a new algorithm)
Research:
• Experts work full-time on problems from a particular area
• Necessary for innovation in the long term
5
How is it different from universities?
Research in companies:
• More funding
• Access to more compute
• Interaction with product teams
6
Many breakthroughs in modern computer science are made by companies
7
What do researchers do?
• Follow the latest findings and results (e.g., on Twitter)
• Choose promising research directions
• Collaborate with each other
• Conduct experiments (you need to write code quickly to evaluate many ideas)
• Design rigorous proofs
13
What do researchers do?
If the method works:
• Write a paper for an (international) conference
• Defend it in a discussion with reviewers
• If accepted:
• Travel to a conference ✈️
• Tell the world about it on Twitter, Reddit, Habr, etc. 🌎
• Your results may be adopted by product teams
14
Yandex Research
• Focus: machine learning and related algorithms
• Computer vision, image generation
• Language processing
• Program synthesis with neural nets (e.g., trained on Codeforces solutions)
• Systems for distributed training
• Theory, e.g., continuous optimization
• Publications in top venues such as NeurIPS, ICML, CVPR, ACL
15
Collaboration with product teams
• Self-driving and robotics
• Voice assistants
16
Yandex Research
• Joint labs with paid programs for Master’s/PhD students:
• Collaborations:
17
How did I get there?
2014 – 2018 Bachelor’s at Ural FU, participated in ICPC
▎ “What’s next?”
▎ “Machine learning – a growing field”
18
Machine learning on “Cats vs. Dogs”
2007: No methods known to get 60% accuracy (random gives 50%)
2014: Solved with 98% accuracy
2019: Neural nets can draw cats and dogs themselves (this cat does not exist)
21
Machine learning in 2021
2019: Neural nets can draw cats and dogs themselves
2021: Neural nets draw pictures matching any text description
22
How did I get there?
2018 – 2020 Master’s at HSE University + Yandex School of Data Analysis
▎ “Self-driving – a product that may change everyday life”
2019 – 2021 Research Engineer at Yandex Self-Driving
▎ “Research – a place where people invent new things”
2021 – Now Yandex Research
23
What do I do?
• The compute needed to train the latest neural nets grows quickly
• Popular training methods are designed for high-performance clusters
• A cluster that can train GPT-3 costs over $250 million
• Hard to get if you are at a university or a startup
• Solution: distributed training over the Internet (like BitTorrent)
24
First use case: Language models
• Training one large neural net makes it possible to solve many tasks:
• Understanding intents, tone, logical relations from a sentence
• Answering questions
• Extracting entities (locations, persons, etc.)
• Once trained, it is easy to use for your business/research
First use case: Language model for Bengali
• Top-6 language by number of native speakers
• No good model yet
• We invited people to train one together!
Together with:
• Got a competitive model, state-of-the-art on some tasks
Roadblock to scaling: Security
• To train a neural net, you need to average computations performed by peers on different data samples
• A troll or a competitor can destroy the model by sending wrong values just once
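A minimal sketch of the averaging step and of why one bad peer is enough to break it. The function name average_gradients and all numbers are illustrative assumptions, not taken from the talk.

```python
from statistics import fmean


def average_gradients(peer_grads):
    """Element-wise mean of equally sized gradient vectors, one per peer."""
    return [fmean(column) for column in zip(*peer_grads)]


# Three honest peers compute similar gradients on different data samples.
honest = [[0.10, -0.20], [0.12, -0.18], [0.08, -0.22]]
# Hypothetical malicious peer sending huge values once.
attacker = [1e9, -1e9]

clean_update = average_gradients(honest)
poisoned_update = average_gradients(honest + [attacker])

print("honest peers only:", clean_update)
print("with one attacker:", poisoned_update)
```

Because the plain mean gives every peer the same weight, a single extreme vector dominates the result no matter how many honest peers take part.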
28
Secure distributed training
Idea #1: Clip outliers among computations
(it does not hurt training if done right)
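The slide does not say which clipping rule is used. The sketch below assumes one simple variant: start from the coordinate-wise median, clip each peer's deviation from the current estimate to a norm of at most tau, and average the clipped deviations. The name clipped_average and the values of tau and iterations are assumptions chosen for illustration.

```python
import math
from statistics import median


def clipped_average(peer_grads, tau=1.0, iterations=5):
    """Average peer vectors while bounding each peer's pull on the result."""
    # Coordinate-wise median as a robust starting point.
    center = [median(column) for column in zip(*peer_grads)]
    for _ in range(iterations):
        total = [0.0] * len(center)
        for grad in peer_grads:
            diff = [g - c for g, c in zip(grad, center)]
            norm = math.sqrt(sum(d * d for d in diff))
            # Deviations longer than tau are scaled down to length tau.
            scale = min(1.0, tau / norm) if norm > 0 else 1.0
            for i, d in enumerate(diff):
                total[i] += scale * d
        center = [c + t / len(peer_grads) for c, t in zip(center, total)]
    return center


honest = [[0.10, -0.20], [0.12, -0.18], [0.08, -0.22]]
attacker = [1e9, -1e9]  # hypothetical malicious update

robust_update = clipped_average(honest + [attacker], tau=0.1)
print("clipped average:", robust_update)
```

With these numbers the honest deviations are shorter than tau and pass through unchanged, while the attacker can shift the result by at most tau divided by the number of peers per iteration.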
29
Secure distributed training
Idea #2:
• Peers broadcast hashes of their calculations.
• Then, the system selects “policemen” to validate results of some peers.
• If a policeman accuses someone, we can learn who is right from the hashes.
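A rough sketch of the hash-and-validate idea, assuming each peer's computation can be reproduced exactly from public inputs (here, the weights and a batch seed). The functions toy_gradient, commitment, and is_honest are hypothetical names, and the real protocol is likely more involved than this.

```python
import hashlib
import json


def toy_gradient(weights, batch_seed):
    """Hypothetical stand-in for a real, reproducible gradient computation."""
    return [round(w * 0.5 + batch_seed * 0.01, 6) for w in weights]


def commitment(values):
    """SHA-256 hash that a peer broadcasts for its result."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_honest(weights, batch_seed, broadcast_hash):
    """A validator recomputes the peer's work and compares the hashes."""
    return commitment(toy_gradient(weights, batch_seed)) == broadcast_hash


weights = [0.2, -0.4]
honest_hash = commitment(toy_gradient(weights, batch_seed=7))
forged_hash = commitment([1e9, -1e9])  # peer that sent wrong values

honest_ok = is_honest(weights, 7, honest_hash)
forged_ok = is_honest(weights, 7, forged_hash)
print("honest peer passes:", honest_ok)
print("forging peer passes:", forged_ok)
```

Since the hashes were broadcast before validation, anyone can repeat the same recomputation to decide whether the validator or the accused peer is right.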
Secure distributed training
Result: We ban offenders and quickly recover training progress
31
Thank you!
Check out our publications and available positions on research.yandex.com
I am available for a chat or questions at the Yandex area on the 3rd floor terrace until 7 pm 🙂

How to do science in a large IT company (ICPC World Finals 2021, Moscow)
