Fixing Data Science
Challenges, Problems, Issues, Measures, Mistakes, Opportunities, Ideas, Technologies, Research and Visions
Manoj Kumar Ragupathi
Challenges
Source: https://www.kaggle.com/surveys/2017
Current Relevance Discussion: https://www.reddit.com/r/datascience/comments/eeok6g/how_relevant_are_these_challenges_in_data_science/
Issues
• Wrong Focus
• Wrong Commitments and Promises
• Misunderstanding-led Wrong Expectations
• Unexplainable AI
• Narrow Scope and Inability to Transfer Knowledge
Problems
• The Over Hype – Failed Promises
  • https://www.reddit.com/r/datascience/comments/egqsmy/how_many_successful_aiml_models_implementations/
  • According to https://analyticsindiamag.com/the-role-of-big-data-analytics-in-the-future-of-managers/:
    • Gartner reported in November 2017 that 60% of big data projects failed. A year later, Gartner analyst Nick Heudecker said his company had been “too conservative” with its 60% estimate and put the failure rate closer to 85%. Today, he says nothing has changed.
    • In July 2019, VentureBeat AI reported that 87% of data science projects never make it into production.
    • In January 2019, a NewVantage survey reported that 77% of respondents said “business adoption” of big data and AI initiatives continued to be a big challenge for business (which suggests that about three-fourths of the software being built is collecting dust).
• Another AI Winter
  • https://mindmatters.ai/2019/12/just-a-light-frost-or-ai-winter/
Technical Efforts Segmentation in Data Science
• Data Engineering: Data Architecting, Data Acquisition, Building Data Pipeline, Ensuring Reliability, Performance Tuning, Providing DS Infrastructure, Data Discovery Enablement
• Data Transformation
• Data Preparation and Analysis: Data Exploration, Domain Understanding, Insights Gathering, Hypothesis Validation, Feature Engineering, Data Visualization
• Modelling and Validation
• Productionization: APIfication, Containerization, Continuous Train & Test, DevOps CI/CD, Monitoring
Mistakes
• Professionals and students mostly focus on learning ML, DL and NLP, even though modelling needs the least effort in the entire Data Science cycle
• The technical ecosystem (software, tools, techniques and practices) is the fastest growing, yet it has no standardization
• Effort already spent is rarely reused
Mistakes: Data Infrastructure Sharing
• Businesses build Data Science infrastructure that serves only their internal DS team
• Rarely, it is opened to a single IT vendor
• The resulting redundancy in data infrastructure raises the profitability of cloud Data Science infrastructure providers and often leads to a huge waste of resources
• Need for Data Mesh
Mistakes: “My Precious” Data
• Businesses won’t share data easily, so there is no path to “Open Data” unless governments mandate it
• Data Science projects won’t succeed without using external data
• Data vendors’ profitability is higher
• Data monetization is not done, due to lack of trust and visibility
Mistakes: If Data = Oil, then, from Power Perspective
Mistakes: The Silent “Linked Data”
• Social Media and Tech Giants
• Cloud Providers with Admin Access
• Blockchain systems connect global business data together
“Artificial General Super Intelligence Powered By Tech Giants” - Safe AI or Dystopian Future?
The Vision: A Platform
• Serves as Global Data Hub for Global Linked Data
• Anybody with Access Can Peek & Work, but Cannot Sneak and Steal
• Data Science for Digital Nomads and Telecommuters
• Hyper Data Monetization by Businesses
• Data Control and Tracking
• Nano-Payments for Outcomes
• Data Science Effort Reutilization and Transfer Learning
• A Safe Artificial Super Intelligence (ASI) Powered Global Auto Governance
The Virtual Glove Box Platform
For Global Data Science Efforts, Tracking, Monetization and Safe AI Governance
Safe ASI
• According to Wikipedia, a glovebox (or glove box) is a sealed container that is designed to allow one to manipulate objects where a separate atmosphere is desired.
• We need a virtual glove box for ASI initiatives
• We can accelerate ASI development through this Platform
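The "peek and work, but not sneak and steal" idea behind the glove box can be illustrated with a small sketch. Everything here is hypothetical and is not part of any VGB Platform design: the `GloveBox` class, the `ALLOWED_OPS` whitelist and the `min_rows` threshold are assumptions made only to show data that can be computed on but never handed out in raw form.

```python
from statistics import mean

# Hypothetical whitelist: only aggregate operations may be run inside the box.
ALLOWED_OPS = {"count": len, "sum": sum, "mean": mean}


class GloveBox:
    """Illustrative wrapper: callers get aggregates, never the raw rows."""

    def __init__(self, rows, min_rows=5):
        self._rows = list(rows)    # raw data stays inside the box
        self._min_rows = min_rows  # refuse aggregates over very small groups
        self.audit_log = []        # every released result is tracked

    def aggregate(self, user, op, column):
        if op not in ALLOWED_OPS:
            raise PermissionError(f"operation {op!r} is not allowed")
        values = [row[column] for row in self._rows if column in row]
        if len(values) < self._min_rows:
            raise PermissionError("too few rows to release an aggregate")
        self.audit_log.append((user, op, column))
        return float(ALLOWED_OPS[op](values))


box = GloveBox([{"revenue": r} for r in (10, 20, 30, 40, 50)])
print(box.aggregate("analyst-1", "mean", "revenue"))
```

Python attributes are not truly private, so this only shows the access pattern. Real isolation would need the data and the computation to live in a separate, controlled environment.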
Vision Enabler 1: Data Mesh
https://www.slideshare.net/ManojKumarR41/data-mesh-212917511
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
Vision Enabler 2: Data Trajectories
http://www.ijdc.net/article/view/11.1.1/419
If Data = Oil, then, where are the refineries?
Vision Enabler 3: Hashgraph
https://www.swirlds.com/downloads/SWIRLDS-TR-2016-02.pdf
https://www.hedera.com/hh-whitepaper-v2.0-17Sep19.pdf
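As a rough illustration of why a hashgraph suits data tracking, the sketch below links events by hash the way the linked whitepapers describe: each event references its creator's previous event and the latest event of the peer it gossiped with. The `Event` and `gossip` names, the JSON encoding and the use of SHA-256 are assumptions for illustration only. Signatures, timestamps and the virtual-voting consensus are left out.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    creator: str
    payload: tuple               # e.g. data-access records
    self_parent: Optional[str]   # hash of the creator's previous event
    other_parent: Optional[str]  # hash of the gossip partner's latest event

    def digest(self) -> str:
        body = json.dumps(
            [self.creator, list(self.payload), self.self_parent, self.other_parent]
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def gossip(receiver, receiver_last, sender_last, payload=()):
    """Create the receiver's new event after it hears from a sender."""
    return Event(receiver, tuple(payload), receiver_last.digest(), sender_last.digest())


a0 = Event("alice", (), None, None)
b0 = Event("bob", (), None, None)
b1 = gossip("bob", b0, a0, payload=["dataset-42 accessed"])
print(b1.self_parent == b0.digest(), b1.other_parent == a0.digest())
```

Because every event embeds the hashes of its parents, changing an earlier event changes its digest and breaks every later event that refers to it.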
Vision Enabler 4: BigPrivacy from Anonos
https://www.anonos.com/ : Anonos technology is “cool” because it enables the creation of re-linkable non-identifying privacy-enhanced data called Variant Twins that enable lawful analytics, AI, ML, data sharing and combining.
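The general idea of re-linkable, non-identifying data can be sketched with keyed pseudonymization. This is not Anonos's actual method or API. It is a generic HMAC-based illustration, and `pseudonymize`, `make_variant`, the keys and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed token (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def make_variant(records, key, id_field="customer_id"):
    """Return a tokenized copy of the records plus a re-linking table.

    The re-linking table must stay with the data controller.
    """
    variant, relink = [], {}
    for rec in records:
        token = pseudonymize(rec[id_field], key)
        relink[token] = rec[id_field]
        variant.append({**rec, id_field: token})
    return variant, relink


records = [
    {"customer_id": "C-1001", "spend": 120},
    {"customer_id": "C-1002", "spend": 80},
]
marketing, marketing_map = make_variant(records, b"marketing-key")
research, _ = make_variant(records, b"research-key")
print(marketing[0]["customer_id"] != research[0]["customer_id"])
```

Using a different key per use case gives each recipient tokens that cannot be joined with another recipient's copy, while the controller can still map tokens back. The remaining fields can still identify people in combination, so real deployments need more than token replacement.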
Vision Enabler 5: Citrix HDX
https://www.citrix.com/en-in/digital-workspace/hdx/
A combination of these five enablers and a few other ideas will ultimately lead us to the VGB Platform. I will soon publish another document explaining the vision and exactly how to work on it to gradually develop this Platform, which fixes Data Science efforts globally and also accelerates ASI development.
To Be Continued…
I thank all the ideators, inventors and companies who came up with these awesome enablers.
About me: https://www.linkedin.com/in/manoj-kumar-r-427b0b195/
