The State of Data Science and Machine Learning 1
© 2023 TechTarget, Inc. All Rights Reserved.
DECODING THE DATA UNIVERSE:
The State of Data Science
and Machine Learning
September 2023
Mike Leone, Principal Analyst
ENTERPRISE STRATEGY GROUP
Research Objectives
Several challenges are preventing organizations from successfully integrating machine learning (ML) models into their
software development lifecycle. Bridging the gap between different skill sets, handling complex and large data sets, managing
specialized hardware, and ensuring availability, scalability, and security in production collectively delay time to value and cause
organizational bottlenecks.
Due to the increasing interest in and complexity of machine learning projects, organizations need improved agility, efficiency,
and performance, with risk reduction through right-sized governance. Organizations recognize that they need clear data science
and machine learning strategies. As part of these strategies, MLOps can provide a structured and standardized approach to
developing, deploying, and maintaining ML models in production to see greater value. To gain further insight into these trends,
TechTarget’s Enterprise Strategy Group (ESG) surveyed 366 professionals at organizations in North America (US and Canada)
involved with data science and machine learning technologies and processes, including potential responsibility for strategizing,
evaluating, purchasing, building, and managing these technologies.
This study sought to:
Identify investment plans, objectives, and challenges of data science and machine learning initiatives and projects.
Establish the current state of operationalizing AI through MLOps.
Determine how organizations are prioritizing solutions to best help them succeed.
Understand the evolving stakeholder landscape, including team makeup, involvement, and growth opportunities.
Key Findings
Investments Point to Staggering Growth, But Challenges Loom Large
Focus Sharpens on Improving Early and Late Stages of Data Science Lifecycle
Organizations Improve Their Ability to Shift Models to Production But Need Further Efficiencies
Data Science and Machine Learning Become a Team Sport, With Vendors Focused on Enabling All Stakeholders
Investments Point to Staggering Growth, But Challenges Loom Large
Primary Business Objectives Point Inward
Improving operational efficiency continues to be the linchpin of most business objectives driving data science and machine learning initiatives. It not only empowers organizations to improve agility, cost-effectiveness, and customer centricity, but also lays the groundwork for sustainable growth and scale in an increasingly data-driven world. Once operations are performing at optimal levels, organizations can focus more on other business imperatives. However, data science and machine learning initiatives are also expected to improve product development, customer experience, risk management, and other areas.
Primary business objectives of data science and machine learning initiatives:
Improving operational efficiency, 66%
Improving product development and innovation, 60%
Enhancing customer experience/improving customer satisfaction, 52%
Improving risk management, 49%
Enhancing decision making, 47%
Identifying new business opportunities and/or increasing revenue, 43%
Budgets Are on the Rise
Nearly all (92%) organizations saw a year-to-year increase in budget allocation for data science and machine learning projects/initiatives. These budgets are significant, with nearly one in four organizations (24%) planning to invest at least $1 million in people, process, or technology in association with data science and machine learning over the next several years. This heightened investment reflects an understanding that data science not only enhances operational efficiency but also enables informed decision making, predictive analytics, and innovative product development. This financial support emphasizes the pivotal role that data science and machine learning play in enabling the business to extract valuable knowledge from vast and complex data sets, propelling organizations toward success in the digital age.
Change in budget for data science and machine learning projects/initiatives compared with previous year:
Increased significantly, 43%
Increased somewhat, 49%
Stayed the same, 7%
Strategies Are Diverse When Prioritizing Data Science Projects
The willingness to sacrifice time to market and proceed with limited resources highlights the cautiously optimistic approach organizations are taking. They recognize they can’t afford to wait but also that they must ensure robust model development, thorough testing, and accurate insights to avoid potential costly errors. This deliberate and calculated approach can enhance long-term performance, reliability, and stakeholder confidence, which far outweigh the initial time investment.
Prioritized approach to data science-related projects:
Business impact (i.e., projects with highest potential business impact), 23%
Technical complexity (i.e., projects with highest technical complexity), 23%
Executive leadership (i.e., priorities are dictated by the executive leadership team), 19%
Customer feedback (i.e., projects that address customer feedback), 14%
Resource availability (i.e., projects that can be completed with available resources), 13%
Time to market (i.e., projects with shortest time to market), 7%
88% of organizations agree that open source is critical to innovation in data science and machine learning.
The Art of Measuring Data Science Project Impact
Each data science project brings a distinct dimension to measuring impact. The proximity of responses is a testament to the diversity of approaches and use cases that highlight the transformative power of data science across domains. Because operational efficiency is the most common business driver for data science initiatives, it follows that it is also the most common area measured to ensure the performance of those strategies. Customer satisfaction and cost savings are also commonly monitored to determine the impact of these initiatives.
Areas used to measure data science projects/initiatives:
Improved operational efficiency, 53%
Customer satisfaction, 48%
Cost savings or revenue generation, 45%
Time savings, 39%
Competitive advantage, 37%
Predictive accuracy, 37%
Innovation potential, 36%
Employee satisfaction/happiness, 35%
Social impact, 26%
Challenges Loom Large
Nearly all (94%) organizations face challenges in developing and implementing data science projects.
Challenges come in several shapes and sizes:
Organizational: skilled talent, budgets, defining objectives, and measuring outcomes.
Data/environment: integrating with existing systems, data accessibility, limited tools, poor data quality, and siloed data.
Trust: data security/privacy, ethical concerns, and data governance.
Most significant challenges faced in developing and implementing data science projects:
Lack of skilled talent, 27%
Insufficient integration with existing systems, 25%
Limited budget and resources, 23%
Lack of data access, 22%
Difficulty measuring project outcomes, 21%
Insufficient data security and privacy, 21%
Limited availability of the right tools, 20%
Difficulty defining project objectives, 19%
Poor data quality, 16%
Siloed data, 16%
Ethical concerns, 16%
Ineffective data governance, 14%
We don’t have any challenges, 6%
Focus Sharpens on Improving Early and Late Stages of Data Science Lifecycle
Factors Weighed in Consideration of Data Science Purchases Highlight a Desire for Integration and Simplicity
Many organizations have already made massive investments in their data science and machine learning initiatives, so ensuring they still see value from those investments is critical. The emphasis on simplifying implementation and deployment highlights the desire of organizations to ramp up quickly and shorten the time between data generation and data insights. Note also that over a quarter (26%) of organizations consider compatibility with open source technologies, likely foreshadowing a larger open source deployment trend moving forward.
Most important factors when considering purchases to support data science initiatives:
Integration with existing systems, 34%
Ease of implementation and deployment, 33%
Compatibility with open source technologies, 26%
Alignment with the organization's strategic goals and vision, 24%
Customer service and responsiveness, 23%
Availability of training and resources, 22%
Availability of a strong community and ecosystem, 21%
Industry-specific presence, 21%
User adoption and engagement, 20%
Vendor stability and financial viability, 19%
Overall reputation of the vendor, 18%
Customer case studies and proof points, 16%
Partner ecosystem, 12%
Significant Room for Improvement Moving Models to Production
Within the last year, organizations have made great strides in improving the operationalization of machine learning models and transitioning them into production environments. Between robust frameworks and automated pipelines for model training, validation, and deployment, the industry has seen more seamless integration into existing systems, as well as streamlined processes that enable faster iterations. At the root of this improved success is the advent of MLOps practices to promote collaboration between data and IT stakeholders. However, there is still significant room to raise the rate at which organizations deploy machine learning models into production environments. For example, 45% of organizations see less than 25% of their models make it into production. Challenges persist that require ongoing attention in managing the entire lifecycle of models, from initial development through continuous monitoring and maintenance to deal with model drift, performance degradation, interpretability issues, and more.
Percentage of machine learning models deployed into production environments:
Less than 5%, 2%
5% to 10%, 17%
11% to 25%, 26%
26% to 50%, 33%
51% to 75%, 15%
More than 75%, 3%
We have not yet deployed an ML model into production, 1%
Don’t know, 4%
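The 45% figure cited in the text can be reproduced from the chart by summing the three lowest deployment buckets. The following is a minimal Python sketch of that arithmetic. The dictionary simply transcribes the chart values, and the names `deployment_share` and `share_at_or_below` are illustrative, not taken from the report.

```python
# Share of organizations (in %) by percentage of ML models deployed to
# production, transcribed from the chart. Values are rounded in the source,
# so they sum to 101 rather than 100.
deployment_share = {
    "Less than 5%": 2,
    "5% to 10%": 17,
    "11% to 25%": 26,
    "26% to 50%": 33,
    "51% to 75%": 15,
    "More than 75%": 3,
    "Not yet deployed": 1,
    "Don't know": 4,
}


def share_at_or_below(buckets, upper_bucket):
    """Sum the shares of all buckets up to and including upper_bucket."""
    total = 0
    for label, pct in buckets.items():
        total += pct
        if label == upper_bucket:
            return total
    raise KeyError(upper_bucket)


low_deployers = share_at_or_below(deployment_share, "11% to 25%")
print(low_deployers)  # 45
```

The helper relies on the dictionary preserving insertion order, which Python guarantees from version 3.7 onward.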
The Importance of Data Cannot Be Overstated
Data accessibility and data preparation go hand in hand. Data accessibility forms the foundation for the entire data science lifecycle, which explains not only why data access is the step most commonly performed on a regular basis but also why it poses the largest challenge for organizations today. Data preparation, including cleansing, structuring, and transforming data, is a necessary step to ensure that subsequent analytical experiments are founded on a reliable and accurate basis.
Data science lifecycle steps performed on a regular basis:
Data access, 51%
Data preparation, 50%
Model monitoring and maintenance, 40%
Exploratory data analysis, 36%
Model development/feature engineering, 35%
Model validation, 35%
Model retraining, 32%
Model deployment, 32%
Model interpretation and communication, 31%
Problem formulation, 28%
Most challenging data science lifecycle steps:
Data access, 15%
Data preparation, 14%
Exploratory data analysis, 12%
Model monitoring and maintenance, 11%
Model development/feature engineering, 11%
Model interpretation and communication, 9%
Model deployment, 8%
Model validation, 6%
Problem formulation, 6%
Model retraining, 5%
No steps cause challenges, 3%
Organizations Improve Their Ability to Shift Models to Production But Need Further Efficiencies
Unpacking Challenges in ML Deployment and Monitoring
Considering that 58% of organizations have significant room to improve their processes for moving models into production, it makes sense that even the most mature organizations run into challenges. Technical complexities arise when integrating models into existing infrastructure, ensuring compatibility with various systems, and encountering unexpected real-world data variability. Compliance and governance challenges impact reliability and trust as well as introduce risk. Operational complexities also arise, such as maintaining model performance over time and identifying and responding to failures. Continuous monitoring poses challenges of its own, such as addressing data drift and managing model dependencies such as model versioning.
Challenges with deployment and monitoring of machine learning models:
Difficulty managing multiple environments, 35%
Difficulty ensuring compliance with corporate governance policies, 33%
Difficulty detecting and responding to data drift, 33%
Inconsistent model performance in production, 29%
Difficulty detecting and responding to model failures, 29%
Inefficient retraining processes, 26%
Difficulty managing dependencies, 26%
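Detecting data drift, one of the monitoring challenges listed above, generally means comparing the data a model sees in production with the data it was trained on. The report does not prescribe a method. The sketch below uses the population stability index (PSI) as one possible check, written in plain Python. The function name, the bin count, and the 0.25 alert level (a commonly cited rule of thumb, not a figure from the survey) are all assumptions for illustration.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; a larger PSI means more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: a training sample and a production sample shifted upward.
training = [i / 100 for i in range(1000)]
production = [x + 3 for x in training]

DRIFT_ALERT = 0.25  # assumed alert level; tune per feature and use case
drift_score = population_stability_index(training, production)
drift_detected = drift_score > DRIFT_ALERT
```

Binning on the training range and clamping out-of-range production values keeps the check simple. A production system would more likely rely on a monitoring library and track many features at once.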
Striking a Balance Between Retraining and Maintaining
With 47% of organizations retraining models on at least a weekly basis, it is important to understand the impact frequent retraining can have on an organization, from resource strain and inefficiency to amplifying data noise and creating versioning complexities. While making changes via retraining based on data drift is important, doing so excessively can disrupt operations, confuse users, and hinder strategic focus on critical deployment aspects like monitoring and ethics. Organizations must balance retraining frequency against the potential downsides associated with it. A well-defined strategy for model monitoring and maintenance that factors in benefits, costs, and impact is essential to making the right decisions about the optimal retraining schedule.
Frequency of retraining machine learning models in production:
Daily, 11%
Weekly, 36%
Monthly, 23%
Quarterly, 22%
Yearly, 1%
Only when new data is available, 1%
Only when accuracy falls below a certain threshold, 2%
Only when objectives change, 1%
Don’t, 3%
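One way to balance retraining frequency against its costs is to combine a schedule with a performance trigger, so that a model is retrained neither too often nor too late. The sketch below is a hypothetical policy, not a method from the report. The class, the function, and every default value are assumptions chosen only to illustrate the trade-off.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    min_days_between_runs: int = 7   # hypothetical cool-down to limit churn
    accuracy_floor: float = 0.90     # hypothetical threshold; tune per model
    max_days_without_run: int = 90   # hypothetical upper bound on staleness


def should_retrain(policy, days_since_last_run, current_accuracy):
    """Return True when a retraining run is justified under the policy."""
    if days_since_last_run < policy.min_days_between_runs:
        return False  # too soon: avoid excessive retraining
    if current_accuracy < policy.accuracy_floor:
        return True   # accuracy has fallen below the threshold
    # Otherwise retrain only once the model has gone stale.
    return days_since_last_run >= policy.max_days_without_run


policy = RetrainPolicy()
decision = should_retrain(policy, days_since_last_run=10, current_accuracy=0.85)
```

With these defaults, a model whose accuracy drops three days after its last run is left alone until the cool-down ends. A healthy model is retrained only after 90 days.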
Data Science and Machine Learning Become a Team Sport, With Vendors Focused on Enabling All Stakeholders
Building Bridges for Collaborative Data Science Success
Collaboration among stakeholders and team members is vital for successful data science initiatives. Organizations employ tools and methods to integrate expertise, fostering constructive dialogue, strategy refinement, and collective guidance. This open communication empowers diverse roles to shape outcomes, enhancing analysis quality and propelling organizations toward transformative insights and decisions.
Sources used to ensure collaboration between stakeholders and other team members on data science initiatives:
Data visualization tools, 46%
Virtual workspaces, 45%
Data science community forums and marketplaces, 44%
Data science/machine learning platforms, 43%
General purpose help groups/forums, 32%
Code repositories/version control systems, 30%
Open source community forums, 27%
Agile methodologies, 23%
Shared notebooks, 22%
Pair programming or peer code review, 20%
Hackathons, 9%
Mapping Stakeholder Involvement Across the Data Science Lifecycle
Non-data science stakeholders play a significant role across the data science lifecycle, influencing various stages from data collection and preprocessing to model deployment and model management. This is a big reason why 92% of respondents rated the experience of business stakeholders involved in data science initiatives and working with data science teams as positive, if not very positive. Creating data science and machine learning solutions that cater to the non-data science community poses significant opportunities for vendors as organizations move forward in data science regardless of their levels of data science expertise.
Machine learning model building areas that involve non-data science professionals (e.g., business analysts):
Data collection/supply, 44%
Data preprocessing, 40%
Model deployment, 39%
Model monitoring/maintenance, 38%
Model training, 36%
Model evaluation, 36%
Model selection, 30%
Logic building, 28%
Use case/problem definition, 27%
Unlocking Employee Potential
99% of respondents are motivated to improve their data science and machine learning skills.
With 99% of people motivated to improve their data science and machine learning skills, the research highlights that improvements are fueled by a combination of intrinsic and extrinsic motivations. The prospects of career advancement, recognition, and salary increases, along with the promise of contributing meaningfully to cutting-edge projects, act as powerful external motivators. This combination of tangible rewards with intellectual curiosity creates an interesting dynamic within the work environment where employees are inspired to invest time (sometimes outside of work) to continue honing their skills.
Employees’ drivers to improve skills in data science and machine learning:
Career advancement opportunities, 52%
Keeping pace with industry trends, 50%
Job security, 45%
Employer requirements, 44%
General interest in the field, 40%
Salary increase, 40%
Personal fulfillment, 35%
KNIME helps everybody make sense of data.
Its free and open-source KNIME Analytics Platform enables anyone, whether they come from a
business, technical, or data background, to intuitively work with data, every day. KNIME Business Hub
is the commercial complement to KNIME Analytics Platform and enables users to collaborate on data
science and share insights across the organization. Together, the products support the complete data
science lifecycle, allowing teams at all levels of analytics readiness to support the operationalization of
data and to build a scalable data science practice.
Research Methodology and Demographics
To gather data for this report, ESG conducted a comprehensive online survey of data professionals from private- and public-sector organizations in North America (United States
and Canada) between June 5, 2023 and June 27, 2023. To qualify for this survey, respondents were required to be involved with data science and machine learning technologies
and processes, including potential responsibility for strategizing, evaluating, purchasing, building, and managing these technologies. All respondents were provided an incentive
to complete the survey in the form of cash awards and/or cash equivalents.
After filtering out unqualified respondents, removing duplicate responses, and screening the remaining completed responses (on a number of criteria) for data integrity, we were
left with a final total sample of 366 data professionals.
Respondents by Number of Employees:
100 to 499, 20%
500 to 999, 22%
1,000 to 2,499, 17%
2,500 to 4,999, 14%
5,000 to 9,999, 14%
10,000 to 19,999, 7%
20,000 or more, 6%
Respondents by Age of Company:
Fewer than 5 years, 1%
5 to 10 years, 17%
11 to 20 years, 48%
21 to 50 years, 25%
More than 50 years, 10%
Don’t know, 1%
Respondents by Industry:
Manufacturing, 37%
Financial services, 13%
Technology, 10%
Healthcare, 7%
Retail/wholesale, 7%
Communications and media, 6%
Business services, 5%
Government, 1%
Other, 14%
All product names, logos, brands, and trademarks are the property of their respective owners. Information contained in this publication has been obtained by sources TechTarget, Inc. considers to be reliable but is not warranted by TechTarget, Inc.
This publication may contain opinions of TechTarget, Inc., which are subject to change. This publication may include forecasts, projections, and other predictive statements that represent TechTarget, Inc.’s assumptions and expectations in light of
currently available information. These forecasts are based on industry trends and involve variables and uncertainties. Consequently, TechTarget, Inc. makes no warranty as to the accuracy of specific forecasts, projections or predictive statements
contained herein.
This publication is copyrighted by TechTarget, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express
consent of TechTarget, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact Client Relations at cr@esg-global.com.
Enterprise Strategy Group is an integrated technology analysis, research, and strategy firm providing market intelligence, actionable
insight, and go-to-market content services to the global technology community.
© 2023 TechTarget, Inc. All Rights Reserved.

More Related Content

PPTX
Reinventing Enterprise Operations
PDF
Unveiling Strategic Trends in Global Finance, Banking, and Insurance - IT Ex...
PDF
A Data-driven Maturity Model for Modernized, Automated, and Transformed IT
PDF
From Chaos to Clarity: Crafting a Data Strategy Roadmap for Organizational Tr...
PDF
Revolutionizing IT Project Delivery - Embrace the Future with OnePlan’s AI-Po...
PDF
Big data web
PDF
Executive Overview on EDM Strategy
PDF
How to Create a Data Analytics Roadmap
 
Reinventing Enterprise Operations
Unveiling Strategic Trends in Global Finance, Banking, and Insurance - IT Ex...
A Data-driven Maturity Model for Modernized, Automated, and Transformed IT
From Chaos to Clarity: Crafting a Data Strategy Roadmap for Organizational Tr...
Revolutionizing IT Project Delivery - Embrace the Future with OnePlan’s AI-Po...
Big data web
Executive Overview on EDM Strategy
How to Create a Data Analytics Roadmap
 

Similar to state-of-data-science-and-machine-learning.pdf (20)

PDF
Emerging Trends in Data Architecture – What’s the Next Big Thing?
PDF
Data Management
PDF
Vertical Technology Solutions Overview v1.2
PDF
Accenture big-data
PDF
Synergetics-IIF-Tech-Ind
PDF
Build a Winning Data Strategy in 2022.pdf
PDF
Driving a data-centric culture: a bottom-up opportunity
PDF
Driving A Data-Centric Culture: A Bottom Up Opportunity
PPTX
Future-ready Insurance Systems – An Insurer’s Guide to Optimizing Technology ...
PPTX
Blue Modern Data Economy Presentation.pptx
PDF
How ‘Big Data’ Can Create Significant Impact on Enterprises? Part I: Findings...
DOCX
Report on strategic rules of Information System for changing the bases of com...
PDF
Transforming Business with Data Science: Trends, Tools, and Techniques
PDF
Predictive Maintenance Solution -1019
PDF
Dtt en wp_techtrends_10022014
PPTX
The Trusted Path That Driven Big Data to Success
PDF
Big Data is Here for Financial Services White Paper
PDF
Report: CIOs & Big Data
PDF
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
PDF
How to harness big data to drive performance across your project portfolio
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Data Management
Vertical Technology Solutions Overview v1.2
Accenture big-data
Synergetics-IIF-Tech-Ind
Build a Winning Data Strategy in 2022.pdf
Driving a data-centric culture: a bottom-up opportunity
Driving A Data-Centric Culture: A Bottom Up Opportunity
Future-ready Insurance Systems – An Insurer’s Guide to Optimizing Technology ...
Blue Modern Data Economy Presentation.pptx
How ‘Big Data’ Can Create Significant Impact on Enterprises? Part I: Findings...
Report on strategic rules of Information System for changing the bases of com...
Transforming Business with Data Science: Trends, Tools, and Techniques
Predictive Maintenance Solution -1019
Dtt en wp_techtrends_10022014
The Trusted Path That Driven Big Data to Success
Big Data is Here for Financial Services White Paper
Report: CIOs & Big Data
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
How to harness big data to drive performance across your project portfolio
Ad

Recently uploaded (20)

PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPTX
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PPTX
1_Introduction to advance data techniques.pptx
PPTX
Supervised vs unsupervised machine learning algorithms
PDF
Mega Projects Data Mega Projects Data
PPT
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PPTX
Database Infoormation System (DBIS).pptx
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPTX
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
Introduction to Knowledge Engineering Part 1
PDF
.pdf is not working space design for the following data for the following dat...
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
1_Introduction to advance data techniques.pptx
Supervised vs unsupervised machine learning algorithms
Mega Projects Data Mega Projects Data
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
Business Ppt On Nestle.pptx huunnnhhgfvu
Acceptance and paychological effects of mandatory extra coach I classes.pptx
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
Database Infoormation System (DBIS).pptx
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
Galatica Smart Energy Infrastructure Startup Pitch Deck
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
Introduction to Knowledge Engineering Part 1
.pdf is not working space design for the following data for the following dat...
Ad

state-of-data-science-and-machine-learning.pdf

  • 1. The State of Data Science and Machine Learning 1 © 2023 TechTarget, Inc. All Rights Reserved. Back to Contents © 2023 TechTarget, Inc. All Rights Reserved. DECODING THE DATA UNIVERSE: The State of Data Science and Machine Learning September 2023 Mike Leone, Principal Analyst ENTERPRISE STRATEGY GROUP
  • 2. The State of Data Science and Machine Learning 2 © 2023 TechTarget, Inc. All Rights Reserved. Back to Contents Research Objectives Several challenges are preventing organizations from successfully integrating machine learning (ML) models into their software development lifecycle. Bridging the gap between different skill sets, handling complex and large data sets, managing specialized hardware, and ensuring availability, scalability, and security in production collectively delay time to value and cause organizational bottlenecks. Due to the increasing interest in and complexity of machine learning projects, organizations need improved agility, efficiency, and performance, with risk reduction through right-sized governance. Organizations recognize that they need clear data science and machine learning strategies. As part of these strategies, MLOps can provide a structured and standardized approach to developing, deploying, and maintaining ML models in production to see greater value. To gain further insight into these trends, TechTarget’s Enterprise Strategy Group (ESG) surveyed 366 professionals at organizations in North America (US and Canada) involved with data science and machine learning technologies and processes, including potential responsibility for strategizing, evaluating, purchasing, building, and managing these technologies. Identify investment plans, objectives, and challenges of data science and machine learning initiatives and projects. Establish the current state of operationalizing AI through MLOps. Determine how organizations are prioritizing solutions to best help them succeed. Understand the evolving stakeholder landscape, including team makeup, involvement, and growth opportunities. This study sought to:
  • 3. The State of Data Science and Machine Learning 3 © 2023 TechTarget, Inc. All Rights Reserved. The State of Data Science and Machine Learning © 2023 TechTarget, Inc. All Rights Reserved. key findings click to follow Investments Point to Staggering Growth, But Challenges Loom Large PAGE 4 Organizations Improve Their Ability to Shift Models to Production But Need Further Efficiencies PAGE 14 Focus Sharpens on Improving Early and Late Stages of Data Science Lifecycle PAGE 10 Data Science and Machine Learning Become a Team Sport, With Vendors Focused on Enabling All Stakeholders PAGE 17
  • 5. The State of Data Science and Machine Learning 5 © 2023 TechTarget, Inc. All Rights Reserved. Back to Contents Improving operational efficiency continues to be the lynchpin to most business objectives driving data science and machine learning initiatives. It not only empowers organizations to improve agility, cost-effectiveness, and customer centricity, but also lays the groundwork for sustainable growth and scale in an increasingly data-driven world. Once operations are performing at optimal levels, organizations can focus more on other business imperatives. However, data science and machine learning initiatives also are expected to improve product development, customer experience, risk management, and other areas. Primary Business Objectives Point Inward 66% 49% 60% 47% 52% 43% Improving operational efficiency Improving risk management Improving product development and innovation Enhancing decision making Enhancing customer experience/ improving customer satisfaction Identifying new business opportunities and/or increasing revenue | Primary business objectives of data science and machine learning initiatives.
  • 6. Budgets Are on the Rise. Nearly all (92%) organizations saw a year-over-year increase in budget allocation for data science and machine learning projects/initiatives. These budgets are significant, with nearly one in four organizations (24%) planning to invest at least $1 million in people, process, or technology in association with data science and machine learning over the next several years. This heightened investment reflects an understanding that data science not only enhances operational efficiency but also enables informed decision making, predictive analytics, and innovative product development. This financial support emphasizes the pivotal role that data science and machine learning play in enabling the business to extract valuable knowledge from vast and complex data sets, propelling organizations toward success in the digital age. Change in budget for data science and machine learning projects/initiatives compared with previous year: Increased significantly, 43%; Increased somewhat, 49%; Stayed the same, 7%.
  • 7. Strategies Are Diverse When Prioritizing Data Science Projects. The willingness to sacrifice time to market and proceed with limited resources highlights the cautiously optimistic approach organizations are taking. They recognize they can’t afford to wait but also that they must ensure robust model development, thorough testing, and accurate insights to avoid potential costly errors. This deliberate and calculated approach can enhance long-term performance, reliability, and stakeholder confidence, which far outweigh the initial time investment. Prioritized approach to data science-related projects: Business impact (i.e., projects with highest potential business impact), 23%; Technical complexity (i.e., projects with highest technical complexity), 23%; Executive leadership (i.e., priorities are dictated by the executive leadership team), 19%; Customer feedback (i.e., projects that address customer feedback), 14%; Resource availability (i.e., projects that can be completed with available resources), 13%; Time to market (i.e., projects with shortest time to market), 7%. 88% of organizations agree that open source is critical to innovation in data science and machine learning.
  • 8. The Art of Measuring Data Science Project Impact. Each data science project brings a distinct dimension to measuring impact. The proximity of responses is a testament to the diversity of approaches and use cases that highlight the transformative power of data science across domains. Because operational efficiency is the most common business driver for data science initiatives, it follows that it is also the most common area measured to ensure the performance of those strategies. Customer satisfaction and cost savings are also commonly monitored to determine the impact of these initiatives. Areas used to measure data science projects/initiatives: Improved operational efficiency, 53%; Customer satisfaction, 48%; Cost savings or revenue generation, 45%; Time savings, 39%; Competitive advantage, 37%; Predictive accuracy, 37%; Innovation potential, 36%; Employee satisfaction/happiness, 35%; Social impact, 26%.
  • 9. Challenges Loom Large. Nearly all (94%) organizations face challenges in developing and implementing data science projects. Challenges come in several shapes and sizes. Organizational: skilled talent, budgets, defining objectives, and measuring outcomes. Data/environment: integrating with existing systems, data accessibility, limited tools, poor data quality, and siloed data. Trust: data security/privacy, ethical concerns, and data governance. Most significant challenges faced in developing and implementing data science projects: Lack of skilled talent, 27%; Insufficient integration with existing systems, 25%; Limited budget and resources, 23%; Lack of data access, 22%; Difficulty measuring project outcomes, 21%; Insufficient data security and privacy, 21%; Limited availability of the right tools, 20%; Difficulty defining project objectives, 19%; Poor data quality, 16%; Siloed data, 16%; Ethical concerns, 16%; Ineffective data governance, 14%; We don’t have any challenges, 6%.
  • 10. Focus Sharpens on Improving Early and Late Stages of Data Science Lifecycle
  • 11. Factors Weighed in Consideration of Data Science Purchases Highlight a Desire for Integration and Simplicity. Many organizations have already made massive investments in their data science and machine learning initiatives, so ensuring they still see value from those investments is critical. Simplifying implementation and deployment highlights the desire for organizations to ramp up quickly and improve the time between data generation and data insights. Note also that over a quarter (26%) of organizations consider compatibility with open source technologies, likely foreshadowing a larger open source deployment trend moving forward. Most important factors when considering purchases to support data science initiatives: Integration with existing systems, 34%; Ease of implementation and deployment, 33%; Compatibility with open source technologies, 26%; Alignment with the organization's strategic goals and vision, 24%; Customer service and responsiveness, 23%; Availability of training and resources, 22%; Availability of a strong community and ecosystem, 21%; Industry-specific presence, 21%; User adoption and engagement, 20%; Vendor stability and financial viability, 19%; Overall reputation of the vendor, 18%; Customer case studies and proof points, 16%; Partner ecosystem, 12%.
  • 12. Significant Room for Improvement Moving Models to Production. Within the last year, organizations have made great strides in improving the operationalization of machine learning models and transitioning them into production environments. Between robust frameworks and automated pipelines for model training, validation, and deployment, the industry has seen more seamless integration into existing systems, as well as streamlined processes that enable faster iterations. At the root of this improved success is the advent of MLOps practices to promote collaboration between data and IT stakeholders. However, despite these improvements, there is still significant room for improvement in the rate at which organizations deploy machine learning models into production environments. For example, 45% of organizations see 25% or less of their models make it into production. Challenges persist that require ongoing attention in managing the entire lifecycle of models, from initial development through continuous monitoring and maintenance to deal with model drift, performance degradation, interpretability issues, and more. Percentage of machine learning models deployed into production environments: Less than 5%, 2%; 5% to 10%, 17%; 11% to 25%, 26%; 26% to 50%, 33%; 51% to 75%, 15%; More than 75%, 3%; We have not yet deployed an ML model into production, 1%; Don’t know, 4%.
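The 45% figure on this slide can be reproduced from the chart itself. The short sketch below simply sums the three lowest deployment bands; the dictionary and function names are illustrative and not part of the report.

```python
# Share of organizations (%) by proportion of ML models deployed to production,
# transcribed from the chart on this slide.
deployment_share = {
    "Less than 5%": 2,
    "5% to 10%": 17,
    "11% to 25%": 26,
    "26% to 50%": 33,
    "51% to 75%": 15,
    "More than 75%": 3,
    "Not yet deployed": 1,
    "Don't know": 4,
}

# Bands in which no more than a quarter of models reach production.
LOW_DEPLOYMENT_BANDS = ("Less than 5%", "5% to 10%", "11% to 25%")


def low_deployment_total(shares):
    """Sum the share of organizations across the low-deployment bands."""
    return sum(shares[band] for band in LOW_DEPLOYMENT_BANDS)


print(low_deployment_total(deployment_share))  # 45
```

The published shares sum to 101 rather than 100, which is consistent with each value having been rounded independently.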
  • 13. The Importance of Data Cannot Be Overstated. Data accessibility and data preparation go hand in hand. Data accessibility forms the foundation for the entire data science lifecycle, highlighting not only why this is most commonly performed on a regular basis but also why it poses the largest challenge for organizations today. Data preparation, including cleansing, structuring, and transforming data, is a necessary step to ensure that subsequent analytical experiments are founded on a reliable and accurate basis. Data science lifecycle steps performed on a regular basis: Data access, 51%; Data preparation, 50%; Model monitoring and maintenance, 40%; Exploratory data analysis, 36%; Model development/feature engineering, 35%; Model validation, 35%; Model retraining, 32%; Model deployment, 32%; Model interpretation and communication, 31%; Problem formulation, 28%. Most challenging data science lifecycle steps: Data access, 15%; Data preparation, 14%; Exploratory data analysis, 12%; Model monitoring and maintenance, 11%; Model development/feature engineering, 11%; Model interpretation and communication, 9%; Model deployment, 8%; Model validation, 6%; Problem formulation, 6%; Model retraining, 5%; No steps cause challenges, 3%.
  • 14. Organizations Improve Their Ability to Shift Models to Production But Need Further Efficiencies
  • 15. Unpacking Challenges in ML Deployment and Monitoring. Considering 58% of organizations have significant room to improve on their processes for moving models into production, it makes sense that even the most mature organizations run into challenges. Technical complexities arise when integrating models into existing infrastructure, ensuring compatibility with various systems, and encountering unexpected real-world data variability. Compliance and governance challenges impact reliability and trust as well as introduce risk. Operational complexities also arise, such as maintaining model performance over time and identifying and responding to failures. Continuous monitoring poses challenges as well, such as addressing data drift and managing model dependencies such as model versioning. Challenges with deployment and monitoring of machine learning models: Difficulty managing multiple environments, 35%; Difficulty ensuring compliance with corporate governance policies, 33%; Difficulty detecting and responding to data drift, 33%; Inconsistent model performance in production, 29%; Difficulty detecting and responding to model failures, 29%; Inefficient retraining processes, 26%; Difficulty managing dependencies, 26%.
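The report names data drift detection as a challenge but does not prescribe a method. As one illustration only, the sketch below computes a Population Stability Index (PSI) between a training-time sample and a production sample of a single numeric feature. The function name, the bin count, and the 0.25 alert level (a commonly cited rule of thumb) are assumptions, not findings from the survey.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a newer sample of one numeric feature.

    Bin edges are derived from the baseline; values outside that range are
    clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: a uniform baseline and a shifted production sample.
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in baseline]

DRIFT_ALERT_LEVEL = 0.25  # assumed threshold, tune per feature and use case
print(population_stability_index(baseline, baseline) == 0.0)          # True
print(population_stability_index(baseline, shifted) > DRIFT_ALERT_LEVEL)  # True
```

A check like this covers one feature at a time; in practice teams would typically run it per feature on a schedule and route alerts into the same monitoring process that handles model failures.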
  • 16. Striking a Balance Between Retraining and Maintaining. With 47% of organizations retraining models on at least a weekly basis, it is important to understand the impact frequent retraining can have on an organization, from resource strain and inefficiency to amplifying data noise and creating versioning complexities. While making changes via retraining based on data drift is important, doing so excessively can disrupt operations, confuse users, and hinder strategic focus on critical deployment aspects like monitoring and ethics. Organizations must balance retraining frequency and the potential downsides associated with it. A well-defined strategy for model monitoring and maintenance that factors in benefits, costs, and impact is essential to making the right decisions about the optimal retraining schedule. Frequency of retraining machine learning models in production: Daily, 11%; Weekly, 36%; Monthly, 23%; Quarterly, 22%; Yearly, 1%; Only when new data is available, 1%; Only when accuracy falls below a certain threshold, 2%; Only when objectives change, 1%; Don’t, 3.
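One way to balance scheduled and on-demand retraining is to combine a maximum model age with an accuracy floor, two of the triggers that appear in the chart above. The sketch below is a minimal illustration of that idea; the function name and both policy values are hypothetical and would need to be set from an organization's own cost and impact analysis.

```python
from datetime import date, timedelta

# Hypothetical policy values, chosen only for illustration.
MIN_ACCURACY = 0.90
MAX_MODEL_AGE = timedelta(days=30)


def should_retrain(today, last_trained, current_accuracy,
                   min_accuracy=MIN_ACCURACY, max_age=MAX_MODEL_AGE):
    """Return (decision, reason) for a single production model."""
    # Performance trigger: retrain as soon as accuracy drops below the floor.
    if current_accuracy < min_accuracy:
        return True, "accuracy below threshold"
    # Schedule trigger: retrain once the model reaches the maximum age.
    if today - last_trained >= max_age:
        return True, "model older than maximum age"
    return False, "model is recent and accurate enough"


print(should_retrain(date(2023, 6, 27), date(2023, 6, 20), 0.95))
print(should_retrain(date(2023, 6, 27), date(2023, 6, 20), 0.85))
print(should_retrain(date(2023, 6, 27), date(2023, 5, 1), 0.95))
```

Returning the reason alongside the decision makes it easier to audit how often each trigger fires, which is the kind of evidence needed to judge whether a weekly schedule is actually earning its cost.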
  • 17. Data Science and Machine Learning Become a Team Sport, With Vendors Focused on Enabling All Stakeholders
  • 18. Building Bridges for Collaborative Data Science Success. Collaboration among stakeholders and team members is vital for successful data science initiatives. Organizations employ tools and methods to integrate expertise, fostering constructive dialogue, strategy refinement, and collective guidance. This open communication empowers diverse roles to shape outcomes, enhancing analysis quality and propelling organizations toward transformative insights and decisions. Sources used to ensure collaboration between stakeholders and other team members on data science initiatives: Data visualization tools, 46%; Virtual workspaces, 45%; Data science community forums and marketplaces, 44%; Data science/machine learning platforms, 43%; General purpose help groups/forums, 32%; Code repositories/version control systems, 30%; Open source community forums, 27%; Agile methodologies, 23%; Shared notebooks, 22%; Pair programming or peer code review, 20%; Hackathons, 9%.
  • 19. Mapping Stakeholder Involvement Across the Data Science Lifecycle. Non-data science stakeholders play a significant role across the data science lifecycle, influencing various stages from data collection and preprocessing to model deployment and model management. This is a big reason why 92% of respondents rated the experience of business stakeholders involved in data science initiatives and working with data science teams as positive, if not very positive. Creating data science and machine learning solutions that cater to the non-data science community poses significant opportunities for vendors as organizations move forward in data science regardless of their levels of data science expertise. Machine learning model building areas that involve non-data science professionals (e.g., business analysts): Data collection/supply, 44%; Data preprocessing, 40%; Model deployment, 39%; Model monitoring/maintenance, 38%; Model training, 36%; Model evaluation, 36%; Model selection, 30%; Logic building, 28%; Use case/problem definition, 27%.
  • 20. Unlocking Employee Potential. With 99% of people motivated to improve their data science and machine learning skills, the research highlights that improvements are fueled by a combination of intrinsic and extrinsic motivations. The prospects of career advancement, recognition, and salary increases, along with the promise of contributing meaningfully to cutting-edge projects, act as powerful external motivators. This combination of tangible rewards with intellectual curiosity creates an interesting dynamic within the work environment where employees are inspired to invest time (sometimes outside of work) to continue honing their skills. Employees’ drivers to improve skills in data science and machine learning: Career advancement opportunities, 52%; Keeping pace with industry trends, 50%; Job security, 45%; Employer requirements, 44%; General interest in the field, 40%; Salary increase, 40%; Personal fulfillment, 35%. 99% of respondents are motivated to improve their data science and machine learning skills.
  • 21. KNIME helps everybody make sense of data. Its free and open-source KNIME Analytics Platform enables anyone, whether they come from a business, technical, or data background, to intuitively work with data, every day. KNIME Business Hub is the commercial complement to KNIME Analytics Platform and enables users to collaborate on data science and share insights across the organization. Together, the products support the complete data science lifecycle, allowing teams at all levels of analytics readiness to support the operationalization of data and to build a scalable data science practice.
  • 22. Research Methodology and Demographics. To gather data for this report, ESG conducted a comprehensive online survey of data professionals from private- and public-sector organizations in North America (United States and Canada) between June 5, 2023 and June 27, 2023. To qualify for this survey, respondents were required to be involved with data science and machine learning technologies and processes, including potential responsibility for strategizing, evaluating, purchasing, building, and managing these technologies. All respondents were provided an incentive to complete the survey in the form of cash awards and/or cash equivalents. After filtering out unqualified respondents, removing duplicate responses, and screening the remaining completed responses (on a number of criteria) for data integrity, we were left with a final total sample of 366 data professionals. Respondents by Number of Employees: 100 to 499, 20%; 500 to 999, 22%; 1,000 to 2,499, 17%; 2,500 to 4,999, 14%; 5,000 to 9,999, 14%; 10,000 to 19,999, 7%; 20,000 or more, 6%. Respondents by Age of Company: Fewer than 5 years, 1%; 5 to 10 years, 17%; 11 to 20 years, 48%; 21 to 50 years, 25%; More than 50 years, 10%; Don’t know, 1%. Respondents by Industry: Manufacturing, 37%; Financial services, 13%; Technology, 10%; Healthcare, 7%; Retail/wholesale, 7%; Communications and media, 6%; Business services, 5%; Government, 1%; Other, 14%.
  • 23. All product names, logos, brands, and trademarks are the property of their respective owners. Information contained in this publication has been obtained by sources TechTarget, Inc. considers to be reliable but is not warranted by TechTarget, Inc. This publication may contain opinions of TechTarget, Inc., which are subject to change. This publication may include forecasts, projections, and other predictive statements that represent TechTarget, Inc.’s assumptions and expectations in light of currently available information. These forecasts are based on industry trends and involve variables and uncertainties. Consequently, TechTarget, Inc. makes no warranty as to the accuracy of specific forecasts, projections or predictive statements contained herein. This publication is copyrighted by TechTarget, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of TechTarget, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact Client Relations at cr@esg-global.com. Enterprise Strategy Group is an integrated technology analysis, research, and strategy firm providing market intelligence, actionable insight, and go-to-market content services to the global technology community. © 2023 TechTarget, Inc. All Rights Reserved.