https://portal.futuregrid.org 
Big Data in the Cloud: 
Research and Education
September 9 2013
PPAM 2013 Warsaw
Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org http://www.futuregrid.org
School of Informatics and Computing
Community Grids Laboratory
Indiana University Bloomington
Some Issues to Discuss Today
• Economic Imperative: There are a lot of data and a lot of 
jobs
• Computing Model: Industry adopted clouds which are 
attractive for data analytics. HPC also useful in some cases
• Progress in scalable robust Algorithms: new data need 
different algorithms than before
• Progress in Data Intensive Programming Models
• Progress in Data Science Education: opportunities at 
universities
Data Deluge
Source: Meeker/Wu, May 29 2013, Internet Trends, D11 Conference
IP Traffic per year ~ 12% Total Created
Some Data sizes
~40 x 10^9 Web pages at ~300 kilobytes each = 10 Petabytes
LHC 15 petabytes per year
Radiology 69 petabytes per year
Square Kilometer Array Telescope will be 100
terabits/second; LSST Survey >20TB per day
Earth Observation becoming ~4 petabytes per year
Earthquake Science – few terabytes total today
PolarGrid – 100’s terabytes/year becoming petabytes
Exascale simulation data dumps – terabytes/second
Deep Learning to train self driving car; 100 million
megapixel images ~ 100 terabytes
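The sizes above can be checked with back-of-envelope arithmetic. The sketch below uses decimal units and assumes one byte per pixel for the megapixel images; that assumption is not stated on the slide.

```python
# Back-of-envelope checks of the sizes quoted above (decimal units).
KB, MB, TB, PB = 1e3, 1e6, 1e12, 1e15

# ~40 x 10^9 Web pages at ~300 kilobytes each
web_pages_bytes = 40e9 * 300 * KB

# 100 million megapixel images; ASSUMPTION: 1 byte per pixel, so ~1 MB per image
car_images_bytes = 100e6 * 1 * MB

# LSST survey: >20 TB per day, accumulated over a 365 day year
lsst_bytes_per_year = 20 * TB * 365

# Square Kilometer Array: 100 terabits/second converted to bytes/second
ska_bytes_per_second = 100e12 / 8

print(f"Web pages:          {web_pages_bytes / PB:.1f} PB")
print(f"Self driving car:   {car_images_bytes / TB:.1f} TB")
print(f"LSST per year:      {lsst_bytes_per_year / PB:.1f} PB")
print(f"SKA per second:     {ska_bytes_per_second / TB:.1f} TB")
```

Under these assumptions the Web page estimate comes out near 12 petabytes, the same order as the 10 petabytes quoted, and the image set comes out at 100 terabytes.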
51 NIST Big Data Use Cases
http://bigdatawg.nist.gov/usecases.php
Jobs
Jobs v. Countries
http://www.microsoft.com/en‐us/news/features/2012/mar12/03‐05CloudComputingJobs.aspx
McKinsey Institute on Big Data Jobs
• There will be a shortage of talent necessary for organizations to take 
advantage of big data. By 2018, the United States alone could face a 
shortage of 140,000 to 190,000 people with deep analytical skills as well as 
1.5 million managers and analysts with the know‐how to use the analysis of 
big data to make effective decisions.
• At IU, Informatics is aimed at the 1.5 million jobs; Computer Science covers the 
140,000 to 190,000
http://www.mckinsey.com/mgi/publications/big_data/index.asp.
Computing Model
Industry adopted clouds which are 
attractive for data analytics
5 years Cloud Computing
2 years Big Data Transformational
Amazon making money
• It took Amazon Web Services (AWS) eight 
years to hit $650 million in revenue, according 
to Citigroup in 2010. 
• Just three years later, Macquarie Capital 
analyst Ben Schachter estimates that AWS will 
top $3.8 billion in 2013 revenue, up from $2.1 
billion in 2012 (estimated), valuing the AWS 
business at $19 billion.  
Physically Clouds are Clear
• A bunch of computers in an efficient data center with an 
excellent Internet connection
• They were produced to meet need of public‐facing Web 
2.0 e‐Commerce/Social Networking sites
• They can be considered as “optimal giant data center” 
plus internet connection
• Note enterprises use private clouds that are giant data 
centers but not optimized for Internet access
• Exascale build‐out of commercial cloud infrastructure: for 
2014‐15 expect 10,000,000 new servers and 10 Exabytes
of storage in major commercial cloud data centers 
worldwide. 
Data Intensive Applications and 
Programming Models
Clouds & Data Intensive Applications
• Applications tend to be new and so can consider emerging 
technologies such as clouds
• Do not have lots of small messages but rather large reduction (aka 
Collective) operations
– New optimizations e.g. for huge messages
• “Large Scale Optimization”: Deep Learning, Social Image 
Organization, Clustering and Multidimensional Scaling which are 
variants of EM 
• EM (expectation maximization) tends to be good for clouds and 
Iterative MapReduce
– Quite complicated computations (so compute largish compared to 
communicate)
– Communication is Reduction operations (global sums or linear) or Broadcast
• Machine Learning has FULL Matrix kernels
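The EM pattern described above (largish local compute, then a global sum, then a broadcast of the updated parameters) can be sketched for a toy case. This is a minimal illustration, assuming a two-component 1-D Gaussian mixture and in-process "partitions"; the function names are hypothetical and no real cloud or MPI runtime is involved.

```python
import math
import random

def map_partial_stats(points, weights, means, variances):
    """'Map' over one partition: E-step sums for each mixture component."""
    k = len(weights)
    n_k, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in points:
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total      # responsibility of component j for x
            n_k[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return n_k, s1, s2

def global_sum(partials):
    """'Reduce' collective: element-wise sum of the partial statistics."""
    return [[sum(col) for col in zip(*stat)] for stat in zip(*partials)]

def em(partitions, n_iter=30):
    # Small loop-variant state that would be broadcast every iteration
    weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for _ in range(n_iter):
        partials = [map_partial_stats(p, weights, means, variances) for p in partitions]
        n_k, s1, s2 = global_sum(partials)
        n = sum(n_k)
        # M-step from the globally summed statistics
        weights = [c / n for c in n_k]
        means = [a / c for a, c in zip(s1, n_k)]
        variances = [b / c - m * m for b, c, m in zip(s2, n_k, means)]
    return weights, means, variances

random.seed(0)
data = [random.gauss(-3.0, 1.0) for _ in range(800)]
data += [random.gauss(4.0, 1.0) for _ in range(1200)]
random.shuffle(data)
partitions = [data[i::8] for i in range(8)]   # 8 hypothetical map tasks
weights, means, variances = em(partitions)
print(weights, means, variances)
```

The only data crossing partition boundaries is the handful of summed statistics, which is why the communication cost is a reduction rather than many small messages.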
Some (NIST) Large Data Mining Problems I
• Find W’s by iteration 
(Steepest Descent method)
• Find 11 Billion W’s from 10 
million images = 9 layer NN
• “Pure” Full Matrix 
Multiplication 
MPI+GPU gets near 
optimal performance
• GPU+MPI 100 times 
previous Google work
• Note Datamining
often gives full matrices 
• http://salsahpc.indiana.edu/summerworkshop2013/index.html
• Deep Learning: (Google/Stanford) Recognize features such as bikes 
or faces with a learning network
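The "find W's by iteration" step can be illustrated at toy scale. The sketch below is not the 9 layer network from the slide: it assumes a single linear layer with a mean squared error loss, and uses plain Python lists so the dense matrix products are explicit.

```python
import random

def matmul(a, b):
    """Full (dense) matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def steepest_descent(x, y, n_out, lr=0.5, n_iter=200):
    """Minimise the mean squared error of x @ w against y over the weights w."""
    n, n_in = len(x), len(x[0])
    w = [[0.0] * n_out for _ in range(n_in)]
    xt = transpose(x)
    for _ in range(n_iter):
        pred = matmul(x, w)
        resid = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, y)]
        grad = matmul(xt, resid)     # n_in x n_out, gradient up to the factor 2/n
        w = [
            [wij - lr * 2.0 * g / n for wij, g in zip(w_row, g_row)]
            for w_row, g_row in zip(w, grad)
        ]
    return w

random.seed(1)
true_w = [[1.0, -2.0], [0.5, 3.0], [-1.5, 0.0]]   # hypothetical target weights
x = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = matmul(x, true_w)
w = steepest_descent(x, y, n_out=2)
print(w)
```

Each iteration is dominated by the two dense products, which is the "pure full matrix multiplication" kernel that maps well onto MPI+GPU.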
Figure: Factor Speedup (1 to 64) vs. # GPUs (1, 4, 9, 16, 36, 64); curves labelled 185M, 680M, 1.9B, 3.0B, 6.9B, 11.2B and Linear
Protein Universe Browser for COG Sequences with a 
few illustrative biologically identified clusters
• Dimension reduction MDS for visualization and clustering in non metric spaces
• O(N^2) algorithms with full matrices
• Important Online (interpolation) methods
• Expectation Maximization (Iterative AllReduce) and Levenberg‐Marquardt with 
Conjugate Gradient
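To show why MDS is an O(N^2) full matrix problem, here is a minimal sketch of one standard MDS iteration (the SMACOF Guttman transform with unit weights). It is an illustrative stand-in, not necessarily the method used for the Protein Universe Browser, and the dissimilarities are synthetic.

```python
import math
import random

def distances(points):
    """Full N x N matrix of Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]

def stress(delta, points):
    """Sum over pairs of (embedded distance - target dissimilarity)^2."""
    d = distances(points)
    n = len(points)
    return sum((d[i][j] - delta[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))

def smacof_step(delta, points):
    """One Guttman transform X <- (1/n) B(X) X, touching all N^2 pairs."""
    n, dim = len(points), len(points[0])
    d = distances(points)
    new_points = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue
            ratio = delta[i][j] / d[i][j]
            # Row i of B(X) X equals sum_j ratio_ij * (x_i - x_j)
            for k in range(dim):
                row[k] += ratio * (points[i][k] - points[j][k])
        new_points.append([v / n for v in row])
    return new_points

random.seed(2)
high_dim = [[random.gauss(0, 1) for _ in range(5)] for _ in range(30)]
delta = distances(high_dim)                      # synthetic dissimilarities
points = [[random.gauss(0, 1) for _ in range(2)] for _ in range(30)]
initial_stress = stress(delta, points)
for _ in range(100):
    points = smacof_step(delta, points)
final_stress = stress(delta, points)
print(initial_stress, final_stress)
```

Both the distance matrix and the update loop visit every pair of points, so memory and time grow as N^2.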
Some (NIST) Large Data Mining Problems II
• Determine optimal geo and angle representation of “all” images by giant 
least squares fit to 6‐D Camera pose of each image and 3D position of points 
in scene
• Levenberg‐Marquardt using Conjugate Gradient to estimate leading 
eigenvector and solve equations
• Note such Newton approaches fail for learning networks as too many 
parameters
• Need Hadoop and HDFS with “trivial problem” of just 15,000 images and 
75,000 points giving 1 TB messages per iteration
• Over 500 million images uploaded each day (1 in 1000 Eiffel tower) …..
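The Levenberg‐Marquardt plus Conjugate Gradient combination mentioned above can be sketched on a toy least squares fit. This is only an illustration under stated assumptions: the model y = a * exp(b * x) is hypothetical, Conjugate Gradient is used solely to solve the damped normal equations (not to estimate an eigenvector), and the damping schedule is a common simple choice.

```python
import math

def residuals(params, xs, ys):
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    a, b = params
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]

def conjugate_gradient(apply_a, rhs, n_iter=50, tol=1e-20):
    """Solve A x = rhs for symmetric positive definite A given only v -> A v."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(n_iter):
        if rs < tol:
            break
        ap = apply_a(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def levenberg_marquardt(params, xs, ys, lam=1e-2, n_iter=100):
    cost = sum(r * r for r in residuals(params, xs, ys))
    for _ in range(n_iter):
        res, jac = residuals(params, xs, ys), jacobian(params, xs)
        grad = [sum(row[k] * r for row, r in zip(jac, res)) for k in range(len(params))]

        def damped_normal_matrix(v):
            # (J^T J + lam * I) v without forming J^T J explicitly
            jv = [sum(a * b for a, b in zip(row, v)) for row in jac]
            jtjv = [sum(row[k] * t for row, t in zip(jac, jv)) for k in range(len(v))]
            return [g + lam * vi for g, vi in zip(jtjv, v)]

        step = conjugate_gradient(damped_normal_matrix, [-g for g in grad])
        trial = [p + s for p, s in zip(params, step)]
        trial_cost = sum(r * r for r in residuals(trial, xs, ys))
        if trial_cost < cost:
            params, cost, lam = trial, trial_cost, lam / 10.0   # accept, trust more
        else:
            lam *= 10.0                                         # reject, damp harder
    return params

xs = [0.1 * i for i in range(21)]
ys = [2.0 * math.exp(0.7 * x) for x in xs]       # noise-free synthetic data
fitted = levenberg_marquardt([1.0, 0.3], xs, ys)
print(fitted)
```

Because Conjugate Gradient only needs matrix-vector products, the same structure applies when the Jacobian is far too large to form J^T J, as in the image problem above.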
Alternative Approach to Image Classification
• Instead of learning networks one can (always) use 
clustering to divide spaces into compact nearby 
regions
• Characterize images by a feature vector in 512‐2048 
dimensional spaces (HOG or Histograms of Oriented 
Gradients)
• Cluster (K‐means) 100 million vectors (100,000 
images) into 10 million clusters
• Giant Broadcast and AllReduce Operations that 
stress most MPI implementations
• Note Kmeans (Mahout) dreadful with Hadoop
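The Broadcast and AllReduce structure of K‐means can be sketched as follows. This is a minimal in-process illustration with hypothetical function names and tiny 2-D synthetic data; a real run would use HOG feature vectors and a collective from MPI or an iterative MapReduce runtime.

```python
import random

def map_assign(points, centroids):
    """'Map' on one partition: partial coordinate sums and counts per cluster."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts

def allreduce(partials):
    """Element-wise global sum of every partition's partial results."""
    k, dim = len(partials[0][0]), len(partials[0][0][0])
    sums = [[sum(part[0][c][d] for part in partials) for d in range(dim)] for c in range(k)]
    counts = [sum(part[1][c] for part in partials) for c in range(k)]
    return sums, counts

def kmeans(partitions, centroids, n_iter=10):
    for _ in range(n_iter):
        # The centroids are the data "broadcast" to every map task each iteration
        partials = [map_assign(part, centroids) for part in partitions]
        sums, counts = allreduce(partials)
        centroids = [
            [s / n for s in row] if n else old    # keep empty clusters unchanged
            for row, n, old in zip(sums, counts, centroids)
        ]
    return centroids

random.seed(3)
centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    [random.gauss(cx, 0.5), random.gauss(cy, 0.5)]
    for cx, cy in centres
    for _ in range(300)
]
random.shuffle(points)
partitions = [points[i::4] for i in range(4)]     # 4 hypothetical map tasks
centroids = kmeans(partitions, [[1.0, 1.0], [4.0, 4.0], [1.0, 4.0]])
print(centroids)
```

With 10 million clusters in up to 2048 dimensions, both the broadcast centroids and the reduced sums become very large messages, which is the stress on MPI implementations noted above.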
Clusters v. Regions
• In Lymphocytes clusters are distinct
• In Pathology (NIST Big Data Use Case), clusters divide space 
into regions and sophisticated methods like deterministic 
annealing are probably unnecessary
Figure panels: Pathology 54D; Lymphocytes 4D
Map Collective Model (Judy Qiu)
• Combine MPI and MapReduce ideas
• Implement collectives optimally on Infiniband, 
Azure, Amazon ……
Figure labels: Input, map, Generalized Reduce, Initial Collective Step, Final Collective Step, Iterate
4 Forms of MapReduce
(a) Map Only: BLAST Analysis, Parametric sweep, Pleasingly Parallel
(b) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search
(c) Iterative MapReduce: Expectation maximization, Clustering e.g. Kmeans, Linear Algebra, Page Rank
(d) Loosely Synchronous: Classic MPI, PDE Solvers and particle dynamics
Figure labels: Domain of MapReduce and Iterative Extensions; Science Clouds; MPI; Exascale
MPI is Map followed by Point to Point Communication – as in style d)
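Forms (a) to (c) can be sketched in a few lines each. This is a toy, single-process illustration with hypothetical helper names and data; form (d), with point to point communication between long-running tasks, is not shown.

```python
from collections import defaultdict

def map_only(inputs, fn):
    """(a) Map Only: independent tasks, no communication."""
    return [fn(item) for item in inputs]

def map_reduce(inputs, mapper, reducer):
    """(b) Classic MapReduce: map to (key, value) pairs, group by key, reduce."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

def iterative_map_reduce(inputs, make_mapper, reducer, state, n_iter):
    """(c) Iterative MapReduce: repeat map and reduce, feeding the result back in."""
    for _ in range(n_iter):
        state = map_reduce(inputs, make_mapper(state), reducer)
    return state

# (a) parametric sweep over a toy function
sweep = map_only([0.5, 1.0, 2.0], lambda x: x * x)

# (b) histogram of toy event energies in bins of width 10
energies = [3, 12, 17, 25, 8, 14]
histogram = map_reduce(energies, lambda e: [(e // 10, 1)], sum)

# (c) PageRank on a toy three-node graph with no dangling nodes
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
damping = 0.85

def make_pagerank_mapper(ranks):
    def mapper(entry):
        node, out_links = entry
        yield node, (1.0 - damping) / len(graph)      # teleport share
        for dest in out_links:
            yield dest, damping * ranks[node] / len(out_links)
    return mapper

ranks = iterative_map_reduce(
    list(graph.items()), make_pagerank_mapper, sum,
    state={node: 1.0 / len(graph) for node in graph}, n_iter=50,
)
print(sweep, histogram, ranks)
```

The only difference between (b) and (c) is the outer loop that feeds the reduced result back into the next map, which is the part Iterative MapReduce runtimes optimize.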
Twister for Data Intensive 
Iterative Applications
• (Iterative) MapReduce structure with Map‐Collective is 
framework
• Twister runs on Linux or Azure
• Twister4Azure is built on top of Azure tables, queues, 
storage
Figure labels: Compute, Communication, Reduce/barrier, New Iteration; Larger Loop‐Invariant Data; Broadcast; Smaller Loop‐Variant Data; Generalize to arbitrary Collective
Qiu, Gunarathne
Kmeans Clustering on Azure
Number of tasks running as function of  time
Figure: Number of Executing Map Tasks (0 to 300) vs. Elapsed Time (s) (0 to 250)
This shows that the communication and synchronization overheads between iterations are very small 
(less than one second, which is the lowest measured unit for this graph). 
128 Million data points(19GB), 500 centroids (78KB), 20 dimensions
10 iterations, 256 cores, 256 map tasks per iteration
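The quoted sizes are consistent with simple arithmetic, assuming each coordinate is stored as a 64-bit floating point value and that "GB" and "KB" are binary units; neither assumption is stated on the slide.

```python
# Sanity check of the K-means problem size quoted above.
n_points = 128_000_000
n_centroids = 500
n_dims = 20
bytes_per_value = 8          # ASSUMPTION: 64-bit floating point coordinates

data_bytes = n_points * n_dims * bytes_per_value
centroid_bytes = n_centroids * n_dims * bytes_per_value

data_gib = data_bytes / 2**30          # loop-invariant data, cached across iterations
centroid_kib = centroid_bytes / 2**10  # loop-variant data, broadcast every iteration

n_iterations = 10
map_tasks_per_iteration = 256
total_map_tasks = n_iterations * map_tasks_per_iteration

print(f"data points: {data_gib:.2f} GiB")
print(f"centroids:   {centroid_kib:.2f} KiB")
print(f"map tasks:   {total_map_tasks}")
```

The contrast between roughly 19 GiB of loop-invariant points and under 80 KiB of broadcast centroids is why caching the input between iterations matters so much.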
Kmeans Clustering
Execution Time per task
128 Million data points(19GB), 500 centroids (78KB), 20 dimensions
10 iterations, 256 cores, 256 map tasks per iteration
Figure: Task Execution Time (s) (0 to 70) vs. Map Task ID (0 to 2304)
Kmeans and (Iterative) MapReduce
• Shaded areas are computing only where Hadoop on HPC cluster 
fastest
• Areas above shading are overheads where T4A smallest and T4A with 
AllReduce collective has lowest overhead
• Note even on Azure Java (Orange) faster than T4A C#
Figure: Time (s) (0 to 1400) vs. Num. Cores X Num. Data Points (32 x 32 M, 64 x 64 M, 128 x 128 M, 256 x 256 M); legend: Hadoop AllReduce, Hadoop MapReduce, Twister4Azure AllReduce, Twister4Azure Broadcast, Twister4Azure, HDInsight (AzureHadoop)
Details of K‐means Linux Hadoop and 
Hadoop with AllReduce Collective
Data Science Education
Opportunities at universities
see recent New York Times articles
http://datascience101.wordpress.com/2013/04/13/new‐york‐times‐data‐science‐articles/
Data Science Education
• Broad Range of Topics from Policy to curation to 
applications and algorithms, programming models, data 
systems, statistics, and broad range of CS subjects such as 
Clouds, Programming, HCI
• Plenty of Jobs and broader range of possibilities than 
computational science but similar cosmic issues
– What type of degree (Certificate, minor, track, “real” degree)
– What implementation (department, interdisciplinary group 
supporting education and research program)
• NIST Big Data initiative identifies Big Data, Data Science, 
Data Scientist as core concepts
• There are over 40 Data Science Curricula (4 Undergraduate, 
31 Masters,  5 Certificate, 3 PhD)
Computational Science
• Interdisciplinary field between computer science 
and applications with primary focus on simulation 
areas
• Very successful as a research area
– XSEDE and Exascale systems enable
• Several academic programs but these have been 
less successful than computational science research 
as
– No consensus as to curricula and jobs (don’t appoint 
faculty in computational science; do appoint to DoE labs)
– Field relatively small 
• Started around 1990
Data Science at Indiana University
• Link Statistics & School of Informatics and Computing 
(Computer Science, Informatics, Information & Library 
Science)
• Broader than most offerings
• Ought IMHO to involve application faculty
• Areas Data Analysis and Statistics, Data Lifecycle, 
Infrastructure (Clouds, Security), Applications
– How broad should requirements be
• Offer online Masters in MOOC format at full scale in Fall 2014 
and as certificate in January 2014.
– Also allow residential students in flipped mode
• Free trial run of my MOOC on Big Data Mid October 2013
MOOC’s
Massive Open Online Courses (MOOC)
• MOOC’s are very “hot” these days with Udacity and 
Coursera as start‐ups; perhaps over 100,000 participants 
• Relevant to Data Science (where IU is preparing a MOOC) 
as this is a new field with few courses at most universities
• Typical model is collection of short prerecorded segments 
(talking head over PowerPoint) of length 3‐15 minutes
• These “lesson objects” can be viewed as “songs”
• Google Course Builder (python open source) builds 
customizable  MOOC’s as “playlists” of “songs”
• Tells you to capture all material as “lesson objects”
• We are aiming to build a repository of many “songs”; used 
in many ways – tutorials, classes …
• Twelve ~10 minute lesson objects in this lecture
• IU wants us to close caption if used in a real course
Customizable MOOC’s 
• We could  teach one class to 100,000 students or 2,000 classes to 50 
students
• The 2,000 class choice has 2 useful features
– One can use the usual (electronic) mentoring/grading technology
– One can customize each of 2,000 classes for a particular audience given their 
level and interests
– One can even allow student to customize – that’s what one does in making 
play lists in iTunes
– Flipped Classroom
• Both models can be supported by a repository of lesson objects (3‐
15 minute video segments) in the cloud
• The teacher can choose from existing lesson objects and add their 
own to produce a new customized course with new lessons 
contributed back to repository
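The repository and playlist idea above can be sketched as a small data structure. All class, function and lesson names here are hypothetical; this is not how Google Course Builder or any IU system is implemented.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class LessonObject:
    title: str
    minutes: int          # the slides suggest 3-15 minute segments

@dataclass
class Repository:
    lessons: dict = field(default_factory=dict)

    def add(self, lesson):
        self.lessons[lesson.title] = lesson

    def get(self, title):
        return self.lessons[title]

def build_course(repo, titles, new_lessons=()):
    """Assemble a 'playlist' from existing lessons plus the teacher's own.

    New lessons are contributed back to the shared repository.
    """
    for lesson in new_lessons:
        repo.add(lesson)
    return [repo.get(t) for t in titles] + list(new_lessons)

repo = Repository()
repo.add(LessonObject("Data Deluge", 10))
repo.add(LessonObject("Clouds", 12))
repo.add(LessonObject("Kmeans", 8))

# One of many small customized classes reusing the shared lessons
course = build_course(
    repo, ["Data Deluge", "Kmeans"],
    new_lessons=[LessonObject("Local case study", 5)],
)
total_minutes = sum(lesson.minutes for lesson in course)

# 2,000 classes of 50 students reach the same audience as one class of 100,000
assert 2_000 * 50 == 100_000
print([lesson.title for lesson in course], total_minutes)
```

Because courses only hold references to lesson objects, the same repository can back one large class or many customized ones.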
Key MOOC areas costing money/effort
• Make content including quizzes, homework
• Record video 
• Make web site
• Social Networking Interaction for mentoring student‐
Teaching assistants and student‐student
• Defining how to support computing labs with FutureGrid 
or appliances + Virtual Box
– Appliances scale as download to student’s client
– Virtual machines essential 
• Analyse/Evaluate interactions
FutureGrid hosts many classes per semester
How to use FutureGrid is shared MOOC
Conclusions
• Data Intensive programs are not like simulations as they have large 
“reductions” (“collectives”) and do not have many small messages
– Clouds suitable and in fact HPC sometimes optimal
• Iterative MapReduce an interesting approach; need to optimize collectives 
for new applications (Data analytics) and resources (clouds, GPU’s …)
• Need an initiative to build scalable high performance data analytics library
on top of interoperable cloud‐HPC platform
– Full matrices important
• More employment opportunities in clouds than HPC and Grids and in data 
than simulation; so cloud and data related activities popular with students
• Community activity to discuss data science education
– Agree on curricula; is such a degree attractive?
• Role of MOOC’s for either
– Disseminating new curricula 
– Managing course fragments that can be assembled into custom courses 
for particular interdisciplinary students