Major Project-1
Presented by:
500086300 - Siddharth Sankar
500083986 - Aryaman Malik
500082783 - Vishnu Madhav
500083036 - Himanshi Tyagi
Mentored by: Shaurya Gupta
Activity Coordinator: Nitika Nigam
Data Analytics End to End Data Engineering Project
Contents
• Introduction
• Objectives
• Literature Review
• Methodology
• References
INTRODUCTION
• Big data refers to massive datasets that traditional information systems and processes cannot analyze. Companies like Uber, Google, YouTube, Facebook, Amazon, and Alibaba generate and store enormous volumes of unstructured data every day. This data can be used for recommendations and to gain insights into markets and businesses.
• A project such as this can analyze Uber data such as pick-up and drop-off locations, average ride time, and average ride cost. These are invaluable data points that cab aggregators can use to expand their business.
• Uber is a transportation network company that was founded in 2009 by Travis Kalanick and Garrett Camp. It has since become one of the most recognizable and influential companies in the gig economy and the tech industry. It gathers petabytes of data on a daily basis.
Motivation
• To help taxi companies build and organize their fleets more efficiently and with better focus, using analysis of region-wise taxi ride data.
Problem Statement
• Finding the most suitable way for cab aggregators to expand into the electric vehicle era.
Area of Application
• Visualization of data for a quick understanding of the best regions and most suitable routes for cab aggregators in order to maximize profits.
Motivation
• There is growing competition amongst cab aggregators.
• These companies can use such data pipelines and dashboards to customize their business model and maximize profits.
Problem Statement
Cab aggregators are transitioning from traditional cars to electric vehicles. Electric cabs will operate very differently from the older cabs.
This dashboard will showcase how to place cabs in a region in a way that is both efficient and profitable.
Area of Application
• Cab aggregator companies can apply the insights and tools developed in this project to optimize routes, enhance customer experiences, and improve operational efficiency.
• New companies looking to enter this business can also gain insights from this project.
Literature Review
Uber states that it is committed to delivering safer and more reliable transportation across its global markets. To accomplish this, Uber relies heavily on data-driven decisions at every level, from forecasting rider demand during high-traffic events to identifying and addressing bottlenecks in its driver-partner sign-up process. Over time, the need for more insights has resulted in over 100 petabytes of analytical data that needs to be cleaned, stored, and served with minimum latency through Uber's Apache Hadoop® based Big Data platform. Since 2014, Uber has worked to develop a Big Data solution that ensures data reliability, scalability, and ease of use, and it is now focusing on increasing the platform's speed and efficiency.
Literature Review
Hadoop can be used to perform sentiment analysis, also termed opinion mining, on Twitter data. The general attitude of people can be classified from tweets as positive, negative, or neutral, based on the polarity of the words used. First, the datasets are collected from Twitter using the Twitter Streaming API and stored in HDFS in a specified format. The data is then passed to the mapper in the MapReduce programming model and processed using Java, the MapReduce framework, and Apache Hive. Finally, the results are presented as positive, negative, and neutral tweets.
Data Set and Input Format
The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by
technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP).
For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time,
and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by
bases.
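The record layout described above can be illustrated with a minimal parsing sketch. It uses only the Python standard library; the column names (`dispatching_base_num`, `pickup_datetime`, `PUlocationID`) and the sample rows are assumptions chosen to match the fields named on this slide, not values taken from the project's actual files.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the shape the slide describes for FHV trip records.
SAMPLE_CSV = """dispatching_base_num,pickup_datetime,PUlocationID
B00013,2023-01-01 00:15:00,161
B00013,2023-01-01 08:40:00,
B00256,2023-01-01 09:05:00,237
"""

def parse_trips(text):
    """Parse CSV text into typed records, skipping rows with no pick-up zone."""
    trips = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["PUlocationID"]:
            continue  # a missing zone ID cannot be mapped to a region
        trips.append({
            "base": row["dispatching_base_num"],
            "pickup": datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S"),
            "zone_id": int(row["PUlocationID"]),
        })
    return trips

trips = parse_trips(SAMPLE_CSV)
```

Typing the timestamp and zone ID at load time makes later steps, such as grouping by hour or region, straightforward.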
Data Model: Fact and Dimension Tables
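As an illustration of a fact-and-dimension layout, the sketch below splits flat trip records into one datetime dimension and one fact table. The field names (`datetime_id`, `pickup_hour`, `zone_id`, `fare`) and the sample records are hypothetical; the project's real data model may use different tables and keys.

```python
from datetime import datetime

# Hypothetical cleaned trip records; field names are illustrative only.
trips = [
    {"pickup": datetime(2023, 1, 1, 0, 15), "zone_id": 161, "fare": 12.5},
    {"pickup": datetime(2023, 1, 1, 9, 5), "zone_id": 237, "fare": 8.0},
    {"pickup": datetime(2023, 1, 1, 9, 5), "zone_id": 161, "fare": 20.0},
]

def build_star_schema(trips):
    """Split flat trip records into a datetime dimension and a fact table."""
    datetime_dim = {}  # pickup timestamp -> dimension row
    fact_table = []
    for trip_id, trip in enumerate(trips):
        ts = trip["pickup"]
        if ts not in datetime_dim:
            datetime_dim[ts] = {
                "datetime_id": len(datetime_dim),
                "pickup_datetime": ts,
                "pickup_hour": ts.hour,
                "pickup_weekday": ts.weekday(),  # Monday is 0
            }
        # The fact row keeps measures plus foreign keys into the dimensions.
        fact_table.append({
            "trip_id": trip_id,
            "datetime_id": datetime_dim[ts]["datetime_id"],
            "zone_id": trip["zone_id"],
            "fare": trip["fare"],
        })
    return list(datetime_dim.values()), fact_table

datetime_dim, fact_table = build_star_schema(trips)
```

Each distinct timestamp is stored once in the dimension, and the fact table refers to it by key, which keeps the fact rows narrow and the descriptive attributes in one place.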
METHODOLOGY
The dataset is downloaded and uploaded to a Google Cloud Storage bucket, which serves as the data lake. Mage is then used to crawl through the data and perform ETL jobs. After this, data cleaning and preprocessing are done using Jupyter. Querying and analysis are then done using BigQuery, and finally the results are visualized using Looker Studio.
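The extract, transform, and load flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use Mage's own API, and the function names, the sample rows, and the in-memory `warehouse` list standing in for BigQuery are all hypothetical.

```python
def load(raw_rows):
    """Loader stage: read raw rows from the source (here, an in-memory list)."""
    return [dict(row) for row in raw_rows]

def transform(rows):
    """Transformer stage: drop duplicates and rows with a non-positive fare."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["pickup"], row["zone_id"], row["fare"])
        if row["fare"] <= 0 or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

def export(rows, warehouse):
    """Exporter stage: append cleaned rows to the destination table."""
    warehouse.extend(rows)
    return len(rows)

def run_pipeline(raw_rows, warehouse):
    """Run the three stages in order and return the number of rows written."""
    return export(transform(load(raw_rows)), warehouse)

# Hypothetical raw data containing one duplicate and one invalid fare.
raw = [
    {"pickup": "2023-01-01 00:15:00", "zone_id": 161, "fare": 12.5},
    {"pickup": "2023-01-01 00:15:00", "zone_id": 161, "fare": 12.5},
    {"pickup": "2023-01-01 09:05:00", "zone_id": 237, "fare": -3.0},
    {"pickup": "2023-01-01 09:05:00", "zone_id": 237, "fare": 8.0},
]
warehouse = []
written = run_pipeline(raw, warehouse)
```

Keeping each stage as a separate function mirrors how pipeline tools chain independent blocks, so a single stage can be tested or replaced without touching the others.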
Data Analytics Uber using google cloud and dashboard
SWOT ANALYSIS
STRENGTH:
• Uber has the resources to invest in the latest data analytics technologies.
• Uber has a team of experienced data scientists and engineers who can develop and implement the data analytics project.
• Uber has a vast amount of data that can be used to gain valuable insights into its business operations.
WEAKNESS:
• Uber's data is spread across many different systems. This can make it difficult to access and analyze the data.
• Uber's data is often incomplete or inaccurate. This can impact the quality of the data analysis.
• Uber does not have a well-defined data governance process. This can lead to data security and privacy risks.
OPPORTUNITIES:
• Uber can use data analytics to improve its business operations in a number of ways, such as optimizing driver scheduling, reducing fraud, and developing new products and services.
THREAT:
• Other ride-sharing companies are also investing in data analytics. This could make it more difficult for Uber to maintain its competitive advantage.
Creating Bucket in Google Cloud
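A bucket can be created in the Cloud Console, but the same step can also be scripted. The sketch below is one possible approach using the `google-cloud-storage` client library, not necessarily how the project did it; the bucket name, file paths, and `location` default are hypothetical, and the client is assumed to find Application Default Credentials in the environment.

```python
def gcs_uri(bucket_name, blob_name):
    """Return the gs:// URI that identifies an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"

def upload_to_bucket(bucket_name, local_path, blob_name, location="US"):
    """Create the bucket if needed, upload one file, and return its gs:// URI."""
    # Imported here so the module loads even without the library installed.
    from google.cloud import storage
    from google.api_core.exceptions import Conflict

    client = storage.Client()  # assumes Application Default Credentials
    try:
        bucket = client.create_bucket(bucket_name, location=location)
    except Conflict:
        bucket = client.bucket(bucket_name)  # the bucket already exists
    bucket.blob(blob_name).upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)

# Hypothetical usage (requires credentials and network access):
# upload_to_bucket("uber-demo-bucket", "uber_data.csv", "raw/uber_data.csv")
```

The returned `gs://` URI is the form that other Google Cloud services accept when they read the uploaded file.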
Creating Dictionary
Adding Credentials
Exporting Data to BigQuery
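For reference, a CSV file in Cloud Storage can also be loaded into BigQuery directly with the `google-cloud-bigquery` client library. This is an alternative sketch rather than the Mage exporter used in the project; the project, dataset, and table names are hypothetical, and schema autodetection is an assumption made to keep the example short.

```python
def table_id(project, dataset, table):
    """Build the fully qualified BigQuery table ID: project.dataset.table."""
    return f"{project}.{dataset}.{table}"

def load_csv_from_gcs(uri, destination):
    """Load a CSV from Cloud Storage into a BigQuery table; return its row count."""
    # Imported here so the module loads even without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes Application Default Credentials
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # the first row is the header
        autodetect=True,      # let BigQuery infer column types
    )
    load_job = client.load_table_from_uri(uri, destination, job_config=job_config)
    load_job.result()  # block until the load job finishes
    return client.get_table(destination).num_rows

# Hypothetical usage (requires credentials and network access):
# load_csv_from_gcs("gs://uber-demo-bucket/raw/uber_data.csv",
#                   table_id("my-project", "uber_data", "fact_table"))
```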
Mage Pipeline
Data Loaded Into BigQuery
Query To get The Table for Analysis
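The kind of query that produces an analysis table typically joins the fact table to its dimensions and aggregates the measures. The sketch below shows the idea with Python's built-in `sqlite3` as a local stand-in for BigQuery; the table names, columns, and rows are hypothetical, and BigQuery's SQL dialect differs in details such as table references and data types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datetime_dim (datetime_id INTEGER PRIMARY KEY, pickup_hour INTEGER);
CREATE TABLE fact_table (
    trip_id INTEGER PRIMARY KEY,
    datetime_id INTEGER,
    zone_id INTEGER,
    fare REAL
);
INSERT INTO datetime_dim VALUES (0, 0), (1, 9);
INSERT INTO fact_table VALUES (0, 0, 161, 12.5), (1, 1, 237, 8.0), (2, 1, 161, 20.0);
""")

# Join the fact table to the datetime dimension, then aggregate per hour and zone.
ANALYSIS_QUERY = """
SELECT d.pickup_hour, f.zone_id, COUNT(*) AS trips, AVG(f.fare) AS avg_fare
FROM fact_table AS f
JOIN datetime_dim AS d ON f.datetime_id = d.datetime_id
GROUP BY d.pickup_hour, f.zone_id
ORDER BY d.pickup_hour, f.zone_id
"""

rows = conn.execute(ANALYSIS_QUERY).fetchall()
for pickup_hour, zone_id, trip_count, avg_fare in rows:
    print(pickup_hour, zone_id, trip_count, avg_fare)
```

A flattened result like this, with one row per hour and zone, is the shape a dashboard tool can chart directly.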
Final Table
Final Dashboard
References
• Reza, R. (2014) Uber’s Big Data Platform: 100+ Petabytes with Minute Latency. Available at: https://www.uber.com/en-IN/blog/uber-big-data-platform/ (Accessed: 2023).
• Ajinkya Ingle, Anjali Kante, Shriya Samak, Anita Kumari. 2005. Sentiment Analysis of Twitter Data Using Big Data Tools. http://www.pnrsolution.org/. [Online]. May 2016. http://www.pnrsolution.org/Datacenter/Vol3/Issue6/18.pdf
• H. Li, X. Cheng, and J. Liu, “Understanding video sharing propagation in social networks: Measurement and analysis,” ACM Trans. Multimed. Comput. Commun. Appl. TOMM, vol. 10, no. 4, p. 33, 2014
