The game reviews db – from the dbpublic in
Projects Data
PBI Summary
Overall game analysis: it's a booming
industry!
Average growth rate (AGR):
nb_games = 185%
total_games_sold = 115%
Nintendo seems to be the leader, but
competition is growing fast
PBI Summary
Quantity vs Quality :
-Critics have been more generous since
the 2010s
-Is there a correlation between best-selling
games and the best critic scores?
I built a comparison method based on
ranks, and the answer is: no!
I also asked ChatGPT to evaluate:
-Pearson correlation = 0.072 // 0.062
-Spearman correlation = 0.17 // 0.114
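The two coefficients quoted above can be recomputed without a chatbot. The following is a minimal sketch in plain Python, assuming two equal-length lists of sales and critic scores. The toy numbers are hypothetical and are not taken from the dataset. Spearman is computed here as Pearson applied to average ranks, which is the usual tie-aware definition.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data, only to show the calls
games_sold = [82.9, 40.2, 33.1, 8.7, 4.0]
critic_score = [7.7, 10.0, 8.5, 9.1, 6.0]
print(pearson(games_sold, critic_score), spearman(games_sold, critic_score))
```

Values close to 0, like the ones on the slide, point to a weak linear (Pearson) and weak monotonic (Spearman) relationship.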
PBI Summary
And the winner is Nintendo!
In this dataset its story starts in 1985
with Super Mario Bros., the 2nd best-selling
game of all time! The all-time best-selling
game is Wii Sports (2006), and the
best-selling console is the Wii!
But they made a mistake when they
betrayed Sony in the early '90s
because…
PBI Summary
…trends are looking good for PlayStation (PS):
more devs, more (exclusive) games,
more revenue!
And who knows, maybe they will have
exclusive games that sell better than
Nintendo games ;)
The Data
Data available on DataCamp for
students to practice their skills
How many games in the game reviews db:
SELECT COUNT(DISTINCT name) AS nb_games
FROM public.game_sales;
Null values case + left join:
SELECT
SUM(CASE WHEN gs.name IS NULL THEN 1 ELSE 0 END) AS gs_name_null_values,
SUM(CASE WHEN r.critic_score IS NULL THEN 1 ELSE 0 END) AS critic_score_null_values,
SUM(CASE WHEN r.user_score IS NULL THEN 1 ELSE 0 END) AS user_score_null_values
FROM public.game_sales AS gs
LEFT JOIN public.reviews AS r ON gs.name = r.name;
Join to export raw data for an Excel analysis:
-- the WHERE filter on r.critic_score drops unmatched rows, so this LEFT JOIN behaves like an inner join
SELECT gs.name, gs.platform, gs.publisher, gs.year, gs.games_sold, r.critic_score
FROM public.game_sales AS gs
LEFT JOIN public.reviews AS r
ON gs.name = r.name
WHERE r.critic_score IS NOT NULL;
-nb_games_gs: 400 games
-nb_games_r: 400 games
-Null values in gs.name: 0
-Null values in cs: 1 / after left join (LJ): 31
-Null values in us: 212 / after LJ: 222
-The join adds 30 null values that we
can't fill => only use cs!
-Many redundant values in name
(same company but different names)
-Join to export a CSV, or export both
tables to work in PBI (join col = name)
We can work with 369 rows!
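A small reproduction can help explain why the null count grows after the join. The sketch below uses SQLite in memory as a stand-in for the PostgreSQL database, with three hypothetical games. It assumes the extra nulls come from game_sales names that have no matching row in reviews, which the slides do not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game_sales (name TEXT, games_sold REAL);
CREATE TABLE reviews (name TEXT, critic_score REAL, user_score REAL);
-- Hypothetical rows: 'Game C' has no review at all
INSERT INTO game_sales VALUES ('Game A', 10.0), ('Game B', 5.0), ('Game C', 4.0);
INSERT INTO reviews VALUES ('Game A', 9.0, 8.0), ('Game B', NULL, NULL);
""")

# Nulls stored in the reviews table itself
null_in_reviews = conn.execute(
    "SELECT COUNT(*) FROM reviews WHERE critic_score IS NULL"
).fetchone()[0]

# Nulls seen after the LEFT JOIN: stored nulls plus unmatched game_sales rows
null_after_join = conn.execute("""
    SELECT SUM(CASE WHEN r.critic_score IS NULL THEN 1 ELSE 0 END)
    FROM game_sales AS gs
    LEFT JOIN reviews AS r ON gs.name = r.name
""").fetchone()[0]

# Rows that remain usable for a sales vs critic score analysis
usable_rows = conn.execute("""
    SELECT COUNT(*)
    FROM game_sales AS gs
    JOIN reviews AS r ON gs.name = r.name
    WHERE r.critic_score IS NOT NULL
""").fetchone()[0]

print(null_in_reviews, null_after_join, usable_rows)
```

In this toy case the join turns one stored null into two, because the unmatched game contributes a null of its own.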
Exploring
game_sales &
reviews
ANALYZING
game_sales &
reviews
game_sales basic stats:
SELECT COUNT(DISTINCT name) AS nb_games, SUM(games_sold) AS tot_sold,
MAX(games_sold) AS max_sold, MIN(games_sold) AS min_sold,
AVG(games_sold) AS avg_sold, STDDEV(games_sold) AS std_sold
FROM public.game_sales;
game_sales period:
SELECT MIN(year) AS first_year, MAX(year) AS last_year
FROM public.game_sales;
Reviews basic stats:
SELECT AVG(critic_score) AS avg_cs, MAX(critic_score) AS max_cs,
MIN(critic_score) AS min_cs, STDDEV(critic_score) AS std_cs
FROM public.reviews
WHERE critic_score IS NOT NULL;
-gs overview =>
Between 1981 and 2020, 400 games
generated 3,478.55M$; range 3.98 to
82.90M$, avg 8.70M$ (big gaps)
-r overview =>
Avg cs is 8.57, ranging from 2 to 10: are
critics too generous?
Avg us is 7.72, ranging from 1.10 to 10: still
pretty high (but 212 null values!)
SALES Analysis in
game_sales
-Best year: 2010, with 219.30M$ total sales
-Worst year: 1981, with 4.31M$ total sales
-Year with the most games made ≠ year
with the highest sales
-Most games: 2011, with 26
-Fewest games: 1981, with 1
-Avg sales by year => 94.01M$ (code in
notes)
-8 of the 10 best-selling games were made by
Nintendo!
Best and Worst selling games years:
WITH yearly_sales AS (
SELECT year, SUM(games_sold) AS total_sales
FROM public.game_sales
GROUP BY year),
max_min_sales AS (
SELECT year, total_sales, 'max' AS sales_type
FROM yearly_sales
WHERE total_sales = (SELECT MAX(total_sales) FROM yearly_sales)
UNION ALL
SELECT year, total_sales, 'min' AS sales_type
FROM yearly_sales
WHERE total_sales = (SELECT MIN(total_sales) FROM yearly_sales)
)
SELECT * FROM max_min_sales;
How many games per year? LIMIT 5:
SELECT year, COUNT(name) AS nb_games, SUM(games_sold) AS tot_sold
FROM public.game_sales
GROUP BY year
ORDER BY nb_games DESC
LIMIT 5;
Games that sold more than 3x the average games_sold:
SELECT year, name, publisher, SUM(games_sold) AS tot_sold
FROM public.game_sales
-- publisher added to GROUP BY so every selected column is grouped or aggregated
GROUP BY year, name, publisher, games_sold
HAVING games_sold > (SELECT AVG(games_sold) * 3 FROM public.game_sales)
ORDER BY tot_sold DESC;
SALES Analysis in
game_sales by
year
-2011 saw 26 games, about 15 more
than the avg/year (10.81)
-Wii Sports (2006) is the best-selling
game ever, with 82.90M$, which
is 74M$ more than the avg
-Worst-selling game: Namco Museum,
with 3.98M$ in 2005
Nb of published games by year vs the yearly avg
-- Step 1: count the number of games per year
WITH games_per_year AS (
SELECT year, COUNT(*) AS nb_games
FROM public.game_sales
GROUP BY year
)
-- Final query: difference from the average number of games per year
-- (the numeric cast avoids integer division, which would truncate 10.81 to 10)
SELECT gpy.year, gpy.nb_games,
gpy.nb_games - (SELECT COUNT(*)::numeric / COUNT(DISTINCT year) FROM
public.game_sales) AS diff_nb_games_avg
FROM games_per_year gpy
ORDER BY nb_games DESC;
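An average written as COUNT(*) / COUNT(DISTINCT year) is worth checking for integer division, since both counts are integers. The sketch below shows the effect with SQLite in memory as a stand-in for PostgreSQL (both truncate integer division). The five rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game_sales (name TEXT, year INTEGER);
-- Hypothetical rows: 5 games over 2 years, so the true average is 2.5
INSERT INTO game_sales VALUES
  ('Game A', 2000), ('Game B', 2000), ('Game C', 2000),
  ('Game D', 2001), ('Game E', 2001);
""")

# Integer / integer: the fractional part is dropped
truncated_avg = conn.execute(
    "SELECT COUNT(*) / COUNT(DISTINCT year) FROM game_sales"
).fetchone()[0]

# Casting one operand to a non-integer type keeps the fraction
exact_avg = conn.execute(
    "SELECT CAST(COUNT(*) AS REAL) / COUNT(DISTINCT year) FROM game_sales"
).fetchone()[0]

print(truncated_avg, exact_avg)
```

With 400 games over 37 distinct years, the same truncation would give 10 instead of 10.81, which changes every difference computed from it.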
Game sales compared to avg
SELECT year, name, SUM(games_sold) AS tot_sold,
SUM(games_sold) - AVG(games_sold) OVER () AS diff_tot_avg_sales
FROM public.game_sales
GROUP BY year, name, games_sold
ORDER BY diff_tot_avg_sales DESC;
Best and Worst selling games
SELECT name, games_sold, publisher, year
FROM public.game_sales
WHERE games_sold = (SELECT MAX(games_sold) FROM public.game_sales) OR
games_sold = (SELECT MIN(games_sold) FROM public.game_sales)
SALES Analysis in
game_sales by
platform &
console
-4 of the 5 best-selling platform/year
combinations are Wii, and the other one is NES
(both Nintendo)
-In 2006, with 6 games, Nintendo sold
151.78M$
-In 2014, PlayStation (PS) reached the
top 5 with 84.51M$ from 12
games
-PS tends to release more games: 7 of
the top 10, and 151 games in total
since 1996 => avg of 6.3 per year (vs
4.02 for Nintendo)
Nb of games, and sales by platform over the years
SELECT year, platform, COUNT(name) AS nb_games, SUM(games_sold) AS tot_sold
FROM public.game_sales
GROUP by public.game_sales.platform, year
ORDER BY tot_sold DESC, year DESC
LIMIT 5;
Console sales and nb of published games
SELECT CASE
WHEN platform LIKE 'PS%' THEN 'Playstation'
WHEN platform LIKE 'X%' THEN 'Xbox'
WHEN platform = 'PC' THEN 'PC'
WHEN platform IN ('GEN','2600') THEN 'Other consoles'
ELSE 'NINTENDO'
END AS p_category, COUNT(name) AS nb_games, SUM(games_sold) AS tot_sold
FROM public.game_sales
GROUP BY p_category
ORDER BY tot_sold DESC, nb_games DESC;
SALES Analysis in
game_sales by
decade
-AVG growth rate :
nb_games = 185%
total_games_sold = 115%
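The slides do not say how the average growth rate was computed. One common reading, assumed here, is the mean of the percentage changes from one decade to the next. The sketch below uses hypothetical per-decade counts, not the real figures.

```python
def average_growth_rate(values):
    """Mean of the period-over-period percentage changes."""
    rates = [
        (current - previous) / previous * 100
        for previous, current in zip(values, values[1:])
    ]
    return sum(rates) / len(rates)


# Hypothetical number of games for four consecutive decades
nb_games_by_decade = [10, 40, 100, 150]

# Changes are +300%, +150% and +50%, so the average is about 166.7%
print(round(average_growth_rate(nb_games_by_decade), 1))
```

An average above 100% means the metric more than doubled per period on average, which is how figures such as 185% and 115% would be read under this assumption.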
Nb of games, sales and critic score by decade
SELECT CASE
WHEN gs.year BETWEEN 1980 AND 1989 THEN '1980s'
WHEN gs.year BETWEEN 1990 AND 1999 THEN '1990s'
WHEN gs.year BETWEEN 2000 AND 2009 THEN '2000s'
WHEN gs.year BETWEEN 2010 AND 2019 THEN '2010s'
END AS decade,
COUNT(gs.name) AS nb_games, SUM(gs.games_sold) AS total_games_sold,
AVG(r.critic_score) AS avg_critic_score
FROM public.game_sales AS gs
LEFT JOIN public.reviews AS r ON gs.name = r.name
WHERE r.critic_score IS NOT NULL
-- years outside 1980-2019 would get a NULL decade, so they are filtered out
AND gs.year BETWEEN 1980 AND 2019
GROUP BY decade
ORDER BY decade;
Quality Analysis
in reviews
-10/10 games for the critics: Super
Mario, Minecraft, GTA V and IV, Mario
Kart (SNES), Zelda - Link to the Past (3
Nintendo!)
-10/10 games for users: Zelda - BOTW,
God of War, FF X, Zelda - Ocarina, RDR
(2 Nintendo & 2 PS)
Comparing critics vs users max and mins
WITH max_cte AS (
SELECT name,critic_score, user_score
FROM public.reviews
WHERE critic_score = (SELECT MAX(critic_score) FROM reviews) OR user_score =
(SELECT MAX(user_score) FROM reviews)
),
min_cte AS (
SELECT name,critic_score, user_score
FROM public.reviews
WHERE critic_score = (SELECT MIN(critic_score) FROM reviews) OR user_score =
(SELECT MIN(user_score) FROM reviews )
)
SELECT name, critic_score, user_score
FROM max_cte
UNION ALL
SELECT name, critic_score, user_score
FROM min_cte
Quality Analysis
in reviews
-Xbox has the best avg critic score!
But with only 48 games
-Mojang is the best developer
by critic score, with 1 game rated 10/10
-Rockstar Games is the best publisher,
with 16 games and a 9.5 avg critic score
Critic score by console
SELECT CASE
WHEN gs.platform LIKE 'PS%' THEN 'Playstation'
WHEN gs.platform LIKE 'X%' THEN 'Xbox'
WHEN gs.platform = 'PC' THEN 'PC'
WHEN gs.platform IN ('GEN','2600') THEN 'Other consoles'
ELSE 'NINTENDO'
END AS p_category, COUNT(r.critic_score), AVG(r.critic_score) AS avg_cs
FROM public.game_sales AS gs
LEFT JOIN public.reviews AS r ON gs.name = r.name
GROUP BY p_category
ORDER BY avg_cs DESC;
Critic score avg and nb_games by publisher
SELECT gs.publisher, AVG(r.critic_score) AS avg_c_s, COUNT(gs.publisher) AS
nb_games, SUM (gs.games_sold) AS tot_sold
FROM public.game_sales AS gs
LEFT JOIN public.reviews AS r
ON gs.name=r.name
GROUP BY gs.publisher
HAVING AVG(r.critic_score) IS NOT NULL
ORDER BY avg_c_s DESC, nb_games DESC;
Quality Analysis
in reviews –
overall_score
Overall score (os) = cs + us
-> remember the many null values in us
-Top games: Ocarina / BOTW / GoW / RDR / Mario
Galaxy
-Nintendo, Sony, Rockstar and
Activision have the best os
Best games overall_score
SELECT gs.name, (r.critic_score + r.user_score) AS overall_score, gs.publisher
FROM public.reviews AS r
LEFT JOIN public.game_sales AS gs
ON r.name = gs.name
WHERE (r.critic_score + r.user_score) IS NOT NULL
ORDER BY overall_score DESC
LIMIT 5;
Publishers by overall_score
SELECT gs.publisher, (r.critic_score + r.user_score) AS overall_score, RANK()
OVER(ORDER BY (r.critic_score + r.user_score) DESC)
FROM public.game_sales AS gs
LEFT JOIN public.reviews As r
ON gs.name = r.name
GROUP BY gs.publisher, (r.critic_score + r.user_score)
HAVING (r.critic_score + r.user_score) IS NOT NULL;
Is there a
correlation
between sales
and quality?
I ranked the games by critic score and by
sales, then compared the absolute difference
between the two ranks:
-diff of 10 or less: 'close'
-diff between 11 and 30: 'medium'
-diff of more than 30: 'huge'
Huge = 81%
Medium = 13%
Close = 6%
Nothing suggests that bigger sales
go with better critic scores in this dataset
Rank gap between sales and critic score
WITH gap_cte AS (
SELECT subquery.year, subquery.rank_sold, subquery.rank_cs,
CASE
WHEN ABS(subquery.rank_sold - subquery.rank_cs) <= 10 THEN 'close'
WHEN ABS(subquery.rank_sold - subquery.rank_cs) <= 30 THEN 'medium'
ELSE 'huge'
END AS gap_sold_cs
FROM (
SELECT gs.year,
RANK() OVER(ORDER BY gs.games_sold DESC) AS rank_sold,
RANK() OVER(ORDER BY r.critic_score DESC) AS rank_cs
FROM public.game_sales AS gs
LEFT JOIN public.reviews AS r ON gs.name = r.name
WHERE r.critic_score IS NOT NULL
) AS subquery
)
SELECT gap_sold_cs, COUNT(gap_sold_cs) AS nb_games
FROM gap_cte
GROUP BY gap_sold_cs;
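The rank-gap method can also be sketched outside SQL to make each step explicit. The version below is an illustration in plain Python, not the author's implementation. The helper names are hypothetical, and sql_rank is meant to mimic RANK() OVER (ORDER BY ... DESC), where ties share a rank and the next rank is skipped.

```python
from collections import Counter


def sql_rank(values):
    """Mimic RANK() OVER (ORDER BY value DESC): ties share the lowest position."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def gap_label(rank_sold, rank_cs):
    """Bucket the absolute rank difference with the same thresholds as the query."""
    gap = abs(rank_sold - rank_cs)
    if gap <= 10:
        return "close"
    if gap <= 30:
        return "medium"
    return "huge"


def gap_shares(games_sold, critic_scores):
    """Share of games falling in each bucket."""
    pairs = zip(sql_rank(games_sold), sql_rank(critic_scores))
    counts = Counter(gap_label(rank_sold, rank_cs) for rank_sold, rank_cs in pairs)
    total = len(games_sold)
    return {label: counts[label] / total for label in ("close", "medium", "huge")}


# Hypothetical toy data: the best seller has a mid-table critic score
print(gap_shares([82.9, 40.2, 33.1, 8.7], [7.7, 10.0, 8.5, 9.1]))
```

Because the thresholds are absolute rank differences, the shares depend on how many rows are ranked: with a few hundred games, a gap of 10 is a narrow band, so the result is best read together with a rank correlation such as Spearman's.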

SQL-PBI Portfolio - Game sales and critics DB


Editor's Notes

  • #1: --Data set (fulljoin = 431 rows) SELECT * FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name=r.name WHERE gs.name IS NOT NULL AND critic_score IS NOT NULL
  • #8: --What year max sales? What year min sales? WITH yearly_sales AS ( SELECT year, SUM(games_sold) AS total_sales FROM public.game_sales GROUP BY year ), max_sales AS ( SELECT year, total_sales FROM yearly_sales WHERE total_sales = (SELECT MAX(total_sales) FROM yearly_sales) ), min_sales AS ( SELECT year, total_sales FROM yearly_sales WHERE total_sales = (SELECT MIN(total_sales) FROM yearly_sales) ) SELECT * FROM max_sales UNION ALL SELECT * FROM min_sales;
  • #9: AVG sales by year: WITH tot_sold_cte AS ( SELECT year, SUM(games_sold) AS tot_sold FROM public.game_sales GROUP BY year ) SELECT AVG(tot_sold) FROM tot_sold_cte;
  • #11: --Platform sales over the years? SELECT year, CASE WHEN platform LIKE 'PS%' THEN 'Playstation' WHEN platform LIKE 'X%' THEN 'Xbox' WHEN platform = 'PC' THEN 'PC' WHEN platform IN ('GEN','2600') THEN 'Other consoles' ELSE 'NINTENDO' END AS p_category, COUNT(name) AS nb_games, SUM(games_sold) AS tot_sold FROM public.game_sales GROUP BY p_category,year ORDER BY tot_sold DESC, nb_games DESC;
  • #13: --BEST and WORST games WITH max_critic AS ( SELECT name, critic_score FROM reviews WHERE critic_score = (SELECT MAX(critic_score) FROM reviews) LIMIT 1 ), min_critic AS ( SELECT name, critic_score FROM reviews WHERE critic_score = (SELECT MIN(critic_score) FROM reviews) LIMIT 1 ), max_user AS ( SELECT name, user_score FROM reviews WHERE user_score = (SELECT MAX(user_score) FROM reviews) LIMIT 1 ), min_user AS ( SELECT name, user_score FROM reviews WHERE user_score = (SELECT MIN(user_score) FROM reviews) LIMIT 1 ) SELECT (SELECT name FROM max_critic) AS game_with_max_critic_score, (SELECT critic_score FROM max_critic) AS max_critic_score, (SELECT name FROM min_critic) AS game_with_min_critic_score, (SELECT critic_score FROM min_critic) AS min_critic_score, (SELECT name FROM max_user) AS game_with_max_user_score, (SELECT user_score FROM max_user) AS max_user_score, (SELECT name FROM min_user) AS game_with_min_user_score, (SELECT user_score FROM min_user) AS min_user_score;
  • #14: -- avg cs by developer // FILTER null val in scores SELECT gs.developer, AVG(r.critic_score) AS avg_c_s, SUM(gs.games_sold) AS tot_sold FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name=r.name GROUP BY gs.developer HAVING AVG(r.critic_score) IS NOT NULL ORDER BY avg_c_s DESC, tot_sold DESC --cs for nintendo games SELECT gs.name, r.critic_score FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name WHERE gs.publisher = 'Nintendo' AND r.critic_score IS NOT NULL ORDER BY r.critic_score DESC;
  • #15: WITH j_cte AS ( SELECT gs.publisher, r.critic_score, r.user_score FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name ) SELECT publisher, AVG(critic_score + user_score) AS overall_score, RANK() OVER (ORDER BY AVG(critic_score + user_score) DESC) AS rank FROM j_cte GROUP BY j_cte.publisher HAVING AVG(critic_score + user_score) IS NOT NULL ORDER BY overall_score DESC; -- Are years with biggest sales also years with os review success CTE WITH max_sold AS ( SELECT name, games_sold, year FROM public.game_sales WHERE games_sold = (SELECT MAX(games_sold) FROM public.game_sales) ), max_cs AS ( SELECT r.name, r.critic_score, gs.year FROM public.reviews AS r LEFT JOIN public.game_sales AS gs ON r.name = gs.name WHERE r.critic_score = (SELECT MAX(critic_score) FROM public.reviews) ) SELECT max_sold.name AS name_max_sold, max_sold.year AS year_max_sold, max_sold.games_sold AS max_sold, max_cs.name AS name_max_cs, max_cs.year AS year_max_cs, max_cs.critic_score AS max_cs FROM max_sold, max_cs;
  • #16: STEP 1 was (first attempt; it fails because rank_sold and rank_cs cannot be referenced in the same SELECT that defines them) --you want a col year, rank sales, rank cs, IF ABS(rank sales - rank cs) is 10 or less then 'close', 30 or less 'medium', else 'huge' SELECT year, RANK() OVER(ORDER BY games_sold DESC) AS rank_sold, RANK() OVER(ORDER BY critic_score DESC) AS rank_cs, CASE WHEN ABS(rank_sold - rank_cs) <= 10 THEN 'close' WHEN ABS(rank_sold - rank_cs) <= 30 THEN 'medium' ELSE 'huge' END AS gap_sold_cs FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name WHERE r.critic_score IS NOT NULL STEP 2 was -- Compare rank of sales and critic scores to check for correlation SELECT subquery.year, subquery.rank_sold, subquery.rank_cs, CASE WHEN ABS(subquery.rank_sold - subquery.rank_cs) <= 10 THEN 'close' WHEN ABS(subquery.rank_sold - subquery.rank_cs) <= 30 THEN 'medium' ELSE 'huge' END AS gap_sold_cs FROM ( SELECT gs.year, RANK() OVER(ORDER BY gs.games_sold DESC) AS rank_sold, RANK() OVER(ORDER BY r.critic_score DESC) AS rank_cs FROM public.game_sales AS gs LEFT JOIN public.reviews AS r ON gs.name = r.name WHERE r.critic_score IS NOT NULL ) AS subquery;