DATA-CON BOSTON 2014 
TANYA CASHORALI 
@TANYACASH21 
This presentation and the information contained herein is confidential. By your acceptance and review of this presentation, each recipient agrees that it will not, 
and shall assure that its employees, agents, representatives and advisors will not copy, reproduce or distribute this presentation, in whole or in part, without the 
prior written consent of Comniscient Technologies LLC, and that it will keep confidential all information contained herein which is not already in the public 
domain. Further, the recipient will use the presentation only to obtain background information on the Company and its activities.
2 THE TELECOMMUNICATIONS 
MARKET IS FULLY SATURATED 
67% of new subscriber growth comes from switchers.* 
Last year, switchers put $29B up for grabs in the 
wireless industry. 
WINNING SWITCHERS IS KEY FOR 
GROWTH 
*Percentage of net adds that are not activations, tablets or wholesale 
Real Time Market Data & Analysis for the Telecom Industry
3 CARRIER VISION – THE WHAT, 
WHERE, WHEN 
4 VOICE OF THE CUSTOMER – 
THE WHY 
5 OBSERVED DATA ON TWITTER 
6 STARTED GATHERING DATA USING 
STREAMR 
7 HUMAN SCORED DATA 
We “S-scored” about 150-500 tweets per day until we had ~4,000 human-scored tweets.
We used this data set to learn how to systematically crowdsource the same process and to
automate it using machine scoring (“M-scoring”) in R.
• Training set of 4,000 tweets
• Crowdsourced ~80,000 tweets
• Derived rules
• M-scoring rules in R
8 CLASSIFY CARRIERS 
9 NLP AND SENTIMENT ANALYSIS IS 
HARD 
10 BASIC PATTERN MATCHING IS 
COMPLICATED ENOUGH 
• Phone Price 
• Customer Service 
• Coverage Quality 
• Upgrade Plan 
• Coverage Availability 
• Family Plan 
• Plan Price 
• Service Promo 
• Outage 
• Data Plan 
• Phone Availability 
• Device Promo 
• Switching to / from 
iPhone + expensive 
Samsung Galaxy + money 
Phone + cost 
Data, unlimited 
Switch from Verizon to ATT
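The keyword pairs above (for example "iPhone + expensive") read like co-occurrence rules: a category fires when all of its terms appear in the tweet. The talk's rules were written in R and are not shown, so the following is only a minimal sketch in Python. The rule table `RULES`, the function name `score_categories`, and the mapping of each keyword pair to a category are assumptions made for illustration.

```python
import re

# Hypothetical rule table. The terms come from the slide's examples;
# which category each pair belongs to is an assumption.
RULES = {
    "Phone Price": [
        ["iphone", "expensive"],
        ["samsung galaxy", "money"],
        ["phone", "cost"],
    ],
    "Data Plan": [
        ["data", "unlimited"],
    ],
}


def score_categories(tweet):
    """Return the categories whose rules match the tweet text."""
    text = tweet.lower()
    matched = []
    for category, rules in RULES.items():
        for terms in rules:
            # A rule matches only if every term appears as a whole word/phrase.
            if all(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms):
                matched.append(category)
                break  # one matching rule is enough for this category
    return matched
```

Whole-word matching keeps "phone" from firing inside "iPhone", which is one small example of why even basic pattern matching needs care.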
12 M-SCORING EXAMPLES 
[Annotated example tweet; recoverable labels: Phone, iPhone 6, Switch to T-Mobile, Mention.]
Patterns:
• Switch from [carrier1] to [carrier2]
• Switch from [carrier1]
• Switch to [carrier1]
13 M-SCORING EXAMPLES 
[Annotated example tweet; recoverable labels: T-Mobile Mention, Phone, Switch from ATT, From ATT.]
Patterns:
• Switch from [carrier1] to [carrier2]
• Switch from [carrier1] because I love [carrier2]
Assume carrier2 is the ‘switch to’ carrier
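The switch patterns on these two slides can be sketched as a pair of regular expressions plus the stated fallback: once a "switch from" carrier is found, the next different carrier mentioned is assumed to be the "switch to" carrier. This is a Python illustration, not the R code used in the talk. The alias table `CARRIERS` and the function `score_switch` are hypothetical names, and the alias list is an assumption.

```python
import re

# Hypothetical alias table; the carrier list used in the talk is not shown.
CARRIERS = {
    "verizon": "Verizon",
    "at&t": "AT&T",
    "att": "AT&T",
    "t-mobile": "T-Mobile",
    "tmobile": "T-Mobile",
    "sprint": "Sprint",
}
# Longest aliases first so "at&t" is tried before "att".
_ALT = "|".join(re.escape(a) for a in sorted(CARRIERS, key=len, reverse=True))

FROM_RE = re.compile(r"\bswitch(?:ing|ed)?\s+from\s+(" + _ALT + r")(?!\w)", re.I)
TO_RE = re.compile(r"\bswitch(?:ing|ed)?\s+to\s+(" + _ALT + r")(?!\w)", re.I)
ANY_RE = re.compile(r"(?<!\w)(" + _ALT + r")(?!\w)", re.I)


def score_switch(tweet):
    """Return (switch_from, switch_to); either value may be None."""
    switch_from = switch_to = None
    m_from = FROM_RE.search(tweet)
    if m_from:
        switch_from = CARRIERS[m_from.group(1).lower()]
        # Rule from the slide: the next carrier mentioned after the
        # 'switch from' carrier is assumed to be the 'switch to' carrier.
        for m in ANY_RE.finditer(tweet, m_from.end()):
            candidate = CARRIERS[m.group(1).lower()]
            if candidate != switch_from:
                switch_to = candidate
                break
    else:
        m_to = TO_RE.search(tweet)
        if m_to:
            switch_to = CARRIERS[m_to.group(1).lower()]
    return switch_from, switch_to
```

The fallback is deliberately loose, so a tweet such as "switching from ATT, even Sprint is better" would also be scored as a switch to Sprint. That trade-off is inherent in the rule as stated on the slide.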
14 CATEGORY CLASSIFICATION 
15 SWITCHING TO/FROM 
16 ARE WE IMPROVING? 
Sensitivity = TP / (TP + FN) 
Specificity = TN / (TN + FP) 
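As an illustration of the two formulas, the sketch below compares machine scores against human scores for one category and computes both metrics. It is written in Python rather than the R used in the talk, and the function names and the boolean-list input format are assumptions.

```python
def confusion_counts(human, machine):
    """Count TP, FP, TN, FN, treating the human score as ground truth."""
    tp = fp = tn = fn = 0
    for h, m in zip(human, machine):
        if h and m:
            tp += 1
        elif h and not m:
            fn += 1
        elif not h and m:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """TP / (TP + FN): share of true positives the machine caught."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn, fp):
    """TN / (TN + FP): share of true negatives the machine left alone."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Hypothetical scores for eight tweets in one category.
human = [True, True, True, False, False, False, False, True]
machine = [True, True, False, False, False, True, False, True]
tp, fp, tn, fn = confusion_counts(human, machine)
```

Tracking both numbers per category over time is one way to answer the slide's question, since a rule change that raises sensitivity can quietly lower specificity.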
17 MATURING INTO A REAL PRODUCT 
18 GNIP PROCESS 
Table       Daily Average        Total
Wireless              655      111,483
Wireline              152       25,264
Other              11,803    2,006,649
19 STREAMR VS GNIP VS DATASIFT
Feature                    StreamR   GNIP           DataSift
Full firehose data         -         X              X
Historical Twitter data    -         X              X
Real-time                  X         X              X
Data Destinations          -         -              X
Data Buffer                -         5 days ($$$)   2 hours (included)
Demographics               -         -              X
Sentiment                  -         -              X
Gender Detection           -         -              X
Topic Detection            -         -              X
Entity Extraction          -         -              X
Link Analysis              -         -              X
20 DEPLOYMENT ARCHITECTURE
[Architecture diagram for Comlinkdata; the labels recoverable from the figure are listed below.]
Data sources: GNIP, Kantar Events
Hosting: Amazon EC2, Amazon RDS
Technology stack: R, PostgreSQL, Shiny
Server: server@ubuntu
R scripts:
• /home/ubuntu/Documents/Scripts/TwitterScrapes/GNIP/GNIP Extract.R
• /home/ubuntu/Documents/Scripts/TwitterScrapes/voc-dataops/GNIP/Json_Interpreter.R
• /home/ubuntu/Documents/Scripts/TwitterScrapes/voc-dataops/M-Scoring/simple_MScore_GNIP.R
• /home/ubuntu/Documents/Scripts/TwitterScrapes/voc-dataops/Aggregates/af_tweets.R
Data directory: /home/ubuntu/GNIP/data/
Tables: gnip_wireless_raw, gnip_landline_raw, gnip_other_raw, tweet_mwl, tweet_mll, a_tweets, f_tweets
Application and web servers: Tomcat
Web service (REST): Java 6, Spring (Security, MVC, JDBC)
GUI: Angular JS/Ajax, HTML5, CSS3, D3, Bootstrap
Browsers: Chrome, Safari, Firefox, IE
Other labels: JSON, HTTPS, Internet, MG
21 INITIAL PROTOTYPE IN TABLEAU 
22 WHAT WE LEARNED 
• Always store raw unprocessed data 
somewhere 
• Beware of UTF encodings and special 
characters 
• Ensure time zones are synched across 
databases / applications 
• Don’t be afraid to cast a larger net of tweets 
given the ~1M tweet/month limit provided by 
most vendors 
• Consider how to deal with blast tweets and
retweets. The tweet's source can be used to help
identify blasts (TweetCaster, Scoop.it, etc.)
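The last lesson can be sketched as a small filter that flags retweets and blast tweets instead of deleting them, which also respects the first lesson about keeping raw data. This is a Python illustration under stated assumptions: tweets are dictionaries with hypothetical `text` and `source` fields, retweets are recognized by a leading "RT @", and the blast source list contains only the two tools named on the slide.

```python
# Source names taken from the slide; a real list would likely be longer.
BLAST_SOURCES = ("tweetcaster", "scoop.it")


def is_retweet(tweet):
    """Assumption: classic retweets start with 'RT @'."""
    return tweet.get("text", "").lstrip().startswith("RT @")


def is_blast(tweet):
    """Flag tweets whose source field names a known blast tool."""
    source = tweet.get("source", "").lower()
    return any(name in source for name in BLAST_SOURCES)


def split_tweets(tweets):
    """Return (kept, flagged) without discarding anything."""
    kept, flagged = [], []
    for tweet in tweets:
        if is_retweet(tweet) or is_blast(tweet):
            flagged.append(tweet)
        else:
            kept.append(tweet)
    return kept, flagged


# Hypothetical sample records.
sample = [
    {"text": "Bye Sprint!", "source": "Twitter for iPhone"},
    {"text": "RT @someone: great deal", "source": "Twitter Web Client"},
    {"text": "Top stories", "source": '<a href="http://www.scoop.it">Scoop.it</a>'},
]
kept, flagged = split_tweets(sample)
```

Substring matching on the source is used because the field often arrives wrapped in HTML, as in the third sample record.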
23 ENTIRE PRODUCT LIFECYCLE 
24 CONSIDERATIONS 
Many variations of changing carriers: 
• Bye Sprint! 
• Getting rid of Verizon 
• Peace out T-Mobile 
• Going to AT&T 
• Twitter data is not necessarily representative of the entire population 
• Other languages 
• Geo-tagged is only ~5% 
• Expanding to Canada 
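The phrasing variations listed above can be handled with a small phrase lexicon that maps each expression to a direction (leaving or joining a carrier). The sketch below is in Python rather than the talk's R. The lexicon holds only the four phrases from the slide, and the names `FROM_PHRASES`, `TO_PHRASES`, and `classify_variation` are hypothetical.

```python
import re

# Carrier alternation is an assumption; "at&t" precedes "att" on purpose.
CARRIER = r"(verizon|at&t|att|t-mobile|sprint)"

# Phrases taken from the slide, grouped by the direction they imply.
FROM_PHRASES = ["bye", "getting rid of", "peace out"]
TO_PHRASES = ["going to"]


def _compile(phrases):
    """Build one regex matching any phrase followed by a carrier name."""
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternation + r")\s+" + CARRIER + r"(?!\w)", re.I)


FROM_RE = _compile(FROM_PHRASES)
TO_RE = _compile(TO_PHRASES)


def classify_variation(tweet):
    """Return ('from' | 'to', carrier as written) or None if nothing matches."""
    match = FROM_RE.search(tweet)
    if match:
        return "from", match.group(1)
    match = TO_RE.search(tweet)
    if match:
        return "to", match.group(1)
    return None
```

A lexicon like this is easy to extend but prone to false positives: "going to Sprint store to complain" would be scored as a switch to Sprint, which is part of why the slide lists these variations as a consideration.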
25 FUTURE WORK 
• Migrate to DataSift from GNIP 
• Improve M-scoring using more complex NLP algorithms 
• Integrate additional data sources including downdetector.com, ad spend, and other 
relevant sources 
• Integrate additional Twitter data (mentions of carriers, tweets from the
carriers, and varied language around switching)
• Develop landline version and Canada version 
• Trending words / Keyword search 
• ??? 
26 ACKNOWLEDGEMENTS 
Jacob Tobias, Software Developer
Sarah Bolt, Marketing Manager
Dylan Doyle, Data Scientist
Mallorie Ekstrom, Graphic Designer
Alan Tam, Web Developer
Josh Walker, COO
Ken Yeoh, Data Engineer
For more information, hiring, or questions, email contact@comlinkdata.com
27 REFERENCES 
• streamR - http://cran.r-project.org/web/packages/streamR/index.html
• sqldf - http://cran.r-project.org/web/packages/sqldf/index.html
• GNIP - http://gnip.com/
• DataSift - http://datasift.com/

Editor's Notes

  • #18: Topsy for historical data, acquired by Apple
  • #19: Rules in JSON format? Rules are put on GNIP servers. GNIP uses an HTTPS stream to transfer JSON-formatted data to the customer. The useful data must be extracted from the JSON in the stream. No buffer is available, so the customer must always be connected to the stream; adding a 5-day buffer costs money. Historical jobs are prohibitively expensive. Lots of issues; migrate to DataSift. 2.1M tweets total.
  • #20: Priced up to 1 million tweets per month (GNIP = $1560/month)
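The processing step described in note #19 (extracting useful data from a stream of JSON records) can be sketched as a generator over lines of text. This is a Python illustration, not the deck's `Json_Interpreter.R`. It assumes one JSON object per line, and the field names `id`, `body`, and `postedTime` are assumptions about the record layout. No network connection is made; the input is any iterable of strings.

```python
import json


def extract_records(lines):
    """Yield a small dict per valid JSON line, keeping the raw text too."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines, which streams may send as keep-alives
        try:
            activity = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed or truncated records
        if not isinstance(activity, dict):
            continue
        yield {
            "id": activity.get("id"),                  # assumed field name
            "text": activity.get("body"),              # assumed field name
            "posted_time": activity.get("postedTime"), # assumed field name
            "raw": raw,  # keep the unprocessed record alongside the parsed one
        }


# Hypothetical sample input standing in for the live stream.
sample_lines = [
    '{"id": "1", "body": "Bye Sprint!", "postedTime": "2014-09-01T12:00:00.000Z"}',
    "",
    "not json",
    '{"id": "2", "body": "Going to AT&T"}',
]
records = list(extract_records(sample_lines))
```

Because the note says no buffer is available unless paid for, a production reader would also need reconnect logic; that part is left out of this sketch.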