Call Data Analysis
for Asterisk & FreeSWITCH
      with MongoDB

     Arezqui Belaid @areskib
     <info@star2billing.com>
Problems to solve

- Millions of call records
- Multiple sources
- Multiple data formats
- Replication
- Fast analytics
- Multi-tenant
- Realtime
- Fraud detection
Why MongoDB
- NoSQL - Schema-Less
- Capacity / Sharding
- Upserts
- Replication : Increase read capacity
- Async writes : Millions of entries / acceptable losses
- Compared to CouchDB - native drivers
What does it look like?   Dashboard
Hourly / Daily / Monthly reporting
Compare call traffic
World Map
Realtime
Under the hood
- FreeSWITCH (freeswitch.org)
- Asterisk (asterisk.org)
- Django (djangoproject.com)
- Celery (celeryproject.org)
- RabbitMQ (rabbitmq.com)
- Socket.IO (socket.io)
- MongoDB (mongodb.org)
- PyMongo (api.mongodb.org)
- and more...
Our Data - Call Detail Record (CDR)
1) Call info :



2) BSON :
   CDR = {
     ...
     'callflow':{
       'caller_profile':{
         'username':'1000',
         'destination_number':'5578193435',
         'ani':'71737224',
         'caller_id_name':'71737224',
         ...
       },
       ...
     },
     'variables':{
       'mduration':'12960',
       'effective_caller_id_name':'Extension 1000',
       'outbound_caller_id_number':'0000000000',
       'duration':'3',
       'end_stamp':'2012-05-23 15:45:12.856527',
       'answer_uepoch':'1327521953952257',
       'billmsec':'12960',
       ...
       'hangup_cause_q850':'20',
       'hangup_cause':'NORMAL_CLEARING',
       'sip_received_ip':'192.168.1.21',
       'sip_from_host':'127.0.0.1',
       'tts_voice':'kal',
       'accountcode':'1000',
       'sip_user_agent':'Blink 0.2.8 (Linux)',
       'answerusec':'0',
       'caller_id':'71737224',
       'call_uuid':'adee0934-a51b-11e1-a18c-00231470a30c',
       'answer_stamp':'2012-05-23 15:45:09.856463',
       'outbound_caller_id_name':'FreeSWITCH',
       'billsec':'66',
       'progress_uepoch':'0',
       'answermsec':'0',
       'sip_via_rport':'60536',
       'uduration':'12959984',
       'sip_local_sdp_str':'v=0\no=FreeSWITCH 1327491731\n'
     },
     ...
   }
3) Insert Mongo : db.cdr.insert(CDR);
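The CDR above stores every value as a string, so numeric and date fields need converting before they can be summed or bucketed. A minimal sketch of that step, assuming the nested layout shown on this slide; `extract_call_info` and `sample_cdr` are illustrative names, not part of CDR-Stats:

```python
from datetime import datetime


def extract_call_info(cdr):
    """Pull a few typed fields out of a raw CDR dict (hypothetical helper)."""
    variables = cdr["variables"]
    profile = cdr["callflow"]["caller_profile"]
    return {
        "accountcode": variables["accountcode"],
        "destination_number": profile["destination_number"],
        # The switch reports numbers as strings; convert them once here.
        "duration": int(variables["duration"]),
        "billsec": int(variables["billsec"]),
        "hangup_cause_q850": int(variables["hangup_cause_q850"]),
        "answer_stamp": datetime.strptime(
            variables["answer_stamp"], "%Y-%m-%d %H:%M:%S.%f"
        ),
    }


# Trimmed-down copy of the record shown on the slide.
sample_cdr = {
    "callflow": {"caller_profile": {"destination_number": "5578193435"}},
    "variables": {
        "accountcode": "1000",
        "duration": "3",
        "billsec": "66",
        "hangup_cause_q850": "20",
        "answer_stamp": "2012-05-23 15:45:09.856463",
    },
}

info = extract_call_info(sample_cdr)
print(info["duration"], info["answer_stamp"].date())
```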
Pre-Aggregate
Pre-Aggregate - Daily Collection
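Produce data that is easier to manipulate:
              from datetime import datetime

              # Daily bucket: keep only the date part of the call start time
              # (assumes str(start_uepoch) begins with "YYYY-MM-DD")
              current_y_m_d = datetime.strptime(str(start_uepoch)[:10], "%Y-%m-%d")
              # update_one replaces the legacy Collection.update(), removed in PyMongo 4
              CDR_DAILY.update_one({
                       'date_y_m_d': current_y_m_d,
                       'destination_number': destination_number,
                       'hangup_cause_id': hangup_cause_id,
                       'accountcode': accountcode,
                       'switch_id': switch.id,
                   }, {
                       # Increment the counters; upsert creates the document if missing
                       '$inc': {
                           'calls': 1,
                           'duration': int(cdr['variables']['duration']),
                       }
                   }, upsert=True)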

Output db.CDR_DAILY.find() :
{ "_id" : ..., "date_y_m_d" : ISODate("2012-04-30T00:00:00Z"), "accountcode" : "1000", "calls" : 1, "destination_number" : "0045277522", "duration" : 23, "hangup_cause_id" : 9, "switch_id" : 1 }
...

- Pre-aggregated data is faster to query
- Upsert is your friend: update if the document exists, insert if not
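The upsert-with-`$inc` behaviour can be sketched without a database. The snippet below is only an in-memory illustration of the semantics, using a plain dict in place of the CDR_DAILY collection; `upsert_inc` is a hypothetical helper, not a PyMongo call:

```python
from datetime import datetime


def upsert_inc(collection, key_fields, increments):
    """Mimic update(..., {'$inc': ...}, upsert=True) on a dict of documents."""
    key = tuple(sorted(key_fields.items()))
    # Insert a new document holding the key fields if none matches yet.
    doc = collection.setdefault(key, dict(key_fields))
    # $inc treats a missing counter as 0 and then adds the amount.
    for field, amount in increments.items():
        doc[field] = doc.get(field, 0) + amount
    return doc


cdr_daily = {}
bucket = {
    "date_y_m_d": datetime(2012, 4, 30),
    "destination_number": "0045277522",
    "hangup_cause_id": 9,
    "accountcode": "1000",
    "switch_id": 1,
}

upsert_inc(cdr_daily, bucket, {"calls": 1, "duration": 23})  # inserts
upsert_inc(cdr_daily, bucket, {"calls": 1, "duration": 7})   # updates in place
upsert_inc(cdr_daily, dict(bucket, switch_id=2), {"calls": 1, "duration": 5})

for doc in cdr_daily.values():
    print(doc["switch_id"], doc["calls"], doc["duration"])
```

Two calls that share the same day, destination, hangup cause, account and switch land in one document; a call on another switch opens a new one.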
Map-Reduce - Emit Step
- MapReduce processes data in batches
- Applied to the pre-aggregated daily collection (faster, less data)

             map = mark_safe(u'''
                 function(){
                      emit( {
                           a_Year: this.date_y_m_d.getFullYear(),
                           b_Month: this.date_y_m_d.getMonth() + 1,
                           c_Day: this.date_y_m_d.getDate(),
                           f_Switch: this.switch_id
                         },
                         {calldate__count: 1, duration__sum: this.duration} )
                 }''')
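For readers less familiar with the JavaScript form, the emit step can be mirrored in plain Python. This is a sketch of the same logic, not code from CDR-Stats; `map_daily` is an illustrative name. Note that JavaScript's `getMonth()` is zero-based, hence the `+ 1` above, while Python's `datetime.month` already runs from 1 to 12:

```python
from datetime import datetime


def map_daily(doc):
    """Return the (key, value) pair the map function would emit for one document."""
    day = doc["date_y_m_d"]
    key = (day.year, day.month, day.day, doc["switch_id"])
    # As in the map function above, each daily document counts as 1.
    value = {"calldate__count": 1, "duration__sum": doc["duration"]}
    return key, value


doc = {
    "date_y_m_d": datetime(2012, 4, 30),
    "switch_id": 1,
    "duration": 23,
    "calls": 1,
}
key, value = map_daily(doc)
print(key, value)
```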
Map-Reduce - Reduce Step
The reduce step is trivial: it simply sums the durations and counts the calls:

             reduce = mark_safe(u'''
                 function(key,vals) {
                    var ret = {
                                calldate__count : 0,
                                duration__sum: 0,
                                duration__avg: 0
                            };

                         for (var i=0; i < vals.length; i++){
                            ret.calldate__count += parseInt(vals[i].calldate__count);
                            ret.duration__sum += parseInt(vals[i].duration__sum);
                         }
                         return ret;
                  }
                  ''')
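MongoDB may call the reduce function more than once for the same key, feeding earlier results back in, so the function has to give the same answer however the values are grouped. The Python sketch below mirrors the reduce step to show that property. It also shows one possible way to fill `duration__avg`, which the reduce function above leaves at 0: a separate finalize step (the mapReduce command accepts a `finalize` function). Both function names are illustrative assumptions, not CDR-Stats code:

```python
def reduce_values(vals):
    """Sum counts and durations, like the JavaScript reduce function."""
    ret = {"calldate__count": 0, "duration__sum": 0, "duration__avg": 0}
    for v in vals:
        ret["calldate__count"] += int(v["calldate__count"])
        ret["duration__sum"] += int(v["duration__sum"])
    return ret


def finalize(value):
    """Compute the average once, after all reducing is done."""
    count = value["calldate__count"]
    avg = value["duration__sum"] / count if count else 0
    return dict(value, duration__avg=avg)


a = [
    {"calldate__count": 1, "duration__sum": 10},
    {"calldate__count": 1, "duration__sum": 20},
]
b = [{"calldate__count": 1, "duration__sum": 30}]

# Reducing partial results gives the same totals as reducing everything at once.
partial = reduce_values([reduce_values(a), reduce_values(b)])
full = reduce_values(a + b)
print(partial == full, finalize(full))
```

Averaging inside reduce would break this property, since an average of averages is not the overall average; that is why the sketch defers it to the end.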
Map-Reduce
Query :
                  out = 'aggregate_cdr_daily'
                  calls_in_day = daily_data.map_reduce(map, reduce, out, query=query_var)


Output db.aggregate_cdr_daily.find() :
{ "_id" : { "a_Year" : 2012, "b_Month" : 5, "c_Day" : 13, "f_Switch" : 1 }, "value" : { "calldate__count" : 91, "duration__sum" : 5559, "duration__avg" : 0 } }
{ "_id" : { "a_Year" : 2012, "b_Month" : 5, "c_Day" : 14, "f_Switch" : 1 }, "value" : { "calldate__count" : 284, "duration__sum" : 13318, "duration__avg" : 0 } }
...
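The same grouping can also be expressed with MongoDB's aggregation framework, which matters on newer stacks: `Collection.map_reduce` was removed in PyMongo 4 and the mapReduce command is deprecated since MongoDB 5.0. The sketch below only builds the pipeline as plain Python data; `build_daily_pipeline` is a hypothetical helper, and it is an alternative to the approach on these slides, not what CDR-Stats does:

```python
def build_daily_pipeline(query):
    """Return an aggregation pipeline grouping daily documents by day and switch."""
    return [
        # Same role as the query= argument of map_reduce.
        {"$match": query},
        {"$group": {
            "_id": {
                "a_Year": {"$year": "$date_y_m_d"},
                "b_Month": {"$month": "$date_y_m_d"},  # 1-12, no +1 needed
                "c_Day": {"$dayOfMonth": "$date_y_m_d"},
                "f_Switch": "$switch_id",
            },
            # Mirrors the map step: each daily document counts as 1.
            "calldate__count": {"$sum": 1},
            "duration__sum": {"$sum": "$duration"},
        }},
    ]


pipeline = build_daily_pipeline({"switch_id": 1})
print(pipeline[1]["$group"]["_id"])
```

With a live collection this would be run as `daily_data.aggregate(pipeline)`. Unlike map-reduce output, the totals appear at the top level of each result rather than under a "value" key.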
Roadmap

- Quality monitoring
- Audio recording
- Add support for other telecoms switches
- Improve - refactor (Beta)
- Testing
- Listen and Learn
WAT else...?

- Website : http://www.cdr-stats.org

- Code : github.com/star2billing/cdr-stats

- FOSS / Licensed MPLv2

- Get started : Install script
  Try it, it's easy!!!
Questions ?

Twitter : @areskib
Email : areski@gmail.com

CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
