3rd Socio-Cultural Data Summit
National Defense University
Center for Technology and National Security Policy
Admin

• Unclassified conference

• Chatham House rules

• Lunch in the new fiscal reality (the cafeteria)

• We have breaks and time built into our schedule to continue
  discussions or to sidebar




Data Summit(s) Objective

• “Good” data are required for reliable analysis.

   − Socio-cultural data of any sort are hard to find.

   − When we do find them, they are messy, fragmented,
     disorganized, poorly measured, etc.

• These Data Summits are committed to fostering a community that is
  interested in finding, evaluating, collecting, cleaning up, smartly
  integrating, and then using socio-cultural data against applied
  problems with scientific rigor.

   − Focus on a broad community with as few restrictions as possible.

   − Focus on rigor and science without sacrificing the ability to
     conduct real world applications.

Logical Progression of these Data Efforts

1. DataCards: quick and dirty effort to find, tag, and index data of all
   sorts for as many audiences as possible to reduce search costs for
   socio-cultural data.

2. First Data Summit: Take a first cut at data evaluation criteria and
   beat the heck out of it in working groups so that we can start to
   evaluate the socio-cultural data that we’ve found.

3. Second Data Summit: Expand the aperture on what constitutes
   data and relate working group insights back to prior evaluation
   criteria and lessons learned for continuing to find and define data.

4. Third Data Summit: Start to tackle the complex issue of “how we
   put the data together” once we have found it.

... more working groups focused on areas where we perceive we can
make concrete progress on data integration, cleaning, and fusion.
DataCards Overview

• DataCards is a structured wiki-like platform that uses “cards” (like card
  catalog cards or baseball cards) to index and describe key details re:
  socio-cultural (and related) data sources.
• Objectives of DataCards include:
   – Make sources of data discoverable.
   – Reduce search costs for data.
   – Provide a conduit to discover and share data sources between and
     among non-traditional, academic, NGO, defense, law enforcement, and
     intelligence communities.
• Accessing DataCards:
   − Commercial Internet: http://www.datacards.org/
   − Development Site: http://beta.datacards.org/
   − SIPRNet: by request, hosted by OSD CAPE
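
To make the “card” idea concrete, here is a minimal sketch of how cards might be indexed by tag to cut search costs. It is illustrative only: the field names, tags, and example entries are assumptions and do not describe the actual DataCards schema or content.

```python
from dataclasses import dataclass, field


@dataclass
class DataCard:
    """One hypothetical card describing a data source (field names are assumptions)."""
    title: str
    description: str
    community: str  # e.g. "academic", "NGO", "defense"
    tags: set = field(default_factory=set)


def build_tag_index(cards):
    """Map each lowercase tag to the titles of the cards that carry it."""
    index = {}
    for card in cards:
        for tag in card.tags:
            index.setdefault(tag.lower(), []).append(card.title)
    return index


def find_cards(index, *tags):
    """Return titles of cards carrying every requested tag, sorted alphabetically."""
    if not tags:
        return []
    matches = [set(index.get(tag.lower(), [])) for tag in tags]
    return sorted(set.intersection(*matches))


# Made-up entries, for illustration only.
cards = [
    DataCard("Example household survey", "Illustrative entry", "academic", {"survey", "polling"}),
    DataCard("Example district boundaries", "Illustrative entry", "NGO", {"geospatial"}),
    DataCard("Example geocoded poll", "Illustrative entry", "defense", {"polling", "geospatial"}),
]
index = build_tag_index(cards)
print(find_cards(index, "polling", "geospatial"))
```

Building the index once and intersecting tag sets keeps each lookup cheap, which is the sense in which tagging and indexing can reduce search costs.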



DataCards Content/Usage Update


• Total cards: 1,682 (2,416 additional cards pending)

• Total datacards.org users: 537

• Since .org launch: 5,703 visits; 54,229 pageviews; 00:10:40 average
  time/visit; multiple visits from 28 countries


Related to DataCards




Summary of 1st Data Summit

• The data used for applied socio-cultural work for the DoD and other
  agencies are generally of poor quality.
   • Often general and hard to apply to real world situations
   • Rarely evaluated, and even more rarely evaluated objectively
• Worked on data evaluation criteria so that a “smart person” isn’t needed to
  evaluate data sources.
   • Smart people were used to create the criteria; “smart people in
      training” will apply the ratings.
   • The ratings shouldn’t rely on the experience of the rater, but on the
      quality of the criteria.
• The effort acknowledged that one size does not fit all requirements, and
  criteria should be flexible enough to accommodate a variety of conceptions of
  what constitutes “data.”
• DataCards helps consumers of socio-cultural data rapidly find the data they
  need. The evaluation criteria help them assess the suitability and quality of
  possible data sources for their desired application.
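
One way to picture criteria-driven rating is a checklist of yes/no questions with fixed weights, so the score comes from the criteria rather than from the rater's experience. The sketch below is an assumption for illustration: the criterion names and weights are hypothetical and are not the criteria developed at the summit.

```python
# Hypothetical criteria and weights; the summit's actual criteria
# are not given in these slides.
CRITERIA = {
    "documented_collection_method": 3,
    "known_time_period": 2,
    "known_geographic_coverage": 2,
    "machine_readable_format": 1,
}


def rate_source(answers):
    """Return a score between 0 and 1 from yes/no answers, one per criterion.

    Raises ValueError if any criterion is unanswered, so a rating never
    depends on a rater silently skipping a question.
    """
    missing = sorted(set(CRITERIA) - set(answers))
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    earned = sum(weight for name, weight in CRITERIA.items() if answers[name])
    return earned / sum(CRITERIA.values())


# Example answers for a made-up data source.
answers = {
    "documented_collection_method": True,
    "known_time_period": True,
    "known_geographic_coverage": False,
    "machine_readable_format": True,
}
print(round(rate_source(answers), 2))
```

Because every question is a plain yes/no and the weights are fixed in advance, two raters who answer the same way always produce the same score.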

Summary of 2nd Data Summit
• “Data” is a user-defined term; it is not specific to one particular type of data.
  DataCards is a platform with a wide user base with varied data needs.
  DataCards should seek to assist with the discovery and evaluation of data
  sources.
• Big data is a growing field of interest within analytical and knowledge
  communities. Big data, which was defined by the complexity, structure, and
  size of data, is not just social media but is generally transactional in
  nature, including financial transactions, SMS, and search engine results.
• Many data sources are qualitative in nature and cannot be analyzed and
  machine processed the way quantitative or geospatial data are processed and
  analyzed.
• The most important considerations for users of geospatial data are robust
  search capabilities, a minimal path to finding data, and complete data.
• There is no single way that individuals find data. Discovery is often
  project-specific, and individuals tend to establish and follow predictable
  patterns of behavior when finding data because certain sources have proven
  relevant and trustworthy.
What is this Summit About?

• This summit is about getting the mess of socio-cultural “stuff” we
  often call data into a usable analytic format.
• The first panel focuses on two unique and innovative approaches
  to putting data together for intelligence and analytic purposes,
  and on a Phase 3 IARPA program that is rapidly fusing data in support
  of the intelligence community’s requirements for integrated and
  disparate data.
• The second panel focuses on two of the major types of data that are
  often trumpeted as the silver bullet to understanding all things
  socio-cultural: social media and polling/surveys. However, these are
  great case studies in the potential pitfalls of data aggregation
  without careful thought about what it is you are putting together.




What is this Summit About? (continued)

• The third panel provides three approaches to dealing with
  socio-cultural data, with moderate technical detail. This includes a
  look at the application of statistics to missing data, the dirty work
  of getting socio-cultural data ready for a DARPA program, and dealing
  with situations where socio-cultural data are sparse.
• Tomorrow, the fourth panel will focus on scientific and technical
  approaches to information extraction and data fusion challenges.
• The fifth panel will offer up thoughts on three compelling and
  promising areas for socio-cultural data integration: geospatial data
  of multiple resolutions, qualitative/subject matter expert-derived
  data, and human geography data.
• We’ll end after lunch with a discussion about how we as a
  community want to proceed on this conquest.


What Do I Want to Get Out Of this Summit?

• Community-building and the invigoration of new ideas to support
  better work with socio-cultural data.
• Feedback on what methods we are missing and what has merit.
• Feedback on what the forward operator needs from a group like
  this—this includes the warfighter, but also law enforcement
  officers, NGOs, partner nations, foreign service officers, economic
  development professionals: anyone working in the field to make a
  difference.




