We will look at two case studies: marine mammal science and psychiatric genetics.
Marine mammal science
Subjects: 41 interviewees, including principal investigators, junior researchers, and technicians. The purpose of their projects is to track the individual mammals they study.
Place: 13 different laboratories in the U.S. and Europe.
Period of their projects: about 40 years, starting in the 1970s.
These researchers are scientists rather than social scientists, but their experience of organizing the data is more like social science.
At first, sharing data seemed helpful. For example, a certain school of dolphins (200-500 dolphins) stays in one area, so only researchers working near that area could study it. After people gathered and shared their information, researchers living in other areas could study the dolphins as well.
Leah Tull: “Well, honestly, I’m very protective about it… I guess it rather bugs me that I have to do the work and everyone always asks me for a CD… it’s our scientific study.”
Others also say that it is hard to organize their own data while taking into account how others will systematize and standardize theirs. How deeply, where, and with whom researchers should share their information remains an unsolved question.
In the past, a small scientific group ran each project. People could get information through informal gatherings based on common attendance at a university or through shared contacts.
Now, some people are working to build much larger databases, such as a project named SPLASH. SPLASH involves over 300 scientists from 50 research groups working in various areas of the Pacific Ocean (Calambokidis et al., 2007).
Background: scientists use photographs to distinguish individual mammals.
Problem 1: most pictures taken before 2003 are slides, black-and-white negatives, or black-and-white prints. Since 2003, many scientists have switched to digital photography and have used different, idiosyncratic systems to manage their digital catalogs.
Problem 2: the amount and the range of the data are too broad, because the data were collected to track the mammals rather than to be organized. It means
Psychiatric genetics
Subjects: around 50 researchers, from a set of institutions that expanded from 4 to 11. Each laboratory has one to five researchers.
Period: about 20 years.
The researchers are involved in the BP (bipolar disorder) project. The BP project was selected for GAIN (Genetic Association Identification Network). Rather than a funded group, GAIN is a group that encourages researchers to organize and share their data, which helps not only others but also themselves in return.
Background: researchers have collected blood, genetic data, and phenotypic data on thousands of subjects.
Problem 1: the amount and the range of the data are too broad. For example, the largest item is the interview data, which runs to over 100 pages. Each interview took 4-6 hours and includes approximately 2600 variables. Additionally, each includes a trained clinician’s analysis, family history, medical records, and other information. Each subject has multiple best estimates from at least two clinicians, plus the interviewer and an editor.
Problem 2: the data were encoded in three different versions of the interview instrument. The first data set was collected in an Oracle database. The second was organized in a Paradox database. The third was managed in a proprietary database using laptops and PCs.
Problem 3: the three data systems are not compatible.
Problem 4: the diagnoses were made under different systems. The earliest diagnoses use a combined DSM-IIIR/RDC system, while the latest subjects are diagnosed under DSM-IV.
Problem 5: variables in the three versions are confusing. All three versions were converted from their original storage into SAS files, but their variable names are not consistent. For example, one variable is “I1120” in the first set, “Number_of_manic_episodes” in the second set, and “V756” in the third set.
To organize these data, people need both professional domain knowledge and information-organizing skills.
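One way to picture the inconsistent variable names is as a lookup table (a crosswalk) from each version's name to one shared name. The sketch below is only an illustration, not the BP project's actual method: the version labels "v1" to "v3", the shared name "manic_episode_count", and the sample values are assumptions. Only the three source names come from the slide.

```python
# Hypothetical crosswalk from version-specific variable names to one shared name.
# Only "I1120", "Number_of_manic_episodes", and "V756" come from the slides;
# the version labels and the shared name are illustrative assumptions.
CROSSWALK = {
    "v1": {"I1120": "manic_episode_count"},
    "v2": {"Number_of_manic_episodes": "manic_episode_count"},
    "v3": {"V756": "manic_episode_count"},
}


def harmonize(record, version):
    """Return a copy of `record` with version-specific names mapped to shared names.

    Names that have no crosswalk entry are kept unchanged, so nothing is
    silently dropped.
    """
    mapping = CROSSWALK[version]
    return {mapping.get(name, name): value for name, value in record.items()}


# Made-up sample values, one record per instrument version.
records = [
    harmonize({"I1120": 3}, "v1"),
    harmonize({"Number_of_manic_episodes": 1}, "v2"),
    harmonize({"V756": 5}, "v3"),
]
```

After harmonizing, all three records use the same key and can be combined in one table. Building the crosswalk itself still needs someone who knows what each variable means, which is the skill gap the slide describes.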
As the two cases show, moving from a small scientific project to big science is hard. The researchers in the SPLASH and BP collaborations are trained for their scientific tasks, but not for organizing information. Training in organizing information would help.
In SPLASH, the new system, which contains three versions of systems, is not built to expand further. If it expands, it will run into incompatibility problems.
In BP, even though there were fewer researchers than in SPLASH, there were problems. They had difficulties with computer programming. For example, they had a hard time implementing EAV (entity-attribute-value) storage for so many different variables. Furthermore, SAS variable names cannot contain an ampersand, so “Total Manic & Depressive Episodes” in Paradox became “total_manic_depressive_episodes” in SAS.
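The two programming problems mentioned here, renaming variables for SAS and storing many variables in EAV form, can be sketched briefly. This is a minimal illustration under stated assumptions, not the project's own code: the function names `to_sas_name` and `to_eav`, the 32-character limit, and the sample subject ID are assumptions made for the example.

```python
import re


def to_sas_name(label, max_len=32):
    """Turn a free-text label into a name made of letters, digits, and underscores.

    Every run of other characters (spaces, "&", and so on) becomes a single
    underscore. The 32-character limit is an assumption about the target system.
    """
    name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()
    if name and name[0].isdigit():
        name = "_" + name  # names should not start with a digit
    return name[:max_len]


def to_eav(subject_id, record):
    """Flatten one wide record into (entity, attribute, value) rows.

    Missing values (None) are skipped, which is what lets an EAV table hold
    thousands of possible variables without mostly empty columns.
    """
    return [(subject_id, attr, value)
            for attr, value in record.items()
            if value is not None]


sas_name = to_sas_name("Total Manic & Depressive Episodes")
rows = to_eav("S001", {sas_name: 4, "age_at_onset": None})
```

With these definitions, `sas_name` is "total_manic_depressive_episodes", and `rows` holds a single triple for the hypothetical subject "S001". The trade-off of EAV is that reading data back into one row per subject takes an extra pivot step.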
Style of social interaction in the projects
SPLASH didn’t “try to force them to do it one way” (Jacob Tipton).
The BP project was always very decentralized.
Both the SPLASH and BP projects have non-dogmatic leaders.
This flexible and decentralized form of leadership is common among scientific and creative teams (Mumford, Scott, Gaddis, & Strange, 2002) and is not inherently problematic. Science relies on the freedom of scientists to innovate (Bush, 1945; Gordon, Marquis, & Anderson, 1962), although some recent work suggests that these patterns are changing in the face of calls for measures of increased accountability and relevance for scientific work (Demeritt, 2000; Harman, 2003).
The question is to what extent data management requirements should be dictated, and to what extent individual scientists should be allowed to ignore or skirt issues of compatibility and data availability.
Derek de Solla Price (1963) identified some of these issues four decades ago in his work that helped to develop the field of scientometrics.
More recently, scholars in computer science have addressed issues of scalability (Simmhan, Plale, & Gannon, 2005; Zheng, Venters, & Cornford, 2007). Any number of papers discussing the implementation of Grid-enabled projects have identified scalability as one of the key issues developers have had to deal with (Pakhira, Fowler, Sastry, & Perring, 2005; Shimojo, Kalia, Nakano, & Vashishta, 2001).
Only recently have researchers begun to pay attention to how small scientific projects negotiate the changes required as they move towards becoming large, collaborative scientific projects (Calson & Anderson, 2006; Walsh & Maloney, 2007). Scientists attempt to sustain these collaborations over time (Bos et al., 2007).

Moving From Small Science To Big Science
