The Utility and Feasibility of
Metric Calibration for Basic
Psychological Research
Etienne LeBel
The University of Western Ontario
“…being so disinterested in our variables
that we do not care about their units can
hardly be desirable” (Tukey, 1969, p. 89).
JOHN TUKEY
"...psychologists have to start respecting the
units they work with, or develop
measurement units they can respect
enough so that researchers can agree to
use them" (Cohen, 1994, p. 1001).
JACOB COHEN
Inspirational Quotations
Over-arching Goal
• Both useful and feasible to calibrate the
metric of instruments in basic psychological
research
Outline
• Definitions and basic concepts
• Metric calibration strategies
• Past metric calibration research
• Utility of Metric Calibration
• Feasibility: 3 Empirical demonstration studies
• Limitations and Future Directions
Definitions and Basic Concepts
• Metric: unit of measurement used to quantify
the amount of something
• E.g., Celsius metric (°C)
• Fridge range = -10 to +50 °C
• Freezer range = -50 to +70 °C
[Figure: fridge and freezer thermometer readings]
Definitions and Basic Concepts
• Metric: unit of measurement used to quantify
the amount of something
• E.g., Beck Depression Inventory
• Metric = 0 to 63 (BDI; Beck & Steer, 1987)
• E.g., Self-Rating Depression Scale
• Metric = 25 to 100 (SDS; Zung, 1965)
Definitions and Basic Concepts
• Arbitrary metric:
• Scores not inherently meaningful,
other than relative interpretation
• Formally: Unknown where a
given score locates an individual
on the underlying psychological
dimension
(Blanton & Jaccard, 2006a, 2006b)
[Figure: SDS and BDI scales]
[Figure: thermometer analogy. Thermometers A, B, and C (cooking, indoor, and outdoor thermometers) are shown against the underlying dimension with freezing and boiling reference points and readings such as 50°B, 100°B, and 150°B; the SDS, BDI, and MDI are shown against the underlying dimension with a behavioral reference point.]
Main Strategies of Metric Calibration
• Strategy 1
• Mapping scores to qualitatively distinct behaviors
• Strategy 2
• Mapping scores to gradations of behaviors
(Blanton & Jaccard, 2006a, 2006b; Sechrest et al., 1996)
• Strategy 3
• Experimental approach
• Manipulate construct to extreme levels
Metric Calibration Strategy 1
• Map scores to qualitatively distinct
theoretically-relevant behaviors
[Figure: probability of suicide attempt (y-axis, 0 to 0.8) plotted against Beck Depression Inventory scores (x-axis, 0 to 40), with the behavioral reference point marked on the underlying dimension]
Metric Calibration Strategy 2
• Map scores to gradations of theoretically-relevant behaviors
[Figure: underlying dimension with behavioral reference points at 2, 8, and 12 hrs/day]
Ideal Characteristics of Behavioral Reference Points
• Theoretically-relevant
• Interpretationally clear (e.g., 1 or 0; hrs/day)
• Objective
• Unambiguous construct-wise
• Also, theoretically-configured context
Past Metric Calibration Research
• Specific areas of applied psychology:
• Clinical psychology
(Kazdin, 1999, 2001; Harman et al., 2001; Sechrest et al.,
1996)
• Sport psychology
(Andersen, McCullagh, & Wilson, 2007)
• Forensic psychology
(Pirelli et al., 2011; Hanson, 2009; Hanson et al., in press)
• Arbitrary metrics in psychology
(Blanton & Jaccard, 2006a, 2006b)
Utility of Metric Calibration
1. Help in the interpretation of data
a. Enhance interpretability of statistical effects
b. Facilitate extraction of more information from data
patterns
c. Help overcome limitations of NHST
2. Facilitate construct validity research
a. Help shed brighter light on psychological constructs
b. Help with conceptual challenges (e.g., construct definition)
c. Benchmark for detecting problems/improving
measures
Utility of Metric Calibration
3. Contribute to theoretical development
a. Facilitate theoretical debates involving absolute claims
b. Allow more precise theorizing via enhanced scientific
language
c. Preliminary platform for quantitative testing of theories
(Meehl, 1978)
4. Facilitate general accumulation of knowledge
a. Calibration findings valuable information in their own
right
b. Guiding framework for cataloguing magnitude of
psychological effects
c. Facilitate phenomenon-based research (Rozin, 2001)
Feasibility of Metric Calibration
• Empirical demonstration studies
• Study 1: Need for cognition (NFC), task
persistence (TP), conscientiousness
• Study 2: Self-enhancement
• Study 3: Risk-taking
Study 1: NFC and TP
• Participants
• 94 UWO introductory psychology undergraduates
• 69 females, 25 males (age = 18.5, SD = 2.2)
• Procedure & Materials
• Need for cognition measure
• Task persistence measure
• Word association decision task
• Anagram Persistence task
• Demographics & Debriefing questions
Study 1: Materials
• Need for cognition (NFC)
• Tendency to engage in cognitively effortful
activities and enjoy thinking in its own right
(Cacioppo & Petty, 1982)
• 18-item scale (Cacioppo, Petty, & Kao, 1984)
• E.g. item: “I find satisfaction in deliberating hard for
long hours.”
• E.g. item: “Thinking is not my idea of fun” (R; reverse-scored)
• Response scale: 1 = Extremely Uncharacteristic; 2 = Somewhat Uncharacteristic; 3 = Uncertain; 4 = Somewhat Characteristic; 5 = Extremely Characteristic
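A minimal scoring sketch for a scale with reverse-keyed items, assuming the scale score is the item mean on the 1 to 5 response metric. The function names, the three-item example, and the choice of which item is reverse-keyed are hypothetical; the slide does not give the full scoring key.

```python
def reverse_score(response, low=1, high=5):
    """Flip a rating on a low..high response scale (e.g., 2 becomes 4 on 1..5)."""
    return low + high - response

def scale_mean(responses, reverse_keyed):
    """Mean item score after flipping the reverse-keyed items.

    responses: dict mapping item number to rating
    reverse_keyed: set of item numbers marked (R)
    """
    adjusted = [
        reverse_score(rating) if item in reverse_keyed else rating
        for item, rating in responses.items()
    ]
    return sum(adjusted) / len(adjusted)

# Hypothetical three-item illustration; item 2 is treated as reverse-keyed.
example = {1: 4, 2: 2, 3: 5}
print(scale_mean(example, reverse_keyed={2}))
```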
Study 1: Materials
• NFC behavioral reference point
• Cognitively effortful (vs. simpler) Remote Associates
Test (RAT) (Mednick & Mednick, 1967)
Study 1: Materials
• Task persistence
• Tendency to persist in an effortful behavior or
frustration-inducing activity (Steinberg et al., 2007)
• 2-item self-report measure (Steinberg et al., 2007)
• Item 1: “I will keep trying the same thing over again even when I
have not had success the first time”
• Item 2: “I will often continue to work on something, even after
other people have given up.”
• Response scale: 1 = Very untrue, not at all like me; 2 = Somewhat untrue or not like me; 3 = Somewhat true or like me; 4 = Very true, very much like me
Study 1: Materials
• Task persistence behavioral reference point
• Anagram persistence task
(Brandon et al., 2003; Quinn et al., 1996)
Study 1: Results: NFC
Wald’s χ² = 9.71, B = 1.20, odds ratio (OR) = 3.33, p < .002
[Figure: NFC scores (1 to 5) mapped onto the underlying dimension via the behavioral reference point; task choice: Task 1 = 62%, Task 2 = 38%]
Study 1: Results: Task Persistence
Linear: B = 0.18, β = r = .15, p < .15
Cubic model: F(3, 90) = 2.00, p < .10
Study 1: Discussion
• Enhance moderated multiple regression (MMR) analyses
• Re-analysis of O’Hara et al. (2009)
[Figure: two panels. Left: conventional +/- 1 SD approach. Right: using calibrated values, with NFC scores centered on 3.8 (50% NFC behavior) and compared at the 25% and 75% NFC behavior points]
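The calibrated values can be approximated from the reported logistic slope (B = 1.20) and the NFC score at the 50% point (3.8). This is a rough sketch, not the original analysis: the intercept is not reported on the slides and is inferred here from those two values, and the function names are hypothetical.

```python
import math

B = 1.20                      # reported logistic slope for NFC
SCORE_AT_50 = 3.8             # reported NFC score at 50% NFC behavior
INTERCEPT = -B * SCORE_AT_50  # ASSUMPTION: implied by the two values above

def prob_effortful(nfc):
    """Predicted probability of choosing the cognitively effortful task."""
    return 1 / (1 + math.exp(-(INTERCEPT + B * nfc)))

def nfc_at(prob):
    """NFC score at which the predicted probability equals prob."""
    return (math.log(prob / (1 - prob)) - INTERCEPT) / B

print(round(prob_effortful(3.0), 2))                    # scale midpoint
print(round(nfc_at(0.25), 2), round(nfc_at(0.75), 2))   # 25% and 75% points
```

Under these assumptions the scale midpoint of 3 maps to a probability of roughly 28%, and the 25% and 75% points fall near NFC scores of 2.88 and 4.72.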
Study 2 Demonstration
• Self-enhancement measures
• Background context
• Pan-cultural self-enhancement debate
(Sedikides et al., 2003; Heine, 2005)
Study 2
• Participants
• 97 UWO introductory psychology undergraduates
• 50 females, 47 males (age = 18.9, SD = 1.3)
• Procedure & Materials
• 2 self-enhancement measures
• Filler task (RAT)
• Over-claiming technique
• Balanced Inventory of Desirable Responding
• Demographics & Debriefing questions
Study 2: Materials
• Self-enhancement
• Tendency to view characteristics of oneself in an
overly positive manner (Hogan & Nicholson, 1988)
• Better-than-average judgments
(Alicke et al., 1995; Gaertner et al., 2008)
• Rate extent to which each listed
trait describes yourself relative
to the average Western student
of your own age and gender
POSITIVE: dependable, intelligent, considerate, observant, polite, respectful, cooperative, reliable, friendly, creative
NEGATIVE: gullible, disobedient, snobbish, lazy, disrespectful, mean, unforgiving, vain, uncivil, unpleasant
Response scale:
1 = Much worse than the average university student of my age and gender
4 = As well as the average university student of my age and gender
7 = Much better than the average university student of my age and gender
Study 2: Materials
• Self-enhancement behavioral reference point
• Over-claiming technique variant (OCT; Paulhus et al., 2003)
• 150 items (10 categories of 15 items)
• 3 non-existent items (foils) per category; 30 foils total
• Behavioral index: # of foils claimed as familiar
PLEASE INDICATE FOR EACH ITEM
WHETHER YOU ARE FAMILIAR WITH
THE ITEM OR NOT, BY CLICKING THE
APPROPRIATE RESPONSE OPTION:
0 = Never heard of it
1 = Familiar with it
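The behavioral index could be computed along these lines: count the foils a participant marks as familiar (1). The item labels below are placeholders rather than actual OCT items, and the function name is hypothetical.

```python
def foils_claimed(responses, foil_items):
    """Number of non-existent items (foils) rated 1 = Familiar with it.

    responses: dict mapping item label to 0 (never heard of it) or 1 (familiar)
    foil_items: set of labels that are foils
    """
    return sum(1 for item in foil_items if responses.get(item, 0) == 1)

# Placeholder labels, not real OCT content.
responses = {"real_item_a": 1, "real_item_b": 0, "foil_x": 1, "foil_y": 0, "foil_z": 1}
foils = {"foil_x", "foil_y", "foil_z"}
print(foils_claimed(responses, foils))
```

With 30 foils in the full task, this index would range from 0 to 30.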
Study 2: Results
Linear: B = 1.88, β = r = .29, p < .004
Cubic model: F(3, 94) = 5.91, p < .004
Study 3 Demonstration
• Risk-taking measures
• Demonstrate metric calibration for:
• Measures capturing state-like constructs
• Behavioral measures
Study 3
• Participants
• 99 individuals from UWO campus
• Compensated $5 + earnings in BART task
• 39 females, 58 males, 2 non-specified (age = 24.5, SD = 5.5)
• Procedure & Materials
• Balloon Analogue Risk Task (BART)
• Columbia Card Task (CCT)
• Risky gambles Lottery task
• Two self-report risk-taking measures
• Demographics & Debriefing questions
Study 3: Materials
• Risk-taking
• Behavior involving possibility of gains but with
potential negative consequences
(Ben-Zur & Zeidner, 2009; Lejuez et al., 2002)
• Balloon Analogue Risk Task (BART)
(Lejuez et al., 2002)
• Ps inflate 30 simulated balloons onscreen
• Each balloon pump worth 1 cent
• If balloon explodes, money is lost for that trial
• Scoring: mean # of pumps (non-exploding trials)
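The scoring rule above can be sketched as follows. The trial representation (a list of `(pumps, exploded)` pairs) and the function name are assumptions made for illustration.

```python
def bart_score(trials):
    """Mean number of pumps across balloons that did not explode.

    trials: list of (pumps, exploded) pairs, one per balloon
    Returns None if every balloon exploded.
    """
    kept = [pumps for pumps, exploded in trials if not exploded]
    if not kept:
        return None
    return sum(kept) / len(kept)

# Hypothetical three-balloon illustration: the exploded balloon is excluded.
print(bart_score([(10, False), (25, True), (20, False)]))
```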
Study 3: Materials
• Columbia Card Task (CCT) – hot version
(Figner et al., 2009)
• Ps sequentially turn over cards in 4 x 8 array
• Accumulate as many points as possible
• Can continue unless loss card turned
Study 3: Materials
• Behavioral reference points
• Risky gambles in lottery risk task (Hsee & Weber, 1999)
• If Option B selected, experimenter would actually flip a
coin
• Choosing the risky gamble on lotteries with larger sure bets
reflects a higher risk-taking reference point
Lottery   Option A         Option B
1         $6 for certain   Flip a coin. Receive $10 if heads, receive $0 if tails.
2         $2 for certain   Flip a coin. Receive $10 if heads, receive $0 if tails.
3         $8 for certain   Flip a coin. Receive $10 if heads, receive $0 if tails.
4         $5 for certain   Flip a coin. Receive $10 if heads, receive $0 if tails.
5         $4 for certain   Flip a coin. Receive $10 if heads, receive $0 if tails.
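To see why larger sure bets mark higher risk-taking reference points, it may help to compare each sure amount with the expected value of the gamble. The sketch assumes a fair coin; the variable names are hypothetical.

```python
# Sure amounts (Option A) by lottery number, taken from the table.
SURE_AMOUNTS = {1: 6, 2: 2, 3: 8, 4: 5, 5: 4}

# Expected value of Option B, ASSUMING a fair coin: $10 on heads, $0 on tails.
GAMBLE_EV = 0.5 * 10 + 0.5 * 0

# Order lotteries from the lowest to the highest sure amount. Choosing the
# gamble when the sure amount exceeds GAMBLE_EV gives up expected money.
ordered = sorted(SURE_AMOUNTS.items(), key=lambda pair: pair[1])
for lottery, sure in ordered:
    print(f"Lottery {lottery}: sure ${sure}, sure minus gamble EV = {sure - GAMBLE_EV:+.0f}")
```

On this reading, gambling against the $8 sure bet is the most extreme reference point and gambling against the $2 sure bet the least.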
Study 3: Results: BART
Wald’s χ² = 4.85, B = .03, odds ratio (OR) = 1.03, p < .03
Study 3: Results: CCT
$4 safe bet: Wald’s χ² = 3.24, B = .08, odds ratio (OR) = 1.08, p < .07
$6 safe bet: Wald’s χ² = 5.78, B = .30, odds ratio (OR) = 1.35, p < .02
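In logistic regression the odds ratio is exp(B), so the reported coefficients and odds ratios can be cross-checked. The short sketch below does that and shows how a multi-unit change in a score scales the odds; `odds_multiplier` is a hypothetical helper name.

```python
import math

# (B, reported OR) pairs taken from the results slides.
reported = {
    "NFC (Study 1)": (1.20, 3.33),
    "BART": (0.03, 1.03),
    "CCT, $4 safe bet": (0.08, 1.08),
    "CCT, $6 safe bet": (0.30, 1.35),
}

for label, (b, odds_ratio) in reported.items():
    # Small gaps are expected because B is reported to two decimals.
    print(f"{label}: exp(B) = {math.exp(b):.2f}, reported OR = {odds_ratio}")

def odds_multiplier(b, delta):
    """Factor by which the odds change for a delta-unit change in the score."""
    return math.exp(b * delta)

# E.g., a 10-pump increase in BART score under the reported B = .03.
print(round(odds_multiplier(0.03, 10), 2))
```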
Study 3: Discussion
• BART & CCT calibrated to
common $4 reference point
• Implication:
• Enhanced interpretation of data
patterns
• Proposed benefit 1b: extraction
of more information
[Figure: BART scale (0 to 30) and CCT scale (0 to 90) aligned on the underlying dimension at the $4 reference point, alongside a thermometer analogy in arbitrary units (°R)]
Limitations & Caveats
• Small sample sizes
• Consensus re: reference points
Future Directions
• Richer behavioral reference points
• E.g., EAR (Mehl et al., 2002)
• E.g., Eye-tracking
• Experimental approach
• Capture behavioral manifestations beyond
naturally-occurring levels
• Item Response Theory approach (Lord, 1980)
• Model distinct and ordered behavioral reference
points
END
• Thanks to all who have helped:
• Conceptual:
• Bertram, Kurt, Chris, Paul, Yang
• Data collection:
• Scott Leith
• Assigning Cohen (1994):
• Lorne

Metric Calibration of Psychological Instruments (Dissertation Senate presentation)


Editor's Notes

  • #4: -Argue that it is both useful and feasible to calibrate the metric of instruments in basic psychological research, SO AS TO RENDER THE METRIC OF OUR INSTRUMENTS NON-ARBITRARY -By useful I mean that metric calibration can help us with (enhance) data interpretation and construct validity, contribute to theoretical development, and facilitate general accumulation of knowledge. We’ll get back to these later. -[Also, I want to briefly mention at this point that metric calibration is fairly uncharted territory in psychology given that only a handful of conceptual and empirical papers exist on metric calibration, all of which have been in specific applied areas of psychology. Hence, my contribution involves elaborating on the much BROADER utility and feasibility of metric calibration for psychological research more generally.]
  • #6: [start building elements of the diagram immediately here!] [actually can translate my dual-probe thermometer into the conceptual diagram and it’s perfect because they can represent Thermometer B & C]
  • #7: [start building elements of the diagram immediately here!] [actually can translate my dual-probe thermometer into the conceptual diagram and it’s perfect because they can represent Thermometer B & C]
  • #8: An interesting fact about metrics is that virtually all measures in psychology have a metric which can be considered arbitrary. And so returning to our depression instruments, informally this means that the scores from the depression instruments are not inherently meaningful in themselves, other than a relative interpretation…blah, blah, blah Then after stating B&J’s definition of arbitrary metrics, can further draw parallels between metric of thermometers and metrics of depression instruments, and then bring in the idea of reference points (and add to diagram), and then mention basic metric calibration idea of connecting scores to a common reference point so as to render metric non-arbitrary (And then explicitly state that this is the basic idea of metric calibration and is the focus of my dissertation [my research problem]). Then can re-iterate my over-arching goal.
  • #9: To clarify this idea, and further unpack the nature of arbitrary metrics, I will return to the more concrete world of thermometers. Imagine it’s the year 1600 and you have the following three thermometers…. [actually can translate my dual-probe thermometer into the conceptual diagram and it’s perfect because they can represent Thermometer B & C] And this is the basic essence of metric calibration: RESEARCH PROBLEM:
  • #11: Map observed scores to qualitatively distinct theoretically-relevant behaviors, specifically configured to reflect particular locations of the underlying dimension (can cover consensus issue later when describing ideal characteristics) e.g., presence or absence of theoretically-relevant behavior
  • #13: And what I mean by “specifically-configured” is that ideally, behaviors chosen to serve as external reference points should possess the following characteristics: (above & beyond the fact that ideally the behavior should be configured and assessed such that it can be argued to reflect a particular location on the underlying dimension)
  • #15: Entries in bold will be elaborated upon and/or demonstrated using preliminary results from my empirical demonstration studies.
  • #18: The main goal of Study 1 was to provide a preliminary demonstration of the metric calibration approach applied to instruments of constructs commonly studied in psychology.
  • #20: It was explained that Task 1 was less cognitively challenging in the sense that the answer would relate to the 3 stem words in the *SAME* way WHEREAS in Task 2 the answer would relate to the 3 stem words in a different way for each word. After seeing these examples (and corresponding answers) Ps decided which task they wanted to complete and proceeded to complete the chosen task.
  • #21: Practically advantageous 2-item self-report measure of task persistence that has been used in past research
  • #22: As a behavioral reference point, I used a commonly used anagram persistence task
  • #23: Briefly mention in passing that NFC scale midpoint of 3 corresponds only to a probability of about 28% of choosing the cognitively effortful task
  • #24: 1-unit increase in TP scores corresponded to an increase of 11 seconds in actual persistence on the near-impossible anagrams in the APT; a cubic function explained approx. 3x more variance; replicated in Joseph Ditre and Thomas Brandon’s data set, etc.
  • #26: Primary goal of Study 2 was to provide a preliminary demonstration of the feasibility and utility of the metric approach with regard to contributing to theoretical development
  • #27: No restrictions were imposed on participant sex, age, or ethnicity. No experimental conditions were examined, hence all participants completed the same measures and tasks in the same order
  • #29: It was explained that Task 1 was less cognitively challenging in the sense that the answer would relate to the 3 stem words in the *SAME* way WHEREAS in Task 2 the answer would relate to the 3 stem words in a different way for each word. After seeing these examples (and corresponding answers) Ps decided which task they wanted to complete and proceeded to complete the chosen task.
  • #30: 1-unit increase in TP scores corresponded to an increase of 11 seconds in actual persistence on the near-impossible anagrams in the APT; a cubic function explained approx. 3x more variance; replicated in Joseph Ditre and Thomas Brandon’s data set, etc.
  • #31: Primary goal of Study 3 was to demonstrate the utility and feasibility of calibrating the scores of measures capturing predominantly state-like constructs and behavioral measures, in order to better demonstrate the proposed benefits relevant in experimental contexts
  • #32: No restrictions were imposed on participant sex, age, or ethnicity. No experimental conditions were examined, hence all participants completed the same measures and tasks in the same order
  • #33: Risk-taking is typically defined as the purposive enacting of a behavior that involves the possibility of some positive consequences or gains but with some potential negative consequences.
  • #34: Risk-taking is typically defined as the purposive enacting of a behavior that involves the possibility of some positive consequences or gains but with some potential negative consequences.
  • #35: Following Tversky & Kahneman (1981), it was explicitly mentioned that two participants (chosen at random) would actually receive the money associated with their choices.
  • #36: $4 item: safe bet = 42%, gamble = 56%; $6 item: safe bet = 78%, gamble = 20%
  • #38: [so in this sense, there are huge opportunities for ego-enhancement in the area of metric calibration; your name could go down in history when linked to a certain metric for a certain construct… e.g., degrees Olson for the unit of measurement for the attitude construct (though °O doesn’t really look very good)]
  • #40: Could briefly mention in passing that another reason the approach is valuable is that one can connect convenient cheap measures to inconvenient ecologically valid behavioral manifestations, and then be able to interpret lab results w.r.t. these ecologically valid behaviors (a way to connect basic & applied research)
  • #41: (Non-exhaustive list)
  • #43: Enhancements: more meaningful calibrated values; overcomes sampling error issue; could yield different patterns that are theoretically important
  • #44: Provide methodological machinery to more directly tackle theoretical questions involving absolute claims…. Enhancements: more meaningful calibrated values; overcomes sampling error issue; could yield different patterns that are theoretically important
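
The mapping from scale scores to behavior described in notes #23 and #24 can be illustrated with a minimal Python sketch. It is not the analysis used in the studies. It assumes a simple logistic curve for the presence or absence of a reference behavior and a linear approximation for persistence time. The slope value and all function names are hypothetical; only the 28% figure at the NFC midpoint of 3 and the 11 seconds per TP unit come from the notes.

```python
import math


def logistic_probability(score, intercept, slope):
    """Probability of showing the reference behavior at a given scale score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def score_at_probability(p, intercept, slope):
    """Scale score at which the reference behavior occurs with probability p."""
    return (math.log(p / (1.0 - p)) - intercept) / slope


# Hypothetical slope (NOT reported in the talk). The intercept is then chosen
# so that the scale midpoint of 3 maps to the roughly 28% mentioned in note #23.
SLOPE = 1.0
INTERCEPT = math.log(0.28 / 0.72) - SLOPE * 3


def persistence_difference(score_a, score_b, seconds_per_unit=11.0):
    """Expected difference in persistence (seconds) between two TP scores.

    Linear approximation based on note #24 (11 s per 1-unit increase).
    The same note says a cubic function fit better, so treat this as rough.
    """
    return (score_b - score_a) * seconds_per_unit


if __name__ == "__main__":
    print(logistic_probability(3, INTERCEPT, SLOPE))
    print(score_at_probability(0.5, INTERCEPT, SLOPE))
    print(persistence_difference(2, 4))
```

Under these assumptions, a calibrated statement takes the form "a score of x corresponds to a probability p of choosing the effortful task", which is the kind of absolute interpretation an arbitrary metric cannot support on its own.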