Understanding statistical word learning
in a social context
Michael C. Frank
Stanford University
“doggie” >doggie
“word* learning”
* With one exception, I will mostly not talk about “run,”
“blue,” “the,” or “no”…
[Figure: size of productive vocabulary (axis 0 to 600) by age (16 to 30 months). All data (n = 4797).]
CDI data at http://wordbank.stanford.edu
Children’s productive vocabulary grows rapidly and
growth accelerates across development.
golabupadotitupiro
bidakugolabupadoti
golabupadotitupiro
bidakugolabupadoti
Saffran, Aslin, & Newport (1996)
Statistical regularities in the child’s environment
are powerful cues to the forms and meanings of
words. Perhaps language learning is driven by
these general mechanisms.
Smith & Yu (2008)
2.3. Procedure
Infants sat (on their mother’s lap) 3.5 feet in front of screen with the mother’s chair set at the center of the screen. Infants’ direction of eye gaze was recorded from a camera centered at the base of the screen and pointed directly at the child’s eyes. Parents were instructed to keep their own eyes shut through the entire procedure so as to not to influence their infant’s behaviors. A camera directed on the parent through out the procedure confirmed their adherence.
There were 30 training slides. Each presented two objects on the screen for 4 s; the onset of the slide was followed 500 ms later by the two words – each said once with a 500 ms pause between. Across trials, the temporal order of the words and spatial order of the objects were varied such that there was no relation between temporal order of the words and the spatial position of the referents. Each correct word-object pair occurred 10 times. The two words and two objects appearing together on a slide (and creating the within trial ambiguities and possible spurious correlations) were randomly determined such that each object and each word co-occurred with every other word and every other object at least once across the 30 training trials. The first four training trials each began with the centered presentation of a Sesame Street character (3 s) to orient attention to the screen. After these first four trials, this attention grabbing slide was interspersed every 2–4 trials to maintain attention. The entire training – an effort to teach six word-referent pairs – lasted less than 4 min (30 training slides and 19 interspersed Sesame Street character slides).
There were 12 test trials, each 8 seconds. This duration was chosen from pilot studies to optimize the number of participants able to complete all 12 test compar-
Fig. 2. The six stimulus shapes.
L. Smith, C. Yu / Cognition xxx (2007) xxx–xxx 5
Fig. 1 illustrates how cross-trial statistics might work. The learner hears the unknown words “bat” and “ball” in the context of seeing a BAT and BALL. Without other information, the learner cannot know whether the word form “ball” refers to one or the other visual object. However, if subsequently, while viewing a scene with the potential referents of a BALL and a DOG, the learner hears the words “ball” and “dog” and if the learner can combine the co-occurrence frequencies from the two streams of data across trials, the learner could correctly map “ball” to BALL. This example represents the simplest case of cross-situational statistical learning – two words, two objects, two adjacently informative trials.
Several formal simulations of word-referent learning suggest the plausibility of cross-situational word learning in much more complex situations with many words, many possible referents, highly ambiguous individual learning trials, and the statistical resolution of the ambiguities only through the accumulation and evaluation of information over many word-referent pairings and many trials (Siskind, 1996; Yu, Ballard, & Aslin, 2005). Consider the more complex case in Table 1. On trial 1, a learner could mistakenly link word A to referent b. On trial 4, the mistake could be corrected, if the system registers that word A occurred on trial 4 without possible referent b, if the cognitive system remembers the prior word-referent pairing, if it registers both co-occurrences and non co-occurrences, and if it calculates the right statistics. Can babies do this?
Fig. 1. Associations among words and referents across two individually ambiguous scenes. If a young learner calculates co-occurrences frequencies across these two trials, s/he can find the proper mapping of “Ball” to BALL.
“Cross-situational learning”
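The bat/ball/dog example can be made concrete with a few lines of counting. The following is a minimal sketch, not the procedure from Smith & Yu: the trial encoding and the helper names `cooccurrence_counts` and `best_referent` are assumptions introduced here for illustration.

```python
from collections import Counter

# Two individually ambiguous trials, following the bat/ball/dog example.
# Each trial pairs the words heard with the objects in view (assumed encoding).
trials = [
    ({"bat", "ball"}, {"BAT", "BALL"}),
    ({"ball", "dog"}, {"BALL", "DOG"}),
]

def cooccurrence_counts(trials):
    """Count how often each word is heard while each object is in view."""
    counts = Counter()
    for words, objects in trials:
        for word in words:
            for obj in objects:
                counts[(word, obj)] += 1
    return counts

def best_referent(word, counts):
    """Return the object that co-occurred most often with the word."""
    candidates = {o: n for (w, o), n in counts.items() if w == word}
    return max(candidates, key=candidates.get)

counts = cooccurrence_counts(trials)
print(best_referent("ball", counts))
```

Only “ball” is resolved by these two trials: it co-occurs with BALL twice and with BAT and DOG once each. Raw counts leave “bat” and “dog” tied between two objects, which is one reason more trials (or additional cues) are needed.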
But even young children can judge other people’s goals and
intentions…
… and these skills are
correlated with word
learning (Bates, Bruner, Bloom,
Clark, Baldwin, Tomasello).
14 of the significant correlations in this matrix occurred with joint engagement in this small window of time between 11 and 13 months. Note also that correlations between joint engagement at 10 months and word comprehension at 12, 13, and 14 months approached significance. Together these correlations show that joint engagement in the middle months (11, 12, 13) is related to word comprehension in the early, middle, and later months.
Figure 21 shows the word-comprehension levels of the four joint engagement styles. As was done in the analysis of gestures, two groups w…
[Figure 21. Mean language comprehension of the early, middle, late, and never joint engagement groups at each time point (x-axis: age in months).]
Carpenter et al. (1998)
High JA
Mid JA
No JA
Low JA
Observable outcome [t(14) = 2.71, P = 0.0
looking pattern suggests that when an inten
veyed through speech, infants who saw th
treat the Recipient taking the ring off the fu
with (and possibly the opposite of) the Comm
goal (of stacking a ring on the funnel).
who heard coughing looked equally at t
[F(2, 21) < 1].
To further explore infants’ interpretation
we next compared looking times for the tw
ditions within each outcome type. For the
infants looked longer when the Communic
coughed than when she uttered speech [t(14
r = 0.55]. In the Speech—but not the Cough
[Figure panels: Pretest; Familiarization; Test (Speech or Cough); Test, continued; Failed Action; Neutral Interaction.]
“****”
Vouloumanos et al. (2012)
A
B
Baldwin (1991)
How do we integrate these findings?
1. Stats first: Statistical learning joined with
social cognition later (Hollich et al., 2000; Smith & Yu,
2008)
2. Social first: social mappings, no memory
for statistical regularities (Medina et al., 2011;
Trueswell et al., 2013)
3. Social-statistical: statistical learning
operates over social representations
(subject to memory limitations) (Frank,
Goodman, & Tenenbaum, 2009)
NB: hard question! “How it generally works” rather than “can they
do it.” Evidence will often be indirect.
1. A proposal for social-
statistical word learning
2. Representations in
statistical word learning
3. A broader view of the
lexicon
4. From social statistical
learning to social inference
Look at the doggie!
What a nice doggie
And there’s a pig....
There’s a pig!
“pig” 2 2
“dog” 2 2
Co-occurrence counts
Frank, Tenenbaum, & Fernald (2013), LL&D
learner
[Figure: F-score (0.0 to 1.0) by social cue (M eyes, M hands, M point).]
"gray"
gray
uncertainty about
what’s being
talked about in
each utterance
?
dog
uncertainty about
what words mean
pig
But cue-weighting is
not straightforward…
Social cue efficacy
Look at the doggie!
What a nice doggie
And there’s a pig....
There’s a pig!
Pig 2 2
Dog 2 2
Lexicon, without guess
about intention
Pig 2
Dog 2
Lexicon, given correct
guesses about intention
If you know the referent, you can learn words;
If you know the words, you can infer the referent;
With incomplete knowledge about both you can
bootstrap…
Frank, Tenenbaum, & Fernald (2013), LL&D
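The two lexicon tables on this slide can be reproduced by counting with and without a guess about the speaker's intended referent. This is a sketch under stated assumptions: the tuple encoding of the four utterances, the simplification of “doggie” to “dog”, and the function name `lexicon_counts` are all introduced here for illustration and are not taken from the cited paper.

```python
from collections import Counter

# Hypothetical encoding of the slide's four utterances:
# (word heard, objects in view, speaker's intended referent).
# Both objects are assumed to be in view the whole time.
utterances = [
    ("dog", ("DOG", "PIG"), "DOG"),
    ("dog", ("DOG", "PIG"), "DOG"),
    ("pig", ("DOG", "PIG"), "PIG"),
    ("pig", ("DOG", "PIG"), "PIG"),
]

def lexicon_counts(utterances, use_intention):
    """Count word-object pairings, optionally restricted to the intended referent."""
    counts = Counter()
    for word, in_view, intended in utterances:
        referents = (intended,) if use_intention else in_view
        for obj in referents:
            counts[(word, obj)] += 1
    return counts

without_guess = lexicon_counts(utterances, use_intention=False)
with_guess = lexicon_counts(utterances, use_intention=True)

for word in ("pig", "dog"):
    print(word, [without_guess[(word, o)] for o in ("PIG", "DOG")],
          [with_guess[(word, o)] for o in ("PIG", "DOG")])
```

Without a guess about intention every cell is 2, so the counts carry no information about the mapping. With correct guesses only the matching cells are non-zero. In practice the intended referent is itself uncertain, which is where the bootstrapping described on the slide comes in.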
Frank, Goodman, & Tenenbaum (2008), NIPS; (2009), Psych Science
Model                     Precision   Recall
Association frequency     0.06        0.26
Conditional probability   0.06        0.32
Mutual information        0.06        0.47
Communicative inference   0.66        0.47
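For readers unfamiliar with the two columns, precision and recall of a learned lexicon can be computed as below. This is a generic sketch of the standard definitions; the toy `gold` and `learned` lexicons are hypothetical and are not the data behind the table.

```python
def precision_recall(learned, gold):
    """Precision: share of learned pairs that are correct.
    Recall: share of gold-standard pairs that were learned."""
    learned, gold = set(learned), set(gold)
    correct = len(learned & gold)
    precision = correct / len(learned) if learned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy lexicons, for illustration only.
gold = {("dog", "DOG"), ("pig", "PIG"), ("ball", "BALL")}
learned = {("dog", "DOG"), ("pig", "DOG")}

p, r = precision_recall(learned, gold)
print(p, r, f_score(p, r))
```

A model that proposes many spurious pairings can reach reasonable recall while precision stays low, which is the pattern in the first three rows of the table.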
[Diagram: pure cross-situational models, with nodes W (words), O (objects), and L (lexicon).]
[Diagram: communicative inference model, with nodes W (words), O (objects), I (speaker’s intended referent), and L (lexicon).]
“Statistical learning”: probabilistic
knowledge gleaned from interpretation in
context
1. A proposal for social-
statistical word learning
2. Representations in
statistical word learning
3. A broader view of the
lexicon
4. From social statistical
learning to social inference
Cross-situational learning in adults
… related in any systematic way to the spatial location of the referents.
To form each trial, we randomly selected several (2, 3, or 4, depending on condition) word-referent pairs from the 18 word-referent pairs for that condition; across trials in a condition, each word and referent were presented six times. That is, over training trials, the learner experienced six repetitions of each word-referent pair. However, given that multiple words and referents were presented on each trial, the learner experienced spurious associations that might be expected to make learning from these ambiguous individual trials difficult. Specifically, on average, each word co-occurred with 5.09 incorrect referents in the 2 × 2 condition, 8.78 incorrect referents in the 3 × 3 condition, and 12.22 incorrect referents in the 4 × 4 condition; these numbers are proportional to within-trial ambiguity in the three conditions. During training, the probability of the correct referent given its name, p(a|A), was 1.0 in all conditions, whereas the average probability of irrelevant but co-occurring referents was .205, .231, and .247 in the 2 × 2, 3 × 3, and 4 × 4 conditions, respectively. Notice that despite the considerable differences in within-trial uncertainty across conditions, the strength of the spurious correlations varied only moderately across these conditions.
Because the same number of word-referent pairs (18) was taught in each condition, and because we sought, across conditions, to keep the number of exposures to each word-referent pair constant, other presentation factors necessarily varied …
… figure out across trials which word went with which picture.
After training in each condition, subjects received a four-alternative forced-choice test of learning. On the test, they were presented with 1 word and 4 pictures and asked to indicate the picture named by that word. The target picture and the 3 foils were all drawn from the set of 18 training pictures.
Fig. 1. Test results for the three learning conditions in Experiment 1. Error bars reflect standard errors. In the 2 × 2 condition, each trial presented two words and two possible referents; in the 3 × 3 condition, each trial presented three words and three possible referents; and in the 4 × 4 condition, each trial presented four words and four possible referents.
Statistical Word Learning
Yu & Smith (2007)
“bosa” “gazzer” “manu” “colat”
But… simple discrete hypothesis
testing? (Trueswell et al., 2013)
incorrect correct
Previous learning instance
“Kicey”
“Kicey” “Kicey”
Full cross-situational accounts (Yu & Smith, 2007) vs.
“propose but verify” (Trueswell et al., 2013)
Do learners store multiple hypotheses?
Yurovsky & Frank (2015), Cognition
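The contrast between the two accounts comes down to what is stored between trials. Below is a deliberately simplified sketch of a single-hypothesis learner in the spirit of “propose but verify”; it is not the model from Trueswell et al. (2013), which among other things treats recall of the stored guess as probabilistic. The function name and trial encoding are assumptions made here.

```python
import random

def propose_but_verify(trials, rng):
    """Single-hypothesis learner (simplified sketch).

    trials: list of (word, set of objects in view).
    Returns a dict mapping each word to its one current guess.
    """
    hypothesis = {}
    for word, objects in trials:
        guess = hypothesis.get(word)
        if guess is None or guess not in objects:
            # No stored guess, or the stored guess failed verification:
            # propose a new referent at random from the current scene.
            hypothesis[word] = rng.choice(sorted(objects))
        # Otherwise the guess is verified and kept. Nothing else is stored,
        # so unchosen alternatives from earlier trials are forgotten.
    return hypothesis

rng = random.Random(0)
trials = [("kicey", {"A", "B", "C"}), ("kicey", {"A", "D"})]
print(propose_but_verify(trials, rng))
```

A full cross-situational learner would instead keep co-occurrence counts for every word-object pair it has seen. The Same/Switch manipulation on the following slides asks which of these kinds of memory learners actually show.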
Statistical representations in adults
Yurovsky & Frank (2015), Cognition
[Figure: proportion choosing previous referent (0.00 to 1.00) by number of intervening words (0 to 8), for Same and Switch trials, in panels for 2, 3, 4, and 8 referents. N = 1196.]
Adults with social information
MacDonald, Yurovsky, & Frank (2017), Cog Psych
Exposure Trial
Switch Trial
N=400
[Figure: word learning by trial type and age group. Proportion correct (0.00 to 1.00) for Same and Switch trials by age group (3.0 to 5.5). N = 120+, 95% CIs.]
Yurovsky & Frank (in prep); cf. Woodard, Gleitman, & Trueswell (in press)
Either strategies shift in development or
else statistical learning operates under
significant memory constraints…
…the second possibility is consistent with statistical
segmentation data (Frank, Goldwater, Griffiths, &
Tenenbaum, 2010, et seq.).
Developmental data
1. A proposal for social-
statistical word learning
2. Representations in
statistical word learning
3. A broader view of the
lexicon
4. From social statistical
learning to social inference
The Human Speechome Project
• Data:
– 3 years
– 200 TB data
– 90k hrs video
– 120k hrs audio
• Corpus
– 8.8M words of child-
available speech
– 2.3M utterances
– Speaker identity
automatically
detected
Roy, Frank et al. (2015), PNAS
http://wordbirths.stanford.edu
Following Huttenlocher et al. (1991); Goodman, Dale, & Li (2008): predict age of
acquisition across words using external measurements (e.g., frequency
estimates from CHILDES corpus)
breakfast (n=313)
T22: he, i, him, yeah, it
T4: chew, yum, crunch, chips, eat
T9: hey, baby, child name, beep, daddy
moon (n=810)
T15: dog, diddle, woof, dickory, dock
T13: star, twinkle, moon, dreams, merrily
T17: shoes, socks, tea, catch, dora
with (n=17945)
T18: fish, turtle, cat, hat, crab
T15: dog, diddle, woof, dickory, dock
T22: he, i, him, yeah, it
Roy, Frank et al. (2015), PNAS
[Figure: regression coefficients (days AoFP/SD) for # phonemes, MLU, frequency, and spatial, temporal, and linguistic distinctiveness under the Baseline, Spatial, Temporal, and Linguistic distinctiveness models. Panel A: all words (N = 679). Further panels: nouns, predicates, closed class words.]
Context of use predicts information
across lexical classes – suggesting
consistency of mechanism
1. A proposal for social-
statistical word learning
2. Representations in statistical
word learning
3. A broader view of the lexicon
4. From social statistical
learning to social inference
[Figure: number of looks (0 to 2) by phase (label, slide, planning, response) for familiar, novel, and mutual exclusivity (ME) trials, with and without referential gaze, at 3 and 4 years.]
Social information seeking
Hembacher & Frank (2017), Proc CogSci
Do children search differentially for social gaze
under uncertainty? (following Vaish, Demir, & Baldwin, 2011)
Communicative inference
model utters true
descriptions…
… but in fact speakers are
informative (pragmatic).
And the Gricean
presumption of
informativeness gives rise
to one-shot social
inferences.
“My friend has glasses!”
Goodman & Frank (2016), TiCS; Frank & Goodman (2012), Science
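The step from “true descriptions” to “informative speakers” can be illustrated with a small recursive-reasoning calculation in the style of rational speech act models. This is a minimal sketch under assumptions made here: a hypothetical three-referent context, a uniform prior over referents, no utterance costs, and no rationality parameter. The names `literal_listener`, `speaker`, and `pragmatic_listener` are illustrative.

```python
# Hypothetical context: three possible friends, described by their features.
referents = {
    "plain": set(),
    "glasses_only": {"glasses"},
    "glasses_and_hat": {"glasses", "hat"},
}
utterances = ["glasses", "hat"]

def normalize(scores):
    """Scale non-negative scores to sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {k: v / total for k, v in scores.items()}

def literal_listener(utterance):
    """Spread belief evenly over referents the utterance is true of."""
    return normalize({r: 1.0 if utterance in feats else 0.0
                      for r, feats in referents.items()})

def speaker(referent):
    """Prefer utterances that lead the literal listener to the referent."""
    return normalize({u: literal_listener(u)[referent] for u in utterances})

def pragmatic_listener(utterance):
    """Infer the referent by reasoning about the informative speaker."""
    return normalize({r: speaker(r)[utterance] for r in referents})

print(literal_listener("glasses"))
print(pragmatic_listener("glasses"))
```

The literal listener splits evenly between the two referents with glasses. The pragmatic listener favors the one with only glasses, because a speaker who meant the other one would more likely have said “hat”. This is the kind of one-shot inference the slide attributes to the presumption of informativeness.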
Ad-hoc pragmatic inference in children
Stiller, Goodman, & Frank (2015), LL&D
N=24 per group, 95% CIs
[Figure 4: mean proportion of choices (0.00 to 1.00) of the one-feature, two-feature, and no-feature items by age (2 to 4 years).]
Learning words pragmatically
This is a dinosaur with a dax!
Which of these has a dax?
(inference trial)
Frank & Goodman (2014), Cog Psych
[Figure: replication. Proportion of inference-consistent choices (0.00 to 0.75) by age group (3 and 4 years) and trial type (Filler, Inference). N = 24 per group.]
Are children primarily statistical or social
word learners? YES
Social-statistical: statistical inferences over
social representations (under memory
limitations)
• (Adult) statistical representation sensitive to
amount of uncertainty
• Referential context important for inference of
early word meanings – beyond nouns
• Search for social information mediated by
uncertainty
• Later social inferences beyond pure association
Thanks!
Dan Yurovsky
Emily Hembacher
Kyle MacDonald
Brandon Roy
Noah Goodman
Josh Tenenbaum
Deb Roy
Alex Stiller
Anne Fernald
For code, data, and papers:
http://langcog.stanford.edu

More Related Content

PPT
L2 endstate and_dynamic_l2_interlanguage.edited
DOCX
Baby WordsmithFrom Associationist to Social SophisticateRo
DOCX
Baby WordsmithFrom Associationist to Social SophisticateRo.docx
PPT
The Role of Discourse Context in Developing Word Form Representations
PPTX
The Language Acquisition Device and innate language ability
PPTX
EventsLangCogPoster_4_4_2016
PPT
Teaching efl to children final
PPT
Student mh 2
L2 endstate and_dynamic_l2_interlanguage.edited
Baby WordsmithFrom Associationist to Social SophisticateRo
Baby WordsmithFrom Associationist to Social SophisticateRo.docx
The Role of Discourse Context in Developing Word Form Representations
The Language Acquisition Device and innate language ability
EventsLangCogPoster_4_4_2016
Teaching efl to children final
Student mh 2

Similar to Understanding statistical word learning in a social context (20)

PPTX
Word Hikes with Little Tikes
PDF
Issues in experimental design for the study of atypical language developme...
PDF
Issues in experimental design for the study of atypical language development
PPTX
SEMANTIC-DEVELOPMENT.pptx_SAYSON.pptx
PDF
From Syllables to Syntax: Investigating Staged Linguistic Development through...
PDF
EBI Equivalence Based Instruction Jaba 43 01 0019
PDF
Nyborg causes2
PPTX
How children learn language
PPTX
Project desing
PPT
Testing vocabulary
PPTX
How Languages are Learned by Patsy M. Lightbown and Nina Spada 4th Edt.Li
PPT
Theoretical Presentation
PPT
Increasing Comprehensible Input In Vocabulary Instruction
PPTX
human learning
PPTX
Unit 1 Describing learners.pptx
PPTX
Language development in children
PPTX
Language - Didactic by Sebastian Betancourt
PPTX
Words and meanings, The Original Word Game
PDF
Er 6 incidental vocal learning through er
Word Hikes with Little Tikes
Issues in experimental design for the study of atypical language developme...
Issues in experimental design for the study of atypical language development
SEMANTIC-DEVELOPMENT.pptx_SAYSON.pptx
From Syllables to Syntax: Investigating Staged Linguistic Development through...
EBI Equivalence Based Instruction Jaba 43 01 0019
Nyborg causes2
How children learn language
Project desing
Testing vocabulary
How Languages are Learned by Patsy M. Lightbown and Nina Spada 4th Edt.Li
Theoretical Presentation
Increasing Comprehensible Input In Vocabulary Instruction
human learning
Unit 1 Describing learners.pptx
Language development in children
Language - Didactic by Sebastian Betancourt
Words and meanings, The Original Word Game
Er 6 incidental vocal learning through er
Ad

Recently uploaded (20)

PPT
1. INTRODUCTION TO EPIDEMIOLOGY.pptx for community medicine
PPTX
Overview of calcium in human muscles.pptx
PDF
An interstellar mission to test astrophysical black holes
PDF
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
PPTX
perinatal infections 2-171220190027.pptx
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPT
6.1 High Risk New Born. Padetric health ppt
PPTX
Pharmacology of Autonomic nervous system
PDF
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
PPTX
Biomechanics of the Hip - Basic Science.pptx
PPT
veterinary parasitology ````````````.ppt
PPTX
The Minerals for Earth and Life Science SHS.pptx
PPTX
BODY FLUIDS AND CIRCULATION class 11 .pptx
PDF
Placing the Near-Earth Object Impact Probability in Context
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PDF
The scientific heritage No 166 (166) (2025)
PPTX
POULTRY PRODUCTION AND MANAGEMENTNNN.pptx
PPTX
C1 cut-Methane and it's Derivatives.pptx
PDF
Assessment of environmental effects of quarrying in Kitengela subcountyof Kaj...
PPTX
Application of enzymes in medicine (2).pptx
1. INTRODUCTION TO EPIDEMIOLOGY.pptx for community medicine
Overview of calcium in human muscles.pptx
An interstellar mission to test astrophysical black holes
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
perinatal infections 2-171220190027.pptx
TOTAL hIP ARTHROPLASTY Presentation.pptx
6.1 High Risk New Born. Padetric health ppt
Pharmacology of Autonomic nervous system
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
Biomechanics of the Hip - Basic Science.pptx
veterinary parasitology ````````````.ppt
The Minerals for Earth and Life Science SHS.pptx
BODY FLUIDS AND CIRCULATION class 11 .pptx
Placing the Near-Earth Object Impact Probability in Context
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
The scientific heritage No 166 (166) (2025)
POULTRY PRODUCTION AND MANAGEMENTNNN.pptx
C1 cut-Methane and it's Derivatives.pptx
Assessment of environmental effects of quarrying in Kitengela subcountyof Kaj...
Application of enzymes in medicine (2).pptx
Ad

Understanding statistical word learning in a social context

  • 1. Understanding statistical word learning in a social context Michael C. Frank Stanford University
  • 3. “doggie” >doggie “word* learning” * With one exception, I mostly will not to talk about “run,” “blue,” “the,” or ”no”…
  • 4. 0 200 400 600 16 18 20 22 24 26 28 30 Age (months) SizeofProductiveVocabulary None All Data (n = 4797 CDI data at http://wordbank.stanford.edu Children’s productive vocabulary grows rapidly and growth accelerates across development.
  • 5. golabupadotitupiro bidakugolabupadoti golabupadotitupiro bidakugolabupadoti Saffran, Aslin, & Newport (1996) Statistical regularities in the child’s environment are powerful cues to the forms and meanings of words. Perhaps language learning is driven by these general mechanisms. Smith & Yu (2008) CORRECTED PROOF 130 2.3. Procedure 131 Infants sat (on their mother’s lap) 3.5 feet in front of screen with the mother’s 132 chair set at the center of the screen. Infants’ direction of eye gaze was recorded from 133 a camera centered at the base of the screen and pointed directly at the child’s eyes. 134 Parents were instructed to keep their own eyes shut through the entire procedure so 135 as to not to influence their infant’s behaviors. A camera directed on the parent 136 through out the procedure confirmed their adherence. 137 There were 30 training slides. Each presented two objects on the screen for 4 s; the 138 onset of the slide was followed 500 ms later by the two words – each said once with a 139 500 ms pause between. Across trials, the temporal order of the words and spatial 140 order of the objects were varied such that there was no relation between temporal 141 order of the words and the spatial position of the referents. Each correct word-object 142 pair occurred 10 times. The two words and two objects appearing together on a slide 143 (and creating the within trial ambiguities and possible spurious correlations) were 144 randomly determined such that each object and each word co-occurred with every 145 other word and every other object at least once across the 30 training trials. The first 146 four training trials each began with the centered presentation of a Sesame Street 147 character (3 s) to orient attention to the screen. After these first four trials, this atten- 148 tion grabbing slide was interspersed every 2–4 trials to maintain attention. 
The entire 149 training – an effort to teach six word-referent pairs – lasted less than 4 min (30 train- 150 ing slides and 19 interspersed Sesame Street character slides). 151 There were 12 test trials, each 8 seconds. This duration was chosen from pilot 152 studies to optimize the number of participants able to complete all 12 test compar- Fig. 2. The six stimulus shapes. L. Smith, C. Yu / Cognition xxx (2007) xxx–xxx 5 COGNIT 1702 No. of Pages 11 18 July 2007 Disk Used ARTICLE IN PRESS RRECTED PROOF 71 Fig. 1 illustrates how cross-trial statistics might work. The learner hears the 72 unknown words ‘‘bat’’ and ‘‘ball’’ in the context of seeing a BAT and BALL. With- 73 out other information, the learner cannot know whether the word form ‘‘ball’’ refers 74 to one or the other visual object. However, if subsequently, while viewing a scene 75 with the potential referents of a BALL and a DOG, the learner hears the words 76 ‘‘ball’’ and ‘‘dog’’ and if the learner can combine the co-occurrence frequencies from 77 the two streams of data across trials, the learner could correctly map ‘‘ball’’ to 78 BALL. This example represents the simplest case of cross-situational statistical 79 learning – two words, two objects, two adjacently informative trials. 80 Several formal simulations of word-referent learning suggest the plausibility of 81 cross-situational word learning in much more complex situations with many words, 82 many possible referents, highly ambiguous individual learning trials, and the statis- 83 tical resolution of the ambiguities only through the accumulation and evaluation of 84 information over many word-referent pairings and many trials (Siskind, 1996; Yu, 85 Ballard, & Aslin, 2005). Consider the more complex case in Table 1. On trial 1, a 86 learner could mistakenly link word A to referent b. 
On trial 4, the mistake could 87 be corrected, if the system registers that word A occurred on trial 4 without possible 88 referent b, if the cognitive system remembers the prior word-referent pairing, if it 89 registers both co-occurrences and non co-occurrences, and if it calculates the right 90 statistics. Can babies do this? Fig. 1. Associations among words and referents across two individually ambiguous scenes. If a young learner calculates co-occurrences frequencies across these two trials, s/he can find the proper mapping of ‘‘Ball’’ to BALL. L. Smith, C. Yu / Cognition xxx (2007) xxx–xxx 3 18 July 2007 Disk Used “Cross-situational learning”
  • 6. But even young children can judge other people’s goals and intentions… … and these skills are correlated with word learning (Bates, Bruner, Bloom, Clark, Baldwin, Tomasello). 14 of the significant correlations in this matrix occurred with joint eng ment in this small window of time between 11 and 13 months. Note also correlations between joint engagement at 10 months and word compreh sion at 12, 13, and 14 months approached significance. Together these co lations show that joint engagement in the middle months (11, 12, 13) is lated to word comprehension in the early, middle, and later months. Figure 21 shows the word-comprehension levels of the four jo engagement styles. As was done in the analysis of gestures, two groups w 175 - 3 150- bU 125- -E ~ $ C -Midd s;~2 100- --*-- Late 2 7 5 - --*-- Ncva 2k3 3 5 0 - z u 25 - Age in Months RGURE21.-Mean language comprehension of the early, middle, late, and neverjo engagement groups at each time point. Carpenter et al. (1998) High JA Mid JA No JA Low JA Observable outcome [t(14) = 2.71, P = 0.0 looking pattern suggests that when an inten veyed through speech, infants who saw th treat the Recipient taking the ring off the fu with (and possibly the opposite of) the Comm goal (of stacking a ring on the funnel). who heard coughing looked equally at t [F(2, 21) < 1]. To further explore infants’ interpretation we next compared looking times for the tw ditions within each outcome type. For the infants looked longer when the Communic coughed than when she uttered speech [t(14 r = 0.55]. In the Speech—but not the Cough Test - Speech or Cough TestFamiliarization Test - continued Failed Action Neutral Interaction Pretest “****” Vouloumanos et al. (2012) A B Baldwin (1991)
  • 7. How do we integrate these findings? 1. Stats first: Statistical learning joined with social cognition later (Hollich et al., 2000; Smith & Yu, 2008) 2. Social first: social mappings, no memory for statistical regularities (Medina et al., 2011; Trueswell et al., 2013) 3. Social-statistical: statistical learning operates over social representations (subject to memory limitations) (Frank, Goodman, & Tenenbaum, 2009) NB: hard question! “How it generally works” rather than “can they do it.” Evidence will be often be indirect.
  • 8. 1. A proposal for social- statistical word learning 2. Representations in statistical word learning 3. A broader view of the lexicon 4. From social statistical learning to social inference
  • 9. Look at the doggie! What a nice doggie And there’s a pig.... There’s a pig! “pig” 2 2 “dog” 2 2 Co-occurrence counts
  • 10. Frank, Tenenbaum, & Fernald (2013), LL&D learner 0.0 0.2 0.4 0.6 0.8 1.0 M eyes M hands M point Social cue FScore "gray" gray uncertainty about what’s being talked about in each utterance ? dog uncertainty about what words mean pig But cue-weighting is not straightforward… Social cue efficacy
  • 11. “Look at the doggie! What a nice doggie.” “And there’s a pig… There’s a pig!” Lexicon, without guess about intention: Pig: 2, 2; Dog: 2, 2. Lexicon, given correct guesses about intention: Pig: 2; Dog: 2. If you know the referent, you can learn words; if you know the words, you can infer the referent; with incomplete knowledge about both you can bootstrap… Frank, Tenenbaum, & Fernald (2013), LL&D
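The two directions of inference on this slide can be sketched in code. This is an illustrative toy under stated assumptions, not the model from the cited paper: each exposure is a hypothetical triple of word, objects in view, and the speaker's intended referent, and `build_lexicon` and `infer_referent` are made-up helper names.

```python
from collections import Counter

# Hypothetical exposures: (word, objects in view, speaker's intended referent).
EXPOSURES = [
    ("doggie", ("DOG", "PIG"), "DOG"),
    ("doggie", ("DOG", "PIG"), "DOG"),
    ("pig", ("DOG", "PIG"), "PIG"),
    ("pig", ("DOG", "PIG"), "PIG"),
]


def build_lexicon(exposures, use_intention):
    """Referent -> words: count word-object pairings.

    With use_intention=True only the intended referent is credited;
    otherwise every object in view is credited.
    """
    lexicon = Counter()
    for word, in_view, intended in exposures:
        candidates = (intended,) if use_intention else in_view
        for obj in candidates:
            lexicon[(word, obj)] += 1
    return lexicon


def infer_referent(word, in_view, lexicon):
    """Words -> referent: pick the visible object most associated with the word.

    Returns None when the top two candidates are tied.
    """
    scores = sorted(((lexicon[(word, obj)], obj) for obj in in_view), reverse=True)
    if len(scores) > 1 and scores[0][0] == scores[1][0]:
        return None
    return scores[0][1]


ambiguous = build_lexicon(EXPOSURES, use_intention=False)
informed = build_lexicon(EXPOSURES, use_intention=True)
print(infer_referent("pig", ("DOG", "PIG"), ambiguous))  # tied counts
print(infer_referent("pig", ("DOG", "PIG"), informed))   # counts favor one object
```

A bootstrapping learner would alternate between these two steps, using a partial lexicon to guess referents and the guessed referents to sharpen the lexicon.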
  • 12. Frank, Goodman, & Tenenbaum (2008), NIPS; (2009), Psych Science. Model (precision, recall): Association frequency (0.06, 0.26); Conditional probability (0.06, 0.32); Mutual information (0.06, 0.47); Communicative inference (0.66, 0.47). [Diagram: pure cross-situational models relate the lexicon (L) to objects (O) and words (W); the communicative inference model adds the speaker’s intended referent (I).] “Statistical learning”: probabilistic knowledge gleaned from interpretation in context
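Precision and recall are often combined into a single F-score. The sketch below applies the balanced F-score, F = 2PR / (P + R), to the four precision and recall pairs listed on this slide. Using the balanced (harmonic mean) form is an assumption here; the slide itself reports only precision and recall.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as listed on the slide.
MODELS = {
    "Association frequency": (0.06, 0.26),
    "Conditional probability": (0.06, 0.32),
    "Mutual information": (0.06, 0.47),
    "Communicative inference": (0.66, 0.47),
}

for name, (precision, recall) in MODELS.items():
    print(f"{name}: F = {f_score(precision, recall):.2f}")
```

Because the harmonic mean is dominated by the smaller value, the three baseline models end up with low F-scores despite their differing recall.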
  • 13. 1. A proposal for social-statistical word learning 2. Representations in statistical word learning 3. A broader view of the lexicon 4. From social statistical learning to social inference
  • 14. Cross-situational learning in adults. [Excerpt from Yu & Smith (2007), Statistical Word Learning: each trial presented 2, 3, or 4 words and the same number of possible referents (the 2 × 2, 3 × 3, and 4 × 4 conditions), selected from the 18 word-referent pairs for that condition; across trials, each word and referent was presented six times. Because multiple words and referents appeared on each trial, learners experienced spurious associations: on average each word co-occurred with 5.09 incorrect referents in the 2 × 2 condition and 8.78 in the 3 × 3 condition (the 4 × 4 value is truncated in the excerpt). The probability of the correct referent given its name, p(a|A), was 1.0 in all conditions, whereas the average probability of irrelevant but co-occurring referents was .205, .231, and .247 in the 2 × 2, 3 × 3, and 4 × 4 conditions. After training, subjects received a four-alternative forced-choice test: 1 word and 4 pictures, with the target and 3 foils drawn from the 18 training pictures. Fig. 1: test results for the three learning conditions in Experiment 1; error bars reflect standard errors.] [Excerpt from L. Smith & C. Yu, Cognition (article in press, corrected proof, July 2007), Section 2.3, Procedure: infants sat on their mother’s lap 3.5 feet in front of the screen while direction of eye gaze was recorded; there were 30 training slides, each presenting two objects for 4 s, with the two words each said once; each correct word-object pair occurred 10 times; the entire training on six word-referent pairs lasted less than 4 min; there were 12 test trials of 8 seconds each. Fig. 2: the six stimulus shapes; example words “bosa,” “gazzer,” “manu,” “colat.”] But… simple discrete hypothesis testing? (Trueswell et al., 2013) [Figure: incorrect vs. correct previous learning instance.]
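The training design described in the Yu & Smith (2007) excerpt can be simulated to see where spurious associations come from. This is a rough sketch: the pair and repetition counts come from the excerpt, but the scheduling rule in `make_trials` is a hypothetical stand-in for the study's actual randomization, so the printed averages are not expected to match the published values.

```python
import random
from collections import Counter

N_PAIRS = 18      # word-referent pairs per condition (from the excerpt)
REPETITIONS = 6   # presentations of each pair (from the excerpt)


def make_trials(pairs_per_trial, rng):
    """Deal pairs into trials so each pair appears exactly REPETITIONS times.

    Hypothetical rule: each trial takes the pairs with the most presentations
    left, breaking ties at random. Pair i stands for word i and referent i.
    """
    remaining = {pair: REPETITIONS for pair in range(N_PAIRS)}
    trials = []
    for _ in range(N_PAIRS * REPETITIONS // pairs_per_trial):
        order = sorted(remaining, key=lambda p: (-remaining[p], rng.random()))
        trial = order[:pairs_per_trial]
        for pair in trial:
            remaining[pair] -= 1
        trials.append(trial)
    return trials


def cooccurrence(trials):
    """Count every word-referent pairing that shares a trial."""
    counts = Counter()
    for trial in trials:
        for word in trial:
            for referent in trial:
                counts[(word, referent)] += 1
    return counts


def mean_spurious_referents(counts):
    """Average number of distinct incorrect referents seen with each word."""
    spurious = Counter(word for (word, referent) in counts if word != referent)
    return sum(spurious.values()) / N_PAIRS


rng = random.Random(0)
for k in (2, 3, 4):
    trials = make_trials(k, rng)
    counts = cooccurrence(trials)
    print(f"{k} x {k}: {len(trials)} trials, "
          f"{mean_spurious_referents(counts):.2f} distinct incorrect referents per word")
```

In every condition the correct referent accompanies its word on all six presentations, while each incorrect referent does so less consistently; a learner that aggregates across trials can exploit that difference.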
  • 15. [Figure: three successive trials, each labeled “Kicey.”] Full cross-situational accounts (Yu & Smith, 2007) vs. “propose but verify” (Trueswell et al., 2013). Do learners store multiple hypotheses? Yurovsky & Frank (2015), Cognition
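The contrast between the two accounts can be made concrete with two toy learners. Both are simplified sketches under stated assumptions: `cross_situational` keeps a count for every candidate referent, while `propose_but_verify` keeps a single guess per word and replaces it at random when it is not confirmed. The published propose-but-verify model also includes memory failure, which is omitted here, and the trial contents are made up.

```python
import random
from collections import Counter, defaultdict

# Hypothetical trials: (word heard, referents in view).
TRIALS = [
    ("kicey", ["A", "B"]),
    ("kicey", ["A", "C"]),
    ("kicey", ["D", "A"]),
]


def cross_situational(trials):
    """Store counts for all candidates; answer with the most frequent one."""
    counts = defaultdict(Counter)
    for word, referents in trials:
        counts[word].update(referents)
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def propose_but_verify(trials, rng):
    """Store one hypothesis per word; keep it if present, else guess anew."""
    hypothesis = {}
    for word, referents in trials:
        if hypothesis.get(word) not in referents:
            hypothesis[word] = rng.choice(referents)
    return hypothesis


print(cross_situational(TRIALS))
print(propose_but_verify(TRIALS, random.Random(0)))
```

The counting learner always recovers the referent present on every trial, whereas the single-hypothesis learner's answer depends on its earlier guesses. That difference is what Same versus Switch test trials are designed to probe.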
  • 16. Statistical representations in adults. [Figure: proportion choosing previous referent by number of intervening words (0 to 8), for Same vs. Switch trials, in panels for 2, 3, 4, and 8 referents.] N=1196. Yurovsky & Frank (2015), Cognition
  • 17. Adults with social information. [Figure: Exposure Trial and Switch Trial.] N=400. MacDonald, Yurovsky, & Frank (2017), Cog Psych
  • 18. Developmental data. [Figure: Word Learning by Trial Type and Age Group; proportion correct by age group (3.0 to 5.5) for Same vs. Switch trials; N=120+, 95% CIs.] Yurovsky & Frank (in prep); cf. Woodard, Gleitman, & Trueswell (in press). Either strategies shift in development or else statistical learning operates under significant memory constraints… The second possibility is consistent with statistical segmentation data (Frank, Goldwater, Griffiths, & Tenenbaum, 2010, et seq.).
  • 19. 1. A proposal for social-statistical word learning 2. Representations in statistical word learning 3. A broader view of the lexicon 4. From social statistical learning to social inference
  • 20. The Human Speechome Project. Data: 3 years; 200 TB data; 90k hrs video; 120k hrs audio. Corpus: 8.8M words of child-available speech; 2.3M utterances; speaker identity automatically detected. Roy, Frank et al. (2015), PNAS
  • 22. http://wordbirths.stanford.edu Following Huttenlocher et al. (1991); Goodman, Dale, & Li (2008): predict age of acquisition across words using external measurements (e.g., frequency estimates from CHILDES corpus)
  • 23. [Figure: three example words, each with its three top T-numbered word lists.] breakfast (n=313): T22 (he, i, him, yeah, it); T4 (chew, yum, crunch, chips, eat); T9 (hey, baby, child name, beep, daddy). moon (n=810): T15 (dog, diddle, woof, dickory, dock); T13 (star, twinkle, moon, dreams, merrily); T17 (shoes, socks, tea, catch, dora). with (n=17945): T18 (fish, turtle, cat, hat, crab); T15 (dog, diddle, woof, dickory, dock); T22 (he, i, him, yeah, it). Roy, Frank et al. (2015), PNAS
  • 24. [Figure: regression coefficients (days AoFP per SD) for # phonemes, MLU, frequency, spatial distinctiveness, temporal distinctiveness, and linguistic distinctiveness under the Baseline, Spatial, Temporal, and Linguistic models; panels for All words (N=679), Nouns, Predicates, and Closed Class Words.] Context of use predicts information across lexical classes, suggesting consistency of mechanism. Roy, Frank et al. (2015), PNAS
  • 25. 1. A proposal for social-statistical word learning 2. Representations in statistical word learning 3. A broader view of the lexicon 4. From social statistical learning to social inference
  • 26. Social information seeking. Do children search differentially for social gaze under uncertainty? (following Vaish, Demir, & Baldwin, 2011) [Figure: number of looks (0 to 2) by phase (label, slide, planning, response) for familiar, mutual exclusivity (ME), and novel trials; referential gaze vs. no referential gaze; 3- and 4-year-olds.] Hembacher & Frank (2017), Proc CogSci
  • 27. The speaker in the communicative inference model utters true descriptions… but in fact speakers are informative (pragmatic). And the Gricean presumption of informativeness gives rise to one-shot social inferences. “My friend has glasses!” Goodman & Frank (2016), TiCS; Frank & Goodman (2012), Science
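The step from "true descriptions" to "informative speakers" can be sketched as a small recursive reasoning model: a literal listener, a speaker who prefers utterances that single out the intended referent, and a pragmatic listener who reasons about that speaker. The three faces and two words below are hypothetical stand-ins for the display behind “My friend has glasses!”, a uniform prior over faces is assumed, and the function names are illustrative rather than taken from the cited papers.

```python
# Hypothetical display: each face and the features it has.
FACES = {
    "plain": set(),
    "glasses": {"glasses"},
    "glasses+hat": {"glasses", "hat"},
}
WORDS = ["glasses", "hat"]


def normalize(scores):
    """Scale scores to sum to 1; leave an all-zero dict unchanged."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else scores


def literal_listener(word):
    """Spread belief evenly over faces for which the word is true."""
    return normalize({face: float(word in feats) for face, feats in FACES.items()})


def speaker(face):
    """Prefer words in proportion to how well a literal listener recovers the face."""
    return normalize({word: literal_listener(word)[face] for word in WORDS})


def pragmatic_listener(word):
    """Infer the face by reasoning about which speaker would have said the word."""
    return normalize({face: speaker(face)[word] for face in FACES})


print(literal_listener("glasses"))
print(pragmatic_listener("glasses"))
```

The literal listener is split evenly between the two faces with glasses, while the pragmatic listener favors the face with only glasses, since a speaker meaning the other face could have said "hat" instead.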
  • 28. Ad-hoc pragmatic inference in children. [Figure: proportion of choices (One-feature, Two-feature, No-feature) by age (2 to 4 years); N=24 per group, 95% CIs.] Stiller, Goodman, & Frank (2015), LL&D
  • 29. Learning words pragmatically. “This is a dinosaur with a dax! Which of these has a dax?” (inference trial) [Figure: proportion inference-consistent choices by age group (3 and 4 years) and trial type (Filler, Inference), including a replication panel; N=24 per group.] Frank & Goodman (2014), Cog Psych
  • 30. Are children primarily statistical or social word learners? YES. Social-statistical: statistical inferences over social representations (under memory limitations). • (Adult) statistical representation sensitive to amount of uncertainty • Referential context important for inference of early word meanings, beyond nouns • Search for social information mediated by uncertainty • Later social inferences beyond pure association
  • 31. Thanks! Dan Yurovsky, Emily Hembacher, Kyle MacDonald, Brandon Roy, Noah Goodman, Josh Tenenbaum, Deb Roy, Alex Stiller, Anne Fernald. For code, data, and papers: http://langcog.stanford.edu