Methods for Testing and
Evaluating Survey Questionnaires

WILEY SERIES IN SURVEY METHODOLOGY
Established in Part by WALTER A. SHEWHART AND SAMUEL S. WILKS
Editors: Robert M. Groves, Graham Kalton, J. N. K. Rao, Norbert Schwarz,
Christopher Skinner
The Wiley Series in Survey Methodology covers topics of current research and practical
interests in survey methodology and sampling. While the emphasis is on application,
theoretical discussion is encouraged when it supports a broader understanding of the subject
matter.
The authors are leading academics and researchers in survey methodology and sampling.
The readership includes professionals in, and students of, the fields of applied statistics,
biostatistics, public policy, and government and corporate enterprises.
BIEMER, GROVES, LYBERG, MATHIOWETZ, and SUDMAN · Measurement
Errors in Surveys
BIEMER and LYBERG · Introduction to Survey Quality
COCHRAN · Sampling Techniques, Third Edition
COUPER, BAKER, BETHLEHEM, CLARK, MARTIN, NICHOLLS, and O’REILLY
(editors) · Computer Assisted Survey Information Collection
COX, BINDER, CHINNAPPA, CHRISTIANSON, COLLEDGE, and KOTT (editors) ·
Business Survey Methods
*DEMING · Sample Design in Business Research
DILLMAN · Mail and Internet Surveys: The Tailored Design Method
GROVES and COUPER · Nonresponse in Household Interview Surveys
GROVES · Survey Errors and Survey Costs
GROVES, DILLMAN, ELTINGE, and LITTLE · Survey Nonresponse
GROVES, BIEMER, LYBERG, MASSEY, NICHOLLS, and WAKSBERG ·
Telephone Survey Methodology
GROVES, FOWLER, COUPER, LEPKOWSKI, SINGER, and TOURANGEAU ·
Survey Methodology
*HANSEN, HURWITZ, and MADOW · Sample Survey Methods and Theory,
Volume I: Methods and Applications
*HANSEN, HURWITZ, and MADOW · Sample Survey Methods and Theory,
Volume II: Theory
HARKNESS, VAN DE VIJVER, and MOHLER · Cross-Cultural Survey Methods
HEERINGA and KALTON · Leslie Kish Selected Papers
KISH · Statistical Design for Research
*KISH · Survey Sampling
KORN and GRAUBARD · Analysis of Health Surveys
LESSLER and KALSBEEK · Nonsampling Error in Surveys
LEVY and LEMESHOW · Sampling of Populations: Methods and Applications,
Third Edition
LYBERG, BIEMER, COLLINS, de LEEUW, DIPPO, SCHWARZ, TREWIN (editors) ·
Survey Measurement and Process Quality
MAYNARD, HOUTKOOP-STEENSTRA, SCHAEFFER, VAN DER ZOUWEN ·
Standardization and Tacit Knowledge: Interaction and Practice in the Survey Interview
PRESSER, ROTHGEB, COUPER, LESSLER, MARTIN, MARTIN, and SINGER
(editors) · Methods for Testing and Evaluating Survey Questionnaires
RAO · Small Area Estimation
SIRKEN, HERRMANN, SCHECHTER, SCHWARZ, TANUR, and TOURANGEAU
(editors) · Cognition and Survey Research
VALLIANT, DORFMAN, and ROYALL · Finite Population Sampling and Inference: A
Prediction Approach
*Now available in a lower priced paperback edition in the Wiley Classics Library.

Methods for Testing and
Evaluating Survey Questionnaires
Edited by
STANLEY PRESSER
University of Maryland, College Park, MD
JENNIFER M. ROTHGEB
U.S. Bureau of the Census, Washington, DC
MICK P. COUPER
University of Michigan, Ann Arbor, MI
JUDITH T. LESSLER
Research Triangle Institute, Research Triangle Park, NC
ELIZABETH MARTIN
U.S. Bureau of the Census, Washington, DC
JEAN MARTIN
Office for National Statistics, London, UK
ELEANOR SINGER
University of Michigan, Ann Arbor, MI
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2004 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopying, recording, scanning, or
otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright
Act, without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at
www.copyright.com. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030,
(201) 748-6011, fax (201) 748-6008.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their
best efforts in preparing this book, they make no representations or warranties with respect to
the accuracy or completeness of the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be
created or extended by sales representatives or written sales materials. The advice and
strategies contained herein may not be suitable for your situation. You should consult with a
professional where appropriate. Neither the publisher nor author shall be liable for any loss
of profit or any other commercial damages, including but not limited to special, incidental,
consequential, or other damages.
For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or
fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
Methods for testing and evaluating survey questionnaires / Stanley Presser . . . [et al.].
p. cm.—(Wiley series in survey methodology)
Includes bibliographical references and index.
ISBN 0-471-45841-4 (pbk. : alk. paper)
1. Social surveys—Methodology. 2. Questionnaires—Methodology. 3. Social
sciences—Research—Methodology. I. Presser, Stanley II. Series.
HM538.M48 2004
300'.72'3—dc22
2003063992
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To the memory of
Charles Cannell and Seymour Sudman,
two pretesting pioneers whose contributions shaped the field
of survey research

Contents
Contributors
Preface
1 Methods for Testing and Evaluating Survey Questions
Stanley Presser, Mick P. Couper, Judith T. Lessler, Elizabeth Martin,
Jean Martin, Jennifer M. Rothgeb, and Eleanor Singer
PART I COGNITIVE INTERVIEWS
2 Cognitive Interviewing Revisited: A Useful Technique, in Theory?
Gordon B. Willis
3 The Dynamics of Cognitive Interviewing
Paul Beatty
4 Data Quality in Cognitive Interviews: The Case of Verbal Reports
Frederick G. Conrad and Johnny Blair
5 Do Different Cognitive Interview Techniques Produce Different Results?
Theresa J. DeMaio and Ashley Landreth
PART II SUPPLEMENTS TO CONVENTIONAL PRETESTS
6 Evaluating Survey Questions by Analyzing Patterns of Behavior Codes and
Question–Answer Sequences: A Diagnostic Approach
Johannes van der Zouwen and Johannes H. Smit
7 Response Latency and (Para)Linguistic Expressions as Indicators of Response Error
Stasja Draisma and Wil Dijkstra
8 Vignettes and Respondent Debriefing for Questionnaire Design and Evaluation
Elizabeth Martin
PART III EXPERIMENTS
9 The Case for More Split-Sample Experiments in Developing Survey Instruments
Floyd Jackson Fowler, Jr.
10 Using Field Experiments to Improve Instrument Design: The SIPP Methods Panel Project
Jeffrey Moore, Joanne Pascale, Pat Doyle, Anna Chan, and Julia Klein Griffiths
11 Experimental Design Considerations for Testing and Evaluating Questionnaires
Roger Tourangeau
PART IV STATISTICAL MODELING
12 Modeling Measurement Error to Identify Flawed Questions
Paul Biemer
13 Item Response Theory Modeling for Questionnaire Evaluation
Bryce B. Reeve and Louise C. Mâsse
14 Development and Improvement of Questionnaires Using Predictions of Reliability and Validity
Willem E. Saris, William van der Veld, and Irmtraud Gallhofer
PART V MODE OF ADMINISTRATION
15 Testing Paper Self-Administered Questionnaires: Cognitive Interview and Field Test Comparisons
Don A. Dillman and Cleo D. Redline
16 Methods for Testing and Evaluating Computer-Assisted Questionnaires
John Tarnai and Danna L. Moore
17 Usability Testing to Evaluate Computer-Assisted Instruments
Sue Ellen Hansen and Mick P. Couper
18 Development and Testing of Web Questionnaires
Reginald P. Baker, Scott Crawford, and Janice Swinehart
PART VI SPECIAL POPULATIONS
19 Evolution and Adaptation of Questionnaire Development, Evaluation, and Testing
Methods for Establishment Surveys
Diane K. Willimack, Lars Lyberg, Jean Martin, Lilli Japec, and Patricia Whitridge
20 Pretesting Questionnaires for Children and Adolescents
Edith de Leeuw, Natacha Borgers, and Astrid Smits
21 Developing and Evaluating Cross-National Survey Instruments
Tom W. Smith
22 Survey Questionnaire Translation and Assessment
Janet Harkness, Beth-Ellen Pennell, and Alisú Schoua-Glusberg
PART VII MULTIMETHOD APPLICATIONS
23 A Multiple-Method Approach to Improving the Clarity of Closely Related Concepts:
Distinguishing Legal and Physical Custody of Children
Nora Cate Schaeffer and Jennifer Dykema
24 Multiple Methods for Developing and Evaluating a Stated-Choice Questionnaire to Value Wetlands
Michael D. Kaplowitz, Frank Lupi, and John P. Hoehn
25 Does Pretesting Make a Difference? An Experimental Test
Barbara Forsyth, Jennifer M. Rothgeb, and Gordon B. Willis
References
Index

Contributors
Reginald P. Baker, Market Strategies, Inc., Livonia, MI
Paul Beatty, National Center for Health Statistics, Hyattsville, MD
Paul Biemer, Research Triangle Institute, Research Triangle Park, NC
Johnny Blair, Abt Associates, Washington, DC
Natacha Borgers, Utrecht University, The Netherlands
Anna Chan, U.S. Bureau of the Census, Washington, DC
Frederick G. Conrad, University of Michigan, Ann Arbor, MI
Mick P. Couper, University of Michigan, Ann Arbor, MI
Scott Crawford, Market Strategies, Inc., Livonia, MI
Edith de Leeuw, Utrecht University, The Netherlands
Theresa J. DeMaio, U.S. Bureau of the Census, Washington, DC
Wil Dijkstra, Free University, Amsterdam, The Netherlands
Don A. Dillman, Washington State University, Pullman, WA
Pat Doyle, U.S. Bureau of the Census, Washington, DC
Stasja Draisma, Free University, Amsterdam, The Netherlands
Jennifer Dykema, University of Wisconsin, Madison, WI
Barbara Forsyth, Westat, Rockville, MD
Floyd Jackson Fowler, Jr., University of Massachusetts, Boston, MA
Irmtraud Gallhofer, University of Amsterdam, The Netherlands
Julia Klein Griffiths, U.S. Bureau of the Census, Washington, DC
Sue Ellen Hansen, University of Michigan, Ann Arbor, MI
Janet Harkness, Zentrum für Umfragen, Methoden und Analysen,
Mannheim, Germany
John P. Hoehn, Michigan State University, East Lansing, MI
Lilli Japec, Statistics Sweden, Stockholm, Sweden
Michael D. Kaplowitz, Michigan State University, East Lansing, MI
Ashley Landreth, U.S. Bureau of the Census, Washington, DC
Judith T. Lessler, Research Triangle Institute, Research Triangle Park, NC
Frank Lupi, Michigan State University, East Lansing, MI
Lars Lyberg, Statistics Sweden, Stockholm, Sweden
Elizabeth Martin, U.S. Bureau of the Census, Washington, DC
Jean Martin, Office for National Statistics, London, United Kingdom
Louise C. Mâsse, National Cancer Institute, Bethesda, MD
Danna L. Moore, Washington State University, Pullman, WA
Jeffrey Moore, U.S. Bureau of the Census, Washington, DC
Joanne Pascale, U.S. Bureau of the Census, Washington, DC
Beth-Ellen Pennell, University of Michigan, Ann Arbor, MI
Stanley Presser, University of Maryland, College Park, MD
Cleo D. Redline, National Science Foundation, Arlington, VA
Bryce B. Reeve, National Cancer Institute, Bethesda, MD
Jennifer M. Rothgeb, U.S. Bureau of the Census, Washington, DC
Willem E. Saris, University of Amsterdam, The Netherlands
Nora Cate Schaeffer, University of Wisconsin, Madison, WI
Alisú Schoua-Glusberg, Research Support Services, Evanston, IL
Eleanor Singer, University of Michigan, Ann Arbor, MI
Johannes H. Smit, Free University, Amsterdam, The Netherlands
Tom W. Smith, National Opinion Research Center, Chicago, IL
Astrid Smits, Statistics Netherlands, Heerlen, The Netherlands
Janice Swinehart, Market Strategies, Inc., Livonia, MI
John Tarnai, Washington State University, Pullman, WA
Roger Tourangeau, University of Michigan, Ann Arbor, MI
Diane K. Willimack, U.S. Bureau of the Census, Washington, DC
William van der Veld, University of Amsterdam, The Netherlands
Johannes van der Zouwen, Free University, Amsterdam, The Netherlands
Patricia Whitridge, Statistics Canada, Ottawa, Ontario, Canada
Gordon B. Willis, National Cancer Institute, Bethesda, MD

Preface
During the past 20 years, methods for testing and evaluating survey questionnaires
have changed dramatically. New methods have been developed and are being
applied and refined, and old methods have been adapted from other uses. Some
of these changes were due to the application of theory and methods from cognitive
science and others to an increasing appreciation of the benefits offered by more
rigorous testing. Research has begun to evaluate the strengths and weaknesses
of the various testing and evaluation methods and to examine the reliability
and validity of the methods’ results. Although these developments have been the
focus of many conference sessions and the subject of several book chapters, until
the 2002 International Conference on Questionnaire Development, Evaluation and
Testing Methods, and the publication of this monograph, there was no conference
or book dedicated exclusively to question testing and evaluation.
Jennifer Rothgeb initially proposed the conference at the spring 1999
Questionnaire Evaluation Standards International Work Group meeting in London. The
Work Group members responded enthusiastically and encouraged the submission
of a formal proposal to the organizations that had sponsored prior international
conferences on survey methodology. One member, Seymour Sudman, provided
invaluable help in turning the idea into a reality and agreed to join the organizing
committee, on which he served until his death in May 2000. Shortly after the
London meeting, Rothgeb enlisted Stanley Presser for the organizing committee,
as they flew home from that year’s annual meetings of the American Association
for Public Opinion Research (and, later, persuaded him to chair the monograph
committee). The members of the final organizing committee, chaired by Rothgeb,
were Mick P. Couper, Judith T. Lessler, Elizabeth Martin, Jean Martin, Stanley
Presser, Eleanor Singer, and Gordon B. Willis.
The conference was sponsored by four organizations: the American Statistical
Association (Survey Research Methods Section), the American Association for
Public Opinion Research, the Council of American Survey Research
Organizations, and the International Association of Survey Statisticians. These
organizations provided funds to support the development of both the conference and the
monograph. Additional financial support was provided by:
Abt Associates
Arbitron Company
Australian Bureau of Statistics
Iowa State University
Mathematica Policy Research
National Opinion Research Center
National Science Foundation
Nielsen Media Research
Office for National Statistics (United Kingdom)
Research Triangle Institute
Schulman, Ronca & Bucuvalas, Inc.
Statistics Sweden
University of Michigan
U.S. Bureau of Justice Statistics
U.S. Bureau of Labor Statistics
U.S. Bureau of the Census
U.S. Bureau of Transportation Statistics
U.S. Energy Information Administration
U.S. National Agricultural Statistics Service
U.S. National Center for Health Statistics
Washington State University
Westat, Inc.
Without the support of these organizations, neither the conference nor the
monograph would have been possible.
In 2000, the monograph committee, composed of the editors of this volume,
issued a call for abstracts. Fifty-three were received. Authors of 23 of the abstracts
were asked to provide detailed chapter outlines that met specified goals. After
receiving feedback on the outlines, authors were then asked to submit first drafts.
Second drafts, taking into account the editors’ comments on the initial drafts,
were due shortly before the conference in November 2002. Final revisions were
discussed with authors at the conference, and additional editorial work took place
after the conference.
A contributed papers subcommittee, chaired by Gordon Willis and including
Luigi Fabbris, Eleanor Gerber, Karen Goldenberg, Jaki McCarthy, and Johannes
van der Zouwen, issued a call for submissions in 2001. One hundred five were
received and 66 chosen. Two of the contributed papers later became
monograph chapters.
The International Conference on Questionnaire Development, Evaluation and
Testing Methods—dedicated to the memory of Seymour Sudman—was held in
Charleston, South Carolina, November 14–17, 2002. There were 338 attendees,
with more than one-fifth from outside the United States, representing 23
countries on six continents. The Survey Research Methods Section of the American
Statistical Association funded 12 conference fellows from South Africa, Kenya,
the Philippines, Slovenia, Italy, and Korea, and a National Science Foundation
grant funded 10 conference fellows, most of whom were U.S. graduate students.
Over half of the conference participants attended at least one of the four
short courses that were offered: Methods for Questionnaire Appraisal and Expert
Review by Barbara Forsyth and Gordon Willis; Cognitive Interviewing by Eleanor
Gerber; Question Testing for Establishment Surveys by Kristin Stettler and Fran
Featherston; and Behavior Coding: Tool for Questionnaire Evaluation by Nancy
Mathiowetz. Norman Bradburn gave the keynote address, “The Future of
Questionnaire Research,” which was organized around three themes: the importance
of exploiting technological advances, the increasing challenges posed by
multicultural, multilanguage populations, and the relevance of recent research in
sociolinguistics. The main conference program included 32 sessions with 76
papers and 15 poster presentations.
Conference planning and on-site activities were assisted by Linda Minor,
of the American Statistical Association (ASA), and Carol McDaniel, Shelley
Moody, and Safiya Hamid, of the U.S. Bureau of the Census. Adam Kelley
and Pamela Ricks, of the Joint Program in Survey Methodology, developed and
maintained the conference Web site, and Robert Groves, Brenda Cox, Daniel
Kasprzyk and Lars Lyberg, successive chairs of the Survey Research Methods
Section of the ASA, helped to promote the conference. We thank all these people
for their support.
The goal of this monograph is a state-of-the-field review of question evaluation
and testing methods. The publication marks a waypoint rather than an ending.
Although the chapters show great strides have been made in the development of
methods for improving survey instruments, much more work needs to be done.
Our aim is for the volume to serve both as a record of the many accomplishments
in this area, and as a pointer to the many challenges that remain.
We hope the book will be valuable to students training to become the next
generation of survey professionals, to survey researchers seeking guidance on current
best practices in questionnaire evaluation and testing, and to survey
methodologists designing research to advance the field and render the current chapters out
of date.
After an overview in Chapter 1 of both the field and the chapters that
follow, the volume is divided into seven parts:
I. Cognitive Interviews: Chapters 2 to 5
II. Supplements to Conventional Pretests: Chapters 6 to 8
III. Experiments: Chapters 9 to 11
IV. Statistical Modeling: Chapters 12 to 14
V. Mode of Administration: Chapters 15 to 18
VI. Special Populations: Chapters 19 to 22
VII. Multimethod Applications: Chapters 23 to 25
Each of the coeditors served as a primary editor for several chapters: Rothgeb
for 3 to 5; Singer for 6 to 8; Couper for 9, 10, 15, 16, and 18; Lessler for 11,
12, 14, and 17; E. Martin for 2, 13, and 19 to 22; and J. Martin for 23 to 25. In
addition, each coeditor served as a secondary editor for several other chapters. We
are grateful to the chapter authors for their patience during the lengthy process
of review and revision, and for the diligence with which they pursued the task.
We are also indebted to Rupa Jethwa, of the Joint Program in Survey
Methodology (JPSM), for indefatigable assistance in creating a final manuscript from
materials provided by dozens of different authors, and to Robin Gentry, also of
JPSM, for expert help in checking references and preparing the index.
Finally, for supporting our work during the more than four years it took to
produce the conference and book, we thank our employing organizations: the
University of Maryland, U.S. Bureau of the Census, University of Michigan,
Research Triangle Institute, and U.K. Office for National Statistics.
August 2003
STANLEY PRESSER
JENNIFER M. ROTHGEB
MICK P. COUPER
JUDITH T. LESSLER
ELIZABETH MARTIN
JEAN MARTIN
ELEANOR SINGER

CHAPTER 1
Methods for Testing and Evaluating Survey Questions
Stanley Presser
University of Maryland
Mick P. Couper
University of Michigan
Judith T. Lessler
Research Triangle Institute
Elizabeth Martin
U.S. Bureau of the Census
Jean Martin
Office for National Statistics, United Kingdom
Jennifer M. Rothgeb
U.S. Bureau of the Census
Eleanor Singer
University of Michigan
Methods for Testing and Evaluating Survey Questionnaires, Edited by Stanley Presser,
Jennifer M. Rothgeb, Mick P. Couper, Judith T. Lessler, Elizabeth Martin, Jean Martin,
and Eleanor Singer
ISBN 0-471-45841-4 Copyright © 2004 John Wiley & Sons, Inc.

1.1 INTRODUCTION
An examination of survey pretesting reveals a paradox. On the one hand,
pretesting is the only way to evaluate in advance whether a questionnaire causes
problems for interviewers or respondents. Consequently, both elementary textbooks
and experienced researchers declare pretesting indispensable. On the other hand,
most textbooks offer minimal, if any, guidance about pretesting methods, and
published survey reports usually provide no information about whether
questionnaires were pretested and, if so, how, and with what results. Moreover, until
recently, there was relatively little methodological research on pretesting. Thus,
pretesting’s universally acknowledged importance has been honored more in the
breach than in the practice, and not a great deal is known about many aspects
of pretesting, including the extent to which pretests serve their intended purpose
and lead to improved questionnaires.
Pretesting dates either to the founding of the modern sample survey in the
mid-1930s or to shortly thereafter. The earliest references in scholarly journals
are from 1940, by which time pretests apparently were well established. In that
year, Katz reported: “The American Institute of Public Opinion [i.e., Gallup]
and Fortune [i.e., Roper] pretest their questions to avoid phrasings which will
be unintelligible to the public and to avoid issues unknown to the man on the
street” (1940, p. 279).
Although the absence of documentation means we cannot be certain, our
impression is that for much of survey research’s history, there has been one con-
ventional form of pretest. Conventional pretesting is essentially a dress rehearsal
in which interviewers receive training like that for the main survey and
administer a questionnaire as they would during a survey proper. After each interviewer
completes a handful of interviews, response distributions (generally univariate,
occasionally bivariate or multivariate) may be tallied, and there is a debriefing in
which the interviewers relate their experiences with the questionnaire and offer
their views about the questionnaire’s problems.
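The tallying step of a conventional pretest can be illustrated with a short sketch. The Python below is an illustration only, not a procedure the authors describe: it computes the univariate response distribution for each question and flags questions whose share of don't-know or refusal answers exceeds a cutoff. The answer codes (`DK`, `REF`), the question IDs, the data, and the 10 percent threshold are all hypothetical. No accepted rule says what rate signals a problem, and a clean tally cannot reveal problems such as a respondent silently misunderstanding a closed question.

```python
from collections import Counter

# Hypothetical codes for nonsubstantive answers (don't know, refusal).
NONSUBSTANTIVE = {"DK", "REF"}


def tally(answers):
    """Univariate response distribution: raw counts and proportions."""
    counts = Counter(answers)
    n = len(answers)
    proportions = {code: count / n for code, count in counts.items()} if n else {}
    return counts, proportions


def nonsubstantive_rate(answers):
    """Share of answers that are don't knows or refusals."""
    if not answers:
        return 0.0
    return sum(a in NONSUBSTANTIVE for a in answers) / len(answers)


def flag_questions(pretest, threshold=0.10):
    """Return question IDs whose nonsubstantive rate exceeds a hypothetical cutoff."""
    return sorted(
        qid for qid, answers in pretest.items()
        if nonsubstantive_rate(answers) > threshold
    )


# Hypothetical pretest data: question ID -> answers from a handful of interviews.
pretest = {
    "Q1": ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Q2": ["Yes", "DK", "No", "DK", "REF", "Yes", "No", "DK"],
}

for qid, answers in pretest.items():
    counts, proportions = tally(answers)
    print(qid, dict(counts), {k: round(v, 2) for k, v in proportions.items()})

print("Flagged:", flag_questions(pretest))
```

Bivariate tallies could be produced the same way by counting pairs of answers to two questions, though with only a handful of interviews per interviewer such tables are very sparse.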
Survey researchers have shown remarkable confidence in this approach.
According to one leading expert: “It usually takes no more than 12–25 cases to
reveal the major difficulties and weaknesses in a pretest questionnaire” (Sheatsley,
1983, p. 226), a judgment similar to that of another prominent methodologist, who
maintained that “20–50 cases is usually sufficient to discover the major flaws in
a questionnaire” (Sudman, 1983, p. 181).
This faith in conventional pretesting was probably based on the common
experience that a small number of conventional interviews often reveals numer-
ous problems, such as questions that contain unwarranted suppositions, awkward
wordings, or missing response categories. But there is no scientific evidence
justifying the confidence that this type of pretesting identifies the major problems
in a questionnaire.
Conventional pretests are based on the assumption that questionnaire problems
will be signaled either by the answers that the questions elicit (e.g., don’t knows
or refusals), which will show up in response tallies, or by some other visible
consequence of asking the questions (e.g., hesitation or discomfort in
responding), which interviewers can describe during debriefing. However, as Cannell and
Kahn (1953, p. 353) noted: “There are no exact tests for these characteristics.”
They go on to say that “the help of experienced interviewers is most useful at this
point in obtaining subjective evaluations of the questionnaire.” Similarly, Moser
and Kalton (1971, p. 50) judged that “almost the most useful evidence of all
on the adequacy of a questionnaire is the individual fieldworker’s [i.e.,
interviewer’s] report on how the interviews went, what difficulties were encountered,
what alterations should be made, and so forth.” This emphasis on interviewer
perceptions is nicely illustrated in Sudman and Bradburn’s (1982, p. 49) advice
for detecting unexpected word meanings: “A careful pilot test conducted by
sensitive interviewers is the most direct way of discovering these problem words”
(emphasis added).
Yet even if interviewers were trained extensively in recognizing problems
with questions (as compared with receiving no special training at all, which is
typical), conventional pretesting would still be ill suited to uncovering many
questionnaire problems. This is because certain kinds of problems will not be
apparent from observing respondent behavior, and the respondents themselves
may be unaware of the problems. For instance, respondents can misunderstand a
closed question’s intent without providing any indication of having done so. And
because conventional pretests are almost always “undeclared” to the respondent,
as opposed to “participating” (in which respondents are informed of the pretest’s
purpose; see Converse and Presser, 1986), respondents are usually not asked
directly about their interpretations or other problems the questions may cause.
As a result, undeclared conventional pretesting seems better designed to identify
problems the questionnaire poses for interviewers, who know the purpose of the
testing, than for respondents, who do not.
Furthermore, when conventional pretest interviewers do describe respondent
problems, there are no rules for assessing their descriptions or for determining
which of the identified problems ought to be addressed. Researchers typically
rely on intuition and experience in judging the seriousness of problems and
deciding how to revise questions that are thought to have flaws.
In recent decades, a growing awareness of conventional pretesting’s
drawbacks has led to two interrelated changes. First, there has been a subtle shift
in the goals of testing, from an exclusive focus on identifying and fixing overt
problems experienced by interviewers and respondents to a broader concern for
improving data quality so that measurements meet a survey’s objectives. Second,
new testing methods have been developed or adapted from other uses. These
include cognitive interviews (the subject of Part I of this volume), behavior
coding, response latency, vignette analysis, and formal respondent debriefings (all
of which are treated in Part II), experiments (covered in Part III), and statistical
modeling (Part IV). In addition, new modes of administration pose special
challenges for pretesting (the focus of Part V), as do surveys of special populations,
such as children, establishments, and those requiring questionnaires in more than
one language (all of which are dealt with in Part VI). Finally, the development of
new pretesting methods raises issues of how they might best be used in
combination, as well as whether they in fact lead to improvements in survey measurement
(the topics of Part VII).

1.2 COGNITIVE INTERVIEWS
Ordinary interviews focus on producing codable responses to the questions.
Cognitive interviews, by contrast, focus on providing a view of the processes elicited
by the questions. Concurrent or retrospective think-alouds and/or probes are used
to produce reports of the thoughts that respondents have either as they answer
the survey questions or immediately after. The objective is to reveal the thought
processes involved in interpreting a question and arriving at an answer. These
thoughts are then analyzed to diagnose problems with the question.
Although he is not commonly associated with cognitive interviewing, William
Belson (1981) pioneered a version of this approach. In the mid-1960s, Belson
designed “intensive” interviews to explore seven questions that respondents had
been asked the preceding day during a regular interview administered by a sep-
arate interviewer. Respondents were first reminded of the exact question and the
answer they had given to it. The interviewer then inquired: “When you were
asked that question yesterday, exactly what did you think the question meant?”
After nondirectively probing to clarify what the question meant to the respondent,
interviewers asked, “Now tell me exactly how you worked out your answer from
that question. Think it out for me just as you did yesterday—only this time say
it aloud for me.” Then, after nondirectively probing to illuminate how the answer
was worked out, interviewers posed scripted probes about various aspects of the
question. These probes differed across the seven questions and were devised to
test hypotheses about problems particular to each of the questions. Finally, after
listening to the focal question once more, respondents were requested to say how
they would now answer it. If their answer differed from the one they had given the
preceding day, they were asked to explain why. Six interviewers, who received
two weeks of training, conducted 265 audiotaped, intensive interviews with a
cross-section sample of residents of London, England. Four analysts listened to
the tapes and coded the incidence of various problems.
These intensive interviews differed in a critical way from today’s cognitive
interview, which integrates the original and follow-up interviews in a single
administration with one interviewer. Belson assumed that respondents could accu-
rately reconstruct their thoughts from an interview conducted the previous day,
which is inconsistent with what we now know about the validity of self-reported
cognitive processes (see Chapter 2). However, in many respects, Belson moved
considerably beyond earlier work, such as Cantril and Fried (1944), which used
just one or two scripted probes to assess respondent interpretations of survey
questions. Thus, it is ironic that his approach had little impact on pretesting
practices, an outcome possibly due to its being so labor intensive.
The pivotal development leading to a role for cognitive interviews in pretesting
did not come until two decades later with the Cognitive Aspects of Survey
Methodology (CASM) conference (Jabine et al., 1984). Particularly influential
was Loftus’s (1984) postconference analysis of how respondents answered survey
questions about past events, in which she drew on the think-aloud technique
used by Herbert Simon and his colleagues to study problem solving (Ericsson
and Simon, 1980). Subsequently, a grant from Murray Aborn’s program at the
National Science Foundation to Monroe Sirken supported both research on the
technique’s utility for understanding responses to survey questions (Lessler et al.,
1989) and the creation at the National Center for Health Statistics (NCHS) in
1985 of the first “cognitive laboratory,” where the technique could routinely be
drawn on to pretest questionnaires (e.g., Royston and Bercini, 1987).
Similar laboratories were soon established by other U.S. statistical agencies
and survey organizations.1 The labs’ principal, but not exclusive, activity involved
cognitive interviewing to pretest questionnaires. Facilitated by special exemptions
from Office of Management and Budget survey clearance requirements, pretesting
for U.S. government surveys increased dramatically through the 1990s (Martin
et al., 1999). At the same time, the labs took tentative steps toward standardizing
and codifying their practices in training manuals (e.g., Willis, 1994) or protocols
for pretesting (e.g., DeMaio et al., 1993).
Although there is now general agreement about the value of cognitive inter-
viewing, no consensus has emerged about best practices, such as whether (or
when) to use think-alouds versus probes, whether to employ concurrent or retro-
spective reporting, and how to analyze and evaluate results. In part, this is due to
the paucity of methodological research examining these issues, but it is also due
to lack of attention to the theoretical foundation for applying cognitive interviews
to survey pretesting.
In Chapter 2, Gordon Willis addresses this theoretical issue, and in the pro-
cess contributes to the resolution of key methodological issues. Willis reviews
the theoretical underpinnings of Ericsson and Simon’s original application of
think-aloud interviews to problem-solving tasks and considers the theoretical
justifications for applying cognitive interviewing to survey tasks. Ericsson and
Simon concluded that verbal reports can be veridical if they involve information
a person has available in short-term memory, and the verbalization itself does not
fundamentally alter thought processes (e.g., does not involve further explanation).
Willis concludes that some survey tasks (for instance, nontrivial forms of infor-
mation retrieval) may be well suited to elucidation in a think-aloud interview.
However, he cautions that the general use of verbal report methods to target
cognitive processes involved in answering survey questions is difficult to justify,
especially for tasks (such as term comprehension) that do not satisfy the condi-
tions for valid verbal reports. He also notes that the social interaction involved
in interviewer-administered cognitive interviews may violate a key assumption
posited by Ericsson and Simon for use of the method.
Willis not only helps us see that cognitive interviews may be better suited for
studying certain types of survey tasks than others, but also sheds light on the
different ways of conducting the interviews: for instance, using think-alouds versus
probes. Indeed, with Willis as a guide we can see more clearly that concurrent
think-alouds may fail to reveal how respondents interpret (or misinterpret) word
meanings, and that targeted verbal probes should be more effective for this
purpose. More generally, Willis’s emphasis on the theoretical foundation of testing
procedures is a much-needed corrective in a field that often slights such concerns.
1
Laboratory research to evaluate self-administered questionnaires was already under way at the
Census Bureau before the 1980 census (Rothwell, 1983, 1985). Although inspired by marketing research
rather than cognitive psychology, this work foreshadowed cognitive interviewing. For example,
observers asked respondents to talk aloud as they filled out questionnaires. See also Hunt et al.
(1982).
Chapter 3, by Paul Beatty, bears out Willis’s concern about the reactivity of
aspects of cognitive interviewing. Beatty describes NCHS cognitive interviews
which showed that respondents had considerable difficulty answering a series of
health assessment items that had produced no apparent problems in a continuing
survey. Many researchers might see this as evidence of the power of cogni-
tive interviews to detect problems that are invisible in surveys. Instead, Beatty
investigated whether features of the cognitive interviews might have created the
problems, problems that the respondents would not otherwise have had.
Transcriptions from the taped cognitive interviews were analyzed for evidence
that respondent difficulty was related to the interviewer’s behavior, in particular
the types of probes posed. The results generally indicated that respondents who
received reorienting probes had little difficulty choosing an answer, whereas those
who received elaborating probes had considerable difficulty. During a further
round of cognitive interviews in which elaborating probes were restricted to the
post-questionnaire debriefing, respondents had minimal difficulty choosing an
answer. This is a dramatic finding, although Beatty cautions that it does not mean
that the questions were entirely unproblematic, as some respondents expressed
reservations about their answers during the debriefing.
Elaborating and reorienting probes accounted for only a small fraction of the
interviewers’ contribution to these cognitive interviews, and in the second part of
his chapter, Beatty examines the distribution of all the interviewers’ utterances
aside from reading the questions. He distinguishes between cognitive probes
(those traditionally associated with cognitive interviews, such as “What were
you thinking . . .?” “How did you come up with that . . .?” “What does [term]
mean to you?”); confirmatory probes (repeating something the respondent said
in a request for confirmation); expansive probes (requests for elaboration, such
as “Tell me more about that”); functional remarks (repetition or clarification of
the question, which included all reorienting probes); and feedback (e.g., “Thanks;
that’s what I want to know” or “I know what you mean”). Surprisingly, cognitive
probes, the heart of the method, accounted for less than 10% of interviewer
utterances. In fact, there were fewer cognitive probes than utterances in any of
the other categories.
Taken together, Beatty’s findings suggest that cognitive interview results are
importantly shaped by the interviewers’ contributions, which may not be well
focused in ways that support the inquiry. He concludes that cognitive interviews
would be improved by training interviewers to recognize distinctions among
probes and the situations in which each ought to be employed.
In Chapter 4, Frederick Conrad and Johnny Blair argue that (1) the raw mate-
rial produced by cognitive interviews consists of verbal reports; (2) the different
techniques used to conduct cognitive interviews may affect the quality of these
verbal reports; (3) verbal report quality should be assessed in terms of problem
detection and problem repair, as they are the central goals of cognitive inter-
viewing; and (4) the most valuable assessment data come from experiments in
which the independent variable varies the interview techniques and the dependent
variables are problem detection and repair.
In line with these recommendations, they carried out an experimental comparison
of two different cognitive interviewing approaches. One, conventional cognitive
interviewing, was uncontrolled, relying on the unstandardized practices of four
experienced cognitive interviewers. The other, conditional probing, was more
controlled: four less-experienced interviewers were trained to probe only when
there were explicit indications that the respondent was experiencing a problem.
The authors found that the conventional cognitive interviews identified many more
problems than the conditional probe interviews.
As with Beatty’s study, however, more problems did not mean higher-quality
results. Conrad and Blair assessed the reliability of problem identification in
two ways: by interrater agreement among a set of trained coders who reviewed
transcriptions of the taped interviews, and by agreement between coders and
interviewers. Overall, agreement was quite low, consistent with the findings of
other researchers about the reliability of cognitive interview data (Presser
and Blair, 1994). But reliability was higher for the conditional probe interviews
than for the conventional ones. (This may be due partly to the conditional probe
interviewers having received some training in what should be considered “a
problem,” compared to the conventional interviewers, who were provided no
definition of what constituted a “problem.”) Furthermore, as expected, conditional
interviewers probed much less than conventional interviewers, but more of their
probes were in cases associated with the identification of a problem. Thus, Conrad
and Blair, like Willis and Beatty, suggest that we rethink what interviewers do
in cognitive interviews.
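Conrad and Blair’s reliability comparison rests on agreement between two judges about whether a given question administration revealed a problem. The text does not say which agreement statistic they used. A common choice for two judges making a yes/no call is Cohen’s kappa, which discounts the agreement expected by chance alone. The Python sketch below is illustrative only. It assumes each judge’s calls are stored as a list of booleans covering the same administrations, and the example judgments are invented.

```python
from collections import Counter


def cohens_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Chance-corrected agreement between two judges on the same items.

    Each list holds one judgment per item (True = "problem present").
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("judges must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two judges agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each judge labeled items independently
    # at his or her own overall rate.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in (True, False)) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both judges use one identical label")
    return (observed - expected) / (1 - expected)


coder = [True, True, False, False, True, False, False, False]
interviewer = [True, False, False, False, True, True, False, False]
print(round(cohens_kappa(coder, interviewer), 3))  # 0.467
```

In this invented example the two judges agree on 6 of 8 administrations (0.75). Because each flags 3 of the 8, much of that agreement would occur by chance, and kappa falls to about 0.47.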
Chapter 5, by Theresa DeMaio and Ashley Landreth, describes an experiment
in which three different organizations were commissioned to have two inter-
viewers each conduct five cognitive interviews of the same questionnaire using
whatever methods were typical for the organization, and then deliver a report
identifying problems in the questionnaire and a revised questionnaire addressing
the problems (as well as audiotapes for all the interviews). In addition, expert
reviews of the original questionnaire were obtained from three people who were
not involved in the cognitive interviews. Finally, another set of cognitive inter-
views was conducted by a fourth organization to test both the original and three
revised questionnaires.
The three organizations reported considerable diversity on many aspects of
the interviews, including location (respondent’s home versus research lab), inter-
viewer characteristics (field interviewer versus research staff), question strategy
(think-aloud versus probes), and data source (review of audiotapes versus inter-
viewer notes and recollections). This heterogeneity is consistent with the findings
of Blair and Presser (1993) but is even more striking given the many intervening
years in which some uniformity of practice might have emerged. It does,
however, mean that differences in the results of these cognitive interviews across
organizations cannot be attributed unambiguously to any one factor.
There was variation across the organizations in both the number of ques-
tions identified as having problems and the total number of problems identified.
Moreover, there was only modest overlap in the particular problems diagnosed
(i.e., the organizations tended to report unique problems). Similarly, the cognitive
interviews and the expert reviews overlapped much more in identifying which
questions had problems than in identifying what the problems were. The organi-
zation that identified the fewest problems (both overall and in terms of number
of questions) also showed the lowest agreement with the expert panel. This orga-
nization was the only one that did not review the audiotapes, and DeMaio and
Landreth suggest that relying solely on interviewer notes and memory leads to
error.2
However, the findings from the tests of the revised questionnaires did not
identify one organization as consistently better or worse than the others.
All four of these chapters argue that the methods used to conduct cognitive
interviews shape the data they produce. This is a fundamental principle of survey
methodology, yet it may be easier to ignore in the context of cognitive interviews
than in the broader context of survey research. The challenge of improving the
quality of verbal reports from cognitive interviews will not be easily met, but
it is akin to the challenge of improving data more generally, and these chapters
bring us closer to meeting it.
1.3 SUPPLEMENTS TO CONVENTIONAL PRETESTS
Unlike cognitive interviews, which are completely distinct from conventional
pretests, other testing methods that have been developed may be implemented
as add-ons to conventional pretests (or as additions to a survey proper).
These include behavior coding, response latency, formal respondent debriefings,
and vignettes.
Behavior coding was developed in the 1960s by Charles Cannell and his
colleagues at the University of Michigan Survey Research Center and can be used
to evaluate both interviewers and questions. Its early applications were almost
entirely focused on interviewers, so it had no immediate impact on pretesting
practices. In the late 1970s and early 1980s, a few European researchers adopted
behavior coding to study questions, but it was not applied to pretesting in the
United States until the late 1980s (Oksenberg et al.’s 1991 article describes it as
one of two “new strategies for pretesting questions”).
Behavior coding involves monitoring interviews or reviewing taped interviews
(or transcripts) for a subset of the interviewer’s and respondent’s verbal behavior
in the question asking and answering interaction. Questions marked by high
frequencies of certain behaviors (e.g., the interviewer did not read the question
verbatim or the respondent requested clarification) are seen as needing repair.
2
Bolton and Bronkhorst (1996) describe a computerized approach to evaluating cognitive interview
results, which should reduce error even further.
Behavior coding may be extended in various ways. In Chapter 6, Johannes van
der Zouwen and Johannes Smit describe an extension that draws on the sequence
of interviewer and respondent behaviors, not just the frequency of the individual
behaviors. Based on the sequence of a question’s behavior codes, an interac-
tion is coded as either paradigmatic (the interviewer read the question correctly,
the respondent chose one of the alternatives offered, and the interviewer coded
the answer correctly), problematic (the sequence was nonparadigmatic but the
problem was solved, e.g., the respondent asked for clarification and then chose
one of the alternatives offered), or inadequate (the sequence was nonparadig-
matic and the problem was not solved). Questions with a high proportion of
nonparadigmatic sequences are identified as needing revision.
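The classification rule can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the behavior-code labels are invented rather than taken from van der Zouwen and Smit’s scheme, a problem counts as "solved" when the exchange still ends with an offered alternative that was coded correctly, and the 15% flagging threshold is hypothetical.

```python
from enum import Enum

# Hypothetical behavior codes invented for this sketch; real coding schemes differ.
READ_EXACT = "Q_EXACT"       # interviewer reads the question as worded
CHOSE_OFFERED = "R_OFFERED"  # respondent chooses one of the offered alternatives
CODED_OK = "I_CODED_OK"      # interviewer codes the answer correctly
PARADIGM = (READ_EXACT, CHOSE_OFFERED, CODED_OK)


class Interaction(Enum):
    PARADIGMATIC = "paradigmatic"
    PROBLEMATIC = "problematic"  # deviates from the paradigm, but problem solved
    INADEQUATE = "inadequate"    # deviates from the paradigm, problem not solved


def classify(codes: list[str]) -> Interaction:
    """Classify one question-answer exchange from its ordered behavior codes."""
    if tuple(codes) == PARADIGM:
        return Interaction.PARADIGMATIC
    # Assumption: the problem counts as solved if the exchange still ends with
    # an offered alternative that the interviewer coded correctly.
    if tuple(codes[-2:]) == (CHOSE_OFFERED, CODED_OK):
        return Interaction.PROBLEMATIC
    return Interaction.INADEQUATE


def flag_questions(exchanges: dict[str, list[list[str]]],
                   threshold: float = 0.15) -> list[str]:
    """Return questions whose share of nonparadigmatic exchanges exceeds threshold."""
    flagged = []
    for question, sequences in exchanges.items():
        if not sequences:
            continue
        nonparadigmatic = sum(
            classify(seq) is not Interaction.PARADIGMATIC for seq in sequences
        )
        if nonparadigmatic / len(sequences) > threshold:
            flagged.append(question)
    return flagged


ok = list(PARADIGM)
clarified = ["Q_EXACT", "R_CLARIFY", "I_REPEAT", "R_OFFERED", "I_CODED_OK"]
reworded = ["Q_CHANGED", "R_OFFERED", "I_CODED_OK"]
unresolved = ["Q_EXACT", "R_DONT_KNOW"]
exchanges = {
    "Q1": [ok] * 9 + [clarified],                        # 1 of 10 nonparadigmatic
    "Q2": [ok] * 6 + [reworded] * 2 + [unresolved] * 2,  # 4 of 10 nonparadigmatic
}
print(flag_questions(exchanges))  # ['Q2']
```

Counting only the nonparadigmatic share reproduces the frequency-based flagging of basic behavior coding. The extension described in the chapter comes from the three-way classification, which separates exchanges that recovered from exchanges that did not.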
Van der Zouwen and Smit analyzed a series of items from a survey of the
elderly to illustrate this approach as well as to compare the findings it produced to
those from basic behavior coding and from four ex-ante methods, that is, methods
not entailing data collection: a review by five methodology experts; reviews by the
authors guided by two different questionnaire appraisal systems; and the quality
predictor developed by Saris and his colleagues (Chapter 14), which we describe
in Section 1.5. The two methods based on behavior codes produced very similar
results, as did three of the four ex-ante methods, but the two sets of methods
identified very different problems. As van der Zouwen and Smit observe, the
ex-ante methods point out what could go wrong with the questionnaire, whereas
the behavior codes and sequence analyses reveal what actually did go wrong.
Another testing method based on observing behavior involves the measure-
ment of response latency, the time it takes a respondent to answer a question.
Since most questions are answered rapidly, latency measurement requires the
kind of precision (to fractions of a second) that is almost impossible without
computers. Thus, it was not until after the widespread diffusion of computer-
assisted survey administration in the 1990s that the measurement of response
latency was introduced as a testing tool (Bassili and Scott, 1996).
In Chapter 7, Stasja Draisma and Wil Dijkstra use response latency to evaluate
the accuracy of respondents’ answers and therefore, indirectly, to evaluate
the questions themselves. As they operationalize it, latency refers to the delay
between the end of an interviewer’s reading of a question and the beginning of
the respondent’s answer. The authors reason that longer delays signal respon-
dent uncertainty, and they test this idea by comparing the latency of accurate
and inaccurate answers (with accuracy determined by information from another
source). In addition, they compare the performance of response latency to that
of several other indicators of uncertainty.
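Under this operationalization, latency is simply the answer onset minus the moment the interviewer finishes reading the question. The Python sketch below computes latencies from timestamps and compares the mean latency of accurate and inaccurate answers. The field names, timestamps, and accuracy flags are hypothetical, and the comparison is a bivariate one. It does not reproduce the multivariate analysis the chapter reports.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TimedAnswer:
    question_end: float  # seconds: interviewer finishes reading the question
    answer_start: float  # seconds: respondent begins answering
    accurate: bool       # answer agrees with the external record

    @property
    def latency(self) -> float:
        # Negative if the respondent starts answering before the reading ends.
        return self.answer_start - self.question_end


def mean_latency_by_accuracy(answers: list[TimedAnswer]) -> dict[bool, float]:
    """Mean latency for accurate (True) and inaccurate (False) answers."""
    groups: dict[bool, list[float]] = {True: [], False: []}
    for answer in answers:
        groups[answer.accurate].append(answer.latency)
    return {accurate: mean(values) for accurate, values in groups.items() if values}


answers = [
    TimedAnswer(10.0, 10.5, True),
    TimedAnswer(30.0, 30.75, True),
    TimedAnswer(50.0, 51.5, False),
    TimedAnswer(70.0, 72.0, False),
]
print(mean_latency_by_accuracy(answers))  # {True: 0.625, False: 1.75}
```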
In a multivariate analysis, both longer response latencies and the respon-
dents’ expressions of greater uncertainty about their answers were associated with
inaccurate responses. Other work (Chapters 8 and 23), which we discuss below,
reports no relationship (or even an inverse relationship) between respondents’
confidence or certainty and the accuracy of their answers. Thus, future research
needs to develop a more precise specification of the conditions in which different
measures of respondent uncertainty are useful in predicting response error.

Although the interpretation of response latency is less straightforward
than that of other measures of question problems (lengthy times may
indicate careful processing, as opposed to difficulty), the method shows sufficient
promise to encourage its further use. This is especially so, as the ease of collecting
latency information means that it could be routinely included in computer-assisted
surveys at very low cost. The resulting collection of data across many different
surveys would facilitate improved understanding of the meaning and conse-
quences of response latency and of how it might best be combined with other
testing methods, such as behavior coding, to enhance the diagnosis of question-
naire problems.
Chapter 8, by Elizabeth Martin, is about vignettes and respondent debriefing.
Unlike behavior coding and response latency, which are “undeclared” testing
methods, respondent debriefings are a “participating” method, which informs the
respondent about the purpose of the inquiry. Such debriefings have long been
recommended as a supplement to conventional pretest interviews (Kornhauser,
1951, p. 430), although they most commonly have been conducted as unstruc-
tured inquiries improvised by interviewers. Martin shows how implementing them
in a standardized manner can reveal both the meanings of questions and the reac-
tions that respondents have to the questions. In addition, she demonstrates how
debriefings can be used to measure the extent to which questions lead to missed
or misreported information.
Vignette analysis, the other method Martin discusses, may be incorporated
in either undeclared or participating pretests. Vignettes—hypothetical scenarios
that respondents evaluate—may be used to (1) explore how people think about
concepts, (2) test whether respondents’ interpretations of concepts are consistent
with those that are intended, (3) analyze the dimensionality of concepts, and
(4) diagnose other question wording problems. Martin provides examples of each
of these applications and offers evidence of the validity of vignette analysis by
drawing on evaluations of questionnaire changes made on the basis of the method.
The three chapters in this part suggest that testing methods differ in the types
of problems they are suited to identify, their potential for diagnosing the nature of
a problem and thereby for fashioning appropriate revisions, the reliability of their
results, and the resources needed to conduct them. It appears, for instance, that
formal respondent debriefings and vignette analysis are more apt than behavior
coding and response latency to identify certain types of comprehension problems.
Yet we do not have good estimates of many of the ways in which the methods dif-
fer. The implication is not only that we need research explicitly designed to make
such comparisons, but also that multiple testing methods are probably required
in many cases to ensure that respondents understand the concepts underlying
questions and are able and willing to answer them accurately.
1.4 EXPERIMENTS
Both cognitive interviews and the methods that supplement conventional pretests
identify questionnaire problems and lead to revisions designed to address the
problems. To determine whether the revisions are improvements, however, there
is no substitute for experimental comparisons of the original and revised items.
Such experiments are of two kinds. First, the original and revised items can
be compared using the testing method(s) that identified the problem(s). Thus, if
cognitive interviews showed that respondents had difficulty with an item, the item
and its revision can be tested in another round of cognitive interviews to confirm
that the revision shows fewer such problems than the original. The interpretation
of results from this type of experiment is usually straightforward, although there
is no assurance that observed differences will have any effect on survey estimates.
Second, original and revised items can be tested to examine what, if any,
difference they make for a survey’s estimates. The interpretation of this kind of
experiment is sometimes less straightforward, but such split-sample experiments
have a long history in pretesting. Indeed, they were the subject of one of the
earliest articles devoted to pretesting (Sletto, 1950), although the experiments that
it described examined how administrative matters such as questionnaire length, the
nature of the cover letter’s appeal, the use of follow-up postcards, and questionnaire
layout affected cooperation with mail surveys. None of the examples concerned
question wording.
In Chapter 9, Floyd Fowler describes three ways to evaluate the results of
experiments that compare question wordings: differences in response distribu-
tions, validation against a standard, and usability, as measured, for instance, by
behavior coding. He provides six case studies that illustrate how cognitive inter-
views and experiments are complementary. For each, he outlines the problems
that the cognitive interviews detected and the nature of the remedy proposed. He
then presents a comparison of the original and revised questions from split-sample
experiments that were behavior coded. As he argues, this type of experimental
evidence is essential in estimating whether different question wordings affect
survey results, and if so, by how much.
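One of Fowler’s criteria, differences in response distributions, is commonly assessed in a split-sample experiment with a chi-square test of homogeneity on the version-by-response table. The chapter’s own analyses may use other tests. The following is a minimal pure-Python sketch of the Pearson statistic, with invented counts. To convert the statistic to a p-value, `scipy.stats.chi2.sf(stat, dof)` can be used. `scipy.stats.chi2_contingency` runs the whole test, but it applies a continuity correction by default when there is only one degree of freedom.

```python
def chi_square_homogeneity(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic for a versions-by-categories table of counts.

    Each row holds the response counts for one question version (e.g. original
    and revised wording); each column is one response category. Assumes every
    category was chosen at least once, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


original = [30, 50, 20]  # invented counts for three response categories
revised = [45, 40, 15]
stat, dof = chi_square_homogeneity([original, revised])
print(round(stat, 2), dof)  # 4.83 2
```

This test addresses only whether the distributions differ. Fowler’s other two criteria, validation against a standard and usability, require additional data.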
All of Fowler’s examples compare single items that vary in only one way.
Experiments can also be employed to test versions of entire questionnaires
that vary in multiple, complex ways. This type of experiment is described in
Chapter 10, by Jeffrey Moore, Joanne Pascale, Pat Doyle, Anna Chan, and
Julia Klein Griffiths with data from SIPP, the Survey of Income and Program
Participation, a large U.S. Bureau of the Census survey that has been conducted
on a continuing basis for nearly 20 years. The authors revised the SIPP
questionnaire to meet three major objectives: minimize response burden and
thereby decrease both unit and item nonresponse, reduce seam bias reporting
errors, and introduce questions about new topics. Then, to assess the effects
of the revisions before switching to the new questionnaire, an experiment was
conducted in which respondents were randomly assigned to either the new or
old version.
Both item nonresponse and seam bias were lower with the new questionnaire,
and with one exception, the overall estimates of income and assets (key measures
in the survey) did not differ between versions. On the other hand, unit nonresponse
reductions were not obtained (in fact, in initial waves, nonresponse was
higher for the revised version) and the new questionnaire took longer to administer.
Moore et al. note that these latter results may have been caused by two
complicating features of the experimental design. First, experienced SIPP inter-
viewers were used for both the old and new instruments. The interviewers’ greater
comfort level with the old questionnaire (some reported being able to “administer
it in their sleep”) may have contributed to their administering it more quickly
than the new questionnaire and persuading more respondents to cooperate with
it. Second, the addition of new content to the revised instrument may have more
than offset the changes that were introduced to shorten the interview.
In Chapter 11, Roger Tourangeau argues that the practical consideration that
leads many experimental designs to compare packages of variables, as in the SIPP
case, hampers the science of questionnaire design. Because it experimented with
a package of variables, the SIPP research could estimate the overall effect of the
redesign, which is vital to the SIPP sponsors, but not estimate the effects of indi-
vidual changes, which is vital to an understanding of the effects of questionnaire
features (and therefore to sponsors of other surveys making design changes). As
Tourangeau outlines, relative to designs comparing packages of variables, facto-
rial designs allow inference not only about the effects of particular variables, but
about the effects of interactions between variables as well. In addition, he debunks
common misunderstandings about factorial designs: for instance, that they must
have equal-sized cells and that their statistical power depends on the cell size.
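To see how a factorial design separates individual effects and their interaction, consider a 2×2 experiment. The Python sketch below estimates the two main effects and the interaction from unweighted cell means. The factor labels, the scores, and the deliberately unequal cell sizes are all invented, and the unweighted-means convention is only one of several. In practice a regression or ANOVA routine would be used, which also supplies standard errors.

```python
from statistics import fmean


def factorial_effects(cells: dict[tuple[int, int], list[float]]) -> dict[str, float]:
    """Main effects and interaction for a 2x2 design, from unweighted cell means.

    Keys are (level of factor A, level of factor B), each coded 0 or 1; values
    are the outcome scores of the respondents assigned to that cell.
    """
    m = {cell: fmean(scores) for cell, scores in cells.items()}
    # Main effect: average over the other factor's levels, then take the difference.
    effect_a = (m[1, 0] + m[1, 1]) / 2 - (m[0, 0] + m[0, 1]) / 2
    effect_b = (m[0, 1] + m[1, 1]) / 2 - (m[0, 0] + m[1, 0]) / 2
    # Interaction as a difference of differences: does A's effect change with B?
    interaction = (m[1, 1] - m[0, 1]) - (m[1, 0] - m[0, 0])
    return {"A": effect_a, "B": effect_b, "AxB": interaction}


# Hypothetical: A = question wording (0 old, 1 new), B = response format.
cells = {
    (0, 0): [2, 4],
    (0, 1): [4, 5, 6],
    (1, 0): [4, 4, 4, 4],
    (1, 1): [8],
}
print(factorial_effects(cells))  # {'A': 2.0, 'B': 3.0, 'AxB': 2.0}
```

A package comparison like the SIPP experiment corresponds to contrasting only cell (0, 0) with cell (1, 1). That contrast yields one overall difference and cannot apportion it between A, B, and their interaction.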
Other issues that Tourangeau considers are complete randomization versus
randomized block designs (e.g., should one assign the same interviewers to all
the conditions, or different interviewers to different versions of the question-
naire?), conducting experiments in a laboratory setting as opposed to the field,
and statistical power, each of which importantly affects the inferences drawn
from experiments. Particularly notable is his argument in favor of more labo-
ratory experiments, but his discussion of all these matters will help researchers
make more informed choices in designing experiments to test questionnaires.
1.5 STATISTICAL MODELING
Questionnaire design and statistical modeling are usually thought of as being
worlds apart. Researchers who specialize in questionnaires tend to have rudi-
mentary statistical understanding, and those who specialize in statistical modeling
generally have little appreciation for question wording. This is unfortunate, as
the two should work in tandem for survey research to progress. Moreover, the
two-worlds problem is not inevitable. In the early days of survey research, Paul
Lazarsfeld, Samuel Stouffer, and their colleagues made fundamental contributions
to both questionnaire design and statistical analysis. Thus, it is fitting that the
first of our three chapters on statistical modeling to evaluate questionnaires draws
on a technique, latent class analysis, rooted in Lazarsfeld’s work. In Chapter 12,
Paul Biemer shows how estimates of the error associated with questions may
be made when the questions have been asked of the same respondents two or
more times.
Latent class analysis (LCA) models the relationship between an unobservable
latent variable and its indicators. Biemer treats the case of nominal variables
where the state observed is a function of the true state and of false positive
and false negative rates. He presents several applications from major surveys to
illustrate how LCA allows one to test assumptions about error structure. Each
of his examples produces results that are informative about the nature of the
errors in respondent answers to the questions. Yet Biemer is careful to note that
LCA depends heavily on an assumed model, and there is usually no direct way
to evaluate the model assumptions. He concludes that rather than relying on
a single statistical method for evaluating questions, multiple methods ought to
be employed.
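For a dichotomous characteristic measured on repeated occasions, the model described above has three kinds of parameters: the prevalence of the true state, and for each measurement a false positive rate (observed 1 when truly 0) and a false negative rate (observed 0 when truly 1). The Python sketch below shows only the forward side of such a model. For given parameter values, it computes the probability of each answer pattern and the posterior probability that a respondent is truly positive. The parameter names and values are hypothetical. Estimating the parameters is a separate step, usually done by maximum likelihood (often via EM), and it requires enough indicators or restrictions for identification. With two unrestricted dichotomous indicators, for instance, there are five parameters, but the 2×2 table supplies only three degrees of freedom.

```python
from itertools import product


def class_likelihoods(pattern: tuple[int, ...], false_pos: list[float],
                      false_neg: list[float]) -> tuple[float, float]:
    """P(pattern | truly 1) and P(pattern | truly 0), assuming the repeated
    measurements are independent given the true state (local independence)."""
    p_given_1 = 1.0
    p_given_0 = 1.0
    for observed, fp, fn in zip(pattern, false_pos, false_neg):
        if observed == 1:
            p_given_1 *= 1 - fn  # correct positive
            p_given_0 *= fp      # false positive
        else:
            p_given_1 *= fn      # false negative
            p_given_0 *= 1 - fp  # correct negative
    return p_given_1, p_given_0


def pattern_probability(pattern, prevalence, false_pos, false_neg) -> float:
    """Probability of observing this answer pattern across the measurements."""
    p1, p0 = class_likelihoods(pattern, false_pos, false_neg)
    return prevalence * p1 + (1 - prevalence) * p0


def posterior_true(pattern, prevalence, false_pos, false_neg) -> float:
    """P(true state = 1 | observed pattern), by Bayes' rule."""
    p1, p0 = class_likelihoods(pattern, false_pos, false_neg)
    joint_1 = prevalence * p1
    return joint_1 / (joint_1 + (1 - prevalence) * p0)


# Invented parameters: 20% true prevalence, same error rates on both occasions.
prevalence, fp, fn = 0.20, [0.05, 0.05], [0.10, 0.10]
for pattern in product([1, 0], repeat=2):
    print(pattern, round(pattern_probability(pattern, prevalence, fp, fn), 4),
          round(posterior_true(pattern, prevalence, fp, fn), 3))
```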
Whereas Biemer’s chapter focuses on individual survey questions, in
Chapter 13, Bryce Reeve and Louise Mâsse focus on scales or indexes
constructed from multiple items. Reeve and Mâsse note that the principles of
classical test theory usually yield scales with many items, without providing much
information about the performance of the separate questions. They describe Item
Response Theory (IRT) models that assess how well different items discriminate
among respondents who have the same value on a trait. The authors’ empirical
example comes from the SF-36 Mental Health, Social Functioning, Vitality and
Emotional subscales, widely used health indicators composed of 14 questions.
The power of IRT to identify the discriminating properties of specific items
allows researchers to design shorter scales that do a better job of measuring
constructs. Even greater efficiency can be achieved by using IRT methods to
develop computer adaptive tests (CATs). With a CAT, a respondent is presented
a question near the middle of the scale range, and an estimate of the person’s total
score is constructed based on his or her response. Another item is then selected
based on that estimate, and the process is repeated. At each step, the precision of
the estimated total score is computed, and when the desired precision is reached,
no more items are presented. CAT also offers the opportunity for making finer
distinctions at the ends of the range, which would be particularly valuable for the
SF-36 scale, since as Reeve and Mâsse note, it does not do well in distinguishing
people at the upper end of the range.
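The adaptive loop described above can be sketched in Python as follows. Several choices here are assumptions, not details from Reeve and Mâsse. Items are treated as dichotomous and follow a two-parameter logistic (2PL) model, whereas SF-36 items have several response categories and call for polytomous models. The item parameters are invented. The next item is the one with maximum Fisher information at the current estimate. The trait is estimated by its posterior mean (EAP) under a standard normal prior, and the posterior standard deviation serves as the precision criterion. In IRT the quantity being estimated is the latent trait θ, which plays the role of the "total score" in the description above.

```python
import math

# Hypothetical 2PL item bank: (discrimination a, location b) per item.
ITEM_BANK = [(1.8, -1.5), (1.2, -0.5), (2.0, 0.0), (1.5, 0.7), (1.0, 1.5), (2.2, 2.0)]
THETA_GRID = [-4.0 + 0.1 * k for k in range(81)]  # quadrature points, -4 to 4


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability that a respondent at trait level theta endorses the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses: list[tuple[int, int]]) -> tuple[float, float]:
    """Posterior mean and SD of theta (standard normal prior) given
    (item index, 0/1 answer) pairs, computed on THETA_GRID."""
    weights = []
    for theta in THETA_GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for idx, answer in responses:
            p = p_endorse(theta, *ITEM_BANK[idx])
            w *= p if answer == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(THETA_GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(THETA_GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer_item, target_se: float = 0.5):
    """Administer items adaptively until the posterior SD reaches target_se.

    answer_item(idx) must return the respondent's 0/1 answer to item idx.
    Returns (theta estimate, its SD, indices of items administered in order).
    """
    theta, se = 0.0, float("inf")  # begin near the middle of the scale
    responses: list[tuple[int, int]] = []
    unused = set(range(len(ITEM_BANK)))
    while unused and se > target_se:
        # Pick the unused item that is most informative at the current estimate.
        nxt = max(unused, key=lambda i: item_information(theta, *ITEM_BANK[i]))
        unused.remove(nxt)
        responses.append((nxt, answer_item(nxt)))
        theta, se = eap_estimate(responses)
    return theta, se, [idx for idx, _ in responses]


# Simulated respondent (hypothetical): endorses every item located below 1.0.
theta, se, used = run_cat(lambda i: int(ITEM_BANK[i][1] < 1.0))
print(used, round(theta, 2), round(se, 2))
```

Because selection depends on the running estimate, respondents who answer differently see different items, and the loop stops as soon as the precision target is met. This is the source of the efficiency gain the chapter describes.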
In Chapter 14, Willem Saris, William van der Veld, and Irmtraud Gallhofer
draw on statistical modeling in a very different fashion. In the early 1980s,
Frank Andrews applied the multitrait, multimethod (MTMM) measurement
approach (Campbell and Fiske, 1959) to estimate the reliability and validity
of a sample of questionnaire items, and suggested that the results could be
used to characterize the reliability and validity of question types. Following
his suggestion, Saris et al. created a database of MTMM studies that provides
estimates of reliability and validity for 1067 questionnaire items. They then
developed a coding system to characterize the items according to the nature of
their content, complexity, type of response scale, position in the questionnaire,
data collection mode, sample type, and the like. Next, they fit two large regression
models in which these characteristics were the independent variables and the
dependent variables were the MTMM reliability or validity estimates. The
resulting model coefficients estimate the effect of each question characteristic
on reliability or validity.
Saris et al. show how new items can be coded and the prediction equation
used to estimate their quality. They created a program for the coding, some of
which is entirely automated and some of which is computer guided. Once the
codes are assigned, the program calculates the quality estimates.
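The two-step logic of Saris et al. can be sketched as follows. First, regress MTMM quality estimates on coded item characteristics. Then apply the fitted equation to the codes of a new item. Everything here is hypothetical. The three codes are number of response categories, agree/disagree format and self-administration. The handful of items is invented, and the reliability values come from a made-up linear rule so that the fit is easy to check. The coding system and prediction program described in the chapter are far richer. Only reliability is modeled here; validity would need a second regression of the same form.

```python
# Hypothetical coded items: (number of response categories, agree/disagree format, self-administered).
CODES = [
    (2, 1, 0), (5, 1, 0), (7, 0, 0), (11, 0, 1),
    (4, 0, 1), (5, 1, 1), (3, 0, 0), (10, 1, 0),
]
# Synthetic reliabilities from a made-up rule, so the fitted coefficients are easy to check.
RELIABILITY = [0.60 + 0.02 * k - 0.05 * ad + 0.03 * sa for k, ad, sa in CODES]


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aug[r][n] - tail) / aug[r][r]
    return coef


def fit_ols(codes, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y, with an intercept."""
    X = [[1.0, *map(float, row)] for row in codes]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, codes):
    """Apply the fitted prediction equation to a new item's codes."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], codes))


coef = fit_ols(CODES, RELIABILITY)
new_item = (7, 1, 0)  # hypothetical: 7-point agree/disagree item, interviewer-administered
print(round(predict(coef, new_item), 3))  # prints 0.69
```

The synthetic reliabilities follow an exact linear rule, so least squares recovers that rule and the new item's predicted reliability is 0.69. With real MTMM estimates the residuals would not be zero, and any prediction would carry uncertainty.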
The authors recognize that more MTMM data are needed to improve the
models. In addition, the predictions of the models need to be tested in validation
studies. However, the approach is a promising one for evaluating questions.
1.6 MODE OF ADMINISTRATION
The introduction of computer technology has changed many aspects of adminis-
tering questionnaires. On the one hand, the variety of new methods—beginning
with computer-assisted telephone interviewing (CATI), but soon expanding to
computer-assisted personal interviewing (CAPI) and computer-assisted self-inter-
viewing (CASI)—has expanded our ability to measure a range of phenom-
ena more efficiently and with improved data quality (Couper et al., 1998). On
the other hand, the continuing technical innovations—including audio-CASI,
interactive voice response, and the Internet—present many challenges for ques-
tionnaire design.
The proliferation of data collection modes has at least three implications for
the development, evaluation, and testing of survey instruments. One implication
is the growing recognition that answers to survey questions may be affected by
the mode in which the questions are asked. Thus, testing methods must take the
delivery mode into consideration. A related implication is that survey instruments
consist of much more than words. For instance, an instrument’s layout and design,
logical structure and architecture, and the technical aspects of the hardware and
software used to deliver it all need to be tested and their possible effects on
measurement error explored. A third implication is that survey instruments are
increasingly complex and demand ever-expanding resources for testing. The older
methods, which relied on visual inspection to test flow and routing, are no longer
sufficient. Newer methods must be found to facilitate the testing of instrument
logic, quite aside from the wording of individual questions. In summary, the task
of testing questionnaires has greatly expanded.
Although Chapter 15, by Don Dillman and Cleo Redline, deals with tradi-
tional paper-based methods, it demonstrates that a focus on question wording is
insufficient even in that technologically simple mode. The authors discuss how
cognitive interviews may be adapted to explore the various aspects of visual
language in self-administered questionnaires. They then describe three projects
with self-administered instruments that mounted split-sample experiments based
on the insights from cognitive interviews. In each case, the experimental results
generally confirmed the conclusions drawn from cognitive interviews. (One of
the advantages of self-administered approaches such as mail and the Web is
that the per unit cost of data collection is much lower than that of interviewer-
administered methods, permitting more extensive use of experiments.)
In Chapter 16, John Tarnai and Danna Moore focus primarily on testing com-
puterized instruments for programming errors. With the growing complexity of
survey instruments and the expanding range of design features available, this has
become an increasingly costly and time-consuming part of the development pro-
cess, often with no guarantee of complete success. Tarnai and Moore argue that
much of this testing can be done effectively and efficiently only by computers,
but that existing software is not up to the task. These conclusions echo those of a
recent Committee on National Statistics workshop on survey automation (Cork
et al., 2003).
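Tarnai and Moore argue that routing errors are best found by computer. A toy example shows why. Treat the skip logic as a directed graph and enumerate every path a respondent could follow. Then report branches that lead to an undefined item and items that no path ever reaches. The routing table and item names below are hypothetical. Real computer-assisted interviewing systems express routing in their own specification languages. The sketch only shows how exhaustive enumeration catches errors that visual inspection can miss.

```python
# Hypothetical routing table: item -> {answer: next item}. "END" closes the interview.
ROUTING = {
    "Q1_employed": {"yes": "Q2_hours", "no": "Q4_looking"},
    "Q2_hours": {"under 35": "Q3_part_time_reason", "35 or more": "Q5_income"},
    "Q3_part_time_reason": {"any": "Q5_income"},
    "Q4_looking": {"yes": "Q5_income", "no": "Q6_retired"},  # bug: Q6 is never defined
    "Q5_income": {"any": "END"},
    "Q7_health": {"any": "END"},  # bug: no route leads here
}


def enumerate_paths(routing, start):
    """Depth-first enumeration of every answer path starting at `start`.

    Returns (complete paths ending at END, paths that hit an undefined item or a loop).
    """
    complete, broken = [], []
    stack = [[start]]
    while stack:
        path = stack.pop()
        item = path[-1]
        if item == "END":
            complete.append(path)
        elif item not in routing:
            broken.append(path)  # routed to an item that does not exist
        else:
            for nxt in routing[item].values():
                if nxt in path:
                    broken.append(path + [nxt])  # routing loops back on itself
                else:
                    stack.append(path + [nxt])
    return complete, broken


complete, broken = enumerate_paths(ROUTING, "Q1_employed")
reached = {item for path in complete + broken for item in path}
never_asked = sorted(set(ROUTING) - reached)

print(len(complete), "complete paths")
for path in broken:
    print("broken:", " -> ".join(path))
print("unreachable:", never_asked)
```

The number of paths grows quickly with the number of branches. For large instruments, a simple reachability check over the graph is cheaper than listing every path. That scaling problem is one reason better testing software matters.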
Chapter 17, by Sue Ellen Hansen and Mick Couper, concerns usability testing
to evaluate computerized survey instruments. In line with Dillman and Redline’s
chapter, it shows that the visual presentation of information—in this case, to the
interviewer—as well as the design of auxiliary functions used by the interviewer
in computer-assisted interviewing, are critical to creating effective instruments.
As a result, Hansen and Couper maintain that testing for usability is as important
as testing for programming errors. With computerized questionnaires, interview-
ers must manage two interactions, one with the computer and another with the
respondent, and the goal of good design must therefore be to help interviewers
manage both interactions to optimize data quality. Using four separate examples
of usability testing to achieve this end, Hansen and Couper demonstrate its value
for testing computerized instruments.
Chapter 18, by Reginald Baker, Scott Crawford, and Janice Swinehart, covers
the development and testing of Web questionnaires. They review the various lev-
els of testing necessary for Web surveys, some of which are unique to that mode
(e.g., aspects of the respondent’s computing environment such as monitor display
properties, the presence of browser plug-ins, and features of the hosting platform
that define the survey organization’s server). In outlining a testing approach, the
authors emphasize standards or generic guidelines and design principles that apply
across questionnaires. In addition to testing methods used in other modes, Baker
and his colleagues discuss evaluations based on process data that are easily col-
lected during Web administration (e.g., response latencies, backups, entry errors,
and breakoffs). As with Chapter 16, this chapter underscores the importance of
automated testing tools, and consistent with the other two chapters in this part,
it emphasizes that testing Web questionnaires must focus on their visual aspects.
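The process data Baker and his colleagues mention can be summarized per question with very little code. The sketch assumes a hypothetical event log with one record per page view. Each record holds the respondent, the question, the seconds spent on the page, and whether the respondent moved forward, backed up, or exited. For each question it reports the median response latency, the share of respondents who backed up, and the breakoff rate. The field names and log format are assumptions; actual paradata depend on the survey software.

```python
from collections import defaultdict
from statistics import median

# Hypothetical paradata, one record per page view:
# (respondent, question, seconds on page, action), where action is
# "next", "back" (backed up to the previous page), or "exit" (broke off here).
EVENTS = [
    ("r1", "Q1", 4.2, "next"), ("r1", "Q2", 9.8, "back"), ("r1", "Q1", 2.0, "next"),
    ("r1", "Q2", 7.5, "next"), ("r1", "Q3", 5.1, "next"),
    ("r2", "Q1", 3.9, "next"), ("r2", "Q2", 21.4, "exit"),
    ("r3", "Q1", 5.0, "next"), ("r3", "Q2", 8.1, "next"), ("r3", "Q3", 6.3, "next"),
]


def summarize(events):
    """Per-question median latency (all page views, including revisits), backup rate, breakoff rate."""
    latencies = defaultdict(list)
    seen, backed, exited = defaultdict(set), defaultdict(set), defaultdict(set)
    for resp, question, seconds, action in events:
        latencies[question].append(seconds)
        seen[question].add(resp)
        if action == "back":
            backed[question].add(resp)
        elif action == "exit":
            exited[question].add(resp)
    return {
        q: {
            "median_seconds": median(latencies[q]),
            "backup_rate": len(backed[q]) / len(seen[q]),
            "breakoff_rate": len(exited[q]) / len(seen[q]),
        }
        for q in sorted(seen)
    }


for question, stats in summarize(EVENTS).items():
    print(question, stats)
```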
1.7 SPECIAL POPULATIONS
Surveys of children, establishments, and populations that require questionnaires
in multiple languages pose special design problems. Pretesting is therefore even more
vital in these cases than for surveys of adults interviewed with questionnaires
in a single language. Remarkably, however, pretesting has been even more neglected
for such surveys than for ordinary ones.

Establishments and children might seem to have little in common, but the
chapters that deal with them follow a similar logic. They begin by analyzing
how the capabilities and characteristics of these special respondents affect the
response process, move on to consider what these differences imply for choices
about testing methods, and then consider steps to improve testing.
In Chapter 19, Diane Willimack, Lars Lyberg, Jean Martin, Lilli Japec, and
Patricia Whitridge draw on their experiences at four national statistical agencies as
well as on an informal survey of other survey organizations to describe distinctive
characteristics of establishment surveys that have made questionnaire pretesting
uncommon. Establishment surveys tend to be mandatory, rely on records, and
target populations of a few very large organizations, which are included with cer-
tainty, and many smaller ones, which are surveyed less often. These features seem
to have militated against adding to the already high respondent burden by con-
ducting pretests. In addition, because establishment surveys are disproportionately
designed to measure change over time, questionnaire changes are rare. Finally,
establishment surveys tend to rely on postcollection editing to correct data.
Willimack et al. describe various ways to improve the design and testing of
establishment questionnaires. In addition to greater use of conventional meth-
ods, they recommend strategies such as focus groups, site visits, record-keeping
studies, and consultation with subject area specialists and other stakeholders.
They also suggest making better use of ongoing quality evaluations and rein-
terviews, as well as more routine documentation of respondents’ feedback, to
provide diagnoses of questionnaire problems. Finally, they recommend that tests
be embedded in existing surveys so that proposed improvements can be evaluated
without increasing burden.
In Chapter 20, Edith de Leeuw, Natacha Borgers, and Astrid Smits consider
pretesting questionnaires for children and adolescents. They begin by review-
ing studies of children’s cognitive development for guidance about the types
of questions and cognitive tasks that can be asked of children of various ages.
The evidence suggests that 7 is about the earliest age at which children can be
interviewed with structured questionnaires, although the ability to handle certain
types of questions (e.g., hypothetical ones) is not acquired until later. The authors
discuss how various pretesting methods, including focus groups, cognitive inter-
views, observation, and debriefing, can be adapted to accommodate children of
different ages. Finally, they provide examples of pretests that use these methods
with children and offer advice about other issues (such as informed consent) that
must be specially addressed for children.
Questionnaire translation has always been basic to cross-national surveys, and
recently it has become increasingly important for national surveys as well. Some
countries (e.g., Canada, Switzerland, and Belgium) must, by law, administer
surveys in multiple languages. Other nations are translating questionnaires as a
result of increasing numbers of immigrants. In the United States, for instance,
the population 18 and older who speak a language other than English at home
increased from 13.8% in 1990 to 17.8% in 2000 (U.S. Bureau of the Census,
2003). Moreover, by 2000, 4.4% of U.S. adults lived in “linguistically isolated”
households, those in which all the adults spoke a language other than English
and none spoke English “very well.”
Despite translation’s importance, Tom Smith reports in Chapter 21 that “. . . no aspect
of cross-national survey research has been less subjected to systematic, empir-
ical investigation than translation.” His chapter is one of two that address the
development and testing of questionnaires to be administered in more than one
language. The author describes sources of nonequivalence in translated questions
and discusses the problems involved in translating response scales or categories
so that they are equivalent. In addition, he points out that item comparability may
be impaired because response effects vary from one country to another. Smith
reviews different approaches to translation, which he argues must be an integral
part of item development and testing rather than an isolated activity relegated to
the end of the design process.
The chapter outlines several strategies to address problems arising from non-
comparability across languages. One approach involves asking multiple questions
about a concept (e.g., well-being) with different terms in each (e.g., satisfaction
versus happiness), so that translation problems with a single term do not result
in measurement error for all the items. Another approach is to combine questions
that are equivalent across cultures and languages with questions that are culture
specific. A third strategy is to conduct special studies to calibrate scale terms.
As Janet Harkness, Beth-Ellen Pennell, and Alisú Schoua-Glusberg note in
Chapter 22, translation is generally treated as a minor aspect of the question-
naire development process, and pretesting procedures that are often employed
for monolingual survey instruments are not typically used for translated ques-
tionnaires. These authors provide illustrations of the sources and possible conse-
quences of translation problems that arise from difficulties of matching meaning
across questions and answer scales, and from differences between languages,
such as whether or not words carry gender. Too-close (word-by-word) transla-
tions can result in respondents being asked a different question than intended, or
being asked a more cumbersome or stilted question.
Based on their experience, Harkness and her colleagues offer guidance on
procedures and protocols for translation and assessment. They envision a more
rigorous process of “translatology” than the ad hoc practices common to most
projects. They emphasize the need for appraisals of the translated text (and hence
do not believe back-translation is adequate), and they argue that the quality of
translations, as well as the performance of the translated questions as survey
questions, must be assessed. Finally, they recommend team approaches that bring
different types of expertise to bear on the translation, and suggest ways to organize
the effort of translation, assessment, and documentation (the last of which is
particularly important for interpreting results after a survey is completed).
1.8 MULTIMETHOD APPLICATIONS
The importance of multiple methods is a recurring theme throughout this volume.
Two of the three chapters in the final section provide case studies of multiple
methods, and the third provides an experimental assessment of a combination of
methods. Although none of the chapters permits a controlled comparison of the
effectiveness of individual methods, each shows how a multimethod approach can
be employed to address survey problems. The case studies, which describe and
evaluate the methods used at different stages in the design and evaluation process,
should help inform future testing programs, as there are few published descriptions
of how such programs are conducted and with what results. The experimental
comparison is important for evaluating pretesting more generally.
In Chapter 23, Nora Cate Schaeffer and Jennifer Dykema describe a test-
ing program designed to ensure that respondents correctly understood a concept
central to a survey on child support: joint legal custody. Legal custody is easily
confused with physical custody, and an earlier survey had shown that respondents
underreported it. The authors’ aim was to reduce underreporting of joint custody
by improving the question that asked about it. They first convened focus groups
to explore the domain, in particular the language used by parents in describing
custody arrangements. They then conducted cognitive interviews to evaluate both
the question from the earlier survey and the revised versions developed after the
focus groups. Finally, they carried out a split-sample field experiment that varied
the context of the legal custody question, to test the hypothesis that asking it after
questions about physical custody would improve accuracy. The field interviews,
which included questions about how sure respondents were of their answers,
were also behavior coded. Moreover, the authors had access to official custody
records, which allowed them to validate respondents’ answers.
Overall, the testing program was successful in increasing accurate reporting
among those who actually had joint legal custody. In other words, the initially
observed problem of false negatives was greatly reduced. There was, however,
an unanticipated reduction in accuracy among those who did not have joint legal
custody. That is, false positives, which had not initially been a serious problem,
increased substantially. Thus the study serves as a reminder of the danger of
focusing on a single dimension of a measurement problem. Fixing one problem
only to cause another may be more common than we suppose. It also reminds
us of the usefulness of testing the revised questionnaire before implementing it
in a production survey.
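The trade-off in Schaeffer and Dykema's results is easiest to see when survey answers are cross-classified against records. The sketch computes two rates. The false-negative rate is the share of people whose records show joint legal custody who did not report it. The false-positive rate is the share of people without joint legal custody who reported it. The counts are invented to mimic the pattern described above and are not the chapter's data.

```python
def error_rates(pairs):
    """Error rates for (record_says_joint, survey_says_joint) boolean pairs."""
    tp = sum(1 for rec, ans in pairs if rec and ans)
    fn = sum(1 for rec, ans in pairs if rec and not ans)
    fp = sum(1 for rec, ans in pairs if not rec and ans)
    tn = sum(1 for rec, ans in pairs if not rec and not ans)
    return {
        "false_negative_rate": fn / (tp + fn),  # joint custody in records, not reported
        "false_positive_rate": fp / (fp + tn),  # reported, but not in records
    }


def cells(tp, fn, fp, tn):
    """Expand hypothetical 2x2 cell counts into (record, answer) pairs."""
    return ([(True, True)] * tp + [(True, False)] * fn
            + [(False, True)] * fp + [(False, False)] * tn)


# Invented counts for an original and a revised question (not the chapter's data).
original = error_rates(cells(tp=60, fn=40, fp=5, tn=95))
revised = error_rates(cells(tp=90, fn=10, fp=25, tn=75))
print("original:", original)
print("revised: ", revised)
```

With these made-up counts, the revision cuts the false-negative rate from 0.40 to 0.10. It also raises the false-positive rate from 0.05 to 0.25. Looking at only one of the two rates would hide half of the story.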
Chapter 24, by Michael Kaplowitz, Frank Lupi, and John Hoehn, describes
multiple methods for developing and evaluating a stated-choice questionnaire to
value wetlands. Measuring people’s choices about public policy issues poses a
dual challenge. The questionnaire must clearly define the issue (in this instance,
the nature and role of wetlands) so that a cross section of adults will understand it,
and it must specify a judgment about the issue (in this instance, whether restoring
wetlands with specified characteristics offsets the loss of other wetlands with
different characteristics) that respondents will be able to make meaningfully.
To accomplish these ends, the authors first conducted a series of focus groups,
separately with target respondents and a panel of subject matter experts, to explore
the subject matter and inform the design of two alternative questionnaires. They
then convened additional focus groups to evaluate the instruments, but because
the “no-show” rate was lower than expected, they also conducted cognitive inter-
views with the additional people not needed for the groups. The results from these
cognitive interviews turned out to be much more valuable for assessing the ques-
tionnaires than the results from the focus groups. Finally, they conducted further
cognitive interviews in an iterative process involving revisions to the
questionnaire. Kaplowitz et al. analyze the limitations of focus groups for evaluating
survey instruments in terms of conversational norms and group dynamics. The
chapter thus pairs a method that did not work well with a useful account of what
went wrong, and why.
The volume ends with Chapter 25 by Barbara Forsyth, Jennifer Rothgeb, and
Gordon Willis. These authors assessed whether pretesting predicts data collec-
tion problems and improves survey outcomes. A combination of three meth-
ods—informal expert review, appraisal coding, and cognitive interviews—was
used to identify potential problems in a pretest of a questionnaire consisting of 83
items. The 12 questions diagnosed most consistently by the three methods as hav-
ing problems were then revised to address the problems. Finally, a split-sample
field experiment was conducted to compare the original and revised items. The
split-sample interviews were behavior coded and the interviewers were asked to
evaluate the questionnaires after completing the interviews.
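The chapter summary does not say how consistency across methods was scored, so the following is only one plausible approach. Count how many of the three methods flagged each item and keep the items with the highest counts. The method names mirror the text, but the item identifiers and flags are hypothetical.

```python
# Hypothetical problem flags: method -> set of item ids it flagged.
FLAGS = {
    "expert review": {"Q3", "Q7", "Q12", "Q40", "Q55"},
    "appraisal coding": {"Q3", "Q7", "Q19", "Q40", "Q61", "Q77"},
    "cognitive interviews": {"Q3", "Q12", "Q40", "Q61"},
}


def most_consistent(flags, k):
    """Return the k items flagged by the most methods, with their counts."""
    counts = {}
    for items in flags.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    # More methods first; ties broken by item id (string order) for a stable result.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [(item, counts[item]) for item in ranked[:k]]


print(most_consistent(FLAGS, 4))
```

Here `most_consistent(FLAGS, 2)` returns Q3 and Q40, the only items flagged by all three methods. Ties are broken by item identifier as a string, purely to make the output deterministic.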
The versions of the original questions identified in the pretest as particularly
likely to pose problems for interviewers were more likely to show behavior-coded
interviewing problems in the field and to be identified by interviewers as having
posed problems for them. Similarly, the questions identified by the pretest as
posing problems for respondents resulted in more respondent problems, according
to both the behavior coding and the interviewer ratings. Item nonresponse was
also higher for questions identified by the pretest as presenting either recall or
sensitivity problems than for questions not identified as having those problems.
These results demonstrate that the combination of pretesting methods was a good
predictor of the problems the items would produce in the field.
However, the revised questions generally did not appear to outperform the
original versions. The item revisions had no effect on the frequency of behavior-
coded interviewer and respondent problems. And while interviewers did rate the
revisions as posing fewer respondent problems, they rated them as posing more
interviewer problems. The authors suggest various explanations for this outcome,
including their selection of only questions diagnosed as most clearly problem-
atic, which often involved multiple problems that required complex revisions to
address. In addition, the revised questions were not subjected to another round
of testing using the three methods that originally identified the problems to con-
firm that the revisions were appropriate. Nonetheless, the results are chastening,
as they suggest that we have much better tools for diagnosing questionnaire
problems than for fixing them.
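Comparisons like the ones in this split-sample experiment often reduce to one question: does the rate of behavior-coded problems differ between the original and revised versions of an item? A minimal sketch is a two-sided two-proportion z-test, written here with only the Python standard library. The counts are hypothetical, and the test is not necessarily the one Forsyth et al. used. It also ignores the clustering of interviews within interviewers, which in a real field experiment would widen the standard error.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test of equal proportions x1/n1 vs x2/n2, using the pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("no variation in either group; the test is undefined")
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: interviews with a behavior-coded problem on one item,
# original version (42 of 200) versus revised version (38 of 200).
z, p = two_proportion_z(42, 200, 38, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these counts, z is 0.5 and the p-value is about 0.62. The data therefore give no evidence that the revision changed the problem rate, which is not the same as evidence that it had no effect.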

1.9 AGENDA FOR THE FUTURE
The methods discussed here do not exhaust the possibilities for testing and eval-
uating questions. For instance, formal appraisal schemes that are applied by
coders (Lessler and Forsyth, 1996) are treated only in passing in this volume,
and those involving artificial intelligence (Graesser et al., 2000) are not treated
at all. In addition, there is only a little on focus groups (Bischoping and Dykema,
1999) and nothing on ethnographic interviews (Gerber, 1999), both of which are
most commonly used at an early development stage before there is an instrument
to be tested. Nonetheless, the volume provides an up-to-date assessment of the
major evaluation methods currently in use and demonstrates the progress that has
been attained in making questionnaire testing more rigorous. At the same time,
the volume points to the need for extensive additional work.
Different pretesting methods, and different ways of carrying out the same
method, influence the numbers and types of problems identified. Consistency
among methods is often low, and the reasons for this need more investigation.
One possibility is that in their present form, some of the methods are unreliable.
But two other possibilities are also worth exploring. First, lack of consistency
may occur because the methods are suited for identifying different problem types.
For example, comprehension problems that occur with no disruption in the ques-
tion asking and answering process are unlikely to be picked up by behavior
coding. Thus, we should probably expect only partial overlap in the problems
identified by different methods. Second, inconsistencies may reflect a lack of
consensus among researchers, cognitive interviewers, or coders about what is
regarded as a problem. For example, is it a problem if a question is awkward
to ask but obtains accurate responses, or is it a problem only if the question
obtains erroneous answers? The types and severity of problems that a ques-
tionnaire pretest (or methodological evaluation) aims to identify are not always
clear, and this lack of specification may contribute to the inconsistencies that
have been observed.
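One common way to quantify consistency between two methods is Cohen's kappa. It compares the observed rate of agreement on problem flags with the agreement expected by chance from each method's flagging rate. The sketch assumes each method yields a 0/1 flag for the same list of items; the flags are hypothetical.

```python
import math


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 problem flags."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # chance agreement from the marginals
    if expected == 1:
        return math.nan  # both methods flag all items or none; kappa is undefined
    return (observed - expected) / (1 - expected)


# Hypothetical flags for the same ten items from two methods.
behavior_coding = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
cognitive_interviews = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]
print(round(cohens_kappa(behavior_coding, cognitive_interviews), 2))
```

For these made-up flags the methods agree on 7 of 10 items, but kappa is only about 0.35 once chance agreement is removed. A low value does not by itself say which explanation discussed above is at work: unreliable methods, methods suited to different problem types, or differing definitions of a problem.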
In exploring such inconsistencies, the cross-organization interlaboratory
approach used in DeMaio and Landreth’s chapter (see also Martin et al., 1999)
holds promise not only of leading to greater standardization, and therefore to
higher reliability, but also of enhancing our understanding of which methods are
appropriate in different circumstances and for different purposes.
It is also clear that problem identification does not necessarily point to problem
solution in any obvious or direct way. For instance, the authors of Chapters 23
and 25 used pretesting to identify problems that were then addressed by revisions,
only to find in subsequent field studies that the revisions either did not result in
improvements, or created new problems. The fact that we are better able to
identify problems than solutions underscores the desirability of additional testing
after questionnaires have been revised.
Many of the chapters contain specific suggestions for future research, but here
we offer four general recommendations for advancing questionnaire testing and
evaluation. These involve:
• The connection between problem identification and measurement error
• The impact of testing methods on survey costs
• The role of basic research and theory in guiding the repair of question flaws
• The development of a database to facilitate cumulative knowledge
First, we need studies that examine the connection between problem diagnosis
and measurement error. A major objective of testing is to reduce measurement
error, yet we know little about the degree to which error is predicted by the var-
ious problem indicators at the heart of the different testing methods. Chapters 7
and 23 are unusual in making use of external validation in this way. Several of
the other chapters take an indirect approach, by examining the link between prob-
lem diagnosis and specific response patterns (e.g., missing data, or seam bias),
on the assumption that higher or lower levels are more accurate. But inferences
based on indirect approaches must be more tentative than those based on direct
validation (e.g., record check studies). With appropriately designed validation
studies, we might be better able to choose among techniques for implementing
particular methods, evaluate the usefulness of various methods for diagnosing
different kinds of problems, and understand how much pretesting is “enough.”
We acknowledge, however, that validation data are rarely available and are them-
selves subject to error. Thus, another challenge for future research is to develop
further indicators of measurement error that can be used to assess testing methods.
Second, we need information about the impact of various testing methods on
survey costs. The cost of testing may be somewhat offset, completely offset, or
even more than offset (and therefore reduce the total survey budget), depending on
whether the testing results lead to the identification (and correction) of problems
that affect those survey features (e.g., interview length, interviewer training, and
postsurvey data processing) that have implications for cost. Although we know
something about the direct costs of various testing methods, we know almost
nothing about how the methods differ in their impact on overall costs. Thus, a
key issue for future research is to estimate how various testing methods perform
in identifying the types of problems that increase survey costs.
Third, since improved methods for diagnosing problems are mainly useful
to the extent that we can repair the problems, we need more guidance in mak-
ing repairs. As a result, advances in pretesting depend partly on advances in
the science of asking questions (Schaeffer and Presser, 2003). Such a science
involves basic research into the question-and-answer process that is theoretically
motivated (Sudman et al., 1996; Tourangeau et al., 2000; Krosnick and Fabrigar,
forthcoming). But this is a two-way street. On the one hand, pretesting should be
guided by theoretically motivated research into the question-and-answer process.
On the other hand, basic research and theories of the question-and-answer process
should be shaped by both the results of pretesting and developments in the testing
methods themselves [e.g., the question taxonomies, or classification typologies,
used in questionnaire appraisal systems (Lessler and Forsyth, 1996) and the type
of statistical modeling described by Saris et al.]. In particular, pretesting’s focus
on aspects of the response tasks that can make it difficult for respondents to
answer accurately ought to inform theories of the connection between response
error and the question-and-answer process.
Finally, we need improved ways to accumulate knowledge across pretests.
This will require greater attention to documenting what is learned from pretests
of individual questionnaires. One of the working groups at the Second Advanced
Seminar on the Cognitive Aspects of Survey Methodology (Sirken et al., 1999,
p. 56) suggested that survey organizations archive, in a central repository, the
cognitive interviews they conduct, including the items tested, the methods used,
and the findings produced. As that group suggested, this would “facilitate system-
atic research into issues such as: What characteristics of questions are identified
by cognitive interviewing as engendering particular problems? What testing fea-
tures are associated with discovering different problem types? What sorts of
solutions are adopted in response to various classes of problems?” We believe
that this recommendation should apply to all methods of pretesting. Establishing
a pretesting archive on the Web would not only facilitate research on question-
naire evaluation, it would also serve as an invaluable resource for researchers
developing questionnaires for new surveys.³
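A central repository of the kind proposed would need, at minimum, a consistent record for each tested item. The sketch below is a hypothetical schema whose field names are assumptions, not a proposed standard. It adds one query that addresses a question the working group raised: which problem types each testing method tends to find. The records are invented placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PretestRecord:
    """One archived pretest finding (hypothetical schema)."""
    survey: str
    item_id: str
    item_text: str
    method: str                  # e.g. "cognitive interview", "behavior coding"
    problem_type: Optional[str]  # None when the method found no problem
    revision: str = ""           # wording adopted in response, if any


def problems_by_method(records: List[PretestRecord]) -> Dict[str, Counter]:
    """Tally the problem types each testing method identified."""
    table: Dict[str, Counter] = {}
    for rec in records:
        if rec.problem_type is not None:
            table.setdefault(rec.method, Counter())[rec.problem_type] += 1
    return table


# Invented placeholder records, not real pretest results.
ARCHIVE = [
    PretestRecord("Survey A", "A1", "Item text 1", "cognitive interview", "comprehension"),
    PretestRecord("Survey A", "A1", "Item text 1", "behavior coding", None),
    PretestRecord("Survey A", "A2", "Item text 2", "cognitive interview", "recall"),
    PretestRecord("Survey B", "B1", "Item text 3", "behavior coding", "comprehension",
                  revision="Revised item text 3"),
    PretestRecord("Survey B", "B1", "Item text 3", "cognitive interview", "comprehension"),
]

for method, counts in problems_by_method(ARCHIVE).items():
    print(method, dict(counts))
```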
ACKNOWLEDGMENTS
The authors appreciate the comments of Roger Tourangeau. A revised version of
this chapter is to be published in the Research Synthesis Section of the Spring
2004 issue of Public Opinion Quarterly.
³ Many Census Bureau pretest reports are available at http://www.census.gov/srd/www/byyear.html,
and many other pretest reports may be found in the Proceedings of the American Statistical Association
Survey Research Methods Section and the American Association for Public Opinion Research
available at http://www.amstat.org/sections/srms/proceedings. But neither site is easily searchable,
and the reports often contain incomplete information about the procedures used.

CHAPTER 2
Cognitive Interviewing Revisited:
A Useful Technique, in Theory?
Gordon B. Willis
National Cancer Institute
2.1 INTRODUCTION
In contrast to the more empirically oriented chapters in this volume, in this chapter
I adopt a more theoretic viewpoint relating to the development, testing, and
evaluation of survey questions, stemming from the perspective commonly termed
CASM (cognitive aspects of survey methodology). From this perspective, the
respondent’s cognitive processes drive the survey response, and an understanding
of cognition is central to designing questions and to understanding and reducing
sources of response error. A variety of cognitive theorizing and modeling has
been applied to the general challenge of questionnaire design (see Sudman et al.,
1996, and Tourangeau et al., 2000, for reviews). Optimally, an understanding of
cognitive processes will help us to develop design rules that govern choice of
response categories, question ordering, and so on.
I address here a facet of CASM that is related to such design decisions, but
from a more empirical viewpoint that involves questionnaire pretesting, specifi-
cally through the practice of cognitive interviewing. This activity is not carried out
primarily for the purpose of developing general principles of questionnaire design,
but rather, to evaluate targeted survey questions, with the goal of modifying these
questions when indicated. That is, we test survey questions by conducting what
is variably termed the cognitive, intensive, extended, think-aloud, or laboratory
pretest interview,¹ and focus on the cognitive processes involved in answering
them. Following Tourangeau (1984), the processes studied are generally
listed as question comprehension, information retrieval, judgment and estimation,
and response.

Methods for Testing and Evaluating Survey Questionnaires, Edited by Stanley Presser,
Jennifer M. Rothgeb, Mick P. Couper, Judith T. Lessler, Elizabeth Martin, Jean Martin,
and Eleanor Singer
ISBN 0-471-45841-4 Copyright © 2004 John Wiley & Sons, Inc.
2.1.1 Definition of Cognitive Interviewing
The cognitive interview can be conceptualized as a modification and expansion
of the usual survey interviewing process (Willis, 1994, 1999). The interview is
conducted by a specially trained cognitive interviewer rather than by a survey field
interviewer, and this interviewer administers questions to a cognitive laboratory
“subject” in place of the usual survey respondent. Further, the cognitive interview
diverges from the field interview through its application of two varieties of verbal
report methods:
1. Think-aloud. The subject is induced to verbalize his or her thinking as he
or she answers the tested questions (Davis and DeMaio, 1993; Bickart and
Felcher, 1996; Bolton and Bronkhorst, 1996). For example, Loftus (1984,
p. 62) reports the following think-aloud, obtained through testing of the
question “In the last 12 months have you been to a dentist?”:
“Let’s see . . . I had my teeth cleaned six months ago, and so . . . and then
I had them checked three months ago, and I had a tooth . . . yeah I had a
toothache about March . . . yeah. So yeah, I have.”
2. Verbal probing. After the subject provides an answer to the tested survey
question, the interviewer asks additional probe questions to further elucidate
the subject’s thinking (Belson, 1981; Willis, 1994, 1999). In testing a 12-month
dental visit question, the interviewer might follow up the subject’s
affirmative answer by asking:
“Can you tell me more about the last time you went to a dentist?”
“When was this?”
“Was this for a regular checkup, for a problem, or for some other reason?”
“How sure are you that your last visit was within the past 12 months?”
Despite the overt focus on understanding what people are thinking as they
answer survey questions, it is important to note that a key objective of cognitive
interviewing is not simply to understand the strategies or general approaches that
subjects use to answer the questions, but to detect potential sources of response
error associated with the targeted questions (Conrad and Blair, 1996). Consider
another example: “In general, would you say your health is excellent, very good,
good, fair, or poor?” From the perspective of Tourangeau’s (1984) model, the
cognitive interviewer assesses whether the subjects are able to (1) comprehend
key terms (e.g., “health in general”; “excellent”) in the way intended by the
designer; (2) retrieve relevant health-oriented information; (3) make decisions or
judgments concerning the reporting of the retrieved information; and (4) respond
by comparing an internal representation of health status to the response categories
offered (e.g., a respondent chooses “good” because that is the best match
to his or her self-assessment of health status). If the investigators determine that
problems related to any of these cognitive operations exist, they enact modifications
in an attempt to address the deficiencies. Pursuing the example above,
cognitive testing results have sometimes indicated that subjects regard physical
and mental/emotional health states as disparate concepts and have difficulty
combining them into one judgment (see Willis et al., in press). Therefore, the
general health question has for some purposes been decomposed into two subparts:
one about physical health, the other concerning mental functioning (e.g.,
Centers for Disease Control and Prevention, 1998).

¹ Various labels have been applied by different authors, but these terms appear to be
for the most part synonymous. A major source of potential confusion is the fact that
the term cognitive interview is also used to refer to a very different procedure used
in the justice and legal fields to enhance the retrieval of memories by crime victims
or eyewitnesses (Fisher and Geiselman, 1992).
2.1.2 Variation in Use of Cognitive Interviewing
Cognitive interviewing in questionnaire pretesting is used by a wide variety of
researchers and survey organizations (DeMaio and Rothgeb, 1996; Esposito and
Rothgeb, 1997; Friedenreich et al., 1997; Jobe and Mingay, 1989; Thompson
et al., 2002; U.S. Bureau of the Census, 1998; Willis et al., 1991). However,
core definitions and terminology vary markedly: Some authors (e.g., Conrad and
Blair, 1996) describe cognitive interviewing as synonymous with the think-aloud
method; some (e.g., Willis, 1994, 1999) consider this activity to include both
think-aloud and the use of targeted probing by the interviewer; whereas still
others (Bolton and Bronkhorst, 1996) define cognitive interviewing in terms of
verbal probing, and the “verbal protocol” method as involving solely think-aloud.
There is also considerable variability in practice under the general rubric of
cognitive interviewing (Beatty, undated; Forsyth, 1990; Forsyth and Lessler, 1991;
Willis et al., 1999a), and there are no generally accepted shared standards for
carrying out cognitive interviews (Tourangeau et al., 2000). Cognitive interviewing
is practiced in divergent ways with respect to the nature of verbal probing,
the coding of the collected data, and other features that may be instrumental in
producing varied results (Cosenza, 2002; Forsyth and Lessler, 1991; Tourangeau
et al., 2000; Willis et al., 1999b; Chapter 5, this volume).
The variety of approaches raises questions about the conduct of cognitive
interviews. How important is it to obtain a codable answer (e.g., “good”) rather
than a response that consists of the subject’s own words? Should we direct probes
toward features of the tested question, or request elaboration of the answer given
to that question (Beatty et al., 1996)? On what basis should the decision to
administer a probe question be made (Conrad and Blair, 2001)?

2.1.3 Determining the Best Procedure
The various approaches to cognitive interviewing may differ in effectiveness,
and at the extreme, it is possible that none of them is of help in detecting and
correcting question flaws. Although several empirical evaluations of these methods
have been conducted (Conrad and Blair, 2001; Cosenza, 2002; Chapter 5,
this volume), attempts at validation have to date been inconclusive (Willis et al.,
1999a), and researchers’ views concerning the types of empirical evidence that
constitute validation vary widely. In particular, criterion validation data for survey
questions (e.g., “true scores” that can be used as outcome quality measures)
are generally rare or nonexistent (Willis et al., 1999b).
In the absence of compelling empirical data, it may be helpful to address
questions concerning procedural effectiveness through explicit consideration of
the role of theory in supporting and guiding interviewing practice (Conrad
and Blair, 1996; Lessler and Forsyth, 1996). Attention to theory leads to two
related questions:
1. Is cognitive interviewing a theoretically driven exercise? Theory is
often a useful guide to scientific practice, yet Conrad and Blair (1996)
have suggested that “cognitive interviews are not especially grounded in
theory.” Thus, before considering whether theory will help us by providing
procedural guidance, it behooves us to consider the extent to which such
theory in fact exists.
2. To the extent that theories do underlie the practice of cognitive testing,
have these theories been confirmed empirically? If a theory is testable
and found to be supported through appropriate tests, the use of specific
procedures deriving from that theory gains credence. On the other hand,
if the theory appears undeveloped or misapplied, the procedures it spawns
are also suspect.
In this chapter I endeavor to determine whether a review of cognitive theory
can assist us in restricting the field of arguably useful procedural variants of
cognitive interviewing. From this perspective, I examine two theories having an
explicit cognitive emphasis. The first of these, by Ericsson and Simon (1980,
1984, 1993), focuses on memory processes and their implications for the use of
verbal report methods. The second, task analysis theory, strives to develop the
Tourangeau (1984) model into an applicable theory that focuses on staging of
the survey response process.
2.2 REVISITING ERICSSON AND SIMON: THE VALIDITY
OF VERBAL REPORTS
The cognitive perspective on question pretesting has been closely associated
with a seminal Psychological Review paper by Ericsson and Simon (1980), followed
by an elaborated book and a later revision (1984, 1993). Ericsson and Simon
reviewed the use of verbal report methods in experimental psychology
and spawned considerable interest among survey methodologists in the
use of think-aloud and protocol analysis. Although these terms are sometimes
used interchangeably, think-aloud pertains to the procedure described above in
which experimental subjects are asked to articulate their thoughts as they engage
in a cognitive task, whereas protocol analysis consists of the subsequent analysis
of a recording of the think-aloud stream (Bickart and Felcher, 1996; Bolton and
Bronkhorst, 1996). Based on the Ericsson–Simon reviews, survey researchers
have made a consistent case for the general application of verbal report methods
to cognitive interviewing, and specifically, for selection of the think-aloud
procedure (Tourangeau et al., 2000).
2.2.1 Ericsson–Simon Memory Theory and Verbal Reporting
Ericsson and Simon were interested primarily in assessing the usefulness of verbal
reports of thinking, given a long tradition of ingrained skepticism by behaviorists
concerning the ability of people to articulate meaningfully their motivations,
level of awareness, and cognitive processes in general (e.g., Lashley, 1923).
After reviewing an extensive literature, they concluded that verbal reports can be
veridical if they involve information that the subject has available in short-term
memory at the time the report is verbalized. To ground their contentions in theory,
Ericsson and Simon presented a cognitive model (following Newell and Simon,
1972) emphasizing short- and long-term memory (STM, LTM). Of greatest significance
was the issue of where information is stored: Information contained
in STM during the course of problem solving or other cognitive activity could
purportedly be reported without undue distortion. However, reporting requirements
involving additional retrieval from LTM, at least in some cases, impose
additional cognitive burdens that may produce biased modes of behavior.
Ericsson and Simon asserted that veridical reporting was possible not only
because pertinent information was available in STM at the time it was reported,
but also because the act of verbalizing this information imposed little additional
processing demand. Further, they held that verbalization does not in itself produce
reactivity; that is, it does not fundamentally alter the “course and structure of the
cognitive processes” (Ericsson and Simon, 1980, p. 235). Reactivity is observed when
the act of verbal reporting produces results at variance with those of a silent control
group; hence Ericsson and Simon emphasized task instructions that are found to
minimize such effects. Specifically, Ericsson and Simon originally
proposed that two types of self-reports, labeled level 1 and level 2 verbalizations,
require a fairly direct readout of STM contents and are likely to satisfy
the requirements of nonreactive reporting. The distinction between levels 1 and
2 concerns the need to recode nonverbal information into verbal form for the
latter but not the former (verbalizing the solution of a word-based puzzle would
involve level 1, and verbalizing that of a visual–spatial task, level 2).
However, a third type, level 3 verbalization, which involves further explanation,
defense, or interpretation, is more likely to produce reactivity. Level 3
verbalization is especially relevant to self-reports involving survey questions, as
many probe questions used by cognitive interviewers (e.g., “How did you come
up with that answer?”) may demand level 3 verbalization rather than simply
the direct output of the cognitive processes that reach consciousness during the
course of question answering.
2.2.2 Support for the Ericsson–Simon Theory
Ericsson and Simon cited numerous research studies demonstrating that when
compared to a silent control condition, subjects providing verbal reports did not
differ in measures such as task accuracy in solving complex puzzles, or in the
solution steps they selected, although there was some evidence that thinking-aloud
resulted in an increase in task completion time (Ericsson and Simon, 1980, 1984,
1993). However, the general debate concerning the efficacy of think-aloud methods
has not been resolved (Crutcher, 1994; Ericsson and Simon, 1993; Nisbett
and Wilson, 1977; Payne, 1994; Wilson, 1994). In brief, theorists have changed
their focus from the general question of whether verbal reports are veridical to
that concerning when they are (Austin and Delaney, 1998; Smith and Miller,
1978), and even Ericsson and Simon have modified their viewpoint as additional
research results have accumulated. Key issues involve specification of: (1) the
types of cognitive tasks that are amenable to verbal report methods, and (2) the
nature of the specific verbal report procedures that are most effective, especially
in terms of instructions and probe administration.
Table 2.1 summarizes major task parameters that Ericsson and Simon and other
investigators have concluded are instrumental in determining the efficacy of
verbal reports; Table 2.2 similarly summarizes key procedural variables.

Table 2.1 Task Variables Cited as Enhancing the Validity of Ericsson–Simon
Verbal Protocols
1. The task involves verbal (as opposed to nonverbal, or spatial) information
(Ericsson and Simon, 1993; Wilson, 1994).
2. The task involves processes that enter into consciousness, as opposed to those
characterized by automatic, nonconscious processing of stimuli (Ericsson and
Simon, 1993; Wilson, 1994).
3. The task is novel, interesting, and engaging, as opposed to boring, familiar, and
redundant and therefore giving rise to automatic processing (Smith and Miller,
1978).
4. The task involves higher-level verbal processes that take more than a few seconds
to perform, but not more than about 10 seconds (Ericsson and Simon, 1993;
Payne, 1994).
5. The task to be performed emphasizes problem solving (Fiske and Ruscher, 1989;
Wilson, 1994).
6. The task involves rules and criteria that people use in decision making (Berl
et al., 1976; Slovic and Lichtenstein, 1971).

Table 2.2 Procedural Variables Cited as Enhancing the Validity of Ericsson–Simon
Verbal Reporting Techniques
1. Task timing. As little time as possible passes between a cognitive process and its
reporting, so that information is available in STM (Crutcher, 1994; Kuusela and
Paul, 2000; Smith and Miller, 1978).
2. Instructions. Subjects should be asked to give descriptions of their thinking as
opposed to interpretations or reasons (Austin and Delaney, 1998; Crutcher, 1994;
Ericsson and Simon, 1993; Nisbett and Wilson, 1977; Wilson, 1994).
3. Training. Subjects should be introduced to think-aloud procedures but not
overtrained, to minimize reactivity effects (Ericsson and Simon, 1993).
4. Establishing social context. The procedure should not be conducted as a socially
oriented exercise that explicitly involves the investigator as an actor (Austin and
Delaney, 1998; Ericsson and Simon, 1993).

Again, such results have tended to temper more sweeping generalizations
about the efficacy of self-report. For example, several researchers, including
Ericsson and Simon (1993), have suggested that materials that are inherently
verbal in nature are more amenable to verbal reporting than are nonverbal tasks;
this suggestion modifies Ericsson and Simon’s (1980) view that level 1
(verbal) and level 2 (nonverbal) verbalizations are equally nonreactive. In aggregate,
these conclusions reinforce the general notion that verbal reports are useful
for tasks that (1) are at least somewhat interesting to the subject, (2) involve verbal
processing of information that enters conscious awareness, and (3) emphasize
problem solving and decision making. Further, the verbal reporting environment
should be arranged so that (1) reports are given during the course of
processing or very soon afterward, (2) social interaction between experimenter
and subject is minimized in order to focus the subject’s attention mainly on
the task presented, and (3) the instructions limit the degree to which subjects are
induced to speculate on antecedent stimuli that may have influenced or directed
their thinking processes.
2.2.3 Application of Verbal Report Methods to Survey Question Pretesting
The survey researcher is interested not primarily in the general usefulness
of verbal report methods within psychology experiments, but rather in their
applicability to the cognitive testing of survey questions. As such, cognitive
interviewers should adopt verbal report methods only to the degree that any
necessary preconditions are satisfied within this task domain. We must therefore
consider whether cognitive testing satisfies the conditions presented in
Tables 2.1 and 2.2, or, if not, whether it can be shown to provide veridical
self-reports even when these conditions are violated. Somewhat surprisingly,
these issues have rarely been addressed. To the extent that Ericsson and Simon
advocated the general use of verbal report methods, cognitive interviewers have
largely assumed that use of these methods to evaluate survey questionnaires represents a natural


An Arab Type of Convict. A combination of Ideality and
Homicidal Mania.
Out of the depth and width of his experience the Commandant agreed with
me, and then I was photographed. There was no artistic posing or anything of
that sort. I was planted on a chair with my back straightened up and my head
in a vice such as other photographers were once wont to torture their victims
with. The camera was brought within three feet of me. I was taken full face,
staring straight into the lens, and then I was taken en profile. When, many
weeks afterwards, I showed the result to my wife, she was sorry I ever went;
but for all that it’s a good likeness.

By the time the negatives were developed, and I had satisfied the
Commandant that certain black spots which the pitiless lens had detected
under my skin were the result of a disease I had contracted years before in
South America, and not premonitory symptoms of the plague, it was
breakfast-time, and I went down to the canteen, where I found convicts
buying wine and cigarettes, and generally conducting themselves like
gentlemen at large.
I did not see the Commandant again that day, save for a few minutes after
lunch, when he told me that he had an appointment at the Direction in
Noumea, and placed me in charge of his lieutenant, the Chief Surveillant. The
Chef was a very jolly fellow, as, indeed, I found most of these officials to be,
and during our drives about the island, we chatted with the utmost freedom.
As a matter of fact, it was he who gave me the description of the execution
which I reproduced in the last chapter.
He, too, was entirely of the same opinion as myself as to the pitiless iniquity
of the dark cell; but he took some pains to point out that it was not the fault
either of the French Government or of the Administration, but simply of
certain politicians in France who wanted a “cry,” and got up a crusade among
the sentimentalists against “the brutality of flogging bound and helpless
prisoners far away from all civilised criticism in New Caledonia.” Some of these
men, too, as I have said, were déportés, or exiled communards who had been
forgiven, and had brought back batches of stories with them as blood-curdling
as they were mendacious.
“Bien, monsieur,” he said. “You have seen the Cachot Noir. Now we will go
to the Disciplinary Camp first, because it is on the road, and then—well, you
shall see what the cachot does, and when you see that I think you will say the
lash is kinder.”
The Disciplinary Camps in New Caledonia have no counterpart in the
English penal system. “Incorrigibles,” who won’t work, who are insubordinate,
or have a bad influence on their comrades of the Bagne, are sent into them
partly for punishment and partly for seclusion.

The Courtyard of a Disciplinary Camp, Ile Nou. Inspection at 5 a.m. after breakfast, and before
hard labour. To the right is a Kanaka “Policeman.” The average physique of the Criminals may
be seen by comparison with myself, standing in front of the Kanaka.
They have poorer food and harder work, no “gratifications” in the way of
wine or tobacco, or other little luxuries. They sleep on plank-beds with their
feet in anklets, and, if they don’t behave themselves, they are promptly
clapped into a cell for so many days’ solitary confinement on bread and water.
For graver offences they are, of course, sent back to the central prison as
hopeless cases, after which their own case is usually hopeless for life.
I found several of the men in this camp working in chains. This was another
subject about which the sentimentalists made a good deal of fuss in France,
but when I saw what the alleged chains really were, I laughed, and said to my
friend the Chef:
“So that is what you call chains in New Caledonia, is it? May I have a look
at one?”
He beckoned to one of the men to come up, and this is what I found: There
was an iron band riveted round his right ankle, and to this was attached a
chain which, as nearly as I could calculate with my hands, weighed about six
pounds. It was absolutely no inconvenience to its wearer when he was
either sitting or lying down. When he was walking or working he tucked the
end in under his belt, and, as far as I could see, it didn’t make any difference
to his walk, save a little dragging of the foot. In fact, when I asked him
whether it was any trouble to him, he said:
“No, not after a few days. One gets accustomed to it.”
“Very likely!” I said. “If you got the chains in an English prison, you would
have them on both legs and arms, and you wouldn’t be able to take more
than a half-stride.”
“Ah, they are brutal, those English!” said the scoundrel, with a shrug of his
shoulders, as he tucked the end of his chain round his belt and sauntered
away.
The chain is usually a punishment for gross insubordination or attempted
escape. This man, the Chef told me, had tried three times with the chain on,
and once had used the loose end to hammer a warder with, for which he got
twelve months’ Cachot Noir and the chain for life—and a little more, since he
would be buried in it.[2]
Then, after I had made the round of the cells, I was taken to a very curious
punishment-chamber which is in great vogue in New Caledonia. In one sense
it reminded me of our treadwheel, though it is not by any means so severe. I
have seen a strong man reduced almost to fainting by fifteen minutes on a
treadwheel. Nothing like this could happen in the Salle des Pas Perdus, as I
christened the place when its use had been explained to me.
Here, after a brief and scanty meal at 4.30 a.m., the convicts are lined up in
a big room, or, rather, shed, about sixty feet long by forty feet broad. There is
absolutely no furniture in the place, with the exception of a dozen flat-topped
pyramids of stone placed in straight lines about ten feet from each side.
If there are twenty-four convicts condemned to this particular kind of
weariness, twenty-four are taken in, in single file. Then the word “March!” is
given, and they begin. Hour after hour the dreary round-and-round is
continued in absolute silence. Every half-hour they are allowed to sit on the
pyramids for a couple of minutes, and then on again. At eleven the bell rings
for soupe, which, in the Camp Disciplinaire, resolves itself into hot water and
fat with a piece of bread. In the other camps the bell doesn’t go again till one,
but these have only their half-hour, and then the promenade begins again,
and continues till sunset.
I was assured that those who could stand a week of this with the chain did
feel its weight, and I don’t wonder at it, for a more miserable, weary, limping,
draggle-footed crowd of scoundrels I never saw in all my life than I watched
that day perambulating round the Hall of Lost Footsteps.

From here we drove across to the western side of the island, and presently
came to a magnificent sloping avenue of palm-trees.
“The avenue of the hospital,” said the Chef. “Now you will see the best and
the worst of Ile Nou.”
And so it was. We drove down the avenue to a white, heavy stone arch,
which reminded me somewhat quaintly of the entrance gates of some of the
old Spanish haciendas I had seen up-country in Peru. Inside was a vast, shady
garden, brilliant with flowers whose heavy scent was pleasantly tempered by
the sweet, cool breeze from the Pacific; for the eastern wall of the whole
enclosure was washed by the emerald waters of the Lagoon.
The Avenue of Palms, leading to the Hospital, Ile Nou.
In the midst of this garden stood the hospital, built in quadrangular form,
but with one side of each “quad” open to the garden. The houses were raised
on stone platforms something like the stoep of a Dutch house, and over these
the roofs came down in broad verandahs. Grey-clad figures were sitting or
lying about on the flags underneath, a few reading or doing some trifling
work, and others were wandering about the garden or sleeping in some shady
nook. It was, in short, very different from the central prison and the
disciplinary camp.

I was introduced to the Medical Director, and he showed me round, omitting
one wing, in which he told me there were a couple of cases of plague. I
happened to know that there were really about a dozen, so I readily agreed
that that part should be left out.
As a prison hospital, it differed very little from others that I had seen in
England. There was the same neatness and exquisite cleanliness everywhere,
though the wards were somewhat darker, and therefore cooler, which, with
the midday sun at 106° in the shade, was not a bad thing. All the nurses
were, of course, Sisters of Mercy.[3] In fact, practically all the nursing in New
Caledonia is done by Sisters, and not a few of these heroic women had
become brides of the Black Death before I left.
Here, as in all other prison hospitals I have visited, diet, stimulants, and
medicine are absolutely at the discretion of the Director. No matter what the
cost, the spark of life must be kept alive as long as possible in the breast of
the murderer, the forger, and the thief, or the criminal whose light of reason
has already been quenched in the darkness of the Black Cell.
In fact, so careful are the authorities of their patients’ general health that
they give them nothing in the way of meat but the best beef and mutton that
can be imported from Australia; Caledonian-fed meat is not considered
nourishing enough. In normal times the death-rate of Ile Nou, which is wholly
given over to convict camps, is two or three per cent. lower than that of the
town of Noumea.

Part of the Hospital Buildings, Ile Nou. The roofed-terrace in front is where the patients take
their siesta in the middle of the day. One of these is attached to each court of the Hospital.
Some of the mattresses may be seen to the right.
Then from this little flowery paradise of rest and quietness we went across
the road to another enclosure in which there were two long, white buildings, a
prison and a row of offices, at right angles to each other. This was the “bad”
side. On the other there had been invalids and invalid lunatics; here there
were only lunatics, and mostly dangerous at that—men who, after being
criminals, had become madmen; not like the dwellers in Broadmoor, who are
only criminal because they are mad.
I once paid a visit to the worst part of the men’s side at Broadmoor, but I
don’t think it was quite as bad as the long corridor which led through that
gruesome home of madness. On either hand were heavy black-painted, iron
doors, and inside these a hinged grating through which the prisoner could be
fed.
The cells were about nine feet by six feet. They had neither furniture nor
bedclothes in them. The furniture would have been smashed up either in
sheer wanton destruction or for use as missiles to hurl through the grating,
and the bedclothes would have been torn up into strips for hanging or
strangling purposes.

It has been my good or bad luck to see poor humanity in a good many
shapes and guises, but I never saw such a series of pitiful parodies of
manhood as I saw when those cell doors were opened.
Some were crouched down in the corners of their cells, muttering to
themselves and picking the sacking in which they were clothed to pieces,
thread by thread. It was no use giving them regular prison clothing, for they
would pick themselves naked in a couple of days. Others were walking up and
down the narrow limits of their cells, staring with horribly vacant eyes at the
roof or the floor, and not taking the slightest notice of us.
One man was lying down scraping with bleeding fingers at the black
asphalted floor under the impression that he was burrowing his way to
freedom; others were sitting or lying on the floor motionless as death; and
others sprang at the bars like wild beasts the moment the door was opened.
But the most horrible sight I saw during that very bad quarter of an hour
was a gaunt-faced, square-built man of middle-height who got up out of a
corner as his cell door opened, and stood in the middle facing us.
He never moved a muscle, or winked an eyelid. His eyes looked at us with
the steady, burning stare of hate and ferocity. His lips were drawn back from
his teeth like the lips of an ape in a rage, and his hands were half clenched
like claws. The man was simply the incarnation of madness, savagery, and
despair. He had gone mad in the Black Cell, and the form that his madness
had taken was the belief that nothing would nourish him but human flesh. Of
course he had to be fed by force.
When we got outside a big warder pulled up his jumper and showed me the
marks of two rows of human teeth in his side. If another man hadn’t stunned
the poor wretch with the butt of his revolver he would have bitten the piece
clean out—after which I was glad when the Doctor suggested that I should go
to his quarters and have a drink with him.
V
A CONVICT ARCADIA
I visited two or three other industrial camps and the farm-settlements
before I left Ile Nou, but as I had yet to go through the agricultural portions
of the colony it would be no use taking up space in describing them here.

There are practically no roads to speak of in New Caledonia outside a short
strip of the south-western coast. In September, 1863, Napoleon the Little
signed the decree which converted the virgin paradise of New Caledonia into a
hell of vice and misery—a description which is perhaps somewhat strong, but
which history has amply justified. In the following year the transport Iphigénie
took a cargo of two hundred and forty-eight galley-slaves from Toulon and
landed them where the town of Noumea now stands. This consignment was
added to by rapidly following transports, and for thirty years at least the
administration of New Caledonia has had at its disposal an average of from
seven to ten thousand able-bodied criminals for purposes of general
improvement, and more especially for the preparation of the colony for that
free colonisation which has been the dream of so many ministers and
governors.
Now the area of New Caledonia is, roughly speaking, between six thousand
and seven thousand square miles, and after an occupation of nearly forty
years it has barely fifty miles of roads over which a two-wheeled vehicle can
be driven, and these are only on the south-western side of the island.
The only one of any consequence is that running from Noumea to
Bouloupari, a distance of about thirty miles. At Bourail, which is the great
agricultural settlement, there are about twelve miles of road and a long ago
abandoned railway bed. Between La Foa and Moindou there is another road
about as long; but both are isolated by miles of mountain and bush from each
other and are therefore of very little general use.
One has only to contrast them with the magnificent coach roads made in a
much shorter space of time through the far more difficult Blue Mountain
district in New South Wales to see the tremendous difference between the
British and the French ideas of colonisation, to say nothing of the railways—
two thousand seven hundred miles—and thirty-three thousand miles of
telegraph lines.
The result of this scarcity of roads and absolute absence of railways is that
when you want to go from anywhere to anywhere else in New Caledonia you
have to take the Service des Côtes, which for dirt, discomfort, slowness, and
total disregard of the convenience of passengers I can only compare to the
Amalgamated Crawlers presently known as the South-Eastern and Chatham
Railways. Like them, it is, of course, a monopoly, wherefore if you don’t like to
go by the boats you can either swim or walk.

The Island of “Le Sphinx,” one of the tying-up places on the south-west coast of New
Caledonia.
The whole of New Caledonia is surrounded by a double line of exceedingly
dangerous reefs, cut here and there by “passes,” one of which Captain Cook
failed to find, and so lost us one of the richest islands in the world. The
navigable water both inside and outside the reefs is plentifully dotted with tiny
coral islands and sunken reefs a yard or so below the surface and always
growing, hence navigation is only possible between sunrise and sunset. There
is only one lighthouse in all Caledonia.
Thus, when I began to make my arrangements for going to Bourail, I found
that I should have to be on the wharf at the unholy hour of 4.30 a.m. I
packed my scanty belongings overnight. At 4.15 the cab was at the door. The
cochers of Noumea either work in relays or never go to sleep. I was just
getting awake, and the gorged mosquitoes were still sleeping. I dressed and
drank my coffee to the accompaniment of considerable language which
greatly amused the copper-skinned damsel who brought the coffee up. She
also never seemed to sleep.
Somehow I got down to the wharf, and presented myself at the douannerie
with my “Certificat de Santé,” which I had got from the hospital the previous
evening. The doctor in charge gave me a look over, and countersigned it.
Then I went with my luggage into an outer chamber. My bag and camera-cases
were squirted with phenic acid from a machine which looked like a cross
between a garden hose and a bicycle foot-pump. Then I had to unbutton my
jacket, and go through the same process. The rest of the passengers did the
same, and then we started in a strongly smelling line for the steamer.
As we went on board we gave up our bills of health, after which we were
not permitted to land again under penalty of forfeiting the passage and being
disinfected again. Our luggage now bore yellow labels bearing the legend,
“colis désinfecté,” signed by the medical inspector. These were passed on to
the ship by Kanakas, who freely went and came, and passed things to and
from the ship without hindrance. As Kanakas are generally supposed to be
much better carriers of the plague than white people, our own examination
and squirting seemed a trifle superfluous.
The steamer was the St. Antoine, which may be described as the Campania
of the Service des Côtes. Until I made passages on one of her sister-ships—to
be hereafter anathematised—I didn’t know how bad a French colonial
passenger-boat could be. Afterwards I looked back to her with profound
regret and a certain amount of respect; wherefore I will not say all that I
thought of her during the eleven hours that she took to struggle over the
sixty-odd miles from Noumea to Bourail.
There is no landing-place at the port of Bourail, save for boats, so, after the
usual medical inspection was over and I had made myself known to the
doctor, I went ashore in his boat. The Commandant was waiting on the shore
with his carriage. I presented my credentials, and then came the usual
consommations, which, being literally interpreted, is French for mixed drinks,
after which we drove off to the town of Bourail, eight kilometres away. As we
were driving down the tree-arched road I noticed half a dozen horsewomen
seated astride à la Mexicaine, with gaily coloured skirts flowing behind.
“Ah,” I said, “do your ladies here ride South American fashion?”
“My dear sir,” he replied, “those are not ladies. They are daughters of
convicts, born here in Bourail, and reared under the care of our paternal
government! But that is all stopped now; later on you will see why.”
“Yes,” I said, “I have heard that you have given up trying to make good
colonists out of convict stock.”
“Yes,” he replied; “and none too soon, as you will see.”
From which remark I saw that I had to do with a sensible man, so I
straightway began to win his good graces by telling him stories of distant
lands, for he was more of a Fleming than a Frenchman, and was therefore
able to rise to the conception that there are other countries in the world
besides France.
I found Bourail a pretty little township, consisting of one street and a
square, in the midst of which stood the church, and by dinner-time I found
myself installed in a little hotel which was far cleaner and more comfortable
than anything I had seen in Noumea, except the club. When I said good-night
to the Commandant, he replied:
“Good-night, and sleep well. You needn’t trouble to lock your door. We are
all criminals here, but there is no crime.”
Which I subsequently found to be perfectly true.
Everything in New Caledonia begins between five and half-past, unless you
happen to be starting by a steamer, and then it’s earlier. My visit to Bourail
happened to coincide with a governmental inspection, and early coffee was
ordered for five o’clock. That meant that one had to get tubbed, shaved, and
dressed, and find one’s boots a little before five. Bar the Black Death, I
disliked New Caledonia mostly on account of its early hours. No civilised
persons, with the exception of milkmen and criminals under sentence of
death, ought to be obliged to get up before nine.
Still, there was only one bath in the place, and I wanted to be first at it, so
I left my blind up, and the sun awoke me.
I got out of bed and went on to the balcony, and well was I rewarded even
for getting up at such an unrighteous hour. The night before it had been
cloudy and misty, but now I discovered with my first glance from the
verandah that I had wandered into something very like a paradise.
I saw that Bourail stood on the slope of a range of hills, and looked out
over a fertile valley which was dominated by a much higher range to the
north-east. The sun wasn’t quite up, and neither were the officers of the
Commission, so I went for my bath. There were no mosquitoes in Bourail just
then, and I had enjoyed for once the luxury of an undisturbed sleep. The
water, coming from the hills, was delightfully cool, and I came back feeling, as
they say between New York and San Francisco, real good.
The Commission, for some reason or other, did not get up before breakfast-
time (11.30), and so we got a good start of them. The Commandant had the
carriage round by six o’clock, and, after the usual consommations, we got
away. It was a lovely morning, the only one of the sort I saw in Bourail, for
the next day the clouds gathered and the heavens opened, and down came
the floods and made everything but wading and swimming impossible; but
this was a day of sheer delight and great interest.
We drove over the scene of a great experiment which, I fear, is destined to
fail badly. The province of Bourail is the most fertile in all Caledonia,
wherefore in the year 1869 it was chosen by the paternal French Government
as the Arcadia of the Redeemed Criminal. The Arcadia is undoubtedly there, but
the existence of the redeemed criminal struck me as a little doubtful.
As soon as we got under way I reverted to the young ladies we had seen
on horseback the evening before.
A Native Temple, New Caledonia.

“You shall see the houses of their parents,” said the Commandant; “and
afterwards you will see the school where the younger ones are being
educated. For example,” he went on, pointing down the street we were just
crossing, “all those shops and little stores are kept by people who have been
convicts, and most of them are doing a thriving trade. Yonder,” he said,
waving his hand to the right, “is the convicts’ general store, the Syndicat de
Bourail. It was founded by a convict, the staff are convicts, and the customers
must be convicts. It is what you would call in English a Convict Co-operative
Store. It is managed by scoundrels of all kinds, assassins, thieves, forgers,
and others. I have to examine the books every three months, and there is
never a centime wrong. That is more than most of the great establishments in
Sydney could say, is it not?”
I made a non-committal reply, and said:
“Set a thief to catch a thief, or watch him.”
“Exactly! There is no other business concern in Caledonia which is managed
with such absolute honesty as this is. I should be sorry for the man who tried
to cheat the management.”
I knew enough of Caledonian society by this time to see that it would not
be good manners to press the question any further. Afterwards I had an
interview with the manager of the syndicate, an estimable and excellently
conducted forger, who had gained his rémission and was doing exceedingly
well for himself and his wife, who, I believe, had blinded somebody with
vitriol, and was suspected of dropping her child into the Seine.
He presented me with a prospectus of the company, which showed that it
had started with a government loan of a few hundred francs, and now had a
reserve fund of nearly forty thousand francs. He was a patient, quiet-spoken,
hard-working man who never let a centime go wrong, and increased his
personal profits by selling liquors at the back door.
Our route lay across the broad valley which is watered by the River Nera.
On either side the ground rose gently into little hillocks better described by
the French word collines, and on each of these, usually surrounded by a grove
of young palms and a dozen acres or so of vineyards, orchards, manioc,
plantain, or maize, stood a low, broad-verandahed house, the residence of the
redeemed criminal.
I could well have imagined myself driving through a thriving little colony of
freemen in some pleasant tropical island upon which the curse of crime had
never descended, and I said so to the Commandant.

“Yes,” he said, “it looks so, doesn’t it? Now, you see that house up there to
the left, with the pretty garden in front. The man who owns that concession
was a hopeless scoundrel in France. He finished up by murdering his wife
after he had lived for years on the wages of her shame. Of course, the jury
found extenuating circumstances. He was transported for life, behaved
himself excellently, and in about seven years became a concessionnaire.
“He married a woman who had poisoned her husband. They have lived
quite happily together, and bring up their children most respectably.”
I was too busy thinking to reply, and he went on, pointing to the right:
“Then, again, up there to the right—that pretty house on the hill
surrounded by palms. The man who owns that was once a cashier at the Bank
of France. He was a ‘faussaire de première classe,’ and he swindled the bank
out of three millions of francs before they found him out. He was sent here for
twenty years. After eight he was given a concession and his wife and family
voluntarily came out to him. You see, nothing was possible for the wife and
children of a convict forger in Paris. Here they live happily on their little
estate. No one can throw stones at them, and when they die the estate will
belong to their children.”
“That certainly seems an improvement on our own system,” I said,
remembering the piteous stories I had heard of the wives and families of
English convicts, ruined through no fault of their own, and with nothing to
hope for save the return of a felon husband and father into a world where it
was almost impossible for him to live honestly.
“Yes,” he said; “I think so. Now, as we turn the corner you will see the
house of one of our most successful colonists. There,” he said, as the
wagonette swung round into a delightful little valley, “that house on the
hillside, with the white fence round it, and the other buildings to the side. The
owner of that place was a thief, a forger, and an assassin in Paris. He stole
some bonds, and forged the coupons. He gave some of the money to his
mistress, and found her giving it to some one else, so he stabbed her, and
was sent here for life.
“He got his concession, and married a woman who had been sent out for
infanticide, as most of them are here. If not that, it is generally poison. Well,
now he is a respectable colonist and a prosperous farmer. He has about forty
acres of ground well cultivated, as you see. He has thirty head of cattle and a
dozen horses, mares, and foals, to say nothing of his cocks and hens and
pigs. He supplies nearly the whole of the district with milk, butter, and eggs,
and makes a profit of several thousand francs a year. I wish they were all like
that!” he concluded, with a little sigh which meant a good deal.
“I wish we could do something like that with our hard cases,” I replied,
“instead of turning them out into the streets to commit more crimes and
beget more criminals. We know that crime is a contagious as well as an
hereditary disease, and we not only allow it to spread, but we even encourage
it as if we liked it.”
“It is a pity,” he said sympathetically, “for you have plenty of islands where
you might have colonies like this. You do not need to punish them. Remove
them, as you would remove a cancer or a tumour, and see that they do not
come back. That is all. Society would be better, and so would they.”
I could not but agree with this, since every turn of the road brought us to
fresh proofs of the present success of the system, and then I asked again:
“But how do these people get their first start? One can’t begin farming like
this without capital.”
“Oh no,” he said, “the Government does that. For the first few years,
according to the industry and ability of the settler, these people cost us about
forty pounds a year each, about what you told me it costs you to keep a
criminal in prison. We give them materials for building their houses, tools, and
agricultural implements, six months’ provisions, and seed for their first
harvest. After that they are left to themselves.
“If they cannot make their farm pay within five years or so they lose
everything; the children are sent to the convent, and the husband and wife
must hire themselves out as servants either to other settlers or to free people.
If they do succeed the land becomes absolutely theirs in ten years. If they
have children they can leave it to them, or, if they prefer, they can sell it.
“Some, for instance, have got their rehabilitation, their pardon, and
restoration of civil rights. They have sold their farms and stock and gone back
to France to live comfortably. Their children are, of course, free, though the
parents may not leave the colony without rehabilitation. After breakfast I will
take you down the street of Bourail, and introduce you to some who have
done well in trade, and to-morrow or next day you can see what we do with
the children.”

VI
SOME HUMAN DOCUMENTS
Society in Bourail, although in one sense fairly homogeneous, is from
another point of view distinctly mixed. Here, for example, are a few personal
items which I picked up during our stroll down the main and only street of the
village.
First we turned into a little saddler’s shop, the owner of which once boasted
the privilege of making the harness for Victor Emmanuel’s horses.
Unfortunately his exuberant abilities were not content even with such
distinction as this, and so he deviated into coining, with the result of hard
labour for life. After a few years his good conduct gained him a remission of
his sentence, and in due course he became a concessionnaire. His wife, who
joined him after his release, is one of the aristocrats of this stratum of
Bourailian society.
Permit to visit a Prison or Penitentiary Camp en détail. This is the ordinary form; but the
Author is the only Englishman for whom the words in the left-hand corner were crossed out.

There is quite a little romance connected with this estimable family. When
Madame came out she brought her two daughters with her. Now the elder of
these had been engaged to a young man employed at the Ministry of
Colonies, and he entered the colonial service by accepting a clerkship at
Noumea. The result was naturally a meeting, and the fulfilment of the proverb
which says that an old coal is easily rekindled. The engagement broken off by
the conviction was renewed, and the wedding followed in due course. The
second daughter married a prosperous concessionnaire, and the ex-coiner,
well established, and making plenty of properly minted money, has the
satisfaction of seeing the second generation of his blood growing up in peace
and plenty about him. Imagine such a story as this being true of an English
coiner!
A little further on, on the left-hand side, is a little lending library, and
cabinet de lecture. This is kept by a very grave and dignified-looking man,
clean-shaven, and keen-featured, and with the manners of a French
Chesterfield. “That man’s a lawyer,” I said to the Commandant, as we left the
library. “What is he doing here?”
“You are right. At least, he was a lawyer once, doing well, and married to a
very nice woman; but he chose to make himself a widower, and that’s why
he’s here. The old story, you know.”
Next door was a barber’s shop kept by a most gentle-handed housebreaker.
He calls himself a “capillary artist,” shaves the officials and gendarmerie, cuts
the hair of the concessionnaires, and sells perfumes and soaps to their wives
and daughters. He also is doing well.
A few doors away from him a libéré has an establishment which in a way
represents the art and literature of Bourail. He began with ten years for
forgery and embezzlement. Now he takes photographs and edits, and, I
believe, also writes the Bourail Indépendent. As a newspaper for ex-convicts
and their keepers, the title struck me as somewhat humorous.
Nearly all branches of trade were represented in that little street. But these
may be taken as fairly representative samples of the life-history of those who
run them. First, crime at home; then transportation and punishment; and then
the effort to redeem, made in perfect good faith by the Government, and, so
far as these particular camps and settlements are concerned, with distinct
success in the present.
Unhappily, however, the Government is finding out already that free and
bond colonists will not mix. They will not even live side by side, wherefore
either the whole system of concessions must be given up, or the idea of
colonising one of the richest islands in the world with French peasants,
artisans, and tradesmen must be abandoned.
Later on in the afternoon we visited the Convent, which is now simply a
girls’-school under the charge of the Sisters of St. Joseph de Cluny. A few
years ago this convent was perhaps the most extraordinary matrimonial
agency that ever existed on the face of the earth. In those days it was
officially styled, “House of Correction for Females.” The sisters had charge of
between seventy and eighty female convicts, to some of whom I shall be able
to introduce you later on in the Isle of Pines, and from among these the
bachelor or widower convict, who had obtained his provisional release and a
concession, was entitled to choose a bride to be his helpmeet on his new start
in life. The method of courtship was not exactly what we are accustomed to
consider as the fruition of love’s young or even middle-aged dream.
The Kiosk in which the Convict Courtships were conducted at Bourail.
After Mass on a particular Sunday the prospective bridegroom was
introduced to a selection of marriageable ladies, young and otherwise. Of
beauty there was not much, nor did it count for much. What the convict-
cultivator wanted, as a rule, was someone who could help him to till his fields,
look after live-stock, and get in his harvests.

When he had made his first selection the lady was asked if she was
agreeable to make his further acquaintance. As a rule, she consented,
because marriage meant release from durance vile. After that came the
queerest courtship imaginable.
About fifty feet away from the postern door at the side of the Convent there
still stands a little octagonal kiosk of open trellis-work, which is completely
overlooked by the window of the Mother Superior’s room. Here each Sunday
afternoon the pair met to get acquainted with each other and discuss
prospects.
Meanwhile the Mother Superior sat at her window, too far away to be able
to hear the soft nothings which might or might not pass between the lovers,
but near enough to see that both behaved themselves. Along a path, which
cuts the only approach to the kiosk, a surveillant marched, revolver on hip
and eye on the kiosk ready to respond to any warning signal from the Mother
Superior.
As a rule three Sundays sufficed to bring matters to a happy
consummation. The high contracting parties declared themselves satisfied
with each other, and the wedding day was fixed, not by themselves, but by
arrangement between those who had charge of them.
Sometimes as many as a dozen couples would be turned off together at the
mairie, and then in the little church at the top of the market-place touching
homilies would be delivered by the good old curé on the obvious subject of
repentance and reform. A sort of general wedding feast was arranged at the
expense of the paternal Government, and then the wedded assassins, forgers,
coiners, poisoners, and child-murderers went to the homes in which their new
life was to begin.
This is perhaps the most daring experiment in criminology that has ever
been made. The Administration claimed success for it on the ground that
none of the children of such marriages have ever been convicted of an
offence against the law. Nevertheless, the Government have most wisely put a
stop to this revolting parody on the most sacred of human institutions, and
now wife-murderers may no longer marry poisoners or infanticides with full
liberty to reproduce their species and have them educated by the State, to
afterwards take their place as free citizens of the colony.
The next day we drove out to the College of the Marist Brothers. It is really
a sort of agricultural school, in which from seventy to eighty sons of convict
parents are taught the rudiments of learning and religion and the elements of
agriculture.

During a conversation with the Brother Superior I stumbled upon a very
curious and entirely French contradiction. I had noticed that families in New
Caledonia were, as a rule, much larger than in France, and I asked if these
were all the boys belonging to the concessionnaires of Bourail.
“Oh no!” he replied; “but, then, you see, we have no power to compel their
attendance here. We can only persuade the parents to let them come.”
“But,” I said, “I understood that primary education was compulsory here as
it is in France.”
“For the children of free people, yes,” he replied regretfully, and with a very
soft touch of sarcasm, “but for these, no. The Administration has too much
regard for the sanctity of parental authority.”
When the boys were lined up before us in the playground I saw about
seventy-six separate and distinct reasons for the abolition of convict
marriages. On every face and form were stamped the unmistakable brands of
criminality, imbecility, moral crookedness, and general degeneration, not all on
each one, but there were none without some.
Later on I started them racing and wrestling, scrambling and tree-climbing
for pennies. They behaved just like monkeys with a dash of tiger in them, and
I came away more convinced than ever that crime is a hereditary disease
which can finally be cured only by the perpetual celibacy of the criminal. Yet
in Bourail it is held for a good thing and an example of official wisdom that
the children of convicts and of freemen shall sit side by side in the schools
and play together in the playgrounds.

Berezowski, the Polish Anarchist who attempted to murder
Napoleon III. and the Tsar Alexander II. in the Champs
Elysées. All Criminals in New Caledonia are photographed in
every possible hirsute disguise; and finally cropped and
clean shaven.
By permission of C. Arthur Pearson, Ltd.
On our way home I was introduced to one of the most picturesque and
interesting characters that I met in the colony. We pulled up at the top of a
hill. On the right hand stood a rude cabin of mud and wattles thatched with
palm-leaves, and out of this came to greet us a strange, half-savage figure,
long-haired, long-bearded, hairy almost as a monkey on arms and legs and
breast, but still with mild and intelligent features, and rather soft brown eyes,
in which I soon found the shifting light of insanity.
Acting on a hint the Commandant had already given me, I got out and
shook hands with this ragged, shaggy creature, who looked much more like a
man who had been marooned for years on a far-away Pacific Island, than an
inhabitant of this trim, orderly Penal Settlement. I introduced myself as a
messenger from the Queen of England, who had come out for the purpose of
presenting her compliments and inquiring after his health.
This was the Pole Berezowski, who more than thirty years ago fired a
couple of shots into the carriage in which Napoleon III. and Alexander II.
were driving up the Champs Elysées. He is perfectly harmless and well-
behaved; quite contented, too, living on his little patch and in a world of
dreams, believing that every foreigner who comes to Bourail is a messenger
from some of the crowned heads of Europe, who has crossed the world to
inquire after his welfare. Through me he sent a most courteous message to
the Queen, which I did not have the honour of delivering.
That night the storm-clouds came over the mountains in good earnest, and
I was forced to abandon my intention of returning to Noumea by road, since
the said road would in a few hours be for the most part a collection of
torrents, practically impassable, to say nothing of the possibility of a cyclone.
There was nothing more to be seen or done, so I accepted the Commandant’s
offer to drive me back to the port.
On the way he told me an interesting fact and an anecdote, both of which
throw considerable light upon the convict’s opinion of the settlement of
Bourail.
The fact was this: there is in New Caledonia a class of convicts who
would be hard to find anywhere else. These are voluntary convicts, and they
are all women. A woman commits a crime in France and suffers imprisonment
for it. On her release she finds herself, as in England, a social outcast, with no
means of gaining a decent living. Instead of continuing a career of crime, as is
usually the case here, some of these women will lay their case before the
Correctional Tribunal, and petition to be transported to New Caledonia, where
they will find themselves in a society which has no right to point the finger of
scorn at them.
As a rule the petition is granted, plus a free passage, unless the woman has
friends who can pay. Generally the experiment turns out a success. The
woman gets into service or a business, or perhaps marries a libéré or
concessionnaire, and so wins her way back not only to respectability as it
goes in Caledonia, but sometimes to comfort and the possession of property
which she can leave to her children.
As a matter of fact, the proprietress of the little hotel at the port was one of
these women. She had come out with a few hundred francs that her friends
had subscribed. She now owns the hotel, which does an excellent business, a
freehold estate of thirty or forty acres, and she employs fifteen Kanakas, half
a dozen convicts, and a Chinaman—who is her husband, and works harder
than any of them.
The anecdote hinged somewhat closely on the fact, and was itself a fact.
There is a weekly market at Bourail, to which the convict farmers bring their
produce and such cows, horses, calves, etc., as they have to sell. Every two or
three years their industry is stimulated and rewarded by the holding of an
agricultural exhibition, and, as a rule, the Governor goes over to distribute the
prizes. One of these exhibitions had been held, I regret to say, a short time
before my arrival, and the Governor, who has the work of colonisation very
seriously at heart, made speeches both appropriate and affecting to the
various winners as they came to receive their prizes.
At length a hoary old scoundrel, who had developed into a most successful
stock-breeder, and had become quite a man of means, came up to receive his
prizes from his Excellency’s hands. M. Feuillet, as usual, made a very nice little
speech, congratulating him on the change in his fortunes, which, by the help
of a paternal government, had transformed him from a common thief and
vagabond to an honest and prosperous owner of property.
So well did his words go home that there were tears in the eyes of the
reformed reprobate when he had finished, but there were many lips in the
audience trying hard not to smile when he replied:
“Ah, oui, mon Gouverneur! if I had only known what good chances an
unfortunate man has here I would have been here ten years before.”
What his Excellency really thought on the subject is not recorded.
The hotel was crowded that night, for the steamer was to sail for Noumea,
as usual, at five o’clock in the morning; but as Madame was busy she was
kind enough to give up her own chamber to me; and so I slept comfortably to
the accompaniment of a perfect bombardment of water on the corrugated
iron roof. Others spread themselves on tables and floors as best they could,
and paid for accommodation all the same.

By four o’clock one of those magical tropic changes had occurred, and
when I turned out the moon was dropping over the hills to the westward, and
Aurora was hanging like a huge white diamond in a cloudless eastern sky. The
air was sweetly fresh and cool. There were no mosquitoes, and altogether it
was a good thing to be alive, for the time being at least.
Soon after, the little convict camp at the port woke up. We had our early
coffee, with a dash of something to keep the cold out, and I made an early
breakfast on tinned beef and bread—convict rations—and both very good for
a hungry man. Then came the news that the steamboat La France had tied up
at another port to the northward on account of the storm, and would not put
in an appearance until night, which made a day and another night to wait, as
the coast navigation is only possible in daylight.
I naturally said things about getting up at four o’clock for nothing more
than a day’s compulsory loafing, but I got through the day somehow with the
aid of some fishing and yarning with the surveillants and the convicts, one of
whom, a very intelligent Arab, told me, with quiet pride, the story of his
escape from New Caledonia twelve years before.
He had got to Australia in an open boat, with a pair of oars, the branch of a
tree for a mast and a shirt for a sail. He made his way to Europe, roamed the
Mediterranean as a sailor for nine years, and then, at Marseilles, he had made
friends with a man who turned out to be a mouchard. This animal, after
worming his secret out of him under pledge of eternal friendship, earned
promotion by giving him away, and so here he was for life.
He seemed perfectly content, but when I asked him what he would do with
that friend if he had him in the bush for a few minutes, I was answered by a
gleam of white teeth, a flash of black eyes, and a shake of the head, which,
taken together, were a good deal more eloquent than words.

One of the Lowest Types of Criminal Faces. An illustration of
the ease with which it is possible to disguise the chin, typical
of moral weakness, and the wild-beast mouth, which nearly
all Criminals have, by means of moustache and beard.
By permission of C. Arthur Pearson, Ltd.
La France turned up that afternoon, so did the Commission of Inspection
from Bourail with several other passengers. I was told that we should be
crowded, but until I got on board in the dawn of the next morning I never
knew how crowded a steamer could be.

I had travelled by many craft under sail and steam, from a South Sea island
canoe to an Atlantic greyhound, but never had the Fates shipped me on board
such a craft as La France. She was an English-built cargo boat, about a
hundred and thirty feet long, with engines which had developed sixty horse-
power over twenty years ago. She had three cabins on each side of the dog-
kennel that was called the saloon.
If she had been allowed to leave an English port at all she would have been
licensed to carry about eight passengers aft and twenty on deck. On this
passage she had twelve first-class, about fifteen second, and between fifty
and sixty on deck, including twenty convicts and relégués on the forecastle,
and a dozen hard cases in chains on the forehatch.
She also carried a menagerie of pigs, goats, sheep, poultry, geese, and
ducks, which wandered at their own will over the deck-cargo which was piled
up to the tops of her bulwarks. Her quarter-deck contained about twenty
square feet, mostly encumbered by luggage. The second-class passengers
had to dine here somehow. The first-class dined in the saloon in relays.
The food was just what a Frenchman would eat on a Caledonian coast-boat.
It was cooked under indescribable conditions which you couldn’t help seeing;
but for all that the miserable meals were studiously divided into courses just
as they might have been in the best restaurant in Paris.
Everything was dirty and everything smelt. In fact the whole ship stank so
from stem to stern that even the keenest nose could not have distinguished
between the smell of fried fish and toasted cheese. The pervading odours
were too strong. Moreover, nearly every passenger was sick in the most
reckless and inconsiderate fashion; so when it came to the midday meal I got
the maître d’hôtel, as they called the greasy youth who acted as chief
steward, to give me a bottle of wine, a little tin of tongue, and some fairly
clean biscuits, and with these I went for’rard on to the forecastle and dined
among the convicts.
The forecastle was high out of the water, and got all the breeze, and the
convicts were clean because they had to be. I shared my meal and bread and
wine with two or three of them. Then we had a smoke and a yarn, after which
I lay down among them and went to sleep, and so La France and her unhappy
company struggled and perspired through the long, hot day back into plague-
stricken Noumea. When I left La France I cursed her from stem to stern, and
truck to kelson. If language could have sunk a ship she would have gone
down there and then at her moorings; but my anathemas came back upon my
own head, for the untoward Fates afterwards doomed me to make three more
passages in her.
To get clean and eat a decent dinner at the Cercle was something of a
recompense even for an all-day passage in La France. But Noumea was not a very
cheerful place to come back to, for the shadow of the Black Death was
growing deeper and deeper over the town. The plague was worse than ever.
The microbe had eluded the sentries and got under or over the iron barriers,
and was striking down whites and blacks indiscriminately, wherefore I
concluded that Noumea was a very good place to get out of, and, as I
thought, made my arrangements for doing so as quickly as possible.
VII
THE PLACE OF EXILES
My next expedition was to include the forest camps to the south-west of the
island, and a visit to the Isle of Pines, an ocean paradise of which I had read
much in the days of my youth; wherefore I looked forward with some
anticipation to seeing it with the eyes of flesh. There would be no steamer for
three or four days, so the next day I took a trip over to the Peninsula of
Ducos, to the northward of the bay.
The glory of Ducos as a penal settlement is past. There are now only a few
“politicals,” and traitors, and convicts condemned à perpétuité; that is to say,
prisoners for life, with no hope of remission or release. A considerable
proportion of them are in hospital, dragging out the remainder of their
hopeless days, waiting until this or the other disease gives them final release.

The Peninsula of Ducos. In the background is Ile Nou with the Central Criminal Depôt.
On another part of the peninsula, in a semi-circular valley, hemmed in by
precipitous hills, there is a piteously forlorn colony, that of the libérés
collectifs; that is to say, convicts who have been released from prison, but are
compelled to live in one place under supervision. They are mostly men whose
health has broken down under the work of the bagne, or who have been
released on account of old age.
They live in wretched little cabins on the allotments, which it is their
business to keep in some sort of cultivation. They have the poor privileges of
growing beards and moustaches if they like, and of wearing blue dungaree
instead of grey, and of earning a few pence a week by selling their produce to
the Administration.
This is not much, but they are extremely proud of it, and hold themselves
much higher than the common forçat. They do not consider themselves
prisoners, but only “in the service of the Administration.” I have seldom, if
ever, seen a more forlorn and hopeless collection of human beings in all my
wanderings.
There was, however, a time when Ducos was one of the busiest and most
important of the New Caledonian Settlements, for it was here that the most
notorious and most dangerous of the communards were imprisoned after their
suppression in 1872. Here lived Louise Michel, the high-priestess of anarchy,
devoting herself to the care of the sick and the sorrowing with a self-sacrifice
which rivalled even that of the Sisters of Mercy, and here, too, Henri Rochefort
lived in a tiny stone house in the midst of what was once a garden, and the
delight of his days of exile.
Louise Michel’s house has disappeared in the course of improvements.
Rochefort’s house is a roofless ruin in the midst of a jungle which takes a
good deal of getting through. It was from here that he made his escape with
Pain and Humbert and two other communards in an English cutter, which may
or may not have been in the harbour for that particular purpose.
One night they did not turn up to muster, but it was explained that
Rochefort and Humbert had gone fishing, and the others were away on a tour
“with permission.” As they did not return during the night search-parties were
sent out for them. Meanwhile, they had kept a rendezvous at midnight with
the cutter’s boat and got aboard.
The next day was a dead calm; and, as the cutter lay helpless at her
anchor, the fugitives concealed themselves about her cargo as best they
could. The hue and cry was out all over the coast, but the searchers looked
everywhere but just the one place where they were. If the next day had been
calm they must have been caught, for the authorities had decided on a
thorough search of every vessel in the harbour. Happily for them a breeze
sprang up towards the next morning, and the cutter slipped quietly out. Once
beyond the outward reef the fugitives were in neutral water, and, being
political prisoners, they could not be brought back.
By daylight the truth was discovered, but pursuit was impossible. The cutter
had got too long a start for any sailing vessel to overtake her in the light
wind, and the only steamer which the administration then possessed had
gone away to Bourail to fetch back the Governor’s wife. If it had been in the
harbour that morning, at least one picturesque career might have been very
different. MacMahon was President at the time, and of all men on earth he
had the most deadly fear of Rochefort, so he took a blind revenge for his
escape by ordering the Governor to expel every one who was even suspected
of assisting in the escape.
The story was told to me by one who suffered through this edict quite
innocently, and to his utter ruin. He was then one of the most prosperous
men in Noumea. He owned an hotel and several stores, and had mail and
road-making contracts with the government. Unhappily, one of his stores was
on the Peninsula of Ducos, and the man who managed it was reputed to be
very friendly with Rochefort.
This was enough. He was ordered to clear out to Australia in two months. It
was in vain that he offered himself for trial on the definite charge of assisting
a prisoner to escape. The Governor and every one else sympathised deeply
with him, but they dared not even be just, and out he had to go. He is now
canteen-keeper on the Isle of Pines, selling groceries and drink to the officials
and relégués at prices fixed by the government. He told me this story one
night at dinner at his own table.
The general amnesty of 1880 released Louise Michel and the rest of those
who had survived the terrible revolt of 1871 from Ducos and the Isle of Pines.
There are, however, two other celebrities left on Ducos. One of them is a
tall, erect, grizzled Arab, every inch a chieftain, even in his prison garb. This is
Abu-Mezrag-Mokrani, one of the leaders of the Kabyle insurrection of 1871, a
man who once had fifteen thousand desert horsemen at his beck and call.
Now he rules a little encampment in one of the valleys of the peninsula,
containing forty or fifty of his old companions-in-arms, deported with him
after the insurrection was put down.
When the Kanaka rebellion broke out in New Caledonia in 1878, Abu-
Mezrag volunteered to lead his men against the rebels in the service of
France. The offer was accepted and the old warriors of the desert acquitted
themselves excellently among the tree-clad mountains of “La Nouvelle.” When
the rebellion was over a petition for their pardon was sent to the home
government, but the remnant of them are still cultivating their little patches of
ground on Ducos.
The other surviving celebrity is known in Caledonia as the Caledonian
Dreyfus, and this is his story:
In 1888 Louis Chatelain was a sous-officier of the line stationed in Paris. He
was dapper, good-looking, and a delightful talker. He engaged the affections
of a lady whose ideas as to expenditure were far too expansive to be gratified
out of the pay of a sous-officier. Poor Chatelain got into debt, mortgaged or
sold everything that he had, and still the lady was unsatisfied. Finally, after
certain recriminations, and when he had given her everything but his honour,
she suggested a means by which he could make a fortune with very little
trouble. She had, it appears, made the acquaintance of a gentleman who
knew some one connected with a foreign army, who would give twenty
thousand francs for one of the then new-pattern Lebel rifles.

He entered into correspondence with the foreign gentleman, addressing
him—c/o the —— Embassy, Paris. His letters were stopped, opened,
photographed, and sent on. So were the replies. Then the negotiations were
suddenly broken off, Chatelain was summoned before the military tribunal and
confronted with the pièces de conviction. He confessed openly, posing as a
martyr to la grande passion—and his sentence was deportation for life.
The remains of Henri Rochefort’s House.
The Bedroom of Louis Chatelain, “The Caledonian
Dreyfus” in Ducos. The photographs on the wall and the
one on the table are those of the woman who ruined
him.
