Unit 3: Psychological testing
3.1: Nature and use of Psychological Test
Suman Sharma
Trichandra Multiple Campus
Nature and use of Psychological test
• Psychological testing is the administration of psychological tests,
which are designed to be “an objective and standardized
measure of a sample of behavior”
• Psychological testing is a systematic procedure for obtaining
samples of behavior, relevant to cognitive, affective, or
interpersonal functioning, and for scoring and evaluating those
samples according to standards.
• Psychometrics is the field of study concerned with the theory
and technique of psychological measurement, which includes
the measurement of knowledge, abilities, attitudes, personality
traits,
and educational measurement.
Psychological test
• Used to assess a variety of mental abilities and
attributes, including achievement and ability,
personality, and neurological functioning.
• Personality tests are administered for a wide
variety of reasons, from diagnosing
psychopathology (e.g., personality disorder,
depressive disorder) to screening job candidates.
• They may be used in an educational setting to
determine personality strengths and weaknesses.
Psychological Tests As Tools
• When used appropriately and skillfully, psychological tests can be
extremely helpful, even irreplaceable.
• Psychological tests are always a means to an
end and never an end in themselves.
• A tool can be an instrument of good or harm,
depending on how it is used.
Use of Psychological Test
Uses of psychological tests fall into three categories:
• Decision making
• Psychological research
• Self-understanding and personal development.
Use of Psychological Test
Decision Making:
• Psychological tests are used to make decisions
about people, either as individuals or in groups.
• They help in the diagnosis of different problems.
Psychological Research:
• Tests support research on psychological phenomena and
individual differences in developmental, educational,
social, and vocational psychology, among other fields.
Use of Psychological Test
Self-understanding and Personal Development:
• Tests are used in the therapeutic process of promoting
self-understanding and psychological adjustment.
• Testing can give clients insight into themselves
and support positive personal growth.
Major Applications of Psychological Tests
1. Identification of persons with intellectual disability.
2. Assessment in education.
3. Selection and classification of people in
organizations.
4. Individual counseling.
5. Basic research.
6. Diagnosis and prediction.
Characteristics of Psychological Tests.
• Proper psychological testing is conducted after
rigorous research and development. A proper
psychological test has the following characteristics.
1. Standardization.
2. Objectivity.
3. Test Norms.
4. Reliability.
5. Validity.
Characteristics of Psychological Tests
• Standardization - All procedures and steps must be conducted with
consistency and under the same conditions to achieve the same
testing performance from those being tested.
• Objectivity - Scoring is free of subjective judgment or bias, so that
different scorers obtain the same results from the same test responses.
• Test Norms - The average test scores within a large group of people,
which allow the performance of one individual to be compared with the
results of others by establishing a point of comparison or frame of
reference.
• Reliability - The same results are obtained across repeated testing.
• Validity - The test being administered must measure what it is
intended to measure.
3.2: Technical and Methodological Principles
• Norms and Meaning of Test Scores
• Reliability
• Validity
• Item Analysis
Norms
• Norms are the scores obtained by the initial group of people who
took the test.
• Norms provide a basis for comparing the individual with a
group.
• A norm represents the test performance of the standardization
sample.
• Any raw score can then be interpreted relative to the
performance of the reference (or normative) group - five-year-olds,
sixth graders, institutional inmates, job applicants, etc.
• Norms describe the test performance or typical behavior of one
or more reference groups.
Norms
• By determining an average score and a typical range of scores,
the norm of a test is established.
• This makes it possible to know whether a particular score
is unusually high, unusually low, or in the average range.
• Norms provide relative rather than absolute information.
• A norm reflects how a particular group (the norm group)
performed on a particular test at a particular point in
time.
• In a normal distribution, 50% of the scores fall below the
mean and 50% lie above it.
Norms
• Similarly, norm scores do not provide any
information about mastery of a skill,
although it is often assumed that if a score is
no more than 1.0 standard deviation below the
mean, then performance is within normal
limits and therefore acceptable.
• E.g., we compare students to what is normal for their
age, grade, or class, as on the SAT, GRE, and IQ tests.
[Figure: bell-shaped I.Q. distribution curve]
Norms
• Norms are usually presented in the form of tables with
descriptive statistics - such as means, standard
deviations, and frequency distributions - that
summarize the performance of the group or
groups in question.
• When norms are collected from the test
performance of a group of people, these
reference groups are labeled normative or
standardization samples.
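Because norms are usually summarized by a mean and standard deviation, a raw score can be located relative to the norm group as a standard (z) score. A minimal sketch in Python; the norm-group values below are hypothetical illustration numbers, not published norms:

```python
def z_score(raw, norm_mean, norm_sd):
    """Locate a raw score relative to the norm group, in SD units."""
    return (raw - norm_mean) / norm_sd

# Hypothetical IQ-style norms: mean 100, SD 15
z = z_score(112, norm_mean=100.0, norm_sd=15.0)
print(f"z = {z:.2f}")  # z = 0.80 -> 0.8 SD above the norm-group mean
```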
Reliability
• The extent to which the outcome of a test
remains unaffected by irrelevant variations
in the procedures of testing.
• The extent to which test scores obtained by
a person are the same if the person is re-
examined with the same test on different
occasions.
Reliability
• The reliability of a test refers to its degree of
stability, consistency, predictability, and
accuracy.
• In simple terms, reliability is the degree
to which a test produces stable and consistent
results. A specific measure is considered
reliable if applying it to the same object of
measurement a number of times produces the
same results.
Reliability…
• The concept underlying reliability is the possible range of
error, or error of measurement, of a single score. This is an
estimated range of the random fluctuation that can be
expected in an individual score.
• Some degree of error or noise is always present in the system,
arising from factors such as: i) misreading of the items, ii) poor
administration procedures, iii) the changing mood of the clients,
etc.
• The goal of the test constructor is to reduce, as much as
possible, the degree of measurement error or random
fluctuation.
Reliability..
• Two main issues relate to the degree of error (reliability) in a test.
1. Inevitable natural variation in human performance (usually the variability is
less for measurements of ability than for those of personality). Whereas ability
variables (intelligence, mechanical aptitude, etc.) show gradual change,
many personality traits are highly dependent on factors such as
mood and fluctuate considerably.
2. Psychological testing methods are necessarily imprecise. In the hard sciences,
researchers can make direct measurements, such as the concentration of a
solution or the relative weight of a compound. In contrast, many constructs in
psychology must be measured indirectly. E.g., intelligence cannot be
perceived directly: it must be inferred by measuring behaviour that has
been defined as intelligent.
• Although some error in testing is inevitable, the goal is to keep testing error
within reasonably accepted limits (a reliability coefficient of 0.80 or more).
Primary Methods of Obtaining Reliability
1. Test-retest reliability (time to time)
2. Alternate forms reliability (form to form)
3. Split-half reliability (item to item)
4. Inter-scorer reliability (scorer to scorer)
Test-Retest Reliability
• Repeatability, or test-retest reliability, is the closeness
of the agreement between the results of successive
measurements of the same measure carried out
under the same conditions of measurement.
• To measure test-retest reliability, we must
first give the same test to the same individuals on two
occasions and correlate the scores.
• If the correlation is high, the results are less likely to be
caused by random fluctuations in the condition of the
examinee or the testing environment.
Test-retest reliability….
• In general, test-retest reliability is the
preferred method only if the variable being
measured is a relatively stable trait. If the
variable is highly changeable (e.g., anxiety,
mood), this method is usually not adequate.
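A minimal sketch of the computation: the reliability coefficient is simply the Pearson correlation between the two administrations (the same arithmetic serves alternate-forms reliability, with the two lists coming from the two forms). The scores below are invented for illustration:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical scores of five examinees tested on two occasions
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
print(f"test-retest r = {pearson_r(time1, time2):.2f}")
```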
Alternate forms reliability (form to form)
• The alternate forms method avoids many of the problems
encountered with test-retest reliability. The logic is that if the
trait is measured several times on the same individual
using parallel forms of the test, the different
measurements should produce similar results.
• The degree of similarity between the scores represents
the reliability coefficient of the test.
• Alternate form reliability is assessed when an individual
participating in a research or testing scenario is given
two different versions of the same test at different times.
Alternate form…
• The primary difficulty with alternate forms lies in
determining whether the two forms are actually
equivalent. For example, if one test is more difficult than
its alternate form, the difference in scores may
represent actual differences between the two tests rather
than differences resulting from the unreliability of the
measures.
• Another difficulty is the delay between one
administration and the next; such a delay allows
short-term fluctuations in mood, stress level, etc.
Split-half reliability
• The split-half method is the best technique for determining
reliability for a trait with a high degree of fluctuation,
because the test is given only once: the items are split in
half and the two halves are correlated.
• As there is only one administration, it is not possible for
the effects of time to intervene as they might with the test-
retest method.
• The split-half method assesses the internal consistency of
a test, such as psychometric tests and questionnaires.
It measures the extent to which all parts of the test
contribute equally to what is being measured.
Split-half reliability…
• As a general principle, the longer a test is, the
more reliable it is: with a larger number
of items, it is easier for the majority of the
items to compensate for minor variations in
responding to a few of the other items.
• Split-half reliability is assessed by comparing the
results of one half of a test with the results from
the other half.
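A minimal sketch of the split-half computation: item responses are divided into odd and even halves, the half-scores are correlated, and the result is stepped up with the Spearman-Brown formula to estimate full-length reliability (a standard correction, though the slides do not name it). The response matrix is invented; statistics.correlation requires Python 3.10+:

```python
import statistics

def split_half_reliability(item_scores):
    """Odd-even split-half reliability with Spearman-Brown correction.

    item_scores: one list of 0/1 item responses per examinee.
    """
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    r_half = statistics.correlation(odd, even)  # Python 3.10+
    return 2 * r_half / (1 + r_half)  # step up to full test length

# Hypothetical responses: 5 examinees x 6 items (1 = correct)
data = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(f"split-half reliability = {split_half_reliability(data):.2f}")
```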
Inter-scorer reliability
• In some tests, scoring is based partially on the judgment of the
examiner, and judgment may vary between one scorer and the
next.
• In testing, inter-scorer reliability is the degree of agreement among
raters or judges. It is a score of how much homogeneity, or consensus,
there is in the ratings given by various examiners.
• The basic strategy for determining inter-scorer reliability is to obtain a
series of responses from a single client and have these responses
scored by two different individuals. A variation is to have two different
examiners test the same client using the same test and then
determine how close their scores or ratings of the person are.
• Any test that requires even partial subjectivity in scoring should
provide information on inter-scorer reliability.
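A minimal sketch of two simple inter-scorer indices: the correlation between two scorers' ratings of the same responses, plus the proportion of exact agreements. The ratings are invented; more refined indices (e.g., Cohen's kappa) exist but are not covered in these slides:

```python
import statistics

# Hypothetical ratings of the same 8 responses by two scorers
scorer_a = [3, 4, 2, 5, 4, 3, 1, 4]
scorer_b = [3, 4, 3, 5, 4, 2, 1, 4]

r = statistics.correlation(scorer_a, scorer_b)  # Python 3.10+
agreement = sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)
print(f"inter-scorer r = {r:.2f}, exact agreement = {agreement:.0%}")
```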
Validity
• Validity refers to a test's ability to measure
what it is supposed to measure.
• It addresses what the test can be considered
accurate about.
• Establishing the validity of a test can be
extremely difficult, primarily because
psychological variables are usually abstract
concepts such as intelligence, anxiety, and
personality.
Validity…
• Although a test can be reliable without being
valid, the opposite is not true: a necessary
condition for validity is that the test has
achieved an adequate level of reliability.
Validity contd..
• A distinction can be made between internal and external validity. These types of
validity are relevant to evaluating the validity of a research study or procedure.
• Internal validity refers to whether the effects observed in a study are due to the
manipulation of the independent variable and not some other factor. In other
words, there is a causal relationship between the independent and dependent
variables.
• Internal validity can be improved by controlling extraneous variables, using
standardized instructions, counterbalancing, and eliminating demand
characteristics and investigator effects.
• External validity refers to the extent to which the results of a study can be
generalized to other settings (ecological validity), other people (population
validity), and over time (historical validity).
• External validity can be improved by setting experiments in a more natural
setting and using random sampling to select participants.
Main methods of establishing validity
Content validity
• During the initial construction phase of any
test, the developers must first be concerned
with content validity. This refers to the
representativeness and relevance of the test
instrument to the construct being measured.
• During the initial item selection, the
constructor must carefully consider the skill and
knowledge areas of the variable they would like
to measure.
Face validity
• Face validity, also called logical validity, is a simple
form of validity in which you apply a superficial and
subjective assessment of whether or not your study
or test measures what it is supposed to measure.
• For example, a group of potential mechanics being
tested for basic arithmetic skills should
get word problems that relate to machines rather
than business transactions.
Construct validity
The basic approach of construct validity is to assess the extent
to which the test measures a theoretical construct or trait.
• Construct validity refers to how well a test or tool measures
the construct that it was designed to measure. For example, to what
extent does the BDI measure depression?
• Construct validity represents the strongest approach to test
construction. In many ways, all types of validity can be
considered subcategories of construct validity. Thus,
construct validation is a never-ending process in which new
relationships can always be verified and investigated.
Construct validity….
• There are two types of construct validity: convergent and
discriminant validity.
• Convergent validity tests that constructs that are expected to be
related are, in fact, related. Discriminant validity (or divergent
validity) tests that constructs that should have no relationship are,
in fact, unrelated.
• For example, to test the convergent validity of a measure of
self-esteem, a researcher may want to show that measures of
similar constructs, such as self-worth, confidence, social skills,
and self-appraisal, are also related to self-esteem, whereas non-
overlapping factors, such as intelligence, should not relate,
demonstrating discriminant validity.
Criterion Validity
• Criterion validity is determined by comparing test
scores with some sort of performance on an outside
measure. The outside measure should have a
theoretical relation to the variable that the test is
supposed to measure.
• E.g., an intelligence test might be correlated with
grade point average; an aptitude test, with other
tests measuring similar dimensions. The relation
between the two measurements is usually
expressed as a correlation coefficient.
Concurrent validity
• Concurrent validity refers to measurements taken at the same, or
approximately the same, time as the test.
• E.g., an intelligence test might be administered at the same time as an
assessment of the group's level of academic achievement.
• Concurrent validity is a concept commonly used in psychology,
education, and social science. It refers to the extent to which the
results of a particular test or measurement correspond to those of
a previously established measurement of the same construct.
• An example of concurrent validity: researchers give a group of
students a new test designed to measure mathematical aptitude.
They then compare this with the test scores already held by the
school, a recognized and reliable judge of mathematical ability.
Predictive Validity
• Predictive validity refers to outside measurements that are taken some
time after the test scores were derived.
• E.g., predictive validity might be evaluated by correlating the intelligence
test scores with measures of academic achievement a year after the
initial testing.
• This is the degree to which a test accurately predicts a criterion that will
occur in the future.
• For example, a prediction may be made, on the basis of a new
intelligence test, that high scorers at age 12 will be more likely to obtain
university degrees several years later. If the prediction is borne out, then
the test has predictive validity.
• In the context of pre-employment testing, predictive validity refers to
how well test scores predict future job performance.
Item Analysis
• Item analysis is a process that examines student responses to
individual test items (questions) in order to assess the quality of
those items and of the test as a whole.
• It is a crucial first step in test development, involving different kinds of
evaluation procedures for examining and evaluating each item.
• This set of methods used to evaluate test items is one of the most
important aspects of test construction.
• Once developed, the items need to undergo a review that
relates them to the characteristics assessed and the intended target
population.
• Review by experts.
• Trial the items on a representative sample (pilot).
Item Analysis
Trial the items on a representative sample:
• Items are evaluated without any time limit being set.
• The sample should be sufficiently large to enable a
satisfactory evaluation, with five times as many
participants as the number of items.
Feedback from the pilot sample is important to
• Establish the clarity of items,
• Establish the adequacy of any time limit, and
• Gain information about administration.
Item Analysis
Basic methods involve assessment of
• Item difficulty and
• Item discriminability.
Item Analysis
Item difficulty:
• The difficulty level of an item should indicate
whether an item is too easy or possibly too
difficult.
• When examining ability or achievement tests,
clinicians are often interested in the difficulty level
of the test items.
• The item difficulty index is calculated as:
P = number who answer correctly / total number of individuals
Item difficulty..
The item difficulty index can range from .00 to 1.00
• 0.00 = no one got the correct answer
• 1.00 = everyone got the correct answer.
• E.g., on an assessment test, 15 of the
students in a class of 25 got the first item
correct. Then item difficulty (p) = .60:
P = 15/25 = .60
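A minimal sketch of the same calculation, reproducing the worked example above (15 of 25 correct); the 0/1 response list is constructed for illustration:

```python
def item_difficulty(responses):
    """P = proportion of examinees answering the item correctly.

    responses: 0/1 scores for one item (1 = correct).
    """
    return sum(responses) / len(responses)

# 15 correct answers out of 25, as in the worked example
responses = [1] * 15 + [0] * 10
print(f"p = {item_difficulty(responses):.2f}")  # p = 0.60
```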
Item Difficulty..
• The probability that an item could be answered correctly by
chance alone also needs to be determined.
• A true-false item could be answered correctly half the time if
people just guessed randomly.
• Thus, a true-false item with a difficulty level of .50 would not
be a good item.
• For most tests, items in the difficulty range of .30 to .70 tend
to maximize information about the differences among
individuals.
• However, some tests require a concentration of more difficult
items (e.g., selecting medical students).
Item Analysis
Item Discrimination:
• Provides an indication of the degree to which an item correctly
differentiates among the examinees on the domain of interest.
• Determines whether the people who have done well on
particular items have also done well on the whole test.
• Applicable to achievement or ability tests, and can also indicate
whether items discriminate in other assessment areas, such as
personality and interest inventories.
• E.g., individuals who have been diagnosed as depressed
answer an item one way, while individuals who do not report
depressive symptoms answer the same question another way.
Item Discriminability..
Methods for calculating an item discrimination
index.
• Extreme group method.
• The Point Biserial Method (correlational method)
Item Analysis
Extreme group method:
• Examinees are divided into two groups based on
high and low scores on the instrument (the
instrument could be a class test, a depression
inventory, etc.)
• The item discrimination index is then calculated
as d = upper% - lower%, where
upper% = the percentage of the upper group who got the item
correct or endorsed the item in the expected manner (and
lower% likewise for the lower group).
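A minimal sketch of the extreme-group index. The split rule here is simply the top and bottom halves on total score; in practice the cut (e.g., extreme 27% groups) varies and is not specified in these slides. The data are invented:

```python
def discrimination_index(total_scores, item_scores):
    """d = upper% - lower%, splitting examinees on total test score.

    Uses a simple top-half/bottom-half split; other cut rules
    (e.g., extreme 27% groups) are common in practice.
    """
    ranked = sorted(zip(total_scores, item_scores), key=lambda t: t[0])
    half = len(ranked) // 2
    lower = [item for _, item in ranked[:half]]
    upper = [item for _, item in ranked[-half:]]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Hypothetical total scores and 0/1 scores on one item for 6 examinees
totals = [40, 55, 62, 30, 48, 70]
item = [0, 1, 1, 0, 1, 1]
print(f"d = {discrimination_index(totals, item):+.2f}")  # d = +0.67
```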
Item Analysis
Item discrimination indices range from +1.00 to
-1.00.
• +1.00 = all of the upper group got the item right and
none of the lower group did.
• -1.00 = none of the upper group got it right and
all of the lower group did.
The interpretation of an item discrimination index
depends on the instrument, the purpose it was
used for, and the group taking the instrument.
Item Discriminability
The Point-Biserial Method:
• Find the correlation between performance on the item
and performance on the total test.
• The result is a correlation coefficient that ranges from +1.00
to -1.00, with larger positive coefficients reflecting items
that are better discriminators.
• If the value is negative or low, the item does not
discriminate, so it should be eliminated.
• The closer the value of the index is to 1.0, the better the
item, because it discriminates well.
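A minimal sketch: the point-biserial index is the Pearson correlation between 0/1 item scores and total test scores. The data are invented; a common refinement, not shown here, is to correlate against the total with the item itself removed:

```python
import statistics

# Hypothetical 0/1 scores on one item and total test scores
item = [1, 0, 1, 1, 0, 1, 0, 1]
totals = [78, 52, 80, 65, 49, 71, 58, 84]

r_pb = statistics.correlation(item, totals)  # Python 3.10+
print(f"point-biserial r = {r_pb:.2f}")
```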
3.3: Test Administration
• Standardized test administration procedures are necessary for
valid results.
• Extensive research in social psychology has clearly
demonstrated that situational factors can affect scores on
mental and behavioral tests.
• These effects, however, can be subtle and may not be
observed in all studies.
• For example, a few studies show that the race of the examiner
affects scores on standardized intelligence tests.
• Similarly, the examiner’s rapport and expectancies may
influence scores on some, but not all, occasions.
Test Administration
Factors influencing test scores:
• The relationship between examiner and test taker (rapport,
friendliness, verbal reinforcement).
• The race of test takers (same race).
• The language of test takers (translation).
• The training of the administrator (pragmatic knowledge).
• Expectancy effects (cognitive bias).
• Effects of reinforcing responses (rewards).
• Computer-assisted test administration.
• The mode of administration (e.g., self-administered by test takers).
• Subject variables (motivation, anxiety, illness, hormonal factors).
Computer Assisted Test Administration
• Interest has increased because computer administration may
reduce examiner bias.
• Computers can administer and score more tests with greater
precision and with minimal bias.
• The computer offers many advantages in test
administration, scoring, and interpretation, including
easier handling of complicated psychometric
procedures and the integration of testing.
• This mode of test administration is expected to
become more common in the near future.
Advantages of C.A.T.A.
Some of the advantages that computers offer:
• Excellence of standardization.
• Individually tailored sequential administration.
• Precision in timing responses.
• Release of human testers for other duties.
• Cost efficiency and utilization of resources.
• Patience (test takers are not rushed), and
• Control of bias.
• THANK YOU..