Unit 3: Psychological testing
3.1: Nature and use of Psychological Test
Suman Sharma
Trichandra Multiple Campus
Nature and use of Psychological test
• Psychological testing is the administration of psychological tests,
which are designed to be “an objective and standardized
measure of a sample of behavior”
• Psychological testing is a systematic procedure for obtaining
samples of behavior, relevant to cognitive, affective, or
interpersonal functioning, and for scoring and evaluating those
samples according to standards.
• Psychometrics is the field of study concerned with the theory
and technique of psychological measurement, which includes
the measurement of knowledge, abilities, attitudes, personality
traits,
and educational measurement.
Psychological test
• Used to assess a variety of mental abilities and
attributes, including achievement and ability,
personality, and neurological functioning.
• Personality tests are administered for a wide
variety of reasons, from diagnosing
psychopathology (e.g., personality disorder,
depressive disorder) to screening job candidates.
• They may be used in an educational setting to
determine personality strengths and weaknesses.
Psychological Tests As Tools
• When used appropriately and skillfully, psychological tests can be
extremely helpful, even irreplaceable.
• Psychological tests are always a means to an
end and never an end in themselves.
• A tool can be an instrument of good or harm,
depending on how it is used.
Use of Psychological Test
Uses of psychological tests fall into three categories:
• Decision making
• Psychological research
• Self-understanding and personal development.
Use of Psychological Test
Decision Making:
• Psychological tests are used to make decisions
about people, either as individuals or in groups.
• They help in the diagnosis of different problems.
Psychological Research:
• Tests support research on psychological phenomena and
individual differences in developmental, educational,
social, and vocational psychology, among other fields.
Use of Psychological Test
Self-understanding and Personal Development:
• Tests are used in the therapeutic process of promoting
self-understanding and psychological adjustment.
• Testing can give clients insight into themselves
and support positive personal growth.
Major Applications of Psychological Tests
1. Identification of persons with intellectual disability.
2. Assessment in education.
3. Selection and classification of people in
organizations.
4. Individual counseling.
5. Basic research.
6. Diagnosis and prediction.
Characteristics of Psychological Tests.
• Proper psychological testing is conducted after
rigorous research and development. A proper
psychological test has the following characteristics.
1. Standardization.
2. Objectivity.
3. Test Norms.
4. Reliability.
5. Validity.
Characteristics of Psychological Tests
• Standardization - All procedures and steps must be conducted with
consistency and under the same conditions to achieve the same
testing performance from those being tested.
• Objectivity - Scoring is free of subjective judgment or bias, so that
different scorers obtain the same results from the same test responses.
• Test Norms - The average test scores within a large group of people,
which allow the performance of one individual to be compared with the
results of others by establishing a point of comparison or frame of
reference.
• Reliability - The same results are obtained across repeated testing.
• Validity - The test being administered must measure what it is
intended to measure.
3.2: Technical and Methodological Principles
• Norms and Meaning of Test Scores
• Reliability
• Validity
• Item Analysis
Norms
• Norms are the scores obtained by the initial group of people who
took the test.
• Norms provide a basis for comparing the individual with a
group.
• A norm represents the test performance of the standardization
sample.
• Any raw score can then be interpreted relative to the
performance of the reference (or normative) group - five-year-olds,
sixth graders, institutional inmates, job applicants, etc.
• Norms describe the test performance or typical behavior of one
or more reference groups.
Norms
• By determining an average score and a typical range of scores,
the norm of a test is established.
• This makes it possible to know whether a particular score
is unusually high, unusually low, or in the average range.
• Norms provide relative rather than absolute information.
• A norm reflects how a particular group (the norm group)
performed on a particular test at a particular point in
time.
• In a normal distribution, 50% of the scores fall below the
mean and 50% lie above it.
Norms
• Similarly, norm scores do not provide any
information about mastery of a skill,
although it is often assumed that if a score is
no more than 1.0 standard deviation below the
mean, then performance is within normal
limits and therefore acceptable.
• E.g., we compare students to what is normal for their
age, grade, or class, as on the SAT, GRE, and IQ tests.
[Figure: bell-shaped I.Q. distribution curve]
Norms
• Norms are usually presented in the form of tables with
descriptive statistics - such as means, standard
deviations, and frequency distributions - that
summarize the performance of the group or
groups in question.
• When norms are collected from the test
performance of a group of people, these
reference groups are labeled normative or
standardization samples.
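Because norms are usually summarized by a mean and standard deviation, a raw score can be located relative to the norm group as a standard (z) score. A minimal sketch in Python; the norm-group values below are hypothetical illustration numbers, not published norms:

```python
def z_score(raw, norm_mean, norm_sd):
    """Locate a raw score relative to the norm group, in SD units."""
    return (raw - norm_mean) / norm_sd

# Hypothetical IQ-style norms: mean 100, SD 15
z = z_score(112, norm_mean=100.0, norm_sd=15.0)
print(f"z = {z:.2f}")  # z = 0.80 -> 0.8 SD above the norm-group mean
```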
Reliability
• The extent to which the outcome of a test
remains unaffected by irrelevant variations
in the procedures of testing.
• The extent to which test scores obtained by
a person are the same if the person is re-
examined with the same test on different
occasions.
Reliability
• The reliability of a test refers to its degree of
stability, consistency, predictability, and
accuracy.
• In simple terms, reliability is the degree
to which a test produces stable and consistent
results. A specific measure is considered
reliable if applying it to the same object of
measurement a number of times produces the
same results.
Reliability…
• The concept underlying reliability is the possible range of
error, or error of measurement, of a single score. This is an
estimated range of the random fluctuation that can be
expected in an individual score.
• Some degree of error or noise is always present in the system,
arising from factors such as: i) misreading of the items, ii) poor
administration procedures, iii) the changing mood of the clients,
etc.
• The goal of the test constructor is to reduce, as much as
possible, the degree of measurement error or random
fluctuation.
Reliability..
• Two main issues relate to the degree of error (reliability) in a test.
1. Inevitable natural variation in human performance (usually the variability is
less for measurements of ability than for those of personality). Whereas ability
variables (intelligence, mechanical aptitude, etc.) show gradual change,
many personality traits are highly dependent on factors such as
mood and fluctuate considerably.
2. Psychological testing methods are necessarily imprecise. In the hard sciences,
researchers can make direct measurements, such as the concentration of a
solution or the relative weight of a compound. In contrast, many constructs in
psychology must be measured indirectly. E.g., intelligence cannot be
perceived directly: it must be inferred by measuring behaviour that has
been defined as intelligent.
• Although some error in testing is inevitable, the goal is to keep testing error
within reasonably accepted limits (a reliability coefficient of 0.80 or more).
Primary Methods of Obtaining Reliability
1. Test-retest reliability (time to time)
2. Alternate forms reliability (form to form)
3. Split-half reliability (item to item)
4. Inter-scorer reliability (scorer to scorer)
Test-Retest Reliability
• Repeatability, or test-retest reliability, is the closeness
of the agreement between the results of successive
measurements of the same measure carried out
under the same conditions of measurement.
• To measure test-retest reliability, we must
first give the same test to the same individuals on two
occasions and correlate the scores.
• If the correlation is high, the results are less likely to be
caused by random fluctuations in the condition of the
examinee or the testing environment.
Test-retest reliability….
• In general, test-retest reliability is the
preferred method only if the variable being
measured is a relatively stable trait. If the
variable is highly changeable (e.g., anxiety,
mood), this method is usually not adequate.
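A minimal sketch of the computation: the reliability coefficient is simply the Pearson correlation between the two administrations (the same arithmetic serves alternate-forms reliability, with the two lists coming from the two forms). The scores below are invented for illustration:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical scores of five examinees tested on two occasions
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
print(f"test-retest r = {pearson_r(time1, time2):.2f}")
```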
Alternate forms reliability (form to form)
• The alternate forms method avoids many of the problems
encountered with test-retest reliability. The logic is that if the
trait is measured several times on the same individual
using parallel forms of the test, the different
measurements should produce similar results.
• The degree of similarity between the scores represents
the reliability coefficient of the test.
• Alternate form reliability is assessed when an individual
participating in a research or testing scenario is given
two different versions of the same test at different times.
Alternate form…
• The primary difficulty with alternate forms lies in
determining whether the two forms are actually
equivalent. For example, if one test is more difficult than
its alternate form, the difference in scores may
represent actual differences between the two tests rather
than differences resulting from the unreliability of the
measures.
• Another difficulty is the delay between one
administration and the next; such a delay allows
short-term fluctuations in mood, stress level, etc.
Split-half reliability
• The split-half method is the best technique for determining
reliability for a trait with a high degree of fluctuation,
because the test is given only once: the items are split in
half and the two halves are correlated.
• As there is only one administration, it is not possible for
the effects of time to intervene as they might with the test-
retest method.
• The split-half method assesses the internal consistency of
a test, such as psychometric tests and questionnaires.
It measures the extent to which all parts of the test
contribute equally to what is being measured.
Split-half reliability…
• As a general principle, the longer a test is, the
more reliable it is: with a larger number
of items, it is easier for the majority of the
items to compensate for minor variations in
responding to a few of the other items.
• Split-half reliability is assessed by comparing the
results of one half of a test with the results from
the other half.
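A minimal sketch of the split-half computation: item responses are divided into odd and even halves, the half-scores are correlated, and the result is stepped up with the Spearman-Brown formula to estimate full-length reliability (a standard correction, though the slides do not name it). The response matrix is invented; statistics.correlation requires Python 3.10+:

```python
import statistics

def split_half_reliability(item_scores):
    """Odd-even split-half reliability with Spearman-Brown correction.

    item_scores: one list of 0/1 item responses per examinee.
    """
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    r_half = statistics.correlation(odd, even)  # Python 3.10+
    return 2 * r_half / (1 + r_half)  # step up to full test length

# Hypothetical responses: 5 examinees x 6 items (1 = correct)
data = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(f"split-half reliability = {split_half_reliability(data):.2f}")
```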
Inter-scorer reliability
• In some tests, scoring is based partially on the judgment of the
examiner, and judgment may vary between one scorer and the
next.
• In testing, inter-scorer reliability is the degree of agreement among
raters or judges. It is a score of how much homogeneity, or consensus,
there is in the ratings given by various examiners.
• The basic strategy for determining inter-scorer reliability is to obtain a
series of responses from a single client and have these responses
scored by two different individuals. A variation is to have two different
examiners test the same client using the same test and then
determine how close their scores or ratings of the person are.
• Any test that requires even partial subjectivity in scoring should
provide information on inter-scorer reliability.
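A minimal sketch of two simple inter-scorer indices: the correlation between two scorers' ratings of the same responses, plus the proportion of exact agreements. The ratings are invented; more refined indices (e.g., Cohen's kappa) exist but are not covered in these slides:

```python
import statistics

# Hypothetical ratings of the same 8 responses by two scorers
scorer_a = [3, 4, 2, 5, 4, 3, 1, 4]
scorer_b = [3, 4, 3, 5, 4, 2, 1, 4]

r = statistics.correlation(scorer_a, scorer_b)  # Python 3.10+
agreement = sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)
print(f"inter-scorer r = {r:.2f}, exact agreement = {agreement:.0%}")
```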
Validity
• Validity refers to a test's ability to measure
what it is supposed to measure.
• It addresses what the test can be considered
accurate about.
• Establishing the validity of a test can be
extremely difficult, primarily because
psychological variables are usually abstract
concepts such as intelligence, anxiety, and
personality.
Validity…
• Although a test can be reliable without being
valid, the opposite is not true: a necessary
condition for validity is that the test has
achieved an adequate level of reliability.
Validity contd..
• A distinction can be made between internal and external validity. These types of
validity are relevant to evaluating the validity of a research study or procedure.
• Internal validity refers to whether the effects observed in a study are due to the
manipulation of the independent variable and not some other factor. In other
words, there is a causal relationship between the independent and dependent
variables.
• Internal validity can be improved by controlling extraneous variables, using
standardized instructions, counterbalancing, and eliminating demand
characteristics and investigator effects.
• External validity refers to the extent to which the results of a study can be
generalized to other settings (ecological validity), other people (population
validity), and over time (historical validity).
• External validity can be improved by setting experiments in a more natural
setting and using random sampling to select participants.
Main methods of establishing validity
Content validity
• During the initial construction phase of any
test, the developers must first be concerned
with content validity. This refers to the
representativeness and relevance of the test
instrument to the construct being measured.
• During the initial item selection, the
constructor must carefully consider the skill and
knowledge areas of the variable they would like
to measure.
Face validity
• Face validity, also called logical validity, is a simple
form of validity in which you apply a superficial and
subjective assessment of whether or not your study
or test measures what it is supposed to measure.
• For example, a group of potential mechanics being
tested for basic arithmetic skills should
get word problems that relate to machines rather
than business transactions.
Construct validity
The basic approach of construct validity is to assess the extent
to which the test measures a theoretical construct or trait.
• Construct validity refers to how well a test or tool measures
the construct that it was designed to measure. For example, to what
extent does the BDI measure depression?
• Construct validity represents the strongest approach to test
construction. In many ways, all types of validity can be
considered subcategories of construct validity. Thus,
construct validation is a never-ending process in which new
relationships can always be verified and investigated.
Construct validity….
• There are two types of construct validity: convergent and
discriminant validity.
• Convergent validity tests that constructs that are expected to be
related are, in fact, related. Discriminant validity (or divergent
validity) tests that constructs that should have no relationship are,
in fact, unrelated.
• For example, to test the convergent validity of a measure of
self-esteem, a researcher may want to show that measures of
similar constructs, such as self-worth, confidence, social skills,
and self-appraisal, are also related to self-esteem, whereas non-
overlapping factors, such as intelligence, should not relate,
demonstrating discriminant validity.
Criterion Validity
• Criterion validity is determined by comparing test
scores with some sort of performance on an outside
measure. The outside measure should have a
theoretical relation to the variable that the test is
supposed to measure.
• E.g., an intelligence test might be correlated with
grade point average; an aptitude test, with other
tests measuring similar dimensions. The relation
between the two measurements is usually
expressed as a correlation coefficient.
Concurrent validity
• Concurrent validity refers to measurements taken at the same, or
approximately the same, time as the test.
• E.g., an intelligence test might be administered at the same time as an
assessment of the group's level of academic achievement.
• Concurrent validity is a concept commonly used in psychology,
education, and social science. It refers to the extent to which the
results of a particular test or measurement correspond to those of
a previously established measurement of the same construct.
• An example of concurrent validity: researchers give a group of
students a new test designed to measure mathematical aptitude.
They then compare this with the test scores already held by the
school, a recognized and reliable judge of mathematical ability.
Predictive Validity
• Predictive validity refers to outside measurements that are taken some
time after the test scores were derived.
• E.g., predictive validity might be evaluated by correlating the intelligence
test scores with measures of academic achievement a year after the
initial testing.
• This is the degree to which a test accurately predicts a criterion that will
occur in the future.
• For example, a prediction may be made, on the basis of a new
intelligence test, that high scorers at age 12 will be more likely to obtain
university degrees several years later. If the prediction is borne out, then
the test has predictive validity.
• In the context of pre-employment testing, predictive validity refers to
how well test scores predict future job performance.
Item Analysis
• Item analysis is a process that examines student responses to
individual test items (questions) in order to assess the quality of
those items and of the test as a whole.
• It is a crucial first step in test development, involving different kinds of
evaluation procedures for examining and evaluating each item.
• This set of methods used to evaluate test items is one of the most
important aspects of test construction.
• Once developed, the items need to undergo a review that
relates them to the characteristics assessed and the intended target
population.
• Review by experts.
• Trial the items on a representative sample (pilot).
Item Analysis
Trial the items on a representative sample:
• Items are evaluated without any time limit being set.
• The sample should be sufficiently large to enable a
satisfactory evaluation, with five times as many
participants as the number of items.
Feedback from the pilot sample is important to
• Establish the clarity of items,
• Establish the adequacy of any time limit, and
• Gain information about administration.
Item Analysis
Basic methods involve assessment of
• Item difficulty and
• Item discriminability.
Item Analysis
Item difficulty:
• The difficulty level of an item should indicate
whether an item is too easy or possibly too
difficult.
• When examining ability or achievement tests,
clinicians are often interested in the difficulty level
of the test items.
• The item difficulty index is calculated as:
P = number who answer correctly / total number of individuals
Item difficulty..
The item difficulty index can range from .00 to 1.00
• 0.00 = no one got the correct answer
• 1.00 = everyone got the correct answer.
• E.g., on an assessment test, 15 of the
students in a class of 25 got the first item
correct. Then item difficulty (p) = .60:
P = 15/25 = .60
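A minimal sketch of the same calculation, reproducing the worked example above (15 of 25 correct); the 0/1 response list is constructed for illustration:

```python
def item_difficulty(responses):
    """P = proportion of examinees answering the item correctly.

    responses: 0/1 scores for one item (1 = correct).
    """
    return sum(responses) / len(responses)

# 15 correct answers out of 25, as in the worked example
responses = [1] * 15 + [0] * 10
print(f"p = {item_difficulty(responses):.2f}")  # p = 0.60
```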
Item Difficulty..
• The probability that an item could be answered correctly by
chance alone also needs to be determined.
• A true-false item could be answered correctly half the time if
people just guessed randomly.
• Thus, a true-false item with a difficulty level of .50 would not
be a good item.
• For most tests, items in the difficulty range of .30 to .70 tend
to maximize information about the differences among
individuals.
• However, some tests require a concentration of more difficult
items (e.g., selecting medical students).
Item Analysis
Item Discrimination:
• Provides an indication of the degree to which an item correctly
differentiates among the examinees on the domain of interest.
• Determines whether the people who have done well on
particular items have also done well on the whole test.
• Applicable to achievement or ability tests, and can also indicate
whether items discriminate in other assessment areas, such as
personality and interest inventories.
• E.g., individuals who have been diagnosed as depressed
answer an item one way, while individuals who do not report
depressive symptoms answer the same question another way.
Item Discriminability..
Methods for calculating an item discrimination
index.
• Extreme group method.
• The Point Biserial Method (correlational method)
Item Analysis
Extreme group method:
• Examinees are divided into two groups based on
high and low scores on the instrument (the
instrument could be a class test, a depression
inventory, etc.)
• The item discrimination index is then calculated
as d = upper% - lower%, where
upper% = the percentage of the upper group who got the item
correct or endorsed the item in the expected manner (and
lower% likewise for the lower group).
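A minimal sketch of the extreme-group index. The split rule here is simply the top and bottom halves on total score; in practice the cut (e.g., extreme 27% groups) varies and is not specified in these slides. The data are invented:

```python
def discrimination_index(total_scores, item_scores):
    """d = upper% - lower%, splitting examinees on total test score.

    Uses a simple top-half/bottom-half split; other cut rules
    (e.g., extreme 27% groups) are common in practice.
    """
    ranked = sorted(zip(total_scores, item_scores), key=lambda t: t[0])
    half = len(ranked) // 2
    lower = [item for _, item in ranked[:half]]
    upper = [item for _, item in ranked[-half:]]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Hypothetical total scores and 0/1 scores on one item for 6 examinees
totals = [40, 55, 62, 30, 48, 70]
item = [0, 1, 1, 0, 1, 1]
print(f"d = {discrimination_index(totals, item):+.2f}")  # d = +0.67
```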
Item Analysis
Item discrimination indices range from +1.00 to
-1.00.
• +1.00 = all of the upper group got the item right and
none of the lower group did.
• -1.00 = none of the upper group got it right and
all of the lower group did.
The interpretation of an item discrimination index
depends on the instrument, the purpose it was
used for, and the group taking the instrument.
Item Discriminability
The Point-Biserial Method:
• Find the correlation between performance on the item
and performance on the total test.
• The result is a correlation coefficient that ranges from +1.00
to -1.00, with larger positive coefficients reflecting items
that are better discriminators.
• If the value is negative or low, the item does not
discriminate, so it should be eliminated.
• The closer the value of the index is to 1.0, the better the
item, because it discriminates well.
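A minimal sketch: the point-biserial index is the Pearson correlation between 0/1 item scores and total test scores. The data are invented; a common refinement, not shown here, is to correlate against the total with the item itself removed:

```python
import statistics

# Hypothetical 0/1 scores on one item and total test scores
item = [1, 0, 1, 1, 0, 1, 0, 1]
totals = [78, 52, 80, 65, 49, 71, 58, 84]

r_pb = statistics.correlation(item, totals)  # Python 3.10+
print(f"point-biserial r = {r_pb:.2f}")
```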
3.3: Test Administration
• Standardized test administration procedures are necessary for
valid results.
• Extensive research in social psychology has clearly
demonstrated that situational factors can affect scores on
mental and behavioral tests.
• These effects, however, can be subtle and may not be
observed in all studies.
• For example, a few studies show that the race of the examiner
affects scores on standardized intelligence tests.
• Similarly, the examiner’s rapport and expectancies may
influence scores on some, but not all, occasions.
Test Administration
Factors influencing test scores:
• The relationship between examiner and test taker (rapport,
friendliness, verbal reinforcement).
• The race of test takers (same race).
• The language of test takers (translation).
• The training of the administrator (pragmatic knowledge).
• Expectancy effects (cognitive bias).
• Effects of reinforcing responses (rewards).
• Computer-assisted test administration.
• The mode of administration (e.g., self-administered by test takers).
• Subject variables (motivation, anxiety, illness, hormonal factors).
Computer Assisted Test Administration
• Interest has increased because computer administration may
reduce examiner bias.
• Computers can administer and score more tests with greater
precision and with minimal bias.
• The computer offers many advantages in test
administration, scoring, and interpretation, including
easier handling of complicated psychometric
procedures and the integration of testing.
• This mode of test administration is expected to
become more common in the near future.
Advantages of C.A.T.A.
Some of the advantages that computers offer:
• Excellence of standardization.
• Individually tailored sequential administration.
• Precision in timing responses.
• Release of human testers for other duties.
• Cost efficiency and utilization of resources.
• Patience (test takers are not rushed), and
• Control of bias.
• THANK YOU..