A test is reliable if it

After going through all the effort of developing an online test, you want it to be an accurate measure. That’s why it’s so important to plan for online test reliability.

In Are Your Online Tests Valid?, we examined test validity, or how you can be sure a test measures what it claims to measure. Validity must be established before reliability can be considered in any meaningful way, so you may want to read that article first.

In this article, we’ll look at test reliability. A test with a high degree of reliability will be a more accurate measure of the learner’s knowledge and skills than one with low reliability. If you have trouble keeping all of these terms straight, think of it this way: reliability = consistency.

Test Reliability Is Consistency

Test reliability is an attempt to reduce the random errors that occur in all tests to a minimum. The way to reduce random errors is to make a test consistent. A test that is reliable or consistent has few variations within itself and produces similar results over time. This is often compared to a scale. If you weigh yourself every day and your weight is reasonably consistent, you consider the scale reliable. If the scale displays wildly different weights from day to day (even during the holidays), you would not consider it a reliable measure.

Test reliability answers the question:

TO WHAT DEGREE IS A TEST CONSISTENT IN WHAT IT MEASURES?

What Makes A Test Consistent?

A test that is reliable will have a degree of consistency evidenced by these characteristics:

  • The test items seem similar or highly related. The test comes together as one whole.
  • There are no great leaps in difficulty, wording and tone. It might seem like one person wrote the entire test.
  • If the test were administered to similar groups, you would see similarities in the scores across the groups.
  • The test is long enough to assess the learner’s knowledge. Very short tests are more affected by the “luck factor.”

How To Improve Online Test Reliability

  • Ensure that the test measures related content. Avoid creating one test for several different courses.
  • Ensure that testing conditions are similar for each learner. For example, if your testing software displays well in a particular browser, then make using the best browser a requirement.
  • Add more questions to the test. A longer test is going to be more reliable.
  • Word test questions very clearly so that no other interpretations are possible.
  • Write test instructions so that they are easily understood.
  • Make sure the answer choices are clearly different from each other and that distractors (wrong answers) are 100% wrong.
  • Create test items of similar difficulty, when possible.
  • Test members of the same audience group twice, ideally a month apart. If the distribution of scores is similar, the test is likely to be reliable. If the scores are very different, improve the questions that showed a discrepancy. Take into account that scores on the second test may be a bit higher. (Because of deadlines and budgets, administering two tests is probably unrealistic. Still, we can dream, can’t we?)
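The two-administration check in the last tip can be sketched in a few lines of code. This is a minimal Python sketch with made-up scores for ten learners; the numbers and the `pearson` helper are illustrative, not from the article:

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / ((len(x) - 1) * stdev(x) * stdev(y))

# Hypothetical scores for the same ten learners, tested a month apart.
first  = [72, 85, 90, 64, 78, 88, 70, 95, 60, 82]
second = [75, 83, 92, 66, 80, 86, 74, 93, 63, 85]

print(f"mean shift: {mean(second) - mean(first):+.1f} points")
print(f"test-retest correlation: {pearson(first, second):.2f}")
```

A small positive mean shift is expected (learners remember some questions); what matters for reliability is that the two sets of scores rank learners similarly, which a correlation near 1 indicates.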

Relationship Of Reliability To Validity

A reliable test is not necessarily a valid test. A test can be internally consistent (reliable) but not be an accurate measure of what you claim to be measuring (validity).

RESOURCES:

  • Are Your Online Tests Valid?
  • How to Plan, Design and Write Tests
  • Improving Test Quality


When we call someone or something reliable, we mean that they are consistent and dependable. Reliability is also an important component of a good psychological test. After all, a test would not be very valuable if it were inconsistent and produced different results every time. How do psychologists define reliability? What influence does it have on psychological testing?

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability exactly, but it can be estimated in a number of different ways.

Test-Retest Reliability

Test-retest reliability is a measure of the consistency of a psychological test or assessment. This kind of reliability is used to determine the consistency of a test across time. Test-retest reliability is best used for things that are stable over time, such as intelligence.

Test-retest reliability is measured by administering a test twice at two different points in time. This type of reliability assumes that there will be no change in the quality or construct being measured. In most cases, reliability will be higher when little time has passed between tests.

The test-retest method is just one of the ways that can be used to determine the reliability of a measurement. Other techniques that can be used include inter-rater reliability, internal consistency, and parallel-forms reliability.

It is important to note that test-retest reliability only refers to the consistency of a test, not necessarily the validity of the results.

Inter-Rater Reliability

This type of reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters' estimates.

One way to test inter-rater reliability is to have each rater assign each test item a score. For example, each rater might score items on a scale from 1 to 10. Next, you would calculate the correlation between the two ratings to determine the level of inter-rater reliability.

Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate.
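The 8-out-of-10 agreement example above can be sketched as a short calculation. The category labels and the two raters' judgments below are hypothetical, chosen so they disagree on exactly two observations:

```python
# Hypothetical category judgments from two raters for ten observations.
rater_a = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "pass", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
agreement_rate = agreements / len(rater_a)
print(f"inter-rater agreement: {agreement_rate:.0%}")  # 8 of 10 -> 80%
```

Percent agreement is easy to compute but ignores agreement that would happen by chance; more cautious analyses often use a chance-corrected statistic instead.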

Parallel-Forms Reliability

Parallel-forms reliability is gauged by comparing two different tests that were created using the same content. This is accomplished by creating a large pool of test items that measure the same quality and then randomly dividing the items into two separate tests. The two tests should then be administered to the same subjects at the same time.
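The random division of an item pool into two forms can be sketched as follows. The pool size, item names, and seed are assumptions for illustration only:

```python
import random

# Hypothetical pool of 20 item IDs that all measure the same quality.
item_pool = [f"item_{i:02d}" for i in range(1, 21)]

rng = random.Random(42)          # fixed seed so the split is reproducible
shuffled = item_pool[:]
rng.shuffle(shuffled)

form_a = sorted(shuffled[:10])   # first half becomes form A
form_b = sorted(shuffled[10:])   # second half becomes form B

assert not set(form_a) & set(form_b)   # no item appears on both forms
print("Form A:", form_a)
print("Form B:", form_b)
```

After administering both forms to the same subjects, you would correlate the two sets of scores; a high correlation suggests the forms are interchangeable.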

Internal Consistency Reliability

This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test's internal consistency.

When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability.

Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same, which would indicate that the test has internal consistency.
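One common statistic for internal consistency, not named in the article, is Cronbach's alpha, which rises when items that target the same construct vary together across test takers. A minimal sketch with hypothetical responses on a 1–5 scale:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score columns."""
    k = len(item_scores)
    total = [sum(vals) for vals in zip(*item_scores)]  # each person's total score
    item_var = sum(pvariance(col) for col in item_scores)
    return (k / (k - 1)) * (1 - item_var / pvariance(total))

# Hypothetical responses to three items meant to measure the same
# construct; rows are items, columns are the six test takers.
items = [
    [4, 3, 5, 2, 4, 5],
    [4, 2, 5, 3, 4, 4],
    [5, 3, 4, 2, 5, 5],
]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

Values near 1 indicate that the items move together; a common rule of thumb treats alpha above roughly 0.7 as acceptable, though the threshold depends on the stakes of the test.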

Influencing Factors

There are a number of different factors that can have an influence on the reliability of a measure. First and perhaps most obviously, it is important that the thing that is being measured be fairly stable and consistent. If the measured variable is something that changes regularly, the results of the test will not be consistent.

Aspects of the testing situation can also have an effect on reliability. For example, if the test is administered in a room that is extremely hot, respondents might be distracted and unable to complete the test to the best of their ability. This can have an influence on the reliability of the measure.

Other things like fatigue, stress, sickness, motivation, poor instructions and environmental distractions can also hurt reliability.

Reliability vs. Validity

It is important to note that just because a test has reliability it does not mean that it has validity. Validity refers to whether or not a test really measures what it claims to measure.

Think of reliability as a measure of precision and validity as a measure of accuracy. In some cases, a test might be reliable, but not valid.

For example, imagine that job applicants are taking a test to determine if they possess a particular personality trait. ​While the test might produce consistent results, it might not actually be measuring the trait that it purports to measure.
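The precision-versus-accuracy distinction can be illustrated numerically. Both sets of readings below are invented: one instrument is consistent but biased (reliable, not valid), the other is centered on the truth but noisy (closer to valid, not reliable):

```python
from statistics import mean, stdev

true_value = 70.0  # the quantity we actually want to measure

# Hypothetical repeated measurements from two instruments.
reliable_but_invalid = [75.1, 74.9, 75.0, 75.2, 74.8]  # consistent, but biased
valid_but_unreliable = [66.0, 74.0, 69.0, 72.0, 68.5]  # centered, but noisy

for name, readings in [("reliable-not-valid", reliable_but_invalid),
                       ("valid-not-reliable", valid_but_unreliable)]:
    print(f"{name}: spread={stdev(readings):.2f}, "
          f"bias={mean(readings) - true_value:+.2f}")
```

The first instrument's small spread is what reliability captures; its large bias is what validity would catch.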

By Kendra Cherry
Kendra Cherry, MS, is an author and educational consultant focused on helping students learn about psychology.
