Validity of an Instrument (part 1)

Most people use already established instruments. In that case, the information on validity will be provided by the author of the instrument you use. However, should you choose to create your own instrument, you will have the job of establishing its validity yourself. There are several ways of doing so.

An instrument is said to demonstrate validity if it measures what you intend for it to measure. For example, if you have written an instrument to measure satisfaction with the counseling relationship, you don’t want the instrument to be measuring how much the client likes the counselor instead. Though perhaps similar, these are two different constructs. Just as reliability reflects random error, validity reflects non-random error. This type of error is systematic, meaning that there is a distinct pattern to it. Suppose you stepped on the scale and it consistently told you that you weighed 140 lbs., but you know that you actually weigh 130 lbs. It consistently over-weighs you by 10 lbs., so it is not valid, but it is reliable. Now suppose that it sometimes under-weighs you and other times over-weighs you. In that case it would not only be invalid, but it would also be unreliable.
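To make the scale example concrete, here is a minimal sketch in Python that separates the two kinds of error. The readings are invented purely for illustration, and the function name `describe` is my own; the only number taken from the example is the true weight of 130 lbs.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 130.0  # lbs, the true weight from the scale example

# Hypothetical readings, made up for illustration only.
consistent_scale = [140.0, 140.1, 139.9, 140.0, 140.0]    # always about 10 lbs off
inconsistent_scale = [122.0, 141.0, 127.5, 138.0, 119.5]  # sometimes low, sometimes high


def describe(readings, true_value):
    """Return (systematic error, random spread) for a list of readings."""
    bias = mean(readings) - true_value  # non-random error: a validity problem
    spread = stdev(readings)            # random error: a reliability problem
    return bias, spread


for name, readings in [("consistent", consistent_scale),
                       ("inconsistent", inconsistent_scale)]:
    bias, spread = describe(readings, TRUE_WEIGHT)
    print(f"{name}: bias = {bias:+.2f} lbs, spread = {spread:.2f} lbs")
```

The first scale shows a large bias with almost no spread, which is the "reliable but not valid" case. The second scale shows a large spread, so no single reading can be trusted no matter where the average happens to land.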

There are six popular ways to check validity. The first is face validity, and though it is the simplest, it is still important. Face validity can be established simply by asking others whether the questions appear to reflect what you are trying to measure. This is a good way to start, but it isn’t enough all by itself. A second type is criterion-related validity. Here we use an instrument to predict some behavior (the criterion), and the better the prediction, the more valid the instrument. For example, scoring well on a driving test should indicate that you are a good driver. The higher the correlation between your driving test score and how well you actually drive, the more valid the driving test is said to be. We don’t use this type of validity much in the social sciences because so many of the things we measure are abstract, and there is usually no appropriate criterion. For example, if we are measuring levels of math anxiety, there is no specific criterion with which to validate the instrument, because a person with high math anxiety doesn’t tend to do any one specific thing consistently enough.
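As a rough illustration of the driving test example, the correlation between test scores and a criterion can be computed with Pearson's r. The sketch below assumes the criterion is an instructor's rating of on-road driving on a 1 to 5 scale; that rating, the data, and the helper name `pearson_r` are all hypothetical and are not part of any particular instrument.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six drivers, made up for illustration only.
test_scores = [62, 70, 75, 81, 88, 93]          # driving test score
road_ratings = [2.0, 2.5, 3.5, 3.0, 4.5, 5.0]   # assumed criterion: instructor rating

validity_coefficient = pearson_r(test_scores, road_ratings)
print(f"correlation between test score and criterion: {validity_coefficient:.2f}")
```

A value close to 1 would support criterion-related validity for the test, while a value near 0 would suggest the test tells us little about actual driving. With real data you would want many more than six people.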

More commonly, we use content validity. This type of validity depends on how much an instrument reflects the content of a specific area. Suppose we are writing a math test for 4th graders and we include addition, subtraction, multiplication, division, and fractions. Would this test be content-valid? Yes, because those are the math operations that 4th graders should know. Would the test be valid for 10th graders? No, because 10th graders also should know algebra and geometry, so those topics should also be on the test. Now suppose you are writing an instrument for a vaguer content area, such as the study skills used by home-schooled children. How can you be sure you are covering the content area? You have to do a literature review first. Look up and read articles that talk about study skills used by children in various educational settings. The “boundaries” of the content area are usually vague, so it is up to your own judgment to decide whether you have covered the content area well enough.
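For a well-defined area like the math test example, checking coverage amounts to comparing two lists of topics. Here is a small sketch in Python; the topic lists come from the example above, and the function name `missing_topics` is my own. For a vaguer area such as study skills, the list of topics itself would have to come from your literature review and your own judgment, so a check like this is only a bookkeeping aid.

```python
# Content areas taken from the math test example in the text.
GRADE_4_DOMAIN = {"addition", "subtraction", "multiplication",
                  "division", "fractions"}
GRADE_10_DOMAIN = GRADE_4_DOMAIN | {"algebra", "geometry"}


def missing_topics(test_topics, domain):
    """Return the topics in the content area that the test does not cover."""
    return sorted(set(domain) - set(test_topics))


# The topics included on the test we wrote.
test_topics = {"addition", "subtraction", "multiplication",
               "division", "fractions"}

print(missing_topics(test_topics, GRADE_4_DOMAIN))   # []
print(missing_topics(test_topics, GRADE_10_DOMAIN))  # ['algebra', 'geometry']
```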

I’ll talk about three more types of validity in next week’s post, so come back for more!