An article on www.bbc.com caught my eye the other day. It discussed the nature of medical test results and the interpretation of risk. Take a look: it claims that doctors are not as good as they should be at interpreting the significance of test results on behalf of patients.

The article gives the example of a patient (a 50-year-old woman, about whom no other medical information is known) who has just had a test for breast cancer.

Note: the *reliability* of a particular test is assessed over time by medical researchers, who compare test results on patients and then see (using other follow-up tests and procedures) how many of the patients actually had the disease that the test said they did or did not have. This way, they get a reasonably decent estimate of how good the test is.

In the BBC article’s example, the breast cancer test had a *sensitivity* of 90%. What sensitivity means is this: if we had perfect knowledge that a group of people DID have the disease, what percentage of the time would the test come out positive (positive in medical parlance means the test indicates you HAVE the disease or condition)? In our example, 90% of the time on average, or 9 times out of 10, the test would give a positive (correct) result. Of course, the perfect test would be 100% sensitive.

Some terminology is useful here. 90% of the test results would thus result in a TRUE POSITIVE. But 10% of the results would be a FALSE NEGATIVE (i.e. you are told you don’t have the disease but actually you do).

But what if you just walked in off the street, had the test (with no-one knowing whether you had the disease or not) and got a positive result? Knowing that the test is 90% sensitive, a common error is to think that this means there is a 90% chance that you have the disease. Not at all. Read on.

We need to understand something else here: what is known as the *specificity* of the test. Some more terminology. In the example given in the article, the test had a specificity of 91%. What this means is, if we had perfect knowledge that a group of people did NOT have the disease, how often would the test produce a (correct) negative result? In the example given, it is 91%. Meaning, if the test were performed on 100 people who do not have the disease, 91 of those tests would produce a proper negative result, i.e. what we might call a TRUE NEGATIVE. But 9 of them would produce a positive result even though the person doesn’t actually have the disease: in other words, 9 FALSE POSITIVES. Again, if a test had perfect specificity (100%), all would be great.
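To make the two definitions concrete, here is a minimal sketch in Python. The counts are hypothetical, picked only so that they reproduce the 90% and 91% figures quoted from the article; the function and variable names are my own.

```python
# Hypothetical validation counts, chosen only to match the rates quoted above.
true_positives = 90    # have the disease, test says positive
false_negatives = 10   # have the disease, test says negative
true_negatives = 91    # no disease, test says negative
false_positives = 9    # no disease, test says positive


def sensitivity(tp, fn):
    """Share of people WITH the disease whose test comes back positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of people WITHOUT the disease whose test comes back negative."""
    return tn / (tn + fp)


print(f"sensitivity = {sensitivity(true_positives, false_negatives):.0%}")
print(f"specificity = {specificity(true_negatives, false_positives):.0%}")
```

Notice that each figure is worked out within its own group: sensitivity only ever looks at people who have the disease, specificity only at people who don’t.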

This is a fundamental concept: a test needs to be assessed for how reliable it is when it is performed on groups that DO have the disease, but also for how reliable it is when it is carried out on groups that don’t. It may not be intuitive, but these two things are entirely separate. Even if a test is performed on the same person repeatedly, these types of errors will produce some false negatives or some false positives (depending on whether the person does or does not have the condition). Where tests have similar levels of sensitivity and specificity, that’s just a coincidence. They can be very different, because it all depends on the underlying logic, science and fallibility of the measuring process.

The ideal combination would be a test that had 100% sensitivity *and* 100% specificity. To my knowledge, and you will not be surprised to hear this, there are few if any tests that are so accurate. If a test were 100% accurate in both these ways, it would have perfect predictive power, and you could properly rely on it. But in the real world this essentially never happens. Again, a test might be quite sensitive but have worse specificity, or vice versa.

Well, so far we have more of an idea of the issues involved. But about half of the gynaecologists in the BBC article apparently concluded from the above data that the chances of the woman having cancer, given the positive test alone, were in fact 90%. This is wrong, as we shall see.

The problem is that, in the real world of imperfect tests that do not have 100% sensitivity and 100% specificity, we can’t assess the significance of test results without knowing what proportion of the overall population actually has the disease (this is known as the *prevalence* rate).

To take a silly example, if we knew that 100% of the population always had the disease, we can see that we wouldn’t need the tests. We would know the answer already! And if we knew that the prevalence rate was zero, we would also know the answer already and wouldn’t need the tests. But anything in between we need to have a handle on. Why?

Well, because the test does not have perfect sensitivity or specificity (the test has a tendency to throw up false positives and false negatives), we have to *weight* or *balance* the size of these outcomes by the prevalence of the disease. Specifically:

- Imagine you got a positive result – you would be interested to know how likely – adjusted for prevalence – is a true positive compared to a false positive.
- And if you got a negative result? You would be interested, instead, in how likely – adjusted for prevalence – is a true negative compared to a false negative.
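These two questions can be written down as two small calculations. What follows is a sketch rather than anything from the article: the function names are my own, and the 10% prevalence in the usage lines is a made-up figure purely for illustration. (Statisticians call the two answers the positive and negative predictive value.)

```python
def chance_disease_given_positive(sensitivity, specificity, prevalence):
    """Of all positive results, what share are true positives?"""
    true_pos = sensitivity * prevalence               # has disease AND tests positive
    false_pos = (1 - specificity) * (1 - prevalence)  # no disease BUT tests positive
    return true_pos / (true_pos + false_pos)


def chance_healthy_given_negative(sensitivity, specificity, prevalence):
    """Of all negative results, what share are true negatives?"""
    true_neg = specificity * (1 - prevalence)         # no disease AND tests negative
    false_neg = (1 - sensitivity) * prevalence        # has disease BUT tests negative
    return true_neg / (true_neg + false_neg)


# Article's test (90% sensitive, 91% specific) with a HYPOTHETICAL 10% prevalence.
print(f"{chance_disease_given_positive(0.90, 0.91, 0.10):.1%}")
print(f"{chance_healthy_given_negative(0.90, 0.91, 0.10):.1%}")
```

In both functions the prevalence does the weighting: it scales the outcomes that can only happen to people with the disease, and one minus the prevalence scales the outcomes that can only happen to people without it.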

An example. If the same number of people in the population have the disease as those who don’t (a prevalence rate of 50%), then the ‘weights’ applied to the false positive effect are the same as the weights applied to the false negative effect. And in this case, with both sensitivity and specificity in the BBC example being 90% (actually the latter was 91%, but that is near enough), then it is correct to say the lady with the positive result would have a probability of 90% of having the disease. But only in this example of a prevalence of 50%. And even then, the specificity has to be the same as the sensitivity. What if the proportion of the disease in the population was actually only 1%?
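Before turning to that question, the 50% case is easy to check numerically. This is only a sketch with a helper name of my own, and the 70% specificity in the last line is a hypothetical value, included to show what happens when sensitivity and specificity differ.

```python
def chance_disease_given_positive(sensitivity, specificity, prevalence):
    """Share of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence of 50%, sensitivity and specificity both 90%.
equal_rates = chance_disease_given_positive(0.90, 0.90, 0.50)

# Same prevalence, with the article's actual specificity of 91%.
article_rates = chance_disease_given_positive(0.90, 0.91, 0.50)

# Same prevalence, but a HYPOTHETICAL specificity of only 70%.
unequal_rates = chance_disease_given_positive(0.90, 0.70, 0.50)

print(f"{equal_rates:.1%}  {article_rates:.1%}  {unequal_rates:.1%}")
```

Only the first case returns the sensitivity figure itself. Once the specificity differs from the sensitivity, the answer moves away from 90% even at a prevalence of 50%.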

Well, we can see ‘intuitively’ that this causes a problem. If someone walks in off the street and has the test, there is only 1 chance in 100 that they have the disease. If they are tested, the likelihood of a true positive (where they do have the condition, and the test produces a correct result) is very much less than the probability of a false positive (where they don’t have the disease, and the test provides an incorrect result).

Why? The test, if you have the disease, generates true positives 90% of the time. But the test, if you don’t have the disease, still returns false positives 9% of the time (100 minus the specificity of 91). We still have to adjust for the prevalence of the disease. We multiply the true positive rate of 90% by 1% (the prevalence rate) to produce **0.9%**. We then multiply the false positive rate of 9% by 99% (100 minus the prevalence rate) to get very nearly 9.0% (8.91%). We find that any positive result is thus approximately 10 times more likely to be a false alarm than a real positive.

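Hence the real likelihood of having the disease, following a positive test result, is (in this example) 0.9% out of a total of 0.9% + 8.9% = 9.8% of positive results, which is roughly 9%: about 1 in 10, not 9 in 10.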
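If percentages of percentages make your head spin, it can help to count people instead. Here is a small sketch that imagines 10,000 women being tested; the population size is an arbitrary assumption of mine, while the three rates are the ones used in this example.

```python
# Counting-heads version of the 1% example.
# The population size is an arbitrary assumption; the rates are from the example.
population = 10_000
prevalence_pct = 1      # 1% have the disease
sensitivity_pct = 90    # 90% of those with the disease test positive
specificity_pct = 91    # 91% of those without it test negative

with_disease = population * prevalence_pct // 100
without_disease = population - with_disease

true_positives = with_disease * sensitivity_pct // 100
false_positives = without_disease * (100 - specificity_pct) // 100

all_positives = true_positives + false_positives
chance_if_positive = true_positives / all_positives

print(f"{with_disease} women have the disease, {without_disease} do not")
print(f"{true_positives} true positives, {false_positives} false positives")
print(f"{true_positives} of {all_positives} positive results are real: "
      f"{chance_if_positive:.1%}")
```

Out of 981 positive results, only 90 belong to women who actually have the disease, which is a little over 9%.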

And we can see, all other things being equal, that if:

- we increase the prevalence of the disease, any positive result from a test means we are more likely to have the disease
- we decrease the specificity, OR decrease the sensitivity, any positive result from a test means we are less likely to have the disease
- we decrease the specificity, any negative result from a test is slightly less likely to be correct
- and so on
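The directions of these effects can be checked by changing one number at a time. The sketch below does that; the function names are mine, and the alternative values (5% prevalence, 80% sensitivity, 80% specificity) are hypothetical figures chosen only to show which way each result moves.

```python
def ppv(sens, spec, prev):
    """Chance of having the disease after a positive result."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prev):
    """Chance of NOT having the disease after a negative result."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


base = dict(sens=0.90, spec=0.91, prev=0.01)   # figures used in the example
scenarios = {
    "example figures": base,
    "higher prevalence (5%)": {**base, "prev": 0.05},    # hypothetical
    "lower specificity (80%)": {**base, "spec": 0.80},   # hypothetical
    "lower sensitivity (80%)": {**base, "sens": 0.80},   # hypothetical
}

for label, s in scenarios.items():
    print(f"{label:26s} positive is right: {ppv(**s):6.2%}   "
          f"negative is right: {npv(**s):7.3%}")
```

Running it shows the chance after a positive result rising with prevalence and falling when either sensitivity or specificity drops, while the reliability of a negative result barely moves at such a low prevalence.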

To give you some perspective: if we drop both the sensitivity and the specificity of the test in the above example to 75%, then we have to raise the prevalence of the disease from 1% to a little over 3% to produce the same likelihood of having the disease after a positive result.
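For anyone who wants to check that figure, here is one way of doing it, sketched in Python. It works with odds rather than probabilities: the odds of disease after a positive result equal the odds before the test multiplied by sensitivity / (1 - specificity). The helper names are my own.

```python
def to_odds(probability):
    return probability / (1 - probability)


def to_probability(odds):
    return odds / (1 + odds)


def positive_boost(sensitivity, specificity):
    """How many times a positive result multiplies the odds of disease."""
    return sensitivity / (1 - specificity)


# Odds of disease after a positive result with the original test (90% / 91%, 1%).
target_odds = to_odds(0.01) * positive_boost(0.90, 0.91)

# Prevalence at which the weaker test (75% / 75%) gives those same odds.
required_prevalence = to_probability(target_odds / positive_boost(0.75, 0.75))

print(f"chance after a positive result: {to_probability(target_odds):.1%}")
print(f"prevalence needed with the 75% test: {required_prevalence:.1%}")
```

The weaker test only triples the odds, where the original multiplies them by ten, so the starting prevalence has to be a little more than three times higher to end up in the same place.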

So, if you have to explain this to patients, or have your own chat with your doctor, I hope this all helps you be a little better prepared!