The Risky Business of Public Health Research

by Steven Milloy

Copyright 1995 by Steven J. Milloy. All rights reserved. First edition. Published by the Cato Institute, 1000 Massachusetts Avenue, N.W., Washington, D.C. 20002. Library of Congress Catalog Number: 95-72177. International Standard Book Number: 0-9647463-2-8.


Chapter 5

Mining for Statistical Associations 

Once you've collected your data, how do you find the risk that's your ticket to stardom? There are two tried-and-true techniques virtually guaranteed to turn up something.

Disease Clusters and the Texas Sharpshooter

One of the best techniques is called the Texas Sharpshooter method. It goes something like this: The Texas Sharpshooter sprays the side of an abandoned barn with gunfire. He then draws a bull's-eye target around a cluster of bullet holes that landed together purely by chance. Now he can say, "See what a good shot I am!"

Basically, you can be your own sharpshooter if you find a cluster of disease and then shout "Aha!" or "Eureka!" or something to denote you've discovered the mother lode. Clusters are easy to find; they're everywhere, in fact. Epidemiologic studies of hazardous waste sites and electromagnetic fields are famous for clustering and the Texas Sharpshooter technique. For example, a study of a Woburn, Mass., site associated a cluster of 20 childhood leukemia cases with the site. It was very convincing. It didn't even matter that none of the contaminants at the site causes leukemia. That's the power of a cluster!

Consider, for example, that one out of every three people in the United States will develop cancer sometime during their lifetimes. We call this the background risk or "natural" rate of cancer. It's yours by virtue of your birth. Now, if you analyze cancer rates by geographic region or state or county or city or neighborhood, you will likely find that some areas have a cancer rate of exactly 1 in 3. But most areas will have cancer rates that are greater or less than 1 in 3.

Now, a real statistician will look at these rates and say, "Well, just by chance some areas will have higher cancer rates and some areas will have lower cancer rates. The differences average out as the geographic area gets larger. So the differences in rates between areas likely means nothing."
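A quick simulation can check the statistician's point. The sketch below is illustrative only and does not model any real data. It assumes every person has exactly the same 1-in-3 lifetime risk, so any difference between areas comes from chance alone. The function name simulate_area_rates, the seed, and the area sizes are hypothetical choices.

```python
import random

def simulate_area_rates(n_areas, people_per_area, lifetime_risk=1/3, seed=1):
    """Return the observed cancer rate in each of n_areas simulated areas.

    Everyone shares the same lifetime_risk, so any area-to-area
    difference produced here is due to chance alone.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_areas):
        cases = sum(rng.random() < lifetime_risk for _ in range(people_per_area))
        rates.append(cases / people_per_area)
    return rates

for size in (50, 500, 5000):
    rates = simulate_area_rates(n_areas=100, people_per_area=size)
    above = sum(r > 1/3 for r in rates)
    print(f"{size:>5} people per area: rates from {min(rates):.3f} "
          f"to {max(rates):.3f}; {above} of 100 areas above 1 in 3")
```

Small areas scatter widely around 1 in 3, and large areas stay close to it. This is why a few small "hot spots" appear even when nothing is going on.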

That may be, but you can't let that stop you. You've got to grab those areas with higher cancer rates and insist there's more to them than chance. Draw a bull's eye around the cluster you want and take it to the bank.

Data dredging: I know there's an association in here somewhere

Sometimes, clusters aren't obvious. You've got a river of data and nothing's making a ripple. What do you do? Well, what do you do when you're looking for something lost in a river? Simple: you get some dredging equipment and comb the river. By turning up everything, you hope you'll turn up what you want. Or, if "you can't always get what you want," you just might find you get what you need. (Just kidding!) So what do you do when you're looking for something lost in a river of data? Data dredge!

Conceptually, data dredging is like the Texas Sharpshooter technique except clusters are harder to find. You have to analyze your data forwards and backwards, from the top, bottom, and sides, from the inside out, and from the outside in. You slice it, dice it, and pick it apart any way you can to find an artifact (I mean risk) worth all this trouble.

All you need is a computer and a good statistical analysis program that can go through your data and look at every possible association. The computer does all the work; you get all the credit. All you have to do is pick the association you think makes your case and write it up. Let's look at a recent example.
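Here is a minimal simulation of a data dredge. It assumes a simple case-control design with 200 cases and 200 controls. It tests 100 made-up exposures that are, by construction, equally common (30 percent) among cases and controls, so none of them has any real link to disease. Each exposure is checked with a pooled two-proportion z-test. The function names and all parameters are hypothetical.

```python
import math
import random

def two_proportion_p_value(exposed_cases, n_cases, exposed_controls, n_controls):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p1 = exposed_cases / n_cases
    p2 = exposed_controls / n_controls
    pooled = (exposed_cases + exposed_controls) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))

def dredge(n_exposures=100, n_cases=200, n_controls=200, alpha=0.05, seed=7):
    """Test many exposures that have NO real effect; return the 'hits'."""
    rng = random.Random(seed)
    hits = []
    for i in range(n_exposures):
        exp_cases = sum(rng.random() < 0.3 for _ in range(n_cases))
        exp_controls = sum(rng.random() < 0.3 for _ in range(n_controls))
        if two_proportion_p_value(exp_cases, n_cases, exp_controls, n_controls) < alpha:
            hits.append(i)
    return hits

hits = dredge()
print(f"{len(hits)} of 100 pure-noise exposures were 'significant' at p < 0.05")
```

At a 0.05 cutoff, about 1 test in 20 clears the bar on noise alone. With 100 independent tests, the chance of at least one such "finding" is about 99 percent (1 - 0.95^100).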

A case-control study looked at risk factors for childhood leukemia, including environmental chemicals, electric and magnetic fields, past medical history, parental smoking and drug use, even dietary intake of certain food items. For "dietary intake of certain food items" alone, the study analyzed nine different foods, including breakfast meats, hot dogs, luncheon meats, hamburgers, charbroiled meats, oranges and orange juice, grapefruits and grapefruit juice, apple juice and cola drinks.

Obviously, right from the start, the researchers had no idea what they were looking for; they were simply on a fishing expedition. Amazingly enough, they caught a big one!

In examining the myriad of possible statistical associations, the study identified associations between a number of exposures and leukemia. These included breastfeeding, use of indoor pesticides, children's use of hair dryers, children's use of black-and-white television sets, incense use, father's occupation, mother's exposure to spray paints during pregnancy, other chemical exposures and home electrical wiring configurations. The association that received the most attention, however, was the one between hot dogs (eating more than 12 dogs a month, that is) and leukemia.

For this association, epidemiologists found a relative risk of 9.5, indicating, in their study, that children consuming more than 12 hot dogs per month were 9.5 times more likely to develop leukemia than children who consumed no hot dogs. The authors determined this association was biologically plausible because processed meats contain nitrites, which may be precursors of other chemical compounds that have been associated with causing leukemia in rats and mice.
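A case-control study starts from cases and controls rather than following exposed and unexposed children forward. The figure it directly yields is therefore an odds ratio, which approximates the relative risk when the disease is rare. Below is a minimal sketch of that calculation with a Woolf (log-scale) confidence interval. The 2x2 counts are hypothetical and are not the study's data, and odds_ratio_with_ci is an illustrative name.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a = exposed cases,    b = unexposed cases
    c = exposed controls, d = unexposed controls
    z = 1.96 gives an approximate 95% interval. Assumes no cell is zero.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# HYPOTHETICAL counts, chosen only to illustrate the arithmetic;
# they are NOT the figures from the hot dog study.
print(odds_ratio_with_ci(a=10, b=90, c=2, d=98))
```

Only a dozen children in this made-up table are exposed, so the 95 percent interval spans more than an order of magnitude. That is typical when a large ratio rests on a handful of subjects.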

The researchers concluded their study "suggests" that diet is important to leukemia risk and that reduced consumption of hot dogs could reduce leukemia risk. A great result from a fishing expedition.

My only criticism is that the authors included in their write-up enough information for the careful reader to discern that the study failed to come up with associations between other types of processed meats (including ham, bacon, sausage and luncheon meats) and leukemia. Given that these foods also contain nitrites and, therefore, should also be associated with leukemia risk, the authors should have omitted this information from their report. It only detracts from their conclusions about hot dogs.

Chapter 6

The Mixmaster Technique 

What if you don't have the time or the money or the inclination to do your own epidemiologic study? What if others have already published epidemiologic studies on your risk but didn't find anything convincing? Or some found something while others didn't? Well, just be very creative.

You could take the existing studies, assume that they are similar enough to be combined and, voila!, you have an entirely new study. This technique is called meta-analysis. The best way to demonstrate the power of meta-analysis is to show you the greatest masterpiece, the Mona Lisa, of all meta-analyses: the Environmental Protection Agency's risk assessment on environmental tobacco smoke (ETS). There simply is no better example of this technique at work.

At the time the ETS risk assessment was conducted, there were 30 published (and who knows how many unpublished) epidemiologic studies on ETS conducted in a number of countries. Of the 30 published studies, eight reported statistically significant associations between exposure to ETS and lung cancer; 22 other studies reported either no association or no statistically significant association. Of the 11 studies that examined U.S. populations, only one reported a statistically significant association.

Realizing the difficulty of credibly associating ETS with lung cancer based on conflicting studies, the ever-resourceful EPA chose meta-analysis. Using this technique, EPA combined the 11 U.S. ETS epidemiologic studies and came up with a relative risk of 1.19 that was statistically significant at a 90 percent confidence level. (Note: Even though their results weren't statistically significant at the 95 percent level, they were resourceful enough to claim statistical significance at a lower level. Another clutch decision!) With this "statistically significant" relative risk, EPA went on to estimate that 3,000 lung cancer deaths a year can be attributed to ETS.
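The sketch below shows how pooling can turn individually non-significant studies into a "significant" result, and how much the choice of confidence level matters. It uses one standard method, fixed-effect inverse-variance pooling of log relative risks. This is not necessarily the procedure EPA used. The five studies are invented for illustration, and pool_fixed_effect is a hypothetical name.

```python
import math
from statistics import NormalDist

def pool_fixed_effect(studies, confidence):
    """Fixed-effect (inverse-variance) pooling of relative risks.

    studies: list of (rr, lower95, upper95) tuples, as a paper would report them.
    confidence: e.g. 0.90 or 0.95 for the pooled interval.
    Returns (pooled_rr, lower, upper).
    """
    z95 = NormalDist().inv_cdf(0.975)
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lo, hi in studies:
        log_rr = math.log(rr)
        # Recover the standard error of log(RR) from the reported 95% CI.
        se = (math.log(hi) - math.log(lo)) / (2 * z95)
        weight = 1 / se ** 2  # more precise studies count for more
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))

# Five HYPOTHETICAL studies; every one of their 95% CIs includes 1.
studies = [(1.30, 0.80, 2.10), (1.10, 0.70, 1.70), (1.20, 0.85, 1.70),
           (0.95, 0.60, 1.50), (1.25, 0.90, 1.75)]
for level in (0.95, 0.90):
    rr, lo, hi = pool_fixed_effect(studies, level)
    print(f"{level:.0%} CI: pooled RR {rr:.2f} ({lo:.2f} to {hi:.2f})")
```

With these made-up inputs the pooled relative risk is about 1.17. Its 90 percent interval just clears 1, while its 95 percent interval does not.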

What's so amazing about all this? Well, EPA did such a good job picking a target for its risk assessment and meta-analysis that the intrinsic characteristics of the target itself were strong enough to overcome the scientific deficiency of the meta-analysis.

ETS was a classic target. The risk was unprovable (any risk would be too small to find, a fact borne out when 10 out of 11 U.S. studies turned up nothing). ETS is a common exposure. The cause-and-effect relationship in question is intuitive. The tobacco industry is easy to pick on. ETS is an involuntary risk. And, for non-smokers, there's no personal sacrifice involved in forcing others to quit. The technical deficiencies, while numerous and significant, were no match for these intrinsic characteristics.

Now remember, meta-analysis depends on the assumption that the studies are similar enough to be combined. Yet mixing the different ETS studies is like mixing apples and oranges. You see, none of the ETS studies contain real exposure information. All the "exposure" data was derived from elderly women being prodded to remember their husbands' smoking habits of decades earlier (like the diesel exhaust studies). Or it came from the memories of other relatives.

None of this exposure data was ever validated or verified for accuracy. The clincher, however, is that each ETS study asked different types of study populations different questions about different time frames. To combine these studies together is truly the epidemiologic personification of the data processing acronym GIGO (garbage in, garbage out).

But, in the end, you've got to give credit where credit is due. EPA picked the right target and hit the bull's eye. The rest is risk assessment history. Maybe this is really a lesson in picking a good target. 

Chapter 7

Instant Risk 

Haven't got time to do your own soup-to-nuts risk assessment? Then "instant" risk is for you. No fuss, no muss and guaranteed results. The classic example of this is risk assessment for ionizing radiation.

Everyone is exposed to ionizing radiation every day. It's unavoidable and natural. The two main sources of ionizing radiation are the earth and space. Soils and rock contain naturally occurring radioactive elements that either give off radiation or emit radioactive particles. Space is continually bombarding us with cosmic rays. You would not consider either of these to be dangerous because they occur naturally. Even if you lived the idyllic lifestyle in the Garden of Eden, you would still be exposed to ionizing radiation from these sources.

Some human populations have had very, very, very high exposures to ionizing radiation. Survivors of atomic bomb explosions. Uranium miners. Women who, in the 1920s, painted watch dials and instrument panels with radium paint and licked their brushes to get better points. Studies have shown a generally accepted association between these very, very, very high radiation exposures and cancer.

Notwithstanding what we know about high levels of ionizing radiation, there is no generally accepted association between cancer and lower levels of ionizing radiation from manmade sources (like medical X-rays) or environmental levels of ionizing radiation from naturally occurring sources (like radon in the home).

Now ordinarily, you might conduct a case-control epidemiologic study to try to identify such an association, and many folks have. But you don't need to. Just base your study on those of the atomic bomb survivors, underground uranium miners and radium watch dial painters, and you've got instant risk. How? Why?

Years ago, some genius came up with the theory that if something (say radiation) can be harmful at very high exposure levels, in the absence of knowledge to the contrary, it should be assumed it is harmful at any exposure level. This theory is known in risk assessment circles as the linear nonthreshold model.

Using a graph similar to that above, all you need to do is measure or estimate the exposures to your population, find that exposure level on the graph and follow it over to a risk level. What could be easier? Just make believe that getting a medical X-ray is like surviving an atomic bomb explosion. Or that playing ping-pong in your basement rec room is like working in an underground uranium mine! Sounds silly, you say? Don't worry; this is one of the most commonly accepted tenets in the public health community.
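The "follow the line over to a risk level" step amounts to a single multiplication. The minimal sketch below draws a straight line from zero dose and zero excess risk through one high-dose observation. A threshold version is included for comparison. The anchor point (5 percent excess risk at 1,000 dose units), the threshold of 100 units, and both function names are hypothetical.

```python
def lnt_excess_risk(dose, reference_dose, reference_excess_risk):
    """Linear no-threshold model: a straight line through the origin and
    one high-dose observation, used to read off risk at any dose."""
    slope = reference_excess_risk / reference_dose  # excess risk per unit of dose
    return slope * dose

def threshold_excess_risk(dose, threshold, reference_dose, reference_excess_risk):
    """Alternative for comparison: no excess risk at or below an assumed
    threshold, then a straight line up to the same high-dose observation."""
    if dose <= threshold:
        return 0.0
    slope = reference_excess_risk / (reference_dose - threshold)
    return slope * (dose - threshold)

# HYPOTHETICAL anchor point: 5% excess risk at 1,000 dose units.
for dose in (1000, 100, 1, 0.1):
    print(dose,
          lnt_excess_risk(dose, 1000, 0.05),
          threshold_excess_risk(dose, 100, 1000, 0.05))
```

Both curves agree at the high-dose anchor. Only the linear model assigns a nonzero risk to every dose below it, no matter how small.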

You'll need to be prepared for real scientists who might say the linear nonthreshold model flies in the face of everything we know about risks from low levels of exposures. For example, studies of the atomic bomb survivors report an increased incidence of cancer only at the very highest exposures. Among those survivors with less than the highest levels of exposures, a decreased incidence of cancer (as compared to the general population) was observed.

Epidemiologic studies of workers show what is called the "healthy worker effect." That means despite being exposed to comparatively more "risks" on the job, workers are typically healthier than nonworkers. Finally, vaccines (like those for polio, measles, mumps, diphtheria and the like) intentionally expose humans to low levels of toxins but keep individuals healthy, not sick.

But, as I said earlier, the linear nonthreshold model is a public health mantra. It's not open to criticism.

A final word about the "instant risk" technique. It can save you lots of headaches. Consider the following story.

Not long ago, the National Cancer Institute conducted a very large and well-designed study to look at risk factors for lung cancer, including radon in the home. NCI's study failed to find an association between radon in the home and lung cancer. But at the same time, the Environmental Protection Agency was spending $20 million a year on its own radon program. When NCI published its results, EPA got upset.

Study results threatening the existence of the $20 million radon program won't win friends or influence people in the program. They immediately screamed, "Fix this or else!" To atone for its sin, NCI repudiated its own epidemiologic study and published a new study applying the linear nonthreshold model to the underground uranium miner data. That produced an instantly acceptable risk assessment. And NCI and the EPA radon program were on speaking terms again.

The moral of the story? If you go nonlinear, you will be straightened out by your friends or else!

Chapter 8

The Big Risk Number 

You've calculated your relative risk and you've made it statistically significant. Is that enough? Can you just write up your results, get them published and start filling out those federal grant applications?

You can, but you haven't yet maximized your chances for success. There's one last thing to do and it's easy as pie. You simply take the innocuous relative risk number and "morph" it into a public health crisis.

You need to calculate a risk estimate for some population, preferably a large population or, better yet, all 250 million Americans. If you can figure the number of cancer cases or premature deaths associated with your risk, you're sure to get instant national attention. But how do you do this? Simple. Tell your statisticians you want to calculate an attributable risk. They know how.

Attributable risk is intended to indicate what percentage of deaths in a population are caused by a risk. For example, saying that "16 percent of all deaths are due to being overweight" is an attributable risk. You've attributed 16 percent of all deaths to obesity. All you need to do then is figure out how many deaths there are annually (about 2.2 million in the U.S., according to 1991 statistics), then multiply the number of annual deaths by the attributable risk (16 percent). Voila! A public health crisis is born!
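The arithmetic above, plus one common way the attributable fraction itself is computed, fits in a few lines. The sketch uses Levin's formula for the population attributable fraction, which treats the relative risk as if it were causal. The prevalence and relative risk in the second example are hypothetical, and the function names are illustrative.

```python
def attributable_deaths(annual_deaths, attributable_fraction):
    """Deaths 'attributed' to a risk: the relevant death count times the fraction."""
    return annual_deaths * attributable_fraction

def levin_paf(prevalence, relative_risk):
    """Population attributable fraction via Levin's formula:
    PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the share of the
    population exposed. It treats the relative risk as if it were causal."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)

# The text's example: 16 percent of about 2.2 million annual U.S. deaths.
print(f"{attributable_deaths(2_200_000, 0.16):,.0f}")  # prints 352,000

# HYPOTHETICAL inputs: 20% of people exposed, relative risk 1.5.
print(f"{levin_paf(0.20, 1.5):.3f}")  # prints 0.091
```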




Annual Deaths Attributed to Risk

350,000 from all causes (Source: derived from 1995 Harvard University study)
390,000 from all causes (Source: U.S. Surgeon General)
40,000 from lung cancer (Source: U.S. EPA)
Chlorinated tap water: 10,000 from bladder and rectal cancer (Source: Morris et al. 1992)
Environmental tobacco smoke: 3,000 from lung cancer (Source: U.S. EPA)

Now your statisticians (if they are competent and conscientious) should ask if you really want to calculate an attributable risk. This query will be based on the following warning that appears on the package of every statistical analysis program:


You, of course, should ignore this warning.