What Makes A Good Study?

When reading about research conducted on the efficacy of St. John’s Wort, it is important to be able to distinguish between meaningful clinical studies and those that are flawed. A well-designed clinical trial – often referred to as a “valid” or “sound” trial – uses strict methodology to ensure a high degree of confidence in its conclusions. A poorly designed study, however, may generate results that are unjustified or downright false.

Below are some questions to ask when evaluating the quality of a trial:

1) Was the trial controlled? It’s important that the treatment group of patients (for example, the subjects taking St. John’s Wort) was compared to a control group. The control acts as a standard against which the experimental treatment can be measured. A control group makes it possible to account for the placebo effect, which is improvement that occurs simply because the patients think they are being treated and expect to feel better. (The placebo effect for certain conditions, such as pain or depression, can be quite large.) The control can be either a placebo (e.g. a sugar pill) or an accepted treatment (such as an FDA-approved medication).

a. In a placebo-controlled trial, the control group takes a placebo (e.g. a sugar pill) that is known to have no biological effect. By comparing the results of the two groups (patients taking the experimental treatment vs. patients taking placebo), it’s possible to separate the placebo effect from the effect of the treatment. The experimental treatment should only be considered effective if the patients in the treatment group improve significantly more than the patients in the placebo group. (For example, let’s say an herb is being studied to treat pain. If the herb is given to 100 patients and 58 of them report improvement, it’s tempting to conclude that the herb is effective. But if the results are similar for those taking the placebo, then it’s likely that the herb does not, in fact, relieve pain effectively. This suggests that the improvement was due to the placebo effect or to natural recovery, not the herb.)

b. Instead of comparing the experimental treatment to a placebo, a study may compare it to another treatment that’s known to be effective. For example, studies have compared St. John’s Wort extract to fluoxetine (Prozac), a prescription antidepressant drug. Fluoxetine was extensively studied during the FDA approval process and is known to be effective for depression. When the patients taking St. John’s Wort improved as much as or more than those taking fluoxetine, it was concluded that St. John’s Wort is also effective.

c. Comparing the experimental treatment to a control also allows accurate measurement of side effects. (Even patients taking a placebo will frequently report “side effects”, so without a control group it’s impossible to tell whether the experimental treatment is actually responsible for reported side effects.)

d. A study that does not use a control group is sometimes referred to as a “preliminary” trial. While these studies may be interesting, the results should be viewed with skepticism.

2) Was the trial randomized? It’s important that subjects were randomly assigned to either the treatment or the control group. Otherwise, it’s impossible to distinguish which effects were due to the treatment and which resulted from pre-existing differences between the patients in each group.

3) Was the trial double-blind? “Double-blind” means that neither the doctors nor the patients knew who was receiving the treatment and who was receiving the control/placebo. This prevents bias from contaminating the data. In a “single-blind” study, the patients do not know which they are receiving, but the doctors know which patients are taking the experimental treatment and which are getting the control/placebo. This knowledge may allow the doctors to bias the results, even unintentionally.

4) Did the trial include a large sample size? The “sample size” refers to the number of patients taking part in the study. Large studies with hundreds of patients are generally more representative of the general population and produce statistically stronger results. Multi-center trials are even better, because the study participants were treated by many different doctors at multiple clinics. Trials using smaller sample sizes (e.g. 30 subjects) can still produce valid results, however, provided that they are controlled, randomized, and double-blind, and use appropriate statistical methods to analyze the data.

5) Note that some clinicians now include a third group in well-designed trials: a “natural history” group, in which participants receive no treatment and the condition is allowed to run its natural course, for comparison with the other two groups.
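
To make the placebo comparison in point 1a concrete, here is a minimal sketch of how "improve significantly more" might be checked. It assumes a standard two-proportion z-test, which is only one of several methods a real trial could use. The function name and the placebo counts are hypothetical and chosen purely for illustration; only the figure of 58 improvers out of 100 comes from the example above.

```python
import math


def two_proportion_z_test(improved_a, n_a, improved_b, n_b):
    """Two-sided z-test for a difference between two improvement rates.

    Returns (difference in rates, z statistic, p-value).
    """
    rate_a = improved_a / n_a
    rate_b = improved_b / n_b
    # Pooled rate: the improvement rate if both groups are treated as one.
    pooled = (improved_a + improved_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("rates are all 0% or all 100%; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, z, p_value


# Herb group: 58 of 100 improved (the figure used in the text).
# Placebo counts are hypothetical, chosen only for illustration.
for placebo_improved in (55, 30):
    diff, z, p = two_proportion_z_test(58, 100, placebo_improved, 100)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"placebo {placebo_improved}/100: difference {diff:+.2f}, "
          f"p = {p:.4f} ({verdict})")
```

Under these assumptions, a placebo group in which 55 of 100 patients improve is not distinguishable from the herb group at the conventional 0.05 level, while a placebo group in which only 30 of 100 improve is.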

In sum, well-designed studies will almost always be controlled, randomized, and double-blind. The findings of studies with large sample sizes (hundreds or even thousands of patients) should be considered especially strong.
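
As a rough illustration of why larger sample sizes give stronger findings, the sketch below estimates how precisely an observed improvement rate is pinned down at different study sizes. It assumes the usual normal approximation for a 95% margin of error, and the function name and sample sizes are hypothetical; the 58% rate is borrowed from the pain example in point 1a.

```python
import math


def margin_of_error(rate, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion.

    Uses the normal approximation, which is rough for very small samples.
    """
    return z * math.sqrt(rate * (1 - rate) / n)


# The same observed improvement rate (58%) at three hypothetical study sizes.
for n in (30, 300, 3000):
    moe = margin_of_error(0.58, n)
    print(f"n = {n:>4}: 58% improved, "
          f"give or take about {moe * 100:.1f} percentage points")
```

With 30 patients the uncertainty is roughly 18 percentage points either way; with 3,000 patients it shrinks to under 2. A tenfold increase in sample size narrows the margin by only about a factor of three, since precision grows with the square root of the number of patients.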