Sources of Bias in Clinical Trials

Sorry for the delay in getting this post up; I’ve been writing it on and off for a couple of weeks now. Having successfully moved house, started a new job, finished some MSc exams and finalised the paperwork for a second job, I’ve finally had the time to sit down and finish it!

As well as working clinically, I’m excited to say that I will now be actively involved in research, having been offered a Research Physiotherapist post at the University of Warwick starting next month. I hope that this will both inspire a new series of posts for Applying Criticality, and also give me a different perspective to my writing – inside out as well as outside in!

This post is going to introduce some basic sources of bias to look out for within RCTs. To recapitulate, RCTs are the gold standard for determining the efficacy of an intervention; however, not all RCTs are of the same quality and, in turn, they won’t all provide the same strength of evidence i.e. some trials are better than others. The aim of this post is to act as a reference piece for when you are reading papers or reviews.

I’d like to thank the University of York and Sheffield Hallam University for some of the material included in this post.

Some forms of bias are well known, others less so. Below is a list of the potential forms of bias that I will be mentioning throughout the post – this list is not exhaustive and I won’t cover everything associated with each form (primarily because I don’t know everything associated with each, and secondly because I’d be writing a book!). For more information on critical appraisal, and subsequently bias, have a look at this from Cochrane.

  • Selection Bias
  • Subversion Bias
  • Technical Bias
  • Attrition Bias
  • Consent Bias
  • Ascertainment Bias
  • Dilution Bias
  • Recruitment Bias
  • Resentful Demoralisation
  • Delay Bias
  • Chance Bias
  • Hawthorne Effect
  • Analytical Bias

From the above list, forms of bias are often categorised into those which can occur pre-randomisation and those that can occur post-randomisation. I personally find this to be quite a useless categorisation as, generally, only Selection Bias can occur before the patients are randomised!

Selection Bias limits the internal validity of a trial (I’ve briefly written about internal validity here: how well can the results of the trial be trusted?).

It describes a systematic difference in characteristics between those selected for inclusion in a study and those who aren’t. It occurs when the study sample does not represent the target population for whom the intervention is being researched and may be due to participants being selected for inclusion on the basis of a variable that is associated with outcome of the intervention.

This can be minimised (or even abolished) by implementing methods such as randomisation, matched pairs (see here), clear inclusion/exclusion criteria (with justification) and a strong recruitment strategy.
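For the randomisation step itself, here’s a minimal sketch in Python of what simple randomisation might look like (the function name, group labels and seed are entirely my own, purely for illustration):

```python
import random

def simple_randomise(participant_ids, seed=42):
    """Flip a (seeded) coin for each participant: intervention or control.

    Simple randomisation gives every participant a 50/50 chance of each arm,
    independently of everyone else, so small trials can end up unbalanced.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(["intervention", "control"]) for pid in participant_ids}

allocation = simple_randomise(range(1, 21))
print(allocation)
print("Intervention n =", sum(arm == "intervention" for arm in allocation.values()))
```

Notice that with only 20 participants the two arms can easily end up unequal in size – which links neatly to the Chance Bias we’ll come to later.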

Ever read a paper and thought that the equivalence at baseline was too perfect, especially if the randomisation was simple? Allocation may have been subverted.

Subversion Bias always conjures up fantastic mental pictures for me. It occurs when the researcher manipulates the recruitment of participants so that the groups aren’t equivalent at baseline. An old qualitative paper by Schulz demonstrated this to be quite widespread in healthcare research, with some accounts of researchers x-raying envelopes in order to obtain the randomisation code, or even breaking open locked cabinets, to subvert allocation!

Quantitative evidence also exists, from this paper: 250 RCTs were reviewed and classified into ‘Adequate Concealment’ (difficult to subvert), ‘Unclear’ or ‘Inadequate Concealment’ (subversion was able to take place). The findings demonstrated that badly concealed allocation produced larger effect sizes.

In theory, larger trials should give more precise estimates of effect because they are better powered; smaller, less well powered trials shouldn’t systematically show larger effects, right? Yet it is often the other way round, and this paper shows that when it is, poor concealment of allocation in small trials is often to blame; when trials were grouped by their method of allocation, adequate concealment reduced effect sizes by 51%.

Secure allocation can prevent subversion occurring, and it need not be expensive; it is, however, essential if researchers are to be prevented from manipulating recruitment and skewing the outcomes of trials. Examples include telephone allocation from a dedicated unit, or using an independent researcher/person to undertake the allocation.

Technical Bias in Physiotherapy is quite rare – I can’t think of an example, but I would love to hear of one if any of you reading this know of any! It occurs when the allocation system fails, often, although not exclusively, due to computer error.

A commonly cited example of this occurred during the COMET I trial, which was investigating the effects of different types of epidural anaesthesia for women during labour. The trial was using ‘Minimisation’ (a method of allocation used to overcome the issues surrounding Blocked Randomisation) through computer software. The groups were minimised on the basis of the mother’s age and ethnicity; however, the programme had a fault.

1000 women were recruited, but only 3% were allocated to one intervention arm, with the remainder split between the other two. Subsequently, the trial had to be restarted (the birth of COMET II) with 1000 new women recruited and randomised. If you are conducting research and using a computer programme to allocate your participants, check the balance of your groups as you go along!
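To make the minimisation idea concrete, here is a rough Python sketch of my own (not the COMET software; the arm labels, factor levels and data are hypothetical): each new participant goes to whichever arm currently has the fewest people sharing their characteristics, and a running balance is printed as we go – the kind of check that would have caught the COMET I fault early.

```python
import random
from collections import defaultdict

def minimise(participants, arms=("A", "B", "C"), seed=1):
    """Allocate each participant to the arm that minimises imbalance on their factors.

    For every new participant we count how many people with the same factor
    levels are already in each arm; the arm with the lowest total receives the
    participant (ties broken at random). Illustration only.
    """
    rng = random.Random(seed)
    counts = {arm: defaultdict(int) for arm in arms}  # arm -> factor level -> count
    sizes = {arm: 0 for arm in arms}
    allocation = []
    for factors in participants:  # e.g. ("<30", "ethnicity 2")
        scores = {arm: sum(counts[arm][f] for f in factors) for arm in arms}
        best = min(scores.values())
        arm = rng.choice([a for a, s in scores.items() if s == best])
        for f in factors:
            counts[arm][f] += 1
        sizes[arm] += 1
        allocation.append(arm)
        if len(allocation) % 250 == 0:  # the running balance check
            print(f"After {len(allocation)} participants: {sizes}")
    return allocation

# Hypothetical data: 1,000 women, each with an age band and an ethnicity code.
data_rng = random.Random(0)
women = [(data_rng.choice(["<30", "30+"]),
          data_rng.choice(["ethnicity 1", "ethnicity 2", "ethnicity 3"]))
         for _ in range(1000)]
minimise(women)
```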

One of the most common types of bias seen in published trials is Attrition Bias – a systematic difference between the participants who drop out of a trial and those who remain (I assume you have all read the antibiotics and CLBP trial; their failure to undertake an Intention-to-Treat analysis meant that their results were affected by Attrition Bias).

If a treatment has side effects, e.g. gastrointestinal ones, drop-outs may be higher amongst the less well participants, which can make a treatment appear to be effective when it is not.

As previously alluded to, Attrition Bias can be minimised by conducting an Intention-to-Treat analysis: as many patients as possible are kept in the study and analysed in the groups to which they were randomised, even if they aren’t receiving any intervention.

Alternatively, the trial results can be subjected to a sensitivity analysis, whereby participants who drop out of one arm are assumed to have the worst possible outcome, whilst those who drop out of the other arm are assumed to have the best possible outcome. If the findings are the same, then you can be reasonably confident that the results aren’t subject to Attrition Bias.
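As a toy illustration of that best-case/worst-case logic (the numbers and the binary ‘recovered’ outcome are entirely made up, not from any real trial):

```python
def rate_with_dropouts(successes, completers, dropouts, assume_success):
    """Success rate for an arm if every drop-out is assumed to have the given outcome."""
    return (successes + (dropouts if assume_success else 0)) / (completers + dropouts)

# Hypothetical trial, 100 randomised per arm:
# intervention: 80 completers, 60 recovered, 20 drop-outs
# control:      95 completers, 50 recovered,  5 drop-outs
worst_intervention = rate_with_dropouts(60, 80, 20, assume_success=False)
best_control = rate_with_dropouts(50, 95, 5, assume_success=True)
print(f"Intervention, drop-outs counted as failures: {worst_intervention:.0%}")
print(f"Control, drop-outs counted as successes:     {best_control:.0%}")
# 60% vs 55%: the intervention still looks better even in this pessimistic
# scenario, so Attrition Bias is unlikely to explain the result.
```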

Another less well known source of bias is Consent Bias, which most frequently occurs when consent to take part in a trial is gained after randomisation – something usually only seen in cluster trials. Whilst a Physiotherapy example evades me, a good example comes from Graham et al. 2002. In this trial, schools were randomised to a teaching package for emergency contraception; however, because consent was gained post-randomisation, more children participated in the intervention arm than in the control arm.

This can induce a volunteer effect whereby both the internal and external validity of the trial become limited.

Ascertainment Bias occurs when the person measuring or reporting the outcome knows which treatment the participant received, and that knowledge (consciously or not) influences the assessment. I’ve been looking forward to writing this paragraph as I’m able to cite homeopathy as my example. Those of you who know me will know that I have little time for the water salesmen/quacks: not only do I believe their approach is implausible, unethical and complete voodoo – they also manipulate evidence to market their claims. Stepping off my soap box, Ascertainment Bias has occurred in the following situation.

A homeopathic dilution of histamine was shown in an RCT of cell cultures to have significant effects upon cell motility (motion); the measurement of cell motility, however, was not blinded. When the study was repeated with the assessors blind to which petri dish had been treated with distilled water and which had been treated with distilled water – sorry, I mean the homeopathic dilution of histamine – the significant effect was nowhere to be seen.

Having just spoken about homeopathy, it’s ironic that the next source of bias I’m going to mention is Dilution Bias. This occurs when participants in either the intervention or control group end up receiving the opposite treatment; it is present in any trial where there has been non-adherence to the allocated intervention.

A hypothetical example: in a trial investigating the effects of glucosamine and chondroitin for the management of knee OA, 6% of the control group are receiving the intervention because they’ve bought supplements themselves over the counter, while 46% of the intervention group have stopped taking their treatment. Thus, any apparent treatment effect will be diluted. How can this be controlled for? Dilution Bias will always be a problem for trials of active treatments that control participants can seek out for themselves, e.g. low level aerobic exercise; it can be partially prevented by restricting the control group’s access to the experimental treatment.
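Here’s a quick simulation of that scenario (hypothetical numbers throughout) showing how much of the effect disappears when the analysis is, quite rightly, done by groups as randomised:

```python
import random

def apparent_effect(n_per_arm=10_000, true_effect=10.0,
                    control_crossover=0.06, intervention_dropout=0.46, seed=0):
    """Difference in mean improvement between arms, analysed as randomised.

    Anyone actually taking the supplement improves by `true_effect` points on
    average; cross-over and non-adherence mean both arms are a mix of treated
    and untreated participants, so the arm-level difference shrinks.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0, 5) + (true_effect if rng.random() < control_crossover else 0)
               for _ in range(n_per_arm)]
    intervention = [rng.gauss(0, 5) + (true_effect if rng.random() >= intervention_dropout else 0)
                    for _ in range(n_per_arm)]
    return sum(intervention) / n_per_arm - sum(control) / n_per_arm

print(f"True effect: 10.0 points; apparent (diluted) effect: {apparent_effect():.1f} points")
```

With 6% cross-over and 46% non-adherence, only around half of the true effect survives in the between-arm comparison.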

I put a little teaser out on Twitter today to see if anyone could define the next source of bias, Resentful Demoralisation – a rarely used term, but again one that fills my mind with fantastic images. This source of bias even baffled the incredibly brainy @neuromanter (99% of his comments are intelligent, insightful and analytical; unfortunately, it’s the 1% that make it into blog posts!). @PDPhysios2 was closest with her attempt; I know @neuromanter is waiting for me to write about it on here!

Resentful Demoralisation can occur when patients are randomised to the trial arm that they don’t want; in response, they may report their outcomes inaccurately – in revenge, almost! The effect can be removed by utilising a patient preference design prior to recruitment/randomisation, so that only those patients who are indifferent to which treatment they receive are randomised. Next time you write a critical summary of an article, throw in the RD – the person marking your work will clearly see that you know your stuff!

The Hawthorne Effect is a commonly known source of bias; it occurs when participants improve simply because they are taking part in a study, rather than because of the intervention itself. Often, if an intervention requires more contact time than the control, an effect is more likely to be seen – could this be due to the therapeutic relationship? Placebo?

This effect is usually countered by the placebo effect in the control group, and it should be considered when designing the control intervention.

As you can probably tell, we are moving on to the more obscure, less well known sources of bias! Delay Bias is one that I hadn’t heard of before starting to research this post. It occurs when there is a delay between randomisation and participants receiving the intervention, which is said, in turn, to dilute any treatment effects the intervention may have (interesting when we consider waiting lists for some interventions in clinical practice…). Delay Bias can be accounted for by beginning the analysis for both the intervention and control arms of the trial from the time when treatment is received.

Chance Bias – yes, we are going that obscure! As the name suggests, the groups can be unequal at baseline for certain characteristics simply due to chance. This can be minimised by using a post-hoc ANCOVA analysis or by using stratification – it is probably better to use ANCOVA, as stratification could potentially introduce Subversion or Technical Bias; remember those?
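A minimal sketch of the ANCOVA approach, assuming pandas and statsmodels are available (all variable names and data are hypothetical): the follow-up score is modelled on treatment group while adjusting for the baseline value that happened, by chance, to be imbalanced.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
group = np.repeat(["control", "intervention"], n // 2)
# By bad luck, the intervention group starts 3 points worse at baseline.
baseline = rng.normal(50, 10, n) - np.where(group == "intervention", 3, 0)
# Follow-up depends on baseline plus a genuine 5-point treatment effect.
followup = 0.7 * baseline + np.where(group == "intervention", 5, 0) + rng.normal(0, 5, n)

df = pd.DataFrame({"group": group, "baseline": baseline, "followup": followup})

# ANCOVA: follow-up score ~ treatment group, adjusting for baseline.
model = smf.ols("followup ~ C(group, Treatment('control')) + baseline", data=df).fit()
print(model.params)  # the group coefficient estimates the treatment effect,
                     # adjusted for the chance baseline imbalance
```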

The sources of bias I have written about so far generally occur, or at least have their influence, before the data are collected. However, once a trial is completed and the data collected, it is still possible for the wrong conclusions to be drawn by analysing the data incorrectly.

With regard to Analytical Bias, it is most important to ensure that an Intention-to-Treat analysis is undertaken, but also be wary of inappropriate sub-group analyses – CLBP anyone?

With regard to the analysis of data, it must be done by the groups as randomised; when a per-protocol or active-treatment analysis is conducted, bias can be introduced which may inflate the effects seen. Patients who do not receive the full treatment, i.e. drop-outs, are usually quite different from those who do; restricting the analysis to completers is therefore a large source of bias.

When the main analysis is completed, it is very tempting for researchers to investigate whether the effects differ by sub-group – particularly if the main analysis hasn’t shown the effects that the researchers wanted! CLBP anyone?

Examples include: is the treatment more or less effective for men? Is it better or worse amongst younger people with the condition?

These are of course legitimate questions to ask and, as clinicians, they are questions we want answered! However, from a scientific point of view, the more subgroups the researchers investigate, the higher the chance that a false or spurious effect is found. When the study design is submitted as a proposal, the sample size calculation, subsequent recruitment strategy and statistical tests are usually based on one comparison only.

A rather puerile example of this can be seen in a paper in the Lancet: a large RCT investigated the use of aspirin for heart attacks, and subgroup analysis revealed that aspirin was ineffective for people with the star signs of Gemini or Libra. This highlights quite nicely the dangers of sub-group analysis.

Some other examples demonstrating the dangers of sub-group analysis:

  • Tamoxifen was ineffective in women younger than 50 years old – erroneous.
  • 6 hours after a heart attack, streptokinase was ineffective – erroneous.
  • Aspirin is ineffective in preventing secondary heart attacks amongst women – erroneous.
  • Antihypertensive medication is an ineffective primary prevention for women – erroneous.
  • Beta-blockers are ineffective in older people – erroneous.

In summary, if sub-group analyses are to be used, then to avoid false, erroneous or spurious findings they should be based on a sound hypothesis and stated prior to commencing the trial. This is important because the more you manipulate the data looking for evidence of an effect, the more likely you are to eventually find one.
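To put a rough number on that, here’s a small simulation (entirely made-up data, with no true effect anywhere) of how often at least one ‘significant’ subgroup appears at p < 0.05 purely by chance when twelve subgroups – one per star sign, say – are tested:

```python
import random
from math import sqrt
from statistics import mean, stdev

def spurious_subgroup(n_subgroups=12, n_per_group=50, rng=None):
    """Simulate a null trial; return True if any subgroup comparison gives |z| > 1.96."""
    rng = rng or random.Random()
    for _ in range(n_subgroups):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        se = sqrt(stdev(a) ** 2 / n_per_group + stdev(b) ** 2 / n_per_group)
        if abs(mean(a) - mean(b)) / se > 1.96:
            return True
    return False

rng = random.Random(7)
n_trials = 1000
hits = sum(spurious_subgroup(rng=rng) for _ in range(n_trials))
print(f"At least one 'significant' subgroup in {hits / n_trials:.0%} of null trials")
# Expect something in the region of 1 - 0.95**12, i.e. roughly 46%.
```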

So, what do I hope you have gained from reading this post? If you are involved in conducting research, then care must be taken to prevent as many of the aforementioned biases as possible from affecting the quality, internal/external validity and strength of your trial. If you are a clinician, I hope you have gained awareness and understanding of some new, and some old, terms. RCTs are the gold standard for intervention research; however, this does not mean that all RCTs are reliable or perfect!

As always, please comment below or feel free to contact me on twitter!

A

What is Evidence Based Practice?

Welcome back to ‘Applying Criticality’. May I first wish you all a Happy Easter and apologise for the time between my first and second posts; I’ve been incredibly busy with work, my MSc and various other activities I find myself unable to say no to, as well as being ill recently! All of which conspired against me, but anyway, it’s written now!

Evidence based practice (EBP) has been a bit of a buzzword in healthcare since the 1990s, but what exactly is it? The textbook definition often quoted when discussing EBP is that from Sackett et al. (2000):

“Integration of the best research evidence with clinical expertise, the client’s preferences & values and clinical circumstances”.

[Image: Imms and Imms (2005) – a diagram demonstrating the interconnected nature of EBP; the patient sits at the centre of the image, where the arrows converge.]

What does this definition mean when applied clinically or, when you think about it, for an individual clinician?

It means that to be an effective clinician one must:

  • Appraise/determine the quality of research presented in the literature (and, in turn, discard poor quality research).
  • Interpret the value or significance of research findings for your own practice, i.e. for individuals or specific circumstances.

I’m hoping you are starting to see that reading the conclusion of a paper alone is not enough for individual clinicians to demonstrate EBP – sorry (well, I’m not!).

CLINICAL TIP – When I read a paper, I keep the definition of EBP in mind; whilst reading a high quality paper is fantastic, and excellent for broadening one’s outlook, if the population, intervention or follow-up does not reflect my practice, then the transferability of the paper is often limited.

Therein lies one of the current tensions between research and practice, but that’s for another day; it is just worth bearing in mind that whilst high methodological rigour is required to trust the findings, the paper may still not be applicable to your practice.

Introduction to Types of Evidence

The bulk of the posts on this blog will look at different study designs and their appraisal; however, now may be a useful time to introduce them and get you thinking about the “hierarchy of evidence”.

Research can be Primary or Secondary.

Primary can be divided into Quantitative or Qualitative.

Qualitative research asks ‘what’, ‘why’ or ‘how’ questions. It is considered to be the study design that reaches the parts that other methods cannot reach (Pope and Mays 2006).

Quantitative can be further divided into Experimental (RCT, Controlled Trial, Uncontrolled Trial) or Observational (Case Series, Cross-sectional, Cohort, Case-control).

Within Secondary research, the main type of design will be the systematic review (although literature reviews and ‘masterclasses’ may also be seen).

I know what you are thinking: “Great, he’s listed all these designs I vaguely remember hearing but we only need to know about RCTs, right? They’re the gold standard, right!?”

Unfortunately not. RCTs are indeed the gold standard for intervention studies (although consider, early on, that just because a paper utilised an RCT design does not mean it is automatically rigorous or high quality); however, whilst the ‘Hierarchy of Evidence’ provides a nice framework, it is limited as it doesn’t consider the research question (remember the last post?) or, at times, even ethics.

The research question determines what design should be used to answer it, and it could therefore be argued that there are multiple gold standards.

Some Examples

  • Sensitivity/specificity of shoulder special tests – Diagnostic study
  • Risk factors – Cohort study
  • Effectiveness of mobilisation – RCT
  • Patients’ thoughts/satisfaction – Qualitative study

As each of the study designs is considered in future posts, what they can tell us, and which research questions they should be used for, will become apparent.

For more reading on what evidence based practice is, do check out this article from Sackett, which many see as the foundation of EBP when it took off in the 1990s: http://www.bmj.com/content/312/7023/71

My next blog post will look at literature searching for the practicing clinician and I hope to have it up sometime over the next couple of days 🙂

Please leave a comment below.

A

References

Imms, W., & Imms, C. (2005). Evidence based practice: Observations from the medical sciences, implications for education.

Pope, C., & Mays, N. (2006). Qualitative Research in Health Care. 3rd Edition. London: BMJ Publishing Group.

Sackett, D. L., Straus, S. E., Richardson, W. S., Rosenberg, W., & Haynes, R. B. (2000). Evidence-based medicine: how to practice and teach EBM. Edinburgh: Churchill Livingstone.