Published Online: https://doi.org/10.1176/appi.focus.140202

Abstract

Background:

Atypical antipsychotic medications are widely prescribed for the adjunctive treatment of depression, yet their total risk–benefit profile is not well understood. We thus conducted a systematic review of the efficacy and safety profiles of atypical antipsychotic medications used for the adjunctive treatment of depression.

Methods and Findings:

We included randomized trials comparing adjunctive antipsychotic medication to placebo for treatment-resistant depression in adults. Our literature search (conducted in December 2011 and updated on December 14, 2012) identified 14 short-term trials of aripiprazole, olanzapine/fluoxetine combination (OFC), quetiapine, and risperidone. When possible, we supplemented published literature with data from manufacturers' clinical trial registries and US Food and Drug Administration New Drug Applications. Study duration ranged from 4 to 12 wk. All four drugs had statistically significant effects on remission, as follows: aripiprazole (odds ratio [OR], 2.01; 95% CI, 1.48–2.73), OFC (OR, 1.42; 95% CI, 1.01–2.0), quetiapine (OR, 1.79; 95% CI, 1.33–2.42), and risperidone (OR, 2.37; 95% CI, 1.31–4.30). The number needed to treat (NNT) was 19 for OFC and nine for each other drug. All drugs with the exception of OFC also had statistically significant effects on response rates, as follows: aripiprazole (OR, 2.07; 95% CI, 1.58–2.72; NNT, 7), OFC (OR, 1.30, 95% CI, 0.87–1.93), quetiapine (OR, 1.53, 95% CI, 1.17–2.0; NNT, 10), and risperidone (OR, 1.83, 95% CI, 1.16–2.88; NNT, 8). All four drugs showed statistically significant effects on clinician-rated depression severity measures (Hedges' g ranged from 0.26 to 0.48; mean difference of 2.69 points on the Montgomery–Asberg Depression Rating Scale across drugs). On measures of functioning and quality of life, these medications produced either no benefit or a very small benefit, except for risperidone, which had a small-to-moderate effect on quality of life (g = 0.49). Treatment was linked to several adverse events, including akathisia (aripiprazole), sedation (quetiapine, OFC, and aripiprazole), abnormal metabolic laboratory results (quetiapine and OFC), and weight gain (all four drugs, especially OFC). 
Shortcomings in study design and data reporting, as well as use of post hoc analyses, may have inflated the apparent benefits of treatment and reduced the apparent incidence of adverse events.

Conclusions:

Atypical antipsychotic medications for the adjunctive treatment of depression are efficacious in reducing observer-rated depressive symptoms, but clinicians should interpret these findings cautiously in light of (1) the small-to-moderate-sized benefits, (2) the lack of benefit with regard to quality of life or functional impairment, and (3) the abundant evidence of potential treatment-related harm.

Please see later in the article for the Editors' Summary.

(Reprinted with permission from PLoS Med 2013; 10(3): e1001403)

Introduction

Atypical antipsychotic medications are widely used in the treatment of major depressive disorder. In the United States in 2007 and 2008, there were an estimated 3.9 million treatment visits per year in which an antipsychotic medication was prescribed for depression, and nearly all of these (96%) involved prescription of an atypical antipsychotic medication [1]. Although aggregate statistics mask the specific indications for use (i.e., monotherapy versus adjunctive therapy), this represents a substantial increase in antipsychotic treatment of depression over time, as there were just over 2 million such visits annually during 1995 and 1996, of which 405,000 involved prescriptions for atypical antipsychotic medications. These data are also consistent with market reports from industry [2]. Three atypical antipsychotic medications have approval from the US Food and Drug Administration (FDA) as adjunctive therapies in depression for adults, while none are approved for monotherapy. These approvals (and subsequent marketing efforts), along with the volume of prescriptions, suggest that a large number of prescriptions for atypical antipsychotic medications written for the treatment of depression are being used for adjunctive therapy [3–5].

The efficacy of adjunctive atypical antipsychotic therapy in reducing depression symptom severity in major depressive disorder is summarized in two previous systematic reviews, but neither comprehensively summarized data on both efficacy and safety [6,7]. Both reviews analyzed efficacy only in terms of dichotomous response and remission outcomes derived from clinician-rated depression measures and did not assess changes in terms of symptom severity on the underlying continuous rating scales. Safety was only assessed by examining dropout rates due to adverse events; the authors of these meta-analyses [6,7] and of a relevant narrative review noted that a comprehensive summary of safety data is lacking [8]. A Cochrane review provided a more thorough assessment of both efficacy and safety outcomes but did not include data on important patient-centered efficacy outcomes such as patient-rated depression, functional impairment, or quality of life [9]. The Cochrane review assessed the frequency of several relevant adverse events, but some critical adverse events of interest, such as elevated cholesterol or triglyceride levels, were not included. Further, and most importantly, effect size estimates presented in these reviews may have been inflated because the authors did not summarize unpublished data, such as those from FDA New Drug Applications (NDAs) or manufacturers’ clinical trial registries [10–13]. Given the importance of functional status, quality of life, and drug-related side effects to the overall assessment of well-being and recovery from depressive mood episodes [14–16], we conducted this meta-analysis to provide a comprehensive estimate of the efficacy and safety profiles of atypical antipsychotic medications for the adjunctive treatment of major depressive disorder.

Methods

Ethical Review.

Because this was a study-level systematic review and meta-analysis of trials, and did not involve collection and analysis of any individual-level data, ethical approval was not sought for this study.

Search Strategy.

This systematic review was reported using PRISMA guidelines; the PRISMA checklist is provided as Text S1. To identify both published and unpublished studies for review, we searched Medline, PsycINFO, ClinicalTrials.gov, and the Cochrane Central Register of Controlled Trials using the terms depression AND (aripiprazole OR asenapine OR clozapine OR iloperidone OR lurasidone OR olanzapine OR paliperidone OR quetiapine OR risperidone OR ziprasidone). Medline search results were restricted to the following article types: clinical trial, controlled clinical trial, or randomized controlled trial. Our literature search was conducted in December 2011 and updated on December 14, 2012. In addition, we searched the American Psychiatric Association Annual Meeting New Research Abstracts for 2001–2010 using each of the generic drug names as a search term, then winnowed the results down to abstracts that appeared to possibly meet the inclusion criteria. We also examined all references in a previously published meta-analysis [6] as well as those contained in each published study obtained through our literature search.

To obtain additional unpublished data, we searched the drug manufacturers’ online clinical trial registries as well as FDA NDAs for the atypical antipsychotic medications that have received an indication for the adjunctive treatment of major depressive disorder (aripiprazole, olanzapine-fluoxetine combination [OFC], and quetiapine). For published studies, we supplemented published data with data available in NDAs or clinical trial registry reports whenever such data were available.

Study Selection.

Trials were included if they were acute-phase (i.e., not for relapse prevention or maintenance treatment [17,18]), placebo-controlled trials in which participants treated with antidepressant medications were randomly assigned to additionally receive an atypical antipsychotic medication or placebo. In order to meet our definition of treatment-resistant depression, participants must have been diagnosed with current major depressive disorder and must have been determined to have had an inadequate response to at least one course of antidepressant medication treatment prior to enrollment in the study. Furthermore, data for at least one outcome measure must have been reported in a manner that allowed calculation of an effect size. No language exclusions were applied.

Data Extraction.

Four study authors (G. I. S., A. P., M. I. B., and E. L.) coded study descriptor data. To establish consistency, all coders first coded the articles reporting outcomes from the aripiprazole studies. Then two study authors (G. I. S. and A. P.) jointly coded the OFC and risperidone articles, and two study authors (M. I. B. and E. L.) jointly coded the quetiapine articles. Disagreements were resolved by consensus. Coders were not blind to the results of the coded studies.

Several descriptor variables were coded for each study. (1) Flexible dosing versus fixed dosing regimen. (2) Dosage range. (3) Mean dosing achieved at end point. (4) Number of participants in each group of trial. (5) Duration of acute-phase treatment (weeks). (6) Number of prior failed trials of antidepressant medications, where the number of failed trials prior to study enrollment (historical) and the number of failed trials during the study (prospective) prior to initiation of the study drug for adjunctive treatment were recorded separately. (7) Procedures employed to evaluate for major depressive disorder (structured interview or otherwise). (8) Use of a structured instrument versus open-ended questioning to elicit adverse events [19,20] (with the latter assumed if no details were reported). (9) Adverse events scale(s) used to systematically assess for any particular adverse event(s), if any. (10) The criterion used to establish a minimum level of occurrence for adverse events reporting in the trial (e.g., if only those adverse events occurring in at least 5% of participants were reported in the associated journal article, the adverse events reporting threshold was coded as 5%). (11) Extent to which the random-sequence generation procedures were adequate versus inadequate or unclear [21]. Adequate sequence generation procedures included use of a computer program, random number table, coin tossing, randomly drawing envelopes, throwing dice, or similar methods. Merely describing the trial as randomized was considered an unclear method of sequence generation. (12) Whether or not the study eliminated placebo responders prior to randomization. (13) Whether or not persons who had a prior nonresponse to the study drug were excluded. (14) Whether or not the placebo was described as identical to the study drug in terms of at least two of the following three criteria: taste, appearance, and smell [22]. 
(15) Use of blinded raters, coded as affirmative if the following two conditions were met: (a) it was explicitly stated that blinded raters were used, and (b) it was explicitly stated that different personnel were used to rate efficacy measures and adverse events [23–25]. (16) Funding sponsor.

Efficacy and safety outcome data were independently extracted by two authors (G. I. S. and A. P.) and then checked for agreement. Disagreements were resolved by checking the original data source.

Outcome Measures.

Remission was defined variably across studies. We recorded the most stringent definition of remission utilized in each trial while also recognizing that the Montgomery–Asberg Depression Rating Scale (MADRS) [23] was the most commonly used outcome measure in the included trials. One end-point remission measure was selected from each trial according to the following order of priority: MADRS ≤8, then Hamilton Depression Rating Scale (HAM-D) ≤7 [24], then MADRS ≤10. Some trials of OFC defined remission as MADRS ≤8 at two consecutive visits during the study even if these two consecutive visits did not necessarily occur at study end point [25–27]. The clinical trial registry reports of these trials also provided the number of participants who met remission criteria at an interim time point but then relapsed. For these studies, we calculated the number of participants in remission as the number of participants who achieved interim remission minus the number of patients who subsequently relapsed.

Response was defined across studies as a 50% improvement from baseline to end point on either the MADRS or HAM-D [28]. When studies provided response rates for both measures, we used the MADRS as the response measure, as it was the most commonly reported measure of response.
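The remission and response rules described above can be written down as a short sketch. This is an illustration only: the function names and the string labels for the remission definitions are hypothetical, and treating "a 50% improvement" as "at least 50%" is an assumption rather than something the text states.

```python
# Priority order for end-point remission definitions, as listed in the text.
# The string labels themselves are hypothetical identifiers.
REMISSION_PRIORITY = ("MADRS<=8", "HAM-D<=7", "MADRS<=10")


def select_remission_definition(reported):
    """Return the highest-priority remission definition a trial reported."""
    for definition in REMISSION_PRIORITY:
        if definition in reported:
            return definition
    return None  # the trial reported none of the three definitions


def net_remitters(interim_remitters, relapsed):
    """Remitters for trials that reported interim remission plus later relapse."""
    if relapsed > interim_remitters:
        raise ValueError("relapses cannot exceed interim remitters")
    return interim_remitters - relapsed


def is_responder(baseline, endpoint, threshold=0.5):
    """True if the score fell by at least `threshold` (assumed >= 50%) from baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - endpoint) / baseline >= threshold
```

For instance, a trial that reported both HAM-D ≤7 and MADRS ≤10 would contribute its HAM-D ≤7 remission counts, because that definition sits higher in the priority order.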

We recorded data from any continuous measure of depression, quality of life, or functioning but opted not to analyze single rating scale items from larger scales (e.g., individual MADRS items) separately because they were infrequently reported. When data were reported on both the MADRS and HAM-D, we included data from the MADRS, as it was the most commonly used measure of depressive symptoms. The only continuous self-report measure of depression used in these trials was the Inventory of Depressive Symptomatology Self Report [29]. Continuous measures of quality of life included the Quality of Life Enjoyment and Satisfaction Questionnaire (Q-LES-Q) [30] and the Short Form 36 Health Survey (SF-36) [31]. The only continuous measure of functional impairment employed in these trials was the Sheehan Disability Scale (SDS) [32]. As measures of quality of life and functional impairment varied across studies, we pooled such measures together to create an omnibus effect size for each drug, and across all drugs.

We aggregated conceptually similar adverse events into the following categories. (1) Sedation-related: asthenia, fatigue, lethargy, sedation, somnolence, or feeling tired. (2) Akathisia-related (either self-reported or observer-rated): akathisia or restlessness. (3) Extrapyramidal symptoms (EPS), other than akathisia-related (either self-reported or observer-rated): dyskinesia, dystonia, extrapyramidal disorder, EPS, muscle spasms, muscle twitching, parkinsonism, or tremor. (4) Abnormal metabolic laboratory results: elevated fasting or nonfasting total cholesterol, low-density lipoprotein (LDL) cholesterol, or triglycerides; low high-density lipoprotein (HDL) cholesterol; or elevated fasting or nonfasting glucose, glycated hemoglobin; or hyperglycemia. (5) Elevated prolactin. (6) Edema or peripheral edema. (7) Significant weight gain, defined across various trials as weight gain of ≥7%, ≥10%, or >10% from baseline to end point.
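As a rough illustration of the term-based categories above, the aggregation can be pictured as a lookup table. The dictionary and function names are hypothetical, and only the categories defined by reported event terms are covered; the laboratory, prolactin, and weight-gain categories depend on measured values and are left out of this sketch.

```python
# Reported adverse event terms grouped as described in the text.
# Names of the dictionary, keys, and function are illustrative only.
AE_CATEGORIES = {
    "sedation-related": {
        "asthenia", "fatigue", "lethargy", "sedation", "somnolence", "feeling tired",
    },
    "akathisia-related": {"akathisia", "restlessness"},
    "eps (other than akathisia)": {
        "dyskinesia", "dystonia", "extrapyramidal disorder", "eps",
        "muscle spasms", "muscle twitching", "parkinsonism", "tremor",
    },
    "edema": {"edema", "peripheral edema"},
}


def categorize_adverse_event(term):
    """Map a reported adverse event term to its aggregate category, if any."""
    key = term.strip().lower()
    for category, terms in AE_CATEGORIES.items():
        if key in terms:
            return category
    return None  # term does not belong to any term-based category
```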

We also coded events that were reported in the categories of pain, psychiatric events, nausea, and infection. However, because no sign of elevated risk was gleaned from these data, these analyses are not reported (data available from authors on request).

Statistical Analysis.

The quality of data reporting varied across studies. For continuous outcomes, effect sizes were computed from means and standard deviations when possible. When these were not provided, effect sizes were computed based on means and p-values, or p-values only. In some studies, three or more treatment groups were compared, thereby creating a structural dependency that could affect our estimates. For example, two fixed doses (A and B) of an adjunctive atypical antipsychotic medication might be compared to one group that received adjunctive placebo (C), in which case the estimated efficacy of A and B would be defined relative to the same comparison group. To maintain independence, we pooled these comparisons and utilized their average (e.g., the average of A versus C and B versus C).

Each effect size was weighted by its inverse variance in order to provide a pooled effect size estimate that most accurately approached the true population effect size [33]. We calculated odds ratios (ORs) for categorical measures and used Cohen’s d for continuous measures. We converted continuous effect sizes to Hedges’ g, which corrects for a small bias in Cohen’s d [33]. We reported both efficacy and safety data for each drug individually and across drugs. An OR presents a relative measure of treatment effect; to also provide a measure of absolute benefit/harm, we calculated the number needed to treat (NNT) for treatment benefits and the number needed to harm (NNH) for adverse events [34]. The NNT represents the number of participants who would need to be treated with an adjunctive antipsychotic to gain one additional beneficial response over what would have been obtained had all patients received adjunctive placebo. NNH represents the number of patients who would require treatment to generate one additional adverse event relative to placebo. NNT/NNH values were calculated based on the pooled OR rather than from the risk difference in each study, as the risk difference is associated with more between-study heterogeneity than the OR [35]. Conversions from OR to NNT were performed in Visual Rx software [36]. The baseline risk (required for calculating NNT) was estimated by using the pooled rate of events occurring among placebo-treated patients weighted by each study’s total sample size. The baseline risk was calculated separately for each drug, so that placebo participants in one drug’s trials were not used to calculate baseline risk for a different drug. As in any meta-analysis, our estimates of NNT and NNH generalize only to situations in which patients receive a similar dosage for a similar treatment duration; further, estimated NNH and NNT apply only when generalizing to patients similar to those in the included trials. Because of various study inclusion and exclusion criteria, patients in the placebo groups in our meta-analysis may not be representative of patients seen in some clinical practice settings.
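The two conversions described above can be sketched as follows. The function names are hypothetical, the correction factor is the commonly used approximation for Hedges' g, and the OR-to-NNT step uses the standard conversion through a baseline (placebo) event rate; whether Visual Rx applies exactly this arithmetic internally is an assumption. The inputs in the usage note are made-up values, not figures from the included trials.

```python
import math


def hedges_g(d, n1, n2):
    """Shrink Cohen's d by the usual small-sample correction factor."""
    df = n1 + n2 - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return d * correction


def nnt_from_odds_ratio(odds_ratio, baseline_risk):
    """NNT (or NNH) implied by an odds ratio at a given placebo event rate."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    # Event rate expected under treatment, given the OR and the placebo rate.
    treated_risk = (odds_ratio * baseline_risk) / (
        1.0 - baseline_risk + odds_ratio * baseline_risk
    )
    risk_difference = treated_risk - baseline_risk
    if risk_difference == 0:
        return math.inf  # no absolute difference, so no finite NNT
    return 1.0 / abs(risk_difference)
```

With a hypothetical OR of 2.0 and a placebo event rate of 20%, the treated event rate works out to one third, the risk difference to about 13.3 percentage points, and the NNT to 7.5, which would conventionally be rounded up to 8.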

We performed homogeneity analyses using the Q statistic. Because the Q test of homogeneity often lacks power to detect heterogeneity when the number of trials in a meta-analysis is small, we also calculated the I² statistic [37]. To pool estimates across studies while incorporating potential heterogeneity, we employed a random effects model in all analyses [38]. Confidence intervals for I² were calculated using Method III as described in Higgins and Thompson [39] using a spreadsheet. When performing such calculations in pooled analyses based on only two comparisons when Q ≤ k, we added the number 1 to both Q and k in order to avoid the mathematical problem of dividing by zero; this generally resulted in a slight shrinking of the confidence intervals under these conditions. Unless specified otherwise above, all analyses were performed using Comprehensive Meta-Analysis software [40]. We lacked adequate statistical power to perform subgroup analyses.
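A minimal sketch of these heterogeneity calculations is given below. The function names are hypothetical. The DerSimonian-Laird estimator for the between-study variance is an assumption (the text says only that a random effects model was used), and the confidence interval follows the test-based approach built on the standard error of ln H, with the "add 1 to Q and k" adjustment applied when there are only two comparisons and Q does not exceed k.

```python
import math


def random_effects_summary(effects, variances):
    """Cochran's Q, I^2 (%), DerSimonian-Laird tau^2, and the pooled estimate."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"q": q, "i2": i2, "tau2": tau2, "pooled": pooled,
            "se": math.sqrt(1.0 / sum(w_re))}


def i2_confidence_interval(q, k, z=1.96):
    """Approximate 95% CI for I^2 (%) from Q and the number of comparisons k."""
    if k == 2 and q <= k:
        q, k = q + 1, k + 1  # adjustment described in the text for two comparisons
    if q <= 0:
        raise ValueError("Q must be positive to take logarithms")
    ln_h = 0.5 * math.log(q / (k - 1))
    if q > k:
        se = 0.5 * (math.log(q) - math.log(k - 1)) / (
            math.sqrt(2 * q) - math.sqrt(2 * k - 3))
    else:
        se = math.sqrt((1.0 / (2 * (k - 2))) * (1.0 - 1.0 / (3 * (k - 2) ** 2)))

    def to_i2(log_h):
        h2 = math.exp(2.0 * log_h)
        return max(0.0, (h2 - 1.0) / h2) * 100.0

    return to_i2(ln_h - z * se), to_i2(ln_h + z * se)
```

As a check against Table 2, Q = 0.34 with k = 3 (aripiprazole remission) gives an upper bound of about 38.8%, and Q = 19.29 with k = 13 (response, all drugs combined) gives about 67.8%, both in line with the reported intervals.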

We examined the potential existence of publication bias by performing trim and fill analysis for pooled continuous depression outcomes. Trim and fill procedures examine potential asymmetry of effect sizes. Based on the assumption that effects are distributed symmetrically, trim and fill analysis imputes the number and likely effect size of missing studies, then recalculates the pooled analysis with imputed data from missing studies [41].

Results

Study Characteristics.

The evidence search flow is described in Figure 1. We obtained one controlled trial of aripiprazole that used low doses (2 or 5 mg); we did not include this trial because the starting dose of 2 mg was administered for 30 d prior to participants switching to the dose of 5 mg that falls within the recommended 5–10 mg range set by the FDA [42]. Characteristics of the 14 included studies are provided in Table 1. The definition of treatment-resistant depression differed somewhat across trials. The process by which diagnoses were made was described clearly in six trials, and the number of prior failed trials varied across studies. Only three studies clearly described their random-sequence generation procedures, and only one trial clearly described using clinical raters who were blind to both treatment assignment and participants’ reports of adverse events. While most trials used rating scales to assess for EPS and akathisia, and a minority of trials used a measure of sexual functioning, no trial reported using a structured instrument for eliciting a broad range of adverse events. All studies were funded by the study drug manufacturer except for one trial that was funded jointly by the study drug manufacturer and the US National Institute of Mental Health [27].

Figure 1. Flowchart of Published Studies Examined for Inclusion in Meta-Analysis. MDD, Major Depressive Disorder; RCT, Randomized Controlled Trial.

Table 1. Characteristics of Included Studies.

| Study First Author (Year) [Reference] | Antipsychotic | Antidepressant | Daily Dosage at End Point | N^a | Mean Age (Years) | Percent Female | Duration (Weeks) | Prior Failed Trials | Interview to Establish Diagnosis^b | Categorical Depression Measures^c | Supplemental Data Sources^d | Adverse Events Assessed Systematically | Adverse Events Reporting Threshold | Adequate Sequence Generation? | Established Placebo Resistance? | Prior Drug Nonresponders Excluded? | Placebo Similarity? | Blinded Raters? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bauer (2009) [81] | Quetiapine | Various | Fixed, 150 or 300 mg | 487 | 45.4 | 67.6 | 6 | 1 historical | MINI | Remission: MADRS ≤8; response: MADRS | CTR, FDA | Akathisia, EPS, sexual functioning | >5% in any group | ? | No | Yes | ? | ? |
| Berman (2007) [75] | Aripiprazole | Various | Flexible, M = 11.8 mg | 353 | 45.4 | 62.8 | 6 | 1–3 historical, 1 prospective | ? | Remission: MADRS ≤10; response: MADRS | CTR, FDA | Akathisia, EPS, sexual functioning | ≥5% in any group | Yes | Yes | Yes | ? | ? |
| Berman (2009) [82] | Aripiprazole | Various | Flexible, M = 10.7 mg | 343 | 45.3 | 73.1 | 6 | 1–3 historical, 1 prospective | ? | Remission: MADRS ≤8; response: MADRS | CTR | Akathisia, EPS, sexual functioning | ≥5% in any group | ? | Yes | Yes | ? | ? |
| Corya (2006) [25] | OFC | Fluoxetine or venlafaxine | Fixed; olanzapine 6 mg/fluoxetine 25 mg, olanzapine 6 mg/fluoxetine 50 mg, olanzapine 12 mg/fluoxetine 25 mg, or olanzapine 12 mg/fluoxetine 50 mg | 344 | 45.7 | 72.5 | 12 | 1 historical, 1 prospective | ? | Remission: MADRS ≤8 at two consecutive visits excluding patients who relapsed; response: MADRS | CTR | Akathisia, EPS | ≥10% in OFC group | ? | No | No | ? | ? |
| El-Khalili (2010) [83] | Quetiapine | Various | Fixed, 150 or 300 mg | 432 | 45.5 | 72.5 | 6 | 1 historical | MINI | Remission: MADRS ≤8; response: MADRS | CTR, FDA | Akathisia, EPS, sexual functioning | >5% in any group | Yes | No | No | Yes | ? |
| Keitner (2009) [73] | Risperidone | Various | Flexible, M = 1.6 mg | 95 | 45.2 | 56.7 | 4 | 1 prospective | SCID | Remission: HAM-D ≤7; response: MADRS | None | None | ? | ? | No | No | ? | Mostly^e |
| Mahmoud (2007) [44] | Risperidone | Various | Flexible, M = ?, 1 or 2 mg permitted | 268 | 46.1 | 73.5 | 6 | 1 prospective | ? | Remission: HAM-D ≤7; response: HAM-D | None | None | ≥2% in any group | Yes | No | No | ? | ? |
| Marcus (2008) [76] | Aripiprazole | Various | Flexible, M = 11.0 mg | 369 | 44.5 | 66.7 | 6 | 1–3 historical, 1 prospective | ? | Remission: MADRS ≤10; response: MADRS | CTR, FDA | Akathisia, EPS, sexual functioning | ≥5% in any group | ? | Yes | Yes | ? | ? |
| McIntyre (2007) [84] | Quetiapine | Various | Flexible, M = 182 mg | 58 | 44.5 | 62.0 | 8 | 1 trial | ? | Remission: HAM-D ≤7; response: HAM-D | None | None | ?^f | ? | No | No | ? | ? |
| Reeves (2008) [77] | Risperidone | Various | Flexible, M = 1.17 mg | 23 | 44.0 | 69.6 | 8 | 1 prospective | ? | Remission: N/A; response: N/A | None | Akathisia (one item from EPS scale), EPS | ≥13% of total participants | ? | No | No | ? | ? |
| Shelton (2001) [27] | OFC | Fluoxetine | Flexible, mean modal dose = olanzapine 13.5 mg/fluoxetine 52 mg | 20 | 42.0 | 75 | 8 | 2 historical and 1 prospective | ? | Remission: MADRS ≤8 at two consecutive visits excluding patients who relapsed; response: MADRS | CTR | Akathisia, EPS | Number of adverse events not reported | ? | No | No | ? | ? |
| Shelton (2005) [26] | OFC | Fluoxetine or nortriptyline | Flexible, mean modal dose = olanzapine 8.5 mg/fluoxetine 35.6 mg | 356 | 42.0 | 69.4 | 8 | 1 historical, 1 prospective | SCID | Remission: MADRS ≤8 at two consecutive visits excluding patients who relapsed; response: MADRS | CTR | Akathisia, EPS | ≥10% of OFC group | ? | No | No | ? | ? |
| Thase 1 (2007) [85]^g | OFC | Fluoxetine | Fixed; olanzapine 6 mg/fluoxetine 50 mg, olanzapine 12 mg/fluoxetine 50 mg, or olanzapine 18 mg/fluoxetine 50 mg | 203 | 44.1 | 60.2 | 8 | 1 historical, 1 prospective | SCID | Remission: MADRS ≤10; response: MADRS | CTR | Akathisia, EPS | ≥10% of OFC group | ? | No | No | ? | ? |
| Thase 2 (2007) [85]^g | OFC | Fluoxetine | Fixed; olanzapine 6 mg/fluoxetine 50 mg, olanzapine 12 mg/fluoxetine 50 mg, or olanzapine 18 mg/fluoxetine 50 mg | 198 | 44.9 | 68.0 | 8 | 1 historical, 1 prospective | SCID | Remission: MADRS ≤10; response: MADRS | CTR | Akathisia, EPS | ≥10% of OFC group | ? | No | No | ? | ? |

^a Number of participants included in the intent-to-treat or modified intent-to-treat analysis on the primary depression rating scale in the trial.

^b If no interview was explicitly mentioned, then this variable was coded as “?”; MINI, Mini International Neuropsychiatric Interview; SCID, Structured Clinical Interview for DSM-IV.

^c Indicates measures used to define remission and/or response in each study.

^d “CTR” indicates a clinical trial registry report from the sponsor’s online database; “FDA” indicates an FDA statistical review.

^e Blinded raters rated outcomes on the MADRS and HAM-D, whereas the study psychiatrist, who also elicited reports of adverse events, rated the Clinical Global Impressions–Severity.

^f Adverse events reported by fewer than two of the 29 patients in each group were categorized as “other” in the study’s table.

^g Data for these two trials were reported separately for some variables but jointly for others.

M, mean.

Efficacy.

In terms of remission, adjunctive treatment with each antipsychotic was associated with a statistically significant benefit, with ORs ranging from 1.42 to 2.37 (Table 2). ORs for response were also statistically significant for aripiprazole, quetiapine, and risperidone—but not for OFC (Table 2). The NNT for remission was nine for aripiprazole, quetiapine, and risperidone but was a substantially higher 19 for OFC (Table 2). NNTs for response were seven (aripiprazole), eight (risperidone), and ten (quetiapine). Pooled ORs are displayed visually in Figures 2 and 3 [43]. Among participants who achieved remission during treatment, participants assigned to OFC were less likely to remain in remission than participants assigned to placebo. Only two of 56 placebo participants relapsed, compared to 18 relapses among 99 participants on OFC (OR, 0.27; 95% CI, 0.08–0.90).

Table 2. Summary of Dichotomous Efficacy and Safety Measures.

| Comparison | Outcome | k | OR (95% CI)^a | Q | I² (95% CI) | p(Q) | NNT/NNH (95% CI) |
|---|---|---|---|---|---|---|---|
| All combined | Remission | 13 | 1.77 (1.49–2.09) | 9.20 | 0% (0%–43.38%) | 0.69 | 10 (8–15) |
| | Response | 13 | 1.61 (1.33–1.95) | 19.29 | 37.78% (0%–67.76%) | 0.08 | 9 (7–16) |
| Aripiprazole | Remission | 3 | 2.01 (1.48–2.73) | 0.34 | 0% (0%–38.81%) | 0.84 | 9 (6–18) |
| | Response | 3 | 2.07 (1.58–2.72) | 1.60 | 0% (0%–87.0%) | 0.45 | 7 (5–12) |
| | Akathisia | 3 | 7.47 (5.07–11.0) | 1.63 | 0% (0%–87.24%) | 0.44 | 4 (3–6) |
| | Sedation | 3 | 2.56 (1.63–4.03) | 0.68 | 0% (0%–69.41%) | 0.71 | 14 (8–33) |
| | Weight gain ≥7% | 3 | 5.91 (2.14–16.29) | 0.57 | 0% (0%–63.50%) | 0.75 | 29 (10–119) |
| OFC | Remission | 5 | 1.42 (1.01–2.0) | 4.72 | 15.19% (0%–82.38%) | 0.32 | 19 (9–713) |
| | Response | 5 | 1.30 (0.87–1.93) | 8.13 | 50.78% (0%–81.95%) | 0.09 | 17 (NNH 34; NNT 7)^b |
| | Weight gain ≥10% | 4 | 16.28 (7.02–37.76) | 0.88 | 0% (0%–47.80%) | 0.83 | 9 (5–20) |
| | Elevated metabolic lab results | 4 | 4.46 (2.07–9.58) | 4.50 | 33.38% (0%–76.44%) | 0.21 | 10 (5–29) |
| | Sedation^c | 3 | 2.87 (1.64–5.03) | 7.83 | 74.45% (0%–92.32%) | 0.02 | 5 (3–12) |
| | Edema^d | 3 | 13.19 (5.46–31.89) | 0.24 | 0% (0%–13.32%) | 0.89 | 7 (4–16) |
| | Elevated prolactin | 4 | 4.30 (2.36–7.83) | 4.91 | 38.84% (0%–79.16%) | 0.18 | 6 (4–11) |
| | Akathisia | 4 | 1.48 (0.96–2.30) | 3.17 | 5.36% (0%–85.51%) | 0.37 | 28 (NNH 11; NNT 321)^b |
| Quetiapine | Remission | 3 | 1.79 (1.33–2.42) | 0.42 | 0% (0%–50.47%) | 0.81 | 9 (6–19) |
| | Response | 3 | 1.53 (1.17–2.0) | 0.79 | 0% (0%–73.67%) | 0.67 | 10 (6–26) |
| | Sedation | 3 | 8.36 (5.83–11.98) | 1.73 | 0% (0%–87.98%) | 0.42 | 3 (2–3) |
| | Elevated metabolic lab results | 2 | 2.45 (1.80–3.34) | 0.40 | 0% (0%–85.14%) | 0.53 | 6 (4–9) |
| | Weight gain ≥7% | 3 | 2.86 (1.11–7.37) | 0.97 | 0% (0%–78.55%) | 0.62 | 37 (12–594) |
| Risperidone | Remission | 2 | 2.37 (1.31–4.30) | 0.01 | 0% (0%–79.40%) | 0.92 | 9 (5–35) |
| | Response | 2 | 1.83 (1.16–2.88) | 0.54 | 0% (0%–86.49%) | 0.46 | 8 (5–33) |

Measures of response and remission are reported for each treatment. Adverse events measures are reported for events that reached a statistical threshold of p<0.10 in terms of OR. For further description of the data underlying the adverse events effect sizes, see Table 4.

^a Trials with no events in either study arm are not included in summary OR calculations.

^b The 95% confidence interval included the possibility of both treatment-related benefit and treatment-related harm.

^c Because the total number of events in the OFC group was higher than the sample size of the group in Shelton et al [27], an effect size could not be calculated, and it was thus not factored into the overall effect size estimate for sedation. Given the very small sample of the study, this makes virtually no difference in the overall effect size estimate.

^d The four trials in which edema was reported for OFC participants had an average rate of 18.32%. Edema was not listed as an adverse event in Shelton et al [26] for any participants in either the OFC or placebo group. As these data did not fit with the other OFC trials, we excluded this study from the calculation of the risk for placebo participants.

Figure 2. Remission Rates by Drug and Overall.

Figure 3. Response Rates by Drug and Overall.

Pooled effect sizes for continuous outcomes are provided in Table 3. Adjunctive aripiprazole, quetiapine, OFC, and risperidone were all more efficacious than adjunctive placebo based on clinician-rated measures of depression severity (MADRS/HAM-D). Effect sizes were as follows: aripiprazole: g = 0.35 (95% CI, 0.23–0.48); OFC: g = 0.26 (95% CI, 0.04–0.45); quetiapine: g = 0.40 (95% CI, 0.26–0.53); and risperidone: g = 0.48 (95% CI, 0.22–0.73). The effects of risperidone may have been exaggerated by the reliance on post hoc analysis rather than a priori analysis in the largest study of the drug, as the effect of the drug was greater at 6 wk (g = 0.46) than at the prespecified primary end point of 4 wk (g = 0.32) [44]. According to convention, these effect sizes would be considered “small” or “small to moderate” in magnitude [45]. Effect sizes on depression severity measures did not differ significantly between drugs (QB = 1.93, p = 0.59), though there was limited power to detect such differences. The pooled difference in mean change on the MADRS in the 11 trials that reported such data was 2.69. In these 11 trials, the mean effect size was g = 0.31, which differed only slightly from the overall mean effect size when including both the HAM-D and MADRS; thus, the 11 trials reporting MADRS mean change data seem representative of the entire sample of included trials. Only the trials of adjunctive aripiprazole reported self-reported depression symptom severity, yielding a very small effect size of g = 0.15. The effects observed on the Clinical Global Impressions–Severity Scale were either small or small-to-moderate, with the exception of risperidone, for which a moderate effect was generated.

Table 3. Effect Sizes and Heterogeneity of Effect Sizes on Continuous Measures.

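| Drug | Outcome Type | Outcome | Study First Author (Year) [Reference] | g+ (95% CI for Totals) | Raw Units (95% CI for Totals) | p(g+) | Q | I² (95% CI) | p(Q) |
|---|---|---|---|---|---|---|---|---|---|
| Aripiprazole | Depression | MADRS | Berman (2007) [75] | 0.35 (0.33)^a | 3.01 (3.01) | | | | |
| | | | Berman (2009) [82] | 0.38 | 3.73 | | | | |
| | | | Marcus (2008) [76] | 0.32 (0.26)^a | 2.84 (2.52) | | | | |
| | | | Total | 0.35 (0.23, 0.48) (0.33 [0.20, 0.47])^a | 3.15 (2.07, 4.23) (3.14 [1.87, 4.41])^a | <0.001 (<0.001)^a | 0.18 (0.53)^a | 0% (0%–0%) (0% [0%–60.75%])^a | 0.92 (0.77)^a |
| | | IDS-SR | Berman (2007) [75] | 0.17 | | | | | |
| | | | Berman (2009) [82] | 0.13 | | | | | |
| | | | Marcus (2008) [76] | 0.14 | | | | | |
| | | | Total | 0.15 (0.03, 0.27) | | 0.02 | 0.08 | 0% (0%–0%) | 0.96 |
| | QoL/functioning | SDS | Berman (2007) [75] | 0.19 (0.10)^a | | | | | |
| | | | Berman (2009) [82] | 0.16 | | | | | |
| | | | Marcus (2008) [76] | 0.25 (0.10)^a | | | | | |
| | | | Total | 0.20 (0.08, 0.33) (0.12 [–0.02, 0.26])^a | | 0.001 (0.08)^a | 0.37 (0.17)^a | 0% (0%–43.78%) (0% [0%–0%])^a | 0.83 (0.92)^a |
| | | Q-LES-Q | Berman (2007) [75] | 0.16 | | | | | |
| | | | Berman (2009) [82] | 0.26 | | | | | |
| | | | Marcus (2008) [76] | 0.28 | | | | | |
| | | | Total | 0.23 (0.11, 0.36) | | <0.001 | 0.64 | 0% (0%–67.50%) | 0.73 |
| | | | Total QoL/functioning | 0.22 (0.09, 0.34) | | 0.001 | 0.33 | 0% (0%–36.96%) | 0.85 |
| | Global improvement | CGI-S | Berman (2007) [75] | 0.37 | | | | | |
| | | | Berman (2009) [82] | 0.31 | | | | | |
| | | | Marcus (2008) [76] | 0.46 | | | | | |
| | | | Total | 0.38 (0.26, 0.50) | | <0.001 | 1.05 | 0% (0%–80.19%) | 0.59 |
| | Metabolic parameters | Weight gain (kg) | Berman (2007) [75] | | 1.67 | | | | |
| | | | Berman (2009) [82] | | 0.4 | | | | |
| | | | Marcus (2008) [76] | | 1.05 | | | | |
| | | | Total | | 1.05 (0.35, 1.74) | 0.003 | 12.03 | 83.38% (49.64%–94.51%) | 0.002 |
| | Sexual functioning | SFI overall satisfaction | Berman (2007) [75] | ? | | | | | |
| | | | Berman (2009) [82] | 0.25 | | | | | |
| | | | Marcus (2008) [76] | ? | | | | | |
| | | | Total | ?^b | | | | | |
| OFC | Depression | MADRS | Corya (2006) [25] | 0.15 | 1.35 | | | | |
| | | | Shelton (2001) [27] | 1.04 | 12.40 | | | | |
| | | | Shelton (2005) [26] | 0.08 | 0.65 | | | | |
| | | | Thase 1 (2007) [85] | 0.14 | 1.40 | | | | |
| | | | Thase 2 (2007) [85] | 0.57 | 5.60 | | | | |
| | | | Total | 0.26 (0.04, 0.45) | 2.57 (0.33, 4.81) | 0.02 | 11.37 | 64.83% (7.70%–86.59%) | 0.02 |
| | QoL | SF-36 MCS | Corya (2006) [25] | ns^c | | | | | |
| | | | Shelton (2001) [27] | 0.50 | | | | | |
| | | | Shelton (2005) [26] | –0.10 | | | | | |
| | | | Thase (2007) [85] | 0.13 | | | | | |
| | | | Total | 0.04 (–0.17, 0.25) | | 0.72 | 3.13 | 36.18% (0%–79.57%) | 0.21 |
| | | SF-36 PCS | Corya (2006) [25] | ns^c | | | | | |
| | | | Shelton (2001) [27] | –0.06 | | | | | |
| | | | Shelton (2005) [26] | –0.13 | | | | | |
| | | | Thase (2007) [85] | 0.19 | | | | | |
| | | | Total | 0.03 (–0.23, 0.30) | | 0.82 | 4.57 | 56.25% (0%–87.52%) | 0.10 |
| | | | Total QoL | 0.04 (–0.19, 0.26) | | 0.74 | 3.46 | 42.20% (0%–82.51%) | 0.18 |
| | Global improvement | CGI-S | Corya (2006) [25] | 0.12 | | | | | |
| | | | Shelton (2001) [27] | 0.31 | | | | | |
| | | | Shelton (2005) [26] | 0.27 | | | | | |
| | | | Thase 1 (2007) [85] | 0.08 | | | | | |
| | | | Thase 2 (2007) [85] | 0.32 | | | | | |
| | | | Total | 0.20 (0.08, 0.32) | | 0.001 | 2.35 | 0% (0%–64.60%) | 0.67 |
| | Prolactin | Prolactin (ng/ml) | Shelton (2005) [26] | ?^d | | | | | |
| | | | Thase (2007) [85] | 0.27 | 2.50 | | | | |
| | Metabolic parameters | Weight gain (kg) | Corya (2006) [25] | | 3.98 | | | | |
| | | | Shelton (2001) [27] | | 5.79 | | | | |
| | | | Shelton (2005) [26] | | 3.88 | | | | |
| | | | Thase (2007) [85] | | 4.5 | | | | |
| | | | Total | | 4.20 (3.79, 4.61) | <0.001 | 3.33 | 9.80% (0%–86.21%) | 0.34 |
| | | Cholesterol (total nonfasting, mg/dl) | Corya (2006) [25] | 0.28 | 7.59 | | | | |
| | | | Shelton (2005) [26] | ?^e | ?^e | | | | |
| | | | Thase (2007) [85] | 0.43 | 14.3 | | | | |
| | | | Total | 0.37 (0.21, 0.54) | 10.85 (4.28, 17.43) | <0.001 | 1.25 | 19.88% (0%–90.75%) | 0.26 |
| | | Triglycerides (mg/dl) | Thase (2007) [85] | 0.22 | 23.90 | | | | |
| | | | Total | 0.22 (0.02, 0.41) | 23.90 (2.39, 45.41) | 0.03 | N/A | N/A | N/A |
| Quetiapine | Depression | MADRS | Bauer (2009) [81] | 0.41 | 2.89 | | | | |
| | | | El-Khalili (2010) [83] | 0.34 | 2.45 | | | | |
| | | HAM-D | McIntyre (2007) [84] | 0.71 | 5.70 | | | | |
| | | | Total | 0.40 (0.26, 0.53)^f | 2.68^f | <0.001 | 1.69 | 0% (0%–87.69%) | 0.43 |
| | QoL | Q-LES-Q | Bauer (2009) [81] | 0.10 | | | | | |
| | | | El-Khalili (2010) [83] | –0.02 | | | | | |
| | | | Total | 0.04 (–0.09, 0.18) | | 0.53 | 0.68 | 0% (0%–87.62%) | 0.41 |
| | Global improvement | CGI-S | Bauer (2009) [81] | 0.41 | | | | | |
| | | | El-Khalili (2010) [83] | ?^g | | | | | |
| | | | McIntyre (2007) [84] | 0.69 | | | | | |
| | | | Total | 0.44 (0.26, 0.62) | | <0.001 | 0.99 | 0% (0%–89.55%) | 0.32 |
| | Prolactin | Prolactin (ng/ml) | Bauer (2009) [81] | 0.03 | 0.44 | | | | |
| | | Prolactin (mIU/L) | El-Khalili (2010) [83] | –0.08 | –12.12 | | | | |
| | | | Total | –0.02 (–0.16, 0.12) | | 0.77 | 0.61 | 0% (0%–87.08%) | 0.43 |
| | Metabolic parameters | HDL cholesterol (mg/dl) | Bauer (2009) [81] | –0.11 | –1.0 | | | | |
| | | | El-Khalili (2010) [83] | –0.21 | –1.82 | | | | |
| | | | Total | –0.16 (–0.29, –0.02) | –1.38 (–2.58, –0.18) | 0.02 | 0.43 | 0% (0%–85.45%) | 0.43 |
| | | LDL cholesterol (mg/dl) | Bauer (2009) [81] | 0.17 | 4.75 | | | | |
| | | | El-Khalili (2010) [83] | 0.04 | 0.77 | | | | |
| | | | Total | 0.11 (–0.03, 0.24) | 2.86 (–1.03, 6.76) | 0.12 | 0.91 | 0% (0%–89.11%) | 0.34 |
| | | Total cholesterol (mg/dl) | Bauer (2009) [81] | 0.21 | 6.51 | | | | |
| | | | El-Khalili (2010) [83] | 0.18 | 4.83 | | | | |
| | | | Total | 0.19 (0.06, 0.33) | 5.68 (1.57, 9.79) | 0.01 | 0.07 | 0% (0%–80.47%) | 0.80 |
| | | Triglycerides (mg/dl) | Bauer (2009) [81] | 0.27 | 19.65 | | | | |
| | | | El-Khalili (2010) [83] | 0.36 | 38.09 | | | | |
| | | | Total | 0.31 (0.17, 0.45) | 26.90 (9.24, 44.57) | <0.001 | 0.38 | 0% (0%–84.93%) | 0.54 |
| | | Weight gain (kg) | Bauer (2009) [81] | | 0.95 | | | | |
| | | | El-Khalili (2010) [83] | | 0.89 | | | | |
| | | | McIntyre (2007) [84] | | 2.65 | | | | |
| | | | Total | | 0.94 (0.62, 1.26) | <0.001 | 1.03 | 0% (0%–79.80%) | 0.60 |
| | Sexual functioning | CSFQ | Bauer (2009) [81] | 0.01 | | | | | |
| | | | El-Khalili (2010) [83] | ?^h | | | | | |
| | | | Total | ? | | | | | |
| Risperidone | Depression | MADRS | Keitner (2009) [73] | ns^i | ? | | | | |
| | | HAM-D | Mahmoud (2007) [44] | 0.46 (0.32)^j | 2.80 (1.90)^j | | | | |
| | | MADRS | Reeves (2008) [77] | 0.60 | 7.11 | | | | |
| | | | Total | 0.48 (0.22, 0.73) (0.34 [0.11, 0.58])^j,k | | <0.001 (0.004) | 0.09 (0.42) | 0% (0%–80.91%) (0% [0%–85.35%]) | 0.76 (0.52) |
| | QoL/functioning | Q-LES-Q | Keitner (2009) [73] | 0.54 | | | | | |
| | | | Mahmoud (2007) [44] | 0.39^l | | | | | |
| | | | Total | 0.43 (0.20, 0.66)^l | | <0.001 | 0.33 | 0% (0%–84.36%) | 0.56 |
| | | SDS | Mahmoud (2007) [44] | 0.57^l | | | | | |
| | | | Total | 0.57 (0.28, 0.85)^l | | <0.001 | N/A | N/A | N/A |
| | | | Total QoL/functioning | 0.49 (0.26, 0.73)^l | | <0.001 | 0.05 | 0% (0%–80.19%) | 0.82 |
| | Global improvement | CGI-S | Keitner (2009) [73] | 0.44 | | | | | |
| | | | Mahmoud (2007) [44] | 0.72 | | | | | |
| | | | Reeves (2008) [77] | 0.78 | | | | | |
| | | | Total | 0.64 (0.42, 0.87) | | <0.001 | 1.29 | 0% (0%–83.87%) | 0.53 |
| | Prolactin | Prolactin (ng/ml) | Keitner (2009) [73] | ^m | | | | | |
| | | | Mahmoud (2007) [44] | ^m | | | | | |
| | | | Reeves (2008) [77] | 0.80 | 38.85 | | | | |
| | | | Total | 0.80 (–0.19, 1.80) | 38.85 (–5.67, 83.37) | 0.11 | N/A | N/A | N/A |
| | Weight gain | Weight gain (kg) | Keitner (2009) [73] | | 1.81 | | | | |
| | | | Mahmoud (2007) [44] | | 1.13 | | | | |
| | | | Reeves (2008) [77] | | –0.45 | | | | |
| | | | Total | | 1.26 | <0.001 | 3.51 | 42.95% (0%–82.87%) | 0.17 |
| Summary | | MADRS/HAM-D | | 0.34 (0.25, 0.43) | 2.69 (1.82, 3.54)^n | <0.001 | 19.44 | 38.26% (0%–67.99%) | 0.08 |
| | | QoL/functioning^o | | 0.17 (0.06, 0.28) | | 0.003 | 18.25 | 50.68% (0%–76.05%) | 0.03 |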

^a The FDA statistical review noted that many participants violated study protocol, often as a result of taking non-allowed medications. The numbers in parentheses represent data from only participants who did not violate the study protocol.

^b As data were not presented clearly for two of three trials, we opted to not treat the single study with clearly reported data as representative of the three aripiprazole studies; thus, we provide no summary effect size calculation for sexual functioning.

^c “ns” indicates no statistically significant difference versus placebo; data were not reported in a more detailed manner.

^d OFC had a greater increase in prolactin versus fluoxetine (0.31 nmol/l, p<0.001) and versus nortriptyline (0.37 nmol/l, p<0.001). Given the inexact p-values, we did not calculate an effect size.

^e OFC had a greater increase in mean total nonfasting cholesterol versus fluoxetine (0.30 mmol/l, p<0.001) and nortriptyline (0.33 mmol/l, p = 0.004). Given the inexact p-value of the OFC versus fluoxetine comparison, we did not calculate an effect size.

^f Hedges’ g includes effects from all three trials, whereas the raw unit analysis includes only change on the MADRS, which was not used in the McIntyre et al. [84] study.

^g It was reported that 150 mg of quetiapine was superior to placebo, with an associated p-value of 0.095, from which a d = 0.20 was calculated. For the 300-mg dose, the p-value was reported as <0.05, so it was not possible to calculate an exact effect size, as a standard deviation for this measure was not provided.

^h There were “no apparent differences among the treatment groups” according to the El-Khalili et al. [83] clinical trial registry report. Given the lack of clarity regarding sexual functioning data, we provide no summary effect size calculation for sexual functioning.

^i “ns” indicates no statistically significant difference versus placebo; data were not reported in a more detailed manner.

^j The primary efficacy end point in the Mahmoud et al. [44] study was 4 wk, so data in parentheses indicate data from this a priori end point rather than from the 6-wk end point that was emphasized much more heavily in the study publication.

^k Hedges’ g pools data from Mahmoud et al. [44] and Reeves et al. [77]. We provide no summary estimate of mean differences on the MADRS, as only the small Reeves et al. [77] study reported these data.

^l Mahmoud et al.’s primary end point was 4 wk [44], but these data are presented at 6 wk. These results may be inflated relative to the primary end point, given that the effect favoring risperidone on the HAM-D was smaller at week 4 than at week 6.

^m Prolactin levels were apparently not measured in these trials.

^n Pooled raw units are for MADRS only.

^o These data are pooled across the SDS, Q-LES-Q, SF-36 Mental Component Summary, and SF-36 Physical Component Summary.

CGI-S, Clinical Global Impressions–Severity; CSFQ, Changes in Sexual Function Questionnaire; IDS-SR, Inventory of Depressive Symptomatology–Self Report; QoL, quality of life; SF-36 MCS, SF-36 Mental Component Summary; SF-36 PCS, SF-36 Physical Component Summary; SFI, Sexual Function Inventory.
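The heterogeneity columns of Table 3 can be related to one another with a short calculation. The sketch below assumes the conventional definition of I² as the share of Cochran's Q in excess of its degrees of freedom, I² = max(0, (Q − df)/Q) with df = k − 1 for k pooled estimates; the article does not state which software or formula was used, and the function name `i_squared` is purely illustrative.

```python
def i_squared(q: float, k: int) -> float:
    """Return I^2 as a percentage, given Cochran's Q and k pooled estimates.

    Assumes the conventional definition I^2 = max(0, (Q - df) / Q), df = k - 1.
    """
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0  # heterogeneity is not estimable from a single estimate
    return max(0.0, (q - df) / q) * 100.0


# Q values and trial counts taken from Table 3.
print(round(i_squared(11.37, 5), 2))  # OFC, MADRS (five comparisons)
print(round(i_squared(12.03, 3), 2))  # aripiprazole, weight gain (three trials)
print(i_squared(0.18, 3))             # aripiprazole, MADRS (Q below df)
```

Under this assumption the first two results land within rounding of the tabulated 64.83% and 83.38%, and any Q smaller than its degrees of freedom is truncated to 0%, which is consistent with the many 0% entries in the table.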

With regard to quality of life and functioning, adjunctive quetiapine, aripiprazole, and OFC produced effect sizes that were either not statistically significant or small and clinically negligible in magnitude. Adjunctive risperidone was more efficacious than adjunctive placebo on quality of life/functioning, with a small-to-moderate effect size. The pooled effect across quality of life/functioning measures varied significantly across treatments (QB = 6.88, p = 0.003), with risperidone (g = 0.49) yielding a higher effect than the other three drugs combined (g = 0.11), which did not differ significantly from each other (QB = 4.02, p = 0.13). However, the effect of aripiprazole on quality of life/functioning was small (g = 0.22) and statistically significant (p = 0.001), whereas the effects of OFC (g = 0.04, p = 0.74) and quetiapine (g = 0.05, p = 0.53) were both not statistically significant and of quite small magnitude. The effect of aripiprazole on quality of life/functioning should be interpreted with caution, as the effect for the drug on the SDS was very small and no longer statistically significant when patients who violated study protocol were excluded from analysis (g = 0.12, p = 0.08). Similarly, the effect of risperidone on quality of life/functioning should be interpreted tentatively, since it is largely driven by post hoc analyses.

Adverse Events.

Atypical antipsychotic medications differed substantially in their reported adverse event profiles. Table 2 reports adverse events that showed increased risk (p≤0.10). A more detailed listing of adverse events and pooled ORs for each event category are provided in Table 4.

Table 4. Adverse Events Individually and by Category.

| Drug | Study First Author (Year) [Reference] | Event | Events/N on Drug | Events/N on Placebo | OR (95% CI)^a |
|---|---|---|---|---|---|
| Aripiprazole | Berman (2007) [75] | Fatigue | 11/182 | 6/176 | |
| | Berman (2009) [82] | Fatigue | 16/176 | 8/172 | |
| | | Sedation | 10/176 | 1/172 | |
| | Marcus (2008) [76] | Fatigue | 19/189 | 7/190 | |
| | | Somnolence | 13/189 | 7/190 | |
| | Total | Sedation-related | 69/547 | 29/538 | 2.56 (1.63–4.03) |
| | Berman (2007) [75] | Tremor | 6/182 | 8/176 | |
| | | Other EPS-related events | 2/182 | 1/176 | |
| | Berman (2009) [82] | Dyskinesia | 2/176 | 0/172 | |
| | | Extrapyramidal disorder | 2/176 | 0/172 | |
| | | Muscle spasms | 4/176 | 1/172 | |
| | | Muscle twitching | 3/176 | 3/172 | |
| | | Psychomotor activity | 1/176 | 0/172 | |
| | | Tremor | 5/176 | 6/172 | |
| | Marcus (2008) [76] | Tremor | 12/189 | 5/190 | |
| | Total | EPS-related | 37/547 | 24/538 | 1.54 (0.86–2.74) |
| | Berman (2007) [75] | Metabolic labs^b | ? | ? | |
| | Berman (2009) [82] | Metabolic labs^c | ? | ? | |
| | Marcus (2008) [76] | Metabolic labs^d | ? | ? | |
| | Total | Metabolic labs | ? | ? | |
| | Berman (2007) [75] | Akathisia | 42/182 | 8/176 | |
| | | Restlessness | 26/182 | 6/176 | |
| | Berman (2009) [82] | Akathisia | 32/176 | 6/172 | |
| | | Restlessness | 22/176 | 6/172 | |
| | Marcus (2008) [76] | Akathisia | 49/189 | 8/190 | |
| | | Restlessness | 18/189 | 1/190 | |
| | Total | Akathisia-related | 189/547 | 35/538 | 7.47 (5.07–11.0) |
| | Berman (2007) [75] | Weight gain ≥7% | 13/182 | 2/176 | |
| | Berman (2009) [82] | Weight gain ≥7% | 8/176 | 2/172 | |
| | Marcus (2008) [76] | Weight gain ≥7% | 6/189 | 0/190 | |
| | Total | Weight gain ≥7% | 27/547 | 4/538 | 5.91 (2.14–16.29) |
| OFC | Corya (2006) [25] | Asthenia | 29/243 | 10/119 | |
| | | Somnolence | 53/243 | 8/119 | |
| | Shelton (2001) [27] | Asthenia | 5/10 | 4/10 | |
| | | Somnolence^e | 6/10 | 5/10 | |
| | Shelton (2005) [26] | Asthenia | 30/146 | 25/210 | |
| | | Somnolence | 25/146 | 27/210 | |
| | Thase (2007) [85] | Fatigue | 28/200 | 16/206 | |
| | | Hypersomnia | 21/200 | 5/206 | |
| | | Sedation | 19/200 | 7/206 | |
| | | Somnolence | 35/200 | 11/206 | |
| | Total | Sedation | 240/589 | 109/535 | 2.87 (1.64–5.03) |
| | Corya (2006) [25] | Dyskinesia any time (AIMS) | 1/227 | 3/113 | |
| | | Dyskinesia at last two visits (AIMS) | 0/227 | 1/113 | |
| | | Dyskinesia at end point (AIMS) | 1/227 | 1/113 | |
| | | Parkinsonism (SAS) | 6/226 | 7/113 | |
| | Shelton (2001) [27] | Parkinsonism (SAS) | 0/10 | 3/9 | |
| | Shelton (2005) [26] | Dyskinesia any time (AIMS) | 2/140 | 0/197 | |
| | | Dyskinesia at last two visits (AIMS) | 0/140 | 0/197 | |
| | | Dyskinesia at end point (AIMS) | 0/140 | 0/197 | |
| | | Parkinsonism (SAS) | 7/140 | 1/199 | |
| | | Tremor | 17/146 | 8/210 | |
| | Thase (2007) [85] | Dyskinesia any time (AIMS) | 1/196 | 3/201 | |
| | | Dyskinesia at last two visits (AIMS) | 0/195 | 0/201 | |
| | | Dyskinesia at end point (AIMS) | 0/194 | 1/199 | |
| | | Parkinsonism (SAS) | 5/192 | 2/195 | |
| | | Tremor | 21/200 | 18/206 | |
| | Total | EPS-related | 61/574 | 48/522 | 0.88 (0.25–3.04) |
| | Corya (2006) [25] | Cholesterol high | 12/213 | 0/103 | |
| | | Nonfasting glucose high | 6/209 | 0/103 | |
| | | HbA1c high | 10/153 | 1/77 | |
| | Shelton (2001) [27] | Cholesterol high | 1/10 | 0/10 | |
| | | Nonfasting glucose high | 0/10 | 0/10 | |
| | Shelton (2005) [26] | Cholesterol high | 3/133 | 7/193 | |
| | | Nonfasting glucose high | 8/131 | 3/192 | |
| | | Hyperglycemia | 3/146 | 0/200 | |
| | Thase (2007) [85] | Cholesterol high | 9/189 | 3/194 | |
| | | Fasting glucose high | 2/28 | 0/36 | |
| | | Nonfasting glucose high | 6/168 | 1/170 | |
| | | HbA1c high | 8/144 | 0/165 | |
| | | Triglycerides high | 10/189 | 3/196 | |
| | Total | Metabolic labs^f | 78/482 | 18/455 | 4.46 (2.07–9.58) |
| | Corya (2006) [25] | Agitation | 14/243 | 4/119 | |
| | | Akathisia any time (Barnes) | 23/227 | 5/109 | |
| | Shelton (2001) [27] | Akathisia | 2/10 | 0/10 | |
| | | Akathisia (Barnes) | 3/10 | 2/9 | |
| | Shelton (2005) [26] | Akathisia (Barnes) | 14/138 | 20/196 | |
| | Thase (2007) [85] | Akathisia (Barnes) | 18/188 | 13/188 | |
| | Total | Akathisia | 74/571 | 44/508 | 1.48 (0.96–2.30) |
| | Corya (2006) [25] | Prolactin high | 43/186 | 7/89 | |
| | Shelton (2001) [27] | Prolactin high | 4/9 | 0/7 | |
| | Shelton (2005) [26] | Prolactin high | 34/119 | 6/178 | |
| | Thase (2007) [85] | Prolactin high | 49/159 | 23/172 | |
| | Total | Prolactin high^g | 130/473 | 36/446 | 4.30 (2.36–7.83) |
| | Corya (2006) [25] | Peripheral edema | 27/243 | 1/119 | |
| | | Edema | 19/243 | 1/119 | |
| | Shelton (2001) [27] | Peripheral edema | 2/10 | 0/10 | |
| | Thase (2007) [85] | Peripheral edema | 24/200 | 2/206 | |
| | | Edema | 11/200 | 1/206 | |
| | Total | Edema | 83/453 | 5/335 | 13.19 (5.46–31.89) |
| | Corya (2006) [25] | Weight gain ≥10% | 53/230 | 2/114 | |
| | Shelton (2001) [27] | Weight gain ≥10% | 3/10 | 0/10 | |
| | Shelton (2005) [26] | Weight gain >10% | 11/146 | 0/210 | |
| | Thase (2007) [85] | Weight gain ≥10% | 42/198 | 2/203 | |
| | Total | Weight gain >10% or weight gain ≥10% | 109/584 | 4/537 | 16.28 (7.02–37.76) |
| Quetiapine | Bauer (2009) [81] | Fatigue | 46/330 | 5/161 | |
| | | Lethargy | 7/330 | 2/161 | |
| | | Sedation | 37/330 | 7/161 | |
| | | Somnolence | 66/330 | 5/161 | |
| | El-Khalili (2010) [83] | Fatigue | 33/297 | 7/148 | |
| | | Hypersomnia | 6/297 | 0/148 | |
| | | Sedation | 58/297 | 6/148 | |
| | | Somnolence | 86/297 | 6/148 | |
| | McIntyre (2007) [84] | Sedation/somnolence/lethargy | 25/29 | 14/29 | |
| | Total | Sedation-related | 364/656 | 52/338 | 8.36 (5.83–11.98) |
| | Bauer (2009) [81] | EPS-related | None | None | |
| | El-Khalili (2010) [83] | EPS-related | 17/297 | 5/148 | |
| | McIntyre (2007) [84] | EPS-related | None | None | |
| | Total | EPS-related | 17/297 | 5/148 | 1.66 (0.59–4.67) |
| | Bauer (2009) [81] | Fasting glucose high | 15/330 | 4/161 | |
| | | LDL cholesterol high | 47/330 | 18/161 | |
| | | HDL cholesterol low | 13/330 | 7/161 | |
| | | Total cholesterol high | 60/330 | 14/161 | |
| | | Triglycerides high | 40/330 | 5/161 | |
| | El-Khalili (2010) [83] | Fasting glucose high | 11/297 | 5/148 | |
| | | HbA1c high | 2/297 | 1/148 | |
| | | HDL cholesterol low | 18/297 | 7/148 | |
| | | LDL cholesterol high | 12/297 | 5/148 | |
| | | Total cholesterol high | 22/297 | 2/148 | |
| | | Triglycerides high | 29/297 | 6/148 | |
| | McIntyre (2007) [84] | Metabolic labs^h | ? | ? | |
| | Total | Metabolic labs^i | 269/627 | 74/309 | 2.45 (1.80–3.34) |
| | Bauer (2009) [81] | Shift from <3 to ≥3 metabolic risk factors | 27/330 | 16/161 | |
| | El-Khalili (2010) [83] | Shift from <3 to ≥3 metabolic risk factors^j | 50/297 | 9/148 | |
| | Total | Shift from <3 to ≥3 metabolic risk factors | 77/627 | 25/309 | 1.57 (0.42–5.92) |
| | El-Khalili (2010) [83] | Akathisia | 6/297 | 1/148 | |
| | | Restlessness | 5/297 | 2/148 | |
| | Total | Akathisia-related | 11/297 | 3/148 | 1.75 (0.47–6.55) |
| | Bauer (2009) [81] | Elevated prolactin^k | 6/330 | 3/161 | |
| | Total | Elevated prolactin | 6/330 | 3/161 | 0.96 (0.23–3.96) |
| | Bauer (2009) [81] | Weight gain ≥7% | 14/330 | 2/161 | |
| | El-Khalili (2010) [83] | Weight gain ≥7% | 13/297 | 3/148 | |
| | McIntyre (2007) [84] | Weight gain ≥7% | 4/18 | 0/14 | |
| | Total | Weight gain ≥7% | 31/645 | 5/323 | 2.86 (1.11–7.37) |
| Risperidone | Keitner (2009) [73] | Fatigue | 0/62 | 2/33 | |
| | | Tired | 0/62 | 2/33 | |
| | Mahmoud (2007) [44] | Fatigue | 5/137 | 0/131 | |
| | | Lethargy | 1/137 | 3/131 | |
| | | Somnolence | 7/137 | 2/131 | |
| | Reeves (2008) [77] | Somnolence | 2/12 | 1/11 | |
| | Total | Sedation-related | 15/211 | 10/175 | 0.88 (0.11–7.55) |
| | Mahmoud (2007) [44] | Dystonia | 0/137 | 1/131 | |
| | | Tremor | 1/137 | 1/131 | |
| | Total | EPS-related | 1/137 | 2/131 | 0.47 (0.04–5.29) |
| | Keitner (2009) [73] | Metabolic labs^l | ? | ? | |
| | Mahmoud (2007) [44] | Metabolic labs^l | ? | ? | |
| | Reeves (2008) [77] | Metabolic labs^l | ? | ? | |
| | Total | Metabolic labs | ? | ? | |
| | Mahmoud (2007) [44] | Akathisia | 1/137 | 0/131 | |
| | Total | Akathisia-related | 1/137 | 0/131 | 2.89 (0.12–71.58) |
| | Keitner (2009) [73] | Edema | 0/62 | 0/33 | |
| | Mahmoud (2007) [44] | Peripheral edema | 4/137 | 1/131 | |
| | Reeves (2008) [77] | Edema | 0/12 | 0/11 | |
| | Total | Edema | 4/211 | 1/175 | 3.91 (0.43–35.45) |
| | Keitner (2009) [73] | Weight gain ≥7% | 2/62 | 0/33 | |
| | Mahmoud (2007) [44] | Weight gain ≥7%^m | ? | ? | |
| | Reeves (2008) [77] | Weight gain ≥7%^m | ? | ? | |
| | Total | Weight gain ≥7% | 2/62 | 0/33 | 2.77 (0.13–59.38) |

^a Trials with no events in either study arm are not included in summary OR calculations.

^b The clinical registry report indicated that statistically significant differences emerged between drug and placebo in glucose, total cholesterol, fasting LDL cholesterol, nonfasting and fasting triglycerides, and prolactin. These differences were not reported quantitatively and were described as not “clinically meaningful.”

^c Median levels of change in fasting total cholesterol, triglycerides, HDL cholesterol, LDL cholesterol, and fasting plasma glucose were reported. Categorical measures (i.e., numbers of patients who had abnormal scores) were not reported. The clinical trial registry noted that there was a statistically significant but clinically nonmeaningful difference between drug and placebo on nonfasting LDL cholesterol.

^d Data on metabolic parameters were reported in terms of median change, but no categorical reporting of laboratory abnormalities was provided. Differences between drug and placebo were reported as not statistically significant in terms of median change on glucose, cholesterol, and triglycerides.

^e Because the total number of events in the OFC group was higher than the sample size of the group, an effect size could not be calculated, and it was thus not factored into the overall effect size estimate for sedation. Given the very small sample of the study, this makes virtually no difference in the overall effect size estimate.

^f The number of participants providing data differed substantially across metabolic testing parameters. The average sample size across the metabolic testing groups provided the denominator for the pooled number of abnormal metabolic test results in each trial, with the total number of participants who experienced an abnormal metabolic testing result comprising the numerator. A participant may have experienced more than one event. Also, boundaries of abnormal tests were defined by standard Lilly reference ranges, a resource not available to our research team.

^g Boundaries of abnormal tests were defined by standard Lilly reference ranges, a resource not available to our research team.

^h Triglycerides and unclearly described laboratory tests were completed in this study, but the results were described only as yielding “no clinically significant differences” between groups.

^i Abnormal metabolic laboratory values were defined as follows: fasting glucose ≥126 mg/dl, LDL cholesterol ≥160 mg/dl, HDL cholesterol ≤40 mg/dl, total cholesterol ≥240 mg/dl, and triglycerides ≥200 mg/dl.

^j The clinical trial registry entry noted that approximately 17% of patients taking quetiapine met this criterion, compared to 6% of placebo patients. We extracted numbers of patients based on these percentages.

^k Defined as >20 ng/ml for males and >30 ng/ml for females.

^l These parameters were apparently not measured.

^m Weight gain was provided in terms of means and standard deviations; however, no categorical measure of significant weight gain was reported.

AIMS, Abnormal Involuntary Movement Scale; Barnes, Barnes Akathisia Scale; HbA1c, glycated hemoglobin; SAS, Simpson-Angus Scale.
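For event categories that draw on a single trial, the tabulated OR can be approximated directly from the 2×2 counts. The following is a minimal sketch, assuming a Woolf (log-scale) 95% confidence interval and a 0.5 continuity correction applied to every cell whenever one cell is zero; the article does not describe its exact computation, and multi-trial totals come from meta-analytic pooling that this sketch does not attempt. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(events_drug, n_drug, events_placebo, n_placebo, z=1.959964):
    """Odds ratio and Woolf confidence interval from one 2x2 table.

    Assumption: if any cell is zero, 0.5 is added to all four cells.
    """
    a, b = events_drug, n_drug - events_drug
    c, d = events_placebo, n_placebo - events_placebo
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Risperidone rows of Table 4 that rest on a single trial (Mahmoud 2007).
for label, counts in {
    "EPS-related": (1, 137, 2, 131),
    "Akathisia-related": (1, 137, 0, 131),
}.items():
    or_, lo, hi = odds_ratio_ci(*counts)
    print(f"{label}: OR {or_:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

Under these assumptions the two risperidone rows come out close to the tabulated 0.47 (0.04–5.29) and 2.89 (0.12–71.58). Other single-trial rows do not necessarily agree to two decimals, so the sketch should be read as an approximation of the reported values rather than a reproduction of the authors' method.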

Adjunctive aripiprazole was frequently associated with akathisia (NNH, 4; 95% CI, 3–6) and also linked to a statistically significant elevation in the occurrence of sedation (NNH, 14; 95% CI, 8–33) and significant weight gain of ≥7% during trials (NNH, 29; 95% CI, 10–119). Adjunctive OFC was often associated with significant weight gain of >10% or ≥10% (NNH, 9; 95% CI, 5–20), sedation (NNH, 5; 95% CI, 3–12), abnormal metabolic laboratory results (NNH, 10; 95% CI, 5–29), and elevated prolactin (NNH, 6; 95% CI, 4–11). Adjunctive quetiapine had a very high rate of reported sedation (NNH, 3; 95% CI, 2–3) and was also linked to abnormal metabolic laboratory results (NNH, 6; 95% CI, 4–9) and significant weight gain of ≥7% (NNH, 37; 95% CI, 12–594). Adjunctive risperidone was not associated with an increased rate of any spontaneously reported adverse events.
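As a rough cross-check on the NNH values, the number needed to harm is the reciprocal of the absolute risk difference between drug and placebo. The sketch below uses crude event totals from Table 4, which is an assumption: the reported NNH figures were presumably derived from meta-analytically pooled estimates, so crude totals will not always agree (for aripiprazole weight gain, crude totals give 24 against the reported 29). The function name `nnh_crude` is hypothetical.

```python
import math


def nnh_crude(events_drug, n_drug, events_placebo, n_placebo):
    """Number needed to harm from crude totals: ceil(1 / risk difference).

    Returns None when the drug arm does not show the higher event rate.
    """
    risk_difference = events_drug / n_drug - events_placebo / n_placebo
    if risk_difference <= 0:
        return None
    return math.ceil(1 / risk_difference)


# "Total" rows of Table 4.
print(nnh_crude(189, 547, 35, 538))  # aripiprazole, akathisia-related
print(nnh_crude(69, 547, 29, 538))   # aripiprazole, sedation-related
print(nnh_crude(364, 656, 52, 338))  # quetiapine, sedation-related
```

For these three rows the crude calculation gives 4, 14, and 3, matching the values reported in the text.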

All four drugs resulted in statistically significant weight gain (Table 3): mean weight gain in trials of adjunctive aripiprazole, quetiapine, and risperidone was approximately 1 kg, while the average weight gain resulting from adjunctive OFC was 4.20 kg (95% CI, 3.79–4.61). OFC was associated with more weight gain than the other drugs (QB = 58.46, p<0.001), which did not differ significantly from each other (QB = 0.66, p = 0.72).
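The between-drug heterogeneity statistics quoted above can be converted to p-values by referring QB to a chi-square distribution. The sketch below assumes degrees of freedom equal to the number of drugs compared minus one (3 for all four drugs, 2 for the three remaining drugs), which the article does not state explicitly. The helper `chi2_sf` is an illustrative standard-library implementation of the chi-square upper tail for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j-1/2) / Gamma(j+1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


print(chi2_sf(58.46, 3))  # OFC versus the other three drugs, weight gain
print(chi2_sf(0.66, 2))   # aripiprazole, quetiapine, risperidone compared
```

With these assumed degrees of freedom, the first value falls far below 0.001 and the second is approximately 0.72, in line with the reported p-values.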

The thresholds for adverse event reporting in the included publications are shown in Table 1. Adverse events were typically listed in a table and were reported only if a certain proportion of study participants reported that event. For example, if only those adverse events reported by 5% or more of participants in either group were reported in the published journal article, we describe it in Table 1 as “≥5% in any group.” In general, little to no additional information was provided in the study publications regarding adverse events beyond that which was presented in such tables. Meta-analysis of effects on sexual functioning rating scales was not performed because of the often unclear reporting of these measures (see Table 3).

Publication Bias.

The trim and fill procedure suggested the existence of three unpublished trials, bringing the overall effect on depression measures to 0.32. A funnel plot showing the results of this analysis can be seen in Figure 4.

Figure 4. Funnel Plot of Trim and Fill Analysis. Open Circles Represent Published Studies; Filled Circles Represent Imputed Unpublished Studies. The Overall Effect Size Changes From 0.34 to 0.32 When Including These Imputed Trials.

Discussion

In this meta-analysis of 14 randomized trials of atypical antipsychotic medications used for the adjunctive treatment of major depressive disorder, we found that all included atypical antipsychotics were more efficacious than adjunctive placebo in terms of their effects on depressive symptom severity and remission. However, the effect sizes were small or small-to-moderate in magnitude, and OFC did not generate a statistically significant benefit on treatment response. All of the studied drugs except risperidone demonstrated substantial risk of several adverse events. Our findings have clinically important implications for comprehensively understanding the risk–benefit profiles of these adjunctive treatments for major depressive disorder.

The overall effect size on depression severity was g = 0.34, an effect conventionally deemed small. In a meta-analysis of antidepressants versus placebo, Kirsch et al. found an effect size of 0.32, which they interpreted as not clinically relevant [46]. This was in line with the recommendations of the National Institute for Health and Clinical Excellence in the United Kingdom, which deemed effect sizes of g<0.5 as clinically insignificant, though no evidence was cited for this cutoff [47]. However, Turner et al.’s meta-analysis of antidepressants versus placebo found an effect size of 0.31 [11], which was interpreted as “measurable and significant” [48]. These differing interpretations are understandable given that Cohen noted that his original categorization of effect sizes (0.2 = small, 0.5 = medium, and 0.8 = large) was arbitrary [45]. We interpret the effect of adjunctive antipsychotic treatment on depression measures as of questionable clinical relevance. In addition, sole reliance on depression rating scales to determine treatment benefit is likely inadequate in understanding overall treatment efficacy.

The pooled difference in mean change across 11 trials was 2.69 points on the MADRS. The MADRS consists of ten items, each rated on a 0–6 scale, assessing sadness, inner tension, reduced sleep, loss of appetite, concentration, difficulty with starting daily activities, inability to feel, pessimism, and suicidal thoughts. A small difference favoring an atypical antipsychotic over placebo on the MADRS may thus reflect small differences across several dimensions, or perhaps a sizable difference on one or two dimensions combined with nil differences on other items. For instance, a pooled analysis of the two large quetiapine trials included in our meta-analysis found that quetiapine at 150 mg/d and quetiapine at 300 mg/d were superior to placebo by 2.50 and 2.85 points on the MADRS at study end point, respectively [49]. The treatment advantage in terms of the items “apparent sadness” and “reported sadness” appears to be about a third of that for the “reduced sleep” item (Figure 3b of [49]), suggesting that quetiapine’s sedative effect on sleep may account for a substantial degree of the observed improvement in depression scores. Thus, improvement in overall depression rating scales should be interpreted cautiously.

Response and remission rates are often used to convey the magnitude of treatment benefit; however, these categorical measures are created arbitrarily from underlying continuous rating scale data [50]. In some circumstances these categorical measures may inflate treatment differences relative to mean change on the continuous scale [51]. While response and remission rates are potentially useful outcome measures, they should be considered only in the context of a wider set of outcome data.

With the exception of risperidone, nearly all of the included trials estimated small or minimal benefits with regard to quality of life and functional impairment. Quetiapine and OFC generated no benefit on such measures, whereas the benefits of aripiprazole were statistically significant yet quite modest. Although risperidone appeared to possess the strongest risk–benefit profile in our analyses, our findings about risperidone were based on the smallest sample size of any of the included drugs. We also have concerns about data reported in the largest risperidone trial. The published version of the study emphasizes outcomes at the end of the 6-wk trial [44]. However, in its discussion section and the trial’s associated ClinicalTrials.gov registry entry [52], it is mentioned that the primary study end point was actually 4 wk; this is mentioned neither in the paper’s methods section nor in the abstract. The effect size on the HAM-D is 30% smaller at the 4-wk end point relative to the 6-wk end point. Effects on the Q-LES-Q and SDS were reported only at week 6, but it seems likely that these effects would be smaller at the primary study end point. Given that this study included 69% of the total participants in risperidone trials, our pooled estimate of risperidone efficacy is therefore driven by the inclusion of post hoc analyses. Further, a previously published relapse prevention study (not included in our meta-analysis due to its study design) found no benefit for risperidone over placebo, suggesting that risperidone-related gains may be transient [53,54].

Taken together, our findings raise significant concerns regarding the impact of these medications in improving overall well-being. Although improvements in quality of life or functional status commonly co-occur with improvements in depression symptom severity, this cannot automatically be assumed. One comprehensive literature review estimated only a moderate degree of correlation between these constructs [55]. It has been argued that changes on quality of life measures may lag changes on depressive symptom measures and that short-term trials may not be an appropriate setting in which to estimate changes on quality of life measures [15]. Contrary to this argument, however, four of five recently published short-term antidepressant medication trials found that benefits of medication over placebo were similar on measures of (1) quality of life or functional impairment (e.g., as measured by the Q-LES-Q and SDS) and (2) depression symptom severity (e.g., as measured by the HAM-D and MADRS) [56–60]. Our findings highlight the fact that reporting data only on symptom response and resolution may provide an incomplete and perhaps overly optimistic summary of a medication’s overall effects on well-being [15,16,61]. More robust assessments of quality of life and functional impairment should be incorporated into the design of clinical trials of any putative antidepressant.

Without longer-term data on not only depression symptom severity but also quality of life and social functioning, it is difficult to assess the risk–benefit profile of these medications prescribed over the long term. None of the included trials provided data on long-term (i.e., ≥6 mo) outcomes comparing adjunctive antipsychotic medication treatment to adjunctive placebo. Our failure to find long-term outcome data is consistent with that of previous research teams [17,62]. For example, one systematic review of long-term, two-arm parallel randomized controlled antidepressant trials initially identified 2,693 abstracts, only to ultimately include six trials [62]. This limitation is shared with other treatments; there is very little understanding of how adjunctive treatments for depression influence long-term well-being.

In addition to providing a thorough assessment of efficacy outcomes, our meta-analysis departs from the literature in a second notable way by comprehensively summarizing the available safety information on these medications. Such safety data have not been included in prior quantitative reviews, but our conclusions echo concerns raised in previous meta-analyses and a narrative review regarding potential treatment-related harms associated with use of atypical antipsychotic medication in the adjunctive treatment of depression [6–8]. Overall, we found that treatment was linked to several adverse events, including akathisia (aripiprazole), sedation (quetiapine, OFC, and aripiprazole), abnormal metabolic laboratory results (quetiapine and OFC), and weight gain (all four drugs, especially OFC). Measures of absolute benefit and harm (NNT and NNH) provide an intuitive metric for understanding treatment-related benefits and harms. However, these measures are dependent on baseline control group risk, which may vary substantially across clinical subgroups [63]. Thus, our findings in terms of NNT and NNH should be interpreted as estimates of effects for each drug relative to control participants who may differ from participants treated in clinical practice.
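The dependence of NNT on baseline control group risk can be illustrated by holding an odds ratio fixed and varying the control event rate. The sketch below uses the remission OR of 2.01 reported for aripiprazole in the abstract; the control remission rates of 10%, 20%, and 30% are hypothetical values chosen only for illustration, and the function name `nnt_from_or` is likewise an assumption.

```python
def nnt_from_or(odds_ratio: float, control_risk: float) -> float:
    """NNT implied by an odds ratio at a given control-group event risk.

    The treated-group risk is obtained by applying the odds ratio to the
    control odds; NNT is the reciprocal of the resulting risk difference.
    """
    treated_risk = (odds_ratio * control_risk) / (
        1 - control_risk + odds_ratio * control_risk
    )
    return 1 / (treated_risk - control_risk)


# Hypothetical control remission rates; OR = 2.01 (aripiprazole, remission).
for control_risk in (0.10, 0.20, 0.30):
    print(control_risk, round(nnt_from_or(2.01, control_risk), 1))
```

With the odds ratio held constant, the implied NNT falls as the control remission rate rises across this range, which is why an NNT estimated in trial populations may not carry over to clinical subgroups with different baseline risk.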

Our ability to provide an adequate safety profile of these medications was limited in two respects. First, while 11 of 14 included trials used a structured instrument to elicit adverse events, these measures were limited to assessing potential EPS- and akathisia-related events, and, in five studies, sexual functioning. No study reported using a structured checklist to elicit adverse events outside of EPS, akathisia, or sexual functioning, which is a substantial limitation given that adverse events are reported with as much as 20 times greater frequency when elicited through structured checklists versus being recorded in response to patient complaints [19,20]. The importance of measuring adverse events systematically was demonstrated historically in the case of selective serotonin reuptake inhibitors: in registration trials, sexual dysfunction was neither systematically assessed nor found to be frequently spontaneously reported by patients [64,65]. Further investigation indicated, however, that sexual side effects on selective serotonin reuptake inhibitors are actually quite common [19]. While the collection of adverse event data via structured checklists is a more sensitive method of collecting adverse event data, it may result in many common (mostly minor) health problems being endorsed even if they are not due to treatment, potentially leading to decreased specificity in differentiating medications from placebo [66]. To bridge the differences between the systematic and open-ended assessment of adverse events in clinical trials, some sort of hybrid method of collecting adverse event data could be performed, such as randomly assigning some participants within both the active treatment and placebo groups to complete a structured checklist while assigning others to complete an open-ended assessment of adverse events.

A second constraint on our ability to adequately summarize the drugs’ safety profiles is that many adverse events were not reported in journal articles and that some of the data were incomplete or reported in a fashion that may have obscured treatment-related harms. We agree with the Cochrane reviewers that “data on side effects were often very poorly described” [9]. Conceptually similar events such as sedation, fatigue, and somnolence were sometimes reported separately, often with no attempt to pool them together. This is in direct contradiction to FDA guidance, which states that events that “represent the same phenomenon (e.g., somnolence, sedation, drowsiness) should ordinarily be grouped together as a single adverse reaction to avoid diluting or obscuring the true effect” [67].
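To make the dilution concern concrete, the following sketch counts distinct patients per adverse event with and without grouping related terms. It assumes access to patient-level reports, which published articles rarely provide. The grouping table uses only the three terms named in the FDA guidance quoted above, and the patient records and function name are hypothetical.

```python
# Terms the FDA guidance quoted above gives as examples of a single phenomenon.
SAME_PHENOMENON = {
    "somnolence": "sedation",
    "sedation": "sedation",
    "drowsiness": "sedation",
}


def patients_with_event(reports, grouped=True):
    """Count distinct patients per adverse event, optionally merging related terms."""
    patients = {}
    for patient_id, term in reports:
        key = SAME_PHENOMENON.get(term, term) if grouped else term
        # A set ensures a patient reporting two related terms is counted once.
        patients.setdefault(key, set()).add(patient_id)
    return {event: len(ids) for event, ids in patients.items()}


# Hypothetical patient-level reports as (patient id, reported term); not trial data.
reports = [
    (1, "somnolence"),
    (2, "sedation"),
    (2, "drowsiness"),
    (3, "drowsiness"),
    (4, "nausea"),
]

print(patients_with_event(reports, grouped=False))
print(patients_with_event(reports, grouped=True))
```

Reported separately, no single term in this toy example affects more than two patients; grouped, three of the four patients experienced the same underlying event. Simply adding the separate counts would instead overstate the total, because one patient reported two related terms.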

Given the notable side effect profiles of the studied drugs, it is likely that the double-blind was significantly compromised; however, none of the included trials tested the integrity of blinding. For example, patients who rapidly gained weight in an OFC trial, who complained of akathisia in an aripiprazole trial, or who reported sedation in a quetiapine trial would likely cue the awareness of study personnel that they were assigned to the active drug condition. Assuming that proper informed consent was obtained, participants were also likely to accurately guess their treatment assignment based on side effect cues [68,69]. This could have led to inflated efficacy ratings by clinical raters and participants [70,71]. The lack of protocols assessing the integrity of the double-blind in the trials included in our meta-analysis is consistent with the wider clinical trials literature [72]. The potential for unblinding to cause inflated efficacy ratings among clinical raters could be substantially limited if efficacy outcomes were assessed by different personnel than those who assessed adverse events [70]. Yet the use of separate raters to assess efficacy and safety outcomes was reported in only one trial [73].

The FDA statistical reviewer for aripiprazole [74] wrote regarding Berman et al. [75] that “the medical reviewer is concerned about the considerable number of protocol violations in the study primarily due to usage of opiates/barbiturates” [74]. Regarding the Marcus et al. [76] trial, the FDA reviewer wrote that the difference between groups in the number of participants who used prohibited medications was “huge” [74], with nine patients in the placebo group doing so compared to 24 in the aripiprazole group. The reviewer thus performed a separate analysis, excluding patients in the two trials who violated the study protocol, the results of which indicated a minimal, non-statistically significant effect of aripiprazole on functional status. In the journal articles, this potentially important issue is not mentioned. The FDA reviewer reported results only from reanalysis of the MADRS and SDS, so it is unknown to what extent these protocol violations may have impacted results on other outcome measures [74].

Our results differ somewhat from those of Nelson and Papakostas, whose meta-analysis concluded that augmentation with atypical antipsychotics was effective and, further, that “this body of evidence is considerably larger than that for any other augmentation strategy in the treatment of major depressive disorder” [6]. There are seven differences in our analyses that provide reasons why we reached different conclusions. The greatest divergence in our results was regarding OFC, for which we found a lower OR favoring OFC for remission (1.42) than did Nelson and Papakostas (1.83). In this first instance, Nelson and Papakostas utilized whatever definition of remission was provided by the authors of each study, whereas we used a more restrictive definition. Three OFC trials defined remission as achieving a MADRS score of ≤8 at two consecutive visits, even if patients relapsed during the trial [25–27]. We found that after meeting criteria for remission, OFC-treated participants were more likely to relapse than placebo-treated participants; this contributed to our finding a less favorable result for OFC in terms of remission. Second, we extracted data from all comparison groups that received adjunctive placebo treatment, whereas Nelson and Papakostas excluded one comparison group from each of two OFC trials [25,26]. Third, Nelson and Papakostas estimated a significant treatment effect for OFC on response, whereas we did not. This difference seems due to a combination of our inclusion of all adjunctive placebo comparison groups and our use of random effects analysis as opposed to their use of a fixed effects model [38]. Our fourth point of difference was that Nelson and Papakostas included data from two conference presentations on quetiapine that showed positive findings; we were unable to obtain data from these authors despite three emailed requests over a span of 4 wk. Additionally, we attempted to contact one author via phone; the attempt did not result in the release of any data. Nonetheless, the pooled ORs generated in our analyses for quetiapine in terms of response (1.53) and remission (1.79) were quite similar to those published in Nelson and Papakostas’s meta-analysis (1.60 and 1.89, respectively). Our fifth difference was the use of different definitions of remission in one risperidone trial [73], and the sixth difference was Nelson and Papakostas’s inclusion of data from one small risperidone trial from which we were unable to extract remission data [77], leading to our finding a slightly lower rate of remission (OR of 2.37 versus 2.63). Lastly, and most importantly, the primary point of difference is that our analysis provides a more comprehensive appraisal of treatment efficacy and safety, which, as discussed above, presents a more accurate assessment of the comparative risks and benefits of treatment.
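The third point above, the choice between a fixed effects and a random effects model, can be illustrated with a minimal sketch of inverse-variance pooling and the DerSimonian–Laird estimator [38]. This is not the software used in our analysis, and the four log odds ratios and variances are hypothetical values chosen to show heterogeneity; the function name `pool` is likewise an assumption.

```python
import math


def pool(log_ors, variances):
    """Return ((fixed estimate, SE), (random estimate, SE)) on the log OR scale."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted scatter of the trials around the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird between-trial variance, truncated at zero.
    tau2 = max(0.0, (q - (len(log_ors) - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]
    rand_est = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    return (fixed, math.sqrt(1 / sum(w))), (rand_est, math.sqrt(1 / sum(w_re)))


# Hypothetical log odds ratios and variances for four trials (not data from this review).
log_ors = [0.10, 0.65, -0.05, 0.80]
variances = [0.04, 0.05, 0.06, 0.05]

for label, (est, se) in zip(("fixed effects", "random effects"), pool(log_ors, variances)):
    lower, upper = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
    print(f"{label}: OR {math.exp(est):.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

When trial results are heterogeneous, the random effects model adds the between-trial variance to each trial's own variance, so its confidence interval is wider than the fixed effects interval. A pooled effect can therefore be statistically significant under one model and not under the other, even with identical input data.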

Editors' Summary

Background Everyone feels miserable occasionally. But for people who are clinically depressed, feelings of sadness and hopelessness and physical symptoms such as sleeping badly can last for months or years and can make them feel life is no longer worth living. Depression affects one in six people at some time during their life. Clinicians diagnose depression by asking their patients a series of questions about their feelings and symptoms. The answer to each question is given a score, and the total score from the questionnaire (“depression rating scale”) indicates the severity of depression. Treatment of depression often involves talking treatments (psychotherapy) such as cognitive behavioral therapy, which helps people change negative ways of thinking and behaving, and antidepressant drugs, most commonly “selective serotonin reuptake inhibitors” such as fluoxetine and paroxetine.

Why Was This Study Done? Atypical antipsychotic medications (for example, aripiprazole, olanzapine/fluoxetine combination [OFC], quetiapine, and risperidone) are also widely prescribed for the treatment of depression. These drugs, which were developed to treat mental illnesses that are characterized by a loss of contact with reality, are used as adjunctive therapy for depression. That is, they are used in addition to antidepressant drugs. Clinicians wrote nearly four million prescriptions for adjunctive treatment of depression with atypical antipsychotic medications in 2007–2008 in the US alone. However, it is not known whether the benefits of using these drugs to treat depression outweigh their side effects, which include weight gain, sedation, and akathisia (a feeling of inner restlessness resulting in an urge to move, which may or may not be accompanied by increased movement). Here, the researchers undertake a systematic review and meta-analysis of the efficacy and safety profiles of atypical antipsychotic medications used for the adjunctive treatment of depression. A systematic review uses predefined criteria to identify all the research on a given topic; a meta-analysis is a statistical approach that combines the results of several studies.

What Did the Researchers Do and Find? The researchers identified 14 short-term randomized controlled trials (duration 4–12 weeks) that compared adjunctive antipsychotic medications (aripiprazole, OFC, quetiapine, or risperidone) to placebo (dummy drug) in the treatment of depression that had not responded to antidepressant medication alone. All four drugs had statistically significant effects (effects unlikely to have happened by chance) on remission, which was most commonly defined as a score of less than eight at the study end point on the Montgomery–Asberg Depression Rating Scale. The researchers calculated the number of patients that would have to be treated for one patient to achieve remission (number needed to treat, or NNT). For OFC, the NNT was 19; for all the other drugs it was nine. All the drugs except OFC also significantly improved response rates (defined as a 50% improvement in depression rating score). However, the medications provided little or no benefit in terms of functioning and quality of life, except for risperidone, which had a small-to-moderate effect on quality of life. Finally, treatment with atypical antipsychotic medications was linked to several adverse effects, including weight gain (all four drugs) and akathisia (aripiprazole).

What Do These Findings Mean? These results suggest that atypical antipsychotic medications for the adjunctive treatment of depression are efficacious in reducing observer-rated depressive symptoms. However, clinicians should interpret this conclusion cautiously for several reasons. First, adjunctive treatment with atypical antipsychotics provided only small-to-moderate benefits. Moreover, shortcomings in study design and data reporting methods may have inflated the apparent benefits of treatment and reduced the apparent incidence of adverse events. Second, this study provides little evidence that adjunctive treatment with atypical antipsychotics improves patients' quality of life or reduces their functional impairment. Finally, this study highlights abundant evidence of potential treatment-related harm. This evaluation of the safety and efficacy of adjunctive treatments for clinical depression provides critical insights that should help clinicians better understand the risk–benefit profiles of this approach to the treatment of major depressive disorder.

Additional Information Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001403.

The US National Institute of Mental Health provides information on all aspects of depression (in English and Spanish); it has a webpage on mental health medications that includes information about atypical antipsychotics

The UK National Health Service Choices website also provides detailed information about depression and includes personal stories about depression

More personal stories about depression are available from healthtalkonline.org

The UK charity Mind provides information on depression and on antipsychotic drugs; Mind also includes personal stories about depression on its website

MedlinePlus provides links to other resources about depression (in English and Spanish)

Healthy Skepticism is an international nonprofit membership association that aims to improve health by reducing harm from misleading health information

Our review adds to the Cochrane review on this topic [9] by filling in three important data gaps: (1) unpublished data from the FDA and clinical trial registry reports, (2) data on functioning and quality of life outcomes, and (3) data on metabolic laboratory parameters. Thus, our dataset contained more outcomes and often provided a more comprehensive assessment of included outcomes than the Cochrane review. For instance, the Cochrane review included data from one trial that reported data on clinically significant weight gain for patients on OFC, whereas we included data on both mean weight changes and binary measures of clinically significant weight gain from four such trials. We included laboratory data for several metabolic parameters for both quetiapine and OFC. Despite some differences in methodology, we agree with the Cochrane review that the evidence supporting the use of adjunctive atypical antipsychotics for depression is modest.

Several methodological issues also bear mention. First, while all trials were described as randomized, double-blind trials, only three trials clearly described adequate sequence generation procedures; in the remaining studies, such procedures were unclear. A lack of appropriate randomization or differences in the taste, smell, or appearance of the medication and placebo may allow study personnel and/or participants to guess their treatment assignment. Because purportedly double-blind trials with unclear or inadequate randomization are associated with larger effects than trials in which adequate randomization is clearly described, it is possible that the current set of efficacy ratings was inflated to an unknown extent [21,78]. Second, the design of some of the included trials may have compromised their validity. In each of the aripiprazole trials, patients were treated with an antidepressant plus adjunctive placebo for 8 wk; at that point, those who showed a treatment response were eliminated from the study, and the remaining patients were assigned to either remain on the same treatment or receive adjunctive aripiprazole in place of adjunctive placebo. Thus, all patients taking placebo during the randomized trial had clearly demonstrated poor response to placebo treatment and were likely predisposed to perform poorly during the randomized portion of the trial, thereby possibly inflating the estimated efficacy of the study drug [79].

In any systematic review, publication bias is a potentially serious problem [10,80]. To incorporate as much data as possible, we conducted a thorough literature search and included unpublished data. We did not uncover the existence of any additional unpublished negative trials in our search, but this does not mean that such trials do not exist. Given the small number of trials for each drug in our study, we lacked statistical power to conduct a formal analysis of publication bias for each drug. However, when pooling across drugs, we detected that publication bias may have slightly enhanced the overall effect size on depression measures. Our results likely represent an upper boundary for the efficacy of these compounds (as demonstrated in prior meta-analyses), assuming that relevant unpublished data are more negative than positive in terms of efficacy [10,11].

We are aware of no trials that have directly compared adjunctive atypical antipsychotic medication treatment to other adjunctive treatments such as psychotherapy or lithium, or to other treatment strategies such as switching the antidepressant medication initially used for treatment. Further study may answer critical outstanding questions regarding the safety profiles and longer-term outcomes associated with these medications. Taken together, our meta-analysis found (1) evidence of some improvement in clinician-assessed depressive symptoms, (2) little evidence of substantial benefit in overall well-being, and (3) abundant evidence of potential treatment-related harm. Our comprehensive evaluation of safety and both relative and absolute efficacy provides critical insight that may be useful for clinicians attempting to thoroughly understand the risk–benefit profiles of these adjunctive treatments for major depressive disorder.

References

1 Alexander GC, Gallagher SA, Mascola A, et al. (2011) Increasing off-label use of antipsychotic medications in the United States, 1995–2008. Pharmacoepidemiol Drug Saf 20: 177–184.

2 Sigal E (2009) Bristol-Myers Squibb Company Q1 2009 earnings call transcript. Seeking Alpha. Available: http://seekingalpha.com/article/133733-bristol-myers-squibb-company-q1-2009-earnings-call-transcript?part=qanda. Accessed 7 May 2009.

3 Olfson M, Marcus SC (2009) National patterns in antidepressant medication treatment. Arch Gen Psychiatry 66: 848–856.

4 Gellad WF, Aspinall SL, Handler SM, et al. (2012) Use of antipsychotics among older residents in VA nursing homes. Med Care 50: 954–960.

5 Leslie DL, Mohamed S, Rosenheck RA (2009) Off-label use of antipsychotic medications in the department of Veterans Affairs health care system. Psychiatr Serv 60: 1175–1181.

6 Nelson JC, Papakostas GI (2009) Atypical antipsychotic augmentation in major depressive disorder: a meta-analysis of placebo-controlled randomized trials. Am J Psychiatry 166: 980–991.

7 Papakostas GI, Shelton RC, Smith J, et al. (2007) Augmentation of antidepressants with atypical antipsychotic medications for treatment-resistant major depressive disorder: a meta-analysis. J Clin Psychiatry 68: 826–831.

8 Shelton RC, Papakostas GI (2008) Augmentation of antidepressants with atypical antipsychotics for treatment-resistant major depressive disorder. Acta Psychiatr Scand 117: 253–259.

9 Komossa K, Depping AM, Gaudchau A, et al. (2010) Second-generation antipsychotics for major depressive disorder and dysthymia. Cochrane Database Syst Rev 2010: CD008121.

10 Turner EH, Matthews AM, Linardatos E, et al. (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 358: 252–260.

11 Turner EH, Knoepflmacher D, Shapley L (2012) Publication bias in antipsychotic trials: an analysis of efficacy comparing the published literature to the US Food and Drug Administration database. PLoS Med 9: e1001189. doi:10.1371/journal.pmed.1001189

12 Eyding D, Lelgemann M, Grouven U, et al. (2010) Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ 341: c4737.

13 Kirkham JJ, Dwan KM, Altman DG, et al. (2010) The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340: c365.

14 Ishak WW, Greenberg JM, Balayan K, et al. (2011) Quality of life: the ultimate outcome measure of interventions in major depressive disorder. Harv Rev Psychiatry 19: 229–239.

15 Bech P (2005) Social functioning: should it become an endpoint in trials of antidepressants? CNS Drugs 19: 313–324.

16 Healy D (2000) The assessment of outcomes in depression: measures of social functioning. Rev Contemp Pharmacother 11: 295–301.

17 Tsai AC, Rosenlicht NZ, Jureidini JN, et al. (2011) Aripiprazole in the maintenance treatment of bipolar disorder: a critical review of the evidence and its dissemination into the scientific literature. PLoS Med 8: 1–13. doi:10.1371/journal.pmed.1000434

18 Goodwin FK, Whitham EA, Ghaemi SN (2011) Maintenance treatment study designs in bipolar disorder: do they demonstrate that atypical neuroleptics (antipsychotics) are mood stabilizers? CNS Drugs 25: 819–827.

19 Montejo-Gonzalez AL, Llorca G, Izquierdo JA, et al. (1997) SSRI-induced sexual dysfunction: fluoxetine, paroxetine, sertraline, and fluvoxamine in a prospective, multicenter, and descriptive clinical study of 344 patients. J Sex Marital Ther 23: 176–194.

20 Zimmerman M, Galione JN, Attiullah N, et al. (2010) Underrecognition of clinically significant side effects in depressed outpatients. J Clin Psychiatry 71: 484–490.

21 Savovic J, Jones HE, Altman DG, et al. (2012) Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 157: 429–438.

22 Schulz KF, Altman DG, Moher D, CONSORT Group (2010) CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. PLoS Med 7: e1000251. doi:10.1371/journal.pmed.1000251

23 Montgomery S, Asberg M (1979) A new depression scale designed to be sensitive to change. Br J Psychiatry 134: 382–389.

24 Hamilton M (1960) A rating scale for depression. J Neurol Neurosurg Psychiatry 23: 56–61.

25 Corya SA, Williamson D, Sanger TM, et al. (2006) A randomized, double-blind comparison of olanzapine/fluoxetine combination, olanzapine, fluoxetine, and venlafaxine in treatment-resistant depression. Depress Anxiety 23: 364–372.

26 Shelton RC, Williamson DJ, Corya SA, et al. (2005) Olanzapine/fluoxetine combination for treatment-resistant depression: a controlled study of SSRI and nortriptyline resistance. J Clin Psychiatry 66: 1289–1297.

27 Shelton RC, Tollefson GD, Tohen M, et al. (2001) A novel augmentation strategy for treating resistant major depression. Am J Psychiatry 158: 131–134.

28 Frank E, Prien RF, Jarrett RB, et al. (1991) Conceptualization and rationale for consensus definitions of terms in major depressive disorder: remission, recovery, relapse, and recurrence. Arch Gen Psychiatry 48: 851–855.

29 Rush AJ, Giles DE, Schlesser MA, et al. (1986) The Inventory for Depressive Symptomatology (IDS): preliminary findings. Psychiatry Res 18: 65–87.

30 Endicott J, Nee J, Harrison W, et al. (1993) Quality of life enjoyment and satisfaction questionnaire: a new measure. Psychopharmacol Bull 29: 321–326.

31 Ware J, Sherbourne CD (1992) The MOS 36-item short-form health survey (SF-36). Med Care 30: 473–483.

32 Sheehan D, Harnett-Sheehan K, Raj B (1996) The measurement of disability. Int Clin Psychopharmacol 11 (Suppl 3): 89–95.

33 Hedges LV, Olkin I (1985) Statistical methods for meta-analysis. San Diego: Academic Press.

34 Laupacis A, Sackett DL, Roberts RS (1988) An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 318: 1728–1733.

35 Deeks JJ (2002) Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Stat Med 21: 1575–1600.

36 Cates CJ (2012) Visual RX, version 3 [computer program]. Available: http://www.nntonline.net/visualrx/. Accessed 2 September 2012.

37 Higgins JPT, Thompson SG, Deeks JJ, et al. (2003) Measuring inconsistency in meta-analyses. BMJ 327: 557–560.

38 DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7: 177–188.

39 Higgins JPT, Thompson SG (2002) Quantifying heterogeneity in a meta-analysis. Stat Med 21: 1539–1558.

40 Biostat (2010) Comprehensive meta-analysis. Version 2.2.057, Englewood, NJ.

41 Duval S, Tweedie R (2000) A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J Am Stat Assoc 95: 89–98.

42 Fava M, Mischoulon D, Iosifescu D, et al. (2012) A double-blind, placebo-controlled study of aripiprazole adjunctive to antidepressant therapy among depressed outpatients with inadequate response to prior antidepressant therapy (ADAPT-A study). Psychother Psychosom 81: 87–97.

43 Neyeloff JL, Fuchs SC, Moreira LB (2012) Meta-analyses and forest plots using a Microsoft Excel spreadsheet: step-by-step guide focusing on descriptive data analysis. BMC Res Notes 5: 52.

44 Mahmoud RA, Pandina GJ, Turkoz I, et al. (2007) Risperidone for treatment-refractory major depressive disorder: a randomized trial. Ann Intern Med 147: 593–602.

45 Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edition. Hillsdale (New Jersey): Erlbaum.

46 Kirsch I, Deacon BJ, Huedo-Medina TB, et al. (2008) Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Med 5: e45. doi:10.1371/journal.pmed.0050045

47 Moncrieff J, Kirsch I (2005) Efficacy of antidepressants in adults. BMJ 331: 155–157.

48 Turner EH, Rosenthal R (2008) Efficacy of antidepressants. BMJ 336: 516–517.

49 Bauer M, El-Khalili N, Datto C, et al. (2010) A pooled analysis of two randomised, placebo-controlled studies of extended release quetiapine fumarate adjunctive to antidepressant therapy in patients with major depressive disorder. J Affect Disord 127: 19–30.

50 Rush AJ, Kraemer HC, Sackeim HA, et al. (2006) Report by the ACNP task force on response and remission in major depressive disorder. Neuropsychopharmacology 31: 1841–1853.

51 Kirsch I, Moncrieff J (2007) Clinical trials and the response rate illusion. Contemp Clin Trials 28: 348–351.

52 ClinicalTrials.gov (2011) Study comparing adjunctive risperidone versus placebo in major depressive disorder that is not responding to standard therapy. Available: http://clinicaltrials.gov/ct2/show/NCT00095134. Accessed 6 February 2013.

53 Rapaport MH, Gharabawi GM, Canuso CM, et al. (2007) Corrigendum: Effects of risperidone augmentation in patients with treatment-resistant depression: results of open-label treatment followed by double-blind continuation. Neuropsychopharmacology 32: 1208.

54 Carroll B (2009) Aripiprazole in refractory depression? J Clin Psychopharmacol 29: 90–91.

55 McKnight PE, Kashdan TB (2009) The importance of functional impairment to mental health outcomes: a case for reassessing our goals in depression treatment research. Clin Psychol Rev 29: 243–259.

56 Bech P, Tanghøj P, Cialdella P, et al. (2004) Escitalopram dose-response revisited: an alternative psychometric approach to evaluate clinical effects of escitalopram compared to citalopram and placebo in patients with major depression. Int J Neuropsychopharmacol 7: 283–290.

57 Dunlop BW, Reddy S, Yang L, et al. (2011) Symptomatic and functional improvement in employed depressed patients: a double-blind clinical trial of desvenlafaxine versus placebo. J Clin Psychopharmacol 31: 569–576.

58 Hewett K, Chrzanowski W, Schmitz M, et al. (2009) Eight-week, placebo-controlled, double-blind comparison of the antidepressant efficacy and tolerability of bupropion XR and venlafaxine XR. J Psychopharmacol 23: 531–538.

59 Hewett K, Gee MD, Krishen A, et al. (2010) Double-blind, placebo-controlled comparison of the antidepressant efficacy and tolerability of bupropion XR and venlafaxine XR. J Psychopharmacol 24: 1209–1216.

60 Zajecka J, Schatzberg A, Stahl S, et al. (2010) Efficacy and safety of agomelatine in the treatment of major depressive disorder: a multicenter, randomized, double-blind, placebo-controlled trial. J Clin Psychopharmacol 30: 135–144.

61 Papakostas GI, Petersen T, Mahal Y, et al. (2004) Quality of life assessments in major depressive disorder: a review of the literature. Gen Hosp Psychiatry 26: 13–17.

62 Deshauer D, Moher D, Fergusson D, et al. (2008) Selective serotonin reuptake inhibitors for unipolar depression: a systematic review of classic long-term randomized controlled trials. CMAJ 178: 1293–1301.

63 Cates CJ (2002) Simpson’s paradox and calculation of number needed to treat from meta-analysis. BMC Med Res Methodol 2: 1.

64 Bahrick AS, Harris MM (2009) Sexual side effects of antidepressant medications: an informed consent-accountability gap. J Contemp Psychother 39: 135–143.

65 Balon R (2006) SSRI-associated sexual dysfunction. Am J Psychiatry 163: 1504–1509.

66 Bent S, Padula A, Avins AL (2006) Brief communication: Better ways to question patients about adverse medical events: a randomized, controlled trial. Ann Intern Med 144: 257–261.

67 US Department of Health and Human Services Food and Drug Administration Center for Drug Evaluation and Research Center for Biologics Evaluation and Research (2006) Guidance for industry: adverse reactions section of labeling for human prescription drug and biological products – content and format. Available: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm075057.pdf. Accessed 4 September 2012.

68 Başoğlu M, Marks I, Livanou M, et al. (1997) Double-blindness procedures, rater blindness, and ratings of outcome. Observations from a controlled trial. Arch Gen Psychiatry 54: 744–748.

69 Wisner KL, Perel JM, Peindl KS, et al. (2001) Prevention of recurrent postpartum depression: a randomized clinical trial. J Clin Psychiatry 62: 82–86.

70 Perlis RH, Ostacher M, Fava M, et al. (2010) Assuring that double-blind is blind. Am J Psychiatry 167: 250–252.

71 Moncrieff J, Wessely S, Hardy R (1998) Meta-analysis of trials comparing antidepressants with active placebos. Br J Psychiatry 172: 227–231.

72 Hrobjartsson A, Forfang E, Haahr M, et al. (2007) Blinded trials taken to the test: an analysis of randomized clinical trials that report tests for the success of blinding. Int J Epidemiol 36: 654–663.

73 Keitner GI, Garlow SJ, Ryan CE, et al. (2009) A randomized, placebo-controlled trial of risperidone augmentation for patients with difficult-to-treat unipolar, non-psychotic major depression. J Psychiatr Res 43: 205–214.

74 Zhang J, Mahjoob K, Yang P (2007) Statistical review and evaluation. NDA 21–436/S_018. Abilify (aripiprazole). Aripiprazole as adjunctive treatment of major depressive disorder. Center for Drug Evaluation and Research. Available: http://www.accessdata.fda.gov/drugsatfda_docs/nda/2007/021436s018_StatR.pdf. Accessed 10 January 2012.

75 Berman RM, Marcus RN, Swanink R, et al. (2007) The efficacy and safety of aripiprazole as adjunctive therapy in major depressive disorder: a multicenter, randomized, double-blind, placebo-controlled study. J Clin Psychiatry 68: 843–853.

76 Marcus RN, McQuade RD, Carson WH, et al. (2008) The efficacy and safety of aripiprazole as adjunctive therapy in major depressive disorder: a second multicenter, randomized, double-blind, placebo-controlled study. J Clin Psychopharmacol 28: 156–165.

77 Reeves H, Batra S, May RS, et al. (2008) Efficacy of risperidone augmentation to antidepressants in the management of suicidality in major depressive disorder: a randomized, double-blind, placebo-controlled pilot study. J Clin Psychiatry 69: 1228–1336.

78 Kjaergard LL, Villumsen J, Gluud C (2001) Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 135: 982.

79 Tsai AC (2009) Unclear clinical significance of findings in adjunctive aripiprazole for major depressive disorder: comments on article by Marcus et al. J Clin Psychopharmacol 29: 91–2; author reply 92–3.

80 Dwan K, Altman DG, Arnaiz JA, et al. (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3: e3081. doi:10.1371/journal.pone.0003081

81 Bauer M, Pretorius HW, Constant EL, et al. (2009) Extended-release quetiapine as adjunct to an antidepressant in patients with major depressive disorder: results of a randomized, placebo-controlled, double-blind study. J Clin Psychiatry 70: 540–549.

82 Berman RM, Fava M, Thase ME, et al. (2009) Aripiprazole augmentation in major depressive disorder: a double-blind, placebo-controlled study in patients with inadequate response to antidepressants. CNS Spectr 14: 197–206.

83 El-Khalili N, Joyce M, Atkinson S, et al. (2010) Extended-release quetiapine fumarate (quetiapine XR) as adjunctive therapy in major depressive disorder (MDD) in patients with an inadequate response to ongoing antidepressant treatment: a multicentre, randomized, double-blind, placebo-controlled study. Int J Neuropsychopharmacol 13: 917–932.

84 McIntyre A, Gendron A, McIntyre A (2007) Quetiapine adjunct to selective serotonin reuptake inhibitors or venlafaxine in patients with major depression, comorbid anxiety, and residual depressive symptoms: a randomized, placebo-controlled pilot study. Depress Anxiety 24: 487–494.

85 Thase ME, Corya SA, Osuntokun O, et al. (2007) A randomized, double-blind comparison of olanzapine/fluoxetine combination, olanzapine, and fluoxetine in treatment-resistant major depressive disorder. J Clin Psychiatry 68: 224–236.