Late-life major depression is associated with emotional suffering, disability, caregiver strain, suicide, and poor compliance with other medical treatments (1). Successful antidepressant treatment is one of the most effective ways to reduce disability, prevent morbidity, and improve quality of life in an older depressed patient (2, 3). In many cases, patients appear to be resistant to treatment because their antidepressants are switched prematurely (4, 5). If treating clinicians knew the probability of treatment response earlier in the course of treatment, they would be in a better position to advise patients about various treatment options. These options could include continuing the current medication, augmenting current pharmacotherapy, or switching to another approach (6, 7).
Previously, our group has shown that a patient's clinical status after 4 weeks of treatment reliably predicts their response status at week 12 (6). Furthermore, we have identified several biological, clinical, and psychosocial characteristics as predictors of treatment response in late-life major depression: allelic variation in the serotonin transporter promoter, REM sleep latency, cerebral glucose metabolism, age at onset of first episode, sleep disturbances at baseline, baseline Hamilton Depression Rating Scale (HAM-D) scores, baseline anxiety levels, suicidal ideation, response and adherence to previous antidepressant treatment, family support, and social inequalities (7–17). To our knowledge, there is no published report of a treatment decision tree based on a hierarchy of factors integrating various predictors of response in acute treatment of late-life major depression. Because most clinicians have access only to clinical variables, we conducted an analysis to integrate and develop a hierarchy of the previously identified clinical predictors. Our aim is to offer clinicians a strategy to tailor treatment planning based on patients' characteristics, thereby developing more successful treatment strategies for late-life major depression.
This analysis used pooled data from the acute treatment phase of three previously described treatment studies of geriatric depression funded by the National Institute of Mental Health (18–20). These studies were conducted by the same study personnel in the same university-based research center (21). Comparable inclusion and exclusion criteria as well as assessment and treatment procedures were followed during the acute treatment phase for the three studies.
A study of maintenance therapies in recurrent major depression (maintenance therapies in late-life depression 1) provided 168 subjects age 59 and over with recurrent nonpsychotic major depression to this analysis (18). A second maintenance study (maintenance therapies in late-life depression 2) provided 183 subjects age 69 and over with either a single or recurrent nonpsychotic depressive episode (20). The third study, which compared two active drugs in the treatment of late-life major depression (double-blind randomized comparison of nortriptyline and paroxetine in late-life depression) provided an additional 110 subjects age 60 and over with either a single or recurrent nonpsychotic depressive episode (19).
The subjects were assessed at baseline and weekly with the 17-item Hamilton Rating Scale for Depression (HAM-D-17) (22). Additional baseline assessments included the Mini-Mental State Examination (MMSE) (23) and the Cumulative Illness Rating Scale for Geriatrics (24).
All studies required that subjects have a current episode of nonpsychotic unipolar major depression. In maintenance therapies in late-life depression 1, diagnoses were based on the Research Diagnostic Criteria, as established by structured interview with the Schedule for Affective Disorders and Schizophrenia—Lifetime Version (25). In maintenance therapies in late-life depression 2 and double-blind randomized comparison of nortriptyline and paroxetine in late-life depression, diagnoses were based on the criteria of DSM-IV established with the Structured Clinical Interview for DSM-IV Axis I Disorders—Patient Edition (26). Inclusion criteria also required the following baseline scores: HAM-D-17≥17 in maintenance therapies in late-life depression 1 and ≥15 in maintenance therapies in late-life depression 2 and the double-blind randomized comparison of nortriptyline and paroxetine in late-life depression; MMSE ≥27, ≥18, and ≥15 for maintenance therapies in late-life depression 1, maintenance therapies in late-life depression 2, and double-blind randomized comparison of nortriptyline and paroxetine in late-life depression, respectively.
ACUTE TREATMENT PROTOCOLS
During the acute phase of maintenance therapies in late-life depression 1, the subjects were treated openly with nortriptyline titrated to yield a plasma level of 80–120 ng/ml, combined with weekly interpersonal therapy. During the acute phase of treatment in maintenance therapies in late-life depression 2, the subjects were treated openly with paroxetine in doses adjusted between 10 and 40 mg/day based on tolerability and response (mean [SD] final dose: 26 [11] mg/day), combined with weekly interpersonal therapy. In the double-blind randomized comparison of nortriptyline and paroxetine in late-life depression, the subjects were randomly assigned to blinded treatment with nortriptyline or paroxetine. Nortriptyline dose was titrated to a plasma level of 50–120 ng/ml. Paroxetine doses were adjusted between 10 and 40 mg/day based on tolerability and response (mean [SD] final dose: 23 [7] mg/day). All three studies included dosing strategies to optimize both treatment response and tolerability.
As described elsewhere, we pooled data from the three studies to obtain a heterogeneous sample representative of depressed elderly patients seeking treatment (6). The subjects who did not respond to their initial treatment were eligible to receive augmentation pharmacotherapy in maintenance therapies in late-life depression 1 and 2. In the double-blind randomized comparison of nortriptyline and paroxetine in late-life depression, the subjects who did not respond to their randomized treatment could be switched to the alternative drug. For this analysis, the subjects who did not receive monotherapy with their initial antidepressant medication for a full 12 weeks were censored at the time of augmentation or medication switch, and we used imputed values for the remaining period (see below). As a result, observed HAM-D scores were available for 461 subjects at baseline, 373 at week 4, and 247 at week 12. Imputations were performed for both intermittent missing and monotone missing observations (6). Intermittent missing refers to occasions when the missing observations are preceded and succeeded by observations (e.g., when data were not available during 1 week but were available for the preceding and following weeks). Monotone missing observations are followed by no further observations, for example, because the subject withdrew consent and dropped out of the study or was censored at the time of initiation of adjunctive treatment or of a medication switch. For this analysis, multiple imputations were performed using the Markov Chain Monte Carlo option of the multiple imputation procedure (PROC MI) in the SAS software (27). A detailed description of the Markov Chain Monte Carlo method can be found in a previous report (6).
For all analyses, response was defined categorically as both a decrease in HAM-D score of 50% or more from baseline and a score of 10 or less. Based on our previous work, we defined core HAM-D symptoms as HAM-D items 1, 2, 3, and 7 (depressed mood, guilt, suicide, and work/activities); anxiety HAM-D symptoms as items 9, 10, 11, and 15 (agitation, psychic anxiety, somatic anxiety, and hypochondriasis); and sleep HAM-D symptoms as items 4, 5, and 6 (early, middle, and late insomnia) (28). We initially performed a univariate logistic regression on the demographic and clinical variables to identify potential predictors of treatment response. Response at 12 weeks was the binary dependent variable (Table 1). Subsequently, to obtain a hierarchy of these predictors, we incorporated these potential predictors in a receiver-operating characteristic model using signal detection theory as described by Kiernan et al. (29).
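The response definition and the three HAM-D symptom subscales described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the analysis: the function names and the dictionary layout (item number mapped to item score) are assumptions made for the example.

```python
# Illustrative sketch only; names and data layout are assumptions, not the authors' code.
CORE_ITEMS = (1, 2, 3, 7)         # depressed mood, guilt, suicide, work/activities
ANXIETY_ITEMS = (9, 10, 11, 15)   # agitation, psychic anxiety, somatic anxiety, hypochondriasis
SLEEP_ITEMS = (4, 5, 6)           # early, middle, and late insomnia


def subscale_score(item_scores, items):
    """Sum the selected HAM-D-17 items; item_scores maps item number (1-17) to its score."""
    return sum(item_scores[i] for i in items)


def is_responder(baseline_total, current_total):
    """Response: a decrease of 50% or more from baseline AND a total score of 10 or less."""
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    percent_decrease = 100.0 * (baseline_total - current_total) / baseline_total
    return percent_decrease >= 50.0 and current_total <= 10
```

For example, a subject who goes from 24 to 10 meets both criteria, whereas a subject who goes from 30 to 12 meets the 50% criterion but not the score criterion and is therefore classified as a nonresponder.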
Table 1. Univariate Predictors of Treatment Response at 12 Weeks (N = 461)
Signal detection theory has been especially useful in analyses where predictors are likely to be highly collinear and interactions between independent variables exist (29). In our case, the signal is a binary outcome (response/nonresponse at 12 weeks) and the detection is for the set of predictor variables (29). Signal detection identifies predictors with a stopping rule of p <0.05. The highest predicting variable is used to divide the sample into two subsamples, and the next predicting variable divides the higher-risk subsample. The process continues until the lowest risk variable stops at p <0.05 (29). Variables associated with a p >0.05 are excluded from the decision tree. Signal detection determines the optimal cutoff point across all increments of a variable and across all variables (29).
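To make the cutoff search concrete, the following Python sketch scores every candidate cutoff of a single predictor with a weighted kappa, one common quality index in the signal detection literature. It is an assumption that this index, with the weight `r` playing the role of the sensitivity setting, matches the procedure of reference 29; the function names are hypothetical, and the p <0.05 stopping rule and the recursive splitting of subsamples are not implemented here.

```python
# Illustrative sketch under stated assumptions; not the procedure's reference implementation.
def weighted_kappa(tp, fp, fn, tn, r):
    """Weighted kappa for a 2x2 table; r near 1 penalizes false negatives, r near 0 false positives."""
    n = tp + fp + fn + tn
    p = (tp + fn) / n   # proportion of true responders
    q = (tp + fp) / n   # proportion predicted to respond
    denom = p * (1 - q) * r + (1 - p) * q * (1 - r)
    if denom == 0:
        return 0.0
    return (tp / n - p * q) / denom


def best_cutoff(values, outcomes, r):
    """Try each observed value as a cutoff ('value >= cutoff' predicts response); return (cutoff, kappa)."""
    best = None
    for cutoff in sorted(set(values)):
        pairs = list(zip(values, outcomes))
        tp = sum(1 for v, y in pairs if v >= cutoff and y)
        fp = sum(1 for v, y in pairs if v >= cutoff and not y)
        fn = sum(1 for v, y in pairs if v < cutoff and y)
        tn = sum(1 for v, y in pairs if v < cutoff and not y)
        k = weighted_kappa(tp, fp, fn, tn, r)
        if best is None or k > best[1]:
            best = (cutoff, k)
    return best
```

With `r = 0.5` the index reduces to Cohen's kappa. Raising `r` favors cutoffs that produce few false negatives, and lowering it favors cutoffs that produce few false positives, which is the trade-off the two models in this analysis are built around.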
We built two different models by modulating the sensitivity threshold for each predictor of treatment response to obtain hierarchies of risk correlated with different patients' characteristics.
First, a low sensitivity threshold was used to define a model that minimizes false positives. A high rate of false positives (i.e., falsely predicting that patients will respond to treatment) could lead clinicians to continue a treatment that will eventually be ineffective. Avoiding this results in an aggressive treatment approach that clinicians might consider appropriate in various clinical situations (including but not limited to patients who have a higher risk of suicide or who are severely disabled by their depression).
Second, we used a high sensitivity threshold to define a model that minimizes false negatives. A high rate of false negatives (i.e., falsely predicting that a patient will be a nonresponder) could lead clinicians either to use unnecessary augmentation pharmacotherapy (thus exposing the patient to the risk of adverse effects) or to switch prematurely to another antidepressant (thus depriving patients of the chance to respond eventually to the first agent). Avoiding premature treatment changes results in a conservative approach (appropriate in various clinical situations, including but not limited to patients who have a history of multiple unsuccessful trials). In light of the results of the STAR*D study, which emphasize how long it can take to identify an effective treatment (30, 31), it is important not to "miss" an effective treatment because of a premature switch.
For the first model (minimizing false positives), we used a sensitivity cutoff point of 0.3. For the second model (minimizing false negatives), we used a sensitivity cutoff point of 0.7 (32). The selection of variables was based upon the univariate regression results and available sample. The potential predictors included race, age of onset, recurrence, baseline sleep disturbance, baseline anxiety, and early symptom improvement. Early symptom improvement was defined by the percent decrease in HAM-D score achieved by week 4. However, because clinicians do not use cutoff points such as those generated by the model, we converted the percentage cutoffs to corresponding clinical changes. Thus, we considered a decrease in HAM-D score of more than 45% at week 4 as a marked early improvement, a decrease of HAM-D score between 30% and 45% as a moderate early improvement, a decrease in HAM-D score between 18% and 30% as a mild early improvement, and a decrease of HAM-D of less than 18% as the absence of clinically noticeable improvement, i.e., a poor early improvement.
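The conversion from percentage cutoffs to clinical categories can be expressed as a small helper. The sketch below is illustrative only: the function names are hypothetical, and the handling of the exact boundary values (45%, 30%, and 18%) is an assumption, because the text does not say to which category a value falling exactly on a boundary belongs.

```python
# Illustrative sketch; boundary handling at exactly 45%, 30%, and 18% is an assumption.
def percent_decrease(baseline_total, week4_total):
    """Percent decrease in HAM-D total score from baseline to week 4."""
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    return 100.0 * (baseline_total - week4_total) / baseline_total


def early_improvement_category(pct):
    """Map a week-4 percent decrease to the clinical categories used in the text."""
    if pct > 45:
        return "marked"
    if pct >= 30:    # assumed: exactly 45% counts as moderate, exactly 30% as moderate
        return "moderate"
    if pct >= 18:    # assumed: exactly 18% counts as mild
        return "mild"
    return "poor"
```

For instance, a patient whose HAM-D score falls from 20 to 12 by week 4 has a 40% decrease, which corresponds to a moderate early improvement.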
We used the antidepressant treatment history form to assess the adequacy of all the antidepressant trials received during the current episode based on both the duration and the dose of treatment. The antidepressant treatment history form scores are ordinal: 1 (definitely inadequate = trial of less than 4 weeks or of more than 4 weeks with a very low dose), 2 (probably inadequate = a trial of more than 4 weeks with probably inadequate doses), 3 (probably adequate = a trial of more than 4 weeks of an antidepressant at an adequate dose), 4 (definitely adequate = a trial longer than 4 weeks with intensive doses of antidepressant), or 5 (definitely adequate antidepressant with lithium augmentation). Antidepressant treatment history form scores were available for a subgroup of patients (N = 289), and we repeated the analysis in this subgroup using the highest scores—corresponding to the strongest previous treatment trial the patient had failed to respond to during the current episode—as had been done in several previous analyses (33, 34).
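As a brief illustration of how the highest antidepressant treatment history form score summarizes a patient's current episode, consider the sketch below. The names are hypothetical, and treating a score of 3 or higher as an "adequate" trial is an assumption drawn from the score labels above rather than a rule stated in the text.

```python
# Illustrative sketch; the adequacy threshold of 3 is an assumption based on the score labels.
ATHF_LABELS = {
    1: "definitely inadequate",
    2: "probably inadequate",
    3: "probably adequate",
    4: "definitely adequate",
    5: "definitely adequate with lithium augmentation",
}


def strongest_failed_trial(trial_scores):
    """Return the highest score among the trials the patient failed during the current episode."""
    if not trial_scores:
        raise ValueError("at least one trial score is required")
    for score in trial_scores:
        if score not in ATHF_LABELS:
            raise ValueError(f"invalid score: {score}")
    return max(trial_scores)


def received_adequate_trial(trial_scores, adequate_from=3):
    """True if the strongest failed trial is rated at or above the assumed adequacy threshold."""
    return strongest_failed_trial(trial_scores) >= adequate_from
```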
Table 2 presents the baseline demographic and clinical characteristics of the subjects from each of the three studies and of the pooled study group. Table 1 presents the results of the univariate logistic regression.
Table 2. Sociodemographic, Clinical, and Treatment Characteristics of the Treatment Group
AGGRESSIVE TREATMENT APPROACH
In the first predictor of treatment response model, we set the sensitivity cutoff point at 0.3 to minimize false positives. In this model, the significant predictors of treatment response by week 12 were early symptom improvement, higher baseline anxiety, and younger age of onset, in this ranking order (Figure 1). The other variables included in the model did not reach the significance threshold of p <0.05.
Figure 1. Hierarchy of Predictors of Treatment Response With an Aggressive Treatment Approach (i.e., A Cutoff Point for Sensitivity of 0.30)
As illustrated in Figure 1, if a patient has a moderate early improvement, his chances of achieving a full response at week 12 are 43%, whereas if a patient has a marked early improvement, his chances of achieving a full response at week 12 are 82%. For the subjects with only a moderate early improvement, the next predictor influencing the likelihood of response is the level of baseline anxiety; a high baseline anxiety (HAM-D anxiety subscale ≥4) predicts a chance of response of 39% at week 12, whereas a low baseline anxiety improves that chance to 61%. For the subjects with only a moderate early improvement and high baseline anxiety, the next variable that weighs in predicting treatment response is age of onset. Older age of onset correlates with higher chance of response (54%), whereas younger age of onset correlates with a poorer chance of response (33%) (Figure 1).
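The branches of this first model that are reported in the text can be written as a simple lookup. The sketch below is illustrative only: the function name and arguments are hypothetical, the age-of-onset cutoff is not given in the text and must be supplied by the caller as a yes/no value, and branches that are not described here return `None` rather than a guessed probability.

```python
# Illustrative lookup of the probabilities reported in the text for the aggressive model.
from typing import Optional


def aggressive_model_response_chance(
    improvement: str,
    high_baseline_anxiety: Optional[bool] = None,   # HAM-D anxiety subscale >= 4
    older_age_of_onset: Optional[bool] = None,      # cutoff not stated in the text
) -> Optional[float]:
    """Return the reported chance of full response at week 12, or None if not reported."""
    if improvement == "marked":
        return 0.82
    if improvement != "moderate":
        return None                     # branch not described in the text
    if high_baseline_anxiety is None:
        return 0.43                     # moderate early improvement, anxiety unknown
    if not high_baseline_anxiety:
        return 0.61
    if older_age_of_onset is None:
        return 0.39                     # high anxiety, age of onset unknown
    return 0.54 if older_age_of_onset else 0.33
```

Each additional piece of information moves the estimate one level deeper into the tree, which mirrors how a clinician would refine the prognosis at week 4.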
We introduced the adequacy of previous treatment (antidepressant treatment history form score) into the same model, but the antidepressant treatment history form score does not constitute a significant predictor in this model (data not shown).
CONSERVATIVE TREATMENT APPROACH
In the second predictor of treatment response model, we set the sensitivity cutoff point at 0.7 to minimize false negatives. In this model, the significant predictors of treatment response by week 12 were early symptom improvement and sleep disturbance. Early symptom improvement is both the first- and second-tier variable, whereas baseline sleep disturbance is a third-tier predictor of treatment response for patients who had at least a mild early improvement. Thus, for a patient with at least a mild early improvement, high baseline sleep disturbance predicts a 19% chance of full response at 12 weeks, whereas low baseline sleep disturbance predicts a 51% chance of full response by 12 weeks (Figure 2).
Figure 2. Hierarchy of Predictors of Treatment Response With a Conservative Treatment Approach (i.e., A Cutoff Point for Sensitivity of 0.70)
If we introduce the adequacy of previous treatment in this second model, we obtain a different hierarchy of risks: although early symptom improvement remains the highest-ranking predictor, the adequacy of previous antidepressant trials and baseline anxiety constitute the second-tier predictors. Thus, patients with minimal early symptom improvement who have received inadequate antidepressant treatment before study participation have a 45% chance of becoming full responders at week 12, whereas those who had received adequate trials of antidepressant pharmacotherapy have only a 13% chance of becoming full responders at week 12. We suggest that this last profile represents a subgroup of treatment-resistant subjects.
In contrast to patients with only a mild early improvement, patients with at least a moderate early improvement have a 73% chance of achieving a full response. Furthermore, for these patients, the adequacy of previous trials is not a significant predictor. In this case, the second-tier predictor of treatment response is baseline anxiety. High baseline anxiety (HAM-D anxiety subscale score ≥8) lowers the chances of achieving a full response to 40%, whereas low baseline anxiety increases the chances of achieving a full response to 79% (Figure 3).
Figure 3. Hierarchy of Predictors of Treatment Response Using a Cutoff Point for Sensitivity of 0.7 (Conservative Treatment Approach) and Including the Antidepressant Treatment History Form Score as One of the Predictors of Treatment Response
Our analysis confirms that demographic and clinical variables can be used to predict treatment response and guide treatment decisions in late-life major depression. The group of predictors of treatment response identified by our univariate logistic regression is congruent with the previous literature reporting on the contribution of several demographic and clinical factors in predicting treatment response in late-life major depression (7, 9–11). Using signal detection theory, we identified the hierarchy of predictors of treatment response in two decision tree models.
In the first model, which corresponds to an aggressive treatment approach for situations in which long but ultimately unsuccessful treatment trials are particularly undesirable, the most important predictor of treatment response at 12 weeks is early symptom improvement. Thus, early symptom improvement "trumps" the predictive power of all other variables and is the most important variable in adjusting treatment decisions, followed by the presence at baseline of at least mild to moderate anxiety symptoms. This is congruent with results reported previously by our group (6, 7).
To translate this model into clinical terms, we may consider the case of a 65-year-old inpatient with major depression who, after 4 weeks of treatment, experienced moderate symptom improvement. This already tells the clinician that the patient has a less than even chance (43%) of becoming a full responder if the course of treatment remains unchanged for the entire 12 weeks. If the patient had minimal anxiety at baseline, the clinician might decide at this point to continue the current treatment because low baseline anxiety increases the patient's chances of becoming a full responder from 43% to 61%. But if this patient also started with moderate or high anxiety, her chances of responding decrease further to 39%. Moreover, an earlier age of onset would put this patient in the worst-prognosis category, with a further decline of her chances of being a full responder at 12 weeks to 33%. In this case, rather than waiting for a full 12 weeks, the best clinical decision would be to attempt a different treatment course (e.g., switch or augmentation).
In the second model, which corresponds to a conservative treatment approach for situations in which concluding prematurely and wrongly that a specific antidepressant is ineffective is particularly undesirable, the most important predictor of treatment response at 12 weeks remains early symptom improvement, followed by severe sleep disturbance at baseline. If we incorporate the adequacy of previous antidepressant trials in this model, early symptom improvement remains the most important predictor of treatment response, with adequacy of previous treatment becoming a second-ranked variable.
To translate this model into clinical terms, we can consider the case of a 65-year-old outpatient with major depression and a history of multiple unsuccessful antidepressant trials. After 4 weeks of treatment with a new drug, he experienced a moderate improvement of his symptoms. Based on our model for this clinical situation, the patient has less than a 39% chance of becoming a full responder if the course of treatment remains unchanged for the entire 12 weeks. If the clinician does not have an accurate treatment history of his patient, the only other clinical characteristic important in reaching a treatment decision at this point is the patient's baseline sleep. If the patient had severe baseline sleep disturbances, his chances of becoming a full responder at 12 weeks with the current treatment are less than one in five (19%). In this case, the clinician would probably consider changing the course of the treatment at week 4. However, if the patient did not have severe sleep disturbances at baseline, he has a 51% chance of becoming a full responder at 12 weeks. At this point, this model would allow the clinician to present to his patient an informal estimate of his chances of recovery and decide together with the patient the best course of action.
For the same patient, if the clinician has an accurate treatment history showing that he never received an adequate antidepressant trial during this depressive episode, the patient's chances of becoming a full responder at 12 weeks are 46%. Because his chances of response are less than half with the current treatment, it probably makes more sense for the clinician to consider augmenting or switching treatment at this point.
Overall, as an alternative to impression-based clinical decisions, these models provide clinicians with a potentially useful tool for navigating the maze of multiple clinical predictors. They allow clinicians to adapt their decisions to the specific clinical characteristics encountered when treating an elderly patient with depression. These models also emphasize the importance of measurement-based treatment approaches (e.g., the HAM-D) to monitor treatment response. In a managed-care environment, it is difficult to use time-consuming questionnaires routinely. However, using the HAM-D or a self-rated scale such as the Beck Depression Inventory (35) twice (at baseline and week 4) could save time and clinical effort.
This analysis has several strengths, including a relatively large sample. Despite a dropout rate of 48%, this sample size combined with an imputation model allowed us to identify second- and third-ranked predictors of treatment response. In addition, we examined two models that correspond to different treatment strategies. The limitations of this analysis include the pooling of data across three different studies with different entry criteria and different treatments. However, in an additional analysis, specific studies were not found to be predictors of outcome in either model (data not shown). Moreover, these differences enhance the generalizability of our findings. Our data were collected in patients receiving structured treatment, including weekly clinical monitoring. Thus, the course of symptom resolution we observed might represent what happens under optimal conditions rather than what typically happens under "usual care" conditions (36). By design, this analysis focuses on the outcome of first-line pharmacotherapy. Consequently, our models do not elucidate the role of augmentation or switching. However, our group has recently addressed the outcomes of augmentation strategies using a different analytical model (37, 38).
We could not include the antidepressant treatment history form variable in the main model because it was collected only for a subgroup of patients. Thus, we had less power to determine the extent to which adequacy of previous treatment is a predictor of treatment response. Most of our patients were Caucasians; therefore, our findings cannot be generalized to other ethno-racial groups.
In conclusion, our two models can improve the precision of clinical predictions based on a few clinical variables. Future work should assess whether these predictions can be refined further by combining clinical and biological variables (39). In particular, the inclusion of specific biomarkers might extend our predictive ability beyond the current estimates and would improve the development and monitoring of tailored treatments.
Dr. Whyte has received research support from NIMH, Forest, Ortho-McNeil, and Pfizer. Dr. Mazumdar owns stock in Forest (less than $10,000). Dr. Mulsant has received research support or honoraria from AstraZeneca, Bristol-Myers Squibb, Eli Lilly, Forest, GlaxoSmithKline, Janssen, Lundbeck, NIH, Corcept, Eisai, and Pfizer; he holds stock (all less than $10,000) in Akzo-Nobel, Alkermes, AstraZeneca, Biogen Idec, Celsion, Elan, Eli Lilly, Forest, Orchestra Therapeutics, General Electric, Immune Response, and Pfizer. He has been a consultant to AstraZeneca, Bristol-Myers Squibb, Eisai, Eli Lilly, Forest, Fox Learning System, GlaxoSmithKline, Janssen, Lundbeck, and Pfizer. He has been on the speakers bureaus of AstraZeneca, Eisai, Forest, GlaxoSmithKline, Janssen, and Pfizer. He has received other financial support from Forest and Janssen. Dr. Pollock has received honoraria and/or research support from Janssen Pharmaceutica, Forest, NIH, GlaxoSmithKline, and Solvay and is on the speakers bureau for Forest and Sepracor. He has been on the advisory boards of Forest and is a faculty member of Lundbeck Institute. Dr. Reynolds has received research support from GlaxoSmithKline, Pfizer, Eli Lilly, Bristol-Myers Squibb, and Forest. The remaining authors report no competing interests.