Intended for HCPs.
Medical Research

Childhood Trauma's Impact on Brain: Limited Replicability Found

New study reveals limited replicability of brain alterations associated with childhood maltreatment, impacting treatment outcomes for affective disorders and major depressive disorder.

April 9, 2026

Executive Brief

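  • The News: Across three large adult cohorts, gray matter correlates of childhood maltreatment, including the frequently reported hippocampus and amygdala effects, showed little replicability.
  • Clinical Win: The association between childhood maltreatment and depression replicated consistently across all three cohorts and a range of clinical characteristics.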
  • Target Specialty: Psychiatrists managing adults with major depressive disorder.

Key Data at a Glance

Condition: Childhood maltreatment (CM)

Associated Disorders: Affective disorders, Major depressive disorder (MDD)

Brain Regions Implicated in Prior Studies: Hippocampus, Amygdala, Dorsolateral prefrontal cortex

Key Finding: Lack of replicability in brain-wide association studies

Risk Factor: Childhood maltreatment (CM)

Associated Brain Alterations: Gray matter structure alterations

Childhood maltreatment (CM) has been identified to be one of the most important risk factors for the development of affective disorders1,2 and is associated with chronic disease trajectories and poorer treatment outcomes in major depressive disorder (MDD)1,3. Within the past two decades, a plethora of neuroimaging studies has repeatedly suggested that experiences of abuse and neglect during childhood are associated with neurobiological alterations in adults4,5,6,7,8,9. Brain regions where these effects have been localized overlap with neural correlates of MDD, giving rise to the notion that neurobiological alterations may mediate the unfavorable effects of CM on clinical trajectories10,11. Thus, studying the neurobiological correlates of CM could give insights into the mechanistic processes of its clinical consequences, potentially informing the optimization of treatments or preventative measures for this population12.

In adults, CM effects on gray matter structure have been observed in an array of regions, with the most frequent findings implicating the hippocampus, amygdala, dorsolateral prefrontal cortex, insula and anterior cingulate cortex9,13,14,15. However, the investigation of CM-associated gray matter alterations has yielded considerable heterogeneity in findings regarding the localization of effects. Importantly, large-scale consortium studies and meta-analyses do not find these aforementioned regions, but rather report a multitude of other areas to be associated with CM, including the postcentral gyrus and occipital regions14, the median cingulate gyri and supplementary motor area16, the cerebellum and striatum17, as well as the precuneus18.

This heterogeneity could result from the diversity of measurement instruments (e.g., different retrospective self-report scales vs. prospective ratings) and operationalizations of CM (e.g., continuous vs. categorical), as well as different subtypes of maltreatment being studied separately (e.g., Ringwald et al.19, Sheffield et al.20, Van Harmelen et al.21). Regarding subtypes of CM, there has been considerable debate whether neural correlates could be specific to individual types of experiences. The increasingly influential dimensional model of adversity postulates that different dimensions of CM, such as threat-related and deprivation-related experiences or the unpredictability of one’s environment, underlie differential neurobiological processes, consequently leading to differential neural correlates22,23. Evidence for this model in children and adolescents has been accumulated over several studies6. In contrast, other scholars have suggested the relevance of dividing CM experiences even further and have argued that brain alterations are aligned to these experiences in a very specific manner, such as parental verbal abuse impacting gray matter within the auditory cortex or sexual abuse being associated with cortical thinning within the somatosensory cortex9. Another potential source of heterogeneity could stem from varying sample characteristics, differing in diagnoses, the degree of psychopathology and the severity of CM exposure (e.g., McLaughlin et al.24). Furthermore, different statistical approaches have been used. One statistical challenge is a strong phenomenological co-occurrence with mental health problems. Often, psychiatric diagnosis is statistically controlled for, which leads to reduced power to detect maltreatment effects because both constructs strongly covary1 and both explain shared variance in neurobiological alterations10. On the other hand, without controlling for diagnosis, neurobiological effects due to maltreatment and those due to depression are impossible to disentangle. Moreover, evidence suggests that the neural correlates of CM may differ by sex25,26,27 and could be moderated by age18, underscoring the importance of carefully considering these factors in investigations of CM effects.

The recent debate around questionable replicability in the neuroimaging domain due to underpowered samples and publication bias suggests the possibility of substantial false-positive findings within the previous body of evidence28,29. This notion is supported by evidence for considerable publication bias in the meta-analyzed findings of gray matter correlates of CM13. In fact, large-scale neuroimaging consortia, such as the ENIGMA consortium (Frodl et al.26, n = 3036) or the UK-Biobank (Gheorghe et al.17, n = 6751), have yielded much smaller effect sizes compared to studies with smaller samples, and have failed to replicate frequently reported associations of CM with the hippocampus or amygdala. However, these consortia still rely exclusively on segmented volumetric brain measures, thus losing spatial resolution, which may account for lower sensitivity to find gray matter alterations, posing a limitation to these findings.

In summary, inconclusive previous findings may result from variability in CM operationalizations, investigated clinical and non-clinical subgroups, varying statistical approaches, insufficient spatial resolution or simply because of false-positive results originating from underpowered studies. Systematic investigations of the replicability of these neural correlates do not exist to date. To shed light on this heterogeneity and re-evaluate our knowledge about the neurobiological underpinnings of adverse childhood experiences, we investigated the cross-cohort replicability of gray matter correlates of CM. We therefore utilized three large-scale, deeply phenotyped clinical cohort datasets, with a broad range of self-reported maltreatment experiences, in combination with high-resolution voxel-based morphometry (VBM). These rich datasets were assessed and processed in standardized pipelines harmonized across cohorts. We conducted subgroup analyses and probed different operationalizations and subtypes of maltreatment, as well as interactions with age. Additional analyses stratified for sex were run for all models to account for potential sex-specific neural correlates of CM. Replicability was assessed by the spatial overlap of significant findings between our three cohorts, in addition to analyzing all cohorts together in a pooled model. We tested the hypothesis that CM is associated with lower gray matter volume (GMV).

Here, we show that there is little evidence for the replicability of gray matter correlates of childhood maltreatment across well-powered adult cohorts, using retrospective self-report measures. This is shown for VBM and for regional parcellation-based measures of cortical thickness and surface, as well as subcortical volume. Consistent non-replicability is presented across all maltreatment operationalizations (including CM subtypes and severe forms of CM), subgroup analyses (including individuals with or without MDD, or medication-naïve MDD patients) and in additional analyses stratified by sex. Similar non-replicability is observed for CM interactions with age. The strongest evidence for maltreatment-associated gray matter effects is found in VBM analyses when not adequately controlling for confounding MDD diagnosis. In contrast, the association between childhood maltreatment and depression is found across a variety of different clinical characteristics and replicates consistently across all three cohorts.

Associations of childhood maltreatment with demographic and clinical characteristics

CTQ scales were highly interrelated with each other, and they showed a pattern of small positive associations with age and small negative associations with education years (Fig. 1a). Furthermore, within the MDD participants, CTQ scales showed a pattern of weak to moderate associations with previous and current clinical characteristics (Fig. 1a). Overall, the relationship between CM reports and demographic and clinical variables was highly similar across the three cohorts, except that age and number of inpatient treatments were not consistently associated with CTQ scales within the BiDirect cohort (Supplementary Figs. S2–S4). Participants with an MDD diagnosis reported significantly more severe CM than HC participants (Fig. 1b and Supplementary Table S5). This was found across all CM subtypes and was highly consistent across all cohorts (Supplementary Fig. S5). The largest differences were found for the emotional abuse and neglect subscales (rank-biserial r up to .517).
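To make the rank-biserial effect size concrete, here is a minimal Python sketch of one common definition: the proportion of cross-group pairs in which the first group scores higher, minus the proportion in which it scores lower. The function name and the example scores are hypothetical, and the article does not state how the authors computed the statistic, so treat this as an illustration rather than their procedure.

```python
def rank_biserial(group_a, group_b):
    """Rank-biserial correlation between two independent groups.

    Computed as (pairs where a > b) minus (pairs where a < b), divided by
    the total number of cross-group pairs. Ties contribute zero.
    The result lies between -1 and 1.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one score")
    higher = 0
    lower = 0
    for a in group_a:
        for b in group_b:
            if a > b:
                higher += 1
            elif a < b:
                lower += 1
    return (higher - lower) / (len(group_a) * len(group_b))


# Hypothetical questionnaire scores, NOT data from the study.
mdd_scores = [3, 4, 5]
hc_scores = [1, 2, 3]
print(rank_biserial(mdd_scores, hc_scores))
```

A value near 0 means the two groups' scores overlap almost completely; a value of .517 would mean that, across all MDD-HC pairs, the MDD participant reports the higher score in considerably more pairs than the reverse.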

Voxel-based gray matter associations with childhood maltreatment – pooled sample across all cohorts

A total of 18 different statistical models were conducted for all brain-wide analyses. All conducted models are described in Table 1. Results using the full sample from pooling all cohorts together are presented at a conservative significance threshold of pFWE < .05, corrected at the voxel-level. Findings from the pooled analyses are shown in Table 2 and Fig. 2.

When controlling for MDD diagnosis (Model 1), no voxels with a significant CM association were found. Dropping MDD diagnosis as a covariate (Model 2) yielded significant widespread clusters (total k = 5122), located mainly within superior and middle temporal areas, a bilateral fusiform and lingual complex, the thalamus, as well as in the orbitofrontal cortex and the insula. Subgroup analyses revealed small, significant clusters in HC individuals when using CTQ sum as a predictor (Model 3; total k = 122) within the medial orbitofrontal cortex, while no clusters survived the FWE-correction within the MDD sample (Model 4). Regarding subtypes of CM, no CTQ subscales were associated with GMV surpassing an FWE-corrected threshold, except a small cluster emerging when using physical abuse as a predictor (Model 8; total k = 3 within the thalamus). Similar results were obtained when investigating individuals with ‘severe’ maltreatment: again, the model without controlling for MDD diagnosis yielded widespread clusters of reduced GMV in the group with severe maltreatment as compared to the group with ‘none to minimal’ maltreatment (Model 13; total k = 11256). This effect was also found in much smaller localized clusters within HC samples only (Model 14; total k = 140). Effect sizes across models when pooling cohorts ranged between partial R² = .006 and partial R² = .022. Interaction models yielded small, significant clusters within HC samples only (Model 17; total k = 2).
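As a reading aid for the effect sizes above, the partial R² of a single predictor in a linear model can be recovered from its t statistic and the residual degrees of freedom using the standard identity partial R² = t² / (t² + df). The sketch below applies that identity; the function name and the example numbers are illustrative assumptions and are not values from the study.

```python
def partial_r2(t_stat, df_resid):
    """Partial R-squared of one predictor from its t statistic.

    Uses the identity: partial R^2 = t^2 / (t^2 + residual degrees of freedom).
    """
    if df_resid <= 0:
        raise ValueError("residual degrees of freedom must be positive")
    t_squared = t_stat ** 2
    return t_squared / (t_squared + df_resid)


# Illustrative numbers only: t = 2.0 with 96 residual degrees of freedom.
print(partial_r2(2.0, 96))
```

Because df grows with sample size, a fixed t statistic corresponds to an ever smaller share of explained variance in larger samples, which is why well-powered cohorts can detect effects in the range of partial R² = .006 to .022.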

Pooled analyses stratified by sex yielded similar results, however, with some additional clusters emerging in female subsamples when investigating severe CM in HC and MDD samples, while controlling for diagnosis (Model 12; total k = 1847). Overall, there was a descriptive pattern of more models yielding significant effects (and larger clusters) in the female subsamples as compared to the male subsamples. Results stratified by sex are shown in Supplementary Table S6 and S7.

Replicability of voxel-based gray matter associations with childhood maltreatment across single cohorts

The same 18 models conducted in the pooled cohorts were also fitted in each cohort separately, using liberal uncorrected significance thresholds of punc < .001 and punc < .01. Within single cohorts, both significance thresholds and each statistical model yielded significant voxels in at least one of the three cohorts. In turn, each of the three cohorts produced significant voxels in most of the statistical models. The highest number of significant voxels was observed in Model 2 and Model 13, both of which included HC and MDD samples without diagnosis as a covariate. A detailed summary of cohort-wise results across models, including additional analyses stratified by sex, is shown in Supplementary Tables S8–S13. Across probed models and across single cohorts, the nominally significant voxels were widespread throughout the brain, including the cerebellum, temporal and frontal areas, subcortical areas and somatosensory cortices.

Investigating replicability revealed that there was not one voxel that was congruently significant (i.e., replicable) at a threshold of punc < .001 in all three cohorts (Table 3). This finding was consistent across all probed statistical models, including HC and MDD subgroup analyses, testing subtypes of CM and comparing groups with severe CM and no CM, as well as when testing age interactions. Similarly, comparing pairs of cohorts also yielded no voxels that regionally overlapped between any pairwise cohort combinations, for most of the tested models. Only two models yielded marginal pairwise overlap in significance at this threshold: when testing the physical neglect subscale of the CTQ (Model 11), there was a small overlap between the MNC and BiDirect cohorts located within the supramarginal gyrus (overlap k = 3; Dice = .002). Furthermore, there was an overlap of k = 2 voxels (Dice = .001) between the MACS and the BiDirect cohort in Model 13 (comparing extreme groups without controlling for diagnosis). This extent of replicability was not significant (pFDR > .611), as indicated by permutation-based null-distributions of overlap across cohort-combinations.
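The overlap measures reported above can be illustrated with a short Python sketch. It treats each cohort's significant voxels as a set of voxel indices, counts the shared voxels (k), and computes the Dice coefficient as twice the shared count divided by the summed set sizes. The function names and the toy voxel indices are hypothetical and do not come from the study's pipeline.

```python
def overlap_k(*significant_sets):
    """Number of voxels significant in every one of the given cohorts."""
    if not significant_sets:
        raise ValueError("at least one voxel set is required")
    shared = set(significant_sets[0])
    for voxels in significant_sets[1:]:
        shared &= set(voxels)
    return len(shared)


def dice(sig_a, sig_b):
    """Dice coefficient of two voxel sets: 2 * |A and B| / (|A| + |B|)."""
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        return 0.0  # convention chosen here: no significant voxels, no overlap
    return 2 * len(a & b) / (len(a) + len(b))


# Toy voxel indices, NOT results from the study.
cohort_1 = {1, 2, 3}
cohort_2 = {3, 4, 5}
cohort_3 = {5, 6, 7}
print(overlap_k(cohort_1, cohort_2))            # shared between two cohorts
print(overlap_k(cohort_1, cohort_2, cohort_3))  # shared across all three
print(dice(cohort_1, cohort_2))
```

Dice ranges from 0 (no shared voxels) to 1 (identical maps), so values such as .001 or .002 indicate that almost none of the significant voxels in one cohort coincide with those in another.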

When rerunning the replicability analyses using an even more liberal threshold of punc < .01, the observed spatial overlap in significant voxels increased across models. The only models yielding any overlap across all three cohorts at this threshold were Model 2 (CTQ sum; not controlling for MDD diagnosis), with converging significance in k = 4 voxels, and Model 13 (k = 12 voxels; comparing extreme groups of CM, not controlling for MDD diagnosis). Pairwise cohort combinations yielded additional spatial overlap in significance across models, with maximum overlap in Model 13 (k = 1329, Dice = .081). All observed overlap of any cohort combination was non-significant. This was consistent across all models (all pFDR > .144), as indicated by permutation-based null-distributions of overlap across cohort-combinations.
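The general logic of testing overlap against a permutation-based null distribution can be sketched as follows. The article does not describe the authors' permutation scheme, so this example makes a deliberately simple assumption: each cohort's significant voxels are scattered at random within the brain mask, which ignores the spatial smoothness of real maps. All names, parameters and numbers are hypothetical.

```python
import random


def overlap_permutation_p(n_mask_voxels, n_sig_a, n_sig_b,
                          observed_overlap, n_perm=1000, seed=0):
    """One-sided permutation p-value for the overlap between two voxel maps.

    ASSUMPTION: under the null, the significant voxels of each cohort fall
    at random positions within the mask. This is a simplification and not
    necessarily the procedure used in the study.
    """
    rng = random.Random(seed)
    voxels = range(n_mask_voxels)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm_a = set(rng.sample(voxels, n_sig_a))
        perm_b = set(rng.sample(voxels, n_sig_b))
        if len(perm_a & perm_b) >= observed_overlap:
            at_least_as_large += 1
    # Add-one correction so the p-value is never exactly zero.
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical example: 1000-voxel mask, 10 significant voxels per cohort.
print(overlap_permutation_p(1000, 10, 10, observed_overlap=0, n_perm=200))
print(overlap_permutation_p(1000, 10, 10, observed_overlap=10, n_perm=200))
```

The p-value is the share of random placements whose overlap is at least as large as the observed one. An overlap is called non-significant, as for all models above, when chance placements match or exceed it often enough that the (FDR-corrected) p-value stays above the chosen threshold.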

Replicability results were largely consistent when rerunning all analyses stratified by sex. A summary of the extent of spatial overlap of effects across cohorts, as well as the significance of this replicability, is shown in Table 3 and Supplementary Tables S14–S18. Significant clusters across significance thresholds, cohorts and statistical models are shown in Fig. 3 and Supplementary Figs. S6–S23, with additional results stratified by sex presented in Supplementary Figs. S24–S59.

Parcellation-based and global gray matter density associations with childhood maltreatment

Investigating parcellation-based regional measures of cortical thickness and surface, as well as subcortical volume, yielded results similar to those of the voxel-based analysis. For the pooled analysis, the largest significant effect was found in Model 12 for the surface of the right banks of the superior temporal sulcus (partial R² = .007). Global gray matter density showed a significant negative association with CM in several models when using the pooled cohorts (with a maximum effect size in pooled Model 13; partial R² = .017). Parcellation-based and global gray matter density results significant at pFDR < .05 are shown in Supplementary Table S19. Replicability analysis revealed that only Model 7 (CTQ EA subscale) showed minor overlap in significance in one region (thickness of the left precuneus), in female participants only. All other combinations of cohorts, models, and sex-stratification did not yield any overlap in regional significance for any cortical or subcortical measure, including aggregate measures of cortical thickness and cortical surface for the left and right hemisphere. The association between global gray matter and CM was also not replicable across cohorts in any model. Replicability results are shown in Supplementary Table S20.

Findings implicating long-term effects of CM on the structural morphology of the brain have been frequently published over the last decades and are central to a neurobiological model of how environmental risk is conveyed to psychopathology. However, in an unprecedented replication effort, we present evidence that localized brain-wide associations between gray matter structure and various operationalizations of CM are essentially non-replicable. This lack of replicability was consistently shown for a wide variety of common statistical approaches, across non-clinical and clinical, as well as sex-stratified subgroups and across a variety of operationalizations of CM, and for interactions with age. Consistent non-replicability was furthermore observed using different complementary methodologies to assess gray matter structure. Central limitations arise due to the concrete assessment method of CM utilized in this study (retrospectively via the CTQ) and due to low demographic (particularly ethnic) variability of the included samples. The extensive non-replicability of CM-related gray matter effects was contrasted by highly replicable negative associations of CM with MDD diagnosis and with various measures of current and previous depression severity.

Clinical Perspective — Dr. Divya Agarwal, Dermatology

Workflow: As I assess patients with a history of childhood maltreatment (CM), I'm now more aware of the potential for neurobiological alterations, particularly in brain regions like the hippocampus and amygdala, which have been frequently implicated in prior studies. This knowledge informs my approach to screening for affective disorders and major depressive disorder (MDD), as CM is a significant risk factor for these conditions. I'd consider a more comprehensive diagnostic workup for patients with CM, given the potential for poorer treatment outcomes in MDD.

Economics: The article doesn't address cost directly, but I'm aware that optimizing treatments for patients with CM could have significant economic implications, particularly if it leads to better health outcomes and reduced healthcare utilization. By understanding the neurobiological correlates of CM, we may be able to develop more targeted and effective interventions, which could ultimately reduce healthcare costs. However, more research is needed to fully understand the economic impact of CM on healthcare systems.

Patient Outcomes: Patients with CM are at increased risk of developing affective disorders and MDD, with chronic disease trajectories and poorer treatment outcomes. By recognizing the potential for neurobiological alterations in CM, I can provide more tailored care and support, which may lead to improved patient outcomes. For example, studies have shown that CM is associated with alterations in gray matter structure, particularly in regions like the hippocampus and amygdala, which are critical for emotional regulation and mood stability.

Transparency & Corrections

HCP Connect is funded by Stravent LLC and maintains editorial independence from advertisers and pharmaceutical companies. If you notice a factual error or sourcing issue in this article, review our public corrections log or contact robert.foster@straventgroup.com.
