By Denny Borsboom, Eiko Fried, Lourens Waldorp, Claudia van Borkulo, Han van der Maas, Angélique Cramer, Don Robinaugh, and Sacha Epskamp
The Journal of Abnormal Psychology has just published a paper by Forbes, Wright, Markon, and Krueger (2017a) entitled “Evidence that psychopathology networks have limited replicability”, along with two commentaries and a rejoinder. One of the commentaries is from our group. In it, we identify several major statistical errors and methodological problems in Forbes et al.’s paper (Borsboom, Fried, Epskamp, Waldorp, van Borkulo, van der Maas, & Cramer, 2017). Also, we show that network models replicate very well when properly implemented (the correlation between network parameters in the original and replication sample exceeds .90 for Ising models and relative importance networks). In fact, a formal statistical test shows that the hypothesis, that the Ising networks in both samples are precisely equal, cannot be rejected despite high statistical power.
We are strongly committed to open science and would have ideally shared all of the data and code, so that readers could make up their own minds on whether or not networks replicate. Unfortunately, however, both Forbes et al.’s analyses and our own are not reproducible, because the replication dataset used by Forbes et al. is not openly accessible. Therefore, we can share both the analysis code we used and the original dataset, but not the replication dataset.
In their rejoinder, Forbes et al. (2017b) do not debate the major points made in our commentary: they accept (a) that the relative importance networks they reported were incorrectly implemented; (b) that the data they used are unsuited due the fact that the correlation matrix is distorted by the skip structure in the interview schedule; and (c) that it is incorrect to assume different kinds of techniques should converge on the same network, as their analysis presupposed. In addition, Forbes et al. (2017b) now acknowledge that in the original paper that we were invited to comment on, which was widely shared by the authors and which we adressed in a previous blogpost on this website, the reported directed acyclic graphs were also incorrectly implemented.
Thus, relative to their evidence base at the time of writing their original publication, two out of three of Forbes et al.’s network models proved to be implemented incorrectly, and the correlations among the variables they studied turned out to be affected so strongly by the interview’s skip structure that no firm conclusions could be drawn from them.
Despite these facts, and even though virtually all the ‘evidence’ they originally based their conclusion on has dissolved, in their rejoinder Forbes et al. (2017b) repeat the claim that network models do not replicate.
Their line of argument is no longer based on their results from the target paper, but instead rests on two new pillars.
First, Forbes et al. (2017b) present a literature review of network papers related to post-traumatic stress (PTSD), which are argued to report different networks, and suggest that this proves that networks do not replicate. With regard to this line of argument, we note that there are many reasons that network structures can differ; for instance, because the networks really are different (e.g., the relevant PTSD networks were estimated on different samples, with different kinds of trauma, using different methodologies) or because of sampling error (e.g., due to small sample sizes). We agree that there is heterogeneity in the results of PTSD network analysis, which in 2015 inspired a large collaborative effort to investigate the replicability of PTSD network structures in a paper that is forthcoming in the journal Clinical Psychological Science. The question how these differences originate is important, but that there are differences between networks does not provide evidence for the thesis that network models are “plagued with substantial flaws”, as Forbes et al. conclude, for the same reason that finding different factor models for PTSD across samples (Armour, Müllerová, & Elhai, 2015) does not imply that there is something wrong with factor analysis. Network methodology is evaluated on methodological grounds, using mathematical analysis and simulations, not on the basis of whether it shows different results in different populations.
Second, Forbes et al. (2017b) base their conclusions on a technique presented by the other team of commentators (Steinley, Hoffman, Brusco, & Sher, 2017), who purport to evaluate the statistical significance of connection parameters in the network. This is done by evaluating these parameters relative to a sampling distribution constructed by keeping the margins of the contingency table fixed (i.e., controlling for item and person effects). The test in question is not new; it has been known in psychometrics for several years as a way to investigate whether items adhere to a unidimensional latent variable model called the Rasch model (Verhelst, 2008). While we welcome new techniques to assess network models, we know from standing mathematical theory that any network connections that produce data consistent with the Rasch model will not be picked up by this testing procedure. This is problematic because we also know that fully connected networks with uniform connection weights will produce precisely such data (Epskamp, Maris, Waldorp, & Borsboom, in press). As a consequence, the proposed test is unreliable and cannot be used to evaluate the robustness of network models. People in our team are now preparing a full methodological evaluation of Steinley et al.’s (2017) technique and will post the results soon.
In the conclusion of their rebuttal, Forbes et al. (2017b) call for methodologically rigorous testing of hypotheses that arise from network theory and analysis. We echo this conclusion. It is critical that we continue the hard work of advancing our analytic methods, forming causal hypotheses about the processes that generate and maintain mental disorders, and testing these hypotheses in dedicated studies. In addition, we think it is important that additional replication studies are being executed to examine to what extent network models replicate; interested researchers can use our analysis code as a template to implement relevant models in different populations and compare the results.
Armour, C., Műllerová, J., & Elhai, J. D. (2015). A systematic literature review of PTSD’s latent structure in the Diagnostic and Statistical Manual of Mental Disorders: DSM-IV to DSM-5. Clinical Psychology Review, 44, 60–74. http://doi.org/10.1016/j.cpr.2015.12.003
Borsboom, D., Fried, E. I., Epskamp, S., Waldorp, L. J., van Borkulo, C. D., van der Maas, H. L. J., & Cramer, A. O. J. (2017). False alarm? A comprehensive reanalysis of “Evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger. Journal of Abnormal Psychology.
Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (submitted). Discovering Psychological Dynamics: The Gaussian Graphical Model in Cross-sectional and Time-series Data. Preprint: https://arxiv.org/abs/1609.04156
Forbes, M., Wright, A., Markon, K., & Krueger, R. (2017b). Further evidence that psychopathology symptom networks have limited replicability and utility: Response to Borsboom et al. and Steinley et al. Journal of Abnormal Psychology. To our knowledge, the preprint for this paper is not currently available online.
Steinley, D., Hoffman, M., Brusco, M. J., & Sher, K. J. (2017). A method for making inferences in network analysis: A comment on Forbes, Wright, Markon, & Krueger (2017). Journal of Abnormal Psychology. To our knowledge, the preprint for this paper is not currently available online.