# New features in qgraph 1.5

Written by Sacha Epskamp.

While the developmental version is routinely updated, I update the stable qgraph releases on CRAN less often. The last major update (version 1.4) was 1.5 years ago. After several minor updates since, I have now completed work on a new larger version of the package, version 1.5, which is now available on CRAN. The full list of changes can be read in the NEWS file. In this blog post, I will describe some of the new functionality.

# New conservative GGM estimation algorithms

Recently, there has been some debate on the specificity of EBICglasso in exploratory estimation of Gaussian graphical models (GGM). While EBIC selection of regularized glasso networks works well in retrieving network structures at low sample sizes, Donald Williams and Philippe Rast recently showed that specificity can be lower than expected in dense networks with many small edges, leading to an increase in false positives. These false edges are nearly invisible under default qgraph fading options (also due to regularization), and should not influence typical interpretations of these models. However, some lines of research focus on discovering the smallest edges (e.g., bridge symptoms or environmental edges), and there have been increasing concerns regarding the replicability of such small edges. To this end, qgraph 1.5 now includes a warning when a dense network is selected, and includes two new, more conservative estimation algorithms: thresholded EBICglasso estimation and unregularized model selection.

## Thresholded EBICglasso

Based on recent work by Jankova and Van de Geer (2018), a low false positive rate is guaranteed for off-diagonal ($$i \not= j$$) precision matrix elements (proportional to partial correlation coefficients) $$\kappa_{ij}$$ for which:

$$|\kappa_{ij}| > \frac{\log\left(p (p-1) / 2\right)}{\sqrt{n}},$$

where $$p$$ is the number of nodes and $$n$$ is the sample size. The option threshold = TRUE in EBICglasso and qgraph(..., graph = "glasso") now employs this thresholding rule by setting edge weights that are not larger than the threshold to zero, both in the returned final model and in the EBIC computation of all considered models. Preliminary simulations indicate that with this thresholding rule, high specificity is guaranteed in many cases (an exception is the case in which the true model is not in the glassopath, at very high sample sizes such as $$N > 10{,}000$$). A benefit of this approach over the unregularized option described below is that edge parameters are still regularized, preventing large visual overrepresentations due to sampling error.
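To make the rule concrete, here is a minimal base R sketch. The helper names `ggm_threshold` and `apply_threshold` are hypothetical (they are not qgraph functions), and the toy precision matrix is made up; the sketch only illustrates the inequality above, not the package's internal code.

```r
# Hypothetical helper (not part of qgraph): the threshold log(p(p-1)/2) / sqrt(n).
ggm_threshold <- function(p, n) log(p * (p - 1) / 2) / sqrt(n)

# Set off-diagonal precision elements that are not larger than the
# threshold (in absolute value) to zero; the diagonal is left untouched.
apply_threshold <- function(kappa, n) {
  p <- ncol(kappa)
  thr <- ggm_threshold(p, n)
  out <- kappa
  small <- abs(out) <= thr & row(out) != col(out)
  out[small] <- 0
  out
}

# Toy 3 x 3 precision matrix (made-up numbers) and n = 100 observations.
kappa <- matrix(c(1.00, 0.30, 0.05,
                  0.30, 1.00, 0.00,
                  0.05, 0.00, 1.00), nrow = 3, byrow = TRUE)

ggm_threshold(p = 3, n = 100)
apply_threshold(kappa, n = 100)
```

In this toy case the element of 0.05 falls below the threshold and is removed, while the element of 0.30 is retained.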

The following code showcases non-thresholded versus thresholded EBICglasso:

library("qgraph")
library("psych")
data(bfi)
bfiSub <- bfi[,1:25]
layout(t(1:2))
g1 <- qgraph(cor_auto(bfiSub), graph = "glasso", sampleSize = nrow(bfi),
layout = "spring", theme = "colorblind", title = "EBICglasso",
cut = 0)
g2 <- qgraph(cor_auto(bfiSub), graph = "glasso", sampleSize = nrow(bfi),
threshold = TRUE, layout = g1$layout, theme = "colorblind",
title = "Thresholded EBICglasso", cut = 0)

While the thresholded graph is much sparser, that does not mean all removed edges are false positives. Many likely reflect true edges.

## Unregularized Model Search

While the LASSO has mostly been studied in high-dimensional low-sample cases, in many situations research focuses on relatively low-dimensional (e.g., 20 nodes) settings with high sample size (e.g., $$N > 1{,}000$$). In such settings, it is debatable whether regularization techniques are really needed. In the particular case of GGMs, one could also use model selection on unregularized models in which some pre-defined edge weights are set to zero. It has been shown that (extended) Bayesian information criterion (EBIC) selection of such unregularized models selects the true model as $$N$$ grows to $$\infty$$ (Foygel and Drton, 2010).

The new function ggmModSelect now supports model search of unregularized GGM models, using the EBIC exactly as it is computed in the lavaan package. The hyperparameter $$\gamma$$ is set by default to $$0$$ (BIC selection) rather than $$0.5$$, as preliminary simulations indicate $$\gamma = 0$$ shows much better sensitivity while retaining high specificity.

By default, ggmModSelect will first run the glasso algorithm for $$100$$ different tuning parameters to obtain $$100$$ different network structures. Next, the algorithm refits all those networks without regularization and picks the best. Subsequently, the algorithm adds and removes edges until EBIC can no longer be improved. The full algorithm is:

1. Run glasso to obtain 100 models
2. Refit all models without regularization
3. Choose the best according to EBIC
4. Test all possible models in which one edge is changed (added or removed)
5. If no edge can be added or removed to improve EBIC, stop here
6. Change the edge that best improved EBIC, then test again all other edges that would also have led to an improvement in EBIC
7. If no edge can be added or removed to improve EBIC, go to 4; else, go to 6

When stepwise = FALSE, steps 4 to 7 are ignored, and when considerPerStep = "all", all edges are considered at every step. The following code showcases the algorithm:

modSelect_0 <- ggmModSelect(cor_auto(bfiSub), nrow(bfi), gamma = 0, nCores = 8)
modSelect_0.5 <- ggmModSelect(cor_auto(bfiSub), nrow(bfi), gamma = 0.5, nCores = 8)

layout(t(1:2))
g3 <- qgraph(modSelect_0$graph, layout = g1$layout, theme = "colorblind",
title = "ggmModSelect (gamma = 0)", cut = 0)
g4 <- qgraph(modSelect_0.5$graph, layout = g1$layout, theme = "colorblind",
title = "ggmModSelect (gamma = 0.5)", cut = 0)


Note that this algorithm is very slow in higher dimensions (e.g., above 30-40 nodes), in which case only the regular EBICglasso, thresholded EBICglasso, or setting stepwise = FALSE are feasible. Of note, centrality analyses, especially of the more stable strength metric, are hardly impacted by the estimation method:

centralityPlot(
list(EBICglasso=g1,
EBICglasso_threshold=g2,
ggmModSelect_0 = g3,
ggmModSelect_0.5 = g4
))

Both thresholded EBICglasso and ggmModSelect are implemented in the development version of bootnet, which will be updated on CRAN soon as well. Preliminary simulations show that both guarantee high specificity, at the cost of some sensitivity. Using ggmModSelect with $$\gamma = 0$$ (BIC selection) shows better sensitivity and works well in detecting small edges, but is slow when coupled with stepwise model search, which may make bootstrapping hard. I encourage researchers to investigate these and competing methods in large-scale simulation studies.
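To illustrate what the hyperparameter $$\gamma$$ does, the sketch below computes an EBIC in base R following the general form given by Foygel and Drton (2010): minus two times the Gaussian log-likelihood, plus $$E \log(n)$$, plus $$4 \gamma E \log(p)$$, with $$E$$ the number of non-zero edges. The function names `gauss_loglik` and `ebic_sketch` are hypothetical and the numbers are made up; I make no claim that the constants match the internal computation in qgraph or lavaan.

```r
# Gaussian log-likelihood (up to an additive constant) of a precision
# matrix K, given a covariance/correlation matrix S and n observations.
gauss_loglik <- function(S, K, n) {
  log_det <- as.numeric(determinant(K, logarithm = TRUE)$modulus)
  n / 2 * (log_det - sum(diag(K %*% S)))
}

# EBIC in the form of Foygel and Drton (2010); gamma = 0 gives the BIC.
ebic_sketch <- function(S, K, n, gamma = 0) {
  p <- nrow(K)
  E <- sum(K[lower.tri(K)] != 0)  # number of non-zero edges
  -2 * gauss_loglik(S, K, n) + E * log(n) + 4 * gamma * E * log(p)
}

# Toy model (made-up numbers): three nodes, one edge between nodes 1 and 2.
K <- matrix(c( 1.2, -0.5, 0.0,
              -0.5,  1.2, 0.0,
               0.0,  0.0, 1.0), nrow = 3, byrow = TRUE)
S <- solve(K)

bic  <- ebic_sketch(S, K, n = 500, gamma = 0)
ebic <- ebic_sketch(S, K, n = 500, gamma = 0.5)
ebic - bic  # the extra penalty, 4 * gamma * E * log(p)
```

The difference between the two criteria grows with the number of edges, which is why a larger $$\gamma$$ favors sparser models.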

## Which estimation method to use?

Both new methods are much more conservative than the EBICglasso, leading to drops in sensitivity and possible misrepresentations of the true sparsity of the network structure. For exploratory hypothesis generation purposes in relatively low sample sizes, the original EBICglasso is likely to be preferred. In higher sample sizes and with a focus on identifying small edges, the conservative methods may be preferred instead. There are many more GGM estimation procedures available in other R packages, and detailed simulation studies investigating which estimator works best in which case are now being performed in multiple labs. I have also implemented simulation functions in the development version of bootnet to aid in studying these methods, which I will describe in an upcoming blog post.

# Flow diagrams

Sometimes, researchers are interested in the connectivity of one node in particular, which can be hard to see in a Fruchterman-Reingold layout, especially when the connections to that one node are weak. The new flow function, which I developed together with Adela Isvoranu, can be used to place nodes in such a way that connections of one node are clearly visible. The function will place the node of interest to the left, followed in vertical levels by the nodes that are 1, 2, 3, and so on, edges away from the node of interest. Edges between nodes in the same level are displayed as curved edges. For example:

flow(g2, "N3", theme = "colorblind", vsize = 4)


# Expected influence

The centrality index expected influence is now returned by centrality() and can be plotted using centralityPlot(), although it has to be requested using include. In addition, the plots can now be ordered by one of the indices:

centralityPlot(g2, include = c("Strength","ExpectedInfluence"),
orderBy = "ExpectedInfluence")


Note, however, that the BFI network is not an optimal network to compute expected influence on, as some variables are (arbitrarily) scored negatively. It is best to compute expected influence on a network in which higher scores on all nodes have the same interpretation (e.g., symptoms in which higher = more severe).
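The difference between strength and (one-step) expected influence is easy to see in a small example. The sketch below uses base R only and a made-up weights matrix; it assumes the usual definitions, in which strength sums absolute edge weights and expected influence sums signed edge weights.

```r
# Toy weights matrix (made-up numbers) with one negative edge.
W <- matrix(c( 0.0,  0.4, -0.3,
               0.4,  0.0,  0.2,
              -0.3,  0.2,  0.0), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Strength: sum of absolute edge weights connected to each node.
strength <- rowSums(abs(W))

# One-step expected influence: sum of signed edge weights.
expected_influence <- rowSums(W)

rbind(strength, expected_influence)
```

Nodes with only positive edges get identical values on both indices, whereas negative edges lower expected influence but raise strength, which is why the scoring direction of the variables matters.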

# Future developments

As always, I highly welcome bug reports and code suggestions on GitHub. In addition, I will also update the bootnet package soon and write a separate blog post on its latest additions.

# References

Jankova, J., and van de Geer, S. (2018) Inference for high-dimensional graphical models. In: Handbook of graphical models (editors: Drton, M., Maathuis, M., Lauritzen, S., and Wainwright, M.). CRC Press: Boca Raton, Florida, USA.

Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in neural information processing systems (pp. 604-612).

# Recent developments on the performance of graphical LASSO networks

This post was written by Sacha Epskamp, Gaby Lunansky, Pia Tio and Denny Borsboom, on behalf of the Psychosystems lab group.

Currently, many network models are estimated using LASSO regularization. For ordinal and continuous variables, a popular option is to use the graphical LASSO (GLASSO), in which the network is estimated by estimating a sparse inverse of the variance-covariance matrix. For a description of this method, see our recently published tutorial paper on this topic. Last week we were informed of a new preprint by Donald Williams and Philippe Rast, who show that under certain conditions the GLASSO, coupled with EBIC model selection (EBICglasso), can have lower specificity than expected. Specificity is defined as 1 minus the probability of estimating an edge that is not in the true model. Thus, the results of Williams and Rast indicate that in certain cases edges may be retrieved that are not representative of true edges. This is interesting, as LASSO methods usually show high specificity all-around.
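As a small illustration of these definitions, the base R sketch below compares an estimated network with a known true network. The function name `edge_metrics` is hypothetical and the networks are made up.

```r
# Sensitivity: proportion of true edges that are recovered.
# Specificity: proportion of absent edges that are correctly left out.
edge_metrics <- function(true_net, est_net) {
  ut <- upper.tri(true_net)      # each undirected edge is counted once
  truth <- true_net[ut] != 0
  est <- est_net[ut] != 0
  c(sensitivity = sum(truth & est) / sum(truth),
    specificity = sum(!truth & !est) / sum(!truth))
}

# True network: 4 nodes, 2 edges (made-up weights).
true_net <- matrix(0, 4, 4)
true_net[1, 2] <- true_net[2, 1] <- 0.3
true_net[3, 4] <- true_net[4, 3] <- 0.2

# Estimated network: both true edges plus one very small false edge.
est_net <- true_net
est_net[1, 3] <- est_net[3, 1] <- 0.02

edge_metrics(true_net, est_net)
```

Here one of the four absent edges is falsely included, so specificity drops to 0.75 even though the false edge is tiny.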

We have investigated these claims and replicated the findings using the independent implementations of EBICglasso in both the qgraph and huge packages. This effect seems to only occur in network structures that are very dense (much denser than typical simulations and proofs regarding GLASSO investigate) and feature many very small edges, blurring the line between true and false effects. We already noted a notably lower specificity than expected in our tutorial paper when discussing tools one may use to investigate these properties for a given network structure, and hypothesized that this may be an effect of a dense network structure with many small edges. The results of Williams and Rast seem to confirm this speculation by taking this effect to the extreme.

## What is the problem and how can it be solved?

The main problem is that, in some cases, the GLASSO algorithm may retrieve very small false edges, which are not present in the true network under which the data are simulated. There are two reasons why this may occur:

1. For certain tuning parameters, GLASSO identifies the correct model. However, the EBIC or cross-validation selects the wrong tuning parameter.
2. GLASSO is incapable of selecting the right model for any tuning parameter even with infinite sample size, and as a result any selection procedure is doomed to fail.

Case 1 is interesting, and may be the reason why regularized EBICglasso estimation has high specificity but does not always converge to the true model. This case may lead to a drop in specificity to about 0.6 to 0.7 and seems to occur in dense networks. We hypothesize that the added penalty biases parameters so much that EBIC tends to select a slightly denser model to compensate. We are currently working out ways to improve tuning parameter selection, including running more simulations on preferred EBIC hyperparameters in denser networks than usually studied, theoretically derived thresholding rules, and non-regularized model selection procedures.

Case 2 seems to be the case in which Williams and Rast operate, which is harder to tackle. To exemplify this problem, we generated a 20% sparse graph using the code kindly provided by Williams and Rast, and entered the true implied variance-covariance matrix into glasso. Inverting this true matrix and thresholding at an arbitrary small number returns the correct graph. However, at no tuning parameter is the GLASSO algorithm capable of retrieving this graph.

The code for this example is available online. It is striking that at no point GLASSO captures the true model, even though the true variance-covariance matrix (corresponding to an infinite sample size) was used as input. This is not common. Using the exact same code, an 80% dense network parameterized according to Yin & Li (2011) shows a strikingly different picture: the true model is retrieved for some GLASSO tuning parameters. We are currently investigating whether we can identify cases in which GLASSO is incapable of retrieving the correct model at infinite sample size, and hypothesize this may be due to (a) very low partial correlations used in the generating structure, leading to a blurred line between true and false edges, and (b) relatively large partial correlations or implied marginal correlations, leading to inversion problems. We welcome any feedback on why this set of networks fails in GLASSO estimation and how such troublesome cases may be discovered from empirical data.
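The baseline check mentioned above, inverting the true variance-covariance matrix and thresholding at a small number, can be sketched in base R as follows. The precision matrix is a made-up toy example (not one of the Williams and Rast structures), and the tolerance of 1e-8 is an arbitrary choice.

```r
# Toy true precision matrix (made-up numbers): a chain of four nodes.
K_true <- matrix(c( 1.0, -0.3,  0.0,  0.0,
                   -0.3,  1.0, -0.2,  0.0,
                    0.0, -0.2,  1.0, -0.4,
                    0.0,  0.0, -0.4,  1.0), nrow = 4, byrow = TRUE)

# Implied true variance-covariance matrix (the "N = infinity" input).
Sigma <- solve(K_true)

# Invert back and threshold tiny values that arise from rounding error.
K_back <- solve(Sigma)
off_diag <- row(K_true) != col(K_true)
adj_true <- (K_true != 0) & off_diag
adj_back <- (abs(K_back) > 1e-8) & off_diag

all(adj_true == adj_back)  # does plain inversion recover the true graph?
```

Any estimator given the true matrix as input can be compared against this baseline graph.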

## Why can sparsity impact GLASSO performance?

Many network estimation techniques, including the GLASSO, rely on the assumption of sparsity, also termed the bet on sparsity. That is, these methods work very well assuming the true model is sparse (i.e., contains relatively few edges). However, when the true model is not sparse, LASSO may lead to unexpected results. Most simulation studies and mathematical proofs studied network retrieval in sparse settings. As more psychological network studies are published, however, one consistent finding seems to be that psychological networks are less sparse than expected. At high sample sizes, most studies identify many small edges near zero. This finding led to a change in the qgraph default for the lower bound on the sparsity of networks compared (lambda.min.ratio), which when set to 0.1 leads to networks being estimated that are too sparse. The default is now 0.01, which leads to better sensitivity but also a small decrease in specificity. Williams and Rast set this value to 0.001 in some cases, which should again lead to an increase in sensitivity but perhaps also a decrease in specificity.
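To see why this ratio bounds how dense the compared networks can become, consider how a grid of tuning parameters is typically constructed. The base R sketch below is an assumption about the general approach, not a copy of the qgraph code: the largest tuning parameter is taken to be the largest absolute off-diagonal correlation, the smallest is the ratio times the largest, and the grid is log-spaced in between. The function name `lambda_grid` and the toy correlation matrix are hypothetical.

```r
# Hypothetical sketch of a log-spaced tuning parameter grid.
lambda_grid <- function(S, lambda_min_ratio = 0.01, n_lambda = 100) {
  lambda_max <- max(abs(S[upper.tri(S)]))     # largest absolute off-diagonal value
  lambda_min <- lambda_min_ratio * lambda_max # lower bound set by the ratio
  exp(seq(log(lambda_min), log(lambda_max), length.out = n_lambda))
}

# Toy correlation matrix (made-up numbers).
S <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

range(lambda_grid(S, lambda_min_ratio = 0.1))
range(lambda_grid(S, lambda_min_ratio = 0.01))
range(lambda_grid(S, lambda_min_ratio = 0.001))
```

A smaller ratio extends the grid towards weaker penalties, so denser candidate networks enter the comparison.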

## Are graphical LASSO networks unstable?

Fortunately, even in the extreme cases discussed by Williams and Rast, low specificity arises exclusively from the fact that very weak edges might be incorrectly included in the graph. In a qgraph plot, for instance, these would be nearly invisible. As a result, the general structure or substantive interpretation of the network is not affected. GLASSO results, like other statistical methods, are not intrinsically stable or unstable, and follow-up stability and accuracy checks should always be performed after fitting a network. If the estimated network is stable and accurate, there is no reason to doubt the interpretation based on the work of Williams and Rast, except when the interpretation relied on thorough investigation of the weakest edges in the network (possibly made visible by removing the fading of edges in displaying them). Of note, the bootstrapping methods also allow for checking if an edge is significantly different from zero. We argue against this procedure, as it would lead to double thresholding a network and a severe loss of sensitivity. But if the goal is to retrieve small edges, it may be warranted to check whether zero is contained in the confidence interval of the regularized edge parameter.

## Does this affect the replicability of networks?

Replicability and model selection are related but do not refer to the same methodological criterion. The recent findings regarding the GLASSO algorithm appeal to model quality: when simulating data from a ‘true model’, we can check whether results from an estimation method converge to this true model with increasing sample size (Borsboom et al., in press). Under certain circumstances, the GLASSO algorithm retrieves very small edges which are not present in the true model under which the data are simulated (i.e., the edges are false positives). Replicability concerns finding the same results in different empirical samples. Naturally, if the GLASSO algorithm indeed retrieves small false-positive edges, this will affect replicability of these small edges over different empirical samples. However, since these edges shouldn’t be there in the first place, the origin of the issue does not concern replicability but model selection. While we investigate these results further, an important take-home message of the work of Williams and Rast is that researchers who aim to use network models to identify very small effects (when it is plausible to assume that the ‘true model’ might be dense with many small edges) may wish to consider other estimation methods such as node-wise estimation or non-regularized model selection discussed below.

## Are all networks estimated using the graphical LASSO?

No. GLASSO is only one of many options that are available to estimate undirected network models from data. GLASSO is often used because it (a) is fast, (b) has high power in detecting edges, (c) selects the entire model in one step, and (d) only requires a covariance matrix as input, allowing networks to be computed from polychoric correlations (ordinal data) and facilitating reproducibility of results. Outside the GLASSO, another option for regularized network estimation is to perform (logistic) regressions of every variable on all other variables, and to combine the resulting regression parameters (proportional to edge weights) in the network. This is called nodewise estimation, and is at the core of several often-used estimation methods such as the adaptive LASSO, IsingFit, and mixed graphical models (MGM). While nodewise regression estimation has less power than GLASSO, there are cases in which GLASSO fails but nodewise regressions do not (Ravikumar et al., 2008). Our preliminary simulations showed that nodewise estimation methods retain high specificity in the particular case identified by Williams and Rast where the specificity of GLASSO drops. In addition to LASSO estimation methods that aim to retrieve sparse models, other regularization techniques exist that instead aim to estimate low-rank dense networks. Still another option is to use elastic-net estimation, which compromises between estimating sparse and dense networks.
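The idea behind nodewise estimation can be sketched in base R for continuous data. For brevity, this sketch omits the LASSO penalty entirely and uses ordinary least squares (so it is not one of the regularized methods named above), and the data are simulated with made-up coefficients. It shows how the two regression coefficients for each pair of nodes combine into one edge weight, which in the unregularized case equals the sample partial correlation.

```r
set.seed(1)

# Simulated data (made-up structure): V1 -> V2 -> V3, with V4 unrelated.
n <- 500
X <- matrix(rnorm(n * 4), n, 4)
X[, 2] <- X[, 2] + 0.5 * X[, 1]
X[, 3] <- X[, 3] - 0.4 * X[, 2]
colnames(X) <- paste0("V", 1:4)
p <- ncol(X)

# Regress every variable on all other variables and store the slopes.
B <- matrix(0, p, p)
for (i in seq_len(p)) {
  fit <- lm(X[, i] ~ X[, -i])
  B[i, -i] <- coef(fit)[-1]  # drop the intercept
}

# Combine the two coefficients per pair into a single edge weight.
pcor_nodewise <- sign(B) * sqrt(abs(B * t(B)))

# Reference: partial correlations from the inverse sample covariance matrix.
pcor_direct <- -cov2cor(solve(cov(X)))
diag(pcor_direct) <- 0

max(abs(pcor_nodewise - pcor_direct))  # difference between the two routes
```

With a penalty added to each regression, the two coefficients for a pair no longer agree exactly, and a rule is needed to decide how they are combined into one edge.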

Of note, and as Williams and Rast describe clearly, networks can also be estimated using non-regularized methods. Two main methods by which this can be done are (a) estimating a saturated network and subsequently removing edges that are below some threshold (e.g., based on significance tests, as argued by Williams and Rast), and (b) selecting between several estimated models in which edge-parameters are estimated without regularization. Both have already been discussed in the blog post that also introduced GLASSO estimation in the qgraph package. By using the ‘threshold’ argument in qgraph and bootnet one can threshold edges based on significance (possibly after correcting for multiple tests) or false discovery rates. A downside of thresholding, however, is that no model selection is performed, which leads to estimated edges that do not take into account other edges that are set to zero. A (rather slow) step-wise model selection procedure has been implemented in qgraph in the function ‘findGraph’. We are currently working on a faster non-regularized model selection procedure, in which the GLASSO algorithm is first used to generate a limited set of models (e.g., 100), which are subsequently refitted without regularization. Preliminary simulations show that this procedure is more conservative than regularized GLASSO and often converges to the true model with high sample sizes.

The bootnet package contains many default sets to facilitate computing networks using different estimation procedures in a unified framework. The table below provides an overview of methods currently implemented as default sets and when they can be used. If you would like us to implement more default sets, please let us know!

## References

Borsboom, D., Robinaugh, D. J., The Psychosystems Group, Rhemtulla, M., & Cramer, A. O. J. (in press). Robustness and replicability of psychopathology networks. World Psychiatry.

Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in neural information processing systems, 604-612.

Ravikumar, P., Raskutti, G., Wainwright, M. J., & Yu, B. (2008). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.

Yin, J., & Li, H. (2011). A sparse conditional gaussian graphical model for analysis of genetical genomics data. The annals of applied statistics, 5(4), 2630-2650.

# New paper on the role of stabilizing and communicating symptoms

This guest post was written by Tessa F. Blanken and Marie K. Deserno who are both PhD-students associated with the Psychosystems Lab at the University of Amsterdam. They secretly meet up every Friday to vividly discuss potential extensions of the network analysis toolbox. This post summarizes a new paper available as a preprint online; the paper will be published in Scientific Reports on April 11th.

As two graduate students in the Psychological Methods department at the University of Amsterdam, we were familiarized with the work of Cramer and Borsboom on conceptualizing mental disorders as complex networks of interacting symptoms. This conceptualization signifies the role of symptoms and their interactions within and across disorders, and has inspired novel theoretical definitions of clinical concepts such as core symptoms and comorbidity (1).

We often found ourselves discussing the potential of tools and metrics from other research areas using network analytic techniques. In the summer of 2016 we came across Santo Fortunato’s Community detection in graphs (2010) – an excellent paper on various applications and implications of network analytic techniques (2). One specific sentence caught our attention:

“Identifying modules and their boundaries allows for a classification of vertices, according to their structural position in the modules. So, vertices with a central position in their clusters, i.e. sharing a large number of edges with the other group partners, may have an important function of control and stability within the group; vertices lying at the boundaries between modules play an important role of mediation and lead the relationships and exchanges between different communities.” (p. 3)

Reading this passage immediately sparked a discussion on the numerous possibilities of utilizing the community detection toolbox to develop empirical definitions of these theoretical concepts. The notion of “vertices with a central position within their cluster […] may have an important function of control and stability within the group” can readily be translated to the idea of core symptoms. Similarly, the idea that “vertices lying at the boundaries between modules play an important role [… in] exchanges between different communities” can be mapped onto the theoretical definition of comorbidity within the network perspective on psychopathology.

In our paper, entitled “The role of stabilizing and communicating symptoms given overlapping communities in psychopathology”, we aspired to complement the statistical toolbox of the network approach to psychopathology by exploring what overlapping community detection analysis has to offer. Using community detection and inspecting the differential role of symptoms within and between communities offers a framework to study the clinical concepts of comorbidity, heterogeneity and hallmark symptoms. Symptoms with many and strong connections within a community, defined as stabilizing symptoms, could be thought of as the core of a community, whereas symptoms that belong to multiple communities, defined as communicating symptoms, facilitate the communication between problem areas.

We applied community detection to a large dataset (N = 2089) assessing a variety of psychological problems using the Symptom Checklist 90. We identified 18 communities of closely related symptoms. Importantly, these communities are empirically derived instead of theoretically defined. In the paper we illustrate how the proposed definitions on the differential role of symptoms can inform us on the structure of the psychopathological landscape, both globally and locally. As such, we adopted established metrics in network science to accelerate our understanding of the psychopathological landscape.

Figure 1. Illustration of (a) the local structure of the Feelings of Worthlessness community; (b) its connection to other communities; and (c) a symptom-level example of its connection to the community Worried about Sloppiness.

From our perspective, this endeavour highlights that diving into the world of network science across all kinds of research areas can inspire great advances for the toolbox we use to study psychopathology networks. Drawing inspiration from fields concerned with complex systems such as brain networks, economic networks and social networks, the options seem infinite – and we cannot wait to explore them.

1. Cramer, A.O.J., Waldorp, L.J., van der Maas, H.L.J. & Borsboom, D. Comorbidity: a network perspective. Behav. Brain. Sci. 33, 137-150 (2010).

2. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75-174 (2010).

# Some notes on the Journal of Abnormal Psychology special issue on the reproducibility of network analysis

By Denny Borsboom, Eiko Fried, Lourens Waldorp, Claudia van Borkulo, Han van der Maas, Angélique Cramer, Don Robinaugh, and Sacha Epskamp

The Journal of Abnormal Psychology has just published a paper by Forbes, Wright, Markon, and Krueger (2017a) entitled “Evidence that psychopathology networks have limited replicability”, along with two commentaries and a rejoinder. One of the commentaries is from our group. In it, we identify several major statistical errors and methodological problems in Forbes et al.’s paper (Borsboom, Fried, Epskamp, Waldorp, van Borkulo, van der Maas, & Cramer, 2017). Also, we show that network models replicate very well when properly implemented (the correlation between network parameters in the original and replication sample exceeds .90 for Ising models and relative importance networks). In fact, a formal statistical test shows that the hypothesis, that the Ising networks in both samples are precisely equal, cannot be rejected despite high statistical power.

We are strongly committed to open science and would have ideally shared all of the data and code, so that readers could make up their own minds on whether or not networks replicate. Unfortunately, however, both Forbes et al.’s analyses and our own are not reproducible, because the replication dataset used by Forbes et al. is not openly accessible. Therefore, we can share both the analysis code we used and the original dataset, but not the replication dataset.

In their rejoinder, Forbes et al. (2017b) do not debate the major points made in our commentary: they accept (a) that the relative importance networks they reported were incorrectly implemented; (b) that the data they used are unsuited due to the fact that the correlation matrix is distorted by the skip structure in the interview schedule; and (c) that it is incorrect to assume different kinds of techniques should converge on the same network, as their analysis presupposed. In addition, Forbes et al. (2017b) now acknowledge that in the original paper that we were invited to comment on, which was widely shared by the authors and which we addressed in a previous blog post on this website, the reported directed acyclic graphs were also incorrectly implemented.

Thus, relative to their evidence base at the time of writing their original publication, two out of three of Forbes et al.’s network models proved to be implemented incorrectly, and the correlations among the variables they studied turned out to be affected so strongly by the interview’s skip structure that no firm conclusions could be drawn from them.

Despite these facts, and even though virtually all the ‘evidence’ they originally based their conclusion on has dissolved, in their rejoinder Forbes et al. (2017b) repeat the claim that network models do not replicate.

Their line of argument is no longer based on their results from the target paper, but instead rests on two new pillars.

First, Forbes et al. (2017b) present a literature review of network papers related to post-traumatic stress (PTSD), which are argued to report different networks, and suggest that this proves that networks do not replicate. With regard to this line of argument, we note that there are many reasons that network structures can differ; for instance, because the networks really are different (e.g., the relevant PTSD networks were estimated on different samples, with different kinds of trauma, using different methodologies) or because of sampling error (e.g., due to small sample sizes). We agree that there is heterogeneity in the results of PTSD network analysis, which in 2015 inspired a large collaborative effort to investigate the replicability of PTSD network structures in a paper that is forthcoming in the journal Clinical Psychological Science. The question of how these differences originate is important, but the existence of differences between networks does not provide evidence for the thesis that network models are “plagued with substantial flaws”, as Forbes et al. conclude, for the same reason that finding different factor models for PTSD across samples (Armour, Müllerová, & Elhai, 2015) does not imply that there is something wrong with factor analysis. Network methodology is evaluated on methodological grounds, using mathematical analysis and simulations, not on the basis of whether it shows different results in different populations.

Second, Forbes et al. (2017b) base their conclusions on a technique presented by the other team of commentators (Steinley, Hoffman, Brusco, & Sher, 2017), who purport to evaluate the statistical significance of connection parameters in the network. This is done by evaluating these parameters relative to a sampling distribution constructed by keeping the margins of the contingency table fixed (i.e., controlling for item and person effects). The test in question is not new; it has been known in psychometrics for several years as a way to investigate whether items adhere to a unidimensional latent variable model called the Rasch model (Verhelst, 2008). While we welcome new techniques to assess network models, we know from standing mathematical theory that any network connections that produce data consistent with the Rasch model will not be picked up by this testing procedure. This is problematic because we also know that fully connected networks with uniform connection weights will produce precisely such data (Epskamp, Maris, Waldorp, & Borsboom, in press). As a consequence, the proposed test is unreliable and cannot be used to evaluate the robustness of network models. People in our team are now preparing a full methodological evaluation of Steinley et al.’s (2017) technique and will post the results soon.

In the conclusion of their rebuttal, Forbes et al. (2017b) call for methodologically rigorous testing of hypotheses that arise from network theory and analysis. We echo this conclusion. It is critical that we continue the hard work of advancing our analytic methods, forming causal hypotheses about the processes that generate and maintain mental disorders, and testing these hypotheses in dedicated studies. In addition, we think it is important that additional replication studies be conducted to examine to what extent network models replicate; interested researchers can use our analysis code as a template to implement relevant models in different populations and compare the results.

References

Armour, C., Müllerová, J., & Elhai, J. D. (2015). A systematic literature review of PTSD’s latent structure in the Diagnostic and Statistical Manual of Mental Disorders: DSM-IV to DSM-5. Clinical Psychology Review, 44, 60–74. http://doi.org/10.1016/j.cpr.2015.12.003

Borsboom, D., Fried, E. I., Epskamp, S., Waldorp, L. J., van Borkulo, C. D., van der Maas, H. L. J., & Cramer, A. O. J. (2017). False alarm? A comprehensive reanalysis of “Evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger. Journal of Abnormal Psychology.

Epskamp, S., & Fried, E. I. (in press). A Tutorial on Regularized Partial Correlation Networks. Psychological Methods. Preprint: http://arxiv.org/abs/1607.01367

Epskamp, S., Maris, G., Waldorp, L. J., & Borsboom, D. (in press). Network psychometrics. To appear in: Irwing, P., Hughes, D., & Booth, T. (Eds.), Handbook of Psychometrics. New York: Wiley.

Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (submitted). Discovering Psychological Dynamics: The Gaussian Graphical Model in Cross-sectional and Time-series Data. Preprint: https://arxiv.org/abs/1609.04156

Forbes, M., Wright, A., Markon, K., & Krueger, R. (2017a). Evidence that psychopathology symptom networks have limited replicability. Journal of Abnormal Psychology.

Forbes, M., Wright, A., Markon, K., & Krueger, R. (2017b). Further evidence that psychopathology symptom networks have limited replicability and utility: Response to Borsboom et al. and Steinley et al. Journal of Abnormal Psychology. To our knowledge, the preprint for this paper is not currently available online.

Steinley, D., Hoffman, M., Brusco, M. J., & Sher, K. J. (2017). A method for making inferences in network analysis: A comment on Forbes, Wright, Markon, & Krueger (2017). Journal of Abnormal Psychology. To our knowledge, the preprint for this paper is not currently available online.

Verhelst, N. D. (2008). An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals. Psychometrika, 73, 705–728. https://doi.org/10.1007/s11336-008-9062-3

# Two new papers on attitude networks

Our paper on predicting voting decisions using network analysis was published in Scientific Reports (PDF) and our tutorial on analyzing and simulating attitude networks was published in Social Psychological and Personality Science (PDF).

In the first paper, we show that whether attitudes toward presidential candidates predict voting decisions depends on the connectivity of the attitude network. If the attitude network is strongly connected, attitudes almost perfectly predict the voting decision. Less connected attitude networks are less predictive of voting decisions. Additionally, we show that the most central attitude elements have the highest predictive value.

In the second paper, we provide a state-of-the-art tutorial on how to estimate (cross-sectional) attitude networks and how to compute common network descriptives on estimated attitude networks. We also show how one can simulate from an estimated attitude network to derive formalized hypotheses regarding attitude dynamics and attitude change.

# Psychopathology networks replicate with stunning precision

By Denny Borsboom, Eiko Fried, Sacha Epskamp, Lourens Waldorp, Claudia van Borkulo, Han van der Maas, and Angélique Cramer

Update: The pre-print of our official commentary is now online at PsyArXiv.

In a forthcoming paper in the Journal of Abnormal Psychology entitled “Evidence that psychopathology networks do not replicate”, Miriam Forbes, Aidan Wright, Kristian Markon, and Robert Krueger purport to show that network structures do not replicate across datasets. They estimate networks for symptoms of Major Depressive Episode (MDE) and Generalized Anxiety Disorder (GAD) in two large datasets – one from the National Comorbidity Survey-Replication (NCS) and one from the Australian National Survey of Mental Health and Well-Being (NSMHWB). As is evident from our published work (Fried & Cramer, in press; Fried, van Borkulo, Cramer, Boschloo, Schoevers, & Borsboom, 2017; Epskamp, Borsboom, & Fried, 2017) we see the reproducibility of network research as a top priority, and are happy to see that researchers are investigating this important issue.

The conclusion proposed by the authors is that “network analysis […] had poor replicability between […] samples”. This conclusion is not supported by the data for state-of-the-art network models, as we will argue in a commentary solicited by the Journal of Abnormal Psychology. Given the intense interest in the matter, however, we deemed it useful to post a short blog post in advance to state a fact that may not be obvious to most readers given the rhetorical style of Forbes et al. (in press). In one sentence: state-of-the-art networks don’t just replicate – they replicate with stunning precision.

Unfortunately, the authors of the paper did not share their data or code with us yet, so we cannot fully evaluate their work, but they did share the parameter matrices that they got out of the analysis, and these are sufficient to establish that the authors’ conclusion does not apply to state-of-the-art network modeling techniques (e.g., networks estimated using our R-package IsingFit; Van Borkulo et al., 2014). The authors of the paper suggest as much when they say that “[t]he replicability of the edges in the Ising models was remarkably similar between and within samples”, but this conclusion is easily lost in the rhetoric of the paper’s title, abstract, and discussion. In addition, the authors insufficiently articulate just how similar the networks are; as a result, users of our techniques may be wondering whether Ising networks really live up to their reputation as highly stable and secure network estimation techniques.

Figure 1. The networks estimated from the NCS and NSMHWB samples, and the scatterplot of network parameters as estimated in both samples (r=.95). The network representations use the default settings of qgraph (Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012) and a common (average) layout to optimize the comparison between datasets.

So just how replicable are Ising networks? Figure 1 pictures the situation quite clearly: the IsingFit networks are almost indistinguishable, and the network parameters display a whopping correlation of .95 for network edges and .93 for node thresholds across samples (Spearman correlations equal .88 and .85, respectively). Even centrality indices, which we usually approach with considerable caution due to their sensitivity to sampling variation (Epskamp, Borsboom, & Fried, 2017), show surprisingly good replication performance with correlations of .94 (strength), .94 (betweenness), and .76 (closeness).
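The comparison above amounts to correlating the unique edge weights of two estimated networks. A minimal Python sketch of that computation follows; the two weight matrices are made up for illustration (they are not the NCS and NSMHWB estimates), and the helper functions are hypothetical names written for this example.

```python
import math


def upper_triangle(weights):
    """Unique edge weights of a symmetric matrix (entries with i < j)."""
    n = len(weights)
    return [weights[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical edge-weight matrices for the same four nodes in two samples.
net_a = [
    [0.0, 0.8, 0.0, 0.3],
    [0.8, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.1],
    [0.3, 0.0, 1.1, 0.0],
]
net_b = [
    [0.0, 0.7, 0.1, 0.2],
    [0.7, 0.0, 0.6, 0.0],
    [0.1, 0.6, 0.0, 1.0],
    [0.2, 0.0, 1.0, 0.0],
]

edges_a = upper_triangle(net_a)
edges_b = upper_triangle(net_b)
print("Pearson:", pearson(edges_a, edges_b))
print("Spearman:", spearman(edges_a, edges_b))
```

Reporting both coefficients is useful because regularized networks contain many edges that are exactly zero; those ties pull the rank-based Spearman correlation down more than the Pearson correlation.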

Nobody in our group had in fact expected such an accurate replication across two entirely distinct samples. As such, we argue that the authors’ conclusion that “the unique utility of network analysis …seems limited to visualizing complex multivariate relationships…” is unwarranted. Given our re-analysis of the results of the Forbes et al. (in press) paper one will wonder: how on earth can the authors of the paper interpret this result as “evidence that psychopathology networks do not replicate”? Well, if you want to find that out, keep an eye out for the upcoming issue of the Journal of Abnormal Psychology, in which we will provide a comprehensive dissection of their methodology and argumentation. We’ll keep you posted!

References

Epskamp, S., Borsboom, D. & Fried, E.I. (2017). Estimating psychological networks and their accuracy: a tutorial paper. Behavior Research Methods. doi:10.3758/s13428-017-0862-1

Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48, 1-18.

Forbes, Wright, Markon, and Krueger (in press). Evidence that psychopathology symptom networks do not replicate. Journal of Abnormal Psychology.

Fried, E. I. & Cramer, A. O. J. (in press). Moving forward: challenges and directions for psychopathological network theory and methodology. Perspectives on Psychological Science.

Fried, E. I.*, van Borkulo, C. D.*, Cramer, A. O. J., Boschloo, L., Schoevers, R. A., & Borsboom, D. (2017). Mental disorders as networks of problems: a review of recent insights. Social Psychiatry and Psychiatric Epidemiology, 52, 1-10.

Van Borkulo, C.D., Borsboom, D., Epskamp, S., Blanken, T.F., Boschloo, L., Schoevers, R.A. & Waldorp, L.J. (2014). A new method for constructing networks from binary data. Scientific Reports, 4: 5918. doi: 10.1038/srep05918

# How theoretically very distinct mechanisms can generate identical observations

This post was written by Joost Kruis (j.kruis@uva.nl) and summarizes the paper entitled “Three representations of the Ising model” published recently in Scientific Reports. Joost works in the Psychological Methods Department at the University of Amsterdam.

Network models have gained increasing popularity in the social sciences. They formalize the increasingly common practice of explaining associations observed between measured variables as a consequence of mutualistic relations between these variables themselves.

Examining the structure of observed associations between measured variables is an integral part of many branches of science. At face value, associations inform us about a possible relation between two variables, yet contain no information about the nature and direction of these relations. This is captured in the (infamous) phrase about the quantity measuring the extent of the interdependence of variable quantities: correlation does not imply causation. Making causal inferences from associations requires the specification of a mechanism that explains the emergence of the associations.

In our paper we discuss three such theoretically very distinct mechanisms and their prototypical statistical models. Represented within the context of depression, the three mechanisms are as follows.

The first mechanism represents the (until recently most dominant) perspective on psychopathology where a mental disorder is viewed as the common cause of its symptoms. The common cause mechanism is statistically represented by the latent variable model, and explains the emergence of observed associations through an unobserved variable (depression) acting as a common cause with respect to the observed variables (sleep disturbances, loss of energy, concentration problems). The manifest variables (symptoms) are thus independent indicators of the latent variable (mental disorder) and reflect its current state.

The network perspective on psychopathology is captured by what we term in our paper the reciprocal effect mechanism. In this framework the associations between observed variables are explained as a consequence of mutualistic relations between these variables. The unobservable variable depression does not exist here; it is merely a word used to describe particular collective states of a set of interacting features.

The third, common effect, mechanism explains associations between observed variables as arising from (unknowingly) conditioning on a common effect of these variables, and is statistically represented by a collider model. In this framework the observed variables act as a collective cause towards an effect. An example of this is receiving a depression diagnosis (effect) as a consequence of the occurrence of multiple symptoms (causes) that are linked by the DSM to the term depression.

Each of these mechanisms proposes a radically different explanation for the emergence of associations between a set of manifest variables, yet we demonstrate in the paper that their associated statistical models for binary data are mathematically equivalent. It follows that each of these three mechanisms is capable of generating the exact same observations, and hence that any set of associations between variables that is sufficiently described by a statistical model in one framework can be explained as emerging from the mechanism represented by any of the three theoretical frameworks.
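A small numerical check can make this equivalence tangible for two of the three representations. The Python sketch below takes a toy Ising model whose pairwise interactions have the rank-one form a_i * a_j, and compares its state probabilities with those of a latent variable model in which the items are independent given a single latent variable theta. The link between the two is the Gaussian integral identity exp(s^2 / 2) = (1 / sqrt(2 * pi)) * integral of exp(s * theta - theta^2 / 2) over theta. The parameter values, the rank-one restriction, and all names are assumptions chosen for illustration; this is one well-known route to the equivalence and not necessarily the exact construction in the paper.

```python
import itertools
import math

# Hypothetical parameters for three binary variables coded -1 / +1.
a = [0.5, 0.8, 0.3]      # "loadings"; the interaction between i and j is a[i] * a[j]
tau = [0.2, -0.4, 0.1]   # main effects (thresholds)


def ising_weight(x):
    """Unnormalized Ising probability of state x (network representation)."""
    s = sum(ai * xi for ai, xi in zip(a, x))
    main = sum(ti * xi for ti, xi in zip(tau, x))
    # 0.5 * s**2 equals the sum over pairs of a[i]*a[j]*x[i]*x[j] plus a constant,
    # because x[i]**2 is always 1.
    return math.exp(main + 0.5 * s * s)


def latent_weight(x, lo=-12.0, hi=12.0, steps=24000):
    """Unnormalized marginal of x after integrating out the latent variable.

    For a fixed theta the integrand is a product over items, so the items
    are conditionally independent given theta (latent variable representation).
    The integral is approximated with the trapezoidal rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        theta = lo + k * h
        log_f = -0.5 * theta * theta
        log_f += sum(xi * (ti + ai * theta) for xi, ti, ai in zip(x, tau, a))
        edge = 0.5 if k in (0, steps) else 1.0
        total += edge * math.exp(log_f)
    return total * h


states = list(itertools.product([-1, 1], repeat=len(a)))

ising_raw = [ising_weight(x) for x in states]
latent_raw = [latent_weight(x) for x in states]

ising_probs = [w / sum(ising_raw) for w in ising_raw]
latent_probs = [w / sum(latent_raw) for w in latent_raw]

max_diff = max(abs(p - q) for p, q in zip(ising_probs, latent_probs))
print("largest difference between the two distributions:", max_diff)
```

The two sets of probabilities agree up to numerical integration error, so data generated by this network cannot be told apart from data generated by this latent variable model.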

Having multiple possible interpretations for the same model allows for more plausible explanations when it comes to the theoretical concepts and the causal inferences we obtain from the measurement model applied to our data. Furthermore, the historical success of theoretically very implausible models, such as the latent variable model, can in retrospect arguably be explained by the equivalence of these three models.

However, it also means that obtaining a sufficient fit for the statistical models in one of these frameworks is by no means evidence that it is the mechanism from this framework that actually generated the observations. That is, there will always exist representations from the other mechanisms that can explain our observations equally well.

We should thus not only apply a network model to our data because it gives us a pretty picture (which it does), but because we believe that the associations between the variables we have measured are explained as a consequence of mutualistic relations between these variables themselves.

Abstract

Statistical models that analyse (pairwise) relations between variables encompass assumptions about the underlying mechanism that generated the associations in the observed data. In the present paper we demonstrate that three Ising model representations exist that, although each proposes a distinct theoretical explanation for the observed associations, are mathematically equivalent. This equivalence allows the researcher to interpret the results of one model in three different ways. We illustrate the ramifications of this by discussing concepts that are conceived as problematic in their traditional explanation, yet when interpreted in the context of another explanation make immediate sense.
—Kruis, J. and Maris, G. Three representations of the Ising model. Sci. Rep. 6, 34175; doi: 10.1038/srep34175 (2016).

EDIT January 27 2017: Denny Borsboom published a blog post entitled “The meaning of model equivalence: Network models, latent variables, and the theoretical space in between” that puts this post here and the paper by Kruis & Maris into a broader perspective.

# How well do Network Models Predict Future Observations? Blogpost on Predictability in Network Models

Estimated network and predictability measures (blue pie charts) for the data in McNally et al. (2014)

Predictability of nodes in network models offers an additional perspective on symptom networks, which relates to the practical relevance of edges, the selection of optimal treatment, and the degree to which (parts of) the network are self-determined or determined by factors outside of the network.

For a brief introduction and fully reproducible example of how to compute and visualize predictability in R, have a look at this blog post. The preprint of the paper can be found here.
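As a rough illustration of what predictability can mean, the Python sketch below computes, for Gaussian variables, the proportion of variance in each node that is explained by all other nodes. It relies on the standard result that this proportion equals 1 - 1 / (S_ii * K_ii), where S is the covariance matrix and K its inverse. The covariance matrix and the function names are hypothetical, and the linked blog post and paper may use different measures and software; treat this as a sketch under those assumptions.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def r_squared(cov):
    """Variance in each node explained by all other nodes (Gaussian case)."""
    precision = invert(cov)
    return [1.0 - 1.0 / (cov[i][i] * precision[i][i]) for i in range(len(cov))]


# Hypothetical correlation matrix for three symptoms.
cov = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
predictability = r_squared(cov)
for node, value in enumerate(predictability):
    print("node", node, "explained variance:", round(value, 3))
```

A node with a value close to 1 is almost fully determined by its neighbors in the network, whereas a value close to 0 points to influences from outside the network.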

# Psychosystems Satellite Symposium at Conference on Complex Systems (CCS2016)

The Psychosystems group is hosting a satellite symposium at the Conference on Complex Systems (CCS2016) on 21 September, at the Beurs van Berlage in Amsterdam.

Title: Complexity in personalised dynamical networks for mental health

Summary: Psychopathology is recognized as a phenomenon that is important but equally difficult to understand. For instance, depression is often considered as some unknown (physical) system causing symptoms like loss of interest and insomnia. It is however difficult to determine what exactly the system is or how it influences the symptoms we can observe. Recently, a change in view has been proposed to consider symptoms and their interactions as the building blocks of a complex system in an effort to better understand the intricacies of mental health and pathologies. Such systems are inherently complex in the sense that the interactions between a plethora of processes and symptoms can result in bistable behaviour, which may explain, for instance, sudden transitions from healthy to depressed moods. The dynamical processes on such networks are now the main focus of investigation that may lead to better understanding and possibly prevention of pathologies. One of the most promising ways to investigate such complex systems is by monitoring single subjects for some period of time. This is often done by what is called experience sampling, where information through smart phones is obtained several times a day, usually at random intervals. The models and methods for the analyses of such time series are far from trivial. Borrowing from statistical physics, the Ising model, for instance, has proven useful in determining key differences between depressed and remittent patients.