Network Model Selection Using qgraph 1.3

In this post I will describe some of the new functionality in the 1.3 version of qgraph. The qgraph package is aimed at visualizing correlational structures and, more recently, at visualizing the results of network analysis applied to, in particular but not limited to, psychology. The application of network analysis to psychology has been the core interest of the Psychosystems research group in which the qgraph package has been developed. In version 1.2.4 and 1.2.5 new functions were introduced to compute graph descriptives of weighted graphs and in version 1.3, the visualization techniques are extended with model selection tools for estimating the graph structure. As such, qgraph is no longer just a visualization tool but a full suite of methods for applying network analysis from data to estimating a graph structure and assessing graph descriptives. Version 1.3 is on CRAN now and will be updated on mirrors shortly. To install and check if you have version 1.3 run:

install.packages("qgraph")
packageDescription("qgraph")$Version
Alternatively, you can install the developmental version from GitHub:
library("devtools")
install_github("sachaepskamp/qgraph")

Network models

In qgraph generally two types of network models can be estimated for multivariate Gaussian data: the correlation network and the partial correlation network. In graph literature a correlation network is typically denoted with bidirectional edges whereas a partial correlation network is typically denoted with undirected edges:   In qgraph, bidirectional edges can be plotted but are omitted by default to make the graph clearer. Both the correlation network and the partial correlation network are saturated models if fully connected and degrees of freedom can be obtained by removing edges. Thus, a network model simply means a correlation or partial correlation network in which some edges are set to be equal to zero; model selection means that we attempt to identify which edges should be set to zero. This can be done broadly in three ways:
  • Thresholding: simply remove all edges that are weaker than some threshold, such as all edges that are not significantly different from zero.
  • Model search: Continuously estimate models in which some edges are set to zero and stepwise comparisons or an information criterium to decide which model is optimal
  • LASSO estimation: jointly estimate parameter values and the absence of edges using regularization techniques (more on this below).

Correlation networks

A correlation network is a model for marginal independence relationships; two unconnected nodes imply a marginal independence between two nodes. For instance, the graph above implies the following independence relationship: \[ A \!\perp\!\!\!\perp C. \] We can use the schur complement to see the relationship between \(A\) and \(C\) after conditioning on \(B\): \[ \mathrm{Cov}\left( A, C \mid B \right) = \mathrm{Cov}\left( A, C \right) - \mathrm{Cov}\left( A, B \right) \mathrm{Var}\left( B \right)^{-1} \mathrm{Cov}\left(B, C \right) \not= 0, \] Since both covariances with \(B\) are nonzero and the inverse of a variance is always nonzero for any random variable. Thus, the correlation network above does not only imply a marginal independence, it also implies a conditional dependency relationship: \[ A \not\!\perp\!\!\!\perp C \mid B. \] Such an induced dependency can only occur in very specific causal structures that might not correspond to our general notion of nodes interacting with each other. In general, we expect that if \(A\) and \(B\) are correlated and \(B\) and \(C\) are correlated than \(A\) and \(C\) are also correlated. In general, we would expect correlation networks to often be saturated, especially when constructing networks based on psychometric tests that are designed to contain correlated items. As such, we generally do not apply model selection to correlation networks. Even though correlation networks are not that interesting from a modeling perspective they are very interesting in the information they portray about the structure of a dataset. Epskamp et al. (2012) described how qgraph and a correlation network of the Big 5 can be used from a psychometric point of view to simply visualize the correlational patterns in the dataset; factors should emerge as clusters of nodes. Cramer et al. (2012) interpreted the same Big 5 network in the context of the network perspective of psychology. Furthermore, correlation networks have also been described and used by Cramer, Borsboom, Aggen and Kendler (2011), Schmittmann et al. (2013) and Ruzzano, Borsboom and Geurts (2014).

Partial correlation network

The partial correlation network is a model for conditional independence relationships. In the above undirected network there is no edge between nodes \(A\) and \(C\), which implies that: \[ A \!\perp\!\!\!\perp C \mid B. \] Thus, the partial covariance between \(A\) and \(C\) after conditioning on all other nodes in the network equals zero. Now, by definition, \(\mathrm{Cov}\left( A, C \mid B \right)\) is equal to zero and thus: \[ \mathrm{Cov}\left( A, C \right) = \mathrm{Cov}\left( A, B \right) \mathrm{Var}\left( B \right)^{-1} \mathrm{Cov}\left(B, C \right) \not= 0, \] which implies the following marginal relationship: \[ A \not\!\perp\!\!\!\perp C. \] Therefore, the partial correlation network implies the opposite kind of independence relationships between unconnected nodes; unconnected nodes are conditionally independent but marginally still allowed to correlate. The edges in the network can be understood as pairwise interactions between two nodes This allows us to model interaction effects between pairs of nodes while still allowing all nodes to correlate. Such a network of pairwise interactions is termed a pairwise Markov Random Field (Lauritzen, 1996). If the data is assumed multivariate normally distributed then the model is called a Gaussian Graphical Model, Gaussian Random field, concentration network or partial correlation network; the network structure is then directly related to the inverse of the covariance matrix with zeroes indicating missing edges. We will use the term partial correlation network to describe this class of networks. In a partial correlation network, the strength of edges—the partial correlation between two nodes after conditioning on all other nodes—is directly equivalent to the predictive quality between two nodes that would be obtained in multiple regression. Thus, the strength of an edge can be seen as how well two nodes predict each other; in the above network knowing \(A\) tells you something about \(B\) and in turn knowing \(B\) tells you something about \(C\). The network can also be seen as large scale mediation analysis, as in the above network the predictive quality of \(A\) on \(C\) is mediated by \(B\).

Network estimation using qgraph

We will analyze the BFI dataset from the psych package, of which the correlation network had previously been described on Joel Cadwell's blog. This dataset contains 25 items on the five central personality traits: openness to Experience, conscientiousness, extroversion, agreeableness and neuroticism. For each trait there are five indicators, such as “make friends easily” for extroversion and “make people feel at ease” for agreeableness. We can load the data into R as follows:
library("psych")
data(bfi)
We are only interested in the interactions between items of the big five so we only subset the 25 indicators:
bfiSub <- bfi[,1:25]
These responses are measured on a five point scale, and thus the assumption of normality is violated by default. As a solution, we compute not the Pearson correlations but the polychoric correlations instead. To do this, we can use the cor_auto() function in qgraph that automatically detects if data looks ordinal and runs the lavCor() function from the lavaan package to compute Pearson, polychoric or polyserial correlations where needed:
library("qgraph")
bfiCors <- cor_auto(bfiSub)
## Variables detected as ordinal: A1; A2; A3; A4; A5; C1; C2; C3; C4; C5; E1; E2; E3; E4; E5; N1; N2; N3; N4; N5; O1; O2; O3; O4; O5
To help plot our graph we can also load more information on the meaning of each node:
Names <- scan("http://sachaepskamp.com/files/BFIitems.txt",what = "character", sep = "\n")
Groups <- rep(c('A','C','E','N','O'),each=5)
Now we can plot the (saturated) correlation network. To do so, we can simply send the correlation matrix to the qgraph() function and use some more arguments to make the plot look good. In addition, we can set graph = "cor" to tell qgraph that the input is, in fact, a correlation matrix (this doesn't do much other than check if the matrix is positive definite):
corGraph <- qgraph(bfiCors, layout = "spring", graph = "cor",
       nodeNames = Names, groups = Groups, legend.cex = 0.3,
      cut = 0.3, maximum = 1, minimum = 0, esize = 20,
      vsize = 5, repulsion = 0.8)
In this network each node represents one of the 25 items in the BFI dataset and edges represent the strength of correlation between them; green edges represent positive correlations, red edges negative correlations and the wider and more saturated an edge the stronger the correlation. The argument layout = "spring" forced qgraph to use a weighted version of the Fruchterman and Reingold (1991) algorithm to place strongly correlated nodes together. The nodeNames, groups and legend.cex arguments controlled the coloring and the legend placed next to the graph, the vsize argument the size of the nodes and the repulsion argument the repulsion ratio used in the Fruchterman-Reingold algorithm. The minimum argument causes all edges with a value below the minimum to not be drawn (these edges are only made invisible but still taken into account when computing graph descriptives), the maximum argument sets an upper bound for the width of edges to scale to and the cut argument splits scaling in saturation and width; edges above the cut value are fully saturated and scale in width whereas edges below the cut value are small and scale in color. By default qgraph will choose these values to optimally plot a graph (see the manual for more details); to know which values qgraph used you could use the argument details = TRUE. The above correlation network overall nicely retrieves the five factor structure; as expected factors emerge as clusters of correlations. To compute the saturated partial correlation network we can set the argument graph = "pcor":
pcorGraph <- qgraph(bfiCors, layout = corGraph$layout, graph = "pcor",
      nodeNames = Names, groups = Groups, legend.cex = 0.3, 
      cut = 0.1, maximum = 1, minimum = 0, esize = 20,
      vsize = 5)
Now we set the layout argument to be equal to the layout of the correlation network and the cut value to be lower as partial correlations are generally weaker than correlations. We can see that the partial correlation network generally returns the same pattern of interactions but also that some edges—such as between N4 and N1—are much weaker in the partial correlation network than in the correlation network.

Thresholding

Both the networks shown above are fully saturated; even though some (partial) correlations are so weak they are practically invisible both networks are fully connected. The aim of model selection is to identify which connections are zero, as these values indicate (conditional) independence relationships. A straightforward way of doing this is by thresholding; setting all edges that are not significantly different from zero to be equal to zero. To use thresholding in qgraph we can use the threshold argument, which works different than minimum in that it removes edges from the network, influencing the network descriptives. The threshold argument can be set to both a value or a string indicating the type of adjustment for multiple tests to be used, which corresponds to the adjust argument of corr.test function in the psych package. In addition to the options psych provides the threshold argument can be set to locfdr to use the local FDR measure provided by the fdrtool package and used in the FDRnetwork function in qgraph. For instance, we can remove all non-significant edges using the Holm correction on the partial correlation network as follows:
holmGraph <- qgraph(bfiCors, layout = corGraph$layout, graph = "pcor",
      threshold = "holm", sampleSize = nrow(bfi),
      nodeNames = Names, groups = Groups, legend.cex = 0.3, 
      cut = 0.1, maximum = 1, minimum = 0, esize = 20,
      vsize = 5)
Which makes the network much cleaner and more interpretable.

Model search

While thresholding can be insightful it is a rather crude way of model selection; we only remove the non-significant edges and do not look at how other edge values change nor at how well the model without certain edges explains the data. To accomplish this we need model selection. We are particularly interested in the question: what model explains the data appropriately while being simple (having few edges). One way to accomplish this is by evaluating all possible models and comparing them on some information criterium—such as the BIC—that weights the likelihood against the sparsity. For a three-node network all possible models are the following:   The connections between models indicate nesting relationships; a model with only an edge between \(A\) and \(B\) is nested in the model with both edges between \(A\) and \(B\) and \(B\) and \(C\). The number of possible models is \(2^{p(p-1)/2}\) with \(p\) being the amount of nodes in the network. It will be clear that in the case of a three node network it is entirely tractable to estimate \(8\) models and compare the BIC values but for the case of \(25\) nodes it becomes much impossible to compare all possible models. Thus, a search is performed to navigate through the model space. We could start at the bottom with the independence model and stepwise move higher to more complicated models by only comparing directly nested models, or start at the top with the saturated model and stepwise move lower to simpler models. The first routine is called step-up estimation and the second routine step-down estimation. The findGraph function in qgraph can be used to search for an optimal (partial) correlation network. This function can be used to do brute search of all models, step-up comparison and step-down comparison. To compare models the Extended Bayesian Information Criterium (EBIC; Foygel & Drton, 2010) is used. By default, the findGraph function will start at the thresholded graph using Holm correction and iterate between step-up and step-down search until an optimal model is found:
optGraph <- findGraph(bfiCors, nrow(bfi), type = "pcor")
optimalGraph <- qgraph(optGraph, layout = corGraph$layout,
      nodeNames = Names, groups = Groups, legend.cex = 0.3, 
      cut = 0.1, maximum = 1, minimum = 0, esize = 20,
      vsize = 5)
 

LASSO estimation

Model selection can be a slow process as the amount of variables increases, and the stepwise process might get lost in the model space and not converge to the best model. For example, stepwise up and stepwise down methods often obtain a different model. A well established method for simultaneously estimating parameters and performing model selection on a partial correlation network is to employ the least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996) regularization technique. A particularly fast variant of the LASSO for these kinds of networks is the graphical LASSO which directly estimates the inverse of the covariance matrix (Friedman, Hastie and Tibshirani, 2008) and has been implemented in the R package glasso. In short, the graphical LASSO estimates not the maximum likelihood solution but a penalized maximum likelihood solution in which the likelihood is penalized for the sum of absolute parameter estimates. Because of this, the graphical LASSO estimates a model that still explains the data well with generally weaker parameter estimates and many parameter estimates of exactly zero. The amount of penalization is specified by a tuning parameter \(\lambda\). In qgraph, the EBICglasso function can be used to estimate a network using 100 different \(\lambda\) values and return of these the network that minimizes the extended Bayesian Information Criterium (EBIC; Chen, and Chen, 2008). Such a procedure has been established to accurately recover the network structure (Foygel & Drton, 2010; van Borkulo et al., 2014). To automatically have qgraph run EBICglasso and estimate the model we need only set graph = "glasso":
glassoGraph <- qgraph(bfiCors, layout = corGraph$layout, 
      graph = "glasso", sampleSize = nrow(bfi),
      nodeNames = Names, groups = Groups, legend.cex = 0.3, 
      cut = 0.1, maximum = 1, minimum = 0, esize = 20,
      vsize = 5)
This method runs extremely fast and returns a sparse and interpretable network structure. This network shows some interesting properties. For example, neurotocism consists of two distinct clusters, an anxiety and a depression cluster, connected by the item “N3: Have frequent mood swings”. The Conscientiousness item “C5: Waste my time” and the extroversion item “E3: Know how to captivate people” act as a bridging nodes between the different personality trait clusters. Finally, most strong connections make perfectly sense when interpreted as a pairwise interaction.

Graph descriptives using qgraph

Now that we have estimated the partial correlation network in different ways we can look at how these networks compare in terms of graph descriptives. We are particularly interested in node centrality and clustering. To compare the estimated networks, we can use the centralityPlot and clusteringPlot functions which use ggplot2 to visualize the estimates:
centralityPlot(
  list(
    saturated = pcorGraph, 
    threshold = holmGraph,
    modelSearch = optimalGraph,
    glasso = glassoGraph)
  )
We can similarly compare the clustering coefficients:
clusteringPlot(
  list(
    saturated = pcorGraph, 
    threshold = holmGraph,
    modelSearch = optimalGraph,
    glasso = glassoGraph)
  )
## Warning: Removed 25 rows containing missing values (geom_point).
We can see that even though the graphs all are highly similar they can differ on centrality and clustering coefficients. However, generally all graphs are somewhat aligned; nodes that are highly central are so across all graphs.

References