Estimate graph dimension via eigenvalue cross-validation (EigCV).
A graph has dimension k if the first k eigenvectors of its adjacency matrix are correlated with its population eigenspace, and the others are not.
Edge bootstrapping sub-samples the edges of the graph (without replacement).
Edge splitting separates the edges into a training part and a testing part.
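The edge-splitting step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `split_edges()` is a hypothetical helper, and it assumes each unit of an edge count is sent to the test graph independently with probability `test_portion`. It treats every matrix entry separately, so for an undirected graph one would split the upper triangle and symmetrize.

```r
# Hypothetical helper (not part of the package): split the edges of a
# non-negative, integer-valued adjacency matrix into training and test parts.
split_edges <- function(A, test_portion = 0.1) {
  stopifnot(
    all(A >= 0),
    all(A == round(A)),
    test_portion > 0,
    test_portion < 1
  )
  # Assumption: each unit of the count A[i, j] goes to the test graph
  # independently with probability test_portion (binomial thinning).
  test <- matrix(
    rbinom(length(A), size = as.vector(A), prob = test_portion),
    nrow = nrow(A),
    ncol = ncol(A)
  )
  # Whatever is not held out for testing stays in the training graph.
  list(train = A - test, test = test)
}

set.seed(1)
A_small <- matrix(rpois(25, lambda = 3), 5, 5)  # toy multigraph, made-up data
parts <- split_edges(A_small, test_portion = 0.1)
```

By construction `parts$train + parts$test` recovers `A_small`, so no edge is lost or duplicated by the split.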
Usage
eigcv(
A,
k_max,
...,
num_bootstraps = 10,
test_portion = 0.1,
alpha = 0.05,
method = c("none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr"),
laplacian = FALSE,
regularize = TRUE
)
Arguments
- A
The adjacency matrix of the graph. Must be non-negative and integer valued.
- k_max
The maximum dimension of the graph to consider. This many eigenvectors are computed. Should be a non-negative integer that is small relative to the dimensions of A.
- ...
Ignored.
- num_bootstraps
The number of times to bootstrap the graph. Since cross-validated eigenvalues are based on a random graph split, they are themselves random. Repeatedly computing cross-validated eigenvalues for different sample splits smooths away some of the randomness due to the graph splits. A small number of bootstraps (3 to 10) usually suffices. Defaults to 10. Test statistics (i.e. z-scores for cross-validated eigenvalues) are averaged across bootstraps, and the p-values are calculated from the averaged statistics.
- test_portion
The portion of the graph to put into the test graph, as opposed to the training graph. Defaults to 0.1. Must be strictly between zero and one.
- alpha
Significance level for hypothesis tests. Each dimension 1, ..., k_max is tested when estimating graph dimension, and the overall graph dimension is taken to be the largest k such that the tests for dimensions 1, ..., k all reject.
- method
Method to adjust p-values for multiple testing. Must be one of "none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", or "fdr". Passed to stats::p.adjust(). Defaults to "none".
- laplacian
Logical value indicating whether to compute cross-validated eigenvalues for the degree-normalized graph Laplacian rather than the graph adjacency matrix. Experimental and should be used with caution. Defaults to FALSE.
- regularize
Only applicable when laplacian == TRUE, in which case this parameter controls whether or not the degree-normalized graph Laplacian is regularized. Defaults to TRUE.
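The way num_bootstraps, alpha and method interact can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the z-scores are made up, the p-values are assumed to be one-sided upper-tail normal probabilities, and the estimate is assumed to be the number of leading dimensions whose adjusted tests all reject. Only `stats::p.adjust()` is named in the documentation above.

```r
# Made-up z-scores: rows are bootstrap splits, columns are k = 1, ..., 6.
z_boot <- rbind(
  c(12.4, 9.1, 4.3, -0.6, 2.7, -1.0),
  c(11.6, 9.9, 3.9, -1.0, 2.3, -1.4)
)
alpha <- 0.05

# Average the test statistics across bootstraps first ...
z <- colMeans(z_boot)

# ... then compute p-values from the averaged statistics
# (assumption: one-sided test, large positive z is evidence for dimension k).
pvals <- pnorm(z, lower.tail = FALSE)

# Adjust for multiple testing; "none" would leave pvals unchanged.
padj <- p.adjust(pvals, method = "holm")

# Assumed stopping rule: count leading dimensions until the first test
# that fails to reject. Later rejections (here k = 5) are not counted.
not_rejected <- which(padj > alpha)
k_hat <- if (length(not_rejected) > 0) min(not_rejected) - 1 else length(z)
k_hat
```

In this toy input the test for k = 4 fails to reject, so the sketch stops at 3 even though the averaged statistic for k = 5 is individually significant.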
Value
An eigcv object, which is a list with the following named elements.
- estimated_dimension: inferred graph dimension.
- summary: summary table of the tests.
- num_bootstraps: number of bootstraps performed.
- test_portion: graph splitting probability used.
- alpha: significance level of each test.
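Because the result is a plain list, its elements can be read with `$`. The sketch below uses a hypothetical stand-in object with made-up values so that it runs without the package. It assumes the summary element is a data frame with columns named k, z, pvals and padj, as the printed output suggests.

```r
# Hypothetical stand-in for a returned object; all values are made up.
res <- list(
  estimated_dimension = 2,
  summary = data.frame(k = 1:4, z = c(15.2, 6.3, -0.4, -1.1)),
  num_bootstraps = 10,
  test_portion = 0.1,
  alpha = 0.05
)
# Fill in the remaining (assumed) columns of the summary table.
res$summary$pvals <- pnorm(res$summary$z, lower.tail = FALSE)
res$summary$padj <- p.adjust(res$summary$pvals, method = "none")

# Rows of the summary table whose tests reject at the stored level.
significant <- res$summary[res$summary$padj <= res$alpha, ]
significant$k
```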
Examples
library(fastRG)
set.seed(27)
B <- matrix(0.1, 5, 5)
diag(B) <- 0.3
model <- sbm(
n = 1000,
k = 5,
B = B,
expected_degree = 40,
poisson_edges = FALSE,
allow_self_loops = FALSE
)
A <- sample_sparse(model)
eigs <- eigcv(A, k_max = 10)
#> 'as(<dsCMatrix>, "dgCMatrix")' is deprecated.
#> Use 'as(., "generalMatrix")' instead.
#> See help("Deprecated") and help("Matrix-deprecated").
eigs
#> Estimated graph dimension: 5
#>
#> Number of bootstraps: 10
#> Edge splitting probabaility: 0.1
#> Significance level: 0.05
#>
#> ------------ Summary of Tests ------------
#>   k          z        pvals         padj
#>   1 60.0858888 0.000000e+00 0.000000e+00
#>   2 11.7538714 3.372802e-32 3.372802e-32
#>   3 11.1552401 3.375515e-29 3.375515e-29
#>   4 11.3242906 4.974047e-30 4.974047e-30
#>   5  9.5379830 7.281856e-22 7.281856e-22
#>   6 -1.1633387 8.776540e-01 8.776540e-01
#>   7 -1.2996582 9.031409e-01 9.031409e-01
#>   8 -1.1750915 8.800209e-01 8.800209e-01
#>   9 -1.1354378 8.719040e-01 8.719040e-01
#>  10 -0.8694766 8.077067e-01 8.077067e-01
#>
plot(eigs, type = "z-score") # default
plot(eigs, type = "adjacency")
plot(eigs, type = "laplacian")