Skip to contents

Estimate graph dimension via eigenvalue cross-validation (EigCV). A graph has dimension k if the first k eigenvectors of its adjacency matrix are correlated with its population eigenspace, and the others are not. Edge bootstrapping sub-samples the edges of the graph (without replacement). Edge splitting separates the edges into a training part and a testing part.

Usage

eigcv(
  A,
  k_max,
  ...,
  num_bootstraps = 10,
  test_portion = 0.1,
  alpha = 0.05,
  method = c("none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr"),
  laplacian = FALSE,
  regularize = TRUE
)

Arguments

A

The adjacency matrix of graph. Must be non-negative and integer valued.

k_max

The maximum dimension of the graph to consider. This many eigenvectors are computed. Should be a non-negative integer smallish relative the dimensions of A.

...

Ignored.

num_bootstraps

The number of times to bootstrap the graph. Since cross-validated eigenvalues are based on a random graph split, they are themselves random. By repeatedly computing cross-validated eigenvalues for different sample splits, the idea is to smooth away some of the randomness due to the graph splits. A small number of bootstraps (3 to 10) usually suffices. Defaults to 10. Test statistics (i.e. z-scores for cv eigenvalues) are averaged across bootstraps and the p-values will be calculated based on the averaged statistics.

test_portion

The portion of the graph to put into the test graph, as opposed to the training graph. Defaults to 0.1. Must be strictly between zero and one.

alpha

Significance level for hypothesis tests. Each dimension 1, ..., k_max is tested when estimating graph dimension, and the overall graph dimension is taken to be the smallest number of dimensions such that all the tests reject.

method

Method to adjust p-values for multiple testing. Must be one of "none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", or "fdr". Passed to stats::p.adjust(). Defaults to "none".

laplacian

Logical value indicating where to compute cross-validated eigenvalues for the degree-normalize graph Laplacian rather than the graph adjacency matrix. Experimental and should be used with caution. Defaults to FALSE.

regularize

Only applicable when laplacian == TRUE, in which case this parameter controls whether or not the degree-normalized graph Laplacian is regularized. Defaults to TRUE.

Value

A eigcv object, which is a list with the following named elements.

  • estimated_dimension: inferred graph dimension.

  • summary: summary table of the tests.

  • num_bootstraps: number of bootstraps performed.

  • test_portion: graph splitting probability used.

  • alpha: significance level of each test.

Examples


library(fastRG)

set.seed(27)

B <- matrix(0.1, 5, 5)
diag(B) <- 0.3

model <- sbm(
  n = 1000,
  k = 5,
  B = B,
  expected_degree = 40,
  poisson_edges = FALSE,
  allow_self_loops = FALSE
)

A <- sample_sparse(model)

eigs<- eigcv(A, k_max = 10)
#> 'as(<dsCMatrix>, "dgCMatrix")' is deprecated.
#> Use 'as(., "generalMatrix")' instead.
#> See help("Deprecated") and help("Matrix-deprecated").
eigs
#> Estimated graph dimension:	 5
#> 
#> Number of bootstraps:		 10
#> Edge splitting probabaility:	 0.1
#> Significance level:		 0.05
#> 
#>  ------------ Summary of Tests ------------
#>   k          z        pvals         padj
#>   1 60.0858888 0.000000e+00 0.000000e+00
#>   2 11.7538714 3.372802e-32 3.372802e-32
#>   3 11.1552401 3.375515e-29 3.375515e-29
#>   4 11.3242906 4.974047e-30 4.974047e-30
#>   5  9.5379830 7.281856e-22 7.281856e-22
#>   6 -1.1633387 8.776540e-01 8.776540e-01
#>   7 -1.2996582 9.031409e-01 9.031409e-01
#>   8 -1.1750915 8.800209e-01 8.800209e-01
#>   9 -1.1354378 8.719040e-01 8.719040e-01
#>  10 -0.8694766 8.077067e-01 8.077067e-01
#> 

plot(eigs, type = "z-score")    # default

plot(eigs, type = "adjacency")

plot(eigs, type = "laplacian")