library(igraph)
library(fastRG)

fastRG quickly samples a broad class of network models known as generalized random product graphs. In particular, for matrices \(X\), \(S\) and \(Y\), fastRG samples a matrix \(A\) with expectation \(X S Y^T\) where individual entries are Poisson distributed. We recommend that you think of \(A\) as the adjacency matrix for a graph (or a multi-graph). Crucially, the sampling is \(\mathcal O(m)\), where \(m\) is the number of the edges in graph. Other algorithms are \(\mathcal O(n^2)\), where \(n\) is the number of nodes in the network. For additional details, see the paper.

Example Usage

The easiest way to use fastRG is to use wrapper functions that sample from popular graph models. For example, to sample from an Erdos-Renyi graph n = 1,000,000 nodes and expected degree of five, we can use the erdos_renyi() function.

A <- erdos_renyi(n = 10^6, avg_deg = 5)

Return format

By default fastRG() returns a Matrix object stored in CSC, which may be binary or integer, and possibly symmetric. We can also ask for the graph as an edgelist. This results in a fast way to create igraph objects using igraph::graph_from_edgelist().

fastRG() allows for simulating from graph with E(A) = X S Y^T, where A and S could be rectangular. This is helpful for bipartite graphs or matrices of graph features.

n <- 10000
d <- 1000
k1 <- 5
k2 <- 3

X <- matrix(rpois(n = n * k1, 1), nrow = n)
Y <- matrix(rpois(n = d * k2, 1), nrow = d)
S <- matrix(runif(n = k1 * k2, 0, 0.1), nrow = k1)

A <- fastRG(X, S, Y, avg_deg = 10)

Key sampling options

  • avg_deg: Set the average degree of the graph by scaling sampling probabilities. We strongly, strongly recommend that you always set this option. If you do not, it is easy accidentally sample from large and dense graphs.

  • directed: Either TRUE or FALSE depending on whether you would like a direct or undirected graph.

  • poisson_edges: Either TRUE or FALSE depending on whether you would like a Bernoulli graph or a Poisson multi-graph. Scaling via avg_deg always assumes a Poisson multi-graph.

  • allow_self_edges: Whether nodes should be allowed to connect to themselves. Either TRUE or FALSE.

Some blockmodel examples

WARNING: content below here is rough!

K <- 10
n <- 500
pi <- rexp(K) + 1
pi <- pi / sum(pi) * 3
B <- matrix(rexp(K^2) + 1, nrow = K)
diag(B) <- diag(B) + mean(B) * K
theta <- rexp(n)

A <- dcsbm(theta, pi, B, avg_deg = 50)
mean(rowSums(A))  # average degree
#> [1] 49.852

If we remove multiple edges, the avg_deg parameter is not trustworthy:

A <- dcsbm(theta, pi, B, avg_deg = 50, poisson_edges = FALSE)
mean(rowSums(A))
#> [1] 37.93

but it is a good upper bound when the graph is sparse:

n <- 10000
A <- dcsbm(rexp(n), pi, B, avg_deg = 50, poisson_edges = FALSE)
mean(rowSums(A))
#> [1] 48.8003

This draws a 100 x 100 adjacency matrix from each model. Each image might take around 5 seconds to render.

K <- 10
n <- 100
pi <- rexp(K) + 1
pi <- pi / sum(pi) * 3
B <- matrix(rexp(K^2) + 1, nrow = K)
diag(B) <- diag(B) + mean(B) * K
theta <- rexp(n)
A <- dcsbm(theta, pi, B, avg_deg = 50)

plot_matrix(A[, n:1])

K <- 2
n <- 100
alpha <- c(1, 1) / 5
B <- diag(c(1, 1))
theta <- n
A <- dc_mixed(theta, alpha, B, avg_deg = 50, sort_nodes = TRUE)

plot_matrix(A[, theta:1] / max(A))

n <- 100
K <- 2
pi <- c(.7, .7)
B <- diag(c(1, 1))
theta <- n
A <- dc_overlapping(theta, pi, B, avg_deg = 50, sort_nodes = TRUE)

plot_matrix(t(A[, 1:n] / max(A)))

K <- 10
n <- 100
pi <- rexp(K) + 1
pi <- pi / sum(pi)
B <- matrix(rexp(K^2), nrow = K)
B <- B / (3 * max(B))
diag(B) <- diag(B) + mean(B) * K
A <- fastRG::sbm(n, pi, B, sort_nodes = TRUE)

plot_matrix(A)

Next sample a DC-SBM with 10,000 nodes. Then compute and plot the leading eigenspace.

library(RSpectra)

K <- 10
n <- 10000
pi <- rexp(K) + 1
pi <- pi / sum(pi)
pi <- -sort(-pi)
B <- matrix(rexp(K^2) + 1, nrow = K)
diag(B) <- diag(B) + mean(B) * K
A <- dcsbm(rgamma(n, shape = 2, scale = .4), pi, B, avg_deg = 20, simple = TRUE)
mean(rowSums(A))
#> [1] 19.8851
K <- 10
n <- 10000
pi <- rexp(K) + 1
pi <- pi / sum(pi)
pi <- -sort(-pi)
B <- matrix(rexp(K^2) + 1, nrow = K)
diag(B) <- diag(B) + mean(B) * K
A <- dcsbm(rgamma(n, shape = 2, scale = .4), pi, B, avg_deg = 20, simple = TRUE, sort_nodes = TRUE)
mean(rowSums(A))
#> [1] 19.8144

or

s <- sort(sample(n, 10000))
X <- t(apply(ei$vectors[, 1:K], 1, function(x) return(x / sqrt(sum(x^2) + 1 / n))))
plot(X[s, 3]) # highly localized eigenvectors

To sample from a degree corrected and node contextualized graph…

# Here are the parameters for the features:

thetaY <- rexp(d)
piFeatures <- rexp(K) + 1
piFeatures <- piFeatures / sum(piFeatures) * 3
BFeatures <- matrix(rexp(K^2) + 1, nrow = K)
diag(BFeatures) <- diag(BFeatures) + mean(BFeatures) * K

paraFeat <- dcsbm_params(theta = thetaY, pi = piFeatures, B = BFeatures)
# the node "degrees" in the features, should be related to their degrees in the graph.
X <- paraG$X
X@x <- paraG$X@x + rexp(n) # the degree parameter should be different. X@x + rexp(n) makes feature degrees correlated to degrees in graph.

# generate the graph and features
A <- fastRG(paraG$X, paraG$S, avg_deg = 10)
features <- fastRG(X, paraFeat$S, paraFeat$X, avg_deg = 20)