Create an undirected overlapping degree corrected stochastic blockmodel object
Source:R/undirected_overlapping_sbm.R
overlapping_sbm.Rd
To specify a overlapping stochastic blockmodel, you must specify
the number of nodes (via n
),
the mixing matrix (via k
or B
), and the block
probabilities (optional, via pi
). We provide defaults for most of these
options to enable rapid exploration, or you can invest the effort
for more control over the model parameters. We strongly recommend
setting the expected_degree
or expected_density
argument
to avoid large memory allocations associated with
sampling large, dense graphs.
Usage
overlapping_sbm(
n,
k = NULL,
B = NULL,
...,
pi = rep(1/k, k),
sort_nodes = TRUE,
force_pure = TRUE,
poisson_edges = TRUE,
allow_self_loops = TRUE
)
Arguments
- n
The number of nodes in the overlapping SBM.
- k
(mixing matrix) The number of blocks in the blockmodel. Use when you don't want to specify the mixing-matrix by hand. When
k
is specified,B
is set to a diagonal dominant matrix with value0.8
along the diagonal and0.1 / (k - 1)
on the off-diagonal.k
defaults toNULL
. You must specify eitherk
orB
, but not both.- B
(mixing matrix) A
k
byk
matrix of block connection probabilities. The probability that a node in blocki
connects to a node in communityj
isPoisson(B[i, j])
. Must be an invertible, symmetric square matrix.matrix
andMatrix
objects are both acceptable. IfB
is not symmetric, it will be symmetrized via the updateB := B + t(B)
. Defaults toNULL
. You must specify eitherk
orB
, but not both.- ...
Arguments passed on to
undirected_factor_model
expected_degree
If specified, the desired expected degree of the graph. Specifying
expected_degree
simply rescalesS
to achieve this. Defaults toNULL
. Do not specify bothexpected_degree
andexpected_density
at the same time.expected_density
If specified, the desired expected density of the graph. Specifying
expected_density
simply rescalesS
to achieve this. Defaults toNULL
. Do not specify bothexpected_degree
andexpected_density
at the same time.
- pi
(block probabilities) Probability of membership in each block. Membership in each block is independent under the overlapping SBM. Defaults to
rep(1 / k, k)
.- sort_nodes
Logical indicating whether or not to sort the nodes so that they are grouped by block. Useful for plotting. Defaults to
TRUE
.- force_pure
Logical indicating whether or not to force presence of "pure nodes" (nodes that belong only to a single community) for the sake of identifiability. To include pure nodes, block membership sampling first proceeds as per usual. Then, after it is complete,
k
nodes are chosen randomly as pure nodes, one for each block. Defaults toTRUE
.- poisson_edges
Logical indicating whether or not multiple edges are allowed to form between a pair of nodes. Defaults to
TRUE
. WhenFALSE
, sampling proceeds as usual, and duplicate edges are removed afterwards. Further, whenFALSE
, we assume thatS
specifies a desired between-factor connection probability, and back-transform thisS
to the appropriate Poisson intensity parameter to approximate Bernoulli factor connection probabilities. See Section 2.3 of Rohe et al. (2017) for some additional details.- allow_self_loops
Logical indicating whether or not nodes should be allowed to form edges with themselves. Defaults to
TRUE
. WhenFALSE
, sampling proceeds allowing self-loops, and these are then removed after the fact.
Value
An undirected_overlapping_sbm
S3 object, a subclass of the
undirected_factor_model()
with the following additional
fields:
pi
: Sampling probabilities for each block.sorted
: Logical indicating where nodes are arranged by block (and additionally by degree heterogeneity parameter) within each block.
Generative Model
There are two levels of randomness in a degree-corrected
overlapping stochastic blockmodel. First, for each node, we
independently determine if that node is a member of each block. This is
handled by overlapping_sbm()
. Then, given these block memberships,
we randomly sample edges between nodes. This second
operation is handled by sample_edgelist()
,
sample_sparse()
, sample_igraph()
and
sample_tidygraph()
, depending depending on your desired
graph representation.
References
Kaufmann, Emilie, Thomas Bonald, and Marc Lelarge. "A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks," Vol. 9925. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016. https://doi.org/10.1007/978-3-319-46379-7.
Latouche, Pierre, Etienne Birmelé, and Christophe Ambroise. "Overlapping Stochastic Block Models with Application to the French Political Blogosphere." The Annals of Applied Statistics 5, no. 1 (March 2011): 309–36. https://doi.org/10.1214/10-AOAS382.
Zhang, Yuan, Elizaveta Levina, and Ji Zhu. "Detecting Overlapping Communities in Networks Using Spectral Methods." ArXiv:1412.3432, December 10, 2014. http://arxiv.org/abs/1412.3432.
See also
Other stochastic block models:
dcsbm()
,
directed_dcsbm()
,
mmsbm()
,
planted_partition()
,
sbm()
Other undirected graphs:
chung_lu()
,
dcsbm()
,
erdos_renyi()
,
mmsbm()
,
planted_partition()
,
sbm()
Examples
set.seed(27)
lazy_overlapping_sbm <- overlapping_sbm(n = 1000, k = 5, expected_density = 0.01)
#> Setting `B` to a matrix with value 0.8 on the diagonal and 0.1 / (k - 1) on the off-diagonal. This parameterization may change in the future. Explicitly set `B` for reproducible results.
lazy_overlapping_sbm
#> Undirected Degree-Corrected Overlapping Blockmodel
#> -------------------------------------------------
#>
#> Nodes (n): 1000 (sorted)
#> Blocks (k): 5
#>
#> Traditional Overlapping SBM parameterization:
#>
#> Block memberships (Z): 1000 x 5 [dgCMatrix]
#> Block probabilities (pi): 5 [numeric]
#>
#> Block connection propensities (B): 5 x 5 [matrix]
#>
#> Factor model parameterization:
#>
#> X: 1000 x 5 [dgCMatrix]
#> S: 5 x 5 [dsyMatrix]
#>
#> Poisson edges: TRUE
#> Allow self loops: TRUE
#>
#> Expected edges: 4995
#> Expected degree: 5
#> Expected density: 0.01
# sometimes you gotta let the world burn and
# sample a wildly dense graph
dense_lazy_overlapping_sbm <- overlapping_sbm(n = 500, k = 3, expected_density = 0.8)
#> Setting `B` to a matrix with value 0.8 on the diagonal and 0.1 / (k - 1) on the off-diagonal. This parameterization may change in the future. Explicitly set `B` for reproducible results.
dense_lazy_overlapping_sbm
#> Undirected Degree-Corrected Overlapping Blockmodel
#> -------------------------------------------------
#>
#> Nodes (n): 500 (sorted)
#> Blocks (k): 3
#>
#> Traditional Overlapping SBM parameterization:
#>
#> Block memberships (Z): 500 x 3 [dgCMatrix]
#> Block probabilities (pi): 3 [numeric]
#>
#> Block connection propensities (B): 3 x 3 [matrix]
#>
#> Factor model parameterization:
#>
#> X: 500 x 3 [dgCMatrix]
#> S: 3 x 3 [dsyMatrix]
#>
#> Poisson edges: TRUE
#> Allow self loops: TRUE
#>
#> Expected edges: 99800
#> Expected degree: 199.6
#> Expected density: 0.8
k <- 5
n <- 1000
B <- matrix(stats::runif(k * k), nrow = k, ncol = k)
pi <- c(1, 2, 4, 1, 1) / 5
custom_overlapping_sbm <- overlapping_sbm(
n = 200,
B = B,
pi = pi,
expected_degree = 5
)
custom_overlapping_sbm
#> Undirected Degree-Corrected Overlapping Blockmodel
#> -------------------------------------------------
#>
#> Nodes (n): 200 (sorted)
#> Blocks (k): 5
#>
#> Traditional Overlapping SBM parameterization:
#>
#> Block memberships (Z): 200 x 5 [dgCMatrix]
#> Block probabilities (pi): 5 [numeric]
#>
#> Block connection propensities (B): 5 x 5 [matrix]
#>
#> Factor model parameterization:
#>
#> X: 200 x 5 [dgCMatrix]
#> S: 5 x 5 [dgeMatrix]
#>
#> Poisson edges: TRUE
#> Allow self loops: TRUE
#>
#> Expected edges: 1000
#> Expected degree: 5
#> Expected density: 0.05025
edgelist <- sample_edgelist(custom_overlapping_sbm)
edgelist
#> # A tibble: 1,026 × 2
#> from to
#> <int> <int>
#> 1 3 33
#> 2 2 8
#> 3 1 1
#> 4 28 39
#> 5 26 33
#> 6 10 19
#> 7 19 32
#> 8 21 24
#> 9 25 41
#> 10 23 33
#> # … with 1,016 more rows
# efficient eigendecompostion that leverages low-rank structure in
# E(A) so that you don't have to form E(A) to find eigenvectors,
# as E(A) is typically dense. computation is
# handled via RSpectra
population_eigs <- eigs_sym(custom_overlapping_sbm)