Create a directed degree-corrected stochastic blockmodel object
Source: R/directed_dcsbm.R
To specify a degree-corrected stochastic blockmodel, you must specify
the degree-heterogeneity parameters (via n, or theta_in and theta_out),
the mixing matrix (via k_in and k_out, or B), and the relative block
probabilities (optional, via pi_in and pi_out).
We provide defaults for most of these options to enable rapid exploration,
or you can invest the effort for more control over the model parameters.
We strongly recommend setting the expected_in_degree, expected_out_degree,
or expected_density argument to avoid large memory allocations associated
with sampling large, dense graphs.
Arguments
- n
(degree heterogeneity) The number of nodes in the blockmodel. Use when you don't want to specify the degree-heterogeneity parameters theta_in and theta_out by hand. When n is specified, theta_in and theta_out are randomly generated from a LogNormal(2, 1) distribution. This is subject to change, and may not be reproducible. n defaults to NULL. You must specify either n, or theta_in and theta_out together, but not both.
- theta_in
(degree heterogeneity) A numeric vector explicitly specifying the incoming degree-heterogeneity parameters. This implicitly determines the number of nodes in the resulting graph, i.e. it will have length(theta_in) nodes. Must be positive. Setting to a vector of ones recovers a stochastic blockmodel without degree correction. Defaults to NULL. You must specify either n, or theta_in and theta_out together, but not both.
- theta_out
(degree heterogeneity) A numeric vector explicitly specifying the outgoing degree-heterogeneity parameters. This implicitly determines the number of nodes in the resulting graph, i.e. it will have length(theta_out) nodes. Must be positive. Setting to a vector of ones recovers a stochastic blockmodel without degree correction. Defaults to NULL. You must specify either n, or theta_in and theta_out together, but not both.
- k_in
(mixing matrix) The number of incoming blocks in the blockmodel. Use when you don't want to specify the mixing matrix by hand. When k_in is specified, the elements of B are drawn randomly from a Uniform(0, 1) distribution. This is subject to change, and may not be reproducible. k_in defaults to NULL. You must specify either k_in and k_out together, or B. You may specify all three at once, in which case k_in is only used to set pi_in (when pi_in is left at its default argument value).
- k_out
(mixing matrix) The number of outgoing blocks in the blockmodel. Use when you don't want to specify the mixing matrix by hand. When k_out is specified, the elements of B are drawn randomly from a Uniform(0, 1) distribution. This is subject to change, and may not be reproducible. k_out defaults to NULL. You must specify either k_in and k_out together, or B. You may specify all three at once, in which case k_out is only used to set pi_out (when pi_out is left at its default argument value).
- B
(mixing matrix) A k_in by k_out matrix of block connection intensities. Before degree correction, the number of edges between a node in incoming block i and a node in outgoing block j is Poisson(B[i, j]) distributed. matrix and Matrix objects are both acceptable. Defaults to NULL. You must specify either k_in and k_out together, or B. When all three are given, k_in and k_out are only used to set the default pi_in and pi_out.
- ...
Arguments passed on to directed_factor_model
expected_in_degree
If specified, the desired expected in degree of the graph. Specifying expected_in_degree simply rescales S to achieve this. Defaults to NULL. Specify only one of expected_in_degree, expected_out_degree, and expected_density.
expected_out_degree
If specified, the desired expected out degree of the graph. Specifying expected_out_degree simply rescales S to achieve this. Defaults to NULL. Specify only one of expected_in_degree, expected_out_degree, and expected_density.
expected_density
If specified, the desired expected density of the graph. Specifying expected_density simply rescales S to achieve this. Defaults to NULL. Specify only one of expected_in_degree, expected_out_degree, and expected_density.
- pi_in
(relative block probabilities) Relative block probabilities for the incoming blocks. Must be positive, but do not need to sum to one, as they will be normalized internally. The length must match the number of rows of B, or k_in. Defaults to rep(1 / k_in, k_in), i.e. balanced incoming blocks.
- pi_out
(relative block probabilities) Relative block probabilities for the outgoing blocks. Must be positive, but do not need to sum to one, as they will be normalized internally. The length must match the number of columns of B, or k_out. Defaults to rep(1 / k_out, k_out), i.e. balanced outgoing blocks.
- sort_nodes
Logical indicating whether or not to sort the nodes so that they are grouped by block. Useful for plotting. Defaults to TRUE.
- force_identifiability
Logical indicating whether or not to normalize theta_in such that it sums to one within each incoming block and theta_out such that it sums to one within each outgoing block. Defaults to TRUE.
- poisson_edges
Logical indicating whether or not multiple edges are allowed to form between a pair of nodes. Defaults to TRUE. When FALSE, sampling proceeds as usual, and duplicate edges are removed afterwards. Further, when FALSE, we assume that S specifies a desired between-factor connection probability, and back-transform this S to the appropriate Poisson intensity parameter to approximate Bernoulli factor connection probabilities. See Section 2.3 of Rohe et al. (2017) for some additional details.
- allow_self_loops
Logical indicating whether or not nodes should be allowed to form edges with themselves. Defaults to TRUE. When FALSE, sampling proceeds allowing self-loops, and these are then removed after the fact.
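The expected_in_degree, expected_out_degree, and expected_density arguments are described above as simply rescaling S. The following base R sketch illustrates what such a rescaling could look like for expected_density. It is not the package's implementation: the helper expected_edges() is hypothetical, and density is assumed here to mean expected edges divided by n^2, which agrees with the printed example on this page (10000 expected edges, 1000 nodes, density 0.01).

```r
# Toy directed DCSBM parameters (all values are illustrative assumptions)
n <- 6
z_in <- c(1, 1, 2, 2, 1, 2)    # incoming block of each node, k_in = 2
z_out <- c(1, 2, 3, 1, 2, 3)   # outgoing block of each node, k_out = 3
theta_in <- rep(1, n)
theta_out <- rep(1, n)

S <- matrix(0.2, nrow = 2, ncol = 3)
diag(S) <- 0.9

# Hypothetical helper: expected number of edges under Poisson sampling,
# i.e. the sum over node pairs of theta_in[u] * S[z_in[u], z_out[v]] * theta_out[v]
expected_edges <- function(S) {
  sum(outer(theta_in, theta_out) * S[z_in, z_out])
}

# Rescale S so that expected edges / n^2 matches the target density
target_density <- 0.1
S_rescaled <- S * (target_density * n^2) / expected_edges(S)

achieved_density <- expected_edges(S_rescaled) / n^2
```

Because the expected edge count is linear in S, a single multiplicative constant is enough to hit the target; the relative structure of the mixing matrix is unchanged.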
Value
A directed_dcsbm S3 object, a subclass of the directed_factor_model() with the following additional fields:
theta_in: A numeric vector of incoming community degree-heterogeneity parameters.
theta_out: A numeric vector of outgoing community degree-heterogeneity parameters.
z_in: The incoming community memberships of each node, as a factor(). The factor will have k_in levels, where k_in is the number of incoming communities in the stochastic blockmodel. Some communities may contain no observed nodes.
z_out: The outgoing community memberships of each node, as a factor(). The factor will have k_out levels, where k_out is the number of outgoing communities in the stochastic blockmodel. Some communities may contain no observed nodes.
pi_in: Sampling probabilities for each incoming community.
pi_out: Sampling probabilities for each outgoing community.
sorted: Logical indicating whether nodes are arranged by block (and additionally by degree heterogeneity parameter within each block).
Generative Model
There are two levels of randomness in a directed degree-corrected
stochastic blockmodel. First, we randomly choose an incoming
block membership and an outgoing block membership
for each node in the blockmodel. This is
handled by directed_dcsbm(). Then, given these block memberships,
we randomly sample edges between nodes. This second
operation is handled by sample_edgelist(), sample_sparse(),
sample_igraph() and sample_tidygraph(), depending on your desired
graph representation.
Block memberships
Let \(x\) represent the incoming block membership of a node and \(y\) represent the outgoing block membership of a node. To generate \(x\) we sample from a categorical distribution with parameter \(\pi_{in}\). To generate \(y\) we sample from a categorical distribution with parameter \(\pi_{out}\). Block memberships are independent across nodes. Incoming and outgoing block memberships of the same node are also independent.
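As a rough illustration of this step, the base R sketch below draws independent incoming and outgoing memberships with sample(). The values of n, pi_in, and pi_out are made up for the example, and this is only a sketch of the idea rather than what directed_dcsbm() does internally.

```r
set.seed(1)

n <- 10               # number of nodes (illustrative)
pi_in <- c(1, 1, 2)   # relative incoming block probabilities, k_in = 3
pi_out <- c(3, 1)     # relative outgoing block probabilities, k_out = 2

# Normalize so that each vector sums to one
pi_in <- pi_in / sum(pi_in)
pi_out <- pi_out / sum(pi_out)

# Independent categorical draws, one incoming and one outgoing block per node
z_in <- factor(
  sample(seq_along(pi_in), n, replace = TRUE, prob = pi_in),
  levels = seq_along(pi_in)
)
z_out <- factor(
  sample(seq_along(pi_out), n, replace = TRUE, prob = pi_out),
  levels = seq_along(pi_out)
)

# Block sizes; with few nodes a block can end up empty
table(z_in)
table(z_out)
```

Fixing the factor levels up front keeps all k_in and k_out levels even when a block receives no nodes.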
Degree heterogeneity
In addition to block membership, the DCSBM also allows nodes to have different propensities for incoming and outgoing edge formation. We represent the propensity to form incoming edges for a given node by a positive number \(\theta_{in}\). We represent the propensity to form outgoing edges for a given node by a positive number \(\theta_{out}\). Typically the \(\theta_{in}\) (and \(\theta_{out}\)) are constrained to sum to one within each block for identifiability purposes, but this doesn't really matter during sampling.
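The identifiability constraint can be sketched in base R as a within-block normalization. The block assignments below are made up, and rlnorm(meanlog = 2, sdlog = 1) is an assumed reading of the LogNormal(2, 1) default mentioned in the argument descriptions; the package may implement this differently.

```r
set.seed(2)

n <- 8
z_in <- factor(c(1, 1, 1, 2, 2, 3, 3, 3), levels = 1:3)  # illustrative blocks

# Raw, positive degree-heterogeneity parameters
theta_in <- rlnorm(n, meanlog = 2, sdlog = 1)

# Divide each theta by the total theta of its block
theta_in_normalized <- ave(theta_in, z_in, FUN = function(x) x / sum(x))

# Each block now sums to one
block_sums <- tapply(theta_in_normalized, z_in, sum)
```

The same pattern would apply to theta_out with the outgoing memberships.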
Edge formation
Once we know the block memberships \(x\) and \(y\)
and the degree heterogeneity parameters \(\theta_{in}\) and
\(\theta_{out}\), we need one more
ingredient, which is the baseline intensity of connections
between nodes in incoming block \(x\) and outgoing block \(y\),
given by \(B_{x, y}\). Then each edge forms
independently according to a Poisson distribution with
parameter
$$ \lambda = \theta_{in} \cdot B_{x, y} \cdot \theta_{out}, $$
where \(\theta_{in}\) and \(x\) belong to one endpoint of the edge and \(\theta_{out}\) and \(y\) belong to the other.
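A small base R sketch of this rate computation follows. All parameter values are illustrative, and the convention that row u carries the incoming quantities and column v the outgoing ones is an assumption made for the example. It builds the full n by n rate matrix, which is only sensible for tiny graphs; the package's samplers are designed to avoid this.

```r
set.seed(3)

n <- 6
z_in <- c(1, 1, 2, 2, 1, 2)    # incoming block of each node, k_in = 2
z_out <- c(1, 2, 3, 1, 2, 3)   # outgoing block of each node, k_out = 3
theta_in <- runif(n, 0.5, 1.5)
theta_out <- runif(n, 0.5, 1.5)

B <- matrix(0.2, nrow = 2, ncol = 3)
diag(B) <- 0.9

# lambda[u, v] = theta_in[u] * B[z_in[u], z_out[v]] * theta_out[v]
lambda <- outer(theta_in, theta_out) * B[z_in, z_out]

# Independent Poisson edge counts for every ordered pair of nodes
A <- matrix(rpois(n * n, lambda), nrow = n)
```

Indexing B[z_in, z_out] expands the k_in by k_out mixing matrix to one entry per node pair, so the elementwise product matches the formula above term by term.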
See also
Other stochastic block models: dcsbm(), mmsbm(), overlapping_sbm(), planted_partition(), sbm()
Other directed graphs: directed_erdos_renyi()
Examples
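library(fastRG)

set.seed(27)

# 5 x 8 mixing matrix: baseline intensity 0.2, with 0.9 on the main diagonal
B <- matrix(0.2, nrow = 5, ncol = 8)
diag(B) <- 0.9

# k_in and k_out match dim(B); here they only set the default pi_in and pi_out
ddcsbm <- directed_dcsbm(
  n = 1000,
  B = B,
  k_in = 5,
  k_out = 8,
  expected_density = 0.01
)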
#> Generating random degree heterogeneity parameters `theta_in` and `theta_out` from LogNormal(2, 1) distributions. This distribution may change in the future. Explicitly set `theta_in` and `theta_out` for reproducible results.
ddcsbm
#> Directed Degree-Corrected Stochastic Blockmodel
#> -----------------------------------------------
#>
#> Nodes (n): 1000 (arranged by block)
#> Incoming Blocks (k_in): 5
#> Outgoing Blocks (k_out): 8
#>
#> Traditional DCSBM parameterization:
#>
#> Block memberships (z_in): 1000 [factor]
#> Block memberships (z_out): 1000 [factor]
#> Degree heterogeneity (theta_in): 1000 [numeric]
#> Degree heterogeneity (theta_out): 1000 [numeric]
#> Block probabilities (pi_in): 5 [numeric]
#> Block probabilities (pi_out): 8 [numeric]
#>
#> Factor model parameterization:
#>
#> X: 1000 x 5 [dgCMatrix]
#> S: 5 x 8 [dgeMatrix]
#> Y: 1000 x 8 [dgCMatrix]
#>
#> Poisson edges: TRUE
#> Allow self loops: TRUE
#>
#> Expected edges: 10000
#> Expected in degree: 10
#> Expected out degree: 10
#> Expected density: 0.01
population_svd <- svds(ddcsbm)