An implementation of the AdaptiveImpute
algorithm using efficient
sparse matrix computations, specialized for the case when missing
values in the upper triangle are taken to be explicitly observed
zeros, as opposed to missing values. This is primarily
useful for spectral decompositions of adjacency matrices of graphs
with (near) tree structure, such as citation networks.
Usage
citation_impute(
X,
rank,
...,
initialization = c("svd", "adaptive-initialize", "approximate"),
max_iter = 200L,
check_interval = 1L,
epsilon = 1e-07,
additional = NULL
)
# S3 method for sparseMatrix
citation_impute(
X,
rank,
...,
initialization = c("svd", "adaptive-initialize", "approximate"),
additional = NULL
)
# S3 method for LRMF
citation_impute(
X,
rank,
...,
epsilon = 1e-07,
max_iter = 200L,
check_interval = 1L
)
Arguments
- X
A square sparse matrix of
Matrix::sparseMatrix()
class. Implicit zeros in the upper triangle of this matrix are considered observed and predictions on these elements contribute to the objective function minimized byAdaptiveImpute
.- rank
Desired rank (integer) to use in the low rank approximation. Must be at least
2L
and at most the rank ofX
. Note that the rank ofX
is typically unobserved and computations may be unstable or even fail whenrank
is near or exceeds this threshold.- ...
Unused additional arguments.
- initialization
How to initialize the low rank approximation. Options are:
"svd"
(default). In the initialization step, this treats unobserved values as zeroes."adaptive-initialize"
. In the initialization step, this treats unobserved values as actually unobserved. However, the currentAdaptiveInitialize
implementation relies on dense matrix computations that are only suitable for relatively small matrices."approximate"
. An approximate variant ofAdaptiveInitialize
that is less computationally expensive. Seeadaptive_initialize
for details.
Note that initialization matters as
AdaptiveImpute
optimizes a non-convex objective. The current theory shows that initializing withAdaptiveInitialize
leads to a consistent estimator, but it isn't know if this is the case for SVD initialization. Empirically we have found that SVD initialization works well nonetheless.- max_iter
Maximum number of iterations to perform (integer). Defaults to
200L
. In practice 10 or so iterations will get you a decent approximation to use in exploratory analysis, and and 50-100 will get you most of the way to convergence. Must be at least1L
.- check_interval
Integer specifying how often to perform convergence checks. Defaults to
1L
. In practice, check for convergence requires a norm calculation that is expensive for large matrices and decreasing the frequency of convergence checks will reduce computation time. Can also be set toNULL
, which casemax_iter
iterations of the algorithm will occur with no possibility of stopping due to small relative change in the imputed matrix. In this casedelta
will be reported asInf
.- epsilon
Convergence criteria, measured in terms of relative change in Frobenius norm of the full imputed matrix. Defaults to
1e-7
.- additional
Ignored except when
alpha_method = "approximate"
in which case it controls the precise of the approximation toalpha
. The approximate computation ofalpha
will always understandalpha
, but the approximation will be better for larger values ofadditional
. We recommend makingadditional
as large as computationally tolerable.
Value
A low rank matrix factorization represented by an
adaptive_imputation()
object.
Details
If OpenMP is available, citation_impute
will automatically
use getOption("Ncpus", 1L)
OpenMP threads to parallelize some
key computations. Note that some computations are performed with
the Armadillo C++ linear algebra library and may also be parallelized
dependent on your BLAS and LAPACK installations and configurations.
Examples
# create a (binary) square sparse matrix to demonstrate on
set.seed(887)
n <- 10
A <- rsparsematrix(n, n, 0.1, rand.x = NULL)
mf <- citation_impute(A, rank = 3L, max_iter = 1L, check_interval = NULL)
#> Warning:
#> Reached maximum allowed iterations. Returning early.
mf
#>
#> Adaptively Imputed Low Rank Matrix Factorization
#> ------------------------------------------------
#>
#> Rank: 3
#>
#> Rows: 10
#> Cols: 10
#>
#> d[rank]: 1.286
#> alpha: 0.345
#>
#> Components
#>
#> u: 10 x 3 [matrix]
#> d: 3 [numeric]
#> v: 10 x 3 [matrix]