CitationImpute — citation_impute • fastadi

An implementation of the AdaptiveImpute algorithm using efficient sparse matrix computations, specialized for the case when missing values in the upper triangle are taken to be explicitly observed zeros, as opposed to missing values. This is primarily useful for spectral decompositions of adjacency matrices of graphs with (near) tree structure, such as citation networks.

Usage

citation_impute(
  X,
  rank,
  ...,
  initialization = c("svd", "adaptive-initialize", "approximate"),
  max_iter = 200L,
  check_interval = 1L,
  epsilon = 1e-07,
  additional = NULL
)

# S3 method for class 'sparseMatrix'
citation_impute(
  X,
  rank,
  ...,
  initialization = c("svd", "adaptive-initialize", "approximate"),
  additional = NULL
)

# S3 method for class 'LRMF'
citation_impute(
  X,
  rank,
  ...,
  epsilon = 1e-07,
  max_iter = 200L,
  check_interval = 1L
)

Arguments

X

A square sparse matrix of Matrix::sparseMatrix() class. Implicit zeros in the upper triangle of this matrix are considered observed and predictions on these elements contribute to the objective function minimized by AdaptiveImpute.

rank

Desired rank (integer) to use in the low rank approximation. Must be at least 2L and at most the rank of X. Note that the rank of X is typically unobserved and computations may be unstable or even fail when rank is near or exceeds this threshold.

...

Unused additional arguments.

initialization

How to initialize the low rank approximation. Options are:

"svd" (default). In the initialization step, this treats unobserved values as zeroes.
"adaptive-initialize". In the initialization step, this treats unobserved values as actually unobserved. However, the current AdaptiveInitialize implementation relies on dense matrix computations that are only suitable for relatively small matrices.
"approximate". An approximate variant of AdaptiveInitialize that is less computationally expensive. See adaptive_initialize for details.

Note that initialization matters as AdaptiveImpute optimizes a non-convex objective. The current theory shows that initializing with AdaptiveInitialize leads to a consistent estimator, but it isn't know if this is the case for SVD initialization. Empirically we have found that SVD initialization works well nonetheless.

max_iter

Maximum number of iterations to perform (integer). Defaults to 200L. In practice 10 or so iterations will get you a decent approximation to use in exploratory analysis, and and 50-100 will get you most of the way to convergence. Must be at least 1L.

check_interval

Integer specifying how often to perform convergence checks. Defaults to 1L. In practice, check for convergence requires a norm calculation that is expensive for large matrices and decreasing the frequency of convergence checks will reduce computation time. Can also be set to NULL, which case max_iter iterations of the algorithm will occur with no possibility of stopping due to small relative change in the imputed matrix. In this case delta will be reported as Inf.

epsilon

Convergence criteria, measured in terms of relative change in Frobenius norm of the full imputed matrix. Defaults to 1e-7.

additional

Ignored except when alpha_method = "approximate" in which case it controls the precise of the approximation to alpha. The approximate computation of alpha will always understand alpha, but the approximation will be better for larger values of additional. We recommend making additional as large as computationally tolerable.

Value

A low rank matrix factorization represented by an adaptive_imputation() object.

Details

If OpenMP is available, citation_impute will automatically use getOption("Ncpus", 1L) OpenMP threads to parallelize some key computations. Note that some computations are performed with the Armadillo C++ linear algebra library and may also be parallelized dependent on your BLAS and LAPACK installations and configurations.

Examples


# create a (binary) square sparse matrix to demonstrate on

set.seed(887)

n <- 10
A <- rsparsematrix(n, n, 0.1, rand.x = NULL)

mf <- citation_impute(A, rank = 3L, max_iter = 1L, check_interval = NULL)
#> INFO [2025-05-02 19:14:27] Matrix has 10 non-zero elements, 4 in the upper triangle (including the diagonal), and 6 in the strict lower triangle.
#> INFO [2025-05-02 19:14:27] Using svd initialization.
#> INFO [2025-05-02 19:14:27] Done initializing.
#> INFO [2025-05-02 19:14:27] Beginning AdaptiveImpute (max 1 iterations).
#> INFO [2025-05-02 19:14:27] Iter 1 complete. delta = Inf, alpha = 0.345140654160396
#> Warning: 
#> Reached maximum allowed iterations. Returning early.
mf
#> 
#> Adaptively Imputed Low Rank Matrix Factorization
#> ------------------------------------------------
#> 
#> Rank: 3
#> 
#> Rows: 10
#> Cols: 10
#> 
#> d[rank]: 1.286
#> alpha:   0.345
#> 
#> Components
#> 
#> u: 10 x 3 [matrix] 
#> d: 3      [numeric] 
#> v: 10 x 3 [matrix]