An implementation of the AdaptiveImpute
algorithm for matrix completion
for sparse matrices.
Usage
adaptive_impute(
X,
rank,
...,
initialization = c("svd", "adaptive-initialize", "approximate"),
max_iter = 200L,
check_interval = 1L,
epsilon = 1e-07,
additional = NULL
)
# S3 method for sparseMatrix
adaptive_impute(
X,
rank,
...,
initialization = c("svd", "adaptive-initialize", "approximate"),
additional = NULL
)
# S3 method for LRMF
adaptive_impute(
X,
rank,
...,
epsilon = 1e-07,
max_iter = 200L,
check_interval = 1L
)
Arguments
- X
A sparse matrix of
Matrix::sparseMatrix()
class.- rank
Desired rank (integer) to use in the low rank approximation. Must be at least
2L
and at most the rank ofX
. Note that the rank ofX
is typically unobserved and computations may be unstable or even fail whenrank
is near or exceeds this threshold.- ...
Unused additional arguments.
- initialization
How to initialize the low rank approximation. Options are:
"svd"
(default). In the initialization step, this treats unobserved values as zeroes."adaptive-initialize"
. In the initialization step, this treats unobserved values as actually unobserved. However, the currentAdaptiveInitialize
implementation relies on dense matrix computations that are only suitable for relatively small matrices."approximate"
. An approximate variant ofAdaptiveInitialize
that is less computationally expensive. Seeadaptive_initialize
for details.
Note that initialization matters as
AdaptiveImpute
optimizes a non-convex objective. The current theory shows that initializing withAdaptiveInitialize
leads to a consistent estimator, but it isn't know if this is the case for SVD initialization. Empirically we have found that SVD initialization works well nonetheless.- max_iter
Maximum number of iterations to perform (integer). Defaults to
200L
. In practice 10 or so iterations will get you a decent approximation to use in exploratory analysis, and and 50-100 will get you most of the way to convergence. Must be at least1L
.- check_interval
Integer specifying how often to perform convergence checks. Defaults to
1L
. In practice, check for convergence requires a norm calculation that is expensive for large matrices and decreasing the frequency of convergence checks will reduce computation time. Can also be set toNULL
, which casemax_iter
iterations of the algorithm will occur with no possibility of stopping due to small relative change in the imputed matrix. In this casedelta
will be reported asInf
.- epsilon
Convergence criteria, measured in terms of relative change in Frobenius norm of the full imputed matrix. Defaults to
1e-7
.- additional
Ignored except when
alpha_method = "approximate"
in which case it controls the precise of the approximation toalpha
. The approximate computation ofalpha
will always understandalpha
, but the approximation will be better for larger values ofadditional
. We recommend makingadditional
as large as computationally tolerable.
Value
A low rank matrix factorization represented by an
adaptive_imputation()
object.
References
Cho, Juhee, Donggyu Kim, and Karl Rohe. “Asymptotic Theory for Estimating the Singular Vectors and Values of a Partially-Observed Low Rank Matrix with Noise.” Statistica Sinica, 2018. https://doi.org/10.5705/ss.202016.0205.
———. “Intelligent Initialization and Adaptive Thresholding for Iterative Matrix Completion: Some Statistical and Algorithmic Theory for Adaptive-Impute.” Journal of Computational and Graphical Statistics 28, no. 2 (April 3, 2019): 323–33. https://doi.org/10.1080/10618600.2018.1518238.
Examples
mf <- adaptive_impute(ml100k, rank = 3L, max_iter = 5L, check_interval = NULL)
#> Warning:
#> Reached maximum allowed iterations. Returning early.
mf
#>
#> Adaptively Imputed Low Rank Matrix Factorization
#> ------------------------------------------------
#>
#> Rank: 3
#>
#> Rows: 943
#> Cols: 1682
#>
#> d[rank]: 467.486
#> alpha: 144.663
#>
#> Components
#>
#> u: 943 x 3 [matrix]
#> d: 3 [numeric]
#> v: 1682 x 3 [matrix]