Computes the personalized pagerank for specified seeds using the
ApproximatePageRank algorithm of Andersen et al. (2006). Computes
degree-adjustments and degree-regularization of personalized
pagerank vectors as described in Algorithms 3 and 4 of Chen et al. (2019).
These algorithms are randomized; if results are unstable across
multiple runs, decrease epsilon.
Usage
appr(
graph,
seeds,
...,
alpha = 0.15,
epsilon = 1e-06,
tau = NULL,
max_visits = Inf
)
# S3 method for igraph
appr(graph, seeds, ...)
# S3 method for rtweet_graph
appr(graph, seeds, ...)Arguments
- graph
An
abstract_graph()object, such as that created byrtweet_graph(). This argument is required.- seeds
A character vector of seeds for the personalized pagerank. The personalized pagerank will return to each of these seeds with probability
alphaat each node transition. At the moment, all seeds are given equal weighting. This argument is required.- ...
Ignored. Passing arguments to
...results in a warning.- alpha
Teleportation constant. The teleportation constant is the probability of returning to a seed node at each node transition.
alphamust be a valid probabilty; that is, between zero and one. Defaults to0.15. This is the inverse of the "dampening factor" in the original PageRank paper, soalpha = 0.15corresponds to a dampening factor of0.85. Runtime is proportional to1 / (epsilon * alpha), so smallalphacan result in long runtimes.- epsilon
Desired accuracy of approximation.
epsilonmust be a small positive number. Defaults to1e-6.aPPRguarantees that approximated personalized pageranks are uniformly withinepsilonof their true value. That is, the approximation is guaranteed to be good in an L-infinity sense. This does not guarantee, however, that a ranking of nodes by aPPR is close to a ranking of nodes by PPR.For Twitter graphs, we recommend testing your code with
1e-4or1e-5, using1e-6for exploration, and1e-7to1e-8for final results, although these numbers are very rough. It also perfectly reasonable to runaPPRfor a given number of steps (set viamax_visits), and then note the approximation accuracy of your results. Internally,aPPRkeeps a running estimate of achieved accuracy that is always valid.Anytime you would like to explore more of the graph, you can simply decrease
epsilon. So you can start withepsilon = 1e-5and then gradually decreaseepsilonuntil you have a sample of the graph that you are happy with.Also note that runtime is proportional to
1 / (epsilon * alpha), so smallepsiloncan result in long runtimes.- tau
Regularization term. Additionally inflates the in-degree of each observation by this term by performing the degree adjustment described in Algorithm 3 and Algorithm 4, which are described in
vignette("Mathematical details"). Defaults toNULL, in which casetauis set to the average in-degree of the observed nodes. In general, setting it's reasonable to settauto the average in-degree of the graph.- max_visits
Maximum number of unique nodes to visit. Should be a positive integer. Defaults to
Inf, such that there is no upper bound on the number of unique nodes to visit. Useful when you want to specify a fixed amount of computation (or API calls) to use rather than an error tolerance. We recommend debugging withmax_visits ~ 20, exploration withmax_visitsin the hundreds, andmax_visitsin the thousands to ten of thousands for precise results, although this is a very rough heuristic.
Value
A Tracker() object. Most relevant is the stats field,
a tibble::tibble() with the following columns:
name: Name of a node (character).p: Current estimate of residual per out-degree for a node.r: Estimated error of pagerank estimate for a node.in_degree: Number of incoming edges to a node.out_degree: Number of outcoming edges from a node.degree_adjusted: The personalized pagerank divided by the node in-degree.regularized: The personalized pagerank divide by the node in-degree plustau.
When computing personalized pageranks for Twitter users (either
via rtweet_graph(), name is given
as a user ID, not a screen name, regardless of how the seed nodes
were specified.
References
Chen, Fan, Yini Zhang, and Karl Rohe. “Targeted Sampling from Massive Block Model Graphs with Personalized PageRank.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, no. 1 (February 2020): 99–126. https://doi.org/10.1111/rssb.12349.
Andersen, Reid, Fan Chung, and Kevin Lang. “Local Graph Partitioning Using PageRank Vectors.” In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), 475–86. Berkeley, CA, USA: IEEE, 2006. https://doi.org/10.1109/FOCS.2006.44.
Examples
library(aPPR)
library(igraph)
#>
#> Attaching package: ‘igraph’
#> The following object is masked from ‘package:aPPR’:
#>
#> neighborhood
#> The following objects are masked from ‘package:stats’:
#>
#> decompose, spectrum
#> The following object is masked from ‘package:base’:
#>
#> union
set.seed(27)
graph <- rtweet_graph()
if (FALSE) {
appr(graph, "alexpghayes")
}
graph2 <- sample_pa(100)
# this creates a Tracker object
ppr_results <- appr(graph2, seeds = "5")
# the portion of the Tracker object you probably care about
ppr_results$stats
#> # A tibble: 2 × 7
#> name regularized p in_degree out_degree degree_adjusted r
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5 0.0300 0.150 2 1 0.0750 0.000000833
#> 2 4 0.0182 0.127 4 1 0.0319 0.000000667