The fasthplus R package provides fast approximations for
metrics of discordance or dissimilarity.
The metric G+ was introduced by W. T. Williams in 1971 as a way to measure the discordance or dissimilarity between two different classifications (where the classification consists of distance matrix and a set of predicted labels for each observation).
Here, we introduce the H+, a discordance metric modified from G+. This metric can be used (1) to evaluate the discordance between two arbitrary sets or (2) to evaluate label fitness (clustering) for a generalized dissimilarity matrix.
At present, our package is available only via github installation
using the devtools package.
library(devtools)
install_github(repo="ntdyjack/fasthplus", ref = "main")
After installation, the package can be loaded into R
library(fasthplus)
The main functions in the fasthplus package are
hpe() and hpb().
The hpe() function accepts either (1) two sets
(A and B) or (2) a distance matrix
(D) and set of labels (L). With additional
arguments alg (algorithm choice) alphaand
gammas (see vignette). The hpb() function
accepts (1) a data matrix (D) and set of labels
(L).
To run the hpe() function with two sets (A
and B) and the number of p + 1
percentiles:
a <- rnorm(n=500,mean=0)
b <- rnorm(n=500,mean=1)
h <- hpe(A=a,B=b,p=101,alg=1)
To run the hpe() and hpb() with
D (dissimilarity or data respectively) and set of labels
(L):
# Two sets
a <- sapply(1:100, function(i) rnorm(n=50,mean=0.0,sd=1))
b <- sapply(1:100, function(i) rnorm(n=50,mean=0.0,sd=1))
x <- t(cbind(a,b))
# Create a set of labels
l <- c(rep(0,100),rep(1,100))
#hpb estimate
hpb(D=x,L=l,t=10,r=10)
# Calculate dissimilarity matrix
d <- dist(x)
#hpe estimate
hpe(D=d,L=l,p=251)
Please use https://github.com/ntdyjack/fasthplus/issues to submit issues, bug reports, and comments.