The missRanger package uses the ranger package to do fast missing value imputation by chained random forests. As such, it serves as an alternative implementation of the beautiful ‘MissForest’ algorithm (see the vignette).
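To illustrate what "chained random forests" means, here is a minimal R sketch that calls ranger directly. The helper `chained_rf_impute()` is hypothetical and is not missRanger's internal code. It fills the gaps with randomly sampled observed values, then repeatedly refits one forest per incomplete column on the rows where that column was observed and predicts the gaps. For simplicity it runs a fixed number of iterations instead of using a data-driven stopping rule, and it skips predictive mean matching.

```r
library(ranger)

# Hypothetical helper, for illustration only (not missRanger's implementation)
chained_rf_impute <- function(data, n_iter = 3, num.trees = 50, seed = 1) {
  set.seed(seed)
  na_mask <- lapply(data, is.na)  # remember where the gaps are
  to_impute <- names(data)[vapply(na_mask, any, logical(1))]

  # Initialization: fill each gap with a randomly sampled observed value
  for (v in to_impute) {
    miss <- na_mask[[v]]
    obs <- data[[v]][!miss]
    data[[v]][miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }

  # Chaining: for each incomplete column, fit a forest on the rows where it
  # was observed (all other columns as predictors) and predict its gaps
  for (iter in seq_len(n_iter)) {
    for (v in to_impute) {
      miss <- na_mask[[v]]
      fit <- ranger(
        dependent.variable.name = v,
        data = data[!miss, , drop = FALSE],
        num.trees = num.trees
      )
      data[[v]][miss] <- predict(fit, data = data[miss, , drop = FALSE])$predictions
    }
  }
  data
}

# Usage: mask 15 values in one numeric and one factor column of iris
x <- iris
set.seed(347)
x$Sepal.Length[sample.int(150, 15)] <- NA
x$Species[sample.int(150, 15)] <- NA
completed <- chained_rf_impute(x)
anyNA(completed)
```

Because each forest uses the current (partly imputed) values of the other columns as predictors, improvements in one column feed into the models for the others on the next pass.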
missRanger offers the option to combine random forest imputation with predictive mean matching (PMM). First, this avoids generating values that are not present in the original data (such as a value of 0.3334 in a 0-1 coded variable). Second, this step tends to raise the variance of the resulting conditional distributions to a realistic level, which is crucial when the imputed data are used within a multiple imputation framework.
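The matching step can be sketched in base R. The function `pmm_draw()` below is a hypothetical helper, not part of missRanger's API, and it assumes a numeric target. For each missing value, it finds the `k` observed rows whose model predictions are closest to the prediction for the missing row and returns the observed value of one of them, chosen at random. Since only observed values can be returned, a 0-1 coded variable stays 0-1.

```r
# Hypothetical helper (illustration only): predictive mean matching for a numeric target
#   pred_obs  - model predictions for the rows where y was observed
#   y_obs     - the observed values of y for those rows
#   pred_miss - model predictions for the rows where y is missing
#   k         - number of candidate donors to sample from
pmm_draw <- function(pred_obs, y_obs, pred_miss, k = 3) {
  k <- min(k, length(pred_obs))
  vapply(pred_miss, function(p) {
    # indices of the k observed rows whose predictions are closest to p
    donors <- order(abs(pred_obs - p))[seq_len(k)]
    # return the observed value of one randomly chosen donor
    y_obs[donors[sample.int(k, 1)]]
  }, FUN.VALUE = numeric(1))
}

# A 0-1 coded variable: a raw prediction such as 0.3334 is replaced by an observed value
y_obs     <- c(0, 1, 1, 0, 1)
pred_obs  <- c(0.1, 0.8, 0.7, 0.2, 0.9)
pred_miss <- c(0.3334, 0.75)
set.seed(1)
pmm_draw(pred_obs, y_obs, pred_miss, k = 2)
# [1] 0 1
```

A larger `k` widens the pool of donors and hence the spread of the imputed values; `k = 1` always picks the single nearest donor.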
From CRAN:
install.packages("missRanger")
Latest version from GitHub:
library(devtools)
install_github("mayer79/missRanger", subdir = "release/missRanger")
We first generate a data set with about 10% missing values in each column. Then those gaps are filled by missRanger. Finally, the first rows of the imputed data are displayed next to the data with missing values and the original iris data.
library(missRanger)
# Generate data with missing values in all columns
irisWithNA <- generateNA(iris, seed = 347)
# Impute missing values with missRanger
irisImputed <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100)
# Check results
head(irisImputed)
head(irisWithNA)
head(iris)
# With the extra trees algorithm (splitrule is passed on to ranger)
irisImputed_et <- missRanger(irisWithNA, pmm.k = 3, splitrule = "extratrees", num.trees = 100)
# With `dplyr` syntax
library(dplyr)
iris %>%
  generateNA() %>%
  missRanger(verbose = 0) %>%
  head()
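Because the imputations are random (bootstrap sampling in the forests and random donor choice in PMM), calling `missRanger()` several times with different seeds yields the several completed data sets that multiple imputation needs. The sketch below makes some illustrative assumptions: `m = 5` imputations, an arbitrary `lm()` formula, and pooling of point estimates only, by averaging them as Rubin's rules prescribe. Pooling standard errors requires the full Rubin's rules, for example with the mice package.

```r
library(missRanger)

irisWithNA <- generateNA(iris, seed = 347)

# Draw m completed data sets; a different seed per call gives different imputations
m <- 5
imputed <- lapply(seq_len(m), function(i)
  missRanger(irisWithNA, pmm.k = 3, num.trees = 100, seed = i, verbose = 0)
)

# Sanity check: no gaps are left in any completed data set
stopifnot(all(vapply(imputed, function(d) !anyNA(d), logical(1))))

# Fit the same (illustrative) model to each completed data set
fits <- lapply(imputed, function(d) lm(Sepal.Length ~ Sepal.Width + Species, data = d))

# Pooled point estimates: the average of the m coefficient vectors
pooled <- rowMeans(sapply(fits, coef))
pooled
```

The spread of the coefficients across `fits` reflects the extra uncertainty caused by the missing values, which a single imputation would hide.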