R bindings for the uchardet
library, the encoding detector library of Mozilla. It takes a
sequence of bytes in an unknown character encoding, without any
additional information, attempts to determine the encoding of the
text, and returns encoding names in an iconv-compatible format.
Key features:
To install the package from CRAN, run the following command:
install.packages("uchardet", repos = "https://cloud.r-project.org/")
You can also install the development version with the
install_gitlab()
function from the remotes
package:
remotes::install_gitlab("artemklevtsov/uchardet@devel")
This package contains compiled code, so installing it from source on Windows requires Rtools.
Installation from source requires the uchardet
library and headers. On Linux or macOS, the configure script tries to find them
with pkg-config
or in the system include/library paths. You can
define the include and library paths with the UCHARDET_INCLUDES
and
UCHARDET_LIBS
configure variables.
If the uchardet
system library is not found, it is
compiled from source. You can force compilation of the built-in
library with the --with-builtin-uchardet
configure
argument.
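As a rough sketch, the configure argument above could be passed through the configure.args parameter of install.packages(). This assumes a source installation (type = "source") and the default repositories; it is one possible way to pass the flag, not necessarily the one the package author recommends.

```r
# Sketch: force compilation of the built-in uchardet library when
# installing from source. configure.args is a standard argument of
# install.packages(); the flag itself is the one described above.
install.packages(
  "uchardet",
  type = "source",
  configure.args = "--with-builtin-uchardet"
)
```

The UCHARDET_INCLUDES and UCHARDET_LIBS variables can presumably be supplied in a similar way through the configure.vars parameter; the exact value format they expect is not stated here.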
# load packages
library(uchardet)
# detect string encoding
ascii <- "Hello, useR!"
print(ascii)
#> [1] "Hello, useR!"
detect_str_enc(ascii)
#> [1] "ASCII"
utf8 <- "\u4e0b\u5348\u597d"
print(utf8)
#> [1] "下午好"
detect_str_enc(utf8)
#> [1] "UTF-8"
# detect raw vector encoding
detect_raw_enc(charToRaw(ascii))
#> [1] "ASCII"
detect_raw_enc(charToRaw(utf8))
#> [1] "UTF-8"
# detect file encoding
ascii_file <- tempfile()
writeLines(ascii, ascii_file)
detect_file_enc(ascii_file)
#> [1] "ASCII"
utf8_file <- tempfile()
writeLines(utf8, utf8_file)
detect_file_enc(utf8_file)
#> [1] "UTF-8"
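Because the returned names are iconv-compatible, a detected encoding can be passed straight to base R's iconv(). The following is a minimal sketch of that idea; to_utf8() is a hypothetical helper, not part of the package, and it assumes that a failed detection is reported as NA.

```r
library(uchardet)

# Hypothetical helper (not part of uchardet): guess the encoding of a raw
# vector and convert its contents to UTF-8 with base R's iconv().
to_utf8 <- function(bytes) {
  enc <- detect_raw_enc(bytes)
  # Assumption: detection failure is reported as NA.
  if (is.na(enc)) {
    stop("could not detect the encoding")
  }
  iconv(rawToChar(bytes), from = enc, to = "UTF-8")
}

to_utf8(charToRaw("Hello, useR!"))
```

Detection is a heuristic, so for short or ambiguous inputs the guessed encoding may be wrong and the converted text should be checked.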
Use the following command to go to the page for bug report submissions:
bug.report(package = "uchardet")
Before reporting a bug or submitting an issue, please do the following:
Check the news for the installed version with the
news(package = "uchardet", Version == packageVersion("uchardet"))
command;
make sure that the error comes from the uchardet
package, not from other packages.
Please attach traceback() and sessionInfo() output to the bug report. It may save a lot of time.
The uchardet
package is distributed under the GPLv2 license.