Word Embedding Research Framework for Psychological Science.
An integrated toolbox for word embedding research.
⚠️ All users should update the package to version ≥ 0.3.0. Old versions (≤ 0.2.0) may run slowly, and some old functions have been deprecated.
Han-Wu-Shuang (Bruce) Bao 包寒吴霜
Email: baohws@foxmail.com
Homepage: psychbruce.github.io
## Method 1: Install from CRAN
install.packages("PsychWordVec")
## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/PsychWordVec", force=TRUE)
PsychWordVec
| | `embed` | `wordvec` |
|---|---|---|
| Basic class | matrix | data.table |
| Row size | vocabulary size | vocabulary size |
| Column size | dimension size | 2 (variables: `word`, `vec`) |
| Advantage | faster (with matrix operation) | easier to inspect and manage |
| Function to get | `as_embed()` | `as_wordvec()` |
| Function to load | `load_embed()` | `load_wordvec()` |
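The difference between the two layouts can be illustrated with a small base R sketch. This is not the package's implementation: the objects `embed_like`, `wordvec_like`, and `embed_back` are hypothetical names, the numbers are made up, and a plain `data.frame` with a list-column stands in for the `data.table` that `wordvec` actually uses.

```r
# Toy illustration in base R (an assumption-laden sketch, not PsychWordVec code).

# "embed"-like layout: a numeric matrix with one row per word
# and one column per embedding dimension.
embed_like <- matrix(
  c(0.1, 0.2, 0.3,
    0.4, 0.5, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("apple", "banana"), c("dim1", "dim2", "dim3"))
)

# "wordvec"-like layout: two columns, `word` and `vec`, where `vec` is a
# list-column holding one numeric vector per word.
# (A data.frame stands in for data.table here.)
wordvec_like <- data.frame(word = rownames(embed_like), stringsAsFactors = FALSE)
wordvec_like$vec <- lapply(
  seq_len(nrow(embed_like)),
  function(i) unname(embed_like[i, ])
)

# Converting back: stack the list-column into a matrix, one row per word.
embed_back <- do.call(rbind, wordvec_like$vec)
rownames(embed_back) <- wordvec_like$word

dim(embed_like)     # rows = vocabulary size, columns = dimension size
ncol(wordvec_like)  # always 2: `word` and `vec`
```

The matrix form lends itself to vectorized operations over all words at once, while the two-column form is easier to print, filter, and join by word. In the package itself, `as_embed()` and `as_wordvec()` perform the conversion.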
PsychWordVec
- `as_embed()`: from `wordvec` (data.table) to `embed` (matrix)
- `as_wordvec()`: from `embed` (matrix) to `wordvec` (data.table)
- `load_embed()`: load word embeddings data as `embed` (matrix)
- `load_wordvec()`: load word embeddings data as `wordvec` (data.table)
- `data_transform()`: transform plain text word vectors to `wordvec` or `embed`
- `subset()`: extract a subset of `wordvec` and `embed`
- `normalize()`: normalize all word vectors to the unit length 1
- `get_wordvec()`
- `sum_wordvec()`
- `plot_wordvec()`
- `plot_wordvec_tSNE()`: 2D or 3D visualization with t-SNE
- `orth_procrustes()`: Orthogonal Procrustes matrix alignment
- `cosine_similarity()`: `cos_sim()` or `cos_dist()`
- `pair_similarity()`
- `plot_similarity()`
- `tab_similarity()`
- `most_similar()`: find the Top-N most similar words
- `plot_network()`: visualize a (partial correlation) network graph of words
- `test_WEAT()`: WEAT and SC-WEAT with permutation test of significance
- `test_RND()`: RND with permutation test of significance
- `dict_expand()`: expand a dictionary from the most similar words
- `dict_reliability()`: reliability analysis and PCA of a dictionary
- `tokenize()`
- `train_wordvec()`
- `text_init()`: set up a Python environment for PLM
- `text_model_download()`: download PLMs from HuggingFace to local “.cache” folder
- `text_model_remove()`: remove PLMs from local “.cache” folder
- `text_to_vec()`: extract contextualized token and text embeddings
- `text_unmask()`: fill in the blank mask(s) in a query

See the documentation (help pages) for their usage and details.
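To make a few of the listed operations concrete, here is a minimal base R sketch of unit-length normalization, cosine similarity and distance, and a Top-N similarity lookup. It deliberately does not call the package: the helper names (`cos_sim_sketch`, `cos_dist_sketch`, `normalize_rows`, `top_n_similar`) and the toy matrix `emb` are hypothetical, and cosine distance is assumed to follow the common definition of 1 minus cosine similarity.

```r
# Toy embedding matrix (rows = words); the values are made up for illustration.
emb <- rbind(
  king  = c(3, 4, 0),
  queen = c(4, 3, 0),
  apple = c(0, 0, 5)
)

# Cosine similarity: dot product divided by the product of vector lengths.
cos_sim_sketch <- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

# Cosine distance, assumed here to be 1 - cosine similarity.
cos_dist_sketch <- function(v1, v2) 1 - cos_sim_sketch(v1, v2)

# Scale every row vector to unit length (Euclidean norm of 1).
normalize_rows <- function(m) m / sqrt(rowSums(m^2))

# Top-N most similar words: after normalization, cosine similarity with
# every word reduces to a single matrix-vector product.
top_n_similar <- function(m, word, n = 2) {
  unit <- normalize_rows(m)
  sims <- drop(unit %*% unit[word, ])
  sims <- sims[names(sims) != word]  # exclude the query word itself
  head(sort(sims, decreasing = TRUE), n)
}

cos_sim_sketch(emb["king", ], emb["queen", ])
cos_dist_sketch(emb["king", ], emb["apple", ])
top_n_similar(emb, "king", n = 1)
```

Normalizing once up front is the design choice that makes the matrix layout fast: similarity against the whole vocabulary becomes one matrix product instead of a loop over words. For real analyses, the package's own `normalize()`, `cosine_similarity()`, and `most_similar()` should be used, with arguments as described in their help pages.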