This vignette introduces the PRECAST workflow for the analysis of integrating multiple spatial transcriptomics datasets. The workflow consists of three steps
We demonstrate the use of PRECAST to two sliced human breast cancer Visium data that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/bc2.rda?raw=true"
download.file(githubURL, "bc2.rda", mode = "wb")
Then load to R
This data is also available at 10X genomics data website:
Users require the two folders for each dataset: spatial and filtered_feature_bc_matrix. Then the data can be read by the following commond.
# library(DR.SC) dir.file <- 'Section' ## the folders Section1 and Section2, and each includes
# two folders spatial and filtered_feature_bc_matrix seuList <- list() for (r in 1:2) {
# message('r = ', r) seuList[[r]] <- read10XVisium(paste0(dir.file, r)) } bc2 <- seuList
The package can be loaded with the command:
View human breast cancer Visium data from DataPRECAST
Check the content in bc2
Human breast cancer data have been pre-processed and saved as the PRECASTObj
format.
## check the number of genes/features after filtering step
PRECASTObj@seulist
## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, verbose = TRUE, int.model = NULL)
For function PRECAST
, users can specify the number of clusters \(K\) or set K
to be an integer vector by using modified BIC(MBIC) to determine \(K\). First, we try using user-specified number of clusters. For convenience, we give the selected number of clusters by MBIC (K=14).
Select a best model
## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- selectModel(PRECASTObj)
Integrate the two samples by the function IntegrateSpaData
.
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.
Show the spatial scatter plot for clusters
cols_cluster <- c("#FD7446", "#709AE1", "#31A354", "#9EDAE5", "#DE9ED6", "#BCBD22", "#CE6DBD", "#DADAEB",
"yellow", "#FF9896", "#91D1C2", "#C7E9C0", "#6B6ECF", "#7B4173")
p12 <- SpaPlot(seuInt, batch = NULL, point_size = 1, cols = cols_cluster, combine = TRUE)
p12
# users can plot each sample by setting combine=FALSE
Show the spatial UMAP/tNSE RGB plot to illustrate the performance in extracting features.
seuInt <- AddUMAP(seuInt)
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 2, combine = TRUE, text_size = 15)
p13
# seuInt <- AddTSNE(seuInt) SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2,
# combine=T, text_size=15)
Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.
seuInt <- AddTSNE(seuInt, n_comp = 2)
library(patchwork)
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster) # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif")
p1 + p2
Combined differential expression analysis
library(Seurat)
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 10
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
seuInt <- ScaleData(seuInt)
seus <- subset(seuInt, downsample = 400)
Plot DE genes’ heatmap for each spatial domains identified by PRECAST.
color_id <- as.numeric(levels(Idents(seus)))
library(ggplot2)
## HeatMap
p1 <- doHeatmap(seus, features = top10$gene, cell_label = "Domain", grp_label = F, grp_color = cols_cluster[color_id],
pt_size = 6, slot = "scale.data") + theme(legend.text = element_text(size = 10), legend.title = element_text(size = 13,
face = "bold"), axis.text.y = element_text(size = 5, face = "italic", family = "serif"))
p1