library(STANCE)
## Warning: replacing previous import 'gaston::sigma' by 'stats::sigma' when
## loading 'STANCE'
library(Matrix)
library(ggplot2)
library(reshape2)
library(dplyr)
Mouse olfactory bulb replicate No.8 is a \(100 \mu m^2\) resolution spatial transcriptomics dataset of 15,928 genes measured on 262 spots. Following the procedure of Miller et al. (2022), data preprocessing and cell type deconvolution by STdeconvolve have been conducted, resulting in the mOB dataset, which is a list containing the gene expression, spot location coordinates and cell type compositions.
Specifically, counts contains the gene expression of 7,365 genes across 260 spots.
data(mOB)
mOB$counts[1:5,1:5]
## ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ACATCACCTGCGCGCTCT
## 0610007P14Rik 1 3 3
## 0610009B22Rik 0 0 0
## 0610009D07Rik 1 2 5
## 0610009O20Rik 0 2 2
## 0610010F05Rik 0 0 1
## ACATTTAAGGCGCATGAT ACCACTGTAATCTCCCAT
## 0610007P14Rik 4 1
## 0610009B22Rik 1 0
## 0610009D07Rik 5 1
## 0610009O20Rik 1 1
## 0610010F05Rik 2 0
pos is the spatial location coordinates of the 260 spots.
mOB$pos[1:5,]
## x y
## ACAACTATGGGTTGGCGG 16.001 16.036
## ACACAGATCCTGTTCTGA 26.073 15.042
## ACATCACCTGCGCGCTCT 13.048 21.079
## ACATTTAAGGCGCATGAT 13.963 18.117
## ACCACTGTAATCTCCCAT 24.104 13.074
prop is cell type proportion of 12 cell types across the 260 spots.
mOB$prop[1:5,]
## 1 2 3 4 5
## ACAACTATGGGTTGGCGG 0.11641560 0.0000000 0.08669002 0.0000000 0.0000000
## ACACAGATCCTGTTCTGA 0.00000000 0.1561022 0.00000000 0.0000000 0.0000000
## ACATCACCTGCGCGCTCT 0.09625801 0.0000000 0.00000000 0.3728770 0.1578947
## ACATTTAAGGCGCATGAT 0.11485435 0.0000000 0.00000000 0.7181721 0.1669736
## ACCACTGTAATCTCCCAT 0.07077041 0.0000000 0.00000000 0.0000000 0.0914882
## 6 7 8 9 10
## ACAACTATGGGTTGGCGG 0.0000000 0.1204314 0.13020670 0.09144814 0.1808773
## ACACAGATCCTGTTCTGA 0.0000000 0.0000000 0.08471756 0.00000000 0.0000000
## ACATCACCTGCGCGCTCT 0.1174574 0.0000000 0.00000000 0.25551295 0.0000000
## ACATTTAAGGCGCATGAT 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000
## ACCACTGTAATCTCCCAT 0.0000000 0.0000000 0.23230714 0.00000000 0.2202639
## 11 12
## ACAACTATGGGTTGGCGG 0.2739309 0.0000000
## ACACAGATCCTGTTCTGA 0.0000000 0.7591802
## ACATCACCTGCGCGCTCT 0.0000000 0.0000000
## ACATTTAAGGCGCATGAT 0.0000000 0.0000000
## ACCACTGTAATCTCCCAT 0.3256000 0.0595703
To run STANCE, we need to first create a STANCE object using
creatSTANCEobject function, which requires 4 inputs: 1.
counts: gene expression matrix with genes in rows and
spots in columns; 2. pos: dimensional location
coordinates matrix. The rownames of pos matrix should
match the colnames of counts matrix; 3.
prop: he cell type proportion of each spot. Rows
represent spots and columns represent cell types. The rownames of
prop matrix should match the colnames of
counts matrix; 4. covariates (default
NULL) the covariate (if any) design matrix modeling gene expression. The
rownames of covariates matrix should match the colnames
of counts matrix.
All the inputs should be convertible to ‘matrix’ form.
mySTANCE <- creatSTANCEobject(counts = mOB$counts,
pos = mOB$pos,
prop = mOB$prop,
covariates = NULL)
Normalize gene expression count data, scale location coordinates and
perform quality conrol steps, including removing spots with gene
expression less than \(10\), and
removing genes whose proportion of zero-express spots greater than \(0.05\) by default.
normalized = TRUE is used to specify that the gene
expression input has been normalized so as to skip the normalization
step.
mySTANCE <- data_preprocess(object = mySTANCE,
gene.threshold = 0.05, spot.threshold = 10,
normalized = FALSE)
## The location matrix has been scaled.
## 0 spots with gene expression less than 10 have been removed.
## 0 mt genes and the genes with gene expression rate less than 0.05 have been removed.
## Normalizing the count data...
Build the Gaussian kernel matrix to model spatial correlation between spots. The Gaussian kernel bandwidth will be selected automatically: - for datasets with 5000 spots or fewer, non-parametric Sheather-Jones’ bandwidths are computed for each gene, and the median of these values is used as the common bandwidth; - for datasets with more than 5000 spots, Silverman’s bandwidths are computed for each gene, and the median of these gene-specific bandwidths is used as the common bandwidth.
This step also constructs the important matrices, including design matrices \(\mathbf{X}\), covariance matrices (i.e. \(\mathbf{K}\) for spatial effects \(\gamma_k\) and \(\mathbf{I}\) for residual error \(\varepsilon\)) and \(\mathbf{\Sigma}_k = \mathbf{\Pi}_k \mathbf{K} \mathbf{\Pi}_k^T\) matrices for all the \(\gamma_k\)’s, before fitting STANCE model.
mySTANCE <- build_kernelMatrix(object = mySTANCE)
## Building kernel matrix...
## Computing covariance matrices...
Here, for efficiency, we skip the bias correction procedure
(correction = F). Use “Benjamini-Yekutieli” method to
adjust p-values to control false discovery rate
(pv.adjust = "BY").
mySTANCE <- runTest1(object = mySTANCE,
correction = F, pv.adjust = "BY")
## Start running STANCE Stage 1 test...
The function outputs a data frame containing the test statistic values, the associated p-values as well as the “BY” adjusted p-values.
head(mySTANCE@Test_1)
## statistic p_value p_value_adj
## 0610007P14Rik 697.0190 0.0730441 1
## 0610009B22Rik 643.3463 0.5322611 1
## 0610009D07Rik 339.1009 0.6041702 1
## 0610009O20Rik 890.6290 0.2941044 1
## 0610010F05Rik 705.0588 0.5180449 1
## 0610010K14Rik 694.9316 0.1702137 1
# List for all the genes in the experiment
genes.list <- rownames(mySTANCE@gene_expression)
# List for utSVGs detected by STANCE overall test
utSVG.list <- genes.list[mySTANCE@Test_1$p_value_adj < 0.05]
utSVG.list[171:180]
## [1] "Cpe" "Cplx1" "Cplx2" "Cpne2" "Cpne4" "Cpne6" "Crim1" "Crip2" "Crtc1"
## [10] "Cryab"
Perform STANCE individual test on each of genes passing the overall
test (we called these genes “utSVGs”).
Cell_types_to_test = NULL indicates that all the cell types
would be tested. Genes_to_test = utSVG.list[1:100]
specifies that the first 100 significant genes detected by overall test
will be tested.
mySTANCE <- runTest2(object = mySTANCE,
Genes_to_test = utSVG.list[101:200],
Cell_types_to_test = NULL,
correction = F,
ncores = 1)
## Start running STANCE Stage 2 test...
The function outputs a list containing a data frame of test statistic values, as well as the associated p-values for each cell type.
# Individual testing results for cell type "1"
head(mySTANCE@Test_2[["1"]])
## statistic p_value
## Cabin1 32.189117 0.006338967
## Cabp1 8.277671 0.720506455
## Cacnb3 5.408803 0.627886687
## Cacnb4 4.204393 0.796376056
## Cacng3 8.888090 0.828823859
## Cadm2 9.944913 0.437784111
# Individual testing results for cell type "2"
head(mySTANCE@Test_2[["2"]])
## statistic p_value
## Cabin1 34.27551 0.5671753
## Cabp1 37.02897 0.5966591
## Cacnb3 27.48364 0.4167491
## Cacnb4 43.96634 0.1495813
## Cacng3 82.10546 0.2512983
## Cadm2 12.18509 0.9038280
Genes_to_test = genes.sig.1[1:100] specifies that the
first 100 significant genes detected by overall test will be
estimated.
mySTANCE <- varcomp_est(object = mySTANCE,
Genes_to_est = utSVG.list[101:200],
ncores = 1)
head(mySTANCE@VarComp_estimates)
## 1 2 3 4 5 6
## Cabin1 0.2707131 0.00000100 0.03740342 0.06001812 0.0000010 0.04635161
## Cabp1 0.0000010 0.00000100 0.85834618 0.10474024 0.2698156 0.02106626
## Cacnb3 0.0000010 0.00000100 0.07156818 0.08217742 0.7436404 0.15448227
## Cacnb4 0.0000010 0.03835654 0.93057709 0.00000100 0.1765448 0.09052980
## Cacng3 0.0000010 0.03935319 0.00000100 0.10606986 0.0000010 0.13619183
## Cadm2 0.0000010 0.00000100 0.00000100 0.09824489 0.1126940 0.17077836
## 7 8 9 10 11 12
## Cabin1 0.0000010 0.000001000 0.00000100 0.000001000 0.00000100 0.02082207
## Cabp1 0.0000010 0.009390651 0.03807576 1.128242690 0.31302512 0.02143620
## Cacnb3 0.0000010 0.000001000 0.11339463 0.006910621 0.36386536 0.00000100
## Cacnb4 0.0000010 0.000001000 0.00000100 0.877788517 1.11742047 0.00000100
## Cacng3 0.0000010 0.000001000 0.00000100 0.000001000 0.03674453 0.00000100
## Cadm2 0.1226651 0.043506983 0.13440258 0.000001000 0.00000100 0.08973406
## Error_residuals
## Cabin1 0.06559035
## Cabp1 0.05293168
## Cacnb3 0.10684810
## Cacnb4 0.11050081
## Cacng3 0.04797411
## Cadm2 0.06939496
Given the significance level 0.05, pick out top 20 ctSVGs (if less
than 20, then all the ctSVGs will be displayed) for cell type of
interest (CT_of_interest = "1").
mySTANCE <- CT_topGenes(object = mySTANCE,
CT_of_interest = "1", numTopGenes = 20)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
# Print the list of top genes for cell type "1"
cat(paste0('The top cell type specific SVGs for cell type \"',
mySTANCE@cell_type_top_genes$cell_type_of_interest,
'\": \n',
paste(mySTANCE@cell_type_top_genes$top_genes, collapse = ","),
'. \n'))
## The top cell type specific SVGs for cell type "1":
## Cpne2,Cabin1,Dab2,Crip2,Chst8,Cbx6,Cxx1b,D4Wsu53e,Cdk17,Clstn2.
Draw a stacked bar plot for top genes visualizing the proportion of variance explained by all the cell types and error.
print(mySTANCE@cell_type_top_genes$stacked_bar_plot)
For a given gene, its expression levels across spatial locations
within the tissue can be visualized, providing an intuitive
representation of spatial gene expression patterns. Here, we employ the
visualizeGenePattern function to display the spatial
expression pattern of “Cabin1,” which was identified as a
cell-type-1-specific ctSVG by the STANCE Stage 2 test results.
When setting scaled_gene_expression = TRUE, the gene
expression \(y_i\) at the \(i\)-th spot is scaled using the formula:
\[y^{\text{scaled}}_i = \frac{y_i -
\min(\mathbf{y})}{\max(\mathbf{y}) - \min(\mathbf{y})}\] where
\(\mathbf{y}\) represents the vector of
gene expression values across all spots. This scaling ensures the
expression values are normalized to a range between 0 and 1,
facilitating comparison and visualization.
visualizeGenePattern(object = mySTANCE, gene_to_display = "Cabin1",
filling_colors = c("white", "darkblue"),
scaled_gene_expression = TRUE,
point_size = 4, title_size = 16, legend_title_size = 14, legend_text_size = 10)