Vignette 5 -- DAR analysis with PIC parametric framework
vignette-5----DAR-analysis-with-PIC-parametric-framework.Rmd
Content
This Vignette contains an example to conduct DAR analysis with PIC parametric framework. We developed a statistical framework to test the differential accessible regions (DARs) between cell types. Our framework incorporated the distribution of snATAC-seq count data (PIC count, can also run on fragment count), and will have a higher power in detecting DARs in snATAC-seq data
Required libraries
Please make sure the following libraries are installed and loaded for the analysis.
Input files
The PIC test framework requires the following input:
- peak by cell PIC count matrix (
pic_mat
) - cell type labels (
cell_types
)
Note, please refer to our vignettes for obtaining
pic_mat
, and clustering and annotation can be conducted
with Seurat, ArchR, or other pipelines. Below, we describe the two steps
of the test
Cell-specific capturing rate estimation
As described in the manuscript, in the snATAC-seq data, each cell may have largely variant capturing rate (sequencing depth), and here, we will compute the caputuring rate in each cell while separating the effect of cell type-specific open probability
r_by_ct_out <- get_r_by_ct_mat_pq(
cell_type_set = unique(cell_types),
r_by_c = pic_mat,
cell_type_labels = cell_types,
n_features_per_cell = dim(pic_mat)[1]
)
The object r_by_ct_out
is a list with two elements,
- p_by_t Peak by cell type matrix, each element represents the open probability of the peak in the corresponding cell type
- q_vec A vector of cell-specific capturing rate
The second element will be the input of the next function
DAR test
With the cell-specific caputuring rate, we can now execute our DAR test function
p_vals <- DAR_by_LRT(pic_mat = pic_mat, capturing_rates = r_by_ct_out$q_vec, cell_type_labels = cell_types)
The output is a p value vector, specifying the p value for each peak. In addition, you can calculate the mean log fold change after adjusting for the capturing rate to add additional criteria for determining the set of DAR. The example code is provided below:
cell_type_set <- unique(cell_types)
capturing_rates <- r_by_ct_out$q_vec
## mean accessibility values after correcting for sequencing depth
rm1 <- rowSums(pic_mat[, cell_types == cell_type_set[1]] %*%
diag(1 / capturing_rates[cell_types == cell_type_set[1]])) /
sum(cell_types == cell_type_set[1])
rm2 <- rowSums(pic_mat[, cell_types == cell_type_set[2]] %*%
diag(1 / capturing_rates[cell_types == cell_type_set[2]])) /
sum(cell_types == cell_type_set[2])
## sequencing depth corrected log fold change
log_fc_depth_corrected <- log(rm2 + 0.0001) - log(rm1 + 0.0001) # 0.0001 is
## added to prevent log(0)
## in situations sequencing depth disparity is not a concern, we can also
## use the uncorrected mean accessibility value, the codes are
rm1_unc <- rowMeans(pic_mat[, cell_types == cell_type_set[1]])
rm2_unc <- rowMeans(pic_mat[, cell_types == cell_type_set[2]])
log_fc_uncorrected <- log(rm2_unc + 0.0001) - log(rm1_unc + 0.0001) # 0.0001
## is added to prevent log(0)