Vignette 1 -- PIC counting with 10X Cell Ranger output
vignette-1----PIC-counting-with-10X-Cell-Ranger-output.Rmd
content
This Vignette contains an example to use PIC counting to construct a Peak-by-Cell matrix on 10X Cell Ranger output.
required libraries
Please make sure the following libraries are installed and loaded for the analysis.
Input files
The following files are required input for PIC counting:
- cell barcodes with metadata (
singlecell.csv
) - list of peak regions (
peaks.bed
) - fragment files (
fragment.tsv.gz
)
Below are the step-by-step instructions for obtaining these input from 10X Cell Ranger output files. Example datasets can be downloaded from 10X Genomics website here.
Cell Barcodes
If we want to keep the cells from Cell Ranger filtering scheme:
meta.data <- read.csv("atac_pbmc_5k_nextgem_singlecell.csv", header = TRUE)
# -- Note, in your own Cell Ranger output, the file name will be 'singlecell.csv'
meta.data_filtered <- meta.data[meta.data$is__cell_barcode == 1, ]
cells <- meta.data_filtered$barcode
Alternatively, we can use the 'singlecell.csv'
file to
conduct customized filtering based on the total number of reads and TSS
enrichment, etc.
Peak set
If we want to keep the set of peaks from Cell Ranger:
peaks <- data.table::fread("atac_pbmc_5k_nextgem_peaks.bed", header = FALSE)
# -- in your own Cell Ranger output, the file name will be 'peaks.bed'
colnames(peaks) <- c("seqname", "start", "end")
peak_sets <- GenomicRanges::makeGRangesFromDataFrame(peaks)
Note that Cell Ranger tend to output wider peaks and the peak set is
not cell-type specific, so MACS2
is often used to call
peaks for each cell type separately before combining into one union set
of peaks. To use MACS2-called peaks, just change the file name
accordingly.
Run PIC and save output
Load the whole file (memory heavy)
fragment_tsv_gz_file_location <- "atac_pbmc_5k_nextgem_fragments.tsv.gz"
pic_mat <- PIC_counting(
cells = cells,
fragment_tsv_gz_file_location = fragment_tsv_gz_file_location,
peak_sets = peak_sets
)
saveRDS(pic_mat, "PIC_mat.rds")
Note if you have multiple matrices, please make sure you save them as different files.
Load the file dynamically via Rsamtools
Alternatively, for very large fragment files, we may want to load it
dynamically. This is simple with PIC_counting, we just need to specify
load_full = FALSE
and PIC_counting will load the file for
each chromosome separately.
fragment_tsv_gz_file_location <- "atac_pbmc_5k_nextgem_fragments.tsv.gz"
pic_mat <- PIC_counting(
cells = cells,
fragment_tsv_gz_file_location = fragment_tsv_gz_file_location,
peak_sets = peak_sets,
load_full = FALSE
)
saveRDS(pic_mat, "PIC_mat.rds")