Skip to contents

content

This Vignette contains an example to use PIC counting in ArchR workflow.

required libraries

Please make sure the following libraries are installed and loaded for the analysis.

Input files

Below are the step-by-step instructions for obtaining these input. Here we used the ArchR tutorial dataset to illustrate how to add PIC counting output to ArchR analysis. Please also refer to ArchR Vignettes for additional information of ArchR workflow

Please make sure to download the tutorial dataset by running getTutorialData() function:

Load Example Dataset and Creating Arrow Files

Here we used ArchR example dataset as

inputFiles <- getTutorialData("Hematopoiesis")
# inputFiles
addArchRGenome("hg19")
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  minTSS = 4, # Dont set this too high because you can always increase later
  minFrags = 1000,
  addTileMat = FALSE,
  addGeneScoreMat = FALSE
)

Inferring doublet

We have this section as ArchR tutorial recommended. But this is option and please make sure you understand the process of doublet inferrence before executing this part

doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, # Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP", # Refers to the embedding to use for nearest neighbor search.
  LSIMethod = 1
)

Create an ArchRproject

proj <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE # This is recommened so that you maintain an unaltered
  ## copy for later usage.
)

Peak calling with MACS2

Please make sure MACS2 is installed before running this. ArchR also supports other peak calling methods, please refer to ArchR manual for additional options.

pathToMacs2 <- findMacs2()
proj <- addReproduciblePeakSet(
  ArchRProj = proj,
  pathToMacs2 = pathToMacs2
)

Obtain PIC matrix from ArchRproject input

Here, we have three samples, we compute the PIC matrix for each of them separately.

pic_mat <- rep(list(), length = length(inputFiles))
for (s in seq_along(inputFiles)) {
  ## select one sample
  sample_sel <- names(inputFiles)[s]
  allCells <- rownames(getCellColData(proj)) ## all cells after filtering
  cells_sample <- allCells[startsWith(allCells, sample_sel)]
  cells_sample_name_trunc <- sub(".*\\#", "", cells_sample)

  peak_sets <- proj@peakSet ## the peak set, from MACS2 peak calling results
  names(peak_sets) <- NULL ## remove the peakset names
  fragment_tsv_gz_file_location <- inputFiles[s]

  pic_mat[[s]] <- PIC_counting(
    cells = cells_sample_name_trunc,
    fragment_tsv_gz_file_location = fragment_tsv_gz_file_location,
    peak_sets = peak_sets
  )

  ## change back the cell names to match with original input
  colnames(pic_mat[[s]]) <- cells_sample
}

## we can run the following codes to get the complete peak by cell matrix
## and we may save this matrix for tuture uses (e.g., if we hope to run
## other analyses)

# pic_mat_comb <- do.call(cbind, pic_mat)
# pic_mat_comb <- pic_mat_comb[,rownames(getCellColData(proj))]

Add PIC matrix to ArchRproject object

Here we iteratively add PIC matrix to the ArchR object

for (s in seq_along(inputFiles)) {
  ArchR:::.addMatToArrow(
    mat = pic_mat[[s]],
    ArrowFile = ArrowFiles[s],
    group = "PIC",
    binarize = FALSE
  )
}

Save and load ArchRProject

## to save object
proj <- saveArchRProject(ArchRProj = proj)

## to load object
proj <- loadArchRProject(path = "HemeTutorial")

Reference

If you used PIC-snATAC counting in your analysis, please cite our manuscript:

Miao Z and Kim J. Uniform quantification of single-nucleus ATAC-seq data with Paired- Insertion Counting (PIC) and a model-based insertion rate estimator. Nature Methods 2023 (In press)

ArchR manuscript:

Granja, Jeffrey M., et al. “ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis.” Nature genetics 53.3 (2021): 403-411.