BASIC: B-cell receptor (BCR) assembly from single cell RNA-seq

BASIC is a semi-de novo assembly method to determine the full-length sequence of the BCR in single B cells from scRNA-seq data.

Download BASIC

Download: BASIC.tar.gz.

Pre-requisites and Installation

Example data

To demonstrate the utility of our software, we subjected single B cells from a human donor to scRNA-seq.
The data can be downloaded from [E-MTAB-4745].
For the example below, the PW1_A1 cell data is also available locally from here.

BCR reconstruction

BASIC assembles BCR heavy and light chain sequences at the single-cell level.

Usage

Using the A1 cell as an example (see Example data), open your terminal and run:

$ python BASIC.py -b <path to Bowtie2> -SE A1_001.fastq.gz 

The heavy and light chain sequences will be present in result.txt.

Additional arguments

Run:

$ python BASIC.py -h 

Output:

usage: BASIC.py [-h] [-p CONSTANT_VALUE] [-n NAME] [-SE FASTQ]
                [-PE_1 LEFT] [-PE_2 RIGHT] [-g GENOME] [-b BOWTIE]
                [-o OUTPUT_LOCATION] [-v] [--version]

optional arguments:
  -h, --help          Show this help message and exit
  -p CONSTANT_VALUE   Launch p > 2 threads that will run on separate
                      processors/cores (default: 2)
  -n NAME             Name of output file (default: result)
  -SE FASTQ           Single end FASTQ file 
                      (example: se.fastq)
  -PE_1 LEFT          Paired end (left) FASTQ file
                      -PE_2 is required and pairs must match order
                      (example: pe_1.fastq)
  -PE_2 RIGHT         Paired end (right) FASTQ files
                      (example: pe_2.fastq)
  -g GENOME           hg19 or mm10 (default: hg19)
  -b BOWTIE           Absolute path to directory that contains the bowtie2 executable
  -o OUTPUT_LOCATION  Output dir (default: none -- current working directory)
  -v                  Turns on verbosity (more details)
  --version           Show BASIC version number and exit
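
The options listed above can be combined into a single command line. The following is a minimal sketch, not part of BASIC itself, that builds such a command using only the flags shown in the help output. The helper name build_basic_command and the path /opt/bowtie2 are hypothetical and chosen for illustration; optional flags are added only when a value is supplied, so BASIC's own defaults apply otherwise.

```python
import shlex


def build_basic_command(bowtie_dir, single_end=None, pair=None,
                        genome=None, name=None, output_dir=None, threads=None):
    """Return the argument list for one BASIC run (hypothetical helper)."""
    # Exactly one input mode must be chosen: -SE, or -PE_1 together with -PE_2.
    if (single_end is None) == (pair is None):
        raise ValueError("provide exactly one of single_end or pair")

    cmd = ["python", "BASIC.py", "-b", bowtie_dir]
    if single_end is not None:
        cmd += ["-SE", single_end]
    else:
        left, right = pair  # the help text requires both files, in matching order
        cmd += ["-PE_1", left, "-PE_2", right]

    # Optional flags are appended only when set, leaving BASIC's defaults alone.
    if genome is not None:
        cmd += ["-g", genome]
    if name is not None:
        cmd += ["-n", name]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if threads is not None:
        cmd += ["-p", str(threads)]
    return cmd


if __name__ == "__main__":
    # /opt/bowtie2 is a placeholder for the directory containing bowtie2.
    cmd = build_basic_command("/opt/bowtie2",
                              pair=("pe_1.fastq", "pe_2.fastq"),
                              genome="hg19", name="A1")
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.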

FAQ

Q1) Does de novo assembly in BASIC use paired-end information?
A1) BASIC does not currently use pairing information to guide de novo assembly.

Q2) How can I analyze multiple samples with BASIC simultaneously?
A2) Since most modern computers are multi-core machines, a simple bash script allows you to process multiple scRNA-seq samples with BASIC at the same time. For best performance, the total number of simultaneous BASIC instances should be ≈ (total number of cores)/2, since BASIC uses 2 cores by default.
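
The answer above suggests a bash script; the sketch below shows the same idea in Python instead, as one possible approach rather than a recommended implementation. It caps the number of simultaneous BASIC instances at roughly (total cores)/2 and names each output after its FASTQ file. The function names, and the assumption that BASIC.py is in the current working directory and that the inputs are single-end, are ours.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def max_parallel_runs(total_cores, threads_per_run=2):
    """Instances to run at once: cores divided by threads per run, at least 1."""
    return max(1, total_cores // threads_per_run)


def sample_name(fastq):
    """Derive an output name from a FASTQ path, e.g. A1_001.fastq.gz -> A1_001."""
    return os.path.basename(fastq).split(".")[0]


def run_sample(fastq, bowtie_dir):
    """Run BASIC on one single-end FASTQ file and return (name, exit code)."""
    name = sample_name(fastq)
    # Assumes BASIC.py is in the current working directory.
    cmd = ["python", "BASIC.py", "-b", bowtie_dir, "-SE", fastq, "-n", name]
    return name, subprocess.run(cmd).returncode


def run_all(fastqs, bowtie_dir, total_cores=None):
    """Process every sample, keeping at most cores/2 BASIC runs active."""
    workers = max_parallel_runs(total_cores or os.cpu_count() or 2)
    # Threads only wait on the external processes, so a thread pool suffices.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda fq: run_sample(fq, bowtie_dir), fastqs))
```

A call such as run_all(list_of_fastq_paths, bowtie_dir) returns one (name, exit code) pair per sample; a non-zero exit code marks a run that should be inspected.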

Q3) Should I use single-end sequencing or paired-end sequencing?
A3) We recently analyzed data obtained from both single-end and paired-end sequencing. Paired-end reads have a clear advantage because they allow greater coverage of the BCR transcript. Sequencing coverage and expression of the BCR remain the primary determinants of successful BCR assembly.

Evaluation

Sanger sequencing data for each single cell can be downloaded here.

Version

1.0.1 (July 15, 2016)

Publication

Canzar S.#, Neu KE.#, Wilson PC. and Khan AA. BASIC: BCR assembly from single cells. Submitted 2016. In review. (#Equal contribution)

Contact

Please contact aakhan@ttic.edu with any questions or comments.

License

Software provided to academic users under the MIT License.