BASIC: B-cell receptor (BCR) assembly from single cell RNA-seq

BASIC is a semi-de novo assembly method to determine the full-length sequence of the BCR in single B cells from scRNA-seq data.

Download BASIC

Download: BASIC.tar.gz.
New version (Nov 5, 2017; see FAQ)!

Pre-requisites and Installation

Example data

To demonstrate the utility of our software, we subjected single B cells from a human donor to scRNA-seq.
The data can be downloaded from [E-MTAB-4745].
For the example below, the PW1_A1 cell data is also available locally from here.

BCR reconstruction

BASIC assembles BCR heavy and light chain sequences at the single-cell level.


Using the A1 cell as an example (see Example data), open your terminal and run:

$ python -b <path to Bowtie2> -SE A1_001.fastq.gz 

The heavy and light chain sequences are written to result.txt.

Additional arguments


$ python -h 


usage: [-h] [-p CONSTANT_VALUE] [-n NAME] [-SE FASTQ]
                [-PE_1 LEFT] [-PE_2 RIGHT] [-g GENOME] [-b BOWTIE]
                [-o OUTPUT_LOCATION] [-v] [--version]

optional arguments:
  -h, --help          Show this help message and exit
  -p CONSTANT_VALUE   Launch p > 2 threads that will run on separate
                      processors/cores (default: 2)
  -n NAME             Name of output file (default: result)
  -SE FASTQ           Single end FASTQ file 
                      (example: se.fastq)
  -PE_1 LEFT          Paired end (left) FASTQ file
                      -PE_2 is required and pairs must match order
                      (example: pe_1.fastq)
  -PE_2 RIGHT         Paired end (right) FASTQ file
                      (example: pe_2.fastq)
  -g GENOME           hg19 or mm10 (default: hg19)
  -b BOWTIE           Absolute path to directory that contains the bowtie2 executable
  -o OUTPUT_LOCATION  Output dir (default: none -- current working directory)
  -v                  Turns on verbosity (more details)
  --version           Show BASIC version number and exit
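
The help text for -PE_1 notes that the two paired-end files must list their reads in the same order. As a rough pre-flight check before running BASIC, the sketch below compares read IDs across the two files. It is not part of BASIC; the function names are hypothetical, and it assumes standard 4-line FASTQ records whose mate suffixes, if present, are /1 and /2.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, handling gzip-compressed (.gz) files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_ids(path):
    """Yield the read ID from the header line of each 4-line FASTQ record."""
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # skip sequence, '+', and quality lines
            fields = line[1:].split()  # drop '@' and any trailing comment
            if not fields:
                continue  # ignore a blank line in a header position
            read_id = fields[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]  # strip the mate suffix
            yield read_id


def pairs_match(left_path, right_path):
    """Return True if both files list the same read IDs in the same order."""
    pairs = zip_longest(read_ids(left_path), read_ids(right_path))
    return all(left == right for left, right in pairs)
```

For example, pairs_match("pe_1.fastq", "pe_2.fastq") returns False when one file is shorter than the other or when any pair of IDs differs, which suggests the files should be re-synchronized before being passed to -PE_1 and -PE_2.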


FAQ

Q1) Does de novo assembly in BASIC use paired-end information?
A1) BASIC does not currently use pairing information to guide de novo assembly.

Q2) How can I analyze multiple samples with BASIC simultaneously?
A2) Since most modern computers are multi-core machines, a simple bash script allows you to process multiple scRNA-seq samples with BASIC at the same time. For best performance, the total number of simultaneous BASIC instances should be ≈ (total number of cores)/2, since BASIC uses 2 cores by default.
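
One way to apply this rule of thumb is sketched below in Python rather than bash. It is an illustration, not a script shipped with BASIC: the function names are hypothetical, and basic_script is a placeholder for the path to your BASIC script. The -b, -SE, -n, and -o flags are those listed under Additional arguments; -p is omitted so each run keeps the default of 2 threads.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def max_instances(total_cores=None, threads_per_instance=2):
    """Number of simultaneous BASIC runs: about cores / threads per run."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // threads_per_instance)


def build_command(basic_script, bowtie_dir, fastq, name, out_dir):
    """Build one single-end BASIC command line as an argument list."""
    return ["python", basic_script,
            "-b", bowtie_dir,   # directory containing the bowtie2 executable
            "-SE", fastq,       # single-end FASTQ file for this cell
            "-n", name,         # output file name, one per cell
            "-o", out_dir]      # output directory


def run_all(commands, workers):
    """Run commands with at most `workers` at a time; return exit codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(subprocess.call, commands))


# Example usage (paths are placeholders):
#   cells = {"A1": "A1_001.fastq.gz", "A2": "A2_001.fastq.gz"}
#   cmds = [build_command("<path to BASIC script>", "<path to Bowtie2>",
#                         fq, name, "results") for name, fq in cells.items()]
#   exit_codes = run_all(cmds, max_instances())
```

Giving each cell its own -n value keeps the runs from overwriting one another's result file.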

Q3) Should I use single-end sequencing or paired-end sequencing?
A3) We recently analyzed data obtained from both single-end and paired-end sequencing. Paired-end reads have a clear advantage because they allow greater coverage of the BCR transcript. Sequencing coverage and expression of the BCR remain the primary determinants of successful BCR assembly.

Q4) How can I use pipeline-based read trimming tools (such as Trim Galore or Trimmomatic) with BASIC?
A4) BASIC can now assemble BCRs using reads of different lengths that arise from read trimming. We have corrected a bug in our software and released a new version of BASIC (1.2 beta) compatible with different read lengths. We thank the Fabio Lucian Lab for bringing the bug to our attention. Depending on the amount of sequencing noise present in your data, we recommend trying BASIC with and without read trimming to determine if read trimming is necessary.


Sanger sequencing data for each single cell can be downloaded here.


1.2 Beta (Nov 5, 2017)

1.0.1 (July 15, 2016)


Canzar S.#, Neu KE.#, Wilson PC. and Khan AA. BASIC: BCR assembly from single cells. Submitted 2016. In review. (#Equal contribution)


Please contact: for any questions or comments.


Software provided to academic users under MIT License