BASIC is a semi-de novo assembly method to determine the full-length sequence of the BCR in single B cells from scRNA-seq data.
To demonstrate the utility of our software, we subjected single B cells from a human donor to scRNA-seq.
Data can be downloaded from ArrayExpress under accession [E-MTAB-4745].
For the example below, the PW1_A1 cell data is also available locally from here.
BASIC assembles BCR heavy and light chain sequences at the single-cell level.
Using the A1 cell as an example (see Example data), open your terminal and run:
$ python BASIC.py -b <path to Bowtie2> -SE A1_001.fastq.gz
The heavy and light chain sequences will be present in result.txt.
$ python BASIC.py -h
usage: BASIC.py [-h] [-p CONSTANT_VALUE] [-n NAME] [-SE FASTQ] [-PE_1 LEFT]
                [-PE_2 RIGHT] [-g GENOME] [-b BOWTIE] [-o OUTPUT_LOCATION]
                [-v] [--version]

optional arguments:
  -h, --help          Show this help message and exit
  -p CONSTANT_VALUE   Launch p > 2 threads that will run on separate
                      processors/cores (default: 2)
  -n NAME             Name of output file (default: result)
  -SE FASTQ           Single end FASTQ file (example: se.fastq)
  -PE_1 LEFT          Paired end (left) FASTQ file -PE_2 is required and
                      pairs must match order (example: pe_1.fastq)
  -PE_2 RIGHT         Paired end (right) FASTQ files (example: pe_2.fastq)
  -g GENOME           hg19 or mm10 (default: hg19)
  -b BOWTIE           Absolute path to directory that contains the bowtie2
                      executable
  -o OUTPUT_LOCATION  Output dir (default: none -- current working directory)
  -v                  Turns on verbosity (more details)
  --version           Show BASIC version number and exit
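The options in the usage message can be combined for single-end or paired-end input. The following is a minimal sketch of how such a command line might be put together from Python. The helper `build_basic_command`, the Bowtie2 directory `/opt/bowtie2`, and the file names `A1_R1.fastq.gz` and `A1_R2.fastq.gz` are hypothetical and not part of BASIC; only the flags themselves come from the usage message.

```python
import shlex


def build_basic_command(bowtie_dir, single=None, left=None, right=None,
                        genome="hg19", name="result", output_dir=None):
    """Build an argument list for BASIC.py (hypothetical helper, not part of BASIC)."""
    paired = left is not None or right is not None
    if single is not None and paired:
        raise ValueError("give either a single-end file or a paired-end pair, not both")
    if single is None and (left is None or right is None):
        raise ValueError("paired-end input needs both -PE_1 and -PE_2")

    # Flags below are taken from the usage message of BASIC.py.
    cmd = ["python", "BASIC.py", "-b", bowtie_dir, "-g", genome, "-n", name]
    if single is not None:
        cmd += ["-SE", single]
    else:
        cmd += ["-PE_1", left, "-PE_2", right]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


# Example with made-up paths: print the command instead of running it.
example = build_basic_command("/opt/bowtie2",
                              left="A1_R1.fastq.gz",
                              right="A1_R2.fastq.gz",
                              name="A1")
print(shlex.join(example))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` once the paths point to real files.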
Q1) Does de novo assembly in BASIC use paired-end information?
A1) BASIC does not currently use pairing information to guide de novo assembly.
Q2) How can I analyze multiple samples with BASIC simultaneously?
A2) Since most modern computers are multi-core machines, a simple bash script allows you to process multiple scRNA-seq samples with BASIC at the same time. For best performance, the total number of simultaneous BASIC instances should be ≈ (total number of cores)/2, since BASIC uses 2 cores by default.
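The same idea can be sketched in Python rather than bash. This is an illustration under stated assumptions, not a script shipped with BASIC: the helper names (`max_parallel_instances`, `sample_command`, `run_samples`) are hypothetical, each sample is assumed to be a single-end FASTQ file, and the output name passed to `-n` is derived from the file name so that results from different cells do not overwrite each other.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def max_parallel_instances(total_cores, cores_per_instance=2):
    """Number of simultaneous BASIC runs: about (total cores) / 2, at least 1."""
    return max(1, total_cores // cores_per_instance)


def sample_command(fastq, bowtie_dir):
    """Command for one single-end sample; output is named after the file."""
    name = os.path.basename(fastq).split(".")[0]  # A1_001.fastq.gz -> A1_001
    return ["python", "BASIC.py", "-b", bowtie_dir, "-SE", fastq, "-n", name]


def run_samples(fastqs, bowtie_dir, runner=subprocess.run):
    """Run one BASIC instance per sample, limited to max_parallel_instances at a time.

    `runner` defaults to subprocess.run; it can be swapped for a stub when
    trying the scheduling logic without BASIC installed.
    """
    workers = max_parallel_instances(os.cpu_count() or 2)
    commands = [sample_command(f, bowtie_dir) for f in fastqs]
    # Threads are sufficient here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))
```

Calling `run_samples(["A1_001.fastq.gz", "A2_001.fastq.gz"], "/opt/bowtie2")` (paths made up) would start at most half as many BASIC processes as there are cores and return one `CompletedProcess` per sample, in input order.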
Q3) Should I use single-end sequencing or paired-end sequencing?
A3) We recently analyzed data obtained from both single-end and paired-end sequencing. Paired-end reads have a clear advantage because they allow greater coverage of the BCR transcript. Sequencing coverage and expression of the BCR remain the primary determinants of successful BCR assembly.
Sanger sequencing data for each single cell can be downloaded here.
1.0.1 (July 15, 2016)
Canzar S.#, Neu KE.#, Wilson PC. and Khan AA. BASIC: BCR assembly from single cells. Submitted 2016. In review. (#Equal contribution)
Please contact firstname.lastname@example.org with any questions or comments.
Software is provided to academic users under the MIT License.