History

News

Before 2023-11-26

Since v1.2.3, UB, instead of UR, is used as default UMI tag when barcodes are given.

The Too many open files issue has been fixed (since v1.2.0). The issue is commonly caused by exceeding the RLIMIT_NOFILE resource limit (i.e. the max number of files allowed to be opened by system for single process), which is typically 1024. Specifically, in the case of M input files and N threads, cellsnp-lite would open in total about M*N files. So the issue would more likely happen when large M or N is given. In order to fix it, cellsnp-lite would firstly try to increase the limit to the max possible value (which is typically 4096) and then use a fail-retry strategy to auto detect the most suitable number of threads (which could be smaller than the original nthreads specified by user).

The command line option --maxFLAG is now deprecated (since v1.0.0), please use --inclFLAG and --exclFLAG instead, which are more flexible for reads filtering. You could refer to the explain_flags page to easily aggregate and convert all flag bits into one integer. One example is that the default exclFLAG value (without using UMIs) is 1796, which is calculated by adding four flag bits: UNMAP (4), SECONDARY (256), QCFAIL (512) and DUP (1024).

Release Notes

Release v1.2.3 (23/02/2023)

  • use UB instead of UR as default UMI tag when barcodes are given.

  • add –maxPILEUP and update –maxDEPTH.

  • fix the segmentation fault of getopt_long when given unrecognized cmdline parameters.

  • allow more cmdline options in lower case.

  • print CMD, VERSION and global settings.

  • improve logging output.

  • update manual according to issues (till Feb 23, 2023).

Release v1.2.2 (03/11/2021)

  • add -f/–refseq so that the real (genomic) ref could be extracted from the fasta file for Mode 2 or for the missing REFs in the input VCF for Mode 1. Credit to issue 28.

  • add –maxDEPTH for read filtering.

  • version: add the version of htslib to the version string and output it by default for usage().

  • remove redundant comments and re-format codes.

Release v1.2.1 (07/09/2021)

  • implement infering ALT when REF is provided in vcf, which could avoid wrong genotypes (e.g., 1/1 -> 0/0; or 1/2 -> 0/1) caused by a lack of input REF.

  • fix a bug of double free csp_mplp_t in csp_mplp_prepare()

  • update with 2 modes instead of initial 3 modes.

  • compile: fix the inline issue without the -fgnu89-inline option.

  • compile: linking libhts.a has higher priority than linking libhts.so

  • compile: add configure using autotools.

  • add readthedoc.

  • add benchmark directory and scripts.

Release v1.2.0 (04/12/2020)

  • add -T cmdline option, which is similar with samtools/bcftools mpileup -T for the next position is accessed by streaming rather than index-jumping, which is different from -R. Similar with -R, cellsnp-lite -T would use REF & ALT of the input vcf.

  • fix a bug of not reseting thpool for too many open files issue.

Release v1.1.2 (02/12/2020)

  • fix the issue of too many open files by 1) increase the soft limit of rlimit; 2) let nthreads auto fit nfiles

Release v1.1.1 (28/11/2020)

  • use only single copy of hdr & idx in csp_bam_fs for fetch method

  • use only single copy of hdr (and no idx) for pileup method

  • merge nthread > 1 and nthread = 1 for fetch method

  • merge nthread > 1 and nthread = 1 for pileup method

Release v1.1.0 (26/11/2020)

  • split cellsnp_utils.h & general_utils.h & cellsnp.c into separate modules and scripts (small .h and .c files)

  • bgzip temporary files

Release v1.0.1 (17/11/2020)

  • fix a bug of m2 & k2 in csp_infer_allele() which could lead to error AD and error MAF calculation.

  • fix a bug of not allocating space for sample_id

  • add badges

Release v1.0.0 (15/10/2020)

  • add mode 2

  • replace –maxFLAG with –inclFLAG and –exclFLAG

  • always filter unmapped reads

Release v0.3.1 (22/07/2020)

  • turn off the PCR duplicate filtering by default (–maxFLAG), as it is not well flagged in CellRanger, hence may result in loss of a substantial fraction of SNPs.

Release v0.3.0 (05/06/2020)

  • fix a bug of using read.qqual and read.query_alignment_sequence in pileup_bases() and fetch_bases(), which could cause error when CIGAR string includes the ‘I’ (Insertion) op

  • fix a bug of repetitive UMIs existing in different cells when grouping and counting UMIs

  • the pos in the output vcf of mode 2 is switched from 0-based to 1-based

Release v0.1.8 (06/02/2020)

  • Mode 2 now supports pile up chromosomes for a single bulk sample

  • Mode 3 now supports multiple bam files in a list file

Release v0.1.7 (04/10/2019)

  • fix a bug when chromosome is not in the bam file

  • support barcodes.tsv.gz

  • liftOver supports bgzip compress

  • add vcf format to cellSNP.base.vcf.gz

Release v0.1.6 (14/07/2019)

  • support saving to sparse matrices: Please use -O for out directory instead of -o for VCF output only. Also, you can use sparseVCF.py to convert existing VCF.gz into sparse matrices

  • turn off breaking from warnings

  • change P_error with BQ ranges [0.25, 45]

  • h5py is not a required dependent package anymore

Release v0.1.5 (02/07/2019)

  • fix a bug in qual_vector when the Quality (Phred) scores is 0, i.e., ASCII Code “!”, and it will give a P_error as 1, hence fail with log transformation. Now, I limited the P_error to 0.9999.

Release v0.1.4 (24/06/2019)

  • use bgzip by default if bgzip is executable, otherwise use gzip

  • change GL to PL: Phred-scaled genotype likelihoods and move to after OTH tag

  • filter_reads is not in use anymore and the filtering is combined into fetch_bases or pileup_bases

  • slightly optimise the memory by not keeping all reads but only positions

Release v0.1.3 (12/06/2019)

  • Fix a minor bug for when loading unzipped vcf file.

Release v0.1.2 (10/06/2019)

  • turn off the defaul HDF5 file output, but keep it optional.

Release v0.1.1 (09/06/2019)

Release v0.1.0 (21/05/2019)

  • support the estimate the genotype and genotype likelihood for each cell. The GT is for 0/0, 1/0, 1/1, while the genotype likelihood is for 0/0, 1/0, 1/1, and 0/0+1/0, 1/1+1/0. The genotype estimate is based on the this paper (table 1; same as supp table S3 in Demuxlet paper): https://doi.org/10.1016/j.ajhg.2012.09.004

  • cell tag changed from CR to CB and the lane info is kept

  • pileup whole genome uses the same reads filtering as pile up positions

  • add test files (note, the bam file is 19G)

  • require pysam>=0.15.2 to get the qqual for each base call in the reads

Release v0.0.8 (21/12/2018)

  • update the default setting that UMItag is not in use in bulk RNA-seq, as UMI is cell specific in pseudo-bulk RNA-seq, hence better turn it UMI off by default

  • support output file in the same path of command line

  • support cram input file, besides bam/sam

  • update readme file, especially for processed common variants from 1000 genome project (https://sourceforge.net/projects/cellsnp/files/SNPlist/)

Release v0.0.7 (04/10/2018)

  • change the header of the VCF file to be more suitable for bcftools

  • realise the issue of heavy memory consuming, which even kills the jobs in cluster. The menory taken increase linearly to the number of processors used. When using 20 CUPs, >20G memory is recomended for >5K cells. Solution for higher memory efficiency will be proposed in future.

Release v0.0.6 (29/09/2018)

  • fix the bug in pileup a list of positions with pysam-fetch: input wrong REF and ALT bases.

  • support pileup a list of positions for multiple bulk samples

  • check liftOver works fine: the last part of the SNPs have matched REF in fasta file.

  • polish the printout log: label the three modes:

    • Mode 1: Pileup a list of positions for single cells (most common)

    • Mode 2: Pileup whole genome for single cells

    • Mode 3: Pileup a list of positions for (multiple) bulk sample(s)

Release v0.0.5 (24/09/2018)

  • pileup a list of positions with pysam-fetch, which may returns more reads than pysam-pileup. This feature requires further check

  • change vcf file header to be more compatible with bcftools

  • support turning cell-barcode off to return a sample level only

Release v0.0.4 (25/08/2018)

  • pileup the whole genome for 10x single-cell RNA-seq data

  • Note, post-filetering is needed as the current filtering doesn’t consider the heterozygous genotype for all donors.