PuFFIN - A Parameter-free Method to Build Genome-wide Nucleosome Maps from Paired-end Sequencing Data

Anton Polishko*, Evelien M. Bunnik**, Karine Le Roch** and Stefano Lonardi*

*Department of Computer Science and Engineering, University of California, Riverside CA 92521

**Department of Cell Biology and Neuroscience, University of California, Riverside CA 92521

We introduce a novel method, called PuFFIN, to build genome-wide nucleosome maps specifically designed to take advantage of paired-end reads. The availability of paired-end reads enables our method to produce a higher number of detected nucleosomes. In contrast to other approaches that require users to optimize several parameters according to their data (e.g., the maximum allowed nucleosome overlap or legal ranges for the fragment sizes), our method can accurately determine a genome-wide set of non-overlapping nucleosomes without any user-defined parameter. This feature makes PuFFIN significantly easier to use and prevents users from choosing "bad" parameters and obtaining suboptimal nucleosome maps. Here, for the first time, we frame the problem of determining genome-wide nucleosome locations in a multi-scale (or multi-resolution) framework. Our algorithm builds a set of nucleosome "landscape functions" at different resolution levels, in which each function represents the likelihood that a genomic location is occupied by a nucleosome. After a set of candidate nucleosomes is computed for each function, our method produces a consensus set that satisfies non-overlapping constraints and maximizes the number of nucleosomes.

Results: We report comprehensive experimental results that compare PuFFIN with recently published tools (NSeq, NPS, NOrMAL and Template Filtering) on real datasets for S. cerevisiae and P. falciparum. Experimental results show that our approach is able to detect more non-overlapping nucleosomes than other available tools.
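
To give a feel for the consensus step mentioned above (non-overlapping constraints, maximum number of nucleosomes), here is a minimal sketch of the classical greedy interval-scheduling technique. It is not taken from the PuFFIN source and may differ from what the tool actually does; the function name, the half-open (start, end) representation and the sample coordinates are all assumptions made for illustration.

```python
def max_non_overlapping(candidates):
    """Return a largest subset of pairwise non-overlapping intervals.

    candidates: iterable of (start, end) pairs, treated as half-open
    intervals [start, end). This is the textbook greedy rule: sort by
    end coordinate and keep every interval that starts at or after the
    end of the last interval kept.
    """
    selected = []
    last_end = None
    for start, end in sorted(candidates, key=lambda iv: iv[1]):
        if last_end is None or start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected


# Hypothetical candidates, e.g. pooled from several resolution levels.
candidates = [(100, 247), (180, 327), (250, 397), (390, 537), (400, 547)]
consensus = max_non_overlapping(candidates)
```

Sorting by end coordinate is what makes the greedy choice optimal for the "as many intervals as possible" objective; a scheme that also weighs confidence scores would need a different formulation.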

Description

PuFFIN is a command-line tool for accurately placing nucleosomes based on paired-end reads. It was designed to place non-overlapping nucleosomes using the extra fragment-length information present in paired-end data sets.
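
As a rough illustration of the "length information" that paired-end data provide: once both mates of a pair are mapped, the sequenced fragment has a known start and end, so its length and midpoint follow directly. The sketch below assumes half-open integer coordinates and uses a hypothetical helper name and made-up fragments; it is not part of PuFFIN.

```python
def fragment_center_and_length(start, end):
    """Return (center, length) of a fragment spanning [start, end)."""
    length = end - start
    center = start + length // 2  # integer midpoint, rounded down
    return center, length


# Made-up fragments, each given by the outer coordinates of a read pair.
fragments = [(1000, 1150), (1010, 1157), (1500, 1640)]
summary = [fragment_center_and_length(s, e) for s, e in fragments]
```

With single-end reads the fragment end is unknown and has to be estimated, which is the information the paired-end layout supplies for free.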

The tool is written in Python. There are no special requirements other than Python 2.7+.

The software is freely available for academic use. It is still in development, may contain bugs, and is not 100% bulletproof.

How to install?

  1. Download the latest source code here
  2. Unpack the archive to preferred location *folder*
  3. $> cd *folder*

PuFFIN in use

Input

The input file should contain only the reads for the chromosome (contig) under consideration, in a BED format obtained by simply parsing the BAM/SAM file (see bam2bedpe.sh for an example).

Usage

Given that the input reads are in input.bam, which contains only the reads for a particular chromosome:
  1. ./bam2bedpe.sh input.bam > input.bed
  2. python Run.py input.bed
  3. The output will be written to input.bed.nucs in the following column format:
     <Position of the nucleosome center> <Width of the peak> <Confidence score> <"Fuzziness"> <Level of the curve that was used to detect the nucleosome>
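
For downstream analysis, the five output columns listed above can be loaded with a few lines of Python. This is only a sketch under stated assumptions: the columns are taken to be whitespace-separated and numeric, and the file is taken to have no header line; none of this is confirmed by the documentation here, so check a real .nucs file first. The names Nucleosome, parse_nucs_line and read_nucs are hypothetical and not part of PuFFIN.

```python
from collections import namedtuple

# Field names mirror the five columns described above (names are ours).
Nucleosome = namedtuple(
    "Nucleosome", ["center", "width", "score", "fuzziness", "level"]
)


def parse_nucs_line(line):
    """Parse one output line into a Nucleosome record.

    Assumes five whitespace-separated numeric columns; every value is
    read as float because the exact column types are not documented.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 columns, got %d" % len(fields))
    return Nucleosome(*[float(x) for x in fields])


def read_nucs(path):
    """Read all non-empty lines of a .nucs file into a list of records."""
    with open(path) as handle:
        return [parse_nucs_line(line) for line in handle if line.strip()]


# Made-up example line, for illustration only.
example = parse_nucs_line("1075 147 0.9 12.5 3")
```

A call such as read_nucs("input.bed.nucs") would then return one record per detected nucleosome, which can be filtered by score or fuzziness as needed.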

Downloads

puffin.zip