NOrMAL: Accurate Nucleosome Positioning using a Modified Gaussian Mixture Model

Anton Polishko*, Karine Le Roch**, Nadia Ponts** and Stefano Lonardi*

*Department of Computer Science and Engineering, University of California, Riverside CA 92521

**Department of Cell Biology and Neuroscience, University of California, Riverside CA 92521

Nucleosomes are the basic elements of DNA chromatin structure. They control the packaging of DNA and play a critical role in gene regulation by allowing physical access to transcription factors. The advent of second-generation sequencing has enabled landmark genome-wide studies of nucleosome positions for several model organisms. Current methods to determine nucleosome positioning first compute an occupancy coverage profile by mapping nucleosome-enriched sequenced reads to a reference genome; then, nucleosomes are placed according to the peaks of the coverage profile. These methods are quite accurate at placing isolated nucleosomes, but they do not properly handle “overlapping” nucleosomes. Also, they provide only the positions of nucleosomes and their occupancy level, whereas molecular biologists would benefit from additional information about nucleosomes, such as the probability of placement, the size of DNA fragments enriched for nucleosomes, and/or whether nucleosomes are well-positioned or “fuzzy” in the sequenced cell sample.

Results: We address these issues by providing a novel method based on a parametric probabilistic model. An expectation maximization (EM) algorithm is used to infer the parameters of the mixture of distributions.

Description

NOrMAL is a command-line tool for accurate nucleosome placement. It is designed to resolve overlapping nucleosomes and to extract additional information about each placement ("fuzziness", probability, etc.). To achieve this, the tool clusters the input tags according to the Nucleosome Model (see the paper for a detailed description) using an EM learning process.
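To give a feel for what EM-based clustering of tags means, here is a minimal sketch of EM for a plain one-dimensional Gaussian mixture. It is not NOrMAL's modified Nucleosome Model and not the tool's C++ code; the function name, the initial guesses and the lower bound on the standard deviation are all assumptions made for illustration.

```python
import math


def em_gaussian_mixture_1d(points, means, sigma=20.0, iterations=50):
    """Fit a plain 1D Gaussian mixture with EM (illustrative sketch only).

    points: tag positions; means: initial guesses for the component centers.
    Returns (weights, means, sigmas), one entry per component.
    """
    k = len(means)
    means = [float(m) for m in means]
    sigmas = [float(sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of every component for every point.
        resp = []
        for x in points:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                for w, m, s in zip(weights, means, sigmas)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of every component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has no support; keep its old parameters
            weights[j] = nj / len(points)
            means[j] = sum(r[j] * x for r, x in zip(resp, points)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, points)) / nj
            sigmas[j] = max(math.sqrt(var), 1.0)  # assumed floor to avoid collapse
    return weights, means, sigmas


if __name__ == "__main__":
    # Hypothetical tag positions forming two groups.
    tags = [980, 990, 995, 1000, 1005, 1010, 1020,
            1130, 1140, 1145, 1150, 1155, 1160, 1170]
    print(em_gaussian_mixture_1d(tags, means=[950, 1200]))
```

In this simplified picture each component has a center and a spread. How NOrMAL derives positions, "fuzziness" and confidence scores from its own model is described in the paper.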

The tool is written in C++. Compiling and using it requires only the g++ compiler and a *nix environment. It has been verified to compile with g++ under Ubuntu 11.04 and Mac OS X 10.6.

The software is freely available for academic use. It is still in development, so it may contain bugs and is not 100% bulletproof.

How to install?

  1. Download the latest source code here
  2. Unpack the archive to a preferred location *folder*
  3. Compile using the make command within the folder:

    $> cd *folder*
    $> make

  4. To check the compiled executable, invoke:

    $> make test

    The tool will process a small test case and produce the file test_results.txt. This should take no more than ~20 sec.

NOrMAL in use

Input

The executable takes 3 input files: a configuration file, a forward tags file and a reverse tags file.

The configuration file contains the algorithm parameters. A default config.txt is provided; all the parameters are self-explanatory and can be adjusted to your needs.

The main input for the tool is the set of 5' end positions of the forward and reverse mapped nucleosome reads (tags). The tags should be specified in two separate files, each a simple list of locations (numbers).
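As an illustration of how the two tag files might be prepared, the sketch below splits mapped reads into forward and reverse 5' end positions. It rests on several assumptions not stated above: each read is a hypothetical (start, end, strand) tuple with inclusive coordinates, the tag files hold one number per line, and sorting is done only for convenience. Check which coordinate convention your mapper uses before relying on it.

```python
def five_prime_tags(reads):
    """Split mapped reads into forward and reverse 5' end positions.

    reads: iterable of (start, end, strand) tuples, with start <= end and
    both coordinates inclusive (an assumption of this sketch).
    """
    forward, reverse = [], []
    for start, end, strand in reads:
        if strand == "+":
            forward.append(start)  # 5' end of a forward read is its leftmost base
        elif strand == "-":
            reverse.append(end)    # 5' end of a reverse read is its rightmost base
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return sorted(forward), sorted(reverse)


def format_tags(positions):
    """Render positions as a simple list, one number per line (assumed layout)."""
    return "".join(f"{p}\n" for p in positions)


if __name__ == "__main__":
    # Hypothetical reads; the file names below are placeholders.
    reads = [(100, 135, "+"), (240, 275, "-"), (90, 125, "+")]
    fwd, rev = five_prime_tags(reads)
    with open("forward_tags.txt", "w") as handle:
        handle.write(format_tags(fwd))
    with open("reverse_tags.txt", "w") as handle:
        handle.write(format_tags(rev))
```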

Usage

  1. Specify the parameters of the algorithm in *config_file* (check the provided default "config.txt").
  2. Provide forward and reverse tags as simple lists of numbers in files *forward_tags* and *reverse_tags* respectively.
  3. Run the tool using the command:

    ./NOrMAL *config_file* *forward_tags* *reverse_tags* *output_file*

  4. The output will be written to *output_file* in the following column format:

    <Position of the nucleosome center> <"Fuzziness"> <Nucleosome Size> <confidence score> <Forward votes> <Reverse votes>
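For downstream analysis, the six output columns could be loaded with a small script such as the sketch below. It assumes that the columns are separated by whitespace and that every value is numeric. The field names are chosen here for illustration and are not defined by the tool, and the sample lines in the example are made-up values.

```python
from typing import NamedTuple


class Nucleosome(NamedTuple):
    """One output row; field names are illustrative, not part of NOrMAL."""
    center: float
    fuzziness: float
    size: float
    confidence: float
    forward_votes: float
    reverse_votes: float


def parse_normal_output(lines):
    """Parse whitespace-separated six-column rows into Nucleosome records."""
    records = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"line {number}: expected 6 columns, got {len(fields)}")
        records.append(Nucleosome(*(float(f) for f in fields)))
    return records


if __name__ == "__main__":
    # Made-up rows in the assumed layout, for demonstration only.
    sample = ["1000 12.5 147 0.93 20 18", "", "1190 30.0 150 0.41 7 9"]
    for nucleosome in parse_normal_output(sample):
        print(nucleosome.center, nucleosome.confidence)
```

To read a real result file, pass an open file handle instead of the sample list, for example `parse_normal_output(open(path))` with `path` pointing at your *output_file*.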