Visual Exploration of Genomic Data
 

Michail Vlachos 1, Bahar Taneri 2, Eamonn Keogh 3, Philip S. Yu 1
 

1 IBM T.J. Watson Research Center, Hawthorne, NY

2 Scripps Institution of Oceanography, UCSD, CA

3 University of California, Riverside, CA

Homepage: http://www.cs.ucr.edu/~mvlachos/VizDNA

 


 

Abstract:

 

We address the problem of DNA sequence visualization. Because humans compare and conceptualize shapes more readily than text, we present visualization methods that capture the form and relationships of DNA nucleotide sequences. First, we describe a transformation of nucleotide sequences into numerical trajectories. Each trajectory visually captures the nucleotide content of its sequence, allowing for quick and easy visualization. We then place the trajectories on the 2D plane using a spanning-tree arrangement method, which allows different sequences to be compared efficiently. We illustrate the potential of our technique for effective visualization in evolutionary biology.
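
To make the idea of a numerical trajectory concrete, the following is a minimal sketch of one way a nucleotide string could be turned into a 2D path. This page does not state the exact mapping used in the paper, so the step vectors in STEPS and the function name dna_to_trajectory are assumptions chosen for illustration (a simple "DNA walk"), not the authors' transformation.

```python
# Hypothetical step vectors: the page does not give the actual mapping,
# so these four directions are an illustrative assumption only.
STEPS = {"A": (0, 1), "T": (0, -1), "C": (-1, 0), "G": (1, 0)}


def dna_to_trajectory(seq):
    """Return the cumulative 2D path traced by a nucleotide string."""
    x, y = 0, 0
    path = [(x, y)]  # the path starts at the origin
    for base in seq.upper():
        # Unknown symbols (for example N) add no step, so the point repeats.
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


trajectory = dna_to_trajectory("ACGTTA")
```

Under this assumed mapping, sequences with similar nucleotide content trace paths of similar shape, which is the property the visualization relies on.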

 

 


Additional Material

 

Datasets (names as on the figures):

o   Dataset 1
Human, Chimpanzee, Pygmy chimpanzee, Baboon, Gorilla, Orangutan, Gibbon, Macaque

o   Dataset 2
Homo sapiens, Pan paniscus, Pan troglodytes, Hippopotamus amphibius, Canis familiaris, Balaenoptera physalus, Balaenoptera musculus, Ursus maritimus, Ursus americanus, Loxodonta africana, Elephas maximus indicus


 

Use of a warping distance to quantify similarity between the DNA trajectories
Left: Comparison of Human and Chimpanzee DNA, Right: Comparison of Human and Bear DNA
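
The page does not say which warping distance is used. As a hedged illustration, the sketch below assumes classic dynamic time warping (DTW) with Euclidean point-to-point cost, applied to two trajectories given as lists of (x, y) points. The function name dtw_distance is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two lists of (x, y) points.

    Assumes plain, unconstrained DTW; the paper may use a different variant.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (Python 3.8+)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

Because the alignment may stretch or compress either trajectory, two paths with the same shape but different lengths can still receive a small distance, which suits sequences of unequal length.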

 

 

Experiments on Evolutionary Biology:

Human and orangutan divergence took place approximately 11 million years ago, whereas gibbon and human divergence occurred approximately 15 million years ago [1]. According to the same source, gorilla divergence occurred about 6.5 million years ago and chimpanzee divergence about 5.5 million years ago. The following figure, produced by our mapping technique, shows similar findings.

 

 

Comparison between Homo sapiens and 7 other related species (dataset 1)

 

 

 


 

Comparison between Homo sapiens and various species (dataset 2)
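
The spanning-tree arrangement behind these comparisons is not detailed on this page. As a rough sketch, assuming the arrangement starts from a minimum spanning tree over pairwise trajectory distances, the tree itself could be built with Prim's algorithm as below. The function name, the labels s1 to s3, and the toy distance matrix are all made up for illustration and are not measured values; the 2D layout step is not shown.

```python
def minimum_spanning_tree(names, dist):
    """Prim's algorithm over a symmetric pairwise distance matrix.

    names[i] labels row/column i of dist. Returns a list of
    (name_in_tree, name_added, distance) edges.
    """
    n = len(names)
    in_tree = {0}  # grow the tree from the first sequence
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge joining the tree to a node outside it.
        w, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((names[i], names[j], w))
    return edges


# Toy example with arbitrary distances (not real data).
toy_names = ["s1", "s2", "s3"]
toy_dist = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
toy_edges = minimum_spanning_tree(toy_names, toy_dist)
```

Each sequence ends up linked to its nearest neighbour already in the tree, so closely related sequences sit on adjacent branches.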

 

 

Preliminary Experiments on Cancer Data:

(UniGene clusters in Hs.3.chr13p.5113: Hs.48554: RAP2A, member of RAS oncogene family)

 

The DNA sequences visualized in this figure are extracted from HumanSDB3, a database of alternatively spliced genes in the human transcriptome [2]. This database contains not only alternative splicing information but also the tissue from which each transcript was sequenced and the pathological condition of that tissue. A large number of transcripts are in fact sequenced from cancerous tissues [3].

 



Evaluation of cancer state on human tissues.

 

The above figure shows the clustering of two normal and two cancer transcripts sequenced from the RAP2A gene, a member of the RAS oncogene family. The first normal transcript is sequenced from an unaffected hippocampal cell and the second from an unaffected placental cell. The first cancer transcript is identified as carcinoid, although the specific tissue is not indicated. The second cancer transcript is sequenced from a parathyroid tumor cell.

 

References:

[1] RL Stauffer, A Walker, OA Ryder, M Lyons-Weiler, SB Hedges: Human and Ape Molecular Clocks and Constraints on Paleontological Hypotheses. The Journal of Heredity 2001: 92(6): 469-474.

[2] Bahar Taneri, Alexey Novoradovsky, Ben Snyder, Terry Gaasterland: Databases for Comparative Analysis of Human-Mouse Orthologous Alternative Splicing. Comparative Genomics 2004: 123-131

[3] Bahar Taneri, Ben Snyder, Terry Gaasterland: Pathological alternative splicing in cancer tissues. Tenth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2006), April 2006, Venice, Italy.