Course CS234: Computational model for Biomolecular Data

Project progress

Oct 02 - Oct 13
Decide on the project and prepare.

Oct 14 - Oct 20
Project: Motif discovery.
Task: Implement the "random projection" algorithm for motif finding described in Tompa and Buhler, Finding Motifs Using Random Projections, Proc. RECOMB, 67-74, 2001 (also described in the slides). Run the program and collect experimental data. How would you improve its performance?
My current progress: reading the paper (it has 37 pages!).

Oct 21 - Oct 27
Reading the paper: the planted (l,d)-motif problem.
Problem definition: Suppose there is a fixed but unknown nucleotide sequence M (the motif) of length l. The problem is to determine M, given t nucleotide sequences, each of length n and each containing a planted variant of M. More precisely, each such planted variant is a substring that is M with exactly d point substitutions.

Oct 28 - Nov 3
Reading the paper: the projection algorithm.
The algorithm performs a number of independent trials of a basic iterant. In each such trial, it chooses a random projection h and hashes each l-mer x in the input sequences to its bucket h(x). Any hash bucket with sufficiently many entries is explored as a source of the planted motif, using a series of refinement steps.

Viewing x as a point in an l-dimensional Hamming space, h(x) is the projection of x onto a k-dimensional subspace. If M is the unknown planted motif, we will call the bucket with hash value h(M) the planted bucket. The fundamental intuition underlying PROJECTION is that, if k