DAME :
Disk-Aware Motif
Enumeration
This is a supporting page to our paper - A Disk-Aware Algorithm for Time Series Motif Discovery, by Abdullah Mueen, Eamonn Keogh and Nima Bigdely-Shamlo The paper is here. |
|
Code and Executables | |
We
have two
versions of DAME;
one for motif from a database and the other for
subsequence
motif. We also have the codes for three Brute Force motif finding
algorithms which are lot slower than DAME.
Click here to
download
the codes.
Codes are written in C and compiled in windows
environment. Please follow the instructions in the Readme.txt
in each subfolders for detailed information. Gunnar Waterstraat from Charité - Universitätsmedizin Berlin has made a linux version of the code which is available here. Thanks to Gunnar Waterstraat. |
|
Spreadsheet of Experimental Resluts | |
We
have compiled results of all the experiments in an spreadsheet.
In the subsequent sections, we will refer to sheets from this document.
Execution times may not be exactly the same as in the spreadsheet
because of the random referencing, but they should be very close and
representative for the claim of our paper. It is worth
mentioning some notes about the document.
|
|
Synthetic Data | |
Synthetic
data we use
is a large random walk dataset which is too large for downloads. So
we provide the following MATLAB code snippetwhich
reproduce
the dataset exactly. The dataset contains 20 files of
50,000
random walks each. The random walks are of length 1024. In the code
snippet, the files are named as "u1.txt", "u2.txt" and so on. Make sure
rootDir is set to your desired directory. Although saving the files in
ascii mode wastes space, DAME reads them only once and uses its own
temporary database in binary format, while running. |
randn('state',[362436069;521288629]);
rootDir = 'u'; for i=1:80 c = random_walk(1024,50000); save(sprintf('%s\\u%d.txt',rootDir,i),'c','-ascii'); end |
EEG Dataset | |
IC17
is the Independent
Component that we use in the Brain Activity experiment. It consists of
118 epochs. Each epoch contains 640 data points recorded at 256 Hz and span from -1000 to
1500 ms of Target presentation.
We vectorize these data into a single column and then look for 600 ms (153 data point) motifs. We run DAME
for five times successively and each time we separate out subsequences
that are inside the hypersphere of radius 2 from the current motif pair
after each run. For each of the motif clusters, we go back to the
original latencies of the 600 ms subsequences in the 118 epochs. The subsequences that are within the hyperspehere of radius 2 from the Motif 1 have a densed distribution around 100ms after target presentation (shown in the next plot). |
|
"Tiny Image" Dataset | |
In this experiment we use the
database of "80 million
tiny images" from
80 million tiny
images: a large dataset for non-parametric object and scene recognition
by A. Torralba, R. Fergus and W. T. Freeman, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30(11), pp. 1958-1970, 2008. For the database please contact with the authors of the above paper. DAME finds 3,836,902 images which have at least one duplicate in the first 40 million images of the above database. DAME also finds 542,603 images that have at leat one near duplicate in the database with distance less than 0.1. We use tiny2ts.m and im2ts.m to convert the image database into databases of time series of length 768 points. Some representitive duplicates and motifs are presented here. For complete list of duplicates and near-duplicates please email us. |
|
EOG Dataset | |
We have collected an EOG trace from the Sleep
Heart Health Study Polysomnography Database. The full trace
is here. DAME finds
the following motif of 4.0 seconds length in this trace. The trace is overly sampled with 250Hz sample rate and over 8 million data points. We have downdsampled the trace to about 1 million points by a 8:1 sampling ratio. We have found the following motif using DAME which comes from times when the patient went back and forth between arousal and sleep stage1. |
|
Insect Dataset | |
In
this experiment, we have done conditional motif discovery. We have
collected two time series A
and B
for two different insects for a specific sensor setup which collects
data about insects behavior while sucking sap from leaves. We have run
DAME on the concatenated time series (AB) in such a way that
resulting motif pair contains subsequences from both A and B.
This
is an example of generalizing DAME for other definitions of motifs. The following figure also shows the motif in the original space. This demonstrates that z-normalization removed the offset errors before computing the Euclidean distance, and therefore, motif pair could be so different in their offset from 0. |
|
World Cup Dataset | |
We have collected the access
logs of the official
website for the 1998 FIFA world cup. We then extracted the number
of accesses that requested the "index.html" page in every minute and generated the minute-access log for "index.html" from the beginning to the end of the world
cup. We have run DAME on this log and found the motif pair shown on the right column. Here the motif length is more than 24 hours (1500 minutes). The days that poped up as motif are the last two days of the first round of the world cup when 8 teams, which had some chance to move to the next round, played their last match in the first round. |
|
This
page is created by - Abdullah Mueen Department of Computer Science and Engineering, University of California - Riverside. |