Qiang Zhu



Mining Historical Manuscripts with Local Color Patches (ICDM 2010)
  • Focused on local region-of-interest patches, rather than the commonly used global atomic images
  • Introduced a simple color measure which both addresses and exploits the rich color information of images in historical manuscripts
  • Proposed a novel and tight lower bound for color matching, to cheaply prune unpromising candidates and enable the efficient mining of massive archives
  • Built several higher-level data mining tools, including motif discovery and link analysis, beyond fast similarity search
  • Demonstrated on extensive manuscripts dating back to the fifteenth century
  • Designed an image retrieval framework that combines an existing web image search engine with our color distance measure



Fast and Flexible Multivariate Time-Series Search (ICDM 2010, work with NASA Ames Research Center)
  • Presented the first work on multivariate time-series (MTS) subsequence matching that supports queries on any subset of variables, with time-shifting between them
  • Designed a novel indexing method allowing fast search on very large datasets (the C-MAPSS and Conex datasets we indexed are much larger than any other datasets considered in the literature on time-series subsequence search)
  • Algorithms and code will be used within multiple NASA projects, including the Integrated Vehicle Health Management project

Using CAPTCHAs to Index Cultural Artifacts (IDA 2010)
  • Proposed a novel language-independent CAPTCHA which considers inherently real-valued data (photographs of rock art) and expects real-valued responses (mouse movements)
  • The first real-valued-response CAPTCHA for crowdsourcing in data mining
  • Very easy for humans, but extremely hard or even impossible for current machines
  • Human effort spent solving CAPTCHAs (usually wasted) can now be put to use in a Human Computation project, helping to extract useful data from highly heterogeneous and noisy datasets
  • Made indexing all the world’s rock art possible

Mining and Indexing of Petroglyphs (KDD 2009, CAA 2009, DMKD 2010)
  • Considered, for the first time, the problem of data mining large collections of rock art
  • Introduced an explicit framing of the Generalized Hough Transform (GHT) as the similarity measure
  • Estimated a lower-bound distance based on one-dimensional signatures extracted from the original data
  • Proposed algorithms which allow efficient and effective mining of rock art, e.g., finding repeated motifs, clustering, and enabling query-by-content
  • Working on supporting rotation invariance and partial shape matching
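
The lower-bounding idea above can be illustrated with a minimal sketch. It is not the GHT-based measure from the papers: as a stand-in, it assumes binary images compared by Hamming distance, and uses per-row pixel counts as the one-dimensional signature. The function names (`row_signature`, `signature_lower_bound`, `nearest_neighbor`) are hypothetical. Because the difference in row counts can never exceed the number of differing pixels in that row, the signature distance is a true lower bound, so any candidate whose bound already reaches the best distance found so far can be skipped without computing the full distance.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # binary image stored as rows of 0/1 pixels


def row_signature(img: Image) -> List[int]:
    """One-dimensional signature: number of set pixels in each row."""
    return [sum(row) for row in img]


def hamming_distance(a: Image, b: Image) -> int:
    """Full (expensive) distance: number of pixels that differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def signature_lower_bound(sig_a: Sequence[int], sig_b: Sequence[int]) -> int:
    """Cheap bound: per row, |count_a - count_b| <= differing pixels."""
    return sum(abs(x - y) for x, y in zip(sig_a, sig_b))


def nearest_neighbor(query: Image, database: Sequence[Image]) -> Tuple[int, int, int]:
    """Return (best_index, best_distance, number_of_full_distance_calls)."""
    q_sig = row_signature(query)
    # In practice the database signatures would be computed once, offline.
    signatures = [row_signature(img) for img in database]
    best_idx, best_dist, full_calls = -1, float("inf"), 0
    for i, img in enumerate(database):
        # Prune: the true distance cannot be smaller than the lower bound.
        if signature_lower_bound(q_sig, signatures[i]) >= best_dist:
            continue
        full_calls += 1
        d = hamming_distance(query, img)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, full_calls


if __name__ == "__main__":
    query = [[1, 1, 0], [0, 0, 0]]
    database = [
        [[1, 1, 0], [0, 0, 1]],
        [[0, 0, 0], [1, 1, 1]],
        [[1, 1, 0], [0, 0, 0]],
    ]
    print(nearest_neighbor(query, database))
```

The search stays exact: pruning only removes candidates that provably cannot beat the current best match, so the result is the same as a full linear scan, with fewer full distance computations.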


Exact Discovery of Time Series Motifs (SDM 2009, DMKD 2010)
  • Presented a tractable exact algorithm to find time series motifs, i.e., the most similar pairs of time series
  • Faster than all current approximate algorithms and up to three orders of magnitude faster than brute-force search on large datasets
  • Proposed a novel use of time series motifs in anytime classification
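
For reference, the brute-force baseline mentioned above can be sketched as follows. This is only the quadratic baseline, not the exact algorithm from the papers, and it assumes equal-length series compared by Euclidean distance; the function names are hypothetical. The one optimization shown, early abandoning, stops a distance computation as soon as it can no longer beat the best pair found so far.

```python
import math
from typing import Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float],
              best_so_far: float = math.inf) -> float:
    """Euclidean distance with early abandoning.

    Returns math.inf as soon as the running sum shows the distance
    cannot be smaller than best_so_far.
    """
    limit = best_so_far ** 2
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= limit:
            return math.inf
    return math.sqrt(total)


def brute_force_motif(series: Sequence[Sequence[float]]) -> Tuple[int, int, float]:
    """Return (i, j, distance) for the closest pair of series, with i < j."""
    if len(series) < 2:
        raise ValueError("need at least two time series")
    best_d, best_i, best_j = math.inf, -1, -1
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            d = euclidean(series[i], series[j], best_d)
            if d < best_d:
                best_d, best_i, best_j = d, i, j
    return best_i, best_j, best_d


if __name__ == "__main__":
    data = [[0, 0, 0], [5, 5, 5], [0, 0, 1], [9, 9, 9]]
    print(brute_force_motif(data))
```

Every pair is still visited, so the cost grows quadratically with the number of series; this is the baseline against which the speedups above are measured.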

Interactive and Intelligent Searching of Biological Images
  • Focused on the identification of nematodes, which are particularly difficult to identify and have a direct and significant effect on humans
  • Constructed multiple graphs for nematodes based on different features and similarity functions
  • Built an interactive navigator (log in with guest/guest) which makes nematode identification a simple process of point and click

Surface Depth Hallucination
  • Implemented a perceptually validated model for depth hallucination
  • Enabled the acquisition of surface detail for texturing with a standard digital camera
  • Estimated a 3D model that can be viewed from any angle under any illumination using two single-view 2D images
  • Source code, user manual, etc. are available for download

Credit Evaluation of Banking Customers
  • Constructed a hybrid evaluation model based on clustering (SOM and K-means) and probabilistic neural network (PNN)
  • Performed better than a plain PNN and 14 other classic evaluation models on two standard datasets