Li Wei

        

       

    

Office: 1600 Amphitheatre Pkwy, Mountain View, CA 94043

Email: wli@cs.ucr.edu

URL: www.cs.ucr.edu/~wli

    

    

    

Education

    

      

      

    

Ph.D., Computer Science (GPA: 4.0/4.0)

December 2006

    

University of California, Riverside

Advisor: Prof. Eamonn Keogh

 

    

      

 

    

M.S., Computer Science (GPA: 3.76/4.0)

June 2003

    

Fudan University, China

Advisor: Prof. Aoying Zhou

 

    

     

 

    

B.S., Computer Science (GPA: 3.61/4.0)

June 2000

    

Fudan University, China

 

     

     

   

Honors and Awards

    

 

 

    
  • Dean's Fellowship Award, University of California, Riverside, 2003 - 2005

    
  • Microsoft Fellow, 2002 (one of 18 winners in Asia)

    
  • Best Paper Award, National Database Conference, China, twice (2001 and 2002)

    
  • Intel Scholarship, 2001 (one of 40 winners in China)

    
  •  First Class Graduate Scholarship, Fudan University, 2002

    
  •  Kodak Scholarship, 2001

    
  •  Honored Graduate, Fudan University, 2000

    
  • Motorola Prize, twice (1997 and 1998)

    
  •  People's Scholarship of Fudan University, each semester from 1996 to 2000

     

     

   

Work Experience

    

 

 

    

Software Engineer

 

    

Google Inc.

January 2007 - present

     
    

Graduate Student Researcher

 

    

Database Lab, University of California, Riverside

September 2003 - December 2006

    

Conducted research in the area of time series data mining and visualization.

- Query Filtering for Streaming Time Series

Defined time series query filtering, the problem of monitoring streaming time series for a set of predefined patterns. Proposed an envelope-based lower bounding technique that allows monitoring at higher bandwidths while guaranteeing no false dismissals.

- Semi-supervised Time Series Classification

Proposed a semi-supervised learning framework to build accurate time series classifiers when only a small set of labeled examples is available. The technique has been successfully applied to domains where the cost of collecting annotated data is high (e.g. handwriting indexing and heartbeat classification).

- Shape Indexing

Proposed a technique to support fast rotation-invariant search of large shape datasets with arbitrary representations and distance functions. On real-world problems, the technique is four orders of magnitude faster than existing approaches and guarantees no false dismissals.

- Data Visualization

Introduced a novel visualization framework which replaces the standard file icons with icons that reflect the contents of the files and arranges them by their similarity/differences. This provides a greater possibility of unexpected and serendipitous discoveries.

    

 

 

    

Intern

 

    

AdSpam Team, Google Inc.

June 2006 - September 2006

    

- Signals Over Time Analysis

Used time series data mining techniques to analyze signals over time. Designed and implemented an anomaly detection model for time series signals. The model has been shown to be effective and is being integrated into the current spam classifier to improve its accuracy.

- Other Work

Other work included a top-ten signals analysis and a premium publisher study.

    

 

 

    

Intern

 

    

Xerox Innovation Group, Xerox Corporation

June 2005 - September 2005

    

- Workflow Requirements Clustering

Proposed a compression-based dissimilarity measure to cluster questionnaire-formatted case logs generated by the Xerox Workflow Configuration Tool. The clustering results are used to validate the rules that generate workflows and to make recommendations for new requirements.

- Printing Service Data Mining

Analyzed printing service data using several data mining techniques, including clustering, association rule mining, and sequential pattern mining. Discovered useful patterns for improving the printing service.

    

 

 

    

Graduate Student Instructor

 

    

University of California, Riverside

September 2003 - June 2005

    

Taught undergraduate-level classes, including Database Management Systems, Project in Computer Science, Introduction to Data Structures and Algorithms, and Theory of Automata and Formal Languages.

    

 

 

    

Graduate Student Researcher

 

    

Web Database & P2P Computing Lab, Fudan University, China

November 1999 - June 2003

    

Conducted research in data mining; participated in several Chinese national high-tech projects; designed and implemented systems for categorical data outlier detection, exceptional association rule mining, data stream density estimation, topic exploration, data cleaning, etc.

 

      

      

    

Publications

    

 

 

    

Journal Papers

   
  • L. Wei, E. Keogh, X. Xi, S.-H. Lee. Supporting Anthropological Research with Efficient Rotation Invariant Shape Similarity Measure. Journal of the Royal Society Interface, to appear, 2006.

   
  • L. Wei, E. Keogh, H. Van Herle, A. Mafra-Neto, R. Abbott. Efficient Query Filtering for Streaming Time Series with Applications to Semi Supervised Learning of Time Series Classifiers. Knowledge and Information Systems Journal, to appear, 2006.

   
  • E. Keogh, S. Lonardi, C. A. Ratanamahatana, L. Wei, S.-H. Lee, J. Handley. Compression-based Data Mining of Sequential Data. The Data Mining and Knowledge Discovery Journal, to appear, 2006.

   
  • L. Wei, E. Keogh, X. Xi, S. Lonardi. Integrating Lite-Weight but Ubiquitous Data Mining into GUI Operating Systems. Journal of Universal Computer Science, Special Issue on Visual Data Mining, Editor: Jesus S. Aguilar-Ruiz, Volume 11/Issue 11. pp. 1820-1834, 2005.

   
  • L. Wei, X. Gong, W. Qian, A. Zhou. Finding Outliers in High-Dimensional Space. (in Chinese) Journal of Software, Vol.13 (2): 280-290, 2002.

   
  • X. Wang, H. Wu, L. Wei, A. Zhou. A Similarity-Based Model for Topic Distillation. International Journal of Computational Intelligence and Applications, Vol 2(3): 267-275, Imperial College Press/WSPC, 2002.

   
  • A. Zhou, L. Wei, F. Yu. Effective Discovery of Exception Class Association Rules. Journal of Computer Science and Technology, Vol. 17(3): 304-313, 2002.

   

 

 

   

Conference Papers

   
  • L. Wei, E. Keogh, X. Xi. SAXually Explicit Images: Finding Unusual Shapes. In Proc. of the 6th IEEE International Conference on Data Mining (ICDM 2006), to appear, 2006.

   
  • E. Keogh, L. Wei, X. Xi, S. Lonardi, J. Shieh, S. Sirowy. Intelligent Icons: Integrating Lite-Weight Data Mining and Visualization into GUI Operating Systems. In Proc. of the 6th IEEE International Conference on Data Mining (ICDM 2006), to appear, 2006.

   
  • L. Wei, J. Handley, N. Martin, T. Sun, E. Keogh. Clustering Workflow Requirements Using Compression Dissimilarity Measure. In Proc. of Ontology Mining and Knowledge Discovery from Semistructured Documents Workshop at IEEE/WIC/ACM ICDM (MDS 2006), to appear, 2006.

   
  • L. Wei and E. Keogh. Semi-Supervised Time Series Classification. In Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pp. 748-753, 2006.

   
  • E. Keogh, L. Wei, X. Xi, S.-H. Lee. LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures. In Proc. of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pp. 882-893, 2006.

   
  • X. Xi, E. Keogh, C. Shelton, L. Wei, C. A. Ratanamahatana. Fast Time Series Classification Using Numerosity Reduction. In Proc. of the 23rd International Conference on Machine Learning (ICML 2006), pp. 1033-1040, 2006.

   
  • L. Wei, E. Keogh, H. Van Herle, A. Mafra-Neto. Atomic Wedgie: Efficient Query Filtering for Streaming Time Series. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 490-497, 2005.

   
  • L. Wei, N. Kumar, V. Lolla, E. Keogh, S. Lonardi, C. A. Ratanamahatana. Assumption-Free Anomaly Detection in Time Series. In Proc. of the 17th International Scientific and Statistical Database Management Conference (SSDBM 2005), pp. 237-240, 2005.

   
  • L. Wei, N. Kumar, V. Lolla, E. Keogh, S. Lonardi, C. A. Ratanamahatana, H. Van Herle. A Practical Tool for Visualizing and Data Mining Medical Time Series. In Proc. of the 18th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2005), pp. 341-346, 2005.

   
  • N. Kumar, V. Lolla, E. Keogh, S. Lonardi, C. A. Ratanamahatana, L. Wei. Time-series Bitmaps: A Practical Visualization Tool for Working with Large Time Series Databases. In Proc. of the 2005 SIAM International Conference on Data Mining (SDM 2005), pp. 531-535, 2005.

   
  • L. Wei, W. Qian, A. Zhou, W. Jin. HOT: Hypergraph-based Outlier Test for Categorical Data. In Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2003), pp. 399-410, 2003.

   
  • A. Zhou, Z. Cai, L. Wei, W. Qian. M-Kernel Merging: Towards Density Estimation over Data Streams. In Proc. of the 8th International Conference on Database Systems for Advanced Applications (DASFAA 2003), pp. 285-292, 2003.

   
  • W. Qian, H. Qian, L. Wei, Y. Wang, A. Zhou. Structure-based Query Expansion for XML Search Engine. In Proc. of the 11th International Conference of New Information Technology, pp. 235-242, 2001.

   

 

 

    

    

    

Patent

    

 

 

    
  • L. Wei, J. Handley, N. Martin. Recommendation System. US Patent (pending), January 2006.

     

     

    

Presentations

    

 

 

    
  • Efficient Query Filtering for Streaming Time Series, presented at the 5th IEEE International Conference on Data Mining, Houston, TX, November 2005.

    
  • Xerox Workflow Configuration Tool Case Log Clustering, presented at Xerox Innovation Group, Webster, NY, September 2005.

     

     

    

Professional Activities

    

 

 

    

Reviewer

    
  • ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006

    
  • European Conference on Machine Learning (ECML), 2006

    
  • European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2006

    
  • International Symposium on Computer and Information Sciences (ISCIS), 2006

    
  • International Conference on Data Warehousing and Knowledge Discovery (DaWaK), 2006

    
  • International Conference on Advanced Data Mining and Applications (ADMA), 2006

    
  • New Generation Computing Journal, 2006

    
  • IEEE International Symposium on Computer-Based Medical Systems (CBMS), 2005, 2006

    

 

 

    

Membership

    
  • Society for Industrial and Applied Mathematics (SIAM)

    
  • Computing Research Association - Women (CRA-W)

     

     

    

Computer Skills

    

 

 

    
  • Programming: MATLAB, C/C++, Java, Python, PHP, Perl, etc.

  • OS: Windows, Linux

  • Database: IBM DB2, MySQL, PostgreSQL

      

   

     

    

Certifications

    

 

 

    
  • IBM Certified Solutions Expert DB2 UDB V7.1 Database Administration for Linux, UNIX, Windows and OS/2

    
  • IBM Certified Specialist DB2 UDB V6.1/V7.1 User

     

     

References available upon request

   

   

   

Last updated 03/07/07 11:30:36 PM -0800.

    
