My research interests lie in the area of time series analysis and data mining. Specially, I am interested in the study of time series classification, pattern/abnormal detection and mining. And I also interested in applying time series analysis into image and video mining areas, making the classification faster, more accurate, and detecting interesting events.
Image Indexing and Mining
Eamonn Keogh, Li Wei, Xiaopeng Xi, Sang-Hee Lee and Michail Vlachos (2006). LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures. In proc. of the 32nd International Conference on Very Large Data Bases (VLDB). [pdf], for more information, see here.
The matching of two-dimensional shapes is an important problem with applications in domains as diverse as biometrics, industry, medicine and anthropology. The distance measure used must be invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense of poor discrimination ability, or in the distance measure, at the expense of efficiency. We show that we can take the slow but accurate approaches and dramatically speed them up. On real world problems our technique can take current approaches and make them 4 orders of magnitude faster, without false dismissals. Moreover, our technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures including Euclidean distance, Dynamic Time Warping and Longest Common Subsequence.
Fast Time Series Classification
Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, Chotirat Ann Ratanamahatana (2006). Fast Time Series Classification Using Numerosity Reduction. In proc. of the 23rd International Conference on Machine Learning (ICML). [pdf][presentation slides]
Many algorithms have been proposed for the problem of time series classification, however it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is exceptionally difficult to beat. This approach has one weakness, however; it is computationally too demanding for many real-time applications. One way to mitigate this problem is to speed up the DTW calculations. Nonetheless, there is a limit to how much this can help. We propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. While the idea of numerosity reduction for nearest-neighbor classifiers has a long history, we show here that we can leverage off an original observation about the relationship between dataset size and DTW constraints to produce an extremely compact dataset with little or no loss in accuracy.
For many real world problems we must perform classification under widely varying amounts of computational resources. For example, if asked to classify an instance taken from a bursty stream, we may have from milliseconds to minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. We show how we can convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification, or if given the luxury of additional time, can utilize the extra time to increase classification accuracy. By pushing the worst exemplars to the end of dataset, we start anytime classification by examining the most representative exemplars in the early stages. Furthermore, for some specialized domains such as time series, we can further improve our algorithm with domain specific techniques.
Multiple Insects Tracking and Behavior Analysis
Although object recognition and tracking attracts many interests by researchers in computer vision, machine learning, data mining areas, tracking multiple fast moving targets and classify their species are still very difficult. Mosquitoes are small and fly in very fast speed, which make it difficult to obtain high resolution images to use its color, texture information in classification, and to track given that its trajectory can be highly non-linear and non-Gaussian. In addition, in the crowd environment of insects, such as many mosquitoes, crickets, ants, their interaction between each other may cause their paths changing dramatically, which make traditional particle filter tracking fail.
In our project, the goal is to tracking flying insects, classify their species, and analyze their behaviors.
This is the ongoing project, when I am working part time at ISCA technolegy, more sample videos are coming soon.
Time Series Visualization
Real Time Traffic Tracking and Monitoring
Xiaopeng Xi, Eamonn Keogh, Giang Ngo, Agenor Mafra-Neto (2006). Human Scene Understanding. Tech Report, with ISCA Technologies.
We discuss a novel technique to find usual behaviors in video surveillance streams. While there already exist several such systems, they all suffer from a fundamental weakness that prevents their widespread deployment. They all require many parameters to be set (or learned). With so many parameters to learn, overfitting becomes a major problem, and the systems are plagued with many false alarms. For example, a system that was set up and trained on a sunny day, may trigger an alarm simply because the sun has gone behind a cloud. Our solution to this problem is to introduce a novel definition unusual/novel/suspicious behavior, which only requires exactly one intuitive parameter. Our approach works by tracking the trajectories of individuals (or vehicles) in the frame of view, and finding trajectories that are different from previously observed trajectories. We call such trajectories discords. While the brute force algorithm to discover trajectory discords is quadratic in the number of observed trajectories, we show a simple algorithm that is 3 to 4 orders of magnitude faster than brute force, while guaranteed to produce identical results.
Intelligent Architectural Drawings Recognition and Reconstruction
In the collaboration with Hong Kong Youlee Holding Limited, we designed the commercial software VHRecQS that could automatically recognize architectural drawings, understand them intelligently and reconstruct the 3D models of buildings. According to demand analysis, we first divided the software into four modules: reading, recognition, reconstruction, and calculation. In reading module, the contents of .DWG or .DXF files (plan of building) are read in and converted into primitive graphical symbols. Recognition module takes responsibility for interpreting them to generate building components such as columns, beams and walls. Reconstruction module is used to get the 3D models of building. And the steel quantity is acquired in calculation module. I was responsible for part of recognition module and user interface design. For recognition module, I designed the algorithms, which made the result achieve high performance and accuracy. In user interface part, I implemented heuristic and self-adaptive interaction tools, which are flexible and intelligent. The software was developed on VC6.0 platform. This software has been widely used in construction area in Hong Kong and southern China.
Tong Lu, Xiaopeng Xi, Ming Rui, Shijie Cai (2005). Recognition and Understanding of Architectural Drawings: Model and Algorithm. Journal of Computer Research and Development, Vol.42(1):144-152, 2005 (in Chinese).
Xiaopeng Xi, Wanchun Dou, Tong Lu, Shijie Cai (2004). Research on Automated Recognizing and Interpreting Architectural Drawings. Proceedings of the First International Conference on Machine Learning and Cybernetics (IEEE). Beijing, China 4-5 Nov 2002, P1000-P1004.