CPMD Toolbox

This toolbox contains a set of routines for sovling change point discovery and motif discovery problems as well as several supporting routines for evaluation, test dataset generation etc.

CPMD toolbox


Copyright and terms of use

The code is made freely available for non-commercial uses only, provided that the copyright header in each file not be removed, and suitable citation(s) (see below) be made for papers published based on the code.

The code is not optimized for speed, and we are not responsible for any errors that might occur in the code.
Use of this code in teaching is allowed given that the authors are notified of the intention of this usage.
Please read the copyright information of each routine before using it.

Please do not directly distribute this code without an agreement from the authors.
The copyright of the code is retained by the authors. By downloading/using the code you agree to all the terms stated above.

Yasser Mohammad and Toyoaki Nishida, "CPMD: A MATLAB Toolbox for Change Point and Constraint Motif Discovery", 25th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE) 2012
Copyright (c) 2011, Yasser Mohammad, Toyoaki Nishida. All rights reserved.

Please direct all questions or comments to yasser@ii.ist.i.kyoto-u.ac.jp, yasserm@aun.edu.eg, or yasserfarouk@gmail.com

The toolbox is available for researchers from the authors. Please contact yasser@ii.ist.i.kyoto-u.ac.jp


Usage Examples

Change Point Disocvery

Change point discovery (CPD) is an important knowledge discovery problem. The problem is simply to find the locations at which the underlying generating dynamics of a signal (timeseries) changes. This does not -- in many cases -- correspond to an easily discernable change in the signal itself.

CPD algorithms usually proceed in two steps: firstly, a score is assigned to every time step signifying the possibility of change at this point (it is not always a probability because it is not always normalized). We call this step scoring and the most general routing to achieve it in the toolbox is cpd().

Secondly, the scores are analyzed to localize change points. We call this step localization and routines that achieve this step are beginning with findLoc in the toolbox.

In many cases these two steps are separate and one can change one of them without affecting the other generating a new algorithm variation. In continuous inspection applications, the focus is usually put onto the localization step because accurate and fast localization is of utmost important. In some other applications (e.g. physiological signal analysis), the scoring step is the important one and may be the localization step can be removed all-together. Given this, it is important to be able to evaluate the performance of both the scoring and localization steps.

The Figure shows one time series with added zero mean gaussian noise and 1 P-P standard deviation and the results of applying the four available CPD routines on the toolbox to it. Even with this very high noise level, cpd() can still find some of the major changes in the signal.

Constrained Motif Discovery
Motif discovery is a related problem which can be informally defined as efficiently finding unknown recurring patterns in long time series with no prior knowledge about their locations. The research in this problem have led to many techniques including the PROJECTIONS algorithm, PERUSE, and Gemoda  among many others. With the exception of Gemoda which is quadratic in time and space complexities, these algorithms aim to achieve sub-quadratic time complexity by first looking for candidate motif stems using some heuristic method and then doing an exhaustive motif detection instead of motif discovery which is linear in time.

Constrained Motif Discovery relates the aforementioned two problems. In CMD, the goal is to find unknown recurrent patterns but utilizing a set of constraints on motif locations. One way to introduce these constraints when no prior knowledge is available is to use a CPD algorithm and then search for the motifs around the discovered change points. In this sense a CMD algorithm can be used in conjunction with ta CPD algorithm to solve the general motif discovery problem. CMD has applications in activity detection, gesture recognition , and Human-Robot Interaction, among other fields.

The toolbox provides both mcfull() and dgr() routines for solving this problem. The first is usually faster while the later provides more accurate detection of motif boundaries.
The Figure shows, an example where the location of a differential drive robot wandering around was collected over time and motifs are discovered in it using dgr().

Sample Applications

Physiological Signal Analysisphyisological

RSST was used in [1] to distinguish between natural and unnatural interaction behaviors by anaylzing the phsiological signals of the partner. The results shown in  the Figure reveal that RSST based features were more effective than traditional feature types for distinguishing the natural and unnatural conditions in a controlled experiments. These results reported in [2] received the best paper award from IEA/AIE 2009.

RSST was used in this work to find regions of fast changes in respiration, skin conductance, and blood volume pulse and this was shown to provide more accurate classification between the two conditions of the experiment compared with traditional statistics.

[1] Yasser Mohammad and Toyoaki Nishida, Using Physiological Signals to Detect Natural Interactive Behavior, Applied Intelligence, 13(1) 79-92 [original article | local copy]

[2] Yasser Mohammad and Toyoaki Nishida, Measuring Naturalness During Close Encounters Using Physiological Signal Processing, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp. 281-290 [original article | local copy]

Learning Guided Navigation

In [3], we presented a novel algorithm for learning commands, actions and their associations in a human-robot interaction context. The proposed learning paradigm assumes that the robot can watch the operator instructing an expert actor (another robot or a human subject) and uses the proposed algorithm to discover the kinds of commands/gestures used by the operator, their occurrences, and their meaning in terms of the action primitives of the robot. A feasibility study was conducted using eight novice untrained human subjects to assess the applicability of the proposed algorithm. The evaluation experiment shows that the proposed algorithm can achieve its goals and build a probabilistic network of associations between action primitives and the gestures used by the operator after learning both of them. The algorithm can discover 83.3% of the gesture types and in average it finds 82% of the occurrences of discovered gestures. 100% of action types were discovered and the accuracy of detecting the discovered actions was 88.3%. Moreover the induced PN achieved 95.2% accuracy in predicting actor’s behavior.

Motif discovery in this experiment used dgr() and change point discovery used rsst().

[3] Yasser Mohammad, Toyoaki Nishida, and Shogo Okada, Unsupervised Simultaneous Learning of Gestures, Actions and their Associations for Human-Robot Interaction, IEEE IROS 2009 (October 11 to 15, St. Louis, MO, USA), pp. 2537-2544 [original article | local copy]

Other Applications: