# ICPRAM 2016 Abstracts

## Area 1 - Theory and Methods

### Full Papers

Paper Nr: 5

Title: On the Computation of the Number of Bubbles and Tunnels of a 3-D Binary Object

Authors: Humberto Sossa

Abstract: Two formulations and two general procedures for computing the number of bubbles and tunnels of 3-D binary objects are introduced in this paper. The first formulation computes the number of bubbles or voids of an object, while the second computes only its number of tunnels or holes. The first procedure obtains, in two steps, the number of bubbles and tunnels of a single 3-D object; the second procedure computes, in three steps, the number of bubbles and tunnels of several 3-D objects from an image containing them. The correctness of the two formulations is established theoretically. Results on a set of images demonstrate the utility and validity of the proposed procedures.
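A bubble in this sense is a background cavity completely enclosed by the object. As a minimal illustration (not the paper's formulation), cavities can be counted by flood-filling the 6-connected background and discarding components that reach the volume border:

```python
from collections import deque

def count_bubbles(vol):
    """Count background cavities (bubbles) fully enclosed by the object.

    vol: 3-D nested list of 0/1 voxels. A bubble is a 6-connected
    background component that does not touch the volume border.
    """
    Z, Y, X = len(vol), len(vol[0]), len(vol[0][0])
    seen = set()

    def flood(start):
        # Breadth-first flood fill; report whether the component
        # reaches the border of the volume.
        q = deque([start])
        seen.add(start)
        touches_border = False
        while q:
            z, y, x = q.popleft()
            if z in (0, Z - 1) or y in (0, Y - 1) or x in (0, X - 1):
                touches_border = True
            for dz, dy, dx in ((1,0,0),(-1,0,0),(0,1,0),(0,-1,0),(0,0,1),(0,0,-1)):
                nz, ny, nx = z + dz, y + dy, x + dx
                if 0 <= nz < Z and 0 <= ny < Y and 0 <= nx < X \
                        and vol[nz][ny][nx] == 0 and (nz, ny, nx) not in seen:
                    seen.add((nz, ny, nx))
                    q.append((nz, ny, nx))
        return touches_border

    bubbles = 0
    for z in range(Z):
        for y in range(Y):
            for x in range(X):
                if vol[z][y][x] == 0 and (z, y, x) not in seen:
                    if not flood((z, y, x)):
                        bubbles += 1
    return bubbles

# A 3x3x3 solid cube with its centre voxel removed encloses one bubble.
cube = [[[1] * 3 for _ in range(3)] for _ in range(3)]
cube[1][1][1] = 0
print(count_bubbles(cube))  # 1
```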

Paper Nr: 6

Title: Hadamard Code Graph Kernels for Classifying Graphs

Authors: Tetsuya Kataoka and Akihiro Inokuchi

Abstract: Kernel methods such as Support Vector Machines (SVMs) are becoming increasingly popular because of their high performance on graph classification problems. In this paper, we propose two novel graph kernels called the Hadamard Code Kernel (HCK) and the Shortened HCK (SHCK). These kernels are based on the Hadamard code, which is used in spread spectrum-based communication technologies to spread message signals. The proposed graph kernels are equivalent to the Neighborhood Hash Kernel (NHK), one of the fastest graph kernels, and comparable to the Weisfeiler-Lehman Subtree Kernel (WLSK), one of the most accurate graph kernels. The fundamental performance and practicality of the proposed graph kernels are evaluated using three real-world datasets.
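The Hadamard code assigned to vertex labels in HCK comes from the rows of a Hadamard matrix. A small sketch of the Sylvester construction of such matrices (the kernel itself is defined in the paper):

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).

    Any two distinct rows are orthogonal, i.e. they agree in exactly
    n/2 positions, which is what makes them usable as codes.
    """
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    H = [[1]]
    while len(H) < n:
        # H_{2k} = [[H_k, H_k], [H_k, -H_k]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

H4 = hadamard(4)
# Distinct rows of a Hadamard matrix are orthogonal.
assert all(sum(a * b for a, b in zip(H4[i], H4[j])) == 0
           for i in range(4) for j in range(4) if i != j)
print(H4[1])  # [1, -1, 1, -1]
```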

Paper Nr: 9

Title: Interval Coded Scoring Index with Interaction Effects - A Sensitivity Study

Authors: Lieven Billiet

Abstract: Scoring systems have long been used in medical practice, but they are often based on experience rather than a structured approach. In the literature, the interval coded scoring index (ICS) has been introduced as an alternative; it derives a scoring system from data using optimization techniques. This work discusses an extension, ICS*, that takes variable interactions into account. Furthermore, a study is performed to give insight into the new model's sensitivity to noise, the size of the data set, and the number of non-informative variables. The study shows that interactions can mostly be discovered robustly, even in the presence of noise and spurious variables. A final validation on two UCI data sets further indicates the quality of the approach.

Paper Nr: 37

Title: Assessing the Number of Clusters in a Mixture Model with Side-information

Authors: Edith Grall-Maes and Duc Tung Dao

Abstract: This paper deals with the selection of the number of clusters in a clustering problem, taking into account the side-information that the points of a chunklet arise from the same cluster. An Expectation-Maximization algorithm is used to estimate the parameters of a mixture model and determine the data partition. To select the number of clusters, the usual criteria are not suitable because they do not consider the side-information in the data. We therefore propose suitable criteria that are modified versions of three usual criteria: the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the entropy criterion (NEC). The proposed criteria are used to select the number of clusters in two simulated problems and one real problem. Their performances are compared and the influence of the chunklet size is discussed.
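For reference, the unmodified criteria the paper starts from trade off fit against model size; the paper's versions additionally account for the chunklet side-information. A sketch of standard BIC/AIC-based selection, with hypothetical log-likelihood values:

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Standard Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def aic(log_likelihood, n_params):
    """Standard Akaike information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical maximized log-likelihoods for K = 1..4 components of a
# 1-D Gaussian mixture; each extra component adds 3 free parameters
# (mean, variance, weight), minus 1 for the weight-sum constraint.
loglik = {1: -1420.0, 2: -1315.0, 3: -1308.0, 4: -1306.5}
n = 500
scores = {k: bic(ll, 3 * k - 1, n) for k, ll in loglik.items()}
best_k = min(scores, key=scores.get)
print(best_k)  # 2
```

Here the likelihood gain from K = 2 to K = 3 is too small to pay for the extra parameters, so BIC picks K = 2.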

Paper Nr: 46

Title: An Experimental Evaluation of the Adaptive Sampling Method for Time Series Classification and Clustering

Authors: Muhammad Marwan Muhammad Fuad

Abstract: Adaptive sampling is a dimensionality reduction technique for time series data inspired by dynamic-programming piecewise linear approximation. It yields a suboptimal solution to the polygonal curve approximation problem by limiting the search space. In this paper, we conduct extensive experiments to evaluate the performance of adaptive sampling in 1-NN classification and k-means clustering tasks. The experiments show that adaptive sampling gives satisfactory results in these tasks, even at relatively high compression ratios.
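To make the idea concrete, here is one common greedy variant of piecewise linear reduction (a sketch of the general technique, not necessarily the exact algorithm evaluated in the paper): extend each segment as long as every interior point stays within a tolerance of the chord joining the segment's endpoints.

```python
def adaptive_sample(series, max_err):
    """Greedy piecewise-linear reduction of a time series.

    Returns the indices of the retained points; all dropped points lie
    within max_err (vertical deviation) of the retained segments.
    """
    kept = [0]
    start = 0
    for end in range(2, len(series)):
        x0, y0 = start, series[start]
        x1, y1 = end, series[end]
        # Largest vertical deviation of interior points from the chord.
        err = max(abs(series[i] - (y0 + (y1 - y0) * (i - x0) / (x1 - x0)))
                  for i in range(start + 1, end))
        if err > max_err:
            # Close the segment just before the violating endpoint.
            kept.append(end - 1)
            start = end - 1
    kept.append(len(series) - 1)
    return kept

series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
print(adaptive_sample(series, 0.1))  # [0, 3, 6]
```

The two straight ramps are each collapsed to their endpoints, giving a 7-to-3 compression here.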

Paper Nr: 47

Title: Adapted SIFT Descriptor for Improved Near Duplicate Retrieval

Authors: Afra'a Ahmad Alyosef and Andreas Nürnberger

Abstract: The scale invariant feature transformation algorithm (SIFT) has been designed to detect and characterize local features in images. It is widely used to find similar regions in affine transformed images, to recognize similar objects, or to retrieve near-duplicates of images. Due to the computational complexity of SIFT-based matching operations, several approaches have been proposed to speed up this process. However, most approaches suffer a significant decrease in matching accuracy compared to the original descriptor. We propose an approach that is optimized for near-duplicate image retrieval tasks by a dimensionality reduction process that differs from other methods in preserving the information around the keypoints of the region patches of the original descriptor. The computation of the proposed Region Compressed (RC) SIFT-64D descriptors is therefore faster and requires less memory for indexing. Most importantly, the obtained features show at the same time a better retrieval performance and seem to be even more robust. To demonstrate this, we provide results of a comparative performance analysis using the original SIFT-128D, reduced SIFT versions, SURF-64D and the proposed RC-SIFT-64D in image near-duplicate retrieval using large-scale image benchmark databases.
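Purely as an illustration of the kind of dimensionality reduction at play (the actual RC-SIFT-64D pooling over region patches is defined in the paper), a 128-D histogram descriptor can be halved by pooling adjacent bins:

```python
def pool_descriptor(desc128):
    """Illustrative 128-D -> 64-D reduction by summing adjacent bin pairs.

    This is a generic pooling sketch, not the RC-SIFT definition: it
    only shows how bin aggregation halves descriptor size while keeping
    coarse gradient-energy information per region.
    """
    assert len(desc128) == 128
    return [desc128[2 * i] + desc128[2 * i + 1] for i in range(64)]

d = list(range(128))
rc = pool_descriptor(d)
print(len(rc), rc[0], rc[63])  # 64 1 253
```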

Paper Nr: 48

Title: Foreground Segmentation for Moving Cameras under Low Illumination Conditions

Authors: Wei Wang, Weili Li and Xiaoqing Yin

Abstract: A foreground segmentation method, including image enhancement, trajectory classification and object segmentation, is proposed for moving cameras under low illumination conditions. Gradient-field-based image enhancement is designed to enhance low-contrast images. On the basis of the dense point trajectories obtained over long frame sequences, a simple and effective clustering algorithm is designed to classify foreground and background trajectories. By combining trajectory points and a marker-controlled watershed algorithm, a new type of foreground labeling algorithm is proposed to effectively reduce computing costs and improve edge-preserving performance. Experimental results demonstrate the promising performance of the proposed approach compared with other competing methods.

Paper Nr: 49

Title: A New Family of Bounded Divergence Measures and Application to Signal Detection

Authors: Shivakumar Jolad, Ahmed Roman and Mahesh C. Shastry

Abstract: We introduce a new one-parameter family of divergence measures, called bounded Bhattacharyya distance (BBD) measures, for quantifying the dissimilarity between probability distributions. These measures are bounded, symmetric and positive semi-definite, and do not require absolute continuity. In the asymptotic limit, the BBD measure approaches the squared Hellinger distance. A generalized BBD measure for multiple distributions is also introduced. We prove an extension of a theorem of Bradt and Karlin for BBD relating Bayes error probability and divergence ranking. We show that BBD belongs to the class of generalized Csiszár f-divergences and derive some properties such as curvature and the relation to Fisher information. For distributions with vector-valued parameters, the curvature matrix is related to the Fisher-Rao metric. We derive certain inequalities between BBD and well-known measures such as the Hellinger and Jensen-Shannon divergences. We also derive bounds on the Bayesian error probability. We give an application of these measures to the problem of signal detection, where we compare two monochromatic signals buried in white noise and differing in frequency and amplitude.
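The squared Hellinger distance that BBD approaches in the asymptotic limit has a simple closed form for discrete distributions; only the classical definition is shown here (the BBD family itself is defined in the paper):

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions.

    H^2(p, q) = 1 - sum_i sqrt(p_i * q_i),
    i.e. one minus the Bhattacharyya coefficient. Bounded in [0, 1],
    symmetric, and zero iff p == q.
    """
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(round(hellinger_sq(p, q), 4))  # 0.1056
```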

Paper Nr: 52

Title: Continuous Set Packing and Near-Boolean Functions

Authors: Giovanni Rossi

Abstract: Given a family of feasible subsets of a ground set, the packing problem is to find a largest subfamily of pairwise disjoint family members. Non-approximability makes heuristics attractive, viable options, while efficient methods with worst-case guarantees are a key concern in computational complexity. This work proposes a novel near-Boolean optimization method relying on a polynomial multilinear form with variables each ranging in a high-dimensional unit simplex. These variables are the elements of the ground set, and each distributes a unit membership over the feasible subsets in which it is included. The given problem is thus translated into a continuous version where the objective is to maximize a function taking values on collections of points in a unit hypercube. Maximizers are shown to always include collections of disjoint hypercube vertices, i.e. partitions of the ground set, which constitute feasible solutions for the original discrete version of the problem. A gradient-based local search in the expanded continuous domain is designed. Approximations with polynomials of bounded degree and near-Boolean coalition formation games are also discussed.
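The discrete problem that the paper relaxes into a continuous one can be stated very compactly; an exponential-time brute-force baseline (for small instances only) makes the objective explicit:

```python
from itertools import combinations

def max_set_packing(subsets):
    """Exhaustive search for a largest subfamily of pairwise disjoint sets.

    Tries subfamilies from largest to smallest and returns the first
    pairwise-disjoint one found. Exponential time; a reference baseline,
    not the paper's near-Boolean method.
    """
    for r in range(len(subsets), 0, -1):
        for fam in combinations(subsets, r):
            if all(a.isdisjoint(b) for a, b in combinations(fam, 2)):
                return list(fam)
    return []

family = [{1, 2}, {2, 3}, {3, 4}, {5}]
print(max_set_packing(family))  # [{1, 2}, {3, 4}, {5}]
```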

Paper Nr: 55

Title: Classifier Ensembles with Trajectory Under-Sampling for Face Re-Identification

Authors: Roghayeh Soleymani

Abstract: In person re-identification applications, an individual of interest may be covertly tracked and recognized based on trajectories of faces or other distinguishing information captured with a video surveillance camera. However, a varying level of imbalance often exists between target and non-target facial captures, and this imbalance level may differ from what was considered during design. The performance of face classification systems typically declines in such cases because, to avoid bias towards the majority class (non-target), they tend to optimize the overall accuracy under a balanced class assumption. Specialized classifier ensembles trained on balanced data, where non-target samples are selected through random under-sampling or cluster-based sampling, have been proposed in the literature, but they suffer from loss of information and low diversity and accuracy. In this paper, a new ensemble method is proposed for generating a diverse pool of classifiers, each one trained on different levels of class imbalance and complexity for a greater diversity of opinion. Ensembles with Trajectory Under-Sampling (EoC-TUS) allow selecting subsets of non-target training data based on trajectory information. Variants of these ensembles can give more importance to the classifiers that are most efficient at identifying target samples, or define efficient and diverse decision boundaries by starting the selection of trajectories from those farthest from the target class. For validation, experiments are conducted using videos captured in the Faces In Action dataset and compared to several baseline techniques. The proposed EoC-TUS outperforms state-of-the-art techniques in terms of accuracy and diversity over a range of imbalance levels in the input video.
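The random under-sampling baseline the paper improves on is simple to state: draw a subset of the majority (non-target) class so the two classes reach a target ratio. A minimal sketch with made-up capture counts:

```python
import random

def undersample(majority, minority, ratio=1.0, seed=0):
    """Random under-sampling of the majority (non-target) class.

    Keeps the minority class intact and draws ratio * len(minority)
    majority samples at random; the information loss this causes is one
    motivation for trajectory-based selection (TUS) instead.
    """
    rng = random.Random(seed)
    k = min(len(majority), int(ratio * len(minority)))
    return rng.sample(majority, k), list(minority)

non_target = list(range(100))   # 100 hypothetical non-target captures
target = list(range(100, 110))  # 10 hypothetical target captures
maj, mino = undersample(non_target, target)
print(len(maj), len(mino))  # 10 10
```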

Paper Nr: 113

Title: Document Clustering Games

Authors: Rocco Tripodi and Marcello Pelillo

Abstract: In this article we propose a new model for document clustering, based on game-theoretic principles. Each document to be clustered is represented as a player, in the game-theoretic sense, and each cluster as a strategy that the players have to choose in order to maximize their payoff. The geometry of the data is modeled as a graph, which encodes the pairwise similarities among documents, and the games are played among similar players. In each game the players update their strategies according to which strategy has been effective in previous games. The Dominant Set clustering algorithm is used to find the prototypical elements of each cluster. This information is used to divide the players into two disjoint sets: one collecting labeled players, which always play a fixed strategy, and the other collecting unlabeled players, which update their strategies at each iteration of the games. The evaluation of the system was conducted on 13 document datasets and shows that the proposed method performs well compared to different document clustering algorithms.

Paper Nr: 132

Title: Nonparametric Bayesian Line Detection - Towards Proper Priors for Robotic Computer Vision

Authors: Anne C. van Rossum and Hai Xiang Lin

Abstract: In computer vision there are many sophisticated methods to perform inference over multiple lines; however, they are quite ad hoc. In this paper a fully Bayesian approach is used to fit multiple lines to a point cloud simultaneously. Our model extends a linear Bayesian regression model to an infinite mixture model and uses a Dirichlet process as a prior on the partition. We perform Gibbs sampling over non-unique parameters as well as over clusters to fit lines of a fixed length, a variety of orientations, and a variable number of data points. Performance is measured using the Rand Index, the Adjusted Rand Index, and two other clustering performance indicators. This paper is mainly meant to demonstrate that general Bayesian methods can be used for line estimation: given a model and a noise model, Bayesian methods perform optimal inference over the data. Moreover, rather than only demonstrating the concept as such, the first results are promising with respect to the described clustering performance indicators. Further research is required to extend the method to inference over multiple line segments and multiple volumetric objects, building on the mathematical foundation laid down in this paper.
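The Rand Index used for evaluation here has a short definition: the fraction of point pairs on which two clusterings agree (both place the pair together, or both place it apart). A minimal sketch:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand Index between two clusterings of the same points.

    For every pair of points, the clusterings agree if both put the
    pair in one cluster or both put it in different clusters; the index
    is the agreeing fraction, in [0, 1], with 1 for identical partitions.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

truth = [0, 0, 1, 1]
found = [0, 0, 1, 2]  # one point split off into its own cluster
print(rand_index(truth, found))  # 5/6 ~ 0.833
```

The Adjusted Rand Index additionally corrects this value for chance agreement, which matters when cluster counts differ.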

### Short Papers

Paper Nr: 10

Title: Similarity Function Learning with Data Uncertainty

Authors: Julien Bohné and Sylvain Colin

Abstract: Similarity functions are at the core of many pattern recognition applications. Standard approaches use feature vectors extracted from a pair of images to compute their degree of similarity. Often feature vectors are noisy, and a direct application of standard similarity learning methods may result in unsatisfactory performance. However, information on statistical properties of the feature extraction process may be available, such as the covariance matrix of the observation noise. In this paper, we present a method which exploits this information to improve the process of learning a similarity function. Our approach is composed of an unsupervised dimensionality reduction stage and the similarity function itself. Uncertainty is taken into account throughout the whole processing pipeline, during both training and testing. Our method is based on probabilistic models of the data, and we propose EM algorithms to estimate their parameters. In experiments we show that exploiting uncertainty significantly outperforms other standard similarity function learning methods on challenging tasks.

Paper Nr: 14

Title: Similarity Assessment as a Dual Process Model of Counting and Measuring

Authors: Bert Klauninger and Horst Eidenberger

Abstract: Based on recent findings from the field of human similarity perception, we propose a dual process model (DPM) of taxonomic and thematic similarity assessment which can be utilised in machine learning applications. Taxonomic reasoning is related to predicate-based measures (counting), whereas thematic reasoning is mostly associated with metric distances (measuring). We suggest a procedure that combines both processes into a single similarity kernel. For each feature dimension of the observational data, an optimal measure is selected by a greedy algorithm: a set of possible measures is tested, and the one that leads to improved classification performance of the whole model is retained. These measures are combined into a single SVM kernel by means of generalisation (converting distances into similarities) and quantisation (applying predicate-based measures to interval-scale data). We then demonstrate how to apply our model to a classification problem of MPEG-7 features from a test set of images. Evaluation shows that the performance of the DPM kernel is superior to that of the standard SVM kernels. This supports our theory that the DPM comes closer to human similarity judgment than any singular measure, and it motivates our suggestion to employ the DPM not only in image retrieval but also in related tasks.

Paper Nr: 15

Title: Machine Learning with Dual Process Models

Authors: Bert Klauninger

Abstract: Similarity measurement processes are a core part of most machine learning algorithms. Traditional approaches focus on either taxonomic or thematic thinking. Psychological research suggests that a combination of both is needed to model human-like similarity perception adequately. Such a combination is called a Similarity Dual Process Model (DPM). This paper describes how to construct DPMs as a linear combination of existing measures of similarity and distance. We use generalisation functions to convert distance into similarity. DPMs are similar to kernel functions, so they can be integrated into any machine learning algorithm that uses kernel functions. Clearly, not all DPMs that can be formulated work equally well. Therefore, we test classification performance in a real-world task: the detection of pedestrians in images. We assume that DPMs are only viable if they yield better classifiers than their constituting parts. In our experiments, we found DPM kernels that matched the performance of conventional ones for our data set. Finally, we provide a construction kit to build such kernels to encourage further experiments in other application domains of machine learning.
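The construction described above, a linear combination of a counting measure and a generalised distance measure, can be sketched in a few lines. The weight, the two component measures, and the exponential generalisation function below are illustrative choices, not the paper's exact kit:

```python
import math

def gen_exp(d, alpha=1.0):
    """Generalisation function: convert a distance into a similarity."""
    return math.exp(-alpha * d)

def dpm_kernel(x, y, w=0.5):
    """Toy dual process kernel on numeric feature vectors.

    Linear combination of a predicate-based (taxonomic, 'counting')
    measure and a generalised metric (thematic, 'measuring') measure.
    """
    # Counting: fraction of positions where both features are "present".
    taxonomic = sum(1 for a, b in zip(x, y) if a > 0 and b > 0) / len(x)
    # Measuring: Euclidean distance converted into a similarity.
    thematic = gen_exp(math.dist(x, y))
    return w * taxonomic + (1 - w) * thematic

print(round(dpm_kernel([1, 0, 1], [1, 1, 1]), 3))  # 0.517
```

A combination like this is a valid SVM kernel only if each component is positive semi-definite; checking that is part of what makes constructing DPM kernels non-trivial.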

Paper Nr: 17

Title: Hidden Markov Random Fields and Direct Search Methods for Medical Image Segmentation

Authors: El-Hachemi Guerrout and Samy Ait-Aoudia

Abstract: The goal of image segmentation is to simplify the representation of an image into items that are meaningful and easier to analyze. Medical image segmentation is one of the fundamental probl