ICPRAM 2019 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 2
Title:

Actual Impact of GAN Augmentation on CNN Classification Performance

Authors:

Thomas Pinetz, Johannes Ruisz and Daniel Soukup

Abstract: In industrial inspection settings, it is common that data is either hard or expensive to acquire. Generative modeling offers a way to reduce those costs by filling out scarce training data sets automatically. Generative Adversarial Networks (GANs) have shown incredible results in the field of artificial image data generation, but until recently were not ready for industrial applications, because of unclear performance metrics and instabilities. However, with the introduction of Wasserstein GAN, which comprises an interpretable loss metric and general stability, it is promising to try using those algorithms for industrial classification tasks. Therefore, we present a case study on a single digit image classification task of banknote serial numbers, where we simulate use cases with missing data. For those selected situations, different data generation algorithms were implemented incorporating GANs in various ways to augment scarce training data sets. As a measure of plausibility of those artificially generated data, we used the classification performance of a CNN trained on them. We analyzed the gains in classification accuracy when augmenting the training samples with GAN images and compared them to results obtained with more classically generated, rendered artificial data and with near-perfect training data situations, respectively.

Paper Nr: 11
Title:

Deep Learning for Radar Pulse Detection

Authors:

Ha Q. Nguyen, Dat T. Ngo and Van L. Do

Abstract: In this paper, we introduce a deep learning based framework for sequential detection of rectangular radar pulses with varying waveforms and pulse widths under a wide range of noise levels. The method is divided into two stages. In the first stage, a convolutional neural network is trained to determine whether a pulse or part of a pulse appears in a segment of the signal envelope. In the second stage, the change points in the segment are found by solving an optimization problem and then combined with previously detected edges to estimate the pulse locations. The proposed scheme is noise-blind as it does not require a noise floor estimation, unlike the threshold-based edge detection (TED) method. Simulations also show that our method significantly outperforms TED in highly noisy cases.
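
The threshold-based edge detection (TED) baseline that the paper compares against can be sketched in a few lines. This is our own toy illustration (function and variable names are ours, not the authors'): a rectangular pulse is taken to span the samples where the envelope exceeds a fixed noise threshold.

```python
import numpy as np

def ted_pulse_edges(envelope, threshold):
    """Threshold-based edge detection (TED): a pulse spans the
    samples where the envelope exceeds a fixed noise threshold."""
    above = envelope > threshold
    # rising/falling edges are sign changes of the boolean mask
    diff = np.diff(above.astype(int))
    rising = np.where(diff == 1)[0] + 1
    falling = np.where(diff == -1)[0] + 1
    return list(zip(rising, falling))

# a clean rectangular pulse from sample 30 to 70
env = np.zeros(100)
env[30:70] = 1.0
print(ted_pulse_edges(env, 0.5))  # [(30, 70)]
```

The proposed scheme replaces exactly this fixed threshold with a learned segment classifier, which is why no noise-floor estimate is needed.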

Paper Nr: 17
Title:

The Growing N-Gram Algorithm: A Novel Approach to String Clustering

Authors:

Corrado Grappiolo, Eline Verwielen and Nils Noorman

Abstract: Connected high-tech systems allow the gathering of operational data at unprecedented volumes. A direct benefit of this is the possibility to extract usage models, that is, generic representations of how such systems are used in their field of application. Usage models are extremely important, as they can help in understanding the discrepancies between how a system was designed to be used and how it is used in practice. We interpret usage modelling as an unsupervised learning task and present a novel algorithm, hereafter called Growing N-Grams (GNG), which relies on n-grams (arguably the most popular modelling technique for natural language processing) to cluster and model, in a two-step rationale, a dataset of strings. We empirically compare its performance against some other common techniques for string processing and clustering. The gathered results suggest that the GNG algorithm is a viable approach to usage modelling.
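
The abstract does not give the GNG algorithm itself, but the underlying n-gram representation of strings is standard and can be sketched as follows (a hypothetical bigram-overlap similarity for illustration, not the authors' clustering criterion):

```python
from collections import Counter

def ngrams(s, n=2):
    """All character n-grams of a string, with multiplicity."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_similarity(a, b, n=2):
    """Fraction of shared n-grams: multiset intersection over union."""
    ca, cb = ngrams(a, n), ngrams(b, n)
    inter = sum((ca & cb).values())
    union = sum((ca | cb).values())
    return inter / union if union else 0.0

print(ngram_similarity("login-ok", "login-ok"))  # 1.0
print(ngram_similarity("login-ok", "logout"))
```

A clustering algorithm can then group strings whose n-gram profiles are close, which is the kind of two-step rationale (cluster, then model) the abstract describes.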

Paper Nr: 21
Title:

BOOK: Storing Algorithm-Invariant Episodes for Deep Reinforcement Learning

Authors:

Simyung Chang, YoungJoon Yoo, Jaeseok Choi and Nojun Kwak

Abstract: We introduce a novel method to train agents of reinforcement learning (RL) by sharing knowledge in a way similar to the concept of using a book. The recorded information in the form of a book is the main means by which humans learn knowledge. Nevertheless, the conventional deep RL methods have mainly focused either on experiential learning where the agent learns through interactions with the environment from the start or on imitation learning that tries to mimic the teacher. Contrary to these, our proposed book learning shares key information among different agents in a book-like manner by delving into the following two characteristic features: (1) By defining the linguistic function, input states can be clustered semantically into a relatively small number of core clusters, which are forwarded to other RL agents in a prescribed manner. (2) By defining state priorities and the contents for recording, core experiences can be selected and stored in a small container. We call this container a ‘BOOK’. Our method learns hundreds to thousands of times faster than the conventional methods by learning only a handful of core cluster information, which shows that deep RL agents can effectively learn through the shared knowledge from other agents.

Paper Nr: 33
Title:

Dual SVM Training on a Budget

Authors:

Sahar Qaadan, Merlin Schüler and Tobias Glasmachers

Abstract: We present a dual subspace ascent algorithm for support vector machine training that respects a budget constraint limiting the number of support vectors. Budget methods are effective for reducing the training time of kernel SVM while retaining high accuracy. To date, budget training is available only for primal (SGD-based) solvers. Dual subspace ascent methods like sequential minimal optimization are attractive for their good adaptation to the problem structure, their fast convergence rate, and their practical speed. By incorporating a budget constraint into a dual algorithm, our method enjoys the best of both worlds. We demonstrate considerable speed-ups over primal budget training methods.

Paper Nr: 44
Title:

Feature Extraction of Epileptic EEG in Spectral Domain via Functional Data Analysis

Authors:

Shengkun Xie and Anna Lawniczak

Abstract: Functional data analysis is a natural tool for discovering patterns in functional data. It is also often used to investigate the functional variation of random signals. In this work, we propose a novel approach by analyzing EEG signals in the spectral domain using functional data analysis techniques including functional descriptive statistics, functional probes, and functional principal component analysis. By first transforming EEG signals into their power spectra, the functionality of random signals is greatly enhanced. Because of this improvement, the application of functional data analysis becomes meaningful in feature extraction of random signals. Our study also illustrates the great potential of using functional PCA as a feature extractor for EEG signals in epilepsy diagnosis.
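
A rough sketch of the first two steps, under our own assumptions: synthetic signals stand in for EEG, and plain PCA on the discretized power spectra stands in for functional PCA (which would additionally use a smooth basis expansion).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(256)
# toy "EEG": 20 trials, each dominated by an 8- or 12-cycle rhythm
freqs = [8, 12] * 10
signals = np.array([np.sin(2 * np.pi * f * t / 256)
                    + 0.3 * rng.standard_normal(256) for f in freqs])

# step 1: transform each signal into its power spectrum
power = np.abs(np.fft.rfft(signals, axis=1)) ** 2

# step 2: PCA on the spectra (functional PCA would add a smooth basis)
centered = power - power.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
print(explained[:2])  # the leading component dominates
```

Because the spectral transform concentrates each rhythm in a few frequency bins, a single principal component already separates the two signal types, illustrating why the spectra make PCA-style feature extraction meaningful.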

Paper Nr: 63
Title:

A Cellular Automata based Classification Algorithm

Authors:

Tuğba Usta, Enes B. Dündar and Emin E. Korkmaz

Abstract: Data classification is a well-studied problem where the aim is to identify the categories in the data based on a training set. Various machine learning methods have been utilized for the problem. On the other hand, cellular automata have drawn the attention of researchers, as they provide a dynamic and discrete model of computation. In this study, a novel approach is proposed for the classification problem. The method is based on the formation of classes in a cellular automaton by the interaction of neighborhood cells. Initially, the training data instances are assigned to the cells of a cellular automaton. The state of a cell denotes the class assignment of that point in the instance space. At the beginning of the process, only the cells that have a data instance have class assignments. However, these class assignments are spread to the neighbor cells based on a rule inspired by the heat transfer process in nature. The experiments carried out show that the model can identify the categories in the data, and promising results have been obtained.
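
A minimal sketch of the core idea, assuming a simple averaging update as the heat-transfer-inspired rule (the paper's actual CA rule may differ): labeled training cells are clamped, their class "heat" diffuses to neighbouring cells, and the final sign of each cell gives its class assignment.

```python
import numpy as np

def spread_labels(grid, iterations=50):
    """Diffuse class 'heat' from labeled cells to their neighbours.
    grid: 2-D array with +1 / -1 at labeled training cells, 0 elsewhere."""
    heat = grid.astype(float)
    fixed = grid != 0
    for _ in range(iterations):
        # average of the 4-neighbourhood (a heat-transfer-style update)
        padded = np.pad(heat, 1, mode="edge")
        nb = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
              padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        heat = np.where(fixed, heat, nb)  # training cells stay clamped
    return np.sign(heat)

g = np.zeros((5, 5), dtype=int)
g[0, 0], g[4, 4] = 1, -1  # one training instance per class
print(spread_labels(g))
```

After diffusion, cells near the +1 instance are assigned to the positive class and cells near the -1 instance to the negative class, which is the intended formation of class regions in the instance space.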

Paper Nr: 71
Title:

Annealing by Increasing Resampling in the Unified View of Simulated Annealing

Authors:

Yasunobu Imamura, Naoya Higuchi, Takeshi Shinohara, Kouichi Hirata and Tetsuji Kuboyama

Abstract: Annealing by Increasing Resampling (AIR) is a stochastic hill-climbing optimization by resampling with increasing size for evaluating an objective function. In this paper, we introduce a unified view of the conventional Simulated Annealing (SA) and AIR. In this view, we generalize both SA and AIR to a stochastic hill-climbing for objective functions with stochastic fluctuations, i.e., logit and probit, respectively. Since the logit function is approximated by the probit function, we show that AIR is regarded as an approximation of SA. The experimental results on sparse pivot selection and annealing-based clustering also support that AIR is an approximation of SA. Moreover, when an objective function requires a large number of samples, AIR is much faster than SA without sacrificing the quality of the results.
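
The two acceptance curves of the unified view can be compared in a toy sketch. The scale matching below is our own (a logistic with scale T has the same standard deviation as a Gaussian with sigma = T·pi/sqrt(3)); the paper derives the probit from the resampling size rather than from an explicit sigma.

```python
import math

def logistic_accept(delta, T):
    """SA in the unified view: a logistic fluctuation of scale T on the
    objective gives the heat-bath acceptance probability."""
    return 1.0 / (1.0 + math.exp(delta / T))

def probit_accept(delta, sigma):
    """AIR in the unified view: Gaussian resampling noise of std sigma
    gives a probit acceptance curve."""
    return 0.5 * math.erfc(delta / (sigma * math.sqrt(2.0)))

# matching standard deviations (sigma = T * pi / sqrt(3)) makes the
# probit curve approximate the logistic one
T = 1.0
sigma = T * math.pi / math.sqrt(3.0)
print(logistic_accept(1.0, T), probit_accept(1.0, sigma))
```

Both curves equal 1/2 at delta = 0 and stay close elsewhere, which is the sense in which AIR approximates SA.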

Paper Nr: 76
Title:

Detecting a Fetus in Ultrasound Images using Grad CAM and Locating the Fetus in the Uterus

Authors:

Genta Ishikawa, Rong Xu, Jun Ohya and Hiroyasu Iwata

Abstract: In this paper, we propose an automatic method for estimating fetal position based on classification and detection of different fetal parts in ultrasound images. Fine-tuning is performed on the ultrasound images to be used for fetal examination using a CNN, and classification into the four classes "head", "body", "leg" and "other" is realized. Based on the obtained learning result, binarization that thresholds the gradient of the features obtained by Grad CAM is performed in the image so that a bounding box of the region of interest with large gradient is extracted. The center of the bounding box is obtained from each frame so that the trajectory of the centroids is obtained; the position of the fetus is obtained as the trajectory. Experiments using 2000 images were conducted using a fetal phantom. The recall ratios of the four classes are 99.6% for head, 99.4% for body, 99.8% for legs, and 72.6% for others, respectively. The trajectories obtained from the fetus present in “left”, “center”, “right” in the images show the above-mentioned geometrical relationship. These results indicate that the estimated fetal position coincides with the actual position very well, which can be used as the first step for automatic fetal examination by robotic systems.
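
The binarization-and-bounding-box step can be sketched as follows (a minimal illustration with a synthetic heat map; the relative threshold of 0.5 is our assumption, not the paper's value):

```python
import numpy as np

def heatmap_to_bbox(cam, rel_threshold=0.5):
    """Binarize a Grad-CAM map at a fraction of its maximum and return
    the bounding box (x0, y0, x1, y1) of the above-threshold region."""
    mask = cam >= rel_threshold * cam.max()
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

cam = np.zeros((8, 8))
cam[2:5, 3:7] = 1.0  # hot region of the synthetic heat map
print(heatmap_to_bbox(cam))  # (3, 2, 6, 4)
```

Collecting the centers of such boxes over successive frames yields the centroid trajectory from which the fetal position is estimated.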

Paper Nr: 78
Title:

Detecting and Tracking Surgical Tools for Recognizing Phases of the Awake Brain Tumor Removal Surgery

Authors:

Hiroki Fujie, Keiju Hirata, Takahiro Horigome, Hiroshi Nagahashi, Jun Ohya, Manabu Tamura, Ken Masamune and Yoshihiro Muragaki

Abstract: In order to realize automatic recognition of surgical processes in brain tumor removal surgery using a microscopic camera, we propose a method of detecting and tracking surgical tools by video analysis. The proposed method consists of a detection part and a tracking part. In the detection part, object detection is performed for each frame of the surgery video, and the category and bounding box are acquired frame by frame. The convolution layer strengthens robustness using data augmentation (central cropping and random erasing). The tracking part uses SORT, which predicts and updates the acquired bounding boxes using a Kalman Filter; next, an object ID is assigned to each corrected bounding box using the Hungarian algorithm. Our proposed method achieves high accuracy: in experiments on spatial detection, the mean average precision is 90.58%, and the mean accuracy of frame label detection is 96.58%. These results are very promising for surgical phase recognition.

Paper Nr: 82
Title:

Accelerated Algorithm for Computation of All Prime Patterns in Logical Analysis of Data

Authors:

Arthur Chambon, Frédéric Lardeux, Frédéric Saubion and Tristan Boureau

Abstract: The analysis of groups of binary data can be achieved by logic-based approaches. These approaches identify subsets of relevant Boolean variables to characterize observations and may help the user to better understand their properties. In logical analysis of data, given two groups of data, patterns of Boolean values are used to discriminate observations in these groups. In this work, our purpose is to highlight that different techniques may be used to compute these patterns. We present a new approach to compute prime patterns that do not provide redundant information. Experiments are conducted on real biological data.

Paper Nr: 106
Title:

Adversarial Alignment of Class Prediction Uncertainties for Domain Adaptation

Authors:

Jeroen Manders, Twan van Laarhoven and Elena Marchiori

Abstract: We consider unsupervised domain adaptation: given labelled examples from a source domain and unlabelled examples from a related target domain, the goal is to infer the labels of target examples. Under the assumption that features from pre-trained deep neural networks are transferable across related domains, domain adaptation reduces to aligning source and target domain at class prediction uncertainty level. We tackle this problem by introducing a method based on adversarial learning which forces the label uncertainty predictions on the target domain to be indistinguishable from those on the source domain. Pre-trained deep neural networks are used to generate deep features having high transferability across related domains. We perform an extensive experimental analysis of the proposed method over a wide set of publicly available pre-trained deep neural networks. Results of our experiments on domain adaptation tasks for image classification show that class prediction uncertainty alignment with features extracted from pre-trained deep neural networks provides an efficient, robust and effective method for domain adaptation.

Paper Nr: 140
Title:

Gaussian Model Trees for Traffic Imputation

Authors:

Sebastian Buschjäger, Thomas Liebig and Katharina Morik

Abstract: Traffic congestion is one of the most pressing issues for smart cities. Information on traffic flow can be used to reduce congestion by predicting vehicle counts at unmonitored locations so that counter-measures can be applied before congestion appears. To do so, pricey sensors must be distributed sparsely in the city and at important roads in the city center to collect road and vehicle information throughout the city in real-time. Then, Machine Learning models can be applied to predict vehicle counts at unmonitored locations. To be fault-tolerant and increase coverage of the traffic predictions to the suburbs, rural regions, or even neighboring villages, these Machine Learning models should not operate at a central traffic control room but rather be distributed across the city. Gaussian Processes (GP) work well in the context of traffic count prediction, but cannot capitalize on the vast amount of data available in an entire city. Furthermore, Gaussian Processes are a global and centralized model, which requires all measurements to be available at a central computation node. Product of Expert (PoE) models have been proposed as a scalable alternative to Gaussian Processes. A PoE model trains multiple, independent GPs on different subsets of the data and weights individual predictions based on each expert's uncertainty. These methods work well, but they assume that experts are independent even though they may share data points. Furthermore, PoE models require exhaustive communication bandwidth between the individual experts to form the final prediction. In this paper we propose a hierarchical Product of Expert model, which consists of multiple layers of small, independent and local GP experts. We view Gaussian Process induction as a regularized optimization procedure and utilize this view to derive an efficient algorithm which selects independent regions of the data. Then, we train local expert models on these regions, so that each expert is responsible for a given region. The resulting algorithm scales well for large amounts of data and outperforms flat PoE models in terms of communication cost, model size and predictive performance. Last, we discuss how to deploy these local expert models onto small devices.
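
The standard Product-of-Experts fusion of independent Gaussian predictions, which the hierarchical model builds on, is a short computation: precisions add, and the combined mean is precision-weighted. This is a generic sketch of plain PoE fusion, not the paper's hierarchical weighting scheme.

```python
import numpy as np

def poe_combine(means, variances):
    """Product-of-Experts fusion of independent Gaussian predictions:
    the product of Gaussians is Gaussian, with summed precisions and
    a precision-weighted mean."""
    means, variances = np.asarray(means), np.asarray(variances)
    prec = 1.0 / variances
    var = 1.0 / prec.sum()
    mean = var * (prec * means).sum()
    return mean, var

# two experts: a confident one (variance 1) and an uncertain one (variance 4)
print(poe_combine([10.0, 20.0], [1.0, 4.0]))  # mean pulled toward 10
```

Because the fused mean is dominated by low-uncertainty experts, each expert only needs to communicate a mean and a variance, which is the quantity whose communication cost the hierarchical model reduces.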

Short Papers
Paper Nr: 6
Title:

mDBSCAN: Real Time Superpixel Segmentation by DBSCAN Clustering based on Boundary Term

Authors:

Hasan Almassri, Tim Dackermann and Norbert Haala

Abstract: mDBSCAN is an improved version of DBSCAN (Density Based Spatial Clustering of Applications with Noise) superpixel segmentation. Unlike the DBSCAN algorithm, the proposed algorithm has an automatic threshold based on the colour and gradient information. The proposed algorithm performs in different colour spaces such as RGB, Lab, and grey-scale images using a novel distance measurement. The experimental results demonstrate that the proposed algorithm outperforms state-of-the-art algorithms in terms of boundary adherence and segmentation accuracy with low computational cost (30 frames/s).

Paper Nr: 10
Title:

Object Detection and Classification on Heterogeneous Datasets

Authors:

Tobias Brosch and Ahmed Elshaarany

Abstract: To train an object detection network, labeled data is required. More precisely, all objects to be detected must be labeled in the dataset. Here, we investigate how to train an object detection network from multiple heterogeneous datasets to avoid the cost- and time-intensive task of labeling. In each dataset, only a subset of all objects must be labeled. Still, the network shall be able to learn to detect all of the desired objects from the combined datasets. In particular, if the network selects an unlabeled object during training, it should not consider it a negative sample and adapt its weights accordingly. Instead, it should ignore such detections in order to avoid a negative impact on the learning process. We propose a solution for two-stage object detectors like Faster R-CNN (which can probably also be applied to single-stage detectors). If the network detects a class of an unlabeled category in the current training sample, it will omit it from the loss calculation not only in the detection but also in the proposal stage. The results are demonstrated with a modified version of the Faster R-CNN network with Inception-ResNet-v2. We show that the model’s average precision significantly exceeds the default object detection performance.

Paper Nr: 24
Title:

Multimodal Sentiment Analysis: A Multitask Learning Approach

Authors:

Mathieu P. Fortin and Brahim Chaib-Draa

Abstract: Multimodal sentiment analysis has recently received increasing interest. However, most methods have considered that text and image modalities are always available at test time. This assumption is often violated in real environments (e.g. social media) since users do not always publish a text with an image. In this paper we propose a method based on a multitask framework to combine multimodal information when it is available, while being able to handle the cases where a modality is missing. Our model contains one classifier for analyzing the text, another for analyzing the image, and another performing the prediction by fusing both modalities. In addition to offering a solution to the problem of a missing modality, our experiments show that this multitask framework improves generalization by acting as a regularization mechanism. We also demonstrate that the model can handle a missing modality at training time, and can thus be trained with image-only and text-only examples.

Paper Nr: 34
Title:

Simple Domain Adaptation for CAD based Object Recognition

Authors:

Kripasindhu Sarkar and Didier Stricker

Abstract: We present a simple method of domain adaptation between synthetic images and real images, based on high-quality rendering of the 3D models and correlation alignment. Using this method, we solve the problem of 3D object recognition in 2D images by fine-tuning existing pretrained CNN models for the object categories using the rendered images. Experimentally, we show that our rendering pipeline, along with the correlation alignment, improves by a large margin the recognition accuracy of existing CNN-based recognition trained on images rendered by a canonical renderer. Using the same idea, we present a general image classifier of common objects which is trained only on the 3D models from the publicly available databases, and show that a small number of training models are sufficient to capture different variations within and across the classes.
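
Correlation alignment (CORAL) itself is a closed-form second-order transformation: whiten the source features, then re-colour them with the target covariance so that the second-order statistics match. A minimal sketch under our own assumptions (synthetic data, eigendecomposition-based matrix square roots):

```python
import numpy as np

def coral(source, target, eps=1e-5):
    """CORrelation ALignment: whiten source features, then re-colour
    them with the target covariance so second-order statistics match."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])

    def sqrtm(m, inv=False):
        # matrix square root of a symmetric PSD matrix via eigh
        w, v = np.linalg.eigh(m)
        w = np.clip(w, eps, None)
        d = w ** (-0.5 if inv else 0.5)
        return (v * d) @ v.T

    return source @ sqrtm(cs, inv=True) @ sqrtm(ct)

rng = np.random.default_rng(1)
src = rng.standard_normal((500, 3)) * [1.0, 5.0, 0.2]  # skewed scales
tgt = rng.standard_normal((500, 3))
aligned = coral(src, tgt)
```

After alignment, the covariance of the transformed source features matches that of the target features, which is what lets a classifier trained on rendered (source) images transfer better to real (target) images.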

Paper Nr: 56
Title:

Faster RBF Network Learning Utilizing Singular Regions

Authors:

Seiya Satoh and Ryohei Nakano

Abstract: There are two ways to learn radial basis function (RBF) networks: one-stage and two-stage learnings. Recently a very powerful one-stage learning method called RBF-SSF has been proposed, which can stably find a series of excellent solutions, making good use of singular regions, and can monotonically decrease training error along with the increase of hidden units. RBF-SSF was built by applying the SSF (singularity stairs following) paradigm to RBF networks; the SSF paradigm was originally and successfully proposed for multilayer perceptrons. Although RBF-SSF has the strong capability to find excellent solutions, it required a lot of time mainly because it computes the Hessian. This paper proposes a faster version of RBF-SSF called RBF-SSF(pH) by introducing partial calculation of the Hessian. The experiments using two datasets showed RBF-SSF(pH) ran as fast as usual one-stage learning methods while keeping the excellent solution quality.

Paper Nr: 58
Title:

Mixture of Multilayer Perceptron Regressions

Authors:

Ryohei Nakano and Seiya Satoh

Abstract: This paper investigates mixture of multilayer perceptron (MLP) regressions. Although mixture of MLP regressions (MoMR) can be a strong fitting model for noisy data, the research on it has been rare. We employ a soft mixture approach and use the Expectation-Maximization (EM) algorithm as a basic learning method. Our learning method goes in a double-looped manner; the outer loop is controlled by the EM and the inner loop by an MLP learning method. Given data, we will have many models; thus, we need a criterion to select the best. Bayesian Information Criterion (BIC) is used here because it works nicely for MLP model selection. Our experiments showed that the proposed MoMR method found the expected MoMR model as the best for artificial data and selected the MoMR model having smaller error than any linear models for real noisy data.
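
The BIC criterion used for selecting among candidate MoMR models is a one-line formula; a sketch with hypothetical numbers (the likelihood values below are illustrative, not from the paper):

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: lower is better. The penalty
    term grows with model size, guarding against overfitting."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# a bigger model must buy its extra parameters with extra likelihood:
small = bic(log_likelihood=-120.0, n_params=5, n_samples=200)
large = bic(log_likelihood=-118.0, n_params=40, n_samples=200)
print(small < large)  # True: here the small model is selected
```

In the double-looped scheme, each EM run yields a fitted MoMR model with its log-likelihood; BIC is then evaluated across models of different sizes and the minimizer is kept.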

Paper Nr: 60
Title:

Improving the Dictionary Construction in Sparse Representation using PCANet for Face Recognition

Authors:

Peiyu Kang, Yonggang Lu, Diqi Pan and Wenjie Guo

Abstract: Recently, sparse representation has attracted increasing interest in computer vision. Sparse representation based methods, such as sparse representation classification (SRC), have produced promising results in face recognition, while the dictionary used for sparse representation plays a key role in it. How to improve the dictionary construction in sparse representation is still an open question. Principal component analysis network (PCANet), as a newly proposed deep learning method, has the advantage of simple network architecture and competitive performance for feature learning. In this paper, we have studied how to use the PCANet to improve the dictionary construction in sparse representation, and proposed a new method for face recognition. The PCANet is used to learn new features from face images, the learned features are used as dictionary atoms to code the query face images, and then the reconstruction errors after sparse coding are used to classify the face images. It is shown that the proposed method can achieve better performance than five other state-of-the-art methods for face recognition.

Paper Nr: 61
Title:

Document Image Dewarping using Deep Learning

Authors:

Vijaya B. Ramanna, Saqib Bukhari and Andreas Dengel

Abstract: Distorted images have been a major problem for Optical Character Recognition (OCR). In order to perform OCR on distorted images, dewarping has become a principal preprocessing step. This paper presents a new document dewarping method that removes curl and geometric distortion of modern and historical documents. Finally, the proposed method is evaluated and compared to an existing computer-vision-based method. Most of the traditional dewarping algorithms are based on text line feature extraction and segmentation. However, textual content extraction and segmentation can be sophisticated. Hence, a new technique is proposed that does not need any complicated methods to process the text lines. The proposed method is based on Deep Learning and can be applied to all types of text documents, as well as to documents with images and graphics. Moreover, no preprocessing is required to apply this method to warped images. In the proposed system, the document distortion problem is treated as an image-to-image translation. The new method is implemented using the powerful pix2pixHD network, utilizing Conditional Generative Adversarial Networks (CGAN). The network is trained on the UW3 dataset by supplying a distorted document as input and the cleaned image as the target. The images generated by the proposed method are cleanly dewarped and of high resolution. Furthermore, these images can be used to perform OCR.

Paper Nr: 69
Title:

Fast Nearest Neighbor Search with Narrow 16-bit Sketch

Authors:

Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata and Takeshi Shinohara

Abstract: We discuss nearest neighbor search using sketches, which are a kind of locality-sensitive hash (LSH). Nearest neighbor search using sketches is done in two stages. In the first stage, the top K candidates, which have sketches close to the query's, are selected, where K ≥ 1. In the second stage, the nearest object to the query among the K candidates is selected by performing actual distance calculations. Conventionally, highly accurate search requires sketches wider than 32 bits. In this paper, we propose search methods using narrow 16-bit sketches, which enable efficient data management by buckets and a faster first stage. To keep accuracy, search using 16-bit sketches requires a larger K than using 32-bit sketches. By sorting the data objects according to their sketch values, the cost of the increased number of candidates K can be reduced through improved memory locality in the second-stage search. The proposed method achieves about 10 times faster search speed while maintaining accuracy.
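
The two-stage search can be sketched generically (our own toy data and names; the paper's bucket management and sorting optimizations are not shown): stage one filters by Hamming distance on the 16-bit sketches, and stage two computes actual distances only for the K survivors.

```python
def hamming16(a, b):
    """Hamming distance between two 16-bit sketches."""
    return bin((a ^ b) & 0xFFFF).count("1")

def sketch_search(query_sketch, query_obj, db, dist, K=2):
    """Stage 1: the K objects whose sketches are closest to the query's.
    Stage 2: exact distance computations on those K candidates only."""
    candidates = sorted(db, key=lambda o: hamming16(o[0], query_sketch))[:K]
    return min(candidates, key=lambda o: dist(o[1], query_obj))

# toy database of (sketch, object) pairs, with scalar "objects"
db = [(0b1111000011110000, 5.0),
      (0b1111000011110001, 5.2),
      (0b0000111100001111, 9.0)]
best = sketch_search(0b1111000011110011, 5.15, db,
                     dist=lambda x, y: abs(x - y))
print(best[1])  # 5.2
```

With 16-bit sketches there are only 65,536 possible values, which is what makes bucket-based data management and a very cheap first stage feasible; the price is a larger K, whose cost the proposed sorting mitigates.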

Paper Nr: 72
Title:

Pedestrian Similarity Extraction to Improve People Counting Accuracy

Authors:

Xu Yang, Jose Gaspar, Wei Ke, Chan T. Lam, Yanwei Zheng, Weng H. Lou and Yapeng Wang

Abstract: Current state-of-the-art single shot object detection pipelines, composed by an object detector such as Yolo, generate multiple detections for each object, requiring a post-processing Non-Maxima Suppression (NMS) algorithm to remove redundant detections. However, this pipeline struggles to achieve high accuracy, particularly in object counting applications, due to a trade-off between precision and recall rates. A higher NMS threshold results in fewer detections suppressed and, consequently, in a higher recall rate, as well as lower precision and accuracy. In this paper, we have explored a new pedestrian detection pipeline which is more flexible, able to adapt to different scenarios and with improved precision and accuracy. A higher NMS threshold is used to retain all true detections and achieve a high recall rate for different scenarios, and a Pedestrian Similarity Extraction (PSE) algorithm is used to remove redundant detections, consequently improving counting accuracy. The PSE algorithm significantly reduces the detection accuracy volatility and its dependency on NMS thresholds, improving the mean detection accuracy for different input datasets.
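
The standard greedy NMS step whose threshold drives the precision/recall trade-off discussed above can be sketched as:

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, threshold):
    """Greedy NMS: keep the highest-scoring box, suppress boxes that
    overlap it by more than `threshold`, repeat. A higher threshold
    suppresses fewer boxes (higher recall, lower precision)."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7], threshold=0.5))  # [0, 2]
```

The proposed pipeline deliberately runs this step with a high threshold (keeping nearly everything) and delegates the removal of redundant detections to the PSE algorithm instead.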

Paper Nr: 77
Title:

Comparison between Supervised and Unsupervised Feature Selection Methods

Authors:

Lilli Haar, Katharina Anding, Konstantin Trambitckii and Gunther Notni

Abstract: The reduction of the feature set by selecting relevant features for the classification process is an important step within the image processing chain, but sometimes too little attention is paid to it. Such a reduction has many advantages. It can remove irrelevant and redundant data, improve recognition performance, reduce storage capacity requirements, computational time of calculations and also the complexity of the model. Within this paper, supervised and unsupervised feature selection methods are compared with respect to the achievable recognition accuracy. Supervised methods include information about the given classes in the selection, whereas unsupervised ones can be used for tasks without known class labels. Feature clustering is an unsupervised method. For this type of feature reduction, mainly hierarchical methods, but also k-means, are used. Instead of these two clustering methods, the Expectation Maximization (EM) algorithm was used in this paper. The aim is to investigate whether this type of clustering algorithm can provide a proper feature vector using feature clustering. There is no feature reduction technique that provides equally best results for all datasets and classifiers. However, for all datasets, it was possible to reduce the feature set to a specific number of useful features without losses and often even with improvements in recognition performance.

Paper Nr: 79
Title:

Vertical and Horizontal Distances to Approximate Edit Distance for Rooted Labeled Caterpillars

Authors:

Kohei Muraka, Takuya Yoshino and Kouichi Hirata

Abstract: A rooted labeled caterpillar (caterpillar, for short) is a rooted labeled tree transformed to a rooted path (called a backbone) after removing all the leaves in it, and the edit distance between caterpillars can be computed in quartic time. In this paper, we introduce two vertical distances and two horizontal distances for caterpillars. The former are based on a string edit distance between the string representations of the backbones, and the latter on a multiset edit distance between the multisets of labels occurring in all the leaves. Then, we show that these distances give both a lower bound and an upper bound of the edit distance, and that we can compute the vertical distances in quadratic time and the horizontal distances in linear time under the unit cost function.
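
The two building blocks are classical and can be sketched directly: the unit-cost string edit distance (underlying the vertical distances) and a unit-cost multiset edit distance on labels (underlying the horizontal ones). The implementations below are generic sketches, not the authors' code.

```python
from collections import Counter

def edit_distance(s, t):
    """Classic string edit distance (unit costs) by dynamic programming:
    O(|s|*|t|) time, i.e., quadratic for backbone strings."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    return d[m][n]

def multiset_distance(a, b):
    """Unit-cost multiset edit distance on leaf labels: surplus elements
    on either side are substituted pairwise, the rest inserted/deleted,
    giving max(|A \\ B|, |B \\ A|) in linear time."""
    ca, cb = Counter(a), Counter(b)
    return max(sum((ca - cb).values()), sum((cb - ca).values()))

print(edit_distance("abcab", "abacb"))  # 2
print(multiset_distance("aabc", "abd"))  # 2
```

Applied to the backbone strings and the leaf-label multisets of two caterpillars, these cheap distances bracket the quartic-time tree edit distance from below and above.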

Paper Nr: 85
Title:

Accurate Prediction of Advertisement Clicks based on Impression and Click-Through Rate using Extreme Gradient Boosting

Authors:

Tülin Çakmak, Ahmet T. Tekin, Çağla Şenel, Tuğba Çoban, Zeynep E. Uran and C. O. Sakar

Abstract: Online travel agencies (OTAs) aim to use digital media advertisements in the most efficient way to increase their market share. One of the most commonly used digital media environments by OTAs is the metasearch bidding engine. In metasearch bidding engines, many OTAs offer daily bids per click for each hotel to get reservations. Therefore, managing bidding strategies is crucial for OTAs to minimize cost and maximize revenue. In this paper, we aim to predict both the impression count and Click-Through-Rate (CTR) metrics of hotel advertisements for an OTA and then use these values to obtain the number of clicks the OTA will receive for each hotel. The initial version of the dataset was obtained from the dashboard of an OTA and contains features describing each hotel’s last-day performance in the search engine. We enriched the initial dataset by creating features with a window-sliding approach and by integrating domain-specific features considered to be important in hotel click prediction. The final set of features is used to predict the next day’s CTR and impression count values. We used state-of-the-art prediction algorithms including decision-tree-based ensemble methods, boosting algorithms and support vector regression. An important contribution of this study is the use of the Extreme Gradient Boosting (XGBoost) algorithm, which has outperformed state-of-the-art algorithms on various tasks, for hotel click prediction. The results showed that XGBoost gives the highest R-squared values in the prediction of all metrics used in our study. We also applied a mutual-information-based filter feature ranking method, minimum redundancy-maximum relevance (mRMR), to evaluate the importance of the features used for prediction. The bid value offered by the OTA at time t − 1 is found to be the most informative feature for both impression count and CTR prediction. We also observed that a subset of features selected by mRMR achieves performance comparable to using all of the features in the machine learning model.
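The window-sliding enrichment described above can be sketched as follows; the window length, metric name, and values are illustrative assumptions, not taken from the paper.

```python
# Sketch of a window-sliding feature builder: for each day t, the
# `window` preceding days' values of a base metric become lag features
# and the value at day t becomes the prediction target.
# (Metric name and window size are illustrative assumptions.)

def sliding_window_features(series, window=3):
    """Return (features, targets) built by sliding a window over `series`."""
    features, targets = [], []
    for t in range(window, len(series)):
        features.append(series[t - window:t])  # last `window` days
        targets.append(series[t])              # next day's value
    return features, targets

daily_ctr = [0.10, 0.12, 0.11, 0.15, 0.14, 0.13]
X, y = sliding_window_features(daily_ctr, window=3)
# X[0] == [0.10, 0.12, 0.11] predicts y[0] == 0.15
```

Rows produced this way can then be fed to any regressor, e.g. XGBoost, to predict the next day's CTR or impression count.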

Paper Nr: 86
Title:

Normalization of the Histogram of Forces

Authors:

M. Jazouli, J. Wadsworth and P. Matsakis

Abstract: The histogram of forces is a quantitative representation of the relative position of two image objects. It is an image descriptor, like, e.g., shape descriptors. It is not invariant under similitudes, but can be made so. These are two desirable properties that have been exploited in many applications. Making the histogram of forces invariant under similitudes is achieved through a procedure called normalization. In this paper, we formalize the concept of normalization, review the existing normalization procedures, introduce new ones, and compare all these procedures through experiments involving over 170,000 histogram computations or normalizations.

Paper Nr: 98
Title:

Image-based Discrimination and Spatial Non-uniformity Analysis of Effect Coatings

Authors:

Jiří Filip, Radomír Vávra, Frank J. Maile and Bill Eibon

Abstract: Various industries are striving for novel, more reliable yet still efficient approaches to coatings characterization. The majority of industrial applications use portable instruments for the characterization of effect coatings. These typically capture a limited set of in-plane geometries and have limited ability to reliably characterize the gonio-apparent behavior typical of such coatings. The instruments rely mostly on color and reflectance characteristics without using texture information across the coating plane. In this paper, we propose an image-based method that counts the number of effect pigments and measures their active area. First, we captured the appearance of eight effect coatings featuring four different pigment materials in in-plane and out-of-plane geometries, using a gonioreflectometer with a fixed viewing angle and varying illumination angles. Our analysis has shown that the proposed method is able to clearly distinguish pigment materials and coating applications in both in-plane and out-of-plane geometries. Finally, we show an application of our method to the analysis of spatial non-uniformity, i.e. cloudiness or mottling, across a coated panel.

Paper Nr: 100
Title:

Bipartite Edge Correlation Clustering: Finding an Edge Biclique Partition from a Bipartite Graph with Minimum Disagreement

Authors:

Mikio Mizukami, Kouich Hirata and Tetsuji Kuboyama

Abstract: In this paper, we first formulate the problem of bipartite edge correlation clustering, which finds an edge biclique partition with minimum disagreement from a bipartite graph, by extending bipartite correlation clustering, which finds a biclique partition. Then, we design a simple randomized algorithm for bipartite edge correlation clustering, based on the randomized algorithm for bipartite correlation clustering. Finally, we give experimental results evaluating the algorithms on both artificial and real data.

Paper Nr: 102
Title:

Approximation of the Distance from a Point to an Algebraic Manifold

Authors:

Alexei Y. Uteshev and Marina V. Goncharova

Abstract: The problem of evaluating the geometric distance d from a point X_0 to an algebraic curve in R^2 or a manifold G(X) = 0 in R^3 is treated by comparing the exact value with its two successive approximations d^(1) and d^(2). The geometric distance is evaluated from the univariate distance equation, whose zero set coincides with that of the critical values of the function d^2(X_0), while d^(1)(X_0) and d^(2)(X_0) are obtained via expansion of d^2(X_0) into a power series in the algebraic distance G(X_0). We estimate the quality of approximation by comparing the relative positions of the level sets of d(X), d^(1)(X) and d^(2)(X).
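For orientation, first-order estimates of this kind commonly take the following form (this is the standard first-order expansion, assuming G is smooth with a nonvanishing gradient at X_0; the paper's exact expressions for d^(1) and d^(2) may differ):

```latex
d^{(1)}(X_0) \;=\; \frac{\lvert G(X_0)\rvert}{\lVert \nabla G(X_0)\rVert},
```

i.e., the algebraic distance G(X_0) rescaled by the gradient norm, with the second-order correction d^(2) additionally involving second derivatives of G.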

Paper Nr: 103
Title:

Dynamic Texture Modeling based on Dynamic Convolution Autoencoder

Authors:

Jing Liu, Wei Li, Zhicheng Liu and Ziqi Zhu

Abstract: Dynamic textures are widely found in various video sequences, so dynamic texture modeling can help people better understand the content of a video. Based on the feature extraction and nonlinear expression capabilities of convolutional neural networks, a new convolutional autoencoder model is proposed. In this model, we designed a convolutional encoder for dynamic feature extraction. On top of the model, we designed a dynamic feature prediction network based on long short-term memory (LSTM). The dynamic convolutional autoencoder is constructed by appropriate stacking strategies and trained by layer-wise pre-training and joint fine-tuning. Our proposed model can be used for dynamic texture synthesis: we first predict the dynamic features of the dynamic texture video sequence, and the predicted video frames are then reconstructed by the convolutional decoder. We ran experiments on the DynTex database and compared our model with a stacked autoencoder. Our model can generate longer videos with better results. The experiments verify the validity of the proposed modeling method.

Paper Nr: 107
Title:

3D Face Reconstruction from RGB-D Data by Morphable Model to Point Cloud Dense Fitting

Authors:

Claudio Ferrari, Stefano Berretti, Pietro Pala and Alberto D. Bimbo

Abstract: 3D cameras for face capturing are quite common today thanks to their ease of use and affordable cost. The depth information they provide is mainly used to enhance face pose estimation and tracking, and face-background segmentation, while applications that require finer face details are usually not possible due to the low-resolution data acquired by such devices. In this paper, we propose a framework that allows us to derive high-quality 3D models of the face starting from low-resolution depth sequences acquired with a depth camera. To this end, we start by defining a solution that exploits temporal redundancy in a short sequence of adjacent depth frames to remove most of the acquisition noise and produce an aggregated point cloud with intermediate-level details. Then, using a 3DMM specifically designed to support local and expression-related deformations of the face, we propose a two-step 3DMM fitting solution: initially, the model is deformed under the effect of landmark correspondences; subsequently, it is iteratively refined using point-closeness updates guided by a mean-square optimization. Preliminary results show that the proposed solution is able to derive 3D models of the face with high visual quality; quantitative results also evidence the superiority of our approach with respect to methods that use one-step fitting based on landmarks.

Paper Nr: 109
Title:

Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose

Authors:

Daniil Osokin

Abstract: In this work we adapt a multi-person pose estimation architecture for use on edge devices. We follow the bottom-up approach from OpenPose (Cao et al., 2017), the winner of the COCO 2016 Keypoints Challenge, because of its decent quality and robustness to the number of people inside the frame. With the proposed network design and optimized post-processing code, the full solution runs at 28 frames per second (fps) on an Intel® NUC 6i7KYB mini PC and 26 fps on a Core i7-6850K CPU. The network model has 4.1M parameters and a complexity of 9 billion floating-point operations (GFLOPs), which is just ∼15% of the baseline 2-stage OpenPose with almost the same quality. The code and model are available as part of the Intel® OpenVINO™ Toolkit.

Paper Nr: 111
Title:

Computer Vision and Deep Learning Tools for the Automatic Processing of Wasan Documents

Authors:

Yago Diez, Toya Suzuki, Marius Vila and Katsushi Waki

Abstract: ”Wasan” is a type of mathematical text unique to Japan, developed during the Edo period (1603-1867). These ancient documents contain a wealth of knowledge and are of great cultural and historical importance. In this paper we present a fully automatic algorithm to locate a landmark element within Wasan documents. Specifically, we use classical computer vision techniques as well as deep learning tools to locate one particular kanji character, the ”ima” kanji. Even though the problem is challenging due to the low image quality of manually scanned ancient documents and the complexity of handwritten kanji detection and recognition, our pipeline, comprising noise reduction, orientation correction, candidate kanji region detection and kanji classification, achieves a 93% success rate. Experiments run on a dataset of 373 images are presented.

Paper Nr: 112
Title:

Cost-constrained Drone Presence Detection through Smart Sound Processing

Authors:

Joaquín García-Gómez, Marta Bautista-Durán, Roberto Gil-Pita, Inma Mohíno-Herranz, Miguel Aguilar-Ortega and César Clares-Crespo

Abstract: Drones can lead to problems of invasion of privacy or access to restricted areas. Because of that, it is important to develop a system capable of detecting the presence of these vehicles in real time in environments where they could be used for malicious purposes. However, the computational cost associated with such a system must be limited if it has to work autonomously. In this manuscript, an algorithm based on Smart Sound Processing techniques has been developed. Feature extraction, cost-constrained feature selection and detection processes, typically implemented in pattern recognition systems, are applied. Results show that it is possible to detect the presence of drones with low-cost feature subsets, where MFCCs and pitch are the most relevant features.

Paper Nr: 115
Title:

Learning Domain Specific Features using Convolutional Autoencoder: A Vein Authentication Case Study using Siamese Triplet Loss Network

Authors:

Manish Agnihotri, Aditya Rathod, Daksh Thapar, Gaurav Jaswal, Kamlesh Tiwari and Aditya Nigam

Abstract: Recently, deep hierarchically learned models (such as CNNs) have achieved superior performance in various computer vision tasks, but limited attention has been paid to biometrics until now. This is mainly because the number of samples available in biometrics is limited and not enough to train a CNN efficiently; deep learning often requires a lot of training data because of the huge number of parameters to be tuned by the learning algorithm. How can one design an end-to-end deep learning network to match biometric features when the number of training samples is limited? To address this problem, we propose a new way to design an end-to-end deep neural network that works in two major steps: first, an autoencoder is trained for learning domain-specific features, followed by a Siamese network trained via a triplet loss function for matching. A publicly available vein image dataset has been utilized as a case study to justify our proposal. We observed that the transformations learned by such a network provide domain-specific and highly discriminative vascular features. Subsequently, the corresponding traits are matched using a multimodal pipelined end-to-end network in which the convolutional layers are pre-trained in an unsupervised fashion as an autoencoder. Thorough experimental studies suggest that the proposed framework consistently outperforms several state-of-the-art vein recognition approaches.
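A triplet loss of the kind used to train such a Siamese matcher can be sketched as follows; the embeddings and margin are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the triplet loss: pull an anchor embedding toward a
# same-class (positive) embedding and push it away from a different-class
# (negative) embedding by at least `margin`.

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Zero loss once the negative is at least `margin` farther than the positive.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Positive close, negative far: no gradient needed, loss is 0.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # 0.0
# Negative almost as close as the positive: positive loss drives training.
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.2, 0.0]))  # 0.8
```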

Paper Nr: 117
Title:

Collaborative Learning of Human and Computer: Supervised Actor-Critic based Collaboration Scheme

Authors:

Ashwin Devanga and Koichiro Yamauchi

Abstract: Recent large-scale neural networks show high performance on complex recognition tasks, but to obtain such ability they need a huge number of learning samples and iterations to optimize their internal parameters. However, in unknown environments, learning samples do not exist. In this paper, we aim to overcome this problem and improve the learning capability of the system by sharing data between multiple systems. To accelerate the optimization speed, the proposed system forms a collaboration between a human and a reinforcement learning neural network, and shares data between systems to develop a super neural network.

Paper Nr: 120
Title:

Collaborative Merging of Radio SLAM Maps in View of Crowd-sourced Data Acquisition and Big Data

Authors:

Kenneth Batstone, Magnus Oskarsson and Kalle Åstrom

Abstract: Indoor localization and navigation is a much researched and difficult problem. The best solutions usually use expensive specialized equipment and/or prior calibration of some form. To the average person with smart or Internet-of-Things devices, these solutions are not feasible, particularly at large scales. With hardware advancements making Ultra-Wideband devices more accurate and low-powered, such devices could become commonplace in factories and homes, enabling an alternative method of navigation. Therefore, indoor anchor calibration becomes a key problem for implementing these devices efficiently and effectively. In this paper, we present a method to fuse radio SLAM (also known as Time-Of-Arrival self-calibration) maps together in a linear way. In doing so, we are able to collaboratively calibrate the anchor positions in 3D to the native precision of the devices. Furthermore, we introduce an automatic scheme to determine which of the maps are best to use to further improve the anchor calibration and its robustness, and which maps could be discarded. Additionally, fusing a map in a linear way is computationally very cheap and produces a reasonable map, which is required to push for crowd-sourced data acquisition.

Paper Nr: 121
Title:

Machine Learning Approach to the Synthesis of Identification Procedures for Modern Photon-Counting Sensors

Authors:

Viacheslav Antsiperov

Abstract: The article presents the results of developing a machine learning approach to the problem of object identification (recognition) in images (data) recorded by photon-counting sensors. Such images are significantly different from traditional ones, taken with conventional sensors in a process of time exposure and spatial averaging of the incident radiation. The result of radiation registration by photon-counting sensors (the image) is rather a continuous stream of data, whose time frames are characterized by a relatively small number of photocounts. The latter leads to a low signal-to-noise ratio, low contrast and fuzzy shapes of the objects. For this reason, the well-known methods designed for traditional image recognition are not effective enough in this case, and new recognition approaches, oriented to low-count images, are required. In this paper we propose such an approach. It is based on the machine learning paradigm and designed for identifying (low-count) objects given by point sets. Consistently using a discrete set of photocount coordinates rather than a reconstructed continuous image, we formalize the problem in question as the problem of best fitting this set of counts, considered as the realization of a certain point process, to the statistical description of one of the previously registered point processes, which we call precedents. It is shown that, by applying the Poisson point process model to formalize the registration process in photon-counting sensors, it is possible to reduce the problem of object identification to the problem of maximizing the tested point-set likelihood with respect to the classes of modelling object distributions, up to shape size and position. It is also demonstrated that these procedures can be brought to an algorithmic realization analogous in structure to the popular EM algorithms. At the end of the paper, for the sake of illustration, we present some results of applying the developed algorithms to the identification of objects in a small artificial database of low-count images.

Paper Nr: 123
Title:

Using Stigmergy as a Computational Memory in the Design of Recurrent Neural Networks

Authors:

Federico A. Galatolo, Mario A. Cimino and Gigliola Vaglini

Abstract: In this paper, a novel architecture of Recurrent Neural Network (RNN) is designed and evaluated experimentally. The proposed RNN adopts a computational memory based on the concept of stigmergy. The basic principle of a Stigmergic Memory (SM) is that the activity of deposit/removal of a quantity in the SM stimulates the next activities of deposit/removal. Accordingly, subsequent SM activities tend to reinforce/weaken each other, generating a coherent coordination between the SM activities and the input temporal stimulus. We show that, in a supervised classification problem, the SM encodes the temporal input in an emergent representational model by coordinating the deposit, removal and classification activities. This study lays down a basic framework for the derivation of an SM-RNN. A formal ontology of SM is discussed, and the SM-RNN architecture is detailed. To appreciate the computational power of an SM-RNN, comparative NNs have been selected and trained to solve the MNIST handwritten digits recognition benchmark in its two variants: spatial (sequences of bitmap rows) and temporal (sequences of pen strokes).

Paper Nr: 127
Title:

Analyzing the Linear and Nonlinear Transformations of AlexNet to Gain Insight into Its Performance

Authors:

Jyoti Nigam, Srishti Barahpuriya and Renu M. Rameshan

Abstract: AlexNet, one of the earliest successful deep learning networks, has given great performance in the image classification task. There are some fundamental properties required for good classification, such as: the network preserves the important information of the input data, and the network is able to separate points from different classes. In this work we experimentally verify that these core properties hold for the AlexNet architecture. We analyze the effect of linear and nonlinear transformations on the input data across the layers, where the convolution filters are modeled as linear transformations. The verified results motivate conclusions on the desirable properties of a transformation matrix that aid better classification.

Paper Nr: 132
Title:

Learning Ensembles in the Presence of Imbalanced Classes

Authors:

Amal Saadallah, Nico Piatkowski, Felix Finkeldey, Petra Wiederkehr and Katharina Morik

Abstract: Class imbalance occurs when data classes are not equally represented. Generally, it occurs when some classes represent rare events, while the other classes represent the counterpart of these events. Rare events, especially those that may have a negative impact, often require informed decision-making in a timely manner. However, class imbalance is known to induce a learning bias towards majority classes, which implies poor detection of minority classes. Thus, we propose a new ensemble method that handles class imbalance explicitly at training time. In contrast to existing ensemble methods for class imbalance, which use either data-driven or randomized approaches for their construction, our method exploits both directions. On the one hand, ensemble members are built from randomized subsets of training data. On the other hand, we construct different scenarios of class imbalance for the unknown test data. An ensemble is built for each resulting scenario by combining random sampling with the estimation of the relative importance of specific loss functions. Final predictions are generated by a weighted average of each ensemble's predictions. As opposed to existing methods, our approach does not try to fix imbalanced data sets. Instead, we show how imbalanced data sets can make classification easier, due to a limited range of true class frequencies. Our procedure promotes diversity among the ensemble members and is not sensitive to specific parameter settings. An experimental demonstration shows that our new method outperforms or is on par with state-of-the-art ensembles and class imbalance techniques.

Paper Nr: 1
Title:

A Fast and Efficient Method for Solving the Multiple Closed Curve Detection Problem

Authors:

Una Radojičić, Rudolf Scitovski and Kristian Sabo

Abstract: The paper deals with the multiple closed curve detection problem; in particular, the multiple circle and multiple ellipse detection problems are considered. Based on data coming from a number of closed curves not known in advance, it is necessary to recognize these curves. The approach we propose is based on center-based clustering, which means the problem is reduced to searching for an optimal partition whose cluster-centers are the closed curves we search for. These curves are found using a modified k-means algorithm with a carefully chosen initial approximation. For detecting the most appropriate number of clusters (closed curves), we also propose a new, specialized index that performed significantly better than other well-known indexes.
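In center-based clustering with circles as "centers", the assignment step compares each data point against every candidate circle; a minimal sketch (the circles, points, and point-to-circle distance choice |‖x − c‖ − r| are illustrative assumptions, not the paper's exact formulation):

```python
# Sketch of the assignment step when each cluster "center" is a circle
# (cx, cy, r): a point is assigned to the circle it lies closest to,
# using the point-to-circle distance | ||x - c|| - r |.

def dist_to_circle(point, circle):
    (px, py), (cx, cy, r) = point, circle
    return abs(((px - cx) ** 2 + (py - cy) ** 2) ** 0.5 - r)

def assign(points, circles):
    # Index of the nearest circle for every point (the k-means "E-step").
    return [min(range(len(circles)), key=lambda k: dist_to_circle(p, circles[k]))
            for p in points]

circles = [(0.0, 0.0, 1.0), (5.0, 0.0, 2.0)]   # unit circle and a larger one
points = [(1.1, 0.0), (3.2, 0.0)]
print(assign(points, circles))  # [0, 1]
```

The update step would then refit each circle's center and radius to its assigned points, iterating as in ordinary k-means.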

Paper Nr: 12
Title:

Deep Learning for Pulse Repetition Interval Classification

Authors:

Ha P. K. Nguyen, Ha Q. Nguyen and Dat T. Ngo

Abstract: Pulse Repetition Intervals (PRIs)—the distances between consecutive times of arrival of radar pulses—are an important characteristic of the radar emitting source. The recognition of various PRI modulation types is therefore a key task of an Electronic Support Measure (ESM) system for accurate identification of threat emitters. This problem is challenging due to missing and spurious pulses. In this paper, we introduce a deep-learning-based method for the classification of 7 popular PRI modulation types, in which a convolutional neural network (CNN) is proposed as the classifier. Our method works well with raw input PRI sequences and thus dispenses with preprocessing steps such as noise mitigation, feature extraction, and threshold setting, as required in previous approaches. Extensive simulations demonstrate that the proposed scheme outperforms existing methods by a significant margin over a variety of PRI parameters, especially in severely noisy conditions.

Paper Nr: 14
Title:

Classification of Ground Moving Radar Targets with RBF Neural Networks

Authors:

Eran Notkin, Tomer Cohen and Akiva Novoselsky

Abstract: This paper presents a novel method for the classification of targets detected by Ground Moving Target Indication (GMTI) radar systems. GMTI radar systems provide no direct information regarding the type or size of the detected targets. The suggested method allows classification of ground moving targets into a few size groups by analyzing the Signal to Noise Ratio (SNR) values of GMTI radar measurements. The classification method is based on Radial Basis Function (RBF) neural networks. The data used as features for classification are composed of Radar Cross Section (RCS) values of the target (obtained from the SNR values) at varying aspect angles. The proposed classifier was tested on diverse simulated cases and yielded very good results in classifying targets into three size groups.

Paper Nr: 25
Title:

Feedforward and Feedback Processing of Spatiotemporal Tubes for Efficient Object Localization

Authors:

Khari Jarrett, Joachim Lohn-Jaramillo, Elijah Bowen, Laura Ray and Richard Granger

Abstract: We introduce a new set of mechanisms for tracking entities through videos, at substantially less expense than required by standard methods. The approach combines inexpensive initial processing of individual frames together with integration of information across long time spans (multiple frames), resulting in the recognition and tracking of spatially and temporally contiguous entities, rather than focusing on the individual pixels that comprise those entities.

Paper Nr: 26
Title:

Dimensionality Reduction in Supervised Models-based for Heart Failure Prediction

Authors:

Anna G. Escamilla, Amir H. El Hassani and Emmanuel Andres

Abstract: Cardiovascular diseases are the leading cause of death worldwide. Therefore, computer science, especially machine learning, offers a way to assist practitioners. The literature presents different machine learning models that provide recommendations and alerts in case of anomalies, such as heart failure. This work used dimensionality reduction techniques to improve the prediction of whether a patient has heart failure through the validation of classifiers. The information used for the analysis was extracted from the UCI Machine Learning Repository, with data sets containing 13 features and a binary categorical feature. Of the 13 features, the top six were ranked by a Chi-square feature selector, and then a PCA analysis was performed. The selected features were applied to seven classification models for validation. The best performance was achieved by the ChiSqSelector and PCA models.

Paper Nr: 29
Title:

Overcoming Labeling Ability for Latent Positives: Automatic Label Correction along Data Series

Authors:

Azusa Sawada and Takashi Shibata

Abstract: Although recent progress in machine learning has substantially improved the accuracy of pattern recognition and classification tasks, the performance of these learned models depends on the annotation quality. Therefore, in the real world, the accuracy of these models is limited by the labelling skills of the annotators. To tackle this problem, we propose a novel learning framework that can obtain an accurate model by finding latent positive samples that are often overlooked by non-skilled annotators. The key idea of the proposed method is to focus on data series that are helpful for finding latent positive labels. The proposed method has two main interacting components: 1) a label correction part that seeks positives along data series, and 2) a model training part on the modified labels. The experimental results on simulated data show that the proposed method can achieve the same performance as supervision by oracle labels and outperforms the existing method in terms of area under the curve (AUC).

Paper Nr: 30
Title:

Japanese Scene Character Recognition using Random Image Feature and Ensemble Scheme

Authors:

Fuma Horie and Hideaki Goto

Abstract: Scene character recognition is challenging and difficult owing to various environmental factors during image capture and the complex design of characters. Japanese character recognition requires a large number of scene character images for training, since thousands of character classes exist in the language. In order to enhance Japanese scene character recognition, we utilized a data augmentation method and an ensemble scheme in our previous work. In this paper, the Random Image Feature (RI-Feature) method is newly proposed for improving the ensemble learning. Experimental results show that the accuracy has been improved from 65.57% to 78.50% by adding the RI-Feature method to the ensemble learning. It is also shown that the HOG feature outperforms CNN in Japanese scene character recognition.

Paper Nr: 35
Title:

Deep Neural Networks with Intersection over Union Loss for Binary Image Segmentation

Authors:

Floris van Beers, Arvid Lindström, Emmanuel Okafor and Marco A. Wiering

Abstract: In semantic segmentation tasks, the Jaccard Index, or Intersection over Union (IoU), is often used as a measure of success. While this measure is more representative than per-pixel accuracy, state-of-the-art deep neural networks are still trained on accuracy by using Binary Cross Entropy loss. In this research, an alternative is used where deep neural networks are trained for a segmentation task on human faces by directly optimizing an approximation of IoU. With this approximation, IoU becomes differentiable and can be used as a loss function. The comparison between IoU loss and Binary Cross Entropy loss is made by testing two deep neural network models on multiple datasets and data splits. The results show that training directly on IoU significantly increases performance for both models compared to training on the conventional Binary Cross Entropy loss.
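A common way to make IoU differentiable is the "soft" IoU, where hard intersection/union counts are replaced by sums of products over sigmoid outputs; the following sketch illustrates that idea (the paper's exact approximation may differ):

```python
# Sketch of a soft (differentiable) IoU loss for binary segmentation:
# preds are sigmoid outputs in [0, 1], targets are 0/1 labels, both
# flattened to 1-D. Loss = 1 - soft_IoU.

def soft_iou_loss(preds, targets):
    inter = sum(p * t for p, t in zip(preds, targets))
    union = sum(preds) + sum(targets) - inter
    return 1.0 - inter / union

# Perfect prediction -> loss 0; completely wrong -> loss 1.
print(soft_iou_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 0.0
print(soft_iou_loss([1.0, 1.0, 0.0], [0.0, 0.0, 1.0]))  # 1.0
```

Because every operation is smooth in the predictions, gradients flow through the loss, unlike the thresholded IoU metric itself.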

Paper Nr: 43
Title:

Polyp Classification and Clustering from Endoscopic Images using Competitive and Convolutional Neural Networks

Authors:

Avish Kabra, Yuji Iwahori, Hiroyasu Usami, M. K. Bhuyan, Naotaka Ogasawara and Kunio Kasugai

Abstract: Understanding the type of polyp present in the body plays an important role in medical diagnosis. This paper proposes an approach to classify and cluster polyps present in an endoscopic scene into malignant or benign classes. A CNN and Self-Organizing Maps are used to classify and cluster from white light and Narrow Band Imaging (NBI) endoscopic images. Using a competitive neural network, different polyps available from previous data are plotted together with the new polyp according to their structural similarity. Such a presentation not only helps the doctor understand the case easily but also shows what kind of medical procedures were followed in similar cases.

Paper Nr: 45
Title:

Sparse l2-norm Regularized Regression for Face Recognition

Authors:

Ahmad J. Qudaimat and Hasan Demirel

Abstract: In this paper, a new ℓ2-norm regularized regression based face recognition method is proposed, with an ℓ0-norm constraint to ensure sparse projection. The proposed method aims to create a transformation matrix that transforms the images into sparse vectors whose nonzero coefficient positions depend on the image class. The classification of a new image is a simple process that only depends on calculating the norm of vectors to decide the class of the image. The experimental results on benchmark face databases show that the new method is comparable and sometimes superior to alternative projection-based methods published in the field of face recognition.

Paper Nr: 48
Title:

Understanding Sprinting Motion Skills using Unsupervised Learning for Stepwise Skill Improvements of Running Motion

Authors:

Chanjin Seo, Masato Sabanai, Hiroyuki Ogata and Jun Ohya

Abstract: To improve running performance, each runner’s skill, such as characteristics and habits, needs to be known, and feedback on the performance should be output according to the runner's skill level. In this paper, we propose a new coaching system for detecting the skill of a runner and a method of giving feedback using a sprint motion dataset. Our proposed method extracts a feature for detecting the skill using an autoencoder whose middle layer is an LSTM layer; we analyse the feature using hierarchical clustering, and we analyse the human joints that affect the skill. As a result of the experiments, five clusters are obtained using hierarchical clustering. This paper clarifies how to detect the skill and to output feedback so as to achieve a level of performance one step higher than the current level.

Paper Nr: 50
Title:

Privacy and Fairness in Recommender Systems via Adversarial Training of User Representations

Authors:

Yehezkel S. Resheff, Yanai Elazar, Moni Shahar and Oren S. Shalom

Abstract: Latent factor models for recommender systems represent users and items as low-dimensional vectors. Privacy risks of such systems have previously been studied mostly in the context of recovery of personal information in the form of usage records from the training data. However, the user representations themselves may be used together with external data to recover private user information such as gender and age. In this paper we show that user vectors calculated by a common recommender system can be exploited in this way. We propose the privacy-adversarial framework to eliminate such leakage of private information, and study the trade-off between recommender performance and leakage both theoretically and empirically using a benchmark dataset. An advantage of the proposed method is that it also helps guarantee fairness of results, since all implicit knowledge of a set of attributes is scrubbed from the representations used by the model and thus cannot enter into the decision making. We discuss further applications of this method towards the generation of deeper and more insightful recommendations.

Paper Nr: 51
Title:

A Novel Approach for Anomaly Detection in Power Consumption Data

Authors:

C. Chahla, H. Snoussi, L. Merghem and M. Esseghir

Abstract: Anomalies are patterns in data that do not follow the expected behaviour and are rarely encountered. Anomaly detection has been widely used within diverse research areas such as credit card fraud detection, image processing, and many other application domains. In this paper, we focus on detecting anomalies in power consumption data. The identification of unusual behaviours is important in order to foresee uncommon events and to improve energy efficiency. To this end, we propose a model to precisely identify anomalous days and another one to localize the detected anomalies. Normal days are identified using a simple Auto-Encoder reconstruction technique, whereas the localization of the anomaly throughout the day is performed using a combination of LSTM and K-means algorithms. This hybrid model, which combines prediction and clustering techniques, makes it possible to detect unusual behaviour based on the assumption that similar daily consumption patterns appear repeatedly due to users' living habits. The model is evaluated using real-world power consumption data collected from Pecan Street in the United States.
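The reconstruction-error idea described above, flagging days whose consumption profile the model cannot reconstruct well, can be sketched as follows (a minimal illustration in which a linear autoencoder, i.e. PCA, stands in for the paper's Auto-Encoder; the data, dimensions, and threshold are invented for the example):

```python
import numpy as np

def fit_linear_ae(X, k):
    """Fit a linear 'autoencoder' (PCA) as a stand-in for the paper's AE."""
    mu = X.mean(axis=0)
    # top-k principal directions act as tied encoder/decoder weights
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T                      # d x k
    return mu, W

def reconstruction_error(X, mu, W):
    Z = (X - mu) @ W                  # encode
    Xhat = Z @ W.T + mu               # decode
    return np.linalg.norm(X - Xhat, axis=1)

rng = np.random.default_rng(0)
normal_days = rng.normal(size=(200, 24))          # 24-dim daily load profiles
test_days = np.vstack([rng.normal(size=(5, 24)),
                       rng.normal(loc=6.0, size=(5, 24))])  # 5 anomalous days

mu, W = fit_linear_ae(normal_days, k=4)
err = reconstruction_error(test_days, mu, W)
threshold = np.percentile(reconstruction_error(normal_days, mu, W), 99)
flags = err > threshold               # True for days flagged as anomalous
```

Days well explained by the learned subspace reconstruct with low error; the shifted days do not, so their error exceeds the threshold calibrated on normal data.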

Paper Nr: 64
Title:

Multi-stage Off-line Arabic Handwriting Recognition Approach using Advanced Cascading Technique

Authors:

Taraggy Ghanim, Mahmoud I. Khalil and Hazem M. Abbas

Abstract: Automatic recognition of Arabic handwriting is a pervasive field with many challenging complications to solve, including big databases and complex computing activities. The proposed approach is a multi-stage cascading recognition system based on applying a Random Forest (RF) classifier to construct a forest of decision trees. The constructed decision trees split big databases into multiple smaller data-mined sets based on the most discriminating computed geometric and regional features. Each data-mined set includes similar database classes. RF matches each test image with one of the data-mined sets. Afterwards, the matching classes are sorted relative to the test image using Pyramid Histogram of Gradients and a Kullback-Leibler based ranking algorithm. Finally, the classification process is applied to the highly ranked matching classes to assign a class membership to the test image. Adjusting the classification process to consider only the highly ranked database classes reduced the classification computation and enhanced the overall performance. The proposed approach was tested on the IFN-ENIT Arabic database, achieved satisfactory results, and enhanced the sensitivity of decision trees to reach 93.5% instead of 86.5% (Ghanim et al., 2018).

Paper Nr: 80
Title:

Clustering Honeybees by Its Daily Activity

Authors:

Edgar Acuna, Velcy Palomino, José Agosto, Rémi Mégret, Tugrul Giray, Alberto Prado, Cédric Alaux and Yves L. Conte

Abstract: In this work, we analyze the activity of bees starting at 6 days old. The data was collected at INRA (France) during 2014 and 2016. The activity is counted according to whether the bees enter or leave the hive. After data wrangling, we decided to analyze data corresponding to a period of 10 days. We use a clustering method to determine bees with similar activity and to estimate the time of day when the bees are most active. To achieve our objective, the data was analyzed over three different partitions of the day: first considering the daily activity in two periods, morning and afternoon; then looking at activities in periods of 3 hours from 8:00am to 8:00pm; and finally looking at the activities hourly from 8:00am to 8:00pm. Our study found two clusters of bees, and in one of them the bees' activity clearly increased on day 5. The smaller cluster included the most active bees, representing about 24 percent of the total bees under study. Also, the highest activity of the bees was registered between 2:00pm and 3:00pm. A Chi-square test shows that there is a combined Treatment × Colony effect on the cluster formation.

Paper Nr: 83
Title:

Modeling of Goal-oriented Human Motion Evolution using Hidden Markov Models

Authors:

Eman Ahmed, Reda A. El-Khoribi, Alexandre Muzy, Gilles Bernot and Gamal Darwish

Abstract: Humans have the ability to make many complex movements at the same time with full coordination through the whole body. This requires control of all body muscles. The body muscles are controlled by the Central Nervous System (CNS), which consists of the brain and the spinal cord, through a group of neurons called motor neurons. Each muscle is controlled by lower-level motor neurons. A motor neuron controls a group of muscle fibers of the muscle such that when it is activated, this group contracts and a muscle movement occurs. Currently, many questions remain unanswered: How does this system evolve to generate complex movements? How are the muscles controlled to achieve a certain goal, such as reaching a target position? And how does a human become able to define goals in the first place? It is believed that the development of motion begins prenatally with spontaneous fetal movements. In this paper, we try to answer these questions by proposing a theoretical model of human learning of motion starting from the fetal stage. Simulations are provided using computational intelligence and statistical methods.

Paper Nr: 84
Title:

Traffic Sign Classification using Hybrid HOG-SURF Features and Convolutional Neural Networks

Authors:

Rishabh Madan, Deepank Agrawal, Shreyas Kowshik, Harsh Maheshwari, Siddhant Agarwal and Debashish Chakravarty

Abstract: Traffic signs play an important role in the safety of drivers and the regulation of traffic. Traffic sign classification is thus an important problem to solve for the advent of autonomous vehicles. There have been several works that focus on traffic sign classification using various machine learning techniques. While works involving the use of convolutional neural networks with RGB images have shown remarkable results, they require a large amount of training time, and some of these models occupy a huge chunk of memory. Earlier works like HOG-SVM make use of local feature descriptors for the classification problem, but at the expense of reduced performance. This paper explores the use of hybrid features, combining HOG features and SURF with a CNN classifier, for traffic sign classification. We propose a unique branching-based CNN classifier which achieves an accuracy of 98.48% on the GTSRB test set using just 1.5M trainable parameters.

Paper Nr: 99
Title:

Measuring the Data Efficiency of Deep Learning Methods

Authors:

Hlynur D. Hlynsson, Alberto N. Escalante-B. and Laurenz Wiskott

Abstract: In this paper, we propose a new experimental protocol and use it to benchmark the data efficiency — performance as a function of training set size — of two deep learning algorithms, convolutional neural networks (CNNs) and hierarchical information-preserving graph-based slow feature analysis (HiGSFA), for tasks in classification and transfer learning scenarios. The algorithms are trained on different-sized subsets of the MNIST and Omniglot data sets. HiGSFA outperforms standard CNN networks when the models are trained on 50 and 200 samples per class for MNIST classification. In other cases, the CNNs perform better. The results suggest that there are cases where greedy, locally optimal bottom-up learning is equally or more powerful than global gradient-based learning.
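The experimental protocol, measuring performance as a function of training set size, can be sketched as follows (a toy illustration: synthetic Gaussian data and a nearest-centroid classifier stand in for MNIST/Omniglot and the CNN/HiGSFA models of the paper):

```python
import numpy as np

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Train a nearest-centroid classifier and return test accuracy."""
    classes = np.unique(ytr)
    C = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    pred = classes[np.argmin(((Xte[:, None] - C[None]) ** 2).sum(-1), axis=1)]
    return (pred == yte).mean()

rng = np.random.default_rng(4)

def make_split(n_per_class):
    """Three well-separated Gaussian classes with n samples per class."""
    X = np.vstack([rng.normal(loc=c, size=(n_per_class, 5))
                   for c in (0.0, 2.0, 4.0)])
    y = np.repeat([0, 1, 2], n_per_class)
    return X, y

# data-efficiency curve: accuracy as a function of samples per class
Xte, yte = make_split(200)
curve = {n: nearest_centroid_acc(*make_split(n), Xte, yte)
         for n in (5, 50, 200)}
```

Plotting `curve` (accuracy versus training set size) gives the kind of data-efficiency comparison the paper benchmarks across algorithms.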

Paper Nr: 116
Title:

HFDSegNet: Holistic and Generalized Finger Dorsal ROI Segmentation Network

Authors:

Gaurav Jaswal, Shreyas Patil, Kamlesh Tiwari and Aditya Nigam

Abstract: Previous works and other analogous studies on finger knuckle image recognition have claimed that precise detection of true features is difficult in poorly segmented images and is the main reason for matching errors. Thus, an accurate segmentation of the region of interest is crucial to achieve superior recognition results. In this paper, we propose a novel holistic and generalized segmentation network (HFDSegNet) that automatically categorizes a given finger dorsal image, obtained from multiple sensory sources, into a particular class and then accurately extracts three possible ROIs (major knuckle, minor knuckle and nail). To the best of our knowledge, this is the first attempt in which an end-to-end trained object detector inspired by deep learning, namely Faster R-CNN (Region-based Convolutional Neural Network), has been employed to detect and localize the positions of finger knuckles and nails, even when finger images exhibit blur, occlusion, low contrast, etc. The experimental results are examined on two publicly available databases, the PolyU contactless FKI dataset and the PolyU FKP database. The proposed network, trained on only 500 randomly selected images per database, demonstrates the outstanding performance of the proposed ROI segmentation network.

Paper Nr: 133
Title:

Understanding of Non-linear Parametric Regression and Classification Models: A Taylor Series based Approach

Authors:

Thomas Bocklitz

Abstract: Machine learning methods like classification and regression models are specific solutions for pattern recognition problems. Subsequently, the patterns 'found' by these methods can be used either in an exploratory manner, or the model converts the patterns into discriminative values or regression predictions. In both application scenarios it is important to visualize the data basis of the model, because this unravels the patterns. In the case of linear classifiers or linear regression models the task is straightforward, because the model is characterized by a vector which acts as a variable weighting and can be visualized. For non-linear models the visualization task is not yet solved, and therefore these models act as 'black box' systems. In this contribution we present a framework which approximates a given trained parametric model (either a classification or a regression model) by a series of polynomial models derived from a Taylor expansion of the original non-linear model's output function. These polynomial models can be visualized up to the second order and subsequently interpreted. This visualization opens the way to understanding the data basis of a trained non-linear model, and it allows estimating the degree of its non-linearity. In doing so, the framework helps to understand non-linear models used for pattern recognition tasks and unravels the patterns these methods use for their predictions.
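The core idea, approximating a trained non-linear output function by a second-order Taylor polynomial whose coefficients can be visualized, can be sketched numerically (a hedged illustration using central finite differences around a reference point; the toy model `f` and the helper name `taylor_surrogate` are invented for the example):

```python
import numpy as np

def taylor_surrogate(f, x0, h=1e-4):
    """Second-order Taylor expansion of a scalar model output f around x0.

    Returns (c, g, H) such that f(x) ~ c + g.(x-x0) + 0.5 (x-x0)' H (x-x0).
    Gradient and Hessian are estimated by central finite differences.
    """
    d = x0.size
    c = f(x0)
    g = np.zeros(d)
    H = np.zeros((d, d))
    E = np.eye(d) * h
    for i in range(d):
        g[i] = (f(x0 + E[i]) - f(x0 - E[i])) / (2 * h)
        for j in range(i, d):
            fpp = f(x0 + E[i] + E[j]); fpm = f(x0 + E[i] - E[j])
            fmp = f(x0 - E[i] + E[j]); fmm = f(x0 - E[i] - E[j])
            H[i, j] = H[j, i] = (fpp - fpm - fmp + fmm) / (4 * h * h)
    return c, g, H

# toy non-linear 'model': f(x) = sin(x0) + x0 * x1^2
f = lambda x: np.sin(x[0]) + x[0] * x[1] ** 2
x0 = np.array([0.3, -0.5])
c, g, H = taylor_surrogate(f, x0)
# g acts like the variable weighting of a linear model around x0,
# while H exposes the local degree of non-linearity
```

Visualizing `g` and `H` (e.g. as bar plots or heat maps) is what makes the otherwise black-box model interpretable around the chosen reference point.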

Area 2 - Applications

Full Papers
Paper Nr: 4
Title:

Planar Motion Bundle Adjustment

Authors:

Marcus V. Örnhag and Mårten Wadenbäck

Abstract: In this paper we consider trajectory recovery for two cameras directed towards the floor, and which are mounted rigidly on a mobile platform. Previous work for this specific problem geometry has focused on locally minimising an algebraic error between inter-image homographies to estimate the relative pose. In order to accurately track the platform globally it is necessary to refine the estimation of the camera poses and 3D locations of the feature points, which is commonly done by utilising bundle adjustment; however, existing software packages providing such methods do not take the specific problem geometry into account, and the result is a physically inconsistent solution. We develop a bundle adjustment algorithm which incorporates the planar motion constraint, and devise a scheme that utilises the sparse structure of the problem. Experiments are carried out on real data and the proposed algorithm shows an improvement compared to established generic methods.

Paper Nr: 16
Title:

Fast Non-minimal Solvers for Planar Motion Compatible Homographies

Authors:

Marcus V. Örnhag

Abstract: This paper presents a novel polynomial constraint for homographies compatible with the general planar motion model. In this setting, compatible homographies have five degrees of freedom—instead of the general case of eight degrees of freedom—and, as a consequence, a minimal solver requires 2.5 point correspondences. The existing minimal solver, however, is computationally expensive, and we propose using non-minimal solvers, which significantly reduces the execution time of obtaining a compatible homography, with accuracy and robustness comparable to that of the minimal solver. The proposed solvers are compared with the minimal solver and the traditional 4-point solver on synthetic and real data, and demonstrate good performance, in terms of speed and accuracy. By decomposing the homographies obtained from the different methods, it is shown that the proposed solvers have future potential to be incorporated in a complete Simultaneous Localization and Mapping (SLAM) framework.

Paper Nr: 18
Title:

Pluggable Drone Imaging Analysis Framework for Mob Detection during Open-air Events

Authors:

Jerico Moeyersons, Brecht Verhoeve, Pieter-Jan Maenhaut, Bruno Volckaert and Filip De Turck

Abstract: Drones and thermal cameras are often combined within applications such as search and rescue, and firefighting. Due to vendor-specific hardware and software, applications for these drones are hard to develop and maintain. As a result, a pluggable drone imaging analysis architecture is proposed that facilitates the development of custom image processing applications. This architecture is prototyped as a microservice-based plugin framework and allows users to build image processing applications by connecting media streams using microservices that connect inputs (e.g. regular or thermal camera image streams) to image analysis services. The prototype framework is evaluated in terms of modifiability, interoperability and performance. This evaluation has been carried out on the use case of detecting large crowds of people (mobs) during open-air events. The framework achieves modifiability and performance by being able to work in soft real-time, and achieves interoperability with an average successful exchange ratio of 99.998%. A new dataset containing thermal images of such mobs is presented, on which a YOLOv3 neural network is trained. The trained model is able to detect mobs in new thermal images in real-time, achieving frame rates of 55 frames per second when deployed on a modern GPU.

Paper Nr: 28
Title:

Eliminating Noise in the Matrix Profile

Authors:

Dieter De Paepe, Olivier Janssens and Sofie Van Hoecke

Abstract: As companies are increasingly measuring their products and services, the amount of time series data is rising and techniques to extract usable information are needed. One recently developed data mining technique for time series is the Matrix Profile. It consists of the smallest z-normalized Euclidean distance of each subsequence of a time series to all other subsequences of another series. It has been used for motif and discord discovery, for segmentation, and as a building block for other techniques. One side effect of the z-normalization used is that small fluctuations on flat signals are upscaled. This can lead to high and unintuitive distances for very similar subsequences from noisy data. We determined an analytic method to estimate and remove the effects of this noise, adding only a single, intuitive parameter to the calculation of the Matrix Profile. This paper explains our method and demonstrates it by performing discord discovery on the Numenta Anomaly Benchmark and by segmenting the PAMAP2 activity dataset. We find that our technique results in a more intuitive Matrix Profile and provides improved results in both use cases for series containing many flat, noisy subsequences. Since our technique is an extension of the Matrix Profile, it can be applied to any of the various tasks that could be solved by it, improving results where data contains flat and noisy sequences.
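The z-normalized subsequence distance that the Matrix Profile is built on, and the inflation effect on flat noisy signals described above, can be illustrated with a naive sketch (the paper's noise-correction formula is its contribution and is not reproduced here; the signal values are invented):

```python
import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)

def distance_profile(ts, query):
    """Naive z-normalized Euclidean distance of `query` to every
    subsequence of `ts` (the building block of the Matrix Profile)."""
    m = len(query)
    q = znorm(query)
    return np.array([np.linalg.norm(q - znorm(ts[i:i + m]))
                     for i in range(len(ts) - m + 1)])

rng = np.random.default_rng(1)
flat = np.ones(300) + 0.01 * rng.normal(size=300)   # flat signal + tiny noise
dp = distance_profile(flat, flat[:50])
# after z-normalization the tiny fluctuations are upscaled, so even
# near-identical flat subsequences get large distances
```

`dp[0]` (the self-match) is essentially zero, while distances to other flat windows are large despite the raw signals being almost identical, which is exactly the artifact the paper's correction removes.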

Paper Nr: 38
Title:

Deep Learning-based Method for Classifying and Localizing Potato Blemishes

Authors:

Sofia Marino, Pierre Beauseroy and André Smolarz

Abstract: In this paper we address the problem of potato blemish classification and localization. A large database with multiple varieties was created containing 6 classes, i.e., healthy, damaged, greening, black dot, common scab and black scurf. A Convolutional Neural Network was trained to classify face potato images and was also used as a filter to select faces where more analysis was required. Then, a combination of autoencoder and SVMs was applied on the selected images to detect damaged and greening defects in a patch-wise manner. The localization results were used to classify the potato according to the severity of the blemish. A final global evaluation of the potato was done where four face images per potato were considered to characterize the entire tuber. Experimental results show a face-wise average precision of 95% and average recall of 93%. For damaged and greening patch-wise localization, we achieve a False Positive Rate of 4.2% and 5.5% and a False Negative Rate of 14.2% and 28.1% respectively. Concerning the final potato-wise classification, we achieved in a test dataset an average precision of 92% and average recall of 91%.

Paper Nr: 49
Title:

Goal-conditioned User Modeling for Dialogue Systems using Stochastic Bi-Automata

Authors:

Manex Serras, María Inés Torres and Arantza del Pozo

Abstract: User Models (UM) are commonly employed to train and evaluate dialogue systems, as they generate dialogue samples that simulate end-user behavior. This paper presents a stochastic approach for user modeling based on Attributed Probabilistic Finite State Bi-Automata (A-PFSBA). This framework allows the user model to be conditioned by the dialogue goal in task-oriented dialogue scenarios. In addition, the work proposes two novel smoothing policies that employ the K-nearest A-PFSBA states to infer the next UM action in unseen interactions. Experiments on the Dialogue State Tracking Challenge 2 (DSTC2) corpus provide results similar to the ones obtained through deep learning based user modeling approaches in terms of F1 measure. However, the proposed Bi-Automata User Model (BAUM) requires fewer resources, in terms of both memory and computing time.

Paper Nr: 57
Title:

All Together Now! The Benefits of Adaptively Fusing Pre-trained Deep Representations

Authors:

Yehezkel S. Resheff, Itay Lieder and Tom Hope

Abstract: Pre-trained deep neural networks, powerful models trained on large datasets, have become a popular tool in computer vision for transfer learning. However, the standard approach of using a single network potentially misses out on valuable information contained in other readily available models. In this work, we study the Mixture of Experts (MoE) approach for adaptively fusing multiple pre-trained models for each individual input image. In particular, we explore how far we can get by combining diverse pre-trained representations in a customized way that maximizes their potential in a lightweight framework. Our approach is motivated by an empirical study of the predictions made by popular pre-trained nets across various datasets, finding that both performance and agreement between models vary across datasets. We further propose a miniature CNN gating mechanism operating on a thumbnail version of the input image, and show this is enough to guide a good fusion. Finally, we explore a multi-modal blend of visual and natural-language representations, using a label-space embedding to inject pre-trained word-vectors. Across multiple datasets, we demonstrate that an adaptive fusion of pre-trained models can obtain favorable results.

Paper Nr: 59
Title:

Removal of Historical Document Degradations using Conditional GANs

Authors:

Veeru Dumpala, Sheela R. Kurupathi, Syed S. Bukhari and Andreas Dengel

Abstract: One of the most crucial problems in the document analysis and OCR pipeline is document binarization. Many traditional algorithms from the past few decades, such as Sauvola, Niblack and Otsu, were used for binarization but gave insufficient results for historical texts with degradations. Recently, many attempts have been made to solve binarization using deep learning approaches such as autoencoders and FCNs. However, these models do not generalize well qualitatively to real-world historical document images. In this paper, we propose a model based on conditional GANs, well known for high-resolution image synthesis. Here, the proposed model is used for an image manipulation task that can remove different degradations in historical documents, such as stains, bleed-through and non-uniform shading. The proposed model outperforms recent state-of-the-art models for document image binarization. We support our claims by benchmarking the proposed model on the publicly available PHIBC 2012, DIBCO (2009-2017) and Palm Leaf datasets. The main objective of this paper is to illuminate the advantages of generative modeling and adversarial training for document image binarization in a supervised setting, which shows good generalization capabilities across inter/intra-class domain document images.

Paper Nr: 67
Title:

Automatic Information Extraction from Piping and Instrumentation Diagrams

Authors:

Rohit Rahul, Shubham Paliwal, Monika Sharma and Lovekesh Vig

Abstract: One of the most common modes of representing engineering schematics is Piping and Instrumentation Diagrams (P&IDs), which describe the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&ID diagrams have been manually generated, scanned and stored as image files. These files need to be digitized for purposes of inventory management and updating, and for easy reference to different components of the schematics. There are several challenging vision problems associated with digitizing real-world P&ID diagrams. Real-world P&IDs come in several different resolutions and often contain noisy textual information. Extraction of instrumentation information from these diagrams involves accurate detection of symbols that frequently have minute visual differences between them. Identification of pipelines that may converge and diverge at different points in the image is a further cause for concern. For these reasons, to the best of our knowledge, no system had previously been proposed for end-to-end data extraction from P&ID diagrams. However, with the advent of deep learning and the spectacular successes it has achieved in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets via a combination of traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets and outlets, and to detect symbols. This is followed by association of the detected components with the appropriate pipelines. The extracted pipeline information is used to populate a tree-like data structure that captures the structure of the piping schematics. We have also evaluated our proposed method on a real-world dataset of P&ID sheets obtained from an oil firm and have obtained extremely promising results. To the best of our knowledge, this is the first system that performs end-to-end data extraction from P&ID diagrams.

Paper Nr: 81
Title:

Stochastic Phase Estimation and Unwrapping

Authors:

Mara Pistellato, Filippo Bergamasco, Andrea Albarelli, Luca Cosmo, Andrea Gasparetto and Andrea Torsello

Abstract: Phase-shift is one of the most effective techniques in 3D structured-light scanning for its accuracy and noise resilience. However, the periodic nature of the signal causes a spatial ambiguity when the fringe periods are shorter than the projector resolution. To solve this, many techniques exploit multiple combined signals to unwrap the phases and thus recover a unique consistent code. In this paper, we study the phase estimation and unwrapping problem in a stochastic context. Assuming the acquired fringe signal to be affected by additive white Gaussian noise, we start by modelling each estimated phase as a zero-mean Wrapped Normal distribution with variance σ̄². Then, our contributions are twofold. First, we show how to recover the best projector code given multiple phase observations by means of an ML estimation over the combined fringe distributions. Second, we exploit the Cramér-Rao bounds to relate the phase variance σ̄² to the variance of the observed signal, which can be easily estimated online during the fringe acquisition. An extensive set of experiments demonstrates that our approach outperforms other methods in terms of code recovery accuracy and ratio of faulty unwrappings.
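The ML code recovery over combined fringe distributions can be illustrated with a hedged, brute-force sketch: a grid search over candidate positions, scoring each with Gaussian log-likelihood terms for the wrapped residuals. The periods, offsets, and the helper name `ml_unwrap` are invented for the example, and the paper's Cramér-Rao based variance estimation is not reproduced:

```python
import numpy as np

def ml_unwrap(phases, periods, sigmas, x_max, step=0.1):
    """Brute-force ML recovery of a position x from wrapped phase readings.

    phases[i] is the observed phase for fringe period periods[i], modelled
    as (x mod periods[i]) plus noise with std sigmas[i]. The combined
    log-likelihood is maximised over a grid of candidate positions.
    """
    xs = np.arange(0.0, x_max, step)
    ll = np.zeros_like(xs)
    for p, T, s in zip(phases, periods, sigmas):
        r = (xs - p + T / 2) % T - T / 2   # wrapped residual in [-T/2, T/2)
        ll -= r ** 2 / (2 * s ** 2)        # Gaussian log-likelihood term
    return xs[np.argmax(ll)]

# two fringe signals with periods whose lcm(40, 47) = 1880 exceeds x_max,
# so the wrap combination is unique; small fixed offsets stand in for noise
x_true = 731.4
periods = [40.0, 47.0]
phases = [(x_true % 40.0 + 0.3) % 40.0, (x_true % 47.0 - 0.2) % 47.0]
x_hat = ml_unwrap(phases, periods, sigmas=[0.5, 0.5], x_max=1000.0)
```

Because each period alone leaves many ambiguous candidates, only the position consistent with both wrapped observations maximises the combined likelihood, which is the essence of the unwrapping.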

Paper Nr: 134
Title:

Using Recurrent Neural Networks for Action and Intention Recognition of Car Drivers

Authors:

Martin Torstensson, Boris Duran and Cristofer Englund

Abstract: Traffic situations leading up to accidents have been shown to be greatly affected by human errors. To reduce these errors, warning systems such as Driver Alert Control, Collision Warning and Lane Departure Warning have been introduced. However, there is still room for improvement, both regarding the timing of when a warning should be given as well as the time needed to detect a hazardous situation in advance. Two factors that affect when a warning should be given are the environment and the actions of the driver. This study proposes an artificial neural network-based approach consisting of a convolutional neural network and a recurrent neural network with long short-term memory to detect and predict different actions of a driver inside a vehicle. The network achieved an accuracy of 84% while predicting the actions of the driver in the next frame, and an accuracy of 58% 20 frames ahead with a sampling rate of approximately 30 frames per second.

Paper Nr: 141
Title:

AveRobot: An Audio-visual Dataset for People Re-identification and Verification in Human-Robot Interaction

Authors:

Mirko Marras, Pedro A. Marín-Reyes, Javier Lorenzo-Navarro, Modesto Castrillón-Santana and Gianni Fenu

Abstract: Intelligent technologies have pervaded our daily life, making it easier for people to complete their activities. One emerging application involves the use of robots for assisting people in various tasks (e.g., visiting a museum). In this context, it is crucial to enable robots to correctly identify people. Existing robots often use facial information to establish the identity of a person of interest. But the face alone may not offer enough relevant information due to variations in pose, illumination, resolution and recording distance. Other biometric modalities like the voice can improve the recognition performance in these conditions. However, the existing datasets in robotic scenarios usually do not include the audio cue and tend to suffer from one or more limitations: most of them are acquired under controlled conditions, limited in the number of identities or samples per user, collected by the same recording device, and/or not freely available. In this paper, we propose AveRobot, an audio-visual dataset of 111 participants vocalizing short sentences under robot assistance scenarios. The collection took place in a three-floor building using eight different cameras with built-in microphones. The performance for face and voice re-identification and verification was evaluated on this dataset with deep learning baselines, and compared against audio-visual datasets from diverse scenarios. The results showed that AveRobot is a challenging dataset for people re-identification and verification.

Short Papers
Paper Nr: 7
Title:

Semantic Segmentation via Global Convolutional Network and Concatenated Feature Maps

Authors:

Chuan K. Wang and Long W. Chang

Abstract: Most segmentation CNNs (convolutional neural networks) are based on ResNet. Recently, Huang et al. introduced a new classification CNN called DenseNet. Jégou et al. then used a sequence of DenseNet building blocks to build their semantic segmentation CNN, called FC-DenseNet, and achieved state-of-the-art results on the CamVid dataset. In this paper, we implement the design concept of DenseNet in a ResNet-based semantic segmentation CNN called Global Convolutional Network (GCN) and build our own network by switching every identity mapping operation of the decoder network in GCN to a concatenation operation. Our network uses fewer computational resources than FC-DenseNet, obtains a mean IoU score of 69.34% on the CamVid dataset, and surpasses the 66.9% reported in the FC-DenseNet paper.

Paper Nr: 8
Title:

Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification

Authors:

Lu Tian, Ranran Huang and Yu Wang

Abstract: Person re-identification is generally divided into two parts: the first is how to represent a pedestrian by discriminative visual descriptors, and the second is how to compare them by suitable distance metrics. Conventional methods treat these as two isolated parts, the first usually unsupervised and the second supervised. The Bag-of-Words (BoW) model is a widely used image-representation descriptor in the first part. Its codebook is simply generated by clustering visual features in Euclidean space, which, however, is not optimal. In this paper, we propose to use a metric learning technique from the second part in the codebook generation phase of BoW. In particular, the proposed codebook is clustered under a Mahalanobis distance which is learned in a supervised manner. Local features are then compared with the codewords in the codebook by the trained Mahalanobis distance metric. Extensive experiments prove that our proposed method is effective. With several low-level features extracted on superpixels and fused together, our method outperforms the state-of-the-art on person re-identification benchmarks including VIPeR, PRID 450S, and Market-1501.
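The idea of clustering the codebook under a learned Mahalanobis distance can be sketched as follows (a toy illustration: the metric matrix `M` is invented rather than produced by the paper's supervised learning step, and a minimal k-means stands in for the actual codebook generation):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means returning the k cluster centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

# Suppose metric learning produced a positive-definite matrix M; since
# d_M(a, b) = (a-b)^T M (a-b) factorises as M = L^T L, clustering under M
# is ordinary k-means on the linearly transformed features X L^T.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))                        # local visual features
M = np.diag([4.0, 4.0, 1, 1, 1, 1, 0.25, 0.25])      # toy 'learned' metric
L = np.linalg.cholesky(M).T                          # M = L^T L
codebook = kmeans(X @ L.T, k=16)

# quantise a new feature against the codebook under the learned metric
f = rng.normal(size=8)
word = np.argmin(((f @ L.T - codebook) ** 2).sum(axis=1))
```

The factorisation trick is the key design point: once the metric is learned, both codebook generation and codeword assignment reduce to Euclidean operations in the transformed space.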

Paper Nr: 13
Title:

Multiclass Tissue Classification of Whole-Slide Histological Images using Convolutional Neural Networks

Authors:

Rune Wetteland, Kjersti Engan, Trygve Eftestøl, Vebjørn Kvikstad and Emilius M. Janssen

Abstract: Globally there has been an enormous increase in bladder cancer incidence over the past decades. Correct prognosis of recurrence and progression is essential to avoid under- or over-treatment of the patient, as well as unnecessary suffering and cost. To diagnose the cancer grade and stage, pathologists study the histological images. However, this is a time-consuming process and reproducibility among pathologists is low. A first stage for an automated diagnosis system can be to identify the diagnostically relevant areas in the histological whole-slide images (WSI), segmenting cell tissue from damaged areas, blood, background, etc. In this work, a method for automatic classification of urothelial carcinoma into six different classes is proposed. The method is based on convolutional neural networks (CNN), first trained without supervision on unlabelled images by utilising an autoencoder (AE). A smaller set of labelled images is used to train the final fully-connected layers from the low-dimensional latent vector of the AE, providing as output a probability score for each of the six classes, suitable for automatically defining regions of interest in WSI. For evaluation, each tile is classified as the class with the highest probability score. The model achieved an average F1-score of 93.4% over all six classes.

Paper Nr: 15
Title:

Flock Patterns When Pigeons Fly over Terrain with Different Properties

Authors:

Margarita Zaleshina and Alexander Zaleshin

Abstract: The way in which flocks are organized affects the ability of birds to perceive the landscape over which they fly: if the pattern of the flock changes, then the generalized perception of the terrain over which the birds fly will also change. In this paper, the features of the dynamic spatial organization of pigeons in a flock during flights over landscapes with different characteristics were studied based on the analysis of GPS tracks of the birds. Typical modes of group flight were identified for situations such as surveying unfamiliar terrain or flying home from remote sites. The spatial distribution of distances between pairs of individual birds and their directions of movement were calculated, and then related to the features of the terrain over which the flights occurred. The data analysis was performed by comparing flock patterns during group flights over terrain of distinct types (sea coast, urban and countryside terrain, and natural landscape). The spatial data was processed using the geographic information system QGIS.

Paper Nr: 19
Title:

Physical Activity Recognition by Utilising Smartphone Sensor Signals

Authors:

Abdulrahman Alruban, Hind Alobaidi, Nathan Clarke and Fudong Li

Abstract: Human physical motion activity identification has many potential applications in various fields, such as medical diagnosis, military sensing, sports analysis, and human-computer security interaction. With the recent advances in smartphones and wearable technologies, it has become common for such devices to have embedded motion sensors that are able to sense even small body movements. This study collected human activity data from 60 participants across two different days for a total of six activities, recorded by the gyroscope and accelerometer sensors of a modern smartphone. The paper investigates to what extent different activities can be identified using machine learning algorithms and approaches such as majority algorithmic voting. Further analyses reveal which time- and frequency-domain features were best able to identify individuals' motion activity types. Overall, the proposed approach achieved a classification accuracy of 98% in identifying four different activities: walking, walking upstairs, walking downstairs, and sitting (on a chair) while the subject is calm and doing a typical desk-based activity.

Paper Nr: 20
Title:

Two-layer Residual Feature Fusion for Object Detection

Authors:

Jaeseok Choi, Kyoungmin Lee, Jisoo Jeong and Nojun Kwak

Abstract: Recently, many single-stage detectors using multi-scale features have been proposed. They are much faster than two-stage detectors, which use region proposal networks (RPN), without much degradation in detection performance. However, the feature maps in the lower layers close to the input, which are responsible for detecting small objects in a single-stage detector, suffer from insufficient representation power because they are too shallow. There is also a structural contradiction: the feature maps must not only deliver low-level information to the next layers but also contain high-level abstraction for prediction. In this paper, we propose a method to enrich the representation power of feature maps using a new feature fusion method that makes use of information from the consecutive layer. It also adopts a unified prediction module with enhanced generalization performance. The proposed method enables more precise prediction and achieved scores higher than or comparable to those of competitors such as SSD and DSSD on PASCAL VOC and MS COCO. In addition, it maintains the fast computation of a single-stage detector, requiring much less computation than other detectors with similar performance.

Paper Nr: 23
Title:

A Study of Various Text Augmentation Techniques for Relation Classification in Free Text

Authors:

Praveen B. Giridhara, Chinmaya Mishra, Reddy M. Venkataramana, Syed S. Bukhari and Andreas Dengel

Abstract: Data augmentation techniques have been widely used in visual recognition tasks, as it is easy to generate new data through simple and straightforward image transformations. When it comes to text data, however, it is difficult to find appropriate transformation techniques that also preserve the contextual and grammatical structure of language. In this paper, we explore various text data augmentation techniques in text space and word embedding space, and study the effect of the augmented datasets on the efficiency of different deep learning models for relation classification in text.

Paper Nr: 27
Title:

Semantic Segmentation of Non-linear Multimodal Images for Disease Grading of Inflammatory Bowel Disease: A SegNet-based Application

Authors:

Pranita Pradhan, Tobias Meyer, Michael Vieth, Andreas Stallmach, Maximilian Waldner, Michael Schmitt, Juergen Popp and Thomas Bocklitz

Abstract: Non-linear multimodal imaging, the combination of coherent anti-Stokes Raman scattering (CARS), two-photon excited fluorescence (TPEF) and second harmonic generation (SHG), has shown its potential to assist the diagnosis of different inflammatory bowel diseases (IBDs). This label-free imaging technique can support ‘gold-standard’ techniques such as colonoscopy and histopathology to ensure an IBD diagnosis in a clinical environment. Moreover, non-linear multimodal imaging can measure biomolecular changes in different tissue regions, such as the crypt and mucosa regions, which serve as predictive markers for IBD severity. To achieve a real-time assessment of IBD severity, an automatic segmentation of the crypt and mucosa regions is needed. In this paper, we semantically segment the crypt and mucosa regions using a deep neural network. We utilized the SegNet architecture (Badrinarayanan et al., 2015) and compared its results with a classical machine learning approach. Our trained SegNet model achieved an overall F1 score of 0.75 and outperformed the classical machine learning approach for the segmentation of the crypt and mucosa regions in our study.

Paper Nr: 46
Title:

A Novel Breadth-first Strategy Algorithm for Discovering Sequential Patterns from Spatio-temporal Data

Authors:

Piotr S. Maciąg and Robert Bembenik

Abstract: In this paper, we consider the problem of discovering sequential patterns from a dataset of event instances and event types. We propose a breadth-first strategy algorithm (spatio-temporal breadth-first miner, STBFM) to search for significant sequential patterns denoting relations between event types in the dataset. We introduce the Sequential Pattern Tree (SPTree), a novel structure that significantly reduces the time of the pattern mining process. Our algorithm is compared with STMiner, an algorithm for discovering sequential patterns from event data. A modification of STBFM that discovers the Top-N most significant sequential patterns in a given dataset is also provided. Experimental studies were performed on a crime incidents dataset for the city of Boston.
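A breadth-first (level-wise, Apriori-style) miner grows candidate patterns one event type at a time and prunes by support at each level. The toy miner below illustrates that general strategy only; it is not STBFM and omits the SPTree structure and the paper's spatio-temporal significance measures:

```python
# Level-wise mining of contiguous event-type patterns: level k+1 candidates
# are built only from frequent level-k patterns, then pruned by support.
def frequent_sequences(sequences, min_support):
    def support(pattern):
        k = len(pattern)
        return sum(
            any(tuple(s[i:i + k]) == pattern for i in range(len(s) - k + 1))
            for s in sequences
        )

    level = sorted({(t,) for s in sequences for t in s})
    level = [p for p in level if support(p) >= min_support]
    alphabet = [p[0] for p in level]          # frequent single event types
    frequent = list(level)
    while level:                              # breadth-first: one level at a time
        candidates = sorted({p + (t,) for p in level for t in alphabet})
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
    return frequent
```

The Apriori pruning step (extending only frequent patterns) is what keeps the breadth-first search space tractable.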

Paper Nr: 55
Title:

Deep Learning for Relevance Filtering in Syndromic Surveillance: A Case Study in Asthma/Difficulty Breathing

Authors:

Oduwa Edo-Osagie, Beatriz De La Iglesia, Iain Lake and Obaghe Edeghere

Abstract: In this paper, we investigate deep learning methods that can extract word context for Twitter mining for syndromic surveillance. Most work on syndromic surveillance has been done on the flu or Influenza-Like Illnesses (ILIs). For this reason, we decided to look at a different but equally important syndrome, asthma/difficulty breathing, which is quite topical given global concerns about the impact of air pollution. We compare deep learning algorithms for the purpose of filtering Tweets relevant to our syndrome of interest. We make our comparisons using different variants of the F-measure as our evaluation metric because they allow us to emphasise recall over precision, which is important in the context of syndromic surveillance so that we do not lose relevant Tweets in the classification. We then apply our relevance filtering systems, based on deep learning algorithms, to the task of syndromic surveillance and compare the results with real-world syndromic surveillance data provided by Public Health England (PHE). We find that the RNN performs best at relevance filtering but can also be slower than other architectures, which is an important consideration for real-time application. We also found that the correlation between Twitter and the real-world asthma syndromic surveillance data was positive and improved with the use of the deep-learning-powered relevance filtering. Finally, the deep learning methods enabled us to gather context and word similarity information which we can use to fine-tune the vocabulary we employ to extract relevant Tweets in the first place.
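The F-measure variants mentioned here are instances of the F-beta score, where beta > 1 weights recall above precision. A small sketch (the precision and recall values below are made up for illustration):

```python
# F-beta: beta = 1 gives the usual F1; beta > 1 emphasizes recall, matching
# the surveillance setting where missing a relevant Tweet is costlier than
# admitting a false positive.
def f_beta(precision, recall, beta):
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_beta(0.70, 0.90, beta=1)   # 0.7875
f2 = f_beta(0.70, 0.90, beta=2)   # ~0.851: rewards the higher recall
```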

Paper Nr: 75
Title:

Template based Human Pose and Shape Estimation from a Single RGB-D Image

Authors:

Zhongguo Li, Anders Heyden and Magnus Oskarsson

Abstract: Estimating the 3D model of the human body is needed for many applications. However, this is a challenging problem since the human body has high inherent complexity due to self-occlusions and articulation. We present a method to reconstruct a 3D human body model from a single RGB-D image. 2D joint points are first predicted by a CNN-based model called the convolutional pose machine, and the 3D joint points are calculated using the depth image. We then propose to utilize both the 2D and 3D joint points, which provide complementary information, to fit a parametric body model (SMPL). This is implemented by minimizing an objective function that measures the difference between the joint points of the observed data and those of the parametric model. The pose and shape parameters of the body are obtained through optimization and the final 3D model is estimated. Experiments on synthetic and real data demonstrate that our method estimates the 3D human body model correctly.
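The fitting step minimizes a residual between observed joints and the parametric model's joints. As a deliberately simplified stand-in for SMPL, the toy example below fits only a global 3D translation, for which the least-squares optimum is the mean offset; the paper instead optimizes full pose and shape parameters numerically:

```python
# Sum-of-squared-distances objective between observed 3D joints and model
# joints shifted by a candidate translation.
def joint_objective(observed, model, translation):
    return sum(
        sum((o[d] - (m[d] + translation[d])) ** 2 for d in range(3))
        for o, m in zip(observed, model)
    )

# Closed-form least-squares translation: the mean per-axis offset.
def fit_translation(observed, model):
    n = len(observed)
    return tuple(
        sum(o[d] - m[d] for o, m in zip(observed, model)) / n for d in range(3)
    )

model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
observed = [(0.5, 0.2, -0.1), (1.5, 0.2, -0.1), (0.5, 1.2, -0.1)]
t = fit_translation(observed, model)
```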

Paper Nr: 88
Title:

Robust Cylinder Estimation in Point Clouds from Pairwise Axes Similarities

Authors:

Mara Pistellato, Filippo Bergamasco, Andrea Albarelli and Andrea Torsello

Abstract: The ubiquitous presence of cylindrical shapes in both natural and man-made environments makes their automated extraction a pivotal task for a broad range of applications such as robot manipulation, reverse engineering and automated industrial inspection. Albeit conceptually simple, the task of fitting cylinders from 3D data can quickly become challenging if performed "in-the-wild", when no prior is given to the number of primitives to find or when the point cloud is noisy and not oriented. In this paper we introduce a new robust approach to iteratively extract cylindrical primitives from a 3D point cloud by exploiting mutual consensus of different cylinder candidates. First, a set of possible axes is generated by slicing the point cloud with multiple random planes. Then, a game-theoretic inlier selection process is performed to extract a subset of axes maximizing the fitness against a payoff function based on the shortest geodesic path in SE(3) between pairs of corresponding 3D lines. Finally, the probability distribution resulting from the previous selection step is used to weight the input candidates and robustly obtain the final cylinder coefficients. Compared to other methods, our approach does not require point normals, offers superior resilience to noise and does not depend on delicate tuning of multiple parameters.

Paper Nr: 90
Title:

SOCRatES: A Database of Realistic Data for SOurce Camera REcognition on Smartphones

Authors:

Chiara Galdi, Frank Hartung and Jean-Luc Dugelay

Abstract: SOCRatES (SOurce Camera REcognition on Smartphones) is an image and video database especially designed for source digital camera recognition on smartphones. It addresses two specific needs: the need for wider pools of data for developing and benchmarking image forensic techniques, and the need to move the application of these techniques to smartphones, since they are nowadays the devices most employed for image capturing and video recording. What makes SOCRatES different from all previously published databases is that it is collected by the smartphone owners themselves, introducing great heterogeneity and realism in the data. SOCRatES currently comprises about 9,700 images and 1,000 videos captured with 103 different smartphones of 15 different makes and about 60 different models. With 103 different devices, SOCRatES is the database for source digital camera identification that includes the highest number of different sensors. In this paper we describe SOCRatES and present a baseline assessment based on Sensor Pattern Noise computation.
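The Sensor Pattern Noise baseline works by extracting a noise residual (image minus a denoised version) and correlating residuals from different captures: shots from the same sensor share a correlated residual. Real pipelines use 2-D wavelet denoising on full images; the 1-D moving-average "denoiser" and tiny signals below are a deliberately simplified sketch of the principle:

```python
# High-pass residual: signal minus a local moving-average smoothing.
def residual(signal, radius=1):
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out

# Normalized cross-correlation between two residuals.
def ncc(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

noise = [0.5 if i % 2 == 0 else -0.5 for i in range(8)]   # fixed sensor noise
shot_a = [0.1 * i + e for i, e in enumerate(noise)]       # ramp scene + noise
shot_b = [3.0 + e for e in noise]                         # flat scene + noise
other = [3.0] * 8                                         # noise-free "other" device
```

Despite different scene content, the residuals of `shot_a` and `shot_b` correlate strongly because they share the same sensor noise.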

Paper Nr: 94
Title:

A Framework for Discovering Frequent Event Graphs from Uncertain Event-based Spatio-temporal Data

Authors:

Piotr S. Maciąg

Abstract: The aim of this paper is to discuss a novel framework designed for discovering frequent event graphs from uncertain spatio-temporal data. We consider the problem of discovering hidden relations between event types and their sets of uncertain spatio-temporal instances. For that purpose, we designed the following data mining framework: microclustering of uncertain instances, generating a set of possible worlds according to possible-worlds semantics, creating a microclustering index for each world, generating a set of event graphs from the created microclusters, and defining an Apriori-based algorithm for mining frequent event graphs (EventGraph Miner). To the best of our knowledge, this is the first approach to discovering hidden patterns from event-type spatio-temporal data when the dataset contains uncertain instances. While the paper does not present experimental results for the proposed framework, it presents its potential for further studies on the topic.

Paper Nr: 97
Title:

Portfolio Variance Constraints

Authors:

Kevin R. Keane

Abstract: Two very popular categories of state-of-the-art risk models are fundamental factor models and statistical factor models. Fundamental factor models are readily interpreted, but frequently there is concern about “missing” factors. Statistical factor models do an excellent job at identifying patterns in asset returns ex ante, but are frequently viewed as difficult to interpret and temporally unstable. The Portfolio Variance Constraints (PVC) model blends the interpretability and temporal stability of fundamental factor models with the adaptiveness and efficiency of statistical factor models. Using exogenously defined portfolios, PVC constructs Gaussian Markov random field probability models using the maximum entropy principle.

Paper Nr: 108
Title:

Adversarial Media-fusion Approach to Strain Prediction for Bridges

Authors:

Takaya Kawakatsu, Kenro Aihara, Atsuhiro Takasu and Jun Adachi

Abstract: This paper contributes to the wide acceptance of autonomous health monitoring for real bridges. Our approach involves dynamic simulation, whereby damage may be identified by detecting abnormal mechanical behavior in the bridge components in response to passing vehicles. Conventionally, dynamic simulation requires expert knowledge of mechanics, components, materials, and structures, in addition to accurate modeling. Moreover, it requires detailed specification of the external forces applied, such as vehicle speeds, loci, and axle weights. This paper introduces a novel media-fusion framework to obtain a bridge dynamic model in a fully data-driven fashion. The proposed generative model also successfully simulated strain responses for a real road bridge by using a camera and strain sensors on the bridge. The generative network was trained by an adversarial learning algorithm customized for media-fusion analysis.

Paper Nr: 113
Title:

Predicting Group Convergence in Egocentric Videos

Authors:

Jyoti Nigam and Renu M. Rameshan

Abstract: In this work, our aim is to find the group dynamics in social gathering videos captured from the first-person perspective. The complexity of the task is high, as only one sensor (a wearable camera) is present to sense all N agents. An additional complexity arises because the first person, who is part of the group, is not visible in the video. In particular, we are interested in identifying the convergence of a group with a certain number of agents. We generate a dataset named EgoConvergingGroup to evaluate our proposed method. The proposed method predicts group convergence 90 to 250 frames ahead of the actual convergence.

Paper Nr: 122
Title:

Detecting Permanent and Intermittent Purchase Hotspots via Computational Stigmergy

Authors:

Antonio L. Alfeo, Mario A. Cimino, Bruno Lepri, Alex Pentland and Gigliola Vaglini

Abstract: The analysis of credit card transactions allows gaining new insights into the spending occurrences and mobility behavior of large numbers of individuals at an unprecedented scale. However, unfolding such spatiotemporal patterns at a community level implies non-trivial system modeling and parametrization, as well as a proper representation of the temporal dynamics. In this work we address both issues by means of a novel computational technique, computational stigmergy. With computational stigmergy, each sample position is associated with a digital pheromone deposit, which aggregates with other deposits according to their spatiotemporal proximity. By processing transaction data with computational stigmergy, it is possible to identify high-density areas (hotspots) occurring at different times and on different days, as well as to analyze their consistency over time. Indeed, a hotspot can be permanent, i.e. present throughout the period of observation, or intermittent, i.e. present only at certain times and on certain days due to community-level occurrences (e.g. nightlife). This difference is not only spatial (where the hotspot occurs) and temporal (when the hotspot occurs) but also affects which people visit the hotspot. The proposed approach is tested on a real-world dataset containing the credit card transactions of 60k users between 2014 and 2015.
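The core stigmergy mechanics can be sketched as a grid of pheromone intensities: each transaction deposits pheromone at its cell, deposits evaporate over time, and cells whose accumulated intensity exceeds a threshold form hotspots. The rates and threshold below are illustrative, not the paper's parametrization:

```python
from collections import defaultdict

# events: a list of time steps, each a list of (x, y) sample positions.
# Deposits aggregate within a cell; evaporation makes stale deposits fade,
# so only repeatedly visited cells stay above the hotspot threshold.
def hotspots(events, evaporation=0.5, deposit=1.0, threshold=1.5):
    intensity = defaultdict(float)
    for step in events:
        for cell in intensity:            # evaporate existing deposits
            intensity[cell] *= evaporation
        for pos in step:                  # add new deposits
            intensity[pos] += deposit
    return {cell for cell, v in intensity.items() if v >= threshold}
```

A cell visited in every step behaves like a permanent hotspot; one visited only once decays below threshold, mirroring the permanent/intermittent distinction.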

Paper Nr: 124
Title:

Adaptive Exploration of a UAVs Swarm for Distributed Targets Detection and Tracking

Authors:

Mario A. Cimino, Massimiliano Lega, Manilo Monaco and Gigliola Vaglini

Abstract: This paper focuses on the problem of coordinating multiple UAVs for distributed target detection and tracking in different technological and environmental settings. The proposed approach is founded on the concept of swarm behavior in multi-agent systems, i.e., a self-formed and self-coordinated team of UAVs which adapts itself to mission-specific environmental layouts. The swarm formation and coordination are inspired by the biological mechanisms of flocking and stigmergy, respectively. These mechanisms, suitably combined, make it possible to strike the right balance between global search (exploration) and local search (exploitation) in the environment. The swarm adaptation is based on an evolutionary algorithm with the objective of maximizing the number of tracked targets during a mission or minimizing the time to target discovery. A simulation testbed has been developed and publicly released, based on commercially available UAV technology and real-world scenarios. Experimental results show that the proposed approach extends and substantially outperforms a similar approach in the literature.

Paper Nr: 125
Title:

Adapting YOLO Network for Ball and Player Detection

Authors:

Matija Burić, Miran Pobar and Marina Ivašić-Kos

Abstract: In this paper, we consider the task of detecting players and sports balls in real-world handball images, as a building block for action recognition. Detecting the ball is still a challenge because it is a very small object, taking only a few pixels in the image, yet it carries a lot of information relevant to the interpretation of scenes. Balls can vary greatly in color and appearance due to varying distances to the camera and motion blur. Occlusion is also present, especially as handball players carry the ball in their hands during the game, and the player with the ball is understood to be the key player in the current action. Handball players are located at different distances from the camera, are often occluded, and have postures that differ from the ordinary activities on which most object detectors are commonly trained. We compare the performance of six models based on the YOLOv2 object detector, trained on an image dataset of publicly available sports images and images from custom handball recordings. The performance of person and ball detection is measured on the whole dataset and on the custom part in terms of the mean average precision metric.
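Mean average precision evaluation hinges on intersection-over-union (IoU) between predicted and ground-truth boxes: a detection counts as correct when IoU exceeds a threshold (0.5 in the common PASCAL VOC setting). A minimal sketch:

```python
# IoU for axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2.
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

For a tiny object like the ball, even a few pixels of localization error moves IoU sharply, which is one reason ball detection scores lag person detection scores.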

Paper Nr: 136
Title:

Cascaded Acoustic Group and Individual Feature Selection for Recognition of Food Likability

Authors:

Dara Pir

Abstract: This paper presents the novel Cascaded acoustic Group and Individual Feature Selection (CGI-FS) method for automatic recognition of food likability rating addressed in the ICMI 2018 Eating Analysis and Tracking Challenge’s Likability Sub-Challenge. Employing the speech and video recordings of the iHEARu-EAT database, the Likability Sub-Challenge attempts to recognize self-reported binary labels, ‘Neutral’ and ‘Like’, assigned by subjects to food they consumed while speaking. CGI-FS uses an audio approach and performs a sequence of two feature selection operations by considering the acoustic feature space first in groups and then individually. In CGI-FS, an acoustic group feature is defined as a collection of features generated by the application of a single statistical functional to a specified set of audio low-level descriptors. We investigate the performance of CGI-FS using four different classifiers and evaluate the relevance of group features to the task. All four CGI-FS system results outperform the Likability Sub-Challenge baseline on iHEARu-EAT development data with the best performance achieving a 9.8% relative Unweighted Average Recall improvement over it.

Paper Nr: 138
Title:

Identification of Diseases in Corn Leaves using Convolutional Neural Networks and Boosting

Authors:

Prakruti Bhatt, Sanat Sarangi, Anshul Shivhare, Dineshkumar Singh and Srinivasu Pappula

Abstract: Precision farming technologies are essential for a steady supply of healthy food for the increasing population around the globe. Pests and diseases remain a major threat, and a large fraction of crops is lost each year to them. Automated detection of crop health from images helps in taking timely actions to increase yield while reducing input cost. With the aim of detecting crop diseases and pests with high confidence, we use convolutional neural networks (CNN) and boosting techniques on Corn leaf images in different health states. Corn, the queen of cereals, is a versatile crop that has adapted to various climatic conditions and is one of the major food crops in India along with wheat and rice. Considering that different diseases may require different treatments, incorrect detection can lead to incorrect remedial measures. Although CNN-based models have been used for classification tasks, we aim to classify similar-looking disease manifestations with higher accuracy than that obtained by existing deep learning methods. We have evaluated ensembles of CNN-based image features, with a classifier and boosting, in order to achieve plant disease classification. Using an ensemble of Adaptive Boosting cascaded with a decision-tree-based classifier trained on features from a CNN, we have achieved an accuracy of 98% in classifying the Corn leaf images into four categories: Healthy, Common Rust, Late Blight and Leaf Spot. This is about an 8% improvement in classification performance compared to a CNN alone.
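The boosting stage can be sketched as classic binary AdaBoost over weak learners. Decision stumps stand in here for the decision-tree classifier, and a toy 1-D feature replaces the CNN features, so this illustrates the boosting principle rather than the paper's pipeline:

```python
import math

# Weak learner: threshold stump over one feature, chosen to minimize
# weighted error on the current sample weights.
def train_stump(X, y, w):
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

# AdaBoost: reweight samples toward the mistakes of each round's stump.
def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-10)                       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        for i in range(n):
            p = sign if X[i][f] >= thr else -sign
            w[i] *= math.exp(-alpha * y[i] * p)
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    score = sum(a * (s if x[f] >= t else -s) for a, f, t, s in model)
    return 1 if score >= 0 else -1
```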

Paper Nr: 145
Title:

FoodIE: A Rule-based Named-entity Recognition Method for Food Information Extraction

Authors:

Gorjan Popovski, Stefan Kochev, Barbara K. Seljak and Tome Eftimov

Abstract: The application of Natural Language Processing (NLP) methods and resources to biomedical textual data has received growing attention over the past years. Previously organized biomedical NLP shared tasks (for example, the BioNLP Shared Tasks) are related to extracting different biomedical entities (like genes, phenotypes, drugs, diseases, and chemical entities) and finding relations between them. However, to the best of our knowledge, there are few NLP methods that can be used for information extraction of entities related to food concepts. For this reason, to extract food entities from unstructured textual data, we propose a rule-based named-entity recognition method for food information extraction, called FoodIE. It comprises a small number of rules based on computational linguistics and semantic information that describe the food entities. Experimental evaluation on two different datasets showed that very promising results can be achieved: the proposed method achieved 97% precision, 94% recall, and a 96% F1 score.
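Entity-level precision, recall and F1 of the kind reported here can be computed as set overlap between predicted and gold entity spans. The (start, end, label) triples below are made-up examples, not data from the paper:

```python
# Exact-match NER scoring: an entity counts as correct only if its span
# boundaries and label both match a gold annotation.
def entity_prf(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

gold = {(0, 2, "FOOD"), (5, 6, "FOOD"), (9, 11, "FOOD")}
pred = {(0, 2, "FOOD"), (5, 6, "FOOD"), (12, 13, "FOOD")}
p, r, f = entity_prf(gold, pred)   # two of three entities matched
```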

Paper Nr: 5
Title:

Detection of Primitives in Engineering Drawing using Genetic Algorithm

Authors:

Salwan Alwan, Jean-Marc L. Caillec and Gerard L. Meur

Abstract: This paper presents a method for vectorizing paper drawings (rasters). The method consists of skeletonizing the input image, segmenting the skeleton based on junction detection, and recognizing primitives using a genetic algorithm. The method has been tested on different images and compared with previous works; the results are promising and show the high accuracy of our method.

Paper Nr: 32
Title:

Estimation of Correlation between Texture Features and Surface Parameters for Milled Metal Parts

Authors:

Konstantin Trambitckii, Katharina Anding, Lilli Haar and Gunther Notni

Abstract: The fast development of computer technologies has led to vast improvements in image processing systems and algorithms, which are nowadays widely used in different areas of computer and machine vision. In this research, texture features were used to analyse metal surfaces in a set of images obtained with an industrial camera with a macro lens. This kind of contactless surface roughness estimation is cheaper and quicker than traditional methods. A set of 27 texture features was calculated for a set of surface images, and correlation coefficients between the texture features and 10 roughness parameters of the sample surfaces were estimated. The results obtained show that texture features can be successfully used for quick surface quality estimation.
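Correlating a texture feature with a roughness parameter across sample surfaces reduces to the Pearson coefficient. The feature and roughness values below are synthetic, for illustration only:

```python
# Pearson correlation between two equally long value series.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

texture_contrast = [0.10, 0.24, 0.31, 0.45, 0.52]   # one feature per surface
roughness_ra = [0.8, 1.6, 2.1, 3.0, 3.4]            # Ra per surface
r = pearson(texture_contrast, roughness_ra)         # close to +1 here
```

A coefficient near ±1 indicates a feature useful for roughness prediction; values near 0 mark features to discard.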

Paper Nr: 73
Title:

A Robust Page Frame Detection Method for Complex Historical Document Images

Authors:

Mohammad M. Reza, Md. A. Rakib, Syed S. Bukhari and Andreas Dengel

Abstract: Document layout analysis is the most important part of converting scanned page images into searchable full text. An intensive amount of research is going on in the field of structured and semi-structured documents (journal articles, books, magazines, invoices), but not much on historical documents. Historical document digitization is a more challenging task than for regular structured documents due to poor image quality, damaged characters, and a large amount of textual and non-textual noise. In the scientific community, extraneous symbols from the neighboring page are considered textual noise, while the appearance of black borders, speckles, rulers, different types of images etc. along the border of the documents is considered non-textual noise. Existing historical document analysis methods cannot handle all of this noise, which is a major reason for undesired text appearing in the output of Optical Character Recognition (OCR) that then has to be removed with a lot of extra effort. This paper presents a new perspective on historical document image cleanup by detecting the page frame of the document. The goal of this method is to find the actual content area of the document and ignore noise along the page border. We use morphological transforms, the line segment detector, and a geometric matching algorithm to find an ideal page frame for the document. We evaluate our approach on 16th-19th century printed historical documents. We have observed that OCR performance on the historical documents increased by 4.49% after applying our page frame detection method. In addition, we were able to increase the OCR accuracy by around 6.69% for contemporary documents too.

Paper Nr: 74
Title:

Forecasting Hotel Room Sales within Online Travel Agencies by Combining Multiple Feature Sets

Authors:

Gizem Aras, Gülşah Ayhan, Mehmet A. Sarikaya, A. A. Tokuç and C. O. Sakar

Abstract: Hotel room sales prediction using previous booking data is a prominent research topic for the online travel agency (OTA) sector. Various approaches have been proposed to predict hotel room sales for different prediction horizons, such as yearly demand or daily number of reservations. An OTA website includes offers from many companies for the same hotel, and the position of a company's offer on the OTA website depends on the bid amount the company gives for each click. Therefore, accurate prediction of the sales amount for a given bid is a crucial need in revenue and cost management for companies in the sector. In this paper, we forecast the next day's sales amount in order to provide an estimate of the daily revenue generated per hotel. An important contribution of our study is the use of an enriched dataset constructed by combining the most informative features proposed in various related studies on hotel sales prediction. Moreover, we enrich this dataset with a set of OTA-specific features that carry information about the relative position of the company's offers to those of its competitors on a travel metasearch engine website. We provide a real application on the hotel room sales data of a large OTA in Turkey. The comparative results show that enriching the input representation with the OTA-specific additional features increases the generalization ability of the prediction models, and that tree-based boosting algorithms achieve the best results on this task.

Paper Nr: 95
Title:

Efficient Keypoint Reduction for Document Image Matching

Authors:

Thomas Konidaris, Volker Märgner, Hussein A. Mohammed and H. S. Stiehl

Abstract: In this paper we propose a method for eliminating SIFT keypoints in document images, applied as a first step towards word spotting. One key issue when using SIFT keypoints in document images is that a large number of keypoints are found in non-textual regions. Ideally, we would eliminate as many irrelevant keypoints as possible in order to speed up processing. This is accomplished by altering the original matching process of SIFT descriptors using an iterative process that enables the detection of keypoints belonging to multiple correct instances throughout the document image, an issue that the original SIFT algorithm cannot tackle in a satisfactory way. The proposed method achieves a reduction of over 99% of the extracted keypoints with satisfactory performance.

Paper Nr: 101
Title:

Automated Vision System for Cutting Fixed-weight or Fixed-length Frozen Fish Portions

Authors:

Dibet G. Gonzalez, Nelson Alves, Ricardo Figueiredo, Pedro Maia and Miguel G. Lopez

Abstract: The increase in fish demand poses a challenge to the food industry, which needs upgrades both for (1) offering new and diverse products and (2) optimizing its processing-line throughput while guaranteeing the quality and appearance of the fish. This work presents an innovative computer vision system to be integrated into an automatic frozen fish cutting production line. The proposed system is able to perform a real-time 3D reconstruction of every frozen fish, and allows it to: (1) identify and automatically separate the head bone, body and tail parts of the fish; and (2) estimate with high accuracy where to cut the fish body to produce the wanted slices, according to previously defined weight or width requirements (parameters). The experimental and statistical results are very promising and show the viability of the developed system. The main contribution (novelty) is that this new method is able to estimate, automatically and with high precision, the weight of the part corresponding to the body of the fish, and thus optimize the cutting of the fish slices. With this, we expect to achieve a significant reduction in fish losses.

Paper Nr: 110
Title:

Text Recognition on Khmer Historical Documents using Glyph Class Map Generation with Encoder-Decoder Model

Authors:

Dona Valy, Michel Verleysen and Sophea Chhun

Abstract: In this paper, we propose a handwritten text recognition approach for word image patches extracted from Khmer historical documents. The network consists of two main modules composed of deep convolutional and multi-dimensional recurrent blocks. We utilize the annotated information on glyph components in the word image to build a glyph class map, which is predicted by the first module of the network, called the glyph class map generator. The second module encodes the generated glyph class map and transforms it into a context vector, which is then decoded to produce the final word transcription. We also adapt an attention mechanism in the decoder to take advantage of local contexts, which are also provided by the encoder. Experiments are conducted on a publicly available dataset of digitized Khmer palm-leaf manuscripts called the SleukRith set.

Paper Nr: 118
Title:

3DCNN Performance in Hand Gesture Recognition Applied to Robot Arm Interaction

Authors:

J. A. Castro-Vargas, B. S. Zapata-Impata, P. Gil, J. Garcia-Rodriguez and F. Torres

Abstract: In the past, methods for hand sign recognition have been successfully tested in Human Robot Interaction (HRI) using traditional methodologies based on static image features and machine learning. However, the recognition of gestures in video sequences is still an open problem, because current detection methods achieve low scores when the background is undefined or the scenario is unstructured. In recent years, deep learning techniques have been applied to approach a solution to this problem. In this paper, we present a study in which we analyse the performance of a 3DCNN architecture for hand gesture recognition in an unstructured scenario. The system yields a score of 73% in both accuracy and F1. The aim of the work is the implementation of a system for commanding robots with gestures recorded by video in real scenarios.
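Accuracy and F1, the two metrics reported at 73%, are standard. For the binary case they can be computed as below (a multi-class gesture setting would average per-class F1 scores; this sketch is not taken from the paper):

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 for binary labels; F1 is w.r.t. the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1

acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 1, 0])  # one TP, one FP, one FN
```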

Paper Nr: 126
Title:

uPAD: Unsupervised Privacy-Aware Anomaly Detection in High Performance Computing Systems

Authors:

Siavash Ghiasvand

Abstract: The rapidly growing complexity of HPC systems, in response to the demand for higher computing performance, results in a higher probability of failures. Early detection of failures significantly reduces the damage they cause by impeding their propagation through the system. Various anomaly detection mechanisms have been proposed to detect failures in their early stages. The insufficient number of failure samples, in addition to privacy concerns, severely limits the applicability of available anomaly detection approaches. Advances in machine learning techniques have significantly increased the accuracy of unsupervised anomaly detection methods, addressing the challenge of insufficient failure samples. However, available approaches are either domain specific, inaccurate, or require comprehensive knowledge about the underlying system. Furthermore, processing certain monitoring data, such as system logs, raises serious privacy concerns. In addition, noise in monitoring data severely impacts the correctness of data analysis. This work proposes an unsupervised and privacy-aware approach for detecting abnormal behaviors in general HPC systems. Preliminary results indicate the high potential of autoencoders for automatic detection of abnormal behaviors in HPC systems via analyzing anonymized system logs using fast-trainable, noise-resistant models.
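The abstract does not specify the autoencoder architecture. The underlying principle, training on normal data only, scoring each sample by its reconstruction error, and flagging samples above a threshold, can be sketched with a linear autoencoder (equivalent to PCA); the data here is synthetic and stands in for anonymized log-feature vectors:

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    # PCA gives the optimal k-dimensional *linear* autoencoder
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                      # mean and encoder weights

def anomaly_score(X, mu, W):
    Z = (X - mu) @ W.T                     # encode
    Xr = Z @ W + mu                        # decode (reconstruct)
    return ((X - Xr) ** 2).mean(axis=1)    # per-sample reconstruction error

rng = np.random.default_rng(0)
# "Normal" behavior: feature vectors near a 5-dim subspace of a 20-dim space
normal = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20))
normal += 0.05 * rng.normal(size=normal.shape)

mu, W = fit_linear_autoencoder(normal, k=5)
threshold = np.quantile(anomaly_score(normal, mu, W), 0.99)

outlier = rng.normal(0, 3, size=(1, 20))   # off-subspace (abnormal) sample
flagged = anomaly_score(outlier, mu, W) > threshold
```

A deep autoencoder replaces the two matrix products with nonlinear encoder/decoder networks, but the reconstruction-error scoring stays the same.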

Paper Nr: 137
Title:

Unsupervised Image Segmentation using Convolutional Neural Networks for Automated Crop Monitoring

Authors:

Prakruti Bhatt, Sanat Sarangi and Srinivasu Pappula

Abstract: Among endeavors towards automation in agriculture, localization and segmentation of various events during the growth cycle of a crop is critical and can be challenging in dense foliage. Convolutional Neural Network based methods have been used to achieve state-of-the-art results in supervised image segmentation. In this paper, we investigate an unsupervised method of segmentation for monitoring crop growth and health conditions. Individual segments are then evaluated for their size, color, and texture in order to measure possible changes in the crop, such as the emergence of a flower or fruit, a deficiency, a disease, or a pest. Supervised methods require ground truth labels of the segments in a large number of images to train a neural network, which can then only be used on images similar to those on which it was trained. Instead, we use information about spatial continuity in pixels and boundaries in a given image to update the feature representation and the label assignment of every pixel using a fully convolutional network. Given that manual labeling of crop images is time consuming but quantifying an event occurrence in the farm is of utmost importance, our proposed approach achieves promising results on images of crops captured in different conditions. We obtained 94% accuracy in segmenting Cabbage affected by the Black Moth pest, 81% in extracting segments affected by the Helopeltis pest on Tea leaves, and 92% in spotting fruits on a Citrus tree, where accuracy is defined in terms of the intersection over union of the resulting segments with the ground truth. The resulting segments have been used for temporal crop monitoring and severity measurement in the case of disease or pest manifestations.
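The accuracy figures above are intersection-over-union (IoU) scores. For boolean segment masks the metric is computed as:

```python
import numpy as np

def segment_iou(pred_mask, gt_mask):
    """Intersection over union of two boolean masks of equal shape."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union else 1.0  # two empty masks agree perfectly

# Two partially overlapping 6x6 squares on a 10x10 grid
pred = np.zeros((10, 10), bool); pred[2:8, 2:8] = True
gt = np.zeros((10, 10), bool); gt[4:10, 4:10] = True
iou = segment_iou(pred, gt)  # 16 shared pixels / 56 pixels in the union
```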

Paper Nr: 139
Title:

Data for Image Recognition Tasks: An Efficient Tool for Fine-Grained Annotations

Authors:

Marco Filax, Tim Gonschorek and Frank Ortmeier

Abstract: Using large datasets is essential for machine learning. In practice, training a machine learning algorithm requires hundreds of samples. Multiple off-the-shelf datasets from the scientific domain exist to benchmark new approaches. However, when machine learning algorithms transition to industry, e.g., for a particular image classification problem, hundreds of special-purpose images are collected and annotated in laborious manual work. In this paper, we present a novel system to decrease the effort of annotating those large image sets. To this end, we generate 2D bounding boxes from minimal 3D annotations using the known location and orientation of the camera. We annotate a particular object of interest in 3D once and project these annotations onto every frame of a video stream. The proposed approach is designed to work with off-the-shelf hardware. We demonstrate its applicability with an example from the real world. We generated a more extensive dataset than available in other works for a particular industrial use case: fine-grained recognition of items within grocery stores. Further, we make our dataset, consisting of over 60,000 images, available to the interested vision community. Some images were taken under ideal conditions for training, while others were taken with the proposed approach in the wild. 
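The projection step can be sketched as follows, assuming a standard pinhole camera model (the paper's exact formulation is not given in the abstract): the eight corners of a 3D box annotation are transformed into the camera frame using the known pose, projected through the intrinsics, and enclosed in an axis-aligned 2D box.

```python
import numpy as np

def project_box(corners_world, R, t, K):
    """Project 3D box corners (8x3, world frame) to a 2D bounding box.

    R, t: world-to-camera rotation and translation; K: 3x3 intrinsics.
    """
    cam = corners_world @ R.T + t        # world -> camera coordinates
    uv = cam @ K.T                       # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]          # perspective divide
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return x0, y0, x1, y1                # axis-aligned 2D bounding box

# Unit cube 5 m in front of an identity-pose camera, 500 px focal length
corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                              for y in (-0.5, 0.5)
                              for z in (4.5, 5.5)])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
bbox = project_box(corners, np.eye(3), np.zeros(3), K)
```

Repeating this per frame with the tracked camera pose yields a 2D label for every frame from a single 3D annotation.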

Paper Nr: 144
Title:

Automatic Perception Enhancement for Simulated Retinal Implants

Authors:

Johannes Steffen, Georg Hille and Klaus Tönnies

Abstract: This work addresses the automatic enhancement of visual percepts of virtual patients with retinal implants. Specifically, we cast the task as an image transformation problem within an artificial neural network. The neurophysiological model of (Nanduri et al., 2012) was implemented as a tensor network to simulate a virtual patient’s visual percept and used together with an image transformation network in order to perform end-to-end learning on an image reconstruction and a classification task. The image reconstruction task was evaluated on the MNIST data set and yielded plausible results w.r.t. the learned transformations while halving the dissimilarity (mean squared error) between an input image and its simulated visual percept. Furthermore, the classification task was evaluated on the CIFAR-10 data set. Experiments show that classification accuracy increases by approximately 12.9% when a suitable input image transformation is learned.
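The Nanduri model and the transformation network are nonlinear, but the idea of pre-compensating a fixed forward model can be shown in a toy linear analogue (the blur matrix here is an assumed stand-in for the percept model, not the paper's): learning an input transformation T that minimizes the MSE between the simulated percept of T(x) and x reduces, in the linear case, to inverting the forward model.

```python
import numpy as np

# Fixed, known linear "percept" model: a 1-D blur, standing in for the
# nonlinear Nanduri simulation purely for illustration
n = 31
P = (np.eye(n, k=-1) + np.eye(n) + np.eye(n, k=1)) / 3.0

# The best linear pre-transformation T minimizes ||P @ T @ x - x||^2;
# the least-squares solution of P @ T = I is the (pseudo-)inverse of P
T = np.linalg.lstsq(P, np.eye(n), rcond=None)[0]

rng = np.random.default_rng(1)
x = rng.normal(size=n)                     # a random "input image"
mse_raw = np.mean((P @ x - x) ** 2)        # percept of the raw input
mse_enh = np.mean((P @ T @ x - x) ** 2)    # percept of the enhanced input
```

In the paper's setting both the transformation and the training objective act through a nonlinear percept model, so the transformation must be learned by gradient descent rather than solved in closed form, and the reported result is a halving, not an elimination, of the MSE.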