Many real-world data-mining applications involve obtaining and evaluating predictive models using data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values are associated with events that are highly relevant to end users. This problem has been thoroughly studied over the last decade, with a specific focus on classification tasks. More recently, however, the research community has started to address it in other contexts, such as regression, ordinal classification, multi-label classification, multi-instance learning, data streams, and time series forecasting. It is now recognized that imbalanced domains constitute a broad and important problem, posing relevant challenges for both supervised and unsupervised learning tasks in an increasing number of real-world applications.

Tackling the issues raised by imbalanced domains is crucial to both academia and industry. For researchers, it is an opportunity to develop more adaptable and robust systems and approaches for highly complex tasks. For industry, these are tasks that many companies already face today. Examples include preventing fraud, anticipating catastrophes, and, more generally, enabling preemptive action.

This workshop invites interdisciplinary contributions that tackle the problems many real-world domains face today. Given the growing attention this problem has attracted, it is crucial to promote research in the area and to address both its theoretical and its application challenges.


The research topics of interest to the LIDTA 2017 workshop include (but are not limited to) the following:

Foundations of learning in imbalanced domains
- Probabilistic and statistical models
- New knowledge discovery theories and models
- Understanding the nature of learning difficulties embedded in imbalanced data
- Deep learning with imbalanced data
- Handling imbalanced big data
- One-class learning
- Learning with non-i.i.d. data
- New approaches for data pre-processing (e.g. resampling strategies)
- Post-processing approaches
- Sampling approaches
- Feature selection and feature transformation
- Evaluation in imbalanced domains

Knowledge discovery and machine learning in imbalanced domains
- Classification, ordinal classification
- Data streams and time series forecasting
- Adaptive learning and algorithm-level approaches
- Multi-label, multi-instance, sequence and association rules mining
- Active learning
- Spatial and spatio-temporal learning

Applications in imbalanced domains
- Fraud detection (e.g. finance, credit and online banking)
- Anomaly detection (e.g. industry, intrusion detection)
- Health applications
- Environmental applications (e.g. meteorology, biology)
- Social media applications (e.g. popularity prediction, recommender systems)
- Real-world applications (e.g. oil spill detection)
- Case studies


Submission Deadline (NEW): Monday, July 10, 2017
Notification of Acceptance: Monday, July 24, 2017
Camera-ready Deadline: Monday, August 7, 2017

ECML/PKDD 2017: 18–22 September 2017
LIDTA 2017: 22 September 2017


09.00: Opening Remarks
09.10: Keynote: Professor Nitesh Chawla. "Marking the 15-year anniversary of SMOTE: Origin, Progress and Opportunities"

Abstract: The Synthetic Minority Oversampling Technique (SMOTE) algorithm has become a de facto standard in the framework of learning from imbalanced data. Since its publication in 2002, SMOTE has inspired numerous approaches to countering class imbalance and has also made its way into other learning paradigms, including multi-label classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It has also been featured in a variety of applications. In this talk, I'll present my perspective on SMOTE, its origins, and the current state of affairs, including open challenges in learning from imbalanced datasets.
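As context for the keynote, the core SMOTE idea can be sketched in a few lines: each synthetic minority example is a random interpolation between a minority-class sample and one of its k nearest minority-class neighbours. The sketch below is illustrative only (the function name and parameters are our own, not the original implementation):

```python
import numpy as np

def smote_sketch(X_min, n_synthetic, k=5, seed=None):
    """Minimal SMOTE sketch: interpolate between minority samples
    and their k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    # indices of the k nearest neighbours of each minority sample
    nn = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)              # pick a minority sample at random
        j = nn[i, rng.integers(k)]       # pick one of its k neighbours
        gap = rng.random()               # interpolation factor in [0, 1]
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic

# toy minority class with six points in 2-D
X_min = np.array([[0., 0.], [1., 0.], [0., 1.],
                  [1., 1.], [2., 0.], [0., 2.]])
X_new = smote_sketch(X_min, n_synthetic=4, k=3, seed=0)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point lies on the segment between two existing minority samples, the generated data never leaves the region spanned by the minority class, which is what distinguishes SMOTE from simple random oversampling with replacement.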

10.15: Idea in a nutshell (Posters)
- E. Krasanakis, E. Spyromitros-Xioufis, S. Papadopoulos and Y. Kompatsiaris. Tunable Plug-In Rules with Reduced Posterior Certainty Loss in Imbalanced Datasets
- N. Moniz, P. Branco and L. Torgo. Evaluation of Ensemble Methods in Imbalanced Regression Tasks
- Y. Resheff, A. Mandelbom and D. Weinshall. Controlling Imbalanced Error in Deep Learning with the Log Bilinear Loss
- C. Fayet, A. Delhay, D. Lolive and P.-F. Marteau. Unsupervised Classification of Speaker Profiles as a Point Anomaly Detection Task
- P. Ksieniewicz and M. Wozniak. Dealing with the task of imbalanced, multidimensional data classification using ensembles of exposers
10.40: Coffee Break
11.00: Paper Presentations I
- N. Guennemann and J. Pfeffer. Predicting Defective Engines using Convolutional Neural Networks on Temporal Vibration Signals
- P. Skryjomski and B. Krawczyk. Influence of minority class instance types on SMOTE imbalanced data oversampling
- P. Szymański and T. Kajdanowicz. A Network Perspective on Stratification of Multi-Label Data
- A. Pakrashi and B. Mac Namee. Stacked-MLkNN: A stacking based improvement to Multi-Label k-Nearest Neighbours
- S. Sharma, C. Bellinger, O. Zaiane and N. Japkowicz. Sampling a Longer Life: Binary versus One-class classification Revisited
12.40: Lunch
14.00: Paper Presentations II
- Z. Bing, S. vanden Broucke, B. Baesens and S. Maldonado. Improving Resampling-based Ensemble in Churn Prediction
- P. Branco, L. Torgo and R. P. Ribeiro. SMOGN: a Pre-processing Approach for Imbalanced Regression
- X. Cui, F. Coenen and D. Bollegala. Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies
15.00: Poster Session
15.40: Coffee Break
16.00: Discussion Table: What's next?
17.30: Closing Remarks


Roberto Alejo, Tecnológico de Estudios Superiores de Jocotitlán
Thomas Bäck, Leiden University
Colin Bellinger, University of Alberta
Seppe vanden Broucke, KU Leuven
Alberto Cano, Virginia Commonwealth University
Vítor Cerqueira, Universidade do Porto
Inês Dutra, Universidade do Porto
Mikel Galar, Universidad Pública de Navarra
Wojtek Kowalczyk, Leiden University
Ronaldo Prati, Universidade Federal do ABC
Rita Ribeiro, Universidade do Porto
Marina Sokolova, University of Ottawa
Isaac Velásquez, University of Nottingham
Michal Wozniak, Wroclaw University of Science and Technology


All accepted papers will be included in the workshop proceedings, published as a volume in the Proceedings of Machine Learning Research (PMLR). Additionally, depending on the success of the workshop, authors of selected papers will be invited to submit extended versions of their manuscripts to a premier journal covering the topics of this workshop.

Submit your paper!

This workshop accepts two types of submissions: full papers and short (poster) papers.
Each accepted full paper will be given a 20-minute presentation slot.
Short papers will be introduced with brief presentations and presented during a poster session.

* The maximum length for full papers is 12 pages; for short papers the limit is 10 pages. Papers exceeding these limits will be rejected.
* All submissions must be written in English and follow the PMLR format. Instructions for authors and style files may be found here.
* All submissions will be reviewed double-blind by the Program Committee. Submitted papers must therefore contain no personal information or references that could identify the authors.
* Full papers that have already been accepted or are currently under review for other workshops, conferences, or journals will not be considered.
* Submissions will be evaluated on their technical quality, relevance, significance, originality, and clarity.
* At least one author of each accepted paper must attend the workshop and present the paper.

To submit a paper, authors must use the online submission system hosted on EasyChair.


Luís Torgo | University of Porto, LIAAD - INESC TEC

Bartosz Krawczyk | Virginia Commonwealth University

Paula Branco | University of Porto, LIAAD - INESC TEC

Nuno Moniz | University of Porto, LIAAD - INESC TEC