LIDTA 2018. Learning with Imbalanced Domains: Theory and Applications

Many real-world data-mining applications involve obtaining and evaluating predictive models using data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values are associated with events that are highly relevant for end users. This problem has been thoroughly studied in the last decade, with a specific focus on classification tasks. However, the research community has started to address it in other contexts such as regression, ordinal classification, multi-label classification, multi-instance learning, data streams and time series forecasting. It is now recognized that imbalanced domains are a broad and important problem, posing relevant challenges for both supervised and unsupervised learning tasks in an increasing number of real-world applications.

Tackling the issues raised by imbalanced domains is crucial to both academia and industry. For researchers, it is an opportunity to develop more adaptable and robust systems and approaches for very complex tasks. For industry, these are tasks that many organizations already face today. Examples include the ability to prevent fraud, to anticipate catastrophes, and in general to enable more preemptive actions.

This workshop+tutorial aims to contribute to the problem of learning with imbalanced domains, and to increase interest in and contributions to solving some of its challenges. The tutorial component targets researchers and professionals who have recently become interested in the subject, as well as those with previous knowledge and experience of this problem. The workshop component invites interdisciplinary contributions that tackle the problems many real-world domains face today. Given the growing attention this problem has been receiving, it is important to promote its further development in order to address both its theoretical and its application challenges.

The research topics of interest to the LIDTA 2018 workshop include (but are not limited to) the following:

Foundations of learning in imbalanced domains
Probabilistic and statistical models
New knowledge discovery theories and models
Understanding the nature of learning difficulties embedded in imbalanced data
Deep learning with imbalanced data
Handling imbalanced big data
One-class learning
Learning with non i.i.d. data
New approaches for data pre-processing (e.g. resampling strategies)
Post-processing approaches
Sampling approaches
Feature selection and feature transformation
Evaluation in imbalanced domains

Knowledge discovery and machine learning in imbalanced domains
Classification, ordinal classification
Regression
Data streams and time series forecasting
Clustering
Adaptive learning and algorithm-level approaches
Multi-label, multi-instance, sequence and association rules mining
Active learning
Spatial and spatio-temporal learning

Applications in imbalanced domains
Fraud detection (e.g. finance, credit and online banking)
Anomaly detection (e.g. industry, intrusion detection)
Health applications
Environmental applications (e.g. meteorology, biology)
Social media applications (e.g. popularity prediction, recommender systems)
Real-world applications (e.g. oil spill detection)
Case studies

THE TUTORIAL (slides)

09h00: Welcome
09h10: Imbalanced Domain Learning - Fundamentals (Luís Torgo)
09h50: Strategies for Imbalanced Learning (Luís Torgo)
10h40: Coffee Break
11h00: Imbalanced Regression (Paula Branco)
11h50: Evaluation and Pitfalls - Case Studies (Paula Branco)
12h30: Imbalanced Time Series and Challenges (Nuno Moniz)
13h00: Lunch

THE WORKSHOP

14h00: Keynote: Professor João Gama (INESC TEC/University of Porto). "Novelty Detection: Beyond one-class Classification". (slides)
14h40: Papers Presentation I
- Jessa Bekker and Jesse Davis. Learning from Positive and Unlabeled Data under the Selected At Random Assumption (slides)
- Martha Roseberry and Alberto Cano. Multi-label kNN Classifier with Self Adjusting Memory for Drifting Data Streams (slides)
- Jordan Frery, Amaury Habrard, Marc Sebban and Liyun He-Guelton. Non-Linear Gradient Boosting for Class-Imbalance Learning (slides)
15h40: Coffee Break
16h00: Papers Presentation II
- Alexander Hepburn, Ryan McConville, Raul Santos-Rodriguez, Jesus Cid-Sueiro and Dario Garcia-Garcia. Proper Losses for Learning with Example-Dependent Costs (slides)
- Paula Branco, Luis Torgo and Rita P. Ribeiro. REBAGG: REsampled BAGGing for Imbalanced Regression (slides)
- Paweł Ksieniewicz. Undersampled Majority Class Ensemble for highly imbalanced binary classification (slides)
- Dariusz Brzezinski, Mateusz Lango and Jerzy Stefanowski. ImWeights: Classifying Imbalance Data Using Local and Neighborhood Information (slides)
- Andre G. Maletzke, Denis Dos Reis, Everton Cherman and Gustavo Batista. On the Need of Class Ratio Insensitive Drift Tests for Data Streams (slides)
17h40: Final Remarks / Farewell

PROGRAM COMMITTEE

Roberto Alejo, Tecnológico Nacional de México/Instituto Tecnológico de Toluca
Gustavo Batista, Universidade de São Paulo
Colin Bellinger, University of Alberta
Seppe Vanden Broucke, Katholieke Universiteit Leuven
Alberto Cano, Virginia Commonwealth University
Inês Dutra, DCC - Faculty of Sciences, University of Porto
Tom Fawcett, Apple
Mikel Galar, Universidad Pública de Navarra
Salvador García, Granada University
Francisco Herrera, Granada University
Jose Hernandez-Orallo, Universitat Politecnica de Valencia
Ronaldo Prati, Universidade Federal do ABC
Rita Ribeiro, DCC - Faculty of Sciences, University of Porto
José Antonio Saez, University of Salamanca
Shengli Victor Sheng, University of Central Arkansas
Marina Sokolova, University of Ottawa
Jerzy Stefanowski, Poznan University of Technology
Isaac Triguero Velázquez, University of Nottingham
Anibal R. Figueiras-Vidal, Universidad Carlos III de Madrid
Shuo Wang, University of Birmingham
Michal Wozniak, Wroclaw University of Science and Technology

Proceedings
All accepted papers will be included in the workshop proceedings, published as a volume in Proceedings of Machine Learning Research (PMLR). Additionally, based on the success of the workshop, authors of selected papers will be invited to submit extended versions of their manuscripts to a premier journal covering the topics of this workshop.

Submit your paper!

For each accepted paper, a presentation slot of 20 minutes is provided.

* The maximum length for papers is 14 pages. Papers exceeding this limit will be rejected.
* All submissions must be written in English and follow the PMLR format. Instructions for authors and style files may be found here.
* All submissions will be reviewed by the Program Committee using a double-blind method. The submitted paper must therefore not contain any personal information about the authors or references that identify them.
* Papers that have already been accepted or are currently under review for other workshops, conferences, or journals will not be considered.
* Submissions will be evaluated on their technical quality, relevance, significance, originality and clarity.
* At least one author of each accepted paper must attend the workshop and present the paper.

To submit a paper, authors must use the online submission system hosted on EasyChair.

LIDTA 2018 @ ECML/PKDD 2018
