Research
Research Areas:
Machine Learning
Anomaly/Outlier Detection
Unsupervised Learning
Data Mining
Deep Learning
Experience in Tools:
Languages: (1) Python, (2) C, and (3) Java
Other: (1) Scikit-Learn, (2) Pandas, (3) NumPy, (4) Keras, (5) TensorFlow, (6) Tweepy, (7) Matplotlib, (8) VADER, (9) Orange, (10) Anaconda, (11) Jupyter Notebook, (12) Google Colab, (13) Overleaf LaTeX software
*Last updated on Feb 2023
Current Research:
On finding the "best" HDBSCAN* Hierarchy for Unsupervised Outlier Detection (MSc Research in 2 lines)
Kushankur Ghosh, Murilo Coelho Naldi, and Jörg Sander
We address the problem of choosing a single, "best" density parameter for HDBSCAN* outlier detection. The primary objective of this research is to formulate a general strategy for estimating the optimal density parameter (I) without access to ground truth and (II) without any prior knowledge of HDBSCAN*'s performance on similar datasets. (In Progress)
Major Projects:
The cumulative impact of Concept Complexity and Imbalance in Deep Networks
Kushankur Ghosh and Nathalie Japkowicz
Have all of the issues that previously affected machine learning systems been solved by deep learning, or do some issues remain for which deep learning is not a bulletproof solution? This question, asked in the context of class imbalance, motivates this work. Our goal is to investigate whether the tight dependency between class imbalance, concept complexity, dataset size, and classifier performance, known to exist in traditional learning systems, is alleviated in deep learning approaches, and to what extent, if any, network depth and regularization can help.
Single and Multi-labeled Data-Imbalance Problem
Kushankur Ghosh, Arghasree Banerjee, and Sankhadeep Chatterjee
Data-distribution problems hinder the performance of both traditional and deep learning models. The Imbalance problem is one of the most commonly encountered challenges in real-life applications, and it results in biased classification. We mostly formulate data-driven techniques to mitigate its detrimental effects. The fundamental remedies are Oversampling and Undersampling of existing examples. However, we often prefer a third kind of technique that hybridizes Oversampling and Undersampling. We have noted Data-Imbalance as a threat to both single-labeled and multi-labeled classification problems.
Single-labeled Classification: This is a scenario where an instance exhibits the features of a single class. A single-labeled classification problem has at least two class labels, and each example belongs to exactly one class.
Multi-Labeled Classification: A major challenge occurs when an example can exhibit the features of multiple class labels. In this context, the minimum number of labels is the same as in a single-labeled classification problem, but an instance can belong to two or more classes at once.
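As a small illustration of the two settings, the sketch below encodes single-labeled and multi-labeled targets as 0/1 indicator vectors and counts examples per class. The class names, the toy examples, and the helper names `to_indicator` and `class_counts` are hypothetical and chosen only for illustration; this is not taken from any of the published frameworks.

```python
# Hypothetical toy data: class names and examples are illustrative only.
CLASSES = ["joy", "anger", "sadness"]

# Single-labeled: every example carries exactly one class.
single_labels = ["joy", "anger", "joy", "sadness"]

# Multi-labeled: every example carries a set of classes.
multi_labels = [
    {"joy", "anger"},
    {"joy", "sadness"},
    {"joy", "anger"},
    {"joy", "anger", "sadness"},
]


def to_indicator(label_sets, classes):
    """Encode each label set as a 0/1 vector with one position per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


def class_counts(label_sets, classes):
    """Count how many examples carry each class."""
    return {c: sum(1 for labels in label_sets if c in labels) for c in classes}


# A single label is just a label set of size one.
single_matrix = to_indicator([{label} for label in single_labels], CLASSES)
multi_matrix = to_indicator(multi_labels, CLASSES)

print(single_matrix)                        # each row sums to exactly 1
print(multi_matrix)                         # rows may contain several 1s
print(class_counts(multi_labels, CLASSES))  # per-class frequencies can be imbalanced
```

In the multi-labeled case the per-class counts do not sum to the number of examples, which is one reason imbalance is harder to measure and to correct there.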
Ghosh has contributed to tackling the problem in both single-labeled and multi-labeled classification. His major contributions to the field are mostly based on effective data augmentation. Ghosh has mostly focussed his research on developing effective methodologies for imbalanced image classification, sentiment analysis, and social media mining.
Some of his major contributions are:
Addressing and mitigating the effects of biased classification in Sentiment Analysis.
Exploring the impact of data bias in Multi-Labeled Sentiment Analysis.
Understanding the correlation between the Imbalance problem and frequent data irregularities through the development of fused frameworks.
Ghosh has studied the drawbacks of traditional data augmentation and is working towards developing novel frameworks that address them. In addition, his research also encompasses the modification of traditional models to handle imbalanced learning. His current research involves deciphering the impact of Adversarial models in the context of imbalanced learning.
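The fundamental resampling ideas named in this project description (Oversampling, Undersampling, and a hybrid of the two) can be sketched with plain random resampling. This is a minimal baseline sketch, not one of the frameworks described here: the function name `resample_to`, the `target` parameter, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def resample_to(examples, labels, target, seed=0):
    """Randomly resample every class to exactly `target` examples.

    Classes larger than `target` are undersampled without replacement;
    smaller classes keep all their examples and gain random duplicates.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, items in by_class.items():
        if len(items) >= target:
            chosen = rng.sample(items, target)
        else:
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical imbalanced toy set: 8 majority and 2 minority examples.
X = list(range(10))
y = ["neg"] * 8 + ["pos"] * 2
counts = Counter(y)

over_x, over_y = resample_to(X, y, max(counts.values()))    # oversampling
under_x, under_y = resample_to(X, y, min(counts.values()))  # undersampling
hybrid_x, hybrid_y = resample_to(X, y, 5)                   # hybrid of both

print(Counter(over_y), Counter(under_y), Counter(hybrid_y))
```

Random duplication adds no new information and random removal discards some, which is the kind of drawback of traditional resampling that motivates more careful augmentation.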
Class Overlapping Problem
Kushankur Ghosh, Arghasree Banerjee, and Sankhadeep Chatterjee
Overlapping of data instances is often seen in real-life classification tasks, when instances belonging to different classes share a common region of the data space. This happens when examples from two or more classes have similar feature values. The presence of overlapping examples can severely degrade the performance of a classifier. Ghosh has worked on the problem and has developed novel frameworks to identify overlapped examples in datasets. His current research explores the effects of the overlapping problem in the context of imbalanced learning.
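One common, generic way to flag candidate overlapped examples is to check how many of an example's nearest neighbours belong to a different class. The sketch below shows that heuristic only; it is not the framework developed in this project, and the function name `overlapped_indices`, the parameters `k` and `threshold`, and the toy points are assumptions made for illustration.

```python
import math


def overlapped_indices(points, labels, k=3, threshold=0.5):
    """Flag examples whose k nearest neighbours are mostly from other classes.

    An example is flagged when the fraction of its k nearest neighbours
    (Euclidean distance) with a different label is at least `threshold`.
    """
    flagged = []
    for i, p in enumerate(points):
        distances = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in distances[:k]]
        foreign = sum(1 for j in neighbours if labels[j] != labels[i])
        if foreign / k >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical toy data: two well-separated groups, plus one example of
# each class placed inside the other class's region (indices 8 and 9).
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # class "a"
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),  # class "b"
    (0.5, 0.5),                                      # class "b" inside "a"
    (5.5, 5.5),                                      # class "a" inside "b"
]
labels = ["a"] * 4 + ["b"] * 4 + ["b", "a"]

flagged = overlapped_indices(points, labels, k=3, threshold=0.5)
print(flagged)  # indices of examples sitting in another class's region
```

The choice of `k` and `threshold` controls how aggressively examples are flagged, and in an imbalanced dataset minority examples are flagged more easily simply because they have fewer same-class neighbours.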