Abstract:
|
One of the most challenging problems in machine learning is the analysis of imbalanced data. In this type of problem, correctly predicting the minority class is usually more critical than correctly predicting the majority class. However, traditional machine learning techniques are prone to a learning bias toward the majority class. Existing remedies have their own drawbacks: some ensemble methods cause over-fitting, discard part of the available information, or require long computation times. Moreover, these methods do not apply to all kinds of datasets. To address these problems, this study proposes a virtual-label approach for the majority class. It formulates a new multiclass classification approach that uses an equal K-means clustering method. The proposed method is compared with methods commonly used for imbalanced problems, including sampling methods (oversampling, undersampling, and SMOTE) and classifier-based methods (SVM and One-Class SVM). The results show that the advantage of the proposed method grows as the degree of data imbalance increases, and that it gradually outperforms the other methods.
|
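The abstract does not spell out how the virtual labels are built, so the following is only a minimal sketch under stated assumptions. It assumes that the majority class is split by an equal-size K-means into `k` subclusters. It also assumes that `k` is chosen so that each subcluster is roughly as large as the minority class, and that each subcluster then becomes its own virtual class. The balanced clustering step is a stand-in heuristic: Lloyd iterations with a greedy capacity-constrained assignment. It is not necessarily the study's algorithm. The function names `equal_size_kmeans`, `virtual_labels`, and `collapse` are hypothetical.

```python
import numpy as np


def equal_size_kmeans(X, k, n_iter=20, seed=0):
    """Split X into k clusters whose sizes differ by at most one.

    Assumed heuristic: Lloyd-style updates with a greedy assignment that
    respects per-cluster capacities (not necessarily the paper's method).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Capacities: n // k per cluster, plus one extra for the first n % k.
    caps = np.full(k, n // k)
    caps[: n % k] += 1
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.full(n, -1)
        remaining = caps.copy()
        # Visit (point, cluster) pairs from closest to farthest and assign
        # each point to the nearest cluster that still has room.
        for flat in np.argsort(d, axis=None, kind="stable"):
            i, j = divmod(int(flat), k)
            if new_labels[i] == -1 and remaining[j] > 0:
                new_labels[i] = j
                remaining[j] -= 1
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels


def virtual_labels(X, y, seed=0):
    """Hypothetical relabeling: y is binary with 1 = minority.

    Majority samples get virtual labels 0..k-1 (one per subcluster);
    minority samples get label k.
    """
    maj = y == 0
    n_min, n_maj = int((~maj).sum()), int(maj.sum())
    k = max(1, round(n_maj / n_min))  # subclusters roughly minority-sized
    y_virtual = np.full(len(y), k, dtype=int)
    y_virtual[maj] = equal_size_kmeans(X[maj], k, seed=seed)
    return y_virtual, k


def collapse(pred, k):
    """Map multiclass predictions back to binary (1 = minority)."""
    return (np.asarray(pred) == k).astype(int)


# Toy example: 100 majority and 20 minority points (synthetic data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 100 + [1] * 20)
y_v, k = virtual_labels(X, y)
print(k, np.bincount(y_v))  # 5 [20 20 20 20 20 20]
```

In this sketch, any multiclass classifier can be trained on `y_v` in place of `y`, and its predictions mapped back with `collapse`. The design choice is that every training class now has about the same number of samples, so the classifier no longer sees a single dominant class.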