Conference Program

All Times ET

Thursday, June 9
Machine Learning
Advancements in Machine Learning
Thu, Jun 9, 1:15 PM - 2:45 PM
Cambria
Combining Capsule Networks and Self-Supervised Learning for Image Classification with Occlusion (310068)

Christian Pigorsch, Friedrich-Schiller-University Jena, Germany 
*Ladyna Wittscher, Friedrich-Schiller-University Jena, Germany 

Keywords: Capsule Networks, Cutout, Occlusion, Robustness, Neural Networks, Self-Supervised Learning

The combination of the innovative Capsule Network architecture and self-supervised training yields superior robustness when classifying images affected by occlusion. Occlusion is a relevant problem in numerous computer vision applications such as autonomous driving and face detection. Capsule Networks were originally developed to overcome some of the shortcomings of Convolutional Neural Networks, such as their inability to recognize pose. They deal well with small and occluded datasets but are sensitive to overfitting and background noise. To increase their robustness, we combine them with a novel training approach. Self-supervision makes use of the same dataset twice: the model's weights are pretrained on a simpler pretext task with self-generated pseudo-labels, which improves the actual downstream training on the dataset's labels. Self-supervised learning both improves visual representation learning under missing information and copes well with overfitting and occlusion. These synergies enable the combined model to extract more knowledge from the same amount of samples. As Capsule Networks are sensitive to spatial orientation, we use rotation as the pretext task to further enhance the spatial knowledge of the network. Our combined approach improves classification accuracy on the Fashion-MNIST dataset by 9.47% with up to 80% occlusion compared to non-pretrained Capsule Networks. Improvements are generally more significant the more difficult the training conditions are, but self-supervision increases performance in every occlusion scenario we studied. If the network is trained on original data but tested on occluded images, test accuracy is improved by 5.45%; if the center of the image, which carries high information content, is cut out, the improvement is 7.47%. We demonstrate that the combination is very promising, as it deals better with occlusion and extracts more knowledge from the same amount of available information.
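The two data-side ingredients of the abstract, a rotation pretext task that generates its own pseudo-labels and a cutout-style occlusion of the image center, can be sketched as below. This is a minimal pure-Python illustration under our own assumptions: the function names, the use of exact quarter-turn rotations, and the zero-fill cutout patch are illustrative choices, not details confirmed by the paper.

```python
def rot90(img):
    """Rotate a list-of-lists image by one quarter-turn."""
    return [list(row) for row in zip(*img[::-1])]

def make_rotation_pretext(images):
    """Self-supervised pretext task: each image is rotated by 0-3
    quarter-turns, and the rotation index serves as the self-generated
    pseudo-label used for pretraining."""
    samples = []
    for img in images:
        rotated = img
        for k in range(4):
            samples.append((rotated, k))  # (image, pseudo-label)
            rotated = rot90(rotated)
    return samples

def center_cutout(img, size):
    """Zero out a centered square patch, simulating occlusion of the
    high-information center of the image (assumed fill value: 0)."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    out = [row[:] for row in img]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = 0
    return out
```

In a full pipeline, `make_rotation_pretext` would supply the pretraining batches, after which the network is fine-tuned on the dataset's real class labels, with `center_cutout` applied to probe robustness at test time.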