Online Program

All Times ET

Program is Subject to Change

Monday, June 14
Mon, Jun 14, 10:30 AM - 12:00 PM
TBD
Topics in Classification and Frame Development

Application of Bidirectional Encoder Representations from Transformers (BERT) for the classification of occupations for establishment surveys (308127)

Ng Yunling Elyn, Ministry of Manpower 
*Ng Bin Shen Lucas, Ministry of Manpower 
Tze Wei Sim, Ministry of Manpower 

Keywords: BERT, Natural Language Processing, Python, Fine-tune, Deep Learning

As the national statistical agency responsible for manpower-related statistics, we consider it paramount that each respondent's occupation is classified correctly when it is collected through the surveys we conduct. Correct classification allows us to accurately identify occupations that are in demand, as well as occupations that are declining in employment, for targeted Government intervention.

In Singapore, the Singapore Standard Occupational Classification (SSOC) adopts the basic framework and principles of the International Standard Classification of Occupations (ISCO) developed by the International Labour Office (ILO). The latest version comprises a total of 1,202 occupations.

Due to the large number of occupations, respondents are relieved of the complexity of classifying their own survey responses. Instead, trained interviewers use the job titles and job descriptions provided to assign the respective occupation labels, a rather labour-intensive and subjective process.

With the advent of transfer learning, it has become possible to fine-tune our own model on historical data so that it recognises the relationship between the text provided and the occupation labels. BERT achieves an F1 score higher than that of all our previous models, which included both linear and non-linear models.
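For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score can be computed for a multi-class labelling task such as occupation coding. The abstract does not say which averaging scheme was used, so the macro average here is an assumption, and the function name `macro_f1` and the labels `code_a`, `code_b`, `code_c` are hypothetical placeholders rather than real SSOC codes.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every label seen in y_true or y_pred.

    Assumption: macro averaging; the abstract does not state the scheme.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0

    scores = []
    for label in labels:
        # Per-label F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels standing in for occupation codes.
y_true = ["code_a", "code_a", "code_b", "code_c"]
y_pred = ["code_a", "code_b", "code_b", "code_c"]
score = macro_f1(y_true, y_pred)
```

With more than a thousand occupation labels, a macro average weights rare occupations as heavily as common ones, which is one reason to report the averaging scheme alongside any F1 figure.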

The presentation details the pipeline for fine-tuning BERT for text classification, including a step-by-step guide to source code modifications in Python. It also covers the creation of intermediate functions that create the files the model requires and process the files it produces. By the end of the presentation, participants should be able to build high-performance Natural Language Processing (NLP) models using their own organizational data.
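As an illustration of what such intermediate functions might look like, the sketch below writes labelled survey text to a tab-separated input file and turns a file of per-class probabilities back into occupation labels. Everything specific here is an assumption and not something the abstract specifies: the file names `train.tsv` and `test_results.tsv`, the two-column `label`/`text` layout, the one-row-per-example probability format, the helper names `write_examples` and `read_predictions`, and the placeholder labels.

```python
import csv
import os
import tempfile


def write_examples(path, rows):
    """Write (label, job_title, job_description) rows to a TSV file.

    Assumed layout: a header row, then one 'label<TAB>text' row per response.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["label", "text"])
        for label, title, description in rows:
            # Join title and description; collapse tabs, newlines and
            # repeated spaces so each example stays on one line.
            text = " ".join(f"{title}. {description}".split())
            writer.writerow([label, text])


def read_predictions(path, labels):
    """Map each row of class probabilities to (best_label, probability).

    Assumed format: one row per example, one tab-separated probability
    per label, in the same order as `labels`.
    """
    predictions = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            probs = [float(p) for p in row]
            best = max(range(len(probs)), key=probs.__getitem__)
            predictions.append((labels[best], probs[best]))
    return predictions


# Demo with made-up data; the probability file stands in for model output.
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.tsv")
    write_examples(
        train_path,
        [("code_a", "Software  engineer", "Builds\tweb services")],
    )
    with open(train_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

    pred_path = os.path.join(tmp, "test_results.tsv")
    with open(pred_path, "w", encoding="utf-8") as f:
        f.write("0.1\t0.9\n0.7\t0.3\n")
    predictions = read_predictions(pred_path, ["code_a", "code_b"])
```

Returning the winning probability alongside the label makes it possible to route low-confidence predictions to a trained interviewer for manual review, should an organization choose to keep a human in the loop.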