CE_03C Sat, 8/6/2022, 8:30 AM - 5:00 PM CC-146C
Text Analysis for Statisticians Who Want to Become Data Scientists — Professional Development Continuing Education Course
This course will provide a broad overview of text analysis and natural language processing (NLP), covering a significant amount of introductory material and extending to state-of-the-art methods. All aspects of the text analysis pipeline will be covered, including data preprocessing, converting text to numeric representations (from simple aggregation methods to more complex embeddings), and training supervised and unsupervised learning methods for standard text-based tasks such as named entity recognition (NER), sentiment analysis, and topic modeling. The course will alternate between presentation and hands-on exercises in Python. Most examples will also be translated into R for students more comfortable in that language, and support will be given for both Mac and Windows users. Attendees should be familiar with R, Python, or both, and have a basic understanding of statistics and/or machine learning. Attendees will gain the practical skills necessary to begin using text analysis tools for their own tasks, as well as an understanding of the strengths and weaknesses of these tools.
Instructor(s): Karl Pazdernik, Pacific Northwest National Laboratory; Robin Cosbey, Pacific Northwest National Laboratory
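As a rough illustration of the first two pipeline stages named in the abstract (preprocessing, then converting text to a numeric representation by simple aggregation), the following minimal sketch builds a bag-of-words count matrix using only the Python standard library. It is not taken from the course materials: the toy corpus and the helper names `tokenize` and `bag_of_words` are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "The course was great, really great!",
    "The room was cold.",
]

def tokenize(text):
    """Preprocess: lowercase the text and keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

tokens = [tokenize(d) for d in docs]

# Vocabulary: every distinct token in the corpus, in sorted order.
vocab = sorted({tok for doc in tokens for tok in doc})

def bag_of_words(doc_tokens, vocab):
    """Represent one document as a vector of term counts over the vocabulary."""
    counts = Counter(doc_tokens)
    return [counts[term] for term in vocab]

# One row per document, one column per vocabulary term.
matrix = [bag_of_words(doc, vocab) for doc in tokens]

for row in matrix:
    print(row)
```

Each row of `matrix` could then be passed to a supervised or unsupervised learner. Count vectors like these discard word order, which is one reason the abstract also mentions more complex embeddings.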