Abstract:
|
With the wide adoption of electronic health records and the popularity of social media and digital devices, rich information on the population at large is being accumulated on a daily basis. In many cases this information takes the form of free text, including physician notes, online postings, and device readings, which could greatly augment structured data to facilitate evidence generation and evidence-based decision-making. At the same time, unstructured data poses challenges for analysis, as classical statistical models were mostly developed for structured data. In this talk, we will present the end-to-end process required for a natural language processing research project. This includes pre-processing (e.g., tokenization, filtering, stemming, and lemmatization), followed by two different frameworks. The first transforms free text into structured data through feature extraction, so that it can be fed into an appropriate machine learning algorithm for model derivation. The second leverages recent developments in deep learning models, which have feature extraction built in. We will also discuss practical considerations for applying both frameworks, along with tools and case studies.
|
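As a rough illustration of the first framework described above, the following minimal sketch tokenizes free text, filters stop words, applies a stemming step, and builds a bag-of-words count matrix that a conventional machine learning algorithm could consume. Everything in it is an assumption made for illustration and is not taken from the talk: the stop-word list, the `crude_stem` suffix stripper (a toy stand-in for a real stemmer such as Porter's), the function names, and the two example notes are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "was", "with", "for", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_tokens(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer, not the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(docs):
    """Return (vocabulary, count matrix) with one row per document."""
    processed = [[crude_stem(t) for t in filter_tokens(tokenize(d))] for d in docs]
    vocab = sorted({t for doc in processed for t in doc})
    rows = []
    for doc in processed:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


# Made-up example notes, not real clinical data.
notes = [
    "Patient reports coughing and fever.",
    "Fever resolved; patient was discharged.",
]
vocab, matrix = bag_of_words(notes)
print(vocab)
print(matrix)
```

Each row of `matrix` is a structured feature vector for one note. In practice, an established library would likely replace the hand-written tokenizer and stemmer, and lemmatization would need a dictionary-backed tool rather than suffix stripping.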