
All Times EDT

Abstract Details

Activity Number: 215 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #313867
Title: Weakly Supervised Chinese Meta-Pattern Discovery and NER via TopWORDS 2
Author(s): Jiaze Xu* and Ke Deng
Companies: Tsinghua University and Tsinghua University
Keywords: Chinese text mining; EM algorithm; electronic health record

Text mining has attracted much attention with the rapid development of digitization and OCR technologies, which has created a large demand for text analysis tools. Domain-specific Chinese texts, such as ancient Chinese prose and electronic health records, have various structures and styles. Their syntactic structures and word usage frequencies differ from those of the modern official Chinese used in newspapers. In practical applications, pre-specified vocabularies and relevant corpora are sometimes not available, so semi-supervised methods with statistical modelling are preferred. I will introduce TopWORDS 2, a weakly supervised method for Chinese meta-pattern discovery and named entity recognition. In my research, I found that TopWORDS 2 could effectively extract valuable information and facilitate further text analysis.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program