Abstract:
|
Unstructured text data are increasingly available from sources such as conference abstracts, scientific publications, surveys, written notes, e-mails, blogs, and social media. In this research, we use text mining techniques and tools to first transform unstructured texts into a structured database. A document-term matrix is then extracted from the structured data, and descriptive statistics are generated for further exploration and analysis. Discrete optimization and simulated annealing are applied to maximize an objective function based on the overall similarity of abstracts within a session. These methods are illustrated using abstracts from a recent conference sponsored by the American Statistical Association. Statistical programming is conducted in R.
|