Activity Number: 436
Type: Contributed
Date/Time: Wednesday, August 5, 2009 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #303296
Title: Statistical Learning of Word Acquisition with Application to Readability Prediction
Author(s): Paul B. Kidwell*+ and Guy Lebanon and Kevyn Collins-Thompson
Companies: Purdue University and Georgia Institute of Technology and Microsoft Research
Address: 216 Dehart St, West Lafayette, IN, 47906
Keywords: logistic regression ; readability ; Rasch model ; acquisition age
Abstract:
Language learning, as expressed through word acquisition and readability, plays an important role in both psycholinguistic theories and information system engineering. We present a statistical model for document readability that is based on the logistic Rasch model and the quantiles of word acquisition age distributions. We exploit this connection to infer the distributions of acquisition ages from empirical readability data that was automatically collected from the web. Contrasting the inferred acquisition distributions with existing oral studies reveals interesting historical trends as well as differences between the oral and written word acquisition processes. We also demonstrate how the inferred acquisition distributions can be used to predict global and local document readability.
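The link the abstract draws between a logistic (Rasch-style) model and quantiles of word acquisition age can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each word's acquisition age follows a logistic distribution with a location `mu` and a scale `scale`, and that a document is readable at the smallest age by which a fraction `r` of its words have been acquired by a fraction `q` of the population. The function names, the thresholds `q` and `r`, and the values in `PARAMS` are hypothetical and chosen only for illustration.

```python
import math


def acquisition_cdf(age, mu, scale):
    """Assumed logistic curve: fraction of the population that knows a word by `age`."""
    return 1.0 / (1.0 + math.exp(-(age - mu) / scale))


def acquisition_quantile(q, mu, scale):
    """Age by which a fraction q of the population has acquired the word.

    This is the inverse of acquisition_cdf with respect to age.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return mu + scale * math.log(q / (1.0 - q))


def predict_readability(words, params, q=0.8, r=0.9):
    """Smallest age at which at least a fraction r of the document's words
    (those with known parameters) have been acquired by a fraction q of the
    population. This decision rule is an assumption of the sketch.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    # Per-word age at which a fraction q of the population knows the word.
    ages = sorted(acquisition_quantile(q, *params[w]) for w in words if w in params)
    if not ages:
        raise ValueError("no words with known acquisition parameters")
    # Index of the empirical r-quantile over the document's words.
    index = math.ceil(r * len(ages)) - 1
    return ages[index]


# Hypothetical (mu, scale) pairs in years; not taken from the paper.
PARAMS = {"cat": (4.0, 1.0), "house": (5.0, 1.0), "theorem": (14.0, 2.0)}
```

Under these assumptions, `predict_readability(["cat", "house", "theorem"], PARAMS)` returns the age at which the hardest word needed to reach the `r` threshold is known by a fraction `q` of readers. Inferring `mu` and `scale` from documents with known readability levels, as the abstract describes, would run this relationship in the opposite direction and is not shown here.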