Abstract:
In this talk, the speaker will give an overview of methods for predicting the ultimate insurance loss of a claim from the large number of words found in its textual description. Initial insurance losses are often reported together with such a description, so the words must first be transformed into numeric vectors; the proposed method does this with word embedding matrices and cosine similarities between words. When all unique words found in the training dataset are considered and a generalized additive model is imposed on the resulting explanatory variables, the design matrix becomes high-dimensional. For this reason, statistical learning approaches such as the group lasso, which can remove an entire group of coefficients at once, are used to reduce the number of coefficients in the model. Details of implementing the estimation routine with the Rcpp package will be explained during the talk.
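The abstract does not spell out the feature construction or the estimation routine, so the following is only a minimal Python sketch under stated assumptions, not the speaker's Rcpp implementation. It assumes that each training word becomes one explanatory variable, equal to the largest cosine similarity between that word's embedding and any word in the claim; that each variable is expanded into a small truncated-linear spline basis as a stand-in for a GAM smooth; and that the group lasso is fitted by proximal gradient descent, so all basis coefficients of one word are kept or dropped together. The function names, knots, penalty scaling, and toy data are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def cosine_features(claim_tokens, vocab, emb):
    """Hypothetical feature map: feature j is the largest cosine similarity
    between vocab[j] and any claim token that has an embedding in `emb`.
    Returns 0.0 for every feature when no token has an embedding."""
    toks = [normalize(emb[t]) for t in claim_tokens if t in emb]
    feats = []
    for w in vocab:
        u = normalize(emb[w])
        feats.append(max((sum(a * b for a, b in zip(u, t)) for t in toks), default=0.0))
    return feats


def spline_basis(x, knots=(-0.5, 0.0, 0.5)):
    """Truncated-linear spline basis for one value x: [x, (x-k1)+, (x-k2)+, ...]."""
    return [x] + [max(x - k, 0.0) for k in knots]


def gam_design(X, knots=(-0.5, 0.0, 0.5)):
    """Expand every column of X (a list of rows) into its own spline basis.
    Returns the design rows and the group id (source column) of each design column."""
    p = len(X[0])
    Z = [sum((spline_basis(row[j], knots) for j in range(p)), []) for row in X]
    groups = [j for j in range(p) for _ in range(len(knots) + 1)]
    return Z, groups


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks the whole block, or zeroes it."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm <= t:
        return [0.0] * len(v)
    return [(1.0 - t / norm) * a for a in v]


def group_lasso(Z, y, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) for
    (1/2n) * ||y - Z b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2.
    The step n / ||Z||_F^2 is a conservative 1 / Lipschitz bound."""
    n, p = len(Z), len(Z[0])
    step = n / sum(a * a for row in Z for a in row)
    members = {}
    for k, g in enumerate(groups):
        members.setdefault(g, []).append(k)
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(zk * bk for zk, bk in zip(row, b)) - yi for row, yi in zip(Z, y)]
        grad = [sum(Z[i][k] * resid[i] for i in range(n)) / n for k in range(p)]
        z = [bk - step * gk for bk, gk in zip(b, grad)]
        for idx in members.values():
            shrunk = group_soft_threshold([z[k] for k in idx], step * lam * math.sqrt(len(idx)))
            for k, v in zip(idx, shrunk):
                b[k] = v
    return b


if __name__ == "__main__":
    random.seed(0)
    # Synthetic toy data only: random embeddings and random "claims".
    words = ["water", "pipe", "fire", "smoke", "theft", "glass", "roof", "storm"]
    emb = {w: [random.gauss(0, 1) for _ in range(8)] for w in words}
    vocab = words[:5]  # hypothetical training vocabulary
    claims = [random.sample(words, 3) for _ in range(100)]
    X = [cosine_features(c, vocab, emb) for c in claims]
    Z, groups = gam_design(X)
    y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]
    # Center y and the columns of Z so that no intercept is needed.
    means = [sum(col) / len(Z) for col in zip(*Z)]
    Z = [[a - m for a, m in zip(row, means)] for row in Z]
    ybar = sum(y) / len(y)
    y = [yi - ybar for yi in y]
    b = group_lasso(Z, y, groups, lam=0.05)
    for j, w in enumerate(vocab):
        print(w, math.sqrt(sum(b[k] ** 2 for k in range(len(b)) if groups[k] == j)))
```

Because the penalty acts on the Euclidean norm of each word's block of spline coefficients, increasing `lam` removes whole words from the additive model rather than individual coefficients; the `sqrt(p_g)` weight is a common scaling by group size.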