Abstract:
|
To maximise the utility of a machine learning approach to text classification, it should be possible to apply the learner to a corpus different from the one used in training, i.e., to Document-Term Matrices (DTMs) whose terms differ from those of the training matrix. This is particularly true in fields that use differing conventions or controlled vocabularies, such as the clinical ones. Indeed, the validity of a systematic review of Clinical Trials (CTs) depends on the ability to capture the body of evidence through searches of many data sources. However, applying consistent search strategies across literature and registry databases can be difficult. Our aim is to present one way of addressing this problem. Relevant articles tagged with the CT Publication Type on PubMed that appear in journals allocated to the Nursing area formed the training DTM. The validated classifier was then assessed on a prediction task by applying it to a DTM built from all CT records registered in the International CT Registry Platform. The adopted approach is able to handle DTMs that differ between training and application, and it allowed us to identify a set of protocols likely pertaining to nursing research.
|
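The abstract does not state how the training and application DTMs are reconciled. As a minimal sketch of one common option, and not necessarily the method used in the study, the new corpus can be projected onto the training vocabulary: terms unseen in training are dropped, and training terms missing from a new document receive a count of zero. The tokenizer, function names, and toy documents below are hypothetical and serve only as illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Return the sorted list of distinct terms seen in the training corpus."""
    return sorted({term for doc in docs for term in tokenize(doc)})


def to_dtm(docs, vocabulary):
    """Build a dense DTM whose columns follow `vocabulary` exactly.

    Terms absent from a document get a count of 0; terms absent from
    `vocabulary` are dropped.
    """
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical toy corpora, not data from the study.
train_docs = ["nursing care trial", "surgical trial outcome"]
new_docs = ["nursing protocol registered trial"]

vocab = build_vocabulary(train_docs)   # columns fixed at training time
train_dtm = to_dtm(train_docs, vocab)  # matrix the classifier is fitted on
new_dtm = to_dtm(new_docs, vocab)      # same columns, so the classifier applies
```

Because both matrices share the same column order, a classifier fitted on `train_dtm` can score the rows of `new_dtm` directly. The trade-off is that information carried by terms appearing only in the new corpus (here, "protocol" and "registered") is discarded.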