Abstract:
|
The Johns Hopkins University Applied Physics Laboratory hypothesized that machine learning could be used to determine the security classification level of textual documents. To test this hypothesis, military weapon performance reports were parsed by paragraph, retaining each paragraph's portion marking. The classification levels explored were UNCLASSIFIED, CONFIDENTIAL, SECRET, and SECRET//FORMERLY RESTRICTED DATA. The data were then divided into training, validation, and testing sets. Next, a Long Short-Term Memory (LSTM) network was trained to predict classification level from the text within each paragraph. Nearly eighty percent accuracy was achieved on the validation and testing sets. Further review of the paragraphs misclassified by the algorithm showed that some of the disagreements stemmed from mistakes and inconsistencies in the original markings. The ultimate goal is to develop a tool that recommends a portion marking for given input text. Such a tool would allow us to portion-mark documents more consistently, and could potentially help identify areas where aggregation raises the classification level.
|
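The data preparation described above (parsing by paragraph, retaining the portion marking, then dividing into three sets) might be sketched roughly as follows. This is an illustration only, not the authors' code: the abbreviations `(U)`, `(C)`, `(S)`, and `(S//FRD)`, the blank-line paragraph delimiter, the 80/10/10 split, and the names `parse_paragraphs` and `split_examples` are all assumptions, since the abstract does not specify them.

```python
import random
import re

# Assumed abbreviations for the four levels named in the abstract.
MARKING_TO_LABEL = {
    "U": "UNCLASSIFIED",
    "C": "CONFIDENTIAL",
    "S": "SECRET",
    "S//FRD": "SECRET//FORMERLY RESTRICTED DATA",
}

# Matches a leading portion marking such as "(U) " or "(S//FRD) ".
PORTION_RE = re.compile(r"^\s*\(([A-Z/]+)\)\s*(.*)$", re.DOTALL)


def parse_paragraphs(document):
    """Split on blank lines; return (text, label) pairs for marked paragraphs."""
    examples = []
    for paragraph in re.split(r"\n\s*\n", document):
        match = PORTION_RE.match(paragraph)
        if match is None:
            continue  # paragraph carries no portion marking
        marking, text = match.groups()
        label = MARKING_TO_LABEL.get(marking)
        if label is not None and text.strip():
            # Collapse internal whitespace so wrapped lines become one string.
            examples.append((" ".join(text.split()), label))
    return examples


def split_examples(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle reproducibly and split into training, validation, testing sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

The resulting (text, label) pairs would then be tokenized and passed to a sequence classifier such as the LSTM mentioned in the abstract; that step is omitted here because the abstract gives no architectural details.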