Online Program


All Times EDT

Thursday, June 4
Machine Learning
Software & Data Science Technologies
Machine Learning and Software and Data Science Technologies Posters
Thu, Jun 4, 2:00 PM - 5:00 PM
TBD
 

Can Big Data Algorithms Be Used to Improve Cybersecurity? (308435)

Bruce Caulkins, University of Central Florida 
*Allen Sina Rahrooh, University of Central Florida 
Morgan Wang, University of Central Florida 

Keywords: big data, cybersecurity, machine learning, statistical learning

Many big data analytical algorithms have been developed in the past few years, and many of them can be used to improve cybersecurity. Each algorithm has its own strengths and weaknesses. In this survey paper, we examine newly developed algorithms, including both machine learning and statistical learning algorithms, from the past five years. We pay particular attention to algorithms that have been applied in cybersecurity fields such as intrusion detection and network security. The survey is limited to papers from the past five years (2014 to 2019) because cybersecurity changes rapidly and new algorithms must continually be developed to maintain cybersecurity systems such as firewalls and encryption. The statistical learning algorithms considered are bootstrap aggregation (bagging), support vector machines with various kernel functions, boosting, k-nearest neighbors, linear and quadratic discriminant analysis, random forests, logistic regression, and naïve Bayes. The machine learning algorithms considered are neural networks, classification trees, deep learning, and artificial intelligence. We aim to combine machine learning and statistical learning algorithms into hybrid algorithms that improve network security for large databases and support anomaly detection of threats to cybersecurity systems, for example a credit card database whose network uses multiple layers of encryption to protect consumer privacy. We will then compare the performance of each hybrid algorithm using confusion tables of true positives, true negatives, false positives, and false negatives. We will conclude by reporting the results and comparing the hybrid models.
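The abstract proposes comparing hybrid models through confusion tables of true and false positives and negatives. As a minimal sketch, assuming binary labels where 1 marks a threat or anomaly and 0 a benign event, the Python below tallies the four confusion-table cells and derives a few common summary metrics. The function names, the choice of metrics, and the sample labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally TP, TN, FP, FN for binary labels (assumed: 1 = threat, 0 = benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1   # threat correctly flagged
        elif t == 0 and p == 0:
            counts["tn"] += 1   # benign event correctly passed
        elif t == 0 and p == 1:
            counts["fp"] += 1   # false alarm
        else:
            counts["fn"] += 1   # missed threat
    return counts


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def summarize(c: Dict[str, int]) -> Dict[str, float]:
    """Derive summary metrics from the four confusion-table cells."""
    tp, tn, fp, fn = c["tp"], c["tn"], c["fp"], c["fn"]
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)          # detection rate
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": _safe_div(fp, fp + tn),
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical ground truth and predictions from two hypothetical hybrid models.
    y_true = [1, 0, 0, 1, 1, 0, 0, 0]
    predictions = {
        "hybrid_a": [1, 0, 1, 1, 0, 0, 0, 0],
        "hybrid_b": [1, 0, 0, 1, 1, 1, 1, 0],
    }
    for name, y_pred in predictions.items():
        counts = confusion_counts(y_true, y_pred)
        print(name, counts, summarize(counts))
```

Reporting the false positive rate alongside recall matters in intrusion and fraud detection, where benign events usually far outnumber threats and accuracy alone can look high even when most threats are missed.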