Abstract:
|
AT&T systems and devices interact with millions of external hosts every day. Malicious URLs are a common and serious cybersecurity threat: the sites behind them host unsolicited content (spam, phishing, etc.), lure unsuspecting users into scams, and can cause serious harm to AT&T’s network. It is imperative to detect and act on such threats in a timely manner. This work formulates malicious URL detection as an online machine learning task that uses the structural components of the URL. AT&T currently uses a DNS database containing DNS (Domain Name System) entries associated with IP traffic. The full URL, which also contains the domain name, provides insight into the activity generated by a connection. It includes items such as the page, image, or document being accessed, the queries being run, and redirections to other URLs. With the recent availability of the MSP data for FirstNet traffic, we now have a source of full URLs in addition to the victim’s device information. In this overview, we discuss how we decompose these URLs into their components, extract various features, and use machine learning to classify the URLs.
|
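As a rough illustration of the decomposition and feature-extraction steps mentioned in the abstract, the sketch below splits a URL into its structural components and derives a few simple lexical features. It is a minimal example built on Python's standard library, not the pipeline used in this work: the function names (`decompose_url`, `host_is_ip`, `extract_features`) and the particular features chosen are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, parse_qsl
import ipaddress


def decompose_url(url):
    """Split a URL into its structural components (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",  # hostname is None when no host is present
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def host_is_ip(host):
    """Return True if the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url):
    """Derive a few illustrative lexical features; the feature set is an assumption."""
    c = decompose_url(url)
    return {
        "url_length": len(url),
        "host_dot_count": c["host"].count("."),
        "host_is_ip": host_is_ip(c["host"]),
        # Number of non-empty path segments, e.g. /a/b.php has depth 2
        "path_depth": len([seg for seg in c["path"].split("/") if seg]),
        "query_param_count": len(parse_qsl(c["query"], keep_blank_values=True)),
        "digit_count": sum(ch.isdigit() for ch in url),
    }


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; the URL is made up.
    example = "http://192.0.2.10/login/verify.php?id=1&next=http://example.com"
    print(extract_features(example))
```

Feature dictionaries of this kind could then be vectorized and passed to a classifier that supports incremental updates, which is one way to fit the online-learning framing described above.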