Online Program

Friday, February 22
CS11 Theme 2: Data Modeling and Analysis #4 Fri, Feb 22, 3:15 PM - 4:45 PM
Napoleon A1-3

Uncovering the Truths Behind Internet Domain Registrations (302439)

Michael Jugovich, NORC at the University of Chicago
*Edward J Mulrow, NORC at the University of Chicago 
Steven Pedlow, NORC at the University of Chicago  

Keywords: web-bot, WHOIS, DNSBL, Natural Language Processing, Machine Learning

The Internet Corporation for Assigned Names and Numbers (ICANN) requested a study of its internet domain registration data. In 2011, ICANN’s WHOIS databases cataloged more than 220 million website registrations. A representative sample was selected from the five most common generic top-level domain names, which cover 98.5 percent of ICANN-registered websites, and WHOIS, DNSBL, and organic data were extracted via a customized web-bot. WHOIS registrant name and registrant organization data were used to classify the types of entities that register domain names, such as natural persons, corporations, privacy and proxy service providers, and others. With these data, we analyze the content associated with each domain name to classify the entity types and identify the activities associated with them. Entity and commercial activity classifications are developed using a variety of techniques, ranging from manual coding to natural language processing and machine learning. Interrelationships among entity types and activities are examined to help ICANN better understand the correlations that may emerge and their potential policy implications.
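
As a rough illustration of the simplest end of that range of techniques, the sketch below assigns an entity type from the WHOIS registrant name and registrant organization fields using keyword rules. It is not the study's method: the function name `classify_registrant`, the keyword lists, and the label strings are all hypothetical, chosen only to show the general shape of a rule-based first pass that manual coding or a trained model might later refine.

```python
import re

# Hypothetical keyword lists for illustration only; not taken from the study.
PRIVACY_PROXY_TERMS = ("privacy", "proxy", "private registration")
CORPORATE_SUFFIXES = {"inc", "llc", "ltd", "corp", "gmbh"}


def classify_registrant(name: str, organization: str) -> str:
    """Assign a coarse entity type from two WHOIS registrant fields."""
    text = f"{name} {organization}".lower()
    # Tokenize so that suffixes match whole words only (e.g. "inc", not "prince").
    tokens = set(re.findall(r"[a-z]+", text))

    # Privacy and proxy services are checked first because they often
    # also carry corporate suffixes in their names.
    if any(term in text for term in PRIVACY_PROXY_TERMS):
        return "privacy_proxy"
    if tokens & CORPORATE_SUFFIXES:
        return "corporation"
    # Assumption: a name with no organization suggests a natural person.
    if name.strip() and not organization.strip():
        return "natural_person"
    return "other"
```

Records that fall into the "other" bucket under rules like these are the natural candidates for manual review or for a statistical classifier.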