Keywords: measurement, sampling, security
How do you measure something as large and complex as the web? In this talk I will outline the methodology used to calculate the size and scope of badness across Google, including phishing, malware, and unwanted software. To fight different types of abuse effectively, we must first define a metric for abuse and calculate a baseline, then devise an efficient method for recalculating that metric regularly. If, for example, the number of URLs is too large to count and still growing, we define a method for sampling the subset of URLs most critical to protecting users and estimating the metric from that sample. A robust sampling process allows us to measure the size of the problem, identify gaps in coverage, and determine whether we are moving the needle in the fight against badness.
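
As a rough illustration of the sampling idea, and not the methodology presented in the talk, the sketch below estimates an abuse rate from a weighted random sample instead of classifying every URL. Everything in it is an assumption made for the example: the function name `estimate_abuse_rate`, the use of per-URL weights (for instance visit counts) as a stand-in for "most critical to protecting users", the `is_abusive` callback, and the normal-approximation 95% interval.

```python
import math
import random


def estimate_abuse_rate(urls, weights, is_abusive, sample_size, seed=0):
    """Estimate the weighted share of abusive URLs from a random sample.

    urls:        list of URL strings (hypothetical population)
    weights:     one non-negative weight per URL, e.g. visit counts (assumed)
    is_abusive:  callable returning True for an abusive URL (assumed classifier)
    sample_size: number of draws, taken with replacement
    Returns (estimate, lower, upper) with an approximate 95% interval.
    """
    rng = random.Random(seed)
    # Draw URLs with probability proportional to their weight, so heavily
    # visited URLs are more likely to be checked.
    sample = rng.choices(urls, weights=weights, k=sample_size)
    hits = sum(1 for url in sample if is_abusive(url))
    p = hits / sample_size
    # Normal approximation to the binomial; crude when p is near 0 or 1.
    half_width = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

Running the same estimator on a fixed schedule would give a baseline and a trend line, and running it separately per abuse type or per product would point to gaps in coverage. The interval narrows roughly with the square root of the sample size, which is what makes a modest sample usable as a regular metric.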