Distribution extents of order k for a sample {x_1, x_2, ..., x_n} of a non-negative stochastic variable X are defined as
E_k = (sum_i p_i^k)^(1/(1-k)) for k != 1, and E_1 = exp(-sum_i p_i ln(p_i)) for k = 1,
where p_i = x_i / sum(x_i) ,
and are useful measures of the "number of large values" in the sample.
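As an illustration of the definition above, here is a minimal Python sketch of E_k. The function name `distribution_extent` is hypothetical, and dropping zero values before normalizing is an assumption made here (it matches the limit p ln(p) -> 0 as p -> 0), not something stated in the text.

```python
import math


def distribution_extent(xs, k):
    """Distribution extent E_k of a sample of non-negative values.

    Hypothetical helper for illustration only.
    """
    if any(x < 0 for x in xs):
        raise ValueError("sample values must be non-negative")
    total = sum(xs)
    if total <= 0:
        raise ValueError("sample must contain at least one positive value")
    # Assumption: zero values are dropped, since they carry no share.
    ps = [x / total for x in xs if x > 0]
    if k == 1:
        # E_1 = exp(-sum p_i ln p_i)
        return math.exp(-sum(p * math.log(p) for p in ps))
    # E_k = (sum p_i^k)^(1/(1-k)) for k != 1
    return sum(p ** k for p in ps) ** (1.0 / (1.0 - k))


uniform = [1, 1, 1, 1]      # four equally large values
skewed = [100, 1, 1, 1]     # one value dominates

print(distribution_extent(uniform, 2))  # equals n = 4 for a uniform sample
print(distribution_extent(skewed, 2))   # about 1.06: roughly one large value
```

For a sample of n equal values every order gives E_k = n, while a sample dominated by a single value gives E_k close to 1, which is the sense in which E_k counts the large values.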
They were introduced by L.L. Campbell in 1964 and generalize both the inverse Herfindahl-Hirschman Index (HHI), a commonly accepted measure of market concentration in economics, and Simpson's diversity index used in ecology; they are also closely related to the Shannon-Wiener index and to the Rényi entropy and divergence.
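These relationships can be written out explicitly. The sketch below assumes that the HHI and Simpson's index are both taken as the sum of squared shares with p_i expressed as fractions, that H denotes the Shannon entropy, and that H_k denotes the Rényi entropy of order k (both in natural-log units); these symbols are introduced here for illustration.

```latex
\begin{align*}
E_2 &= \frac{1}{\sum_i p_i^2} = \frac{1}{\mathrm{HHI}}, \\
E_1 &= \exp(H), \qquad H = -\sum_i p_i \ln p_i, \\
E_k &= \exp(H_k), \qquad H_k = \frac{1}{1-k} \ln \sum_i p_i^k \quad (k \neq 1).
\end{align*}
```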
In this work we describe general properties of E_k and demonstrate the advantages of using it in the analysis of a web advertisement network, whose actors are advertisers, publishers, and users, for three purposes: 1) as cutoff parameters for representing the network as a graph, both to visualize it and to apply graph-theory methods; 2) as an independent variable in predictive modeling; and 3) as a criterion for optimizing some model parameters.