The kernel k-means algorithm can identify a far richer variety of cluster structures than linear k-means, but it becomes prohibitively expensive when the nonlinear feature map is high-dimensional and the number of input points is large. This talk presents recent results on the Nyström approach to approximate kernel k-means clustering: using a novel rank-restricted Nyström approximation, we show how to construct an optimal number of features such that linear k-means on these features achieves a 1+epsilon approximation ratio with respect to the kernel k-means objective. Along the way, we establish a result of independent interest: the rank-restricted Nyström approximation satisfies a 1+epsilon relative-error low-rank approximation guarantee in the trace norm.
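To make the pipeline concrete, here is a minimal sketch of one natural way to form rank-restricted Nyström features and hand them to linear k-means. The function name, the landmark-sampling scheme (uniform without replacement), and all parameters are illustrative assumptions, not the construction analyzed in the talk; the sketch computes standard Nyström features C W^{-1/2} from m landmarks and then restricts them to their top-k singular directions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rank_restricted_nystrom_features(X, m, k, gamma, rng):
    """Illustrative rank-restricted Nystrom feature map (an assumption,
    not necessarily the exact construction from the talk).

    Samples m landmark points, builds the Nystrom feature matrix
    F = C W^{-1/2}, then keeps only its top-k singular directions,
    so that F_k F_k^T is a rank-k approximation to the kernel matrix.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)        # n x m cross-kernel block
    W = rbf_kernel(X[idx], X[idx], gamma)   # m x m landmark kernel block
    # Symmetric pseudo-inverse square root of W via eigendecomposition.
    evals, evecs = np.linalg.eigh(W)
    pos = evals > 1e-10
    W_isqrt = evecs[:, pos] @ np.diag(evals[pos] ** -0.5) @ evecs[:, pos].T
    F = C @ W_isqrt                          # n x m Nystrom features
    # Rank restriction: project onto the top-k left singular directions.
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k]                  # n x k features

# Small demo on synthetic data; any linear k-means can then be run on Z.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Z = rank_restricted_nystrom_features(X, m=40, k=8, gamma=0.5, rng=rng)
```

Running ordinary (linear) k-means on the rows of Z is then a stand-in for kernel k-means on X, at cost governed by k rather than by the full kernel matrix.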