Keywords: crowdsourcing image analysis, functional data, genotype-by-environment interaction, robust estimation, spline smoothing, growth rate, principal component analysis
Recent advances in field-based plant phenotyping have increased interest in statistical approaches for the analysis of longitudinal phenotypic data derived from sequential images. In a maize growth study, plants of various genotypes were imaged daily during the growing season by hundreds of cameras. Amazon Mechanical Turk (MTurk) workers were hired to manually mark plant bodies on these images, from which plant heights were obtained. An important scientific problem is to estimate the effect of genotype and its interaction with environment on plant growth while adjusting for measurement errors from crowdsourced image analysis. We model plant height measurements as discrete observations of latent smooth growth curves contaminated with MTurk worker random effects and worker-specific measurement errors. We allow the mean function of the growth curve and its first derivative to depend on replicates and environmental conditions, and model the phenotypic variation between genotypes and genotype-by-environment interactions by functional random effects. We estimate the covariance functions of the functional random effects by a fast penalized tensor product spline approach, and then perform robust functional principal component analysis (rFPCA) using the best linear unbiased predictor of the principal component scores. As byproducts, the proposed model leads to a new method for assessing the quality of MTurk worker data and a novel index for measuring the sensitivity of various genotypes to drought. The properties and advantages of the proposed approach are demonstrated by simulation studies.
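To illustrate the role of the best linear unbiased predictor (BLUP) of principal component scores, the following minimal sketch simulates noisy growth curves on a common time grid and computes the score BLUPs. It is a simplified stand-in, not the proposed rFPCA procedure: it omits worker random effects, the robust estimation step, and the penalized tensor product spline covariance estimator, and it treats the mean, eigenvectors, score variances, and noise variance as known. The logistic mean curve, the two basis vectors, and all numeric values are hypothetical choices made only for illustration.

```python
import numpy as np


def blup_scores(Y, mu, Phi, lam, sigma2):
    """BLUP of principal component scores for curves on a common grid.

    Y      : (n, m) noisy observations, one row per curve
    mu     : (m,)   mean curve evaluated on the grid
    Phi    : (m, K) orthonormal eigenvectors (discretized eigenfunctions)
    lam    : (K,)   score variances (eigenvalues)
    sigma2 : float  measurement-error variance
    """
    m = Phi.shape[0]
    # Covariance of one observed curve: signal part plus white noise.
    Sigma_Y = Phi @ np.diag(lam) @ Phi.T + sigma2 * np.eye(m)
    # xi_hat_i = Lambda Phi^T Sigma_Y^{-1} (Y_i - mu), computed for all rows.
    resid_weighted = np.linalg.solve(Sigma_Y, (Y - mu).T).T
    return resid_weighted @ Phi @ np.diag(lam)


# --- Simulated example (all settings are illustrative assumptions) ---
rng = np.random.default_rng(0)
n, m, K = 200, 30, 2
t = np.linspace(0.0, 1.0, m)

# Hypothetical logistic mean growth curve.
mu = 100.0 / (1.0 + np.exp(-8.0 * (t - 0.5)))

# Two orthonormal basis vectors on the grid (constant and linear trend).
Phi, _ = np.linalg.qr(np.column_stack([np.ones(m), t - t.mean()]))

lam = np.array([40.0, 10.0])   # score variances
sigma2 = 4.0                   # measurement-error variance

xi = rng.normal(size=(n, K)) * np.sqrt(lam)           # true scores
noise = rng.normal(scale=np.sqrt(sigma2), size=(n, m))
Y = mu + xi @ Phi.T + noise                           # observed heights

xi_hat = blup_scores(Y, mu, Phi, lam, sigma2)
```

Because the basis vectors here are orthonormal, the BLUP reduces to projecting each centered curve onto the basis and shrinking the k-th projection by `lam[k] / (lam[k] + sigma2)`, so noisier measurements pull the predicted scores more strongly toward zero. In practice the mean, eigenfunctions, and variance components would have to be estimated from the data.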