Neural networks (NNs) are machine learning algorithms that have been widely applied to independent data to predict outputs as complex, nonlinear functions of inputs. However, there has been no research on applying NNs to clustered data. This research fills that gap by integrating random effects into an NN, allowing the output to take any nonlinear functional form in the inputs. The proposed NN is trained by minimizing a cost function using quasi-Newton and gradient descent algorithms, and overfitting is controlled with L2 regularization. The trained NN is evaluated for prediction accuracy using leave-one-out cross-validation on both simulated and real data, and prediction errors are compared between the NN and generalized linear mixed models (GLMM). Results show that the NN outperforms the GLMM when the link function is a nonlinear function of the predictors. The NN is thus a convenient alternative to the GLMM: it provides higher prediction accuracy and can model complex relationships without a priori assumptions, especially when the functional relationship is unknown and the data are very large with many inputs.
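As a rough illustration of the idea described above (not the authors' implementation), the sketch below fits a one-hidden-layer network with cluster-specific random intercepts by plain gradient descent on a squared-error cost with L2 penalties on the weights and on the random effects. The quasi-Newton step and the leave-one-out cross-validation loop are omitted for brevity, and all names, layer sizes, and penalty values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated clustered data: 10 clusters, random intercepts, nonlinear signal.
n_clusters, n_per = 10, 30
X = rng.uniform(-2, 2, size=(n_clusters * n_per, 1))
cluster = np.repeat(np.arange(n_clusters), n_per)
b_true = rng.normal(0, 1.0, n_clusters)            # true random intercepts
y = np.sin(2 * X[:, 0]) + b_true[cluster] + rng.normal(0, 0.1, len(X))

# One-hidden-layer NN plus a cluster-specific random intercept b[cluster].
H = 16
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = 0.0
b = np.zeros(n_clusters)                           # random effects, shrunk by L2

lam_w, lam_b, lr = 1e-4, 1e-2, 0.1
for step in range(3000):
    Z = np.tanh(X @ W1 + b1)                       # hidden layer
    pred = (Z @ W2).ravel() + b2 + b[cluster]      # fixed part + random intercept
    r = pred - y                                   # residuals
    n = len(y)
    # Backpropagation for the squared-error cost with L2 penalties.
    gW2 = Z.T @ r[:, None] / n + lam_w * W2
    gb2 = r.mean()
    dZ = (r[:, None] @ W2.T) * (1 - Z**2) / n
    gW1 = X.T @ dZ + lam_w * W1
    gb1 = dZ.sum(0)
    # Per-cluster residual sums drive the random-effect updates.
    gb = np.bincount(cluster, weights=r, minlength=n_clusters) / n + lam_b * b
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    b -= lr * gb

mse = np.mean((pred - y) ** 2)
```

The L2 penalty on `b` plays the same shrinkage role that the random-effects distribution plays in a mixed model: each estimated intercept is pulled toward zero in proportion to `lam_b`, so small clusters borrow strength from the overall fit.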