Stratified cluster randomization trials (CRTs) are frequently employed in clinical and healthcare research. Compared with simple randomized CRTs, stratified CRTs reduce the imbalance of baseline prognostic factors among intervention groups. Despite their popularity, there has been limited methodological development on sample size estimation for stratified CRTs, and existing work mostly assumes equal cluster sizes within each stratum. In practice, clusters in CRTs are often naturally formed and their sizes vary randomly. Commonly used approaches ignore this variability in cluster size, which may lead to underestimating (or overestimating) the required sample size and hence to underpowered (or overpowered) clinical trials. We propose a closed-form sample size formula for stratified CRTs with binary outcomes that accounts for both clustering and varying cluster size. We investigate the impact of various design parameters on the relative change in sample size due to varying cluster size. Simulation studies are conducted, and an application to a pragmatic trial of a triad of chronic kidney disease, diabetes, and hypertension is presented for illustration.