In forensic science, and especially in feature-based comparison disciplines, it is important to evaluate the accuracy and reliability of the assessments made by forensic examiners. These assessments are often recorded on an ordinal scale. In latent fingerprint examination, for example, an examiner may be asked to rate the quality of a latent print recovered from a crime scene. "Black-box" studies are conducted to assess the reliability and validity of decisions in such subjective examination processes. Because of cost and time constraints, the intra-rater component of a black-box study (repeated assessments of the same sample by the same examiner) may be much smaller than the inter-rater component (assessments of the same sample by different examiners). A framework for assessing the reliability of ordinal decisions in such settings is therefore of interest, particularly because reliability is known to depend on the difficulty of the evidence. We propose a Bayesian approach to modeling ordinal decisions that unifies the analysis of inter-rater and intra-rater reliability, accounts for examiner-sample interactions, and allows each examiner to use different thresholds when assigning samples to ordinal categories. We also present results from a study of latent fingerprint decisions conducted by the FBI.
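To make the ingredients named above concrete, the sketch below simulates ordinal decisions from a latent-variable (cumulative-threshold) model. This model form is an assumption for illustration and is not claimed to be the authors' specification. Each sample has a latent quality `mu`, each examiner-sample pair gets an interaction term `tau` that is shared across that examiner's repeated looks, each look adds measurement noise, and each examiner maps the latent score to a category through their own sorted cut-points. All names (`classify`, `simulate`, `agreement_rates`, `sd_interaction`, `sd_noise`) and the normal distributions are hypothetical choices. Only the forward (data-generating) model is shown. A Bayesian analysis would place priors on these quantities and fit them with posterior sampling, which is omitted here.

```python
import bisect
import random


def classify(latent, thresholds):
    """Map a latent score to an ordinal category 0..K using sorted cut-points."""
    # bisect_right counts how many cut-points lie at or below the latent value.
    return bisect.bisect_right(thresholds, latent)


def simulate(n_samples, thresholds_by_examiner, n_repeats=2,
             sd_interaction=0.5, sd_noise=0.3, seed=0):
    """Simulate repeated ordinal decisions (hypothetical normal latent model)."""
    rng = random.Random(seed)
    decisions = {}  # (sample, examiner) -> list of repeated ordinal decisions
    for i in range(n_samples):
        mu = rng.gauss(0.0, 1.0)  # latent sample quality (difficulty)
        for j, cuts in enumerate(thresholds_by_examiner):
            # Examiner-sample interaction: fixed across this examiner's repeats.
            tau = rng.gauss(0.0, sd_interaction)
            decisions[(i, j)] = [
                classify(mu + tau + rng.gauss(0.0, sd_noise), cuts)
                for _ in range(n_repeats)
            ]
    return decisions


def agreement_rates(decisions, n_examiners):
    """Return (intra-rater, inter-rater) exact-agreement proportions."""
    # Intra-rater: same examiner, same sample, first two repeats.
    intra = [reps[0] == reps[1] for reps in decisions.values()]
    # Inter-rater: all examiner pairs on the same sample, using first looks.
    inter = []
    for i in {i for i, _ in decisions}:
        first = [decisions[(i, j)][0] for j in range(n_examiners)]
        inter.extend(first[a] == first[b]
                     for a in range(n_examiners)
                     for b in range(a + 1, n_examiners))
    return sum(intra) / len(intra), sum(inter) / len(inter)


if __name__ == "__main__":
    # Three examiners with different (shifted) thresholds for four categories.
    cuts = [[-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5], [-1.5, -0.5, 0.5]]
    d = simulate(2000, cuts)
    print(agreement_rates(d, len(cuts)))
```

Under these assumed settings, repeated looks by the same examiner share both `mu` and `tau`, so intra-rater agreement should exceed inter-rater agreement, which differs by the interaction term and by the examiners' threshold shifts as well as noise. This separation of sources of disagreement is the kind of structure a unified inter-/intra-rater model is meant to capture.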