Background/Aims: Colonoscopy is widely used for colorectal cancer (CRC) screening, surveillance, and diagnosis. To assess colonoscopy utilization, effectiveness, and safety, it is important to distinguish between these indications. Administrative data sources have the advantage of representing real-world colonoscopy utilization, but because their codes are intended primarily for billing, it is challenging to identify the reason for a procedure in these data, especially in large datasets. Several studies using administrative data have applied algorithms based on procedure and diagnostic codes to classify colonoscopy indication. However, none has demonstrated simultaneously high sensitivity and specificity. The current study uses adjudicated medical records at 4 CRN sites to evaluate the test characteristics of existing algorithms, develop a new algorithm, and compare the performance of the existing algorithms with that of the new algorithm. Methods: The study included 716 subjects who were patients of 4 large health care organizations. Subjects’ records were reviewed and adjudicated as part of a late-stage CRC case-control study conducted concurrently with this analysis. Cases were 55 years or older at diagnosis in 2006 through 2008; controls were age-matched to cases. Medical records were abstracted and adjudicated to assign an indication for 465 colonoscopy procedures. We first tested the performance of 5 published algorithms. We then assembled a superset of candidate predictor variables drawn from the published algorithms. We entered these variables into a LASSO prediction model, using subject-level coded data as the predictor values and the subjects’ adjudicated colonoscopy indication as the outcome. LASSO is a penalized multiple logistic regression that shrinks coefficients toward zero and drops weak predictors, which protects against model overfitting. The covariates retained by the new model were used to construct a receiver operating characteristic (ROC) curve displaying the model’s sensitivity at each increment of specificity.
Results: When applied to CRN data, the 5 existing algorithms had sensitivity/specificity of 65%/74%, 60%/77%, 74%/58%, 77%/58%, and 51%/30%. The ROC curve of the new algorithm encompassed these values, indicating higher sensitivity than the existing algorithms at each level of specificity. For example, at a sensitivity of 80%, specificity was approximately 70%; at a sensitivity of 70%, specificity was approximately 82%. Conclusions: The new algorithm will allow more accurate classification of colonoscopy indication in CRN data than do existing algorithms.
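The comparison described above, between a fixed algorithm's single sensitivity/specificity pair and a model's ROC curve, can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the toy labels and scores, and the choice of 1 as the indication of interest are all hypothetical assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def sens_spec(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary predictions against adjudicated labels.

    Assumes 1 = indication of interest, 0 = any other indication, and that
    both classes are present (otherwise a denominator would be zero).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for each distinct score used as a cutoff."""
    points = []
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        sens, spec = sens_spec(y_true, y_pred)
        points.append((thr, sens, spec))
    return points


def dominates(points: Sequence[Tuple[float, float, float]], sens: float, spec: float) -> bool:
    """True if some cutoff on the curve is at least as good on both measures."""
    return any(s >= sens and sp >= spec for _, s, sp in points)


# Hypothetical toy data: adjudicated labels and model-predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

curve = roc_points(y_true, scores)
for thr, sens, spec in curve:
    print(f"cutoff {thr:.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")

# A fixed algorithm's (sensitivity, specificity) pair lies on or under the
# curve when at least one cutoff matches or beats it on both measures.
print(dominates(curve, 0.75, 0.75))
```

In this sketch, each fixed algorithm contributes one point, while the scored model contributes one point per cutoff; a curve "encompasses" a published point when `dominates` returns true for that pair.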