Confidence Scores

What is a confidence score?

A confidence score is a number from one (1) to five (5) that represents the likelihood that a given provider’s contact information (i.e., address and phone number) is accurate. It is based on the output of a Machine Learning (ML) model trained to predict accuracy.

Why does Ribbon use confidence scores?

The average provider directory is roughly 50% inaccurate (CMS). Rather than simply displaying potentially inaccurate data, Ribbon uses confidence scores to indicate the likelihood that any given provider’s contact information (i.e., address and phone number) is accurate. Because manually validating every data point on a continual basis is not feasible at scale, confidence scores convey the relative accuracy of each data point without requiring manual confirmation. A confidence score of one (1) indicates a low likelihood that the data is accurate (i.e., a high likelihood of inaccuracy), while a confidence score of four (4) indicates a high likelihood that the data is accurate. A confidence score of five (5) indicates accuracy confirmed through manual validation by Ribbon’s call center. Ribbon’s ML model then learns from this manually validated data to predict the relative accuracy of all other providers’ contact information.

How does the model work?

Ribbon collects data from thousands of sources to identify all instances of a given provider’s National Provider Identifier (NPI), address, and phone number. Ribbon’s ML model then cleanses, standardizes, and “scores” that data to determine the likelihood that any given provider’s contact information is accurate.
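
To make the cleanse-and-standardize step concrete, here is a minimal Python sketch of normalizing raw phone and address strings and counting how often each standardized (NPI, address, phone) instance appears across sources. The normalization rules and sample data are illustrative assumptions, not Ribbon’s actual pipeline.

```python
import re
from collections import Counter

def standardize_phone(raw: str) -> str:
    """Reduce a US phone number to its last 10 digits (hypothetical cleanup rule)."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]

def standardize_address(raw: str) -> str:
    """Uppercase, strip punctuation, and collapse whitespace -- a stand-in
    for real address standardization (e.g., USPS normalization)."""
    cleaned = re.sub(r"[^\w\s]", " ", raw)
    return re.sub(r"\s+", " ", cleaned).strip().upper()

# Every observed (NPI, address, phone) instance gathered across sources;
# counting standardized instances lets downstream scoring weigh how
# consistently each value appears.
observations = [
    ("1234567890", "100 Main St, Boston, MA", "(555) 123-4567"),
    ("1234567890", "100 main st boston ma", "555.123.4567"),
    ("1234567890", "200 Oak Ave, Boston, MA", "(555) 999-0000"),
]

counts = Counter(
    (npi, standardize_address(addr), standardize_phone(phone))
    for npi, addr, phone in observations
)
print(counts.most_common())
# [(('1234567890', '100 MAIN ST BOSTON MA', '5551234567'), 2), ...]
```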

Ribbon regularly retrains the ML model that powers confidence scores on a subset of manually validated contact information to ensure continued accuracy. The model then evaluates the aggregated data against this validated subset to predict the relative accuracy of all data points at scale. Additionally, Ribbon manually validates a random sample of the model’s output to assess whether confidence scores match their predicted probability benchmarks. For example, a random sample test should show that 90% of providers with a confidence score of four (4) are accurate after manual validation.
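
The calibration check described above can be sketched in a few lines of Python: group a manually validated sample by confidence score and compare observed accuracy to the published benchmarks (listed in the callout below). The record structure and sample data are assumptions for illustration only.

```python
from collections import defaultdict

# Published probability benchmarks per score (see the callout below).
BENCHMARKS = {4: 0.90, 3: 0.70, 2: 0.50, 1: 0.20}

# A random sample of the model's output, each record paired with the
# result of manual validation.
sample = [
    {"score": 4, "accurate": True},
    {"score": 4, "accurate": True},
    {"score": 4, "accurate": False},
    {"score": 3, "accurate": True},
    {"score": 3, "accurate": False},
    # ... more manually validated records
]

totals, hits = defaultdict(int), defaultdict(int)
for record in sample:
    totals[record["score"]] += 1
    hits[record["score"]] += record["accurate"]  # bool counts as 0/1

for score in sorted(BENCHMARKS, reverse=True):
    if totals[score]:
        observed = hits[score] / totals[score]
        print(f"score {score}: observed {observed:.0%} vs benchmark {BENCHMARKS[score]:.0%}")
```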

📘 How do confidence scores map to accuracy?

5 indicates that the data point has been manually verified within the past 90 days.
4 represents a 90% probability of being accurate.
3 represents a 70% probability of being accurate.
2 represents a 50% probability of being accurate (the industry average).
1 represents a 20% probability of being accurate.
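
A minimal sketch of how a model probability could be bucketed into this scale, assuming the published bands act as thresholds; the specific cutoffs below are illustrative midpoints, not Ribbon’s actual thresholds.

```python
def confidence_score(predicted_prob: float, manually_verified: bool) -> int:
    """Map a model probability onto the 1-4 scale, reserving 5 for records
    manually verified within the past 90 days. Cutoffs are illustrative
    midpoints between the published bands, not Ribbon's actual thresholds."""
    if manually_verified:
        return 5
    if predicted_prob >= 0.80:  # ~90% band
        return 4
    if predicted_prob >= 0.60:  # ~70% band
        return 3
    if predicted_prob >= 0.35:  # ~50% band
        return 2
    return 1                    # ~20% band

print(confidence_score(0.92, manually_verified=False))  # -> 4
print(confidence_score(0.45, manually_verified=True))   # -> 5
```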