A Better Methodology to Evaluate and Improve AI-Generated Medical Notes
Common machine learning assessment methods are inadequate for determining the quality of AI-generated medical notes. One standard method, the F1 score, is typically applied to measure how well the medical note reflects the transcript. It does not account for error weighting, note organization, medical terminology, or transcript inaccuracies.
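For readers less familiar with the F1 score, here is a minimal sketch of the standard calculation from true positive, false positive, and false negative counts. The function name and the example counts are illustrative assumptions, not taken from the white paper. It shows why the score carries no error weighting: every false positive or false negative counts the same, whatever its clinical significance.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if tp == 0:
        # No correct items: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of note content supported by the transcript
    recall = tp / (tp + fn)     # share of transcript content captured in the note
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two notes. Note A's errors might be minor wording
# slips and note B's might be wrong medication doses, but the counts are
# identical, so the F1 scores are identical too.
note_a = f1_score(tp=8, fp=2, fn=2)
note_b = f1_score(tp=8, fp=2, fn=2)
```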

Augmedix developed the Augmedix Note Quality Score (NQS) to better capture the unique requirements for medical notes as well as clinician expectations. The NQS methodology directly informs model development by categorizing true positives, false positives, and false negatives into 6 note sections, over 100 medical entity categories, and 13 error types.
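As a rough illustration of the categorization described above, the sketch below tags each finding with an outcome, a note section, an entity category, and an error type, then tallies errors per section and error type. It is not the NQS implementation, and it does not compute an NQS value, since the abstract does not give the scoring formula. The class name, function name, and the section, category, and error-type labels are all hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    outcome: str               # "TP", "FP", or "FN"
    section: str               # hypothetical note section label
    entity_category: str       # hypothetical medical entity category
    error_type: Optional[str]  # hypothetical error type; None for a true positive


def tally_errors(findings: Iterable[Finding]) -> Counter:
    """Count false positives and false negatives per (section, error_type)."""
    counts: Counter = Counter()
    for f in findings:
        if f.outcome in ("FP", "FN"):
            key: Tuple[str, Optional[str]] = (f.section, f.error_type)
            counts[key] += 1
    return counts


# Placeholder data, invented purely for illustration.
example = [
    Finding("TP", "Assessment", "medication", None),
    Finding("FN", "Assessment", "medication", "omission"),
    Finding("FN", "Assessment", "allergy", "omission"),
    Finding("FP", "Plan", "procedure", "unsupported statement"),
]
error_counts = tally_errors(example)
```

A breakdown like this is what would let a team see which sections and error types account for most errors, which is the sense in which the categorization can inform model development.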

The NQS provides a more realistic assessment of medical note quality than the F1 score because it captures additional, relevant error types, and the errors it records per note align more closely with clinician editing time. That alignment matters because the ultimate goal of ambient medical note-taking solutions is to save clinicians time. Compared with other AI note generation companies, which perform similarly to one another, Augmedix scores 8 percentage points higher on the NQS than the lowest-performing company, which translates to 5 fewer edits per note.

In conclusion, the NQS addresses error types that are missed by the F1 score, better informs opportunities for model development, and demonstrates that Augmedix outperforms other AI companies on note quality.

Full white paper attached.


Attached Files
.pdf   A Better Methodology to Evaluate and Improve AI-Generated Medical Notes_August 2024.pdf (Size: 1.05 MB / Downloads: 0)


Messages In This Thread
A Better Methodology to Evaluate and Improve AI-Generated Medical Notes - by Rachel Franzmann - 07-19-2024, 01:52 AM
