Case judged negative by 2 of 4 radiologists, but positive for acute hemorrhage by both the algorithm and the consensus gold standard. Small left temporal subarachnoid hemorrhage (SAH). The boxed areas are magnified views of small areas of hemorrhage. Arrows indicate the borders of small or subtle hemorrhages. Images reproduced from Kuo W, et al. Proc Natl Acad Sci U S A. 2019.
An algorithm developed by scientists at UC San Francisco and UC Berkeley did better than two out of four expert radiologists at finding tiny brain hemorrhages in head scans — an advance that one day may help doctors treat patients with traumatic brain injuries (TBI), strokes and aneurysms.
The continued increase in diagnostic imaging studies, including 3D imaging studies such as CT, means that radiologists are looking at thousands of images each day, searching for tiny abnormalities that can signal life-threatening emergencies. The number of images from each brain scan can be so large that on a busy day, radiologists may opt to scroll through some large 3D stacks of images using mice with frictionless wheels, almost like viewing a movie. But the work could be much more efficient, and potentially more accurate, if AI technology could pick out the images with significant abnormalities, so that radiologists could examine them more closely.
“We wanted something that was practical, and for this technology to be useful clinically, the accuracy level needs to be close to perfect,” said Dr. Esther Yuh, MD, PhD, co-corresponding author of the study (Kuo W, Hane C, Mukherjee P, Malik J, Yuh EL. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc Natl Acad Sci U S A. 2019. pii: 201908021. doi: 10.1073/pnas.1908021116).
“The performance bar is high for this application, due to the potential consequences of a missed abnormality, and people won’t tolerate less than human performance or accuracy.”
The algorithm the team developed took just one second to determine whether an entire head scan contained any signs of hemorrhage. It also traced the detailed outlines of the abnormalities it found, showing their location within the brain’s three-dimensional structure. Some abnormalities may be on the order of 100 pixels in size, in a 3D stack of images containing over a million pixels, and even expert radiologists sometimes miss them, with potentially grave consequences.
The algorithm found some small abnormalities that the experts missed. It also noted their location within the brain and classified them according to subtype, information that physicians need to determine the best treatment. And the algorithm provided all of this information with an acceptable level of false positives, minimizing the amount of time that physicians would need to spend reviewing its results.
Yuh said one of the hardest things to achieve with the AI technology was the ability to determine whether an entire exam, consisting of a 3D “stack” of approximately 30 images, was normal.
“Achieving 95 percent accuracy on a single image, or even 99 percent, is not OK, because in a series of 30 images, you’ll make an incorrect call on one of every 2 or 3 scans,” she said. “To make this clinically useful, you have to get all 30 images correct, what we call exam-level accuracy. If a computer is pointing out a lot of false positives, it will slow the radiologist down, and may lead to more errors.”
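The arithmetic behind that point can be sketched in a few lines. The example below assumes, purely for illustration, that errors on the images within an exam are independent, so that the chance of getting all 30 images right is the per-image accuracy raised to the 30th power. The function name and the independence assumption are hypothetical and are not taken from the study. Under that assumption, 95 percent per-image accuracy gets only about 21 percent of exams entirely right, and 99 percent gets about 74 percent, or roughly one wrong exam in four.

```python
def exam_level_accuracy(per_image_accuracy: float, images_per_exam: int = 30) -> float:
    """Probability that every image in an exam is called correctly.

    Assumption (illustrative only): errors on individual images are
    independent, so the per-image probabilities simply multiply.
    """
    return per_image_accuracy ** images_per_exam


if __name__ == "__main__":
    # Compare a few per-image accuracy levels for a 30-image exam.
    for acc in (0.95, 0.99, 0.999):
        exam_acc = exam_level_accuracy(acc)
        print(f"per-image accuracy {acc:.3f} -> exam-level accuracy {exam_acc:.3f}")
```

Real errors within a scan are unlikely to be fully independent, so these figures are a rough guide rather than a prediction, but they show why per-image accuracy has to be far above 99 percent before whole-exam calls become reliable.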
The radiology experts said the algorithm’s ability to find very small abnormalities and demonstrate their location in the brain was a substantial advance.
“The hemorrhage can be tiny and still be significant,” said Dr. Pratik Mukherjee, professor of radiology at UCSF. “That’s what makes a radiologist’s job so hard, and that’s why these things occasionally get missed. If a patient has an aneurysm, and it’s starting to bleed, and you send them home, they can die.”
Dr. Jitendra Malik, professor of electrical engineering and computer sciences at UC Berkeley, said the key was choosing which data to feed into the model. The new study used a type of deep learning model known as a fully convolutional neural network, or FCN, which can be trained on a relatively small number of images, in this case 4,396 CT exams. But the training images used by the researchers were packed with information, because each small abnormality was manually delineated at the pixel level. The richness of this data, along with other steps that prevented the model from misinterpreting random variations or “noise” as meaningful, created an extremely accurate algorithm.
The scientists could have chosen to feed the network an entire stack of images, or one complete image, all at once. Instead, they chose to feed it only a portion or “patch” of an image at a time, contextualizing that patch with the images that directly preceded and followed it in the stack. Viewing an image in patches is also how people read text or look at a computer screen, and this enabled the network to learn from the relevant information in the data without “overfitting,” that is, drawing conclusions based on insignificant variations that were also present in the data. They called their model PatchFCN.
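As a rough illustration of the patch idea described above, the sketch below cuts a square patch from one image in a stack and pairs it with the same region from the images directly before and after it. This is not the authors' PatchFCN code. The function name, the 64-pixel patch size, the 30 x 512 x 512 stack shape and the choice to repeat the first or last image at the ends of the stack are all assumptions made for the example.

```python
import numpy as np


def extract_patch_with_context(stack, slice_idx, top, left, patch_size=64):
    """Return a (3, patch_size, patch_size) array: the same square region
    taken from the previous, current and next image in the stack.

    stack is assumed to have shape (num_images, height, width).
    At the first or last image, the missing neighbor is replaced by the
    image itself (an illustrative choice, not taken from the study).
    """
    num_images = stack.shape[0]
    prev_idx = max(slice_idx - 1, 0)
    next_idx = min(slice_idx + 1, num_images - 1)
    neighbors = [prev_idx, slice_idx, next_idx]
    return stack[neighbors, top:top + patch_size, left:left + patch_size]


if __name__ == "__main__":
    # Hypothetical exam: 30 images of 512 x 512 pixels filled with random values.
    rng = np.random.default_rng(0)
    exam = rng.random((30, 512, 512), dtype=np.float32)
    patch = extract_patch_with_context(exam, slice_idx=10, top=100, left=200)
    print(patch.shape)
```

A model trained on inputs like these sees local detail plus a little context from neighboring images, rather than the whole exam at once.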
“We took the approach of marking out every abnormality; that’s why we had much, much better data,” said Malik, a co-corresponding author of the study. “Then we made the best use possible of that data. That’s how we achieved success.”
The authors are now applying the algorithm to CT scans from trauma centers across the country that are enrolled in a research study led by UCSF’s Dr. Geoffrey Manley, professor and vice chair of neurosurgery.