Is post-market surveillance of AI devices working?
Pre-market review only interrogates AI device performance at a single point in time, often in laboratory conditions. This is a common, and entirely valid, criticism of the regulatory process. The consensus amongst regulators and academics is therefore that the real-world safety of AI devices must be monitored post-market, in real time.
But is anyone actually doing it?
According to the FDA: “Post-market surveillance is the active, systematic, scientifically valid collection, analysis, and interpretation of data or other information about a marketed device. The data collected under a surveillance order help to address important public health questions on the safety and effectiveness of a device.”
This seems fairly straightforward and sensible, right? Everyone agrees we need to monitor AI devices post-market, and the FDA have provided a clear definition of what that involves. Adverse event reporting and recalls are only one part of such a multi-pronged post-market surveillance regime for AI devices, but they are a critical pillar. I have suspected for a while that, while there is a lot of talk about monitoring AI devices, there is very little action, so I decided to dig into the data.
Using the official FDA list of AI devices in combination with our free-to-search Hardian Regulatory Intelligence database (HaRi), which includes all medical devices listed by the FDA, Health Canada, TGA, MHRA and EUDAMED as well as all data from the MAUDE and Recall databases, I searched for adverse events and recalls for all devices considered to be ‘stand-alone AI software’. This encompasses a wide range of products on the market, from radiotherapy planning systems to cancer detection algorithms. The majority are, of course, radiology devices, since radiology is the most prevalent category for stand-alone AI.
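Readers who want to spot-check counts like these themselves can query the FDA's public openFDA API, which exposes MAUDE reports and device recalls. The sketch below only builds the query URLs for a given product code. The endpoint paths and the field names (`device.device_report_product_code` for adverse events, `product_code` for recalls) reflect my understanding of the openFDA schemas and are assumptions to verify against its documentation. This is not how HaRi works, and openFDA totals may not match the figures in this article exactly.

```python
from urllib.parse import urlencode

# Public openFDA device endpoints (assumed current; confirm against the openFDA docs).
OPENFDA_DEVICE_BASE = "https://api.fda.gov/device"


def openfda_query_urls(product_code: str) -> dict[str, str]:
    """Build openFDA query URLs for MAUDE adverse event reports and recalls
    filed under a single FDA product code (e.g. "QAS").

    limit=1 keeps each response small, since the total number of matching
    records is reported in the response's meta.results.total field.
    """
    code = product_code.strip().upper()
    # Field names are an assumption based on the openFDA schemas, not taken from HaRi.
    event_params = {"search": f'device.device_report_product_code:"{code}"', "limit": 1}
    recall_params = {"search": f'product_code:"{code}"', "limit": 1}
    return {
        "adverse_events": f"{OPENFDA_DEVICE_BASE}/event.json?{urlencode(event_params)}",
        "recalls": f"{OPENFDA_DEVICE_BASE}/recall.json?{urlencode(recall_params)}",
    }


for name, url in openfda_query_urls("QAS").items():
    print(name, url)
```

To turn a URL into a count, one could fetch it with `urllib.request.urlopen` and read `meta.results.total` from the JSON. As far as I understand, openFDA answers a query with no matches with an HTTP 404 error rather than a total of zero. A zero-report code may therefore surface as an error instead of a 0.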
The results, I think, may shock you.
First of all, a couple of definitions…
What is an adverse event?
The FDA’s Manufacturer and User Facility Device Experience (MAUDE) database contains Medical Device Reports (MDRs) of adverse events. The first interesting point to note is that, even though MAUDE explicitly labels reports as ‘adverse events’, there is no FDA guidance on exactly what an adverse event is with regard to medical devices. The FDA do provide a clear definition of ‘Serious Adverse Events’, which covers things like death, disability or injury caused by a product, but there is no legal definition of ‘non-serious’ or ‘general’ adverse events for medical devices. The FDA do, however, broadly define an adverse event across all medical products as any undesirable experience associated with the use of a product. (For drugs, there is a clearer definition under 21 CFR 314.80.)
Leaving this technical point aside, the MAUDE database contains MDRs from multiple sources (users, manufacturers and importers) about problems or defects with medical devices. In the US, adverse events involving a death or serious injury, and malfunctions that would be likely to cause or contribute to a death or serious injury if they were to recur, must be reported under 21 CFR Part 803, for obvious reasons. Depending on who is reporting and what happened, these reports go to the FDA, the manufacturer, or both. Additionally, individual health professionals, consumers and patients can voluntarily report observed or suspected adverse events via MedWatch (a rather clunky and outdated platform) if they wish to raise a safety issue.
In summary, while there is no strict legal definition of a general medical device adverse event from the FDA, hospitals, manufacturers and importers are required to report serious events, whereas reporting is only voluntary for individual health professionals, patients and consumers.
It is also important to note that, since anyone can voluntarily report anything they like, not every adverse event on MAUDE should be taken as genuine or significant. Not every complaint raised will be attributable to the device, and the FDA will only investigate if they feel a report has merit.
What is a device recall?
The FDA uses the term “recall” when a manufacturer takes a correction or removal action to address a problem with a medical device that violates FDA law.
A wide range of issues can prompt a medical device recall, from simple manufacturer updates that improve the safety of a device right through to the FDA issuing a mandatory recall for a device they believe is unsafe to remain on the market.
The FDA divides recalls into three distinct categories, and provides an open database for the public:
Class I: A situation where there is a reasonable chance that a product will cause serious health problems or death.
Class II: A situation where a product may cause a temporary or reversible health problem or where there is a slight chance that it will cause serious health problems or death.
Class III: A situation where a product is not likely to cause any health problem or injury.
Recalls can be considered roughly equivalent to Field Safety Corrective Actions in the UK and EU.
Analysing adverse events and recalls of FDA authorised AI/ML devices
To date, the FDA has authorised just over 1,500 medical devices that are labelled as incorporating or using AI/ML techniques in their mode of operation. All of these devices (and around 8 million more) appear on our regulatory database HaRi, alongside their US adverse events and recalls.
Below is a summary table of FDA product codes, associated device brand names and totals of adverse events and recalls. Note that the data only relates to safety reports from the US.
| Product Code | Product Code Description | Associated Device Names | Total Adverse Events | Total Recalls |
|---|---|---|---|---|
| MUJ | Radiation therapy treatment planning software | RayStation, Dose+ (1.0), Monaco RTP System (6.3) | 136 | 45 |
| PJA | Cardiovascular blood flow simulation software | FFRct, HeartFlow Analysis | 137 | 0 |
| LLZ | System, image processing, radiological | Merge PACS, UNiD Spine Analyzer, TOMTEC-ARENA, Ziostation2, Lung Vision, Cleerly Labs, Vitrea CT Brain Perfusion, Centricity Universal Viewer | 106 | 10 |
| MYN | Dental AI computer-aided detection software | Second Opinion, Overjet Caries Assist | 3 | 0 |
| QIH | Automated radiological image processing software | Brainlab Elements | 1 | 0 |
| QAQ | Predictive physiological monitoring software | Acumen Hypotension Prediction Index Feature Software | 1 | 0 |
| DPS | Electrocardiograph software analysis | Cardiologs Holter Platform | 1 | 0 |
| POK | Breast MRI computer-aided diagnosis (CADx) | QuantX | 1 | 0 |
| NDC | Continuous glucose/insulin dosing software | EndoTool SubQ | 1 | 0 |
| PIB | Autonomous retinal diagnostic AI software | AEYE-DS | 1 | 0 |
The most striking aspect of the above data is that, for the vast majority of AI/ML product codes, there is remarkably little adverse event and recall information. Most product codes have received only ONE adverse event report in their entire time on the market, bearing in mind that each product code covers multiple similar devices in a predicate chain.
Two of the three product codes with over 100 reports (MUJ and PJA) owe most of those reports to just a couple of manufacturers, who submitted many of them themselves. For example, the Swedish manufacturer of RayStation, a radiotherapy planning software, self-reported minor issues over 90 times, and all of them were ultimately resolved without any recalls. Similarly, the manufacturer of HeartFlow conducts regular internal reviews and reports the findings to the FDA, including potential false negatives. This is excellent practice and should be encouraged more widely among all manufacturers.
For the LLZ product code, it is more difficult to tease out a trend, since this code covers 160 different products, from simple post-processing software in PACS to complex brain perfusion analysis. Even allowing for this breadth, the total number of safety events is surprisingly low, at only 106 reported adverse events and 10 recalls.
The table below lists 10 further FDA product codes for stand-alone software that have NEVER received a single safety report from anyone. Together they cover 244 authorised devices currently on the US market.
Stand-alone software codes with ZERO adverse events or recalls (244 devices)
| Code | Description | No. of Devices |
|---|---|---|
| QAS | Radiological computer-assisted triage and notification software (CADt). | 83 |
| QDQ | Radiological computer-assisted detection/diagnosis software for lesions suspicious for cancer (CADe/CADx). | 32 |
| QVD | Radiological machine learning-based quantitative imaging software deployed with a pre-approved Predetermined Change Control Plan (PCCP). | 1 |
| QFM | Radiological computer-assisted prioritization software for clinical lesions. | 39 |
| QJU | Diagnostic image acquisition and/or automated optimization modules guided by AI. | 7 |
| QKB | Radiological image processing and modeling software explicitly designed for radiation therapy treatment planning. | 44 |
| QBS | Radiological computer-assisted detection/diagnosis software for structural fractures. | 9 |
| OEB | Specialized lung CT computer-aided detection software. | 9 |
| QNP | Gastrointestinal tract lesion software detection systems used in endoscopy workflows. | 18 |
| QPN | Deep learning and machine learning software algorithms used to assist pathologists in digital whole-slide image analysis. | 2 |
To put this in perspective, this final table covers virtually every category of complex stand-alone radiology AI, from triage to cancer detection, as well as AI for endoscopy and digital pathology. Not a single one of these 244 AI devices has ever received an adverse event report in the US, whether from the manufacturer, a user or the public, and none has had to be recalled for any issue. Which raises the question…
So, stand-alone AI devices are really safe?
Not quite.
Absence of evidence is not evidence of absence, after all.
There are, to my mind, three possible explanations for the complete absence of safety data for the vast majority of stand-alone AI devices:
1) No-one is actually using them - if a product is not in use, no adverse events can happen. I consider this unlikely given the reported rapid adoption of AI in radiology across the US in recent years. Ask any US radiologist and they will gladly tell you which algorithms they are using. They will also tell you all about how often those algorithms go wrong…
2) They are 100% safe - this might be the bull case, but again I think this is unlikely. Even an algorithm with 95% accuracy is expected to be wrong 5% of the time. Surely if a cancer detection algorithm misses a cancer, that should be considered a mandatory reporting event, since the patient could come to serious harm or even death?
3) Neither manufacturers nor users are reporting adverse events - this, I think, is the correct answer. Ask any radiologist if they have ever seen an AI make a mistake, and they will tell you immediately ‘yes’. Ask them if they have ever reported an adverse event to the FDA… my bet, based on the above data, is that they haven’t. The reporting process is not widely known and takes time and effort. Similarly, the manufacturers of these AI products are clearly not taking reporting seriously, given that none of them has reported a single issue, even though algorithmic drift, false positives and negatives, bias and other causes of AI error are all widely reported in the academic literature.
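To make the arithmetic in point 2 concrete, the expected number of errors scales with usage. Even a highly accurate algorithm should therefore produce a steady stream of mistakes once it is deployed widely. The deployment volume below is hypothetical and chosen only for illustration. Treating ‘accuracy’ as a single, fixed per-case error rate is also a simplification, since real performance is usually described by sensitivity and specificity.

```python
def expected_errors(cases_per_year: int, accuracy: float) -> float:
    """Expected number of incorrect outputs per year, treating accuracy as a
    fixed per-case probability of being correct (a simplification)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return cases_per_year * (1.0 - accuracy)


# Hypothetical deployment: 20 sites, each running 10,000 studies a year.
annual_cases = 20 * 10_000
for acc in (0.95, 0.99, 0.999):
    print(f"accuracy {acc:.1%}: ~{expected_errors(annual_cases, acc):,.0f} errors/year")
# accuracy 95.0%: ~10,000 errors/year
# accuracy 99.0%: ~2,000 errors/year
# accuracy 99.9%: ~200 errors/year
```

Not every error harms a patient, and not every one is reportable. Even so, a zero in the reporting tables sits against an expected error count in the hundreds or thousands for a single widely used product.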
What should be done?
Regulators like the FDA cannot do anything to improve patient safety if they have no data on which to act. Ideally they’d like to see trends over time that consistently tell a convincing story of harm, or potential for harm. In the almost complete absence of any active engagement by manufacturers and users in this aspect of post-market surveillance for AI, it becomes impossible for anyone to know how safe these devices really are, or how they tend to fail. This is a sad state of affairs given the huge excitement around these technologies.
We risk rapidly deploying, at scale, some of the most advanced technology ever developed while doing almost nothing to keep an eye on it and ensure it remains safe. This is the main point: we are not doing what we said we would do!
Active steps the entire AI ecosystem could take are as follows:
Manufacturers - conduct regular internal reviews and self-report potential for harm (see HeartFlow above as a good example). Not only is this best practice, it instantly makes you a more trustworthy vendor and will actually make your product better. Instead of pretending everything is perfect, be brave: admit issues and correct them. Also, consider adding a widget to your UI that lets users report issues in one click. If you can’t get feedback from users, you can’t improve your product.
Users - start reporting all those errors you keep complaining about! Nothing will change if no-one knows what’s going wrong. If you can’t be bothered to report every error, then at least set up internal review, regularly audit the AI devices within your institution, and report issues in aggregate. You have a duty to patient safety, after all. Remember, hospitals and other user facilities are legally required to report serious adverse events to the manufacturer. Educate yourselves on when reporting is required and how to do it.
Regulators - please make it easier for users to report issues. The MedWatch website is woefully clunky. Consider creating an API hook so that manufacturers can connect user-facing reporting widgets within their products directly to your adverse event reporting forms. Demand more from manufacturers. For instance, consider making periodic safety reporting mandatory, as it is in the EU.
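As a rough illustration of the users' option of internal audit and aggregate reporting, the sketch below tallies how often each AI device's output disagreed with the clinician's final read during a local audit. The record format and the device names are hypothetical. A real audit would also need an agreed reference standard, and each discordant case would need clinical review before anything is reported.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AuditRecord:
    """One audited case (hypothetical schema for a local audit log)."""
    device: str           # AI product name as deployed locally
    ai_positive: bool     # did the AI flag the finding?
    final_positive: bool  # clinician's final, reviewed read


def summarise_discordance(records: list[AuditRecord]) -> dict[str, dict[str, int]]:
    """Count, per device, audited cases plus AI false positives and false
    negatives relative to the final read."""
    summary = defaultdict(lambda: {"cases": 0, "false_pos": 0, "false_neg": 0})
    for r in records:
        s = summary[r.device]
        s["cases"] += 1
        if r.ai_positive and not r.final_positive:
            s["false_pos"] += 1  # AI raised a finding the final read rejected
        elif r.final_positive and not r.ai_positive:
            s["false_neg"] += 1  # AI missed a finding the final read confirmed
    return dict(summary)


# Hypothetical audit log; device names are invented for illustration.
records = [
    AuditRecord("TriageTool-A", True, True),
    AuditRecord("TriageTool-A", False, True),   # missed finding
    AuditRecord("TriageTool-A", True, False),   # false alarm
    AuditRecord("FractureAI-B", False, False),
]
for device, counts in summarise_discordance(records).items():
    print(device, counts)
# TriageTool-A {'cases': 3, 'false_pos': 1, 'false_neg': 1}
# FractureAI-B {'cases': 1, 'false_pos': 0, 'false_neg': 0}
```

Aggregate counts like these could feed a periodic report to the manufacturer. The individual missed findings are the cases to review for possible adverse event reporting.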
The final diagnosis
Right now, the post-market surveillance of medical AI looks less like a cutting-edge safety net and more like plausible deniability. While we have successfully deployed some of the most sophisticated algorithms in human history into mainstream clinical workflows, our mechanism for tracking their real-world failures boils down to hoping a busy doctor feels motivated enough to battle a dial-up-era website. This has to change.
The current reality is stark: if an AI misdiagnoses a patient and nobody fills out the paperwork, the algorithm remains, on paper, completely flawless. You may choose to believe that, or you may choose to hold us to what we collectively promised: robust monitoring of AI in the real world.
To use an analogy, no airline pilot on earth would fly their plane without documenting any and all issues that arise. Doctors, and the suppliers of their technology, must be prepared to do the same.
If we genuinely want AI to revolutionise healthcare, we have to stop treating post-market monitoring as an optional administrative chore. Until manufacturers build automated feedback loops, users start flagging errors, and regulators upgrade to 21st-century reporting infrastructure, those pristine zero-incident tables won't reflect perfect software. They will just reflect a system keeping its eyes wide shut.
YOU are part of the solution.
If you’ve been affected by the issues discussed, or are wondering how to better report issues with AI medical devices, please do get in touch.