Drug Safety Matters

Uppsala Reports Long Reads – Weeding out duplicates to better detect side effects

August 26, 2024

Duplicate reports are a big problem when it comes to signal detection, but with the help of machine learning and new ways of comparing reports, we may more effectively detect them. 

This episode is part of the Uppsala Reports Long Reads series – the most topical stories from UMC’s pharmacovigilance news site, brought to you in audio format. Find the original article here.

After the read, we speak to author Jim Barrett, Senior Data Scientist at UMC, to learn more about the duplicate detection algorithm and UMC’s work to develop AI resources for pharmacovigilance.

Tune in to find out:

  • How the new algorithm handles duplicates in VigiBase
  • About different approaches for developing algorithms
  • Why it can be challenging to evaluate the performance of an algorithm


Want to know more?

Finally, don't forget to subscribe to the monthly Uppsala Reports newsletter for free regular updates from the world of pharmacovigilance.

Join the conversation on social media
Follow us on X, LinkedIn, or Facebook and share your thoughts about the show with the hashtag #DrugSafetyMatters.

Got a story to share?
We’re always looking for new content and interesting people to interview. If you have a great idea for a show, get in touch!

About UMC
Read more about Uppsala Monitoring Centre and how we work to advance medicines safety.

Fredrik Brounéus:

Duplicate reports are a big problem when it comes to signal detection, but with the help of machine learning and new ways of comparing reports, we may more effectively detect them. My name is Fredrik Brounéus and this is Drug Safety Matters, a podcast by Uppsala Monitoring Centre, where we explore current issues in pharmacovigilance and patient safety. This episode is part of the Uppsala Reports Long Reads series, where we select the most topical stories from our news site, Uppsala Reports, and bring them to you in audio format. Today's article is "Weeding out duplicates to better detect side effects", written by Jim Barrett, senior data scientist at Uppsala Monitoring Centre, and published online in April 2024. After the read, I sit down with Jim to learn more about duplicate detection and other ways that we can use artificial intelligence in pharmacovigilance. So make sure you stay tuned till the end. But first, let's hear the article, read by Jim Barrett.

Jim Barrett:

VigiBase is fast approaching 40 million reports of adverse events following drugs and vaccines, with no indication of its growth slowing down. So far in 2024, VigiBase has received on average about 50,000 new reports per week. The sheer size of VigiBase makes it an amazing resource for pharmacovigilance. However, a natural consequence of this high rate of reporting is that we can sometimes get more than one report in VigiBase about the same adverse event in the same patient. There are many ways this can happen. Sometimes there are multiple reporters of the same event, or a single patient may report to multiple places. Another possibility is that follow-up information is mistakenly not linked to the original report.

Jim Barrett:

Duplicate reports pose several problems for pharmacovigilance. A key example arises when doing statistical signal detection, which is when we try to identify the adverse events that are happening more frequently in combination with a drug than we would expect to see by chance. Imagine we have a set of adverse event reports for a drug, and that, given the background rates of headache reporting, we would expect 10 of them to mention headache by chance. Then imagine that for each of the patients who experienced a headache, VigiBase had received two independent reports of their adverse event. Suddenly, this combination looks like it's happening twice as often as we would expect. This might lead us to investigate the combination as a potential safety signal, wasting valuable time that could be spent investigating other potential signals. Clearly, it would be better to remove duplicate reports from the database before we do our statistical analyses. For VigiBase, this task is impossible to do manually due to the large number of reports it receives daily, so it becomes necessary to come up with an algorithm to do it for us. This is a more challenging problem than it sounds. Just because two reports are duplicates of one another doesn't mean that they look identical. Different reports might use different terms to describe the same adverse event, or they might include more or less information about the patient. Conversely, two reports may not contain enough information to reliably decide whether they are duplicates or not.
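To make the arithmetic of that example concrete, here is a minimal sketch of how duplicates inflate a simple observed-to-expected comparison. The numbers, the background rate, and the observed_to_expected() helper are assumptions for this illustration, not UMC's actual disproportionality method.

```python
# Illustrative sketch only: how duplicate reports inflate a simple
# observed-to-expected comparison.

def observed_to_expected(n_drug_reports: int,
                         n_drug_and_event: int,
                         background_event_rate: float) -> float:
    """Ratio of observed drug-event reports to the count expected by chance."""
    expected = n_drug_reports * background_event_rate
    return n_drug_and_event / expected

background_rate = 0.01  # say headache appears on 1% of all reports in the database

# 1,000 reports for the drug, 10 of which mention headache: exactly as expected.
print(observed_to_expected(1_000, 10, background_rate))   # 1.0

# If each headache case had been reported twice, we would see 20 mentions
# (and 1,010 reports in total), and the combination suddenly looks roughly
# twice as common as expected.
print(observed_to_expected(1_010, 20, background_rate))   # ~1.98
```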

Jim Barrett:

Previous efforts to detect duplicates have focused on probabilities, comparing the likelihood of a specific combination of drugs, reactions, sexes, ages and so on occurring on a given pair of reports, based on the background reporting rates derived from VigiBase. If it seems too unlikely to have occurred by chance, then we suspect they're duplicates. This approach has been used with great success by Uppsala Monitoring Centre for several years. However, methods like these can run into problems, especially in databases as large and diverse as VigiBase. One place where previous approaches are known to perform poorly is with reports of adverse events following vaccinations. Consider the vaccine against human papillomavirus. Most vaccine recipients are going to be girls around the same age, with many patients being vaccinated on the same day. If you have two HPV vaccine reports and both report the same sex, age, date of vaccination and adverse event, this may still not be sufficient evidence to suspect them of being duplicates. These challenges have made duplicate detection among vaccine reports unreliable.
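A toy illustration of that probabilistic idea, and not the actual VigiMatch model: each field on which two reports agree adds "surprise" (the negative log of the chance of agreeing by accident), and a high total score means the pair is too similar to be a coincidence. The field names, probabilities, and threshold below are invented for this sketch.

```python
import math

# Background probability that two random reports agree on a field, assumed here
# to come from database-wide reporting rates.
CHANCE_AGREEMENT = {
    "drug": 0.002,        # agreeing on one specific drug is rare
    "reaction": 0.01,     # agreeing on one specific adverse event term
    "sex": 0.5,           # agreeing on sex happens half the time by chance
    "age_group": 0.1,
    "onset_date": 0.003,
}

def match_score(report_a: dict, report_b: dict) -> float:
    """Sum of -log(chance agreement) over every field the two reports share."""
    score = 0.0
    for field, p_chance in CHANCE_AGREEMENT.items():
        if field in report_a and report_a.get(field) == report_b.get(field):
            score += -math.log(p_chance)   # rarer agreements count for more
    return score

a = {"drug": "drug X", "reaction": "rash", "sex": "F",
     "age_group": "40-49", "onset_date": "2024-03-01"}
b = dict(a)  # an identical report, e.g. the same case entered twice

SUSPECT_THRESHOLD = 15.0  # arbitrary cut-off for the illustration
print(match_score(a, b), match_score(a, b) > SUSPECT_THRESHOLD)  # ~19.6 True
```

Notice how agreeing on sex contributes almost nothing to the score; that is exactly why HPV vaccine reports, where most patients share sex, age group and often vaccination date, are so hard to deduplicate with this kind of method.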

Jim Barrett:

Over the past two years, researchers at UMC have been working on a new algorithm for duplicate detection for both drug and vaccine reports.

Jim Barrett:

It builds upon the strengths of earlier approaches, but also implements new methods for comparing pairs of reports. For example, we use a new way of capturing any date information mentioned on the report, from drug administration periods, to the start and end dates of the drug, to dates contained in the free-text narrative. We use this date information to determine whether the timelines described in the reports are compatible. If they are, the reports are more likely to be duplicates. If they aren't, then they may be separate reports. The method also uses machine learning to learn how to effectively weigh evidence from different parts of the reports when deciding whether to suspect a pair of being duplicates. In all our tests, this new approach works as well as, or better than, previous approaches for both drugs and vaccines. Effective duplicate detection is just one cog in the machine of pharmacovigilance, but once the new method is in place, pharmacovigilance practitioners worldwide will have a sharper tool to find true safety signals, ultimately improving patient safety.
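A minimal sketch of those two ingredients, under assumed data structures: a timeline-compatibility check over whatever dates a report mentions, and a learned model (a plain logistic regression stand-in) that weighs that evidence together with other comparison features. None of this is the actual VigiMatch implementation.

```python
from sklearn.linear_model import LogisticRegression

def timelines_compatible(dates_a, dates_b, tolerance_days=7):
    """True if the date ranges spanned by two reports overlap (within a tolerance)."""
    if not dates_a or not dates_b:
        return True  # no date evidence either way
    start_a, end_a = min(dates_a), max(dates_a)
    start_b, end_b = min(dates_b), max(dates_b)
    return ((start_a - end_b).days <= tolerance_days and
            (start_b - end_a).days <= tolerance_days)

def pair_features(a, b):
    """Turn a pair of reports into a vector of comparison features."""
    return [
        float(a["drug"] == b["drug"]),
        float(a["reaction"] == b["reaction"]),
        float(timelines_compatible(a["dates"], b["dates"])),
    ]

# A classifier trained on labelled duplicate / non-duplicate pairs learns how
# much weight each piece of evidence deserves:
model = LogisticRegression()
# model.fit(X_train, y_train)                                   # labelled pairs
# p_duplicate = model.predict_proba([pair_features(a, b)])[0, 1]
```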

Fredrik Brounéus:

That was Jim Barrett, senior data scientist at Uppsala Monitoring Centre, reading his article "Weeding out duplicates to better detect side effects", and he's with me here in the studio now. Welcome back to the show, Jim.

Jim Barrett:

Thanks for having me, Fredrik. Good to be back.

Fredrik Brounéus:

It's been a couple of years, but listeners interested in AI will remember the last time you were here to tell us about an algorithm that you and your colleagues had developed to improve signal detection in VigiBase.

Jim Barrett:

Yes, exactly. So, I was here a couple of years ago speaking about a method called VigiGroup, which is a kind of new way of doing signal detection by grouping together similar reports. I think at that time we were specifically talking about how we would use that method for the COVID vaccine rollout, looking for new and unknown side effects during that period.

Fredrik Brounéus:

Right, and today we're going to talk about another algorithm, one that's specifically designed to detect duplicate reports in VigiBase. And you said VigiGroup; does this one have a name yet?

Jim Barrett:

It does. So we have an existing algorithm that has been in use for several years at UMC called VigiMatch, which is designed to solve the same problem of detecting duplicates, and I think we're going to continue with that brand and name this new algorithm an improved VigiMatch, or just keep calling it VigiMatch.

Fredrik Brounéus:

In the media, I mean, we often see terms such as AI, algorithms and machine learning used almost interchangeably, and I was wondering whether you could perhaps give us just a quick rundown of the meaning of these concepts, because they're not completely the same, are they?

Jim Barrett:

No, they're not completely the same. I mean, I think it's kind of funny with AI specifically, it's very difficult to chase down a real concrete definition. It feels like if you put three data scientists in a room you'd come away with five definitions of AI. So for me personally, I like a definition that some of my colleagues have adopted, which is that AI is a branch of computer science that involves the ability of a machine to emulate aspects of human behavior and to deal with tasks that are normally regarded as primarily proceeding from human cerebral activity.

Jim Barrett:

This is a definition first put forward by Jeffrey Aronson a few years ago. So I quite like that definition for AI, but it is necessarily quite a broad one. Moving on to machine learning, I would call machine learning a class of algorithms that basically learn from data, so you don't necessarily have any hard-coded knowledge in them; they instead learn by example. And then "algorithm" is an even broader term, I would say: you would class an algorithm as any set of instructions to follow to achieve a certain task.

Fredrik Brounéus:

How do we go about developing algorithms here at UMC? Do we build them from scratch, or do we start from models created by other actors, such as the ones we hear about from OpenAI and ChatGPT, and then tweak them for our own specific needs?

Jim Barrett:

Yeah, so we work on quite a wide, diverse set of problems within research and data science at UMC, so I would say the answer to this question varies a lot depending on the problem we're working on. For example, we do a lot of work in the area of NLP (natural language processing), which is learning from and inferring things from free text or natural language, and in those cases we typically take models off the shelf and then tweak them to our use case. Or, as you mentioned OpenAI, we've been doing some work and investigation into using OpenAI's GPT models for certain tasks. But then for other tasks, such as VigiMatch, the precursor to the new algorithm I described in the article was developed completely in-house, and the improvements I've made on it have largely been about figuring out how to best represent features on the reports to compare them to one another. So a lot of the work has been developed from scratch, in-house, in that instance.

Fredrik Brounéus:

And then the next question is: how do we evaluate the performance of these algorithms once we have developed them?

Jim Barrett:

Yeah, absolutely. I mean, this is an enormous problem, an enormous topic. We could probably do three more podcast episodes just on this. As it happens, I was recently in San Diego at DIA Global, the Drug Information Association's global conference, where many drug manufacturers, developers and regulators meet to discuss current topics, and I was chairing a session on exactly this problem: how do we evaluate AI solutions in the context of pharmacovigilance? And it's very much not an easy problem. I think we all came away from that session with more questions than answers. But we can talk specifically in the context of VigiMatch.

Jim Barrett:

So one of the real difficulties with VigiMatch is that duplicates are very rare.

Jim Barrett:

If you were to just pick two random reports from VigiBase, you would expect them to be a duplicate pair about one time in 250 million.

Jim Barrett:

And the issue with this is that when you're trying to generate a number of examples of true duplicates, so that you can test whether your algorithm is successfully finding them, your data set is necessarily going to be biased, because you can't just sit there and randomly label billions of pairs of reports in the hope of finding a few duplicates.

Jim Barrett:

So this presents a significant challenge, and evaluation becomes an exercise more in understanding and correcting for the biases in how you built your test data than anything else. Another significant challenge we have faced, and one that has faced VigiMatch for quite a long time, is that VigiBase is extremely diverse. It's a global database with over 150 contributing countries at this point, and not all countries have exactly the same pharmacovigilance landscape. They don't necessarily have the same standards or best practices of reporting, so we sometimes see that the reporting patterns in some countries can be significantly different from those in others. Making sure that the algorithm performs well in all settings, and not just in the most common setting, also presents a significant challenge. And the reality of it is that we just have to roll our sleeves up, go in and really verify the algorithm, and look at real examples of where it's succeeding and where it's failing in all of these cases to get a good sense of how well it's performing.
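A back-of-envelope illustration of that rarity, using the one-in-250-million figure quoted above; the classifier numbers are hypothetical and only meant to show why randomly labelled pairs cannot form an evaluation set.

```python
prevalence = 1 / 250_000_000   # chance that a random pair of reports is a duplicate

# Hand-labelling even a million random pairs would almost certainly find
# no true duplicates at all:
pairs_labelled = 1_000_000
print(pairs_labelled * prevalence)   # 0.004 expected duplicates

# And a hypothetical classifier with excellent specificity, applied naively
# to random pairs, would still drown its true positives in false alarms:
sensitivity, specificity = 0.95, 0.9999
p_true_flag = sensitivity * prevalence
p_false_flag = (1 - specificity) * (1 - prevalence)
precision = p_true_flag / (p_true_flag + p_false_flag)
print(precision)   # ~4e-05: almost every flagged pair would be a false alarm
```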

Fredrik Brounéus:

So, just to get an idea: approximately how many algorithms are we talking about, all in all, that we have developed here at UMC?

Jim Barrett:

Yeah, I was trying to count these on my way into work this morning. I mean, it's certainly in the tens. It's a difficult thing to quantify, I would say. As I mentioned earlier, we work on a diverse set of problems, and there are many problems within pharmacovigilance that yield, at least in part, to data science techniques, so we have approaches to many of these. But yeah, I would say it's a difficult thing to count.

Fredrik Brounéus:

You mentioned in your article that VigiBase is now approaching 40 million reports of adverse drug events and, speaking of numbers that are difficult to count, how many of those may be duplicates?

Jim Barrett:

I mean, this is another extremely challenging thing to count, and it's something that I would really like to take another stab at estimating properly; once we've published the new VigiMatch method, we'll be in a better place to quantify it. The classical estimate is that, roughly speaking, around one in 10 reports will have a detectable duplicate somewhere in the database. The true rate of duplication is extremely difficult to measure, especially since, as I mentioned in the article, sometimes you have a pair of reports that simply don't contain enough information to be assessed as duplicates, even though they may truly be duplicates. Moreover, you have different factors affecting duplication.

Jim Barrett:

If a patient has suffered a serious or fatal adverse event, then that may well motivate more people to report it, or stimulate a greater deal of reporting, so duplication is not necessarily uniform across different adverse events. This is definitely a study that I would very much like to do, to try and get a better handle on this number. I would say we don't really know, but I think this one-in-10 number is roughly correct, or at least not a bad estimate.

Fredrik Brounéus:

That's a fair amount of duplicates.

Jim Barrett:

It's a fair amount of duplicates, yes.

Fredrik Brounéus:

But let's say, then, that our algorithm has helped us identify duplicate reports; how do we then decide which report to keep? We are talking about weeding out duplicates here, so which do we keep and which do we weed out? Because my guess is that the reports, although they are about the same case, may differ significantly, for instance with regard to their level of detail and perhaps how useful they are to us.

Fredrik Brounéus:

So what do we keep? What do we weed out?

Jim Barrett:

Yeah, absolutely. So, you're completely right. The way we choose which report is the "preferred report", as we call it, or the kind of canonical report, is with an algorithm that was developed some time ago to quantify how complete a report is. It's called VigiGrade, and it takes into account various aspects, like whether dates have been reported, dosages, whether there's free-text information, and things like this. So if you have a set of duplicates, you choose the most complete report among them. If you then find that you have several reports which are equally complete, we go with the one that has the most recent update in VigiBase.
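A minimal sketch of the selection rule just described: keep the most complete report in a cluster of duplicates, breaking ties by the most recent update. The completeness scores below are placeholders, not real VigiGrade output.

```python
from datetime import date

reports = [
    {"id": "A", "completeness": 0.62, "last_updated": date(2023, 5, 1)},
    {"id": "B", "completeness": 0.85, "last_updated": date(2022, 11, 3)},
    {"id": "C", "completeness": 0.85, "last_updated": date(2024, 2, 14)},
]

def preferred_report(cluster):
    """Most complete report wins; equally complete reports fall back to latest update."""
    return max(cluster, key=lambda r: (r["completeness"], r["last_updated"]))

print(preferred_report(reports)["id"])   # "C": tied on completeness, newest update
```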

Fredrik Brounéus:

Could there ever be a case where a less complete report would have more interesting or valuable information than the more complete report?

Jim Barrett:

Yeah, I mean that's definitely a possibility and I think this speaks to a point which is nice to raise actually, which is that duplicate detection can kind of be used in several different settings, right?

Jim Barrett:

So I described in my article that duplicates can be a big problem for statistical signal detection, where you're typically just looking at the drugs and events on the reports, and in that instance it doesn't really matter which one is more complete; you only care that the drugs and adverse events are there. But if you are a signal assessor sitting with your case series of 100 reports and you're deduplicating that, then typically the way we use it in practice is that we don't delete and hide the duplicates from you. We instead flag them and say, these are the ones that are duplicates, so that in the case you mentioned, when there are multiple reports referring to the same case and maybe they contain different information, the signal assessor can look at those and make an informed judgment.

Fredrik Brounéus:

Towards the end of your article, you write that effective duplicate detection is just one cog in the machine of pharmacovigilance.

Fredrik Brounéus:

What other parts of the pharmacovigilance machine are we using AI or machine learning for? You've already told us about the VigiGroup algorithm, and now also this other algorithm for assessing how complete a report is. But do you have any other examples for us?

Jim Barrett:

Sure.

Jim Barrett:

So one of the problems that I've been working on quite a bit, together with a couple of master's students over the last couple of years, has been to try and extract information from product labels.

Jim Barrett:

So a product label is a document published when a drug is authorized in a certain market, describing all sorts of things: guidelines for how to use the drug, adverse events known from the clinical trials or from post-marketing surveillance, and so on. Typically, these are just free-text documents.

Jim Barrett:

They're just published as a PDF or a Word document on the website of the regulatory authority. It turns out that it's very useful to know which adverse events are already known for a drug, and these tend to be listed only in these documents. The reason it's important to know this is that when you're doing signal detection, at UMC or wherever you are, you don't want to waste your time looking at what's already known; you want to find the things that are hurting people which are not yet known. So we've been using AI and machine learning techniques to mine the natural language in these documents and extract all of the known adverse events for a given drug, so that we can then use that information to prioritize which combinations are looked at by signal assessors downstream.
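A naive sketch of that label-mining idea: check which adverse event terms from a vocabulary are mentioned in a label's free text, then use that list to deprioritize drug-event combinations that are already known. A real system would use NLP models and a proper terminology such as MedDRA; the tiny vocabulary and the string matching here are stand-ins.

```python
import re

ADVERSE_EVENT_TERMS = ["headache", "nausea", "rash", "anaphylaxis"]  # mock vocabulary

def known_events_in_label(label_text: str) -> set[str]:
    """Return the vocabulary terms that appear in the label text."""
    return {
        term for term in ADVERSE_EVENT_TERMS
        if re.search(rf"\b{re.escape(term)}\b", label_text, flags=re.IGNORECASE)
    }

label_text = ("Common side effects reported in clinical trials include "
              "headache and nausea.")
already_known = known_events_in_label(label_text)

# Downstream, signal detection can skip or down-weight labelled combinations:
candidate_events = ["headache", "rash"]
to_review = [event for event in candidate_events if event not in already_known]
print(to_review)   # ['rash']
```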

Fredrik Brounéus:

Looking at how fast things are moving in these areas, where do you think we will be in, say, two years' time, with regard to both how the community is using this technology and how we at UMC are using it? What do you think?

Jim Barrett:

So I think, I mean, the elephant in the room to a certain extent is these large language models that have really taken the world by storm in the last year or two. And I think the next one or two years are going to see a real explosion in their use within the context of pharmacovigilance. We've already been experimenting with them, both for extracting information from these product labels, as I mentioned earlier, but also for summarizing or making inferences about case reports, looking for the cases which are suggestive of a possible causal relationship. But I think the whole community is still trying to understand how best to use these models and also, critically, how to use them effectively and safely, because we all know that these models hallucinate a lot. So I would say, yeah, in the next couple of years, that's where I imagine the biggest shift is going to come from: grappling with and beginning to understand how to really leverage these large language models in the context of pharmacovigilance.

Fredrik Brounéus:

You mentioned hallucinations there, and as we head into this future, are there any specific pitfalls you think we need to be particularly mindful of?

Jim Barrett:

Yes. Because these models appear to be so good when we use them, I think there can be a tendency to overtrust them. So building systems, either for evaluation or simply as safety nets in any practical implementation, to avoid that kind of cognitive bias of blindly trusting them is going to be extremely important going forward.

Fredrik Brounéus:

Thank you very much, Jim. And to finish off, do you have a dream algorithm, you know, something that you would like to pursue given unlimited resources?

Jim Barrett:

Yeah, given unlimited resources. So, this is a very, very preliminary experiment we were running recently at UMC, but what we were playing around with was using large language models to take a case series and perform a signal assessment, in a sense. Basically, what the algorithm would do is go through and look for different pieces of evidence in each report and then summarize that at the end, saying, for example, there is evidence from five out of the 10 reports for a dechallenge, that is, the reaction stopping after the drug was discontinued, or other pieces of information like this, following the Bradford Hill criteria for case series assessment in pharmacovigilance. And, you know, given unlimited resources and unlimited research time, I think there's a lot of promise in this approach for bringing forward and highlighting the most suggestive case series for our human signal assessors to look at in more depth. Yeah, I think it's a really promising and exciting area, but it would require unlimited resources, I think, to pull it off. It would be expensive.
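A very rough sketch of that idea: ask a large language model, report by report, about specific pieces of causality evidence (dechallenge, rechallenge, plausible timing) and tally the answers across the case series. The questions, the prompt wording, and the call_llm() helper are all hypothetical placeholders, not an existing UMC system.

```python
EVIDENCE_QUESTIONS = {
    "dechallenge": "Did the reaction stop or improve after the drug was withdrawn?",
    "rechallenge": "Did the reaction reappear when the drug was reintroduced?",
    "plausible_timing": "Is the time between drug intake and reaction onset plausible?",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API, assumed to answer 'yes', 'no' or 'unclear'."""
    raise NotImplementedError

def assess_case_series(report_narratives):
    """Count, per evidence type, how many reports the model judges supportive."""
    tally = {name: 0 for name in EVIDENCE_QUESTIONS}
    for narrative in report_narratives:
        for name, question in EVIDENCE_QUESTIONS.items():
            prompt = f"Report: {narrative}\n\nQuestion: {question}\nAnswer yes, no or unclear."
            if call_llm(prompt).strip().lower().startswith("yes"):
                tally[name] += 1
    return tally

# The resulting summary, e.g. "dechallenge supported in 5 of 10 reports", would
# be surfaced to a human signal assessor rather than acted on automatically.
```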

Fredrik Brounéus:

Thank you very much for coming to the show, Jim. I learned a lot today and look forward to having you here again someday soon to hear more about what you're working on.

Jim Barrett:

Thanks for having me.

Fredrik Brounéus:

If you'd like to know more about artificial intelligence in pharmacovigilance, check out the episode show notes for useful links. That's all for now, but we'll be back soon with more long reads, as well as our usual in-depth conversations with medicines safety experts. In the meantime, we'd love to hear from you. Reach out on Facebook, LinkedIn and X, send comments or suggestions for the show, or questions for our guests the next time we open up for that, and visit our website to learn more about what we do to promote safer use of medicines and vaccines for everyone, everywhere. If you like the podcast, please subscribe to make sure you won't miss an episode, and spread the word so other listeners can find us too. For Drug Safety Matters, I'm Fredrik Brounéus. Thanks for listening.
