Stephan Kadauke, MD, PhD, is an Assistant Professor of Clinical Pathology and Lab Medicine. He runs the Cell and Gene Therapy Lab at the Children’s Hospital of Philadelphia (CHOP) and also sees patients on the Apheresis service. As the Lead of the Cell and Gene Therapy Data Operations team, he has implemented several apps that are being used internally for clinical operations. He co-founded the CHOP R User Group, and he is the chair of the R/Medicine Organizing Committee.
R Consortium talks to Stephan Kadauke about racial bias in algorithms. We also talk about how an overarching collaboration of R/Medicine and R in Pharma may help solve some difficult problems facing those in healthcare who use the R ecosystem.
RC: What is the R community like for R/Medicine?
The odd thing is that we are not all R programmers and we are not all health care professionals. After the R/Medicine 2020 conference, we looked at the demographics of our attendees and found out that 16% were practicing physicians. So a large contingent of our community actually takes care of patients. It’s also pretty international! Because of COVID, we switched from in-person to virtual in 2020. The previous two years, we had the R/Medicine conference at Yale (in New Haven, CT), then in Boston. We had about 100 people attend each of these. In 2020 we went virtual and grew five-fold, had participants from 43 different countries, and one third of attendees were international. R/Medicine went global.
RC: How has COVID affected your ability to connect with members?
Things have changed and for better or worse they aren’t going back. We can’t have an exclusively US-centric conference anymore. I missed the interpersonal connections of a real live in-person event, and we did have virtual Birds-of-a-Feather sessions in Zoom breakout rooms, but there is only so much you can do to try to replicate the in-person experience. For our next event, we plan on using a platform that helps with these interactions, but nothing approximates the in-person experience. Of course, we don’t want to do this in a way that loses the virtual experience or the global community. This is a very difficult situation and one we haven’t resolved. Once the COVID pandemic is history, it would be really cool to do a hybrid, distributed conference with lots of small watch parties in various places, so that the event is both virtual and in-person around the globe. I don’t know if we will pull it off but that would be awesome.
RC: In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?
For R/Medicine 2020 we used Crowdcast. I think Crowdcast is awesome. Compared to some other programs, Crowdcast is more limited because you don’t have an exhibit hall, multitrack, poster board, and those types of things. Instead, everyone is dropped into a single-track video stream with a chat window on the side. By removing degrees of freedom, there are fewer user experience possibilities to consider for the org committee, and it also tends to lead to a more cohesive feeling for the participants. I was a big fan of using Crowdcast and we will use it again this year. For the Birds-of-a-Feather sessions, we used Zoom, which worked OK. For 2021, we’re looking into an alternative platform that does a better job allowing the kinds of chance or planned encounters and interactions that happen at a real live conference. Behind-the-scenes communication is super important too. We used a closed Slack channel, which was efficient for conference planning and putting out fires when necessary. Much of the planning relied on Google Docs. This stack, I think, is pretty common in planning online conferences these days.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
Racial bias in medicine is such an important issue, especially when you’re using the electronic health record to feed predictive models that will have some kind of downstream effect on patient care. Both of our keynotes for R/Medicine 2021 will discuss this topic. There’s a paper in Science that looked at a widely used algorithm that is supposed to identify patients who are at high risk for disease and need extra care. They found that the algorithm disproportionately selects White patients for extra help, and if you’re Black you have to be much sicker on average to be selected. This is because the algorithm used health costs as a proxy for health needs, and because less money is spent on Black patients, it falsely concludes that Black patients are healthier than equally sick White patients. This is insane, and probably just the tip of the iceberg. But algorithmic equity really is a hard problem! For example, do you want your algorithm to consider a person’s race or not? It may seem to make sense that you want it to be color-blind, but then you can’t correct for biases embedded in the training data. Another issue is the trade-off between accuracy and fairness. Predictive algorithms nowadays exclusively optimize for predictive accuracy. How can we make them consider measures of fairness? This is something that Michael Kearns discusses in his book The Ethical Algorithm but something that we aren’t talking about in mainstream AI yet. You can mathematically define equity, and you can design your algorithm so it optimizes for a specific trade-off between accuracy and equity. We need to talk about this. I want people to get more fired up about this. I think the R community is one of the wokest communities. We should be the leaders in ethical AI! When will tidymodels support socially aware modeling? There are so many fields where algorithmic bias can screw with people’s lives.
Cathy O’Neil did an awesome job capturing this in her book Weapons of Math Destruction, which is both amazing and depressing at the same time… to see where machine learning algorithms are systematically discriminating against minorities in law enforcement, lending, and health care. Especially in health care where we “first do no harm”!
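Editor's illustration: the cost-as-proxy problem described in this answer can be made concrete with a small toy example. The sketch below is not the algorithm from the Science paper and uses no real data. It assumes a hypothetical cohort in which two groups have identical illness burdens, but group "B" incurs 30% less spending at the same level of illness. All names and numbers (the groups, the spending rates, the thresholds) are invented for illustration.

```python
from statistics import mean

# Hypothetical toy cohort: both groups contain patients with illness
# scores 1..10, so true health needs are identical across groups.
# Assumption for illustration only: group "B" incurs 30% less spending
# than group "A" at the same level of illness.
SPEND_PER_ILLNESS_UNIT = {"A": 1000, "B": 700}

patients = [
    {"group": g, "illness": i, "cost": i * SPEND_PER_ILLNESS_UNIT[g]}
    for g in ("A", "B")
    for i in range(1, 11)
]


def select(cohort, key, threshold):
    """Flag patients for extra care when the chosen measure meets the threshold."""
    return [p for p in cohort if p[key] >= threshold]


def summarize(selected):
    """Per group: how many patients were flagged, and how sick they are on average."""
    summary = {}
    for g in ("A", "B"):
        illness = [p["illness"] for p in selected if p["group"] == g]
        summary[g] = {
            "n_selected": len(illness),
            "mean_illness": mean(illness) if illness else None,
        }
    return summary


# Ranking on cost (the proxy) versus ranking on illness (the actual need).
by_cost = summarize(select(patients, "cost", 7000))
by_need = summarize(select(patients, "illness", 7))

print("selected by cost:", by_cost)
print("selected by need:", by_need)
```

In this toy cohort, selecting on cost flags four patients from group A but only one from group B, and that one patient is the sickest in the cohort. Selecting on illness directly flags the same number of patients from each group. The gap comes entirely from the choice of label, since race is never used as an input.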
RC: What trends do you see in R language affecting your organization over the next year?
How do you put R into production? How do you make clinical-grade R applications? There are some major efforts trying to establish standards and guidelines, both for software engineering and validation, and this will really help, but as of yet there isn’t a consensus. This year, we are trying to gather ideas from people in the field who are thinking about this professionally. We’re planning to have a fireside chat with folks who have made R-based applications that undergo regulatory scrutiny and folks who made frameworks for production-grade Shiny apps. Hopefully, we’ll eventually be able to offer a workshop to package some of these best practices into a teachable format. We hope this will lower the bar for creating good clinical software.
RC: When is your next event? Please give details!
R/Medicine 2021 will be held August 24-27. The org committee is working full steam on the program and logistics as we speak. The R Consortium is helping tons with that, as is the Linux Foundation. We will have two days of workshops and two days of presentations. This will be all virtual. Registration is open now, and it’s all-inclusive, with all workshops and videos, for $50. If you are an academic, it will be $25. All students and trainees can attend for $10. If this price point presents a financial hardship, we can talk about that as well. We are trying our best to keep the barrier low for attending. I’m biased, of course, but I think it’ll be an awesome event – we have some great workshops, keynotes, and sessions lined up!
RC: Of the Funded Projects by the R Consortium, do you have a favorite project? Why is it your favorite?
I’m a big fan of R Validation Hub! Validation is really important when it comes to clinical software – you really have to be able to show, to a reasonable degree, that what you think your software does is what it actually does. Another important part is software engineering for reliability, scale, and usability. When we think about best practices, we think about software engineering as well as validation, and how we bring those together is an important question.
RC: Of the Active Working Groups, which is your favorite? Why is it your favorite?
I am a big fan of R in Pharma. The R in Pharma conference last year was amazing and had tons of great workshops and speakers. Thematically, R/Medicine is really well aligned with R in Pharma. Of course not all of medicine is pharma, and arguably not all of pharma is medicine, but we do have a lot of overlap.
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
I would love to have an “R in Healthcare” project and working group to capture some of the synergies between the R/Medicine and R in Pharma communities. This working group could hammer out some of the issues that are shared between pharma and medicine, which include the question of how to build R-based apps that can withstand regulatory scrutiny as well as some of the important societal issues with algorithmic bias that we talked about earlier. I think the funding could also go to fill some of the gaps in the R ecosystem that healthcare researchers face – for example, building CONSORT diagrams with ggplot2, or creating high-level functionality to de-identify sensitive microdata. It would be great to have R Consortium funding for one or two engineers building open-source solutions here.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!