By Samantha Toet, Derrick Kearney, Sydeaka Watson, Gwynn Sturdevant, Kevin O’Brien, and Joe Rickert
We’re trying something new and we want your support.
One of the goals of the R-Consortium Diversity and Inclusion project is for R Consortium-affiliated events to be more representative of the wider R community. As a result, we put together a community form for you to nominate your peers to speak at upcoming R Community events. This is a great way to promote the work that your colleagues are doing, and draw speakers from varying levels of expertise.
We are working with several R-related conferences and events that are seeking recommendations for knowledgeable and engaging speakers on topics that are of interest to the R Community. Our aim is to encourage speakers from diverse backgrounds to consider speaking at R events, and we would like to build a platform to bring these potential speakers to the attention of conference program committees.
About the R-Consortium R Community Diversity & Inclusion Project
The goal of the R Community Diversity and Inclusion Project (RCDI) is to broadly consider how the R Consortium can best encourage and support diversity and inclusion across a variety of events and platforms. Anyone is welcome to join our team, and you can find more information about joining here: https://github.com/RConsortium/RCDI-WG
Stephan Kadauke, MD, PhD, is an Assistant Professor of Clinical Pathology and Lab Medicine. He runs the Cell and Gene Therapy Lab at the Children’s Hospital of Philadelphia (CHOP) and also sees patients on the Apheresis service. As the Lead of the Cell and Gene Therapy Data Operations team, he has implemented several apps that are being used internally for clinical operations. He co-founded the CHOP R User Group, and he is the chair of the R/Medicine Organizing Committee.
R Consortium talks to Stephan Kadauke about racial bias in algorithms. We also talk about how an overarching collaboration of R/Medicine and R in Pharma may help solve some difficult problems facing those in healthcare who use the R ecosystem.
RC: What is the R community like for R/Medicine?
The odd thing is that we are not all R programmers and we are not all health care professionals. After the R/Medicine 2020 conference, we looked at the demographics of our attendees and found out that 16% were practicing physicians. So a large contingent of our community actually takes care of patients. It’s also pretty international! Because of COVID, we switched from in-person to virtual in 2020. The previous two years, we had the R/Medicine conference at Yale (in New Haven, CT), then in Boston. We had about 100 people attend both of these. In 2020 we went virtual and grew five-fold, had participants from 43 different countries, and 1/3 of attendees were international. R/Medicine went global.
RC: How has COVID affected your ability to connect with members?
Things have changed and for better or worse they aren’t going back. We can’t have an exclusively US-centric conference anymore. I missed the interpersonal connections of a real live in-person event, and we did have virtual Birds-of-a-Feather sessions in Zoom breakout rooms, but there is only so much you can do to try to replicate the in-person experience. For our next event, we plan on using a platform that helps with these interactions, but nothing approximates the in-person experience. Of course, we don’t want to do this in a way that loses the virtual experience or the global community. This is a very difficult situation and one we haven’t resolved. Once the COVID pandemic is history, it would be really cool to do a hybrid-distributive conference where there are lots of small watch parties in various places to have a virtual and in-person event globally. I don’t know if we will pull it off but that would be awesome.
RC: In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?
For R/Medicine 2020 we used Crowdcast. I think Crowdcast is awesome. Compared to some other programs, Crowdcast is more limited because you don’t have an exhibit hall, multitrack, poster board, and those types of things. Instead, everyone is dropped into a single-track video stream with a chat window on the side. By removing degrees of freedom, there are fewer user experience possibilities for the org committee to consider, and it also tends to lead to a more cohesive feeling for the participants. I was a big fan of using Crowdcast and we will use it again this year. For the Birds-of-a-Feather sessions, we used Zoom, which worked OK. For 2021, we’re looking into an alternative platform that does a better job of allowing the kinds of chance or planned encounters and interactions that happen at a real live conference. Behind-the-scenes communication is super important too. We used a closed Slack channel, which was efficient for conference planning and putting out fires when necessary. Much of the planning relied on Google Docs. I think this stack is pretty common in planning online conferences these days.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
Racial bias in medicine is such an important issue, especially when you’re using the electronic health record to feed predictive models that will have some kind of downstream effect on patient care. Both of our keynotes for R/Medicine 2021 will discuss this topic. There’s a paper in Science that looked at a widely used algorithm that is supposed to identify patients who are at high risk for disease and need extra care. They found that the algorithm disproportionately selects White patients for extra help, and if you’re Black you have to be much sicker on average to be selected. This is because the algorithm used health costs as a proxy for health needs, and because less money is spent on Black patients, it falsely concludes that Black patients are healthier than equally sick White patients. This is insane, and probably just the tip of the iceberg. But algorithmic equity really is a hard problem! For example, do you want your algorithm to consider a person’s race or not? It may seem to make sense that you want it to be color-blind, but then you can’t correct for biases embedded in the training data. Another issue is the trade-off between accuracy and fairness. Predictive algorithms nowadays exclusively optimize for predictive accuracy. How can we make them consider measures of fairness? This is something that Michael Kearns discusses in his book The Ethical Algorithm but something that we aren’t talking about in mainstream AI yet. You can mathematically define equity, and you can design your algorithm so it optimizes for a specific trade-off between accuracy and equity. We need to talk about this. I want people to get more fired up about this. I think the R community is one of the wokest communities. We should be the leaders in ethical AI! When will tidymodels support socially aware modeling? There are so many fields where algorithmic bias can screw with people’s lives.
Cathy O’Neil did an awesome job capturing this in her book Weapons of Math Destruction, which is both amazing and depressing at the same time… to see where machine learning algorithms are systematically discriminating against minorities in law enforcement, lending, and health care. Especially in health care where we “first do no harm”!
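The cost-as-proxy problem described above can be illustrated with a small sketch. This is Python with made-up numbers, not the algorithm or the data from the Science paper; the names `Patient`, `flag_by_cost`, and `flag_by_need`, the thresholds, and the assumed 40% spending gap are all hypothetical. Two groups have identical health needs, but less is spent on group B, so a cost threshold selects fewer of its members and only the sickest ones.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str       # hypothetical group label, "A" or "B"
    conditions: int  # number of chronic conditions (stand-in for true need)
    cost: float      # past health spending (the proxy the score relies on)


# Toy data: both groups have the same needs, but spending on group B is
# assumed to be 40% lower for the same number of conditions.
patients = []
for n in range(1, 7):
    patients.append(Patient("A", n, 1000.0 * n))
    patients.append(Patient("B", n, 600.0 * n))


def flag_by_cost(p, threshold=3500.0):
    """Select patients for extra care based on past spending."""
    return p.cost >= threshold


def flag_by_need(p, threshold=4):
    """Select patients for extra care based on health need directly."""
    return p.conditions >= threshold


def selection_rate(group, rule):
    members = [p for p in patients if p.group == group]
    return sum(rule(p) for p in members) / len(members)


def mean_need_of_selected(group, rule):
    chosen = [p.conditions for p in patients if p.group == group and rule(p)]
    return sum(chosen) / len(chosen) if chosen else float("nan")


# Difference in selection rates between groups (a demographic parity gap).
cost_gap = selection_rate("A", flag_by_cost) - selection_rate("B", flag_by_cost)
need_gap = selection_rate("A", flag_by_need) - selection_rate("B", flag_by_need)

print(f"selection-rate gap using cost: {cost_gap:.2f}")
print(f"selection-rate gap using need: {need_gap:.2f}")
print("mean conditions of selected A:", mean_need_of_selected("A", flag_by_cost))
print("mean conditions of selected B:", mean_need_of_selected("B", flag_by_cost))
```

The selection-rate gap is only one of several ways to define equity mathematically; which definition is appropriate, and how to trade it off against accuracy, is exactly the open question raised in the interview.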
RC: What trends do you see in R language affecting your organization over the next year?
How do you put R into production? How do you make clinical-grade R applications? There are some major efforts trying to establish standards and guidelines, both for software engineering and validation, and this will really help, but as of yet there isn’t a consensus. This year, we are trying to gather ideas from people in the field who are thinking about this professionally. We’re planning to have a fireside chat with folks who have made R based applications that undergo regulatory scrutiny and folks who made frameworks for production-grade Shiny apps. Hopefully, eventually we’ll be able to offer a workshop to package some of these best practices into a teachable format. We hope this will lower the bar for creating good clinical software.
RC: When is your next event? Please give details!
R/Medicine 2021 will be held August 24-27. The org committee is working full steam on the program and logistics as we speak. R Consortium is helping tons with that, as is the Linux Foundation. We will have two days of workshops and two days of presentations. This will be all virtual. Registration is open now, and it’s all-inclusive, with all workshops and videos, for $50. If you are an academic it will be $25. All students and trainees can attend for $10. If this price point presents a financial hardship, we can talk about that as well. We are trying our best to keep the barrier low for attending. I’m biased, of course, but I think it’ll be an awesome event – we have some great workshops, keynotes, and sessions lined up!
I’m a big fan of R Validation Hub! Validation is really important when it comes to clinical software – you really have to be able to show, to a reasonable degree, that what you think your software does is what it actually does. Another important part is software engineering for reliability, scale, and usability. When we think about best practices, we think about software engineering as well as validation, and how we bring those together is an important question.
I am a big fan of R in Pharma. The R in Pharma conference last year was amazing and had tons of great workshops and speakers. Thematically, R/Medicine is really well aligned with R in Pharma. Of course not all of medicine is pharma, and arguably not all of pharma is medicine, but we do have a lot of overlap.
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
I would love to have an “R in Healthcare” project and working group to capture some of the synergies between the R/Medicine and R in Pharma communities. This working group could hammer out some of the issues that are shared between pharma and medicine, which includes the question of how to build R based apps that can withstand regulatory scrutiny as well as some of the important societal issues with algorithmic bias that we talked about earlier. I think the funding could also go to fill some of the gaps in the R ecosystem that healthcare researchers face – for example, building CONSORT diagrams with ggplot2, or creating a high-level functionality to de-identify sensitive microdata. It would be great to have R Consortium funding for one or two engineers building open-source solutions here.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!
During this Working with Databases in R online presentation, Christopher Maronga shares his years of practical experience in accessing and working with databases in R. R Consortium assisted by providing access to Meetup.com Pro as a platform for information sharing.
John Mutiso, a statistician and member of Nairobi R, introduces the presentation on Working with Databases in R and points to the support of the R-Ladies Nairobi Organizers and NairobiR Organizers.
Christopher Maronga, a data manager, then shares his practically gained experience on how we can turn R into a powerful tool for accessing MySQL databases and writing SQL code, pulling and querying data from within the R environment.
Maronga structured this session to be mainly hands-on learning, using coding examples and implementations. In his presentation he teaches how to efficiently connect R to an RDBMS, query data stored in the RDBMS from R, connect to and export data from REDCap, and handle API security. He introduces what an RDBMS is and how it is used for storage and management of data. He then goes on to explain REDCap, a secure web application for building and maintaining online surveys and databases. Maronga then jumps into practical examples illustrated through the use of a local SQL database.
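The connect, query, and fetch workflow described above can be sketched against a local SQL database. The session itself used R and MySQL; this sketch instead uses Python's standard-library `sqlite3` module with an in-memory database so that it runs without a server. The `visits` table, its columns, and the `mean_weight_by_site` helper are hypothetical and are not taken from the talk.

```python
import sqlite3

# Connect to a throwaway in-memory database (stands in for a real RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (patient_id INTEGER, site TEXT, weight_kg REAL)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "north", 12.5), (2, "north", 14.0), (3, "south", 11.0)],
)
conn.commit()


def mean_weight_by_site(connection, min_weight):
    """Let the database do the filtering and aggregation, then fetch rows."""
    query = """
        SELECT site, COUNT(*) AS n, AVG(weight_kg) AS mean_weight
        FROM visits
        WHERE weight_kg >= ?
        GROUP BY site
        ORDER BY site
    """
    # The "?" placeholder keeps the value out of the SQL string itself.
    return connection.execute(query, (min_weight,)).fetchall()


rows = mean_weight_by_site(conn, 10.0)
conn.close()

for site, n, mean_weight in rows:
    print(site, n, mean_weight)
```

Pushing the aggregation into SQL means only the summary rows travel back to the analysis environment, which is usually the point of querying a database from R or any other language rather than exporting whole tables.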
In the concluding section, it is emphasized that just knowing how to get data into R efficiently is half the battle in the path towards using R in data science. The speaker ended the session on the note that we should expect to see many more collaborations between R-Ladies Nairobi Organizers and NairobiR Organizers in the future.
The R Conference has been able to thrive with the changes that have occurred due to COVID. They also are planning for life post-COVID, with the lessons that they have learned from a digital world working their way into how they organize conferences going forward. R Consortium talked to Jared Lander about the issues with online conferences, how they have seen increased attendance, and how they will incorporate them into a new hybrid system.
RC: What is the R community like at the R conferences in New York and DC?
Each conference is a microcosm of the area that it is located in. We see all fields at the conference since all groups come together.
For the New York conference, it’s mostly people from the metro area, but since New York City is a hub people will stop by and attend. People are always visiting beyond the geographic area. People from Europe come and talk. We get a lot of people we wouldn’t otherwise because it’s a hub.
The DC conference is a government conference. It’s a way to focus on the DC community and their interests. It focuses on government and public life. Military people talk about military matters. Intelligence officers talk about working behind a secure network. Economists work with economic data. And teachers talk about how to analyze student data.
RC: How has COVID affected your ability to connect with members?
We used Hopin for the conference. This was a real resource drain for a lot of the attendees. I needed to run two computers to keep it going. We had to instruct people to turn off all other programs and run Zoom from the browser. It worked decently, had a good stage area, a good chat room. However, our conference has always been very lively. We normally had walk-on music for the conference. We had a visiting professor tell students at the conference to not expect this at any other conference and that this one is not normal.
To try to replicate that at the virtual conference I played walk-on music through my speakers. It was okay. We were able to get a mathematical comedian to attend the conference, and we were able to get a whiskey and rum producer to give a lesson on their products and offer attendees a discount on them.
Usually, the speakers are local people or people whose companies let them travel to give talks. When we went virtual we could get a lot more people. We got Rob Hyndman to give a talk because he could. Going forward, we plan on doing a hybrid approach so that people can attend from anywhere.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
Andrew Gelman was great. He went up there with no slides and talked for 40 minutes. He is the only person I know of who could do that. This talk was on open science and how to make it work. You need to publish your data, methods, and code to make this work.
For the government conference, there was Graciela Chichilnisky, who gave a talk about carbon offsets and went over how they helped at the time and how they would work going forward. She went over how carbon offsets can save you money and help the climate at the same time.
RC: When is your next event? Please give details!
The New York Conference is going to be September 8-10 and will be HYBRID! We will have speakers announced soon. We have 8 to 10 planned right now. We are very excited. It’s going to be in Midtown East, and it will be online as well. Our R in Government conference will also be both in-person and virtual. It will be held in early December, but there are no dates yet.
CVXR is such a nice way to do optimization methods, and it’s so explainable, and it can do quasi complex programming and not just linear programming.
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
I would like to add something that would allow vendors to support R better. We have people like database companies and people like Nvidia who have APIs for every platform but R. They say there will be community support because there is no market for it. Even companies that have an R API do not update it as much as their other APIs. They tend to only update the tools that are for their target area, not realizing that there is a market for people who work in R. It wouldn’t even be that hard since R was written in C, so all you would have to do is modify an existing C API.
R Consortium talked with Jared Lander about the R Community in the New York Metro area. The R Community’s ability to draw upon not only their large metro area but also their status as a hub city has allowed them to have many different speakers. This has allowed them to grow the community to an impressive size.
RC: What is the R community like in New York?
We are not just the City, but we have a broad reach. We have an active and vibrant community that is very enthusiastic. We have everyone from Brooklyn hipster data scientists with MacBooks riding a fixie [Editor’s note: A type of bicycle with just one gear, considered cool and stylish] to office workers in slacks and ties just off from work, and everything in between.
We have old school data scientists who consider themselves hardcore statisticians, people just learning the field, people with 30 years of experience, and people with one year of experience dipping their toes into a different field. We have so many people that the topic of the meeting determines who shows up. If we have a finance talk we will get finance and insurance people. Marketing topics will bring in marketing and business people. If we have a pharmaceutical talk we will get biologists and biostatisticians.
We also have talks on topics like racing, military, doctors, and lawyers. We have talks and members from all walks of life. We have over 12,200 members. We usually have a core group that shows up to meetings, but others will come because a topic speaks to them. Some won’t come for 6 months till another topic interests them. Attendance comes down to the meeting.
RC: How has COVID affected your ability to connect with members?
We lose the in-person connection. We usually have people show up at 6:15-6:30pm for pizza and (budget allowing) drinks. Usually, people meet for 30 or 40 minutes and talk/socialize and start making friends. We have a speaker go on for 45 minutes or so. We then go to the bar to hang out, whether or not people drink. We get a great lecture, the best night school, that’s free, sandwiched between hanging out with people who are interested in your field or something close to your field.
Even though we are missing out on that, we are meeting every month. We are losing people because of Zoom fatigue. So we do clearly lose out on that social interaction. In-person meetings can have several people having a chat, while a Zoom meeting can only have, at most, two people talking at a time. It becomes a one-way lens.
People will chat in a Slack room where people can ask their questions. However, I preferred the in-person where people could nod to show understanding or sit next to people and whisper to them. We are missing out on that.
We were interested in using gather.town and spatial.chat, but were unable to. Due to meetup.com restrictions, we are forced to use Zoom.
However, the nice thing about going virtual is that we got a worldwide attendance from those who wouldn’t otherwise be able to attend.
Going forward, we will have the in-person meetup (hopefully soon) and Zoom as well. This way, people can participate online and via Slack as well as in person. People can also access all of our meetings via our website or our YouTube channel.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
Will Landau wrote the drake package and the targets package. Targets is a real game changer at my work. It makes my life so much easier because it looks at dependencies so that it does not rerun redundant steps, and it can run tasks in parallel. I was able to get Will to give a speech after he attended another meetup. I have done that several times actually, where people who wrote packages attended, and I have been able to get them to do future meetups.
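The idea of skipping work whose inputs have not changed can be sketched in a few lines. This is a toy Python illustration, not how targets is implemented: the `Pipeline` class and its `add` and `make` methods are hypothetical, dependencies are declared by hand, and targets must be added in dependency order. Each target stores a fingerprint of its function's bytecode and its upstream values, and is rerun only when that fingerprint changes.

```python
import hashlib
import pickle


class Pipeline:
    """Toy pipeline; targets must be added in dependency order."""

    def __init__(self):
        self.targets = {}  # name -> (function, dependency names)
        self.cache = {}    # name -> (fingerprint, value)
        self.ran = []      # names actually executed by the last make()

    def add(self, name, func, deps=()):
        self.targets[name] = (func, tuple(deps))

    def _fingerprint(self, func, inputs):
        # Hash the function's bytecode, constants, and referenced names
        # together with the values it depends on.
        code = func.__code__
        payload = (
            code.co_code
            + repr((code.co_consts, code.co_names)).encode()
            + pickle.dumps(inputs)
        )
        return hashlib.sha256(payload).hexdigest()

    def make(self):
        self.ran = []
        values = {}
        for name, (func, deps) in self.targets.items():
            inputs = [values[d] for d in deps]
            fingerprint = self._fingerprint(func, inputs)
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fingerprint:
                values[name] = cached[1]  # up to date: reuse stored value
            else:
                values[name] = func(*inputs)
                self.cache[name] = (fingerprint, values[name])
                self.ran.append(name)
        return values


def load():
    return [3, 1, 2]


def clean(raw):
    return sorted(raw)


def summarize(rows):
    return sum(rows) / len(rows)


def summarize_max(rows):
    return max(rows)


pipe = Pipeline()
pipe.add("raw", load)
pipe.add("clean", clean, deps=["raw"])
pipe.add("summary", summarize, deps=["clean"])

first = pipe.make()
first_ran = list(pipe.ran)   # every target runs the first time

second = pipe.make()
second_ran = list(pipe.ran)  # nothing changed, so nothing reruns

pipe.add("summary", summarize_max, deps=["clean"])
third = pipe.make()
third_ran = list(pipe.ran)   # only the redefined target reruns
```

Because the upstream values for `summary` are unchanged on the third run, only the redefined step is executed; `raw` and `clean` are served from the cache.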
RC: What trends do you see in R language affecting your organization over the next year?
It’s getting more respect. People are starting to learn that R is a full production language, and I have used it in many sectors. We are getting more people who are learning it for a job, and they come and visit us, and they learn that there is more to it.
RC: Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?
We have a lot of members who are journalists who either use data to figure out what’s going on and/or use data to inform their readers. We have members from AP Press, NY Times, Wall Street Journal, Bloomberg, and more. It’s so cool seeing this because they want to use data to find out what’s going on. They love it, and it’s so great to see these people who are using the code to find the outcome.
RC: When is your next event? Please give details!
Our next meetup, on June 15, is a talk by Emil Hvitfeldt about text mining. The July 13 meetup is a talk by Sean Taylor, with no topic chosen yet.
SFDBI because I do a lot of geospatial work. I have always asked people: if I can work with geospatial data in PostGIS, and I can work with geospatial data in R, can I use PostGIS from R? I’m hoping that I can do that now.
Distributed computing. Most things don’t need it, but we had a speaker who worked on Ballista (now merged with Arrow DataFusion), which is a language-agnostic distributed computing program that could be integrated into R. This could force people like Nvidia to port cuDF to R.
R Consortium talked to Bioconductor Asia co-organizer Matt Ritchie about the upcoming conference (BioC Asia 2021), how COVID has affected attendance at the conference, and how they deal with multilingualism at their meetings.
RC: What is the R community like for Bioconductor Asia?
Compared with other Bioconductor groups, such as those in Europe and America, Bioconductor Asia is quite small. We are building up, and are grateful to receive R Consortium support for our event. The first BioC Asia was held in 2015 as a satellite of the GIW/InCoB meeting in Japan. The program included a workshop day, followed by a mini-conference where people could learn about Bioconductor and present their research. Martin Morgan (project lead at the time) gave a talk and an introductory workshop on Bioconductor to around 30 participants. In the following years, we organized our meetings as satellite events to the larger ABACBS annual conference in Australia (Brisbane 2016, Adelaide 2017, Melbourne 2018 and Sydney 2019). In 2020, we were planning an in-person meeting in China hosted by Tsinghua University and were expecting a large audience of around 200 attendees. Oddly, after we went digital we ended up having 400 registrants. Thanks to the generous support of our sponsors, BioC Asia 2020 offered free registration and more people could participate in the workshops than ever before. In 2021, we will again run our event virtually, with lead organizer Kozo Nishida from RIKEN in Japan. We are also keen to have conferences in other countries. Last year we had several people from South Korea attend, so maybe future BioC Asia events can be hosted virtually (or in person) by South Korean researchers.
RC: How has COVID affected your ability to connect with members?
We canceled the in-person conference and went virtual. Zoom was the main tool that we used to communicate, and we set up a #biocasia2020 channel on the Bioconductor community Slack for further discussion. Because the conference was virtual, we were able to increase attendance in the workshops: more than 100 people joined online, versus 30 or so for in-person workshops, which are often limited by physical space. For the conference, we used the Orchestra platform that Sean Davis developed for running workshops in the cloud. We recently used this platform to run virtual training in Africa, so it’s been well tested now in different parts of the world and can scale up or down very easily. We are likely to keep a hybrid option in the future as it is more accessible for students and people without a travel budget. We want our meetings to be as accessible as possible. It’s also nice to have a meeting in your general time zone as opposed to one that is scheduled for when you were hoping to be asleep, which is often the case for meetings based around other time zones.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
RC: What trends do you see in R language affecting your organization over the next year?
The main trend is the growing influence that the tidyverse is having on the Bioconductor project, with software such as tidybulk and plyranges applying these principles to genomic data analysis. Both packages have been developed by researchers based in our region, and it will be exciting to see further applications in the future.
RC: Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?
Although not directly related to Bioconductor, the work led by Rafael Irizarry that I saw presented at an AMSI BioInfoSummer public lecture in 2019 on estimating the number of excess deaths in Puerto Rico in the aftermath of Hurricane Maria (bioRxiv preprint here and news story here) is very inspiring. They modeled data to look at excess deaths and found that the government-reported figures grossly underestimated the real impact of the disaster, which lasted long after the hurricane had ended.
RC: When is your next event? Please give details!
BioC Asia 2021 will be held Nov 1-4 as a virtual event. The lead organizer is Kozo Nishida from RIKEN, who represents our region on the Bioconductor Community Advisory Board. This is the second time the event is hosted in Japan, albeit virtually this time around.
An early one funded in the 2016 round: ‘Software Carpentry R Instructor Training’ by Dr Laurent Gatto was a really valuable contribution for teaching countless people how to use R. Software Carpentry is an amazing platform for onboarding new users, and Laurent and colleagues are currently planning a new curriculum that introduces the world of Bioconductor using this approach.
R/Medicine, as a lot of Bioconductor tools are used in clinical research and there is a lot of interest in Bioconductor from that sector. R/Medicine has an annual conference that will be held virtually this year on August 24-27.
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
More support for teaching R in other languages would be great for our work. At BioC Asia 2020 we ran workshops in Mandarin, which proved very popular. We have also published RNA-seq analysis workflows in English and Chinese. It would be great to see more multilingual vignettes and workflows so that people can learn about different packages in whatever language suits them best. We are looking at redeveloping the Bioconductor website and aim to have key landing pages and training material translated into different languages. Adding closed captioning in English to talks can also improve accessibility.
Since the start of the COVID-19 pandemic in March 2020, R-Ladies Philly has shifted from local in-person meetups to virtual events. Hundreds of local, national, and even international R enthusiasts joined us for monthly virtual meetups and social activities! We organized more than 10 R workshops and co-hosted a datathon with local partners. We even launched a YouTube channel to make our workshop recordings widely available. Based on feedback from our members, this has been very successful despite the difficulties associated with COVID-19. In this post, we share a bit more information on our events and what has worked for our group.
Workshops
Starting in April 2020, we organized 11 R workshops, ranging from basic data cleaning and visualization to more advanced R usage like R package development and popular topics such as machine learning. One of our goals is to celebrate gender diversity in the R community by highlighting different speakers. We also aim to engage R users of all levels and encourage speakers to share their learning experiences. We made all workshop content and notes captured during Q&A available to the public through our YouTube channel and event recaps on the R-Ladies Philly blog. See below for links to our workshop events:
Every year, R-Ladies Philly organizes a datathon that aims to bring R enthusiasts and community organizations together to create insights through data and give participants exposure to new techniques, real-world data, and a diverse group of local data science professionals. These datathons consist of in-person kickoff and conclusion events, and 6 weeks of online collaboration in between.
In 2021, we had to switch to a fully online format, where the kickoff meetup was held via Zoom and participants organized themselves into groups through breakout rooms. Our 2021 datathon explored judicial patterns in Philadelphia courts, including bail, sentencing, and the concept of ‘judge harshness’. Participants worked together using Zoom, Slack, GitHub, and a shared Google doc for Q&A that also allowed the partner organization to answer questions asynchronously. Our conclusion meetup (view the recording here) showcases some of the highlights of this year’s datathon findings and the work that participants have put into analyzing a large and complex real-world dataset.
Insights from a virtual year
Overall, we are looking forward to returning to our pre-pandemic format when it is safe to do so. Still, being forced to adapt our approach has had some benefits. We were able to reach a broader audience that was not previously able to travel to our events. We were also able to try new event formats and new technology. For example, we held two virtual social events where we experimented with different formats to get to know each other remotely. We also used breakout rooms in Zoom for our datathon and online tools like sli.do for polls and Q&A sessions at our panel and datathon events. We tried to keep our events as interactive as possible, with lively chats and Google Docs to track and answer all participant questions during workshop events. These practices will be useful for all future workshops, whether virtual, in-person, or hybrid.
We are looking forward to continuing to build our online presence with more YouTube and blog posts, even when we are able to meet again face to face. If you are interested in joining us, please look for upcoming events on our meetup page. We are also seeking volunteers to plan and lead hands-on workshops for the remainder of 2021. Please learn more by visiting our website.
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. The wealth of knowledge in the community and the drive to learn and improve is inspiring. We had a chance to talk with Jiun Siew, Data Scientist and organizer of R User Group Melbourne, to find out more about the R community in Melbourne, how they’re holding up during the pandemic, trends in R, and what the future holds.
If you are interested in applying to the RUGS program for your organization, see the How do I Join? section at the end of this article.
RC: What is the R community like in Melbourne?
Melbourne has a vibrant community, with a mixture of students, professionals, and industries involved. Our R Meetup has almost 3,000 members who are interested in R and Data Science. We do get a little support in organizing from some of the companies in the area.
RC: How has COVID affected your ability to connect with members?
Melbourne had a very strict lockdown. We were limited to a 5 km travel radius, with restrictions on going to grocery stores and the like. Because of that, we couldn’t meet in person. To get around this, we held our meetings primarily on Zoom.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
At our last meetup in November, WhyHive, who do analytics using Shiny, gave a presentation on the work they did for the International Women’s Development Agency. It was amazing to see, and they were such a young crew. WhyHive is a social enterprise that works with non-profits to analyze data and make data-driven decisions. Typically, our better-attended Zoom meetings draw around 30 to 40 people. For this one, we had 70 to 80. The conversation was so good that we had to cut off questions at the end due to time.
RC: What trends do you see in R language affecting your organization over the next year?
The overall trend is toward the Tidyverse and all things tidy. For better or worse, that is where our users tend to be going. We have seen a lot of time series, tidy objects, tidy models, and the like. In our organization, it has become almost the de facto method, to the point where people don’t usually use for loops anymore.
RC: When is your next event? Please give details!
We are currently in the planning stage, and we are looking for a speaker. More details to come soon!
Due to my work, Mater 2.0 stuck out to me. It would be very nice to deal with larger data frames. It would help with scaling projects to a larger size.
Of the ones that I saw, distributed computing was really interesting. Being able to scale is one of the problems and one of the limitations that our members run into. You can multithread it or hack parallels or distribute a job, but it would be awesome if it were more native. This would also help with scaling projects up for people who work in industry.
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
I think there should be more put into diversity and inclusion. This is a really important field that I believe should be emphasized. Another one would be looking into scalability in R and making it more usable in work environments. I attended an R User Conference in Brisbane some years ago, and what I saw showed a lack of scaling. In industry we try to use R in production, and the lack of scalability is an issue in R. This issue is becoming more important.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. The wealth of knowledge in the community and the drive to learn and improve is inspiring. We were able to talk to Adriano Belisario, program manager at School of Data Brazil (Open Knowledge Brazil) and associated researcher at MediaLab.UFRJ, to find out more about the R community in Brazil, how they are dealing with the pandemic, what trends are happening in R in Brazil, and how they are moving forward.
RC: What is the R community like in Brazil?
The R community in Brazil is very vibrant and generous. There are a lot of initiatives like meetups, events, open classes, and tutorials for people who want to learn to program in R. Personally, I would like to highlight three initiatives from the Brazilian community: R-Ladies Brazil, Curso-R, and the R Brasil Telegram group.
In a country with so many levels of inequality, R-Ladies Brazil does amazing work making sure that R is accessible to women and other underrepresented groups. Curso-R creates courses, free books, tutorials and a lot of excellent materials for those who use R in Brazil. They also have a YouTube channel that provides live streams, debates and live coding in R, all in Portuguese. Finally, the Telegram group currently has more than 2,000 active members. It is a very active channel for the community, where people discuss, exchange information about events, support others with technical issues, and offer help to other users.
RC: How has COVID affected your ability to connect with members?
Since 2016, the School of Data Brazil has organized the Brazilian Conference on Data Journalism and Digital Methods (Coda.Br), the main event in this field in Latin America. This conference and most of our activities used to be in-person before COVID. The exception was our experience running online courses (MOOCs) with the Knight Center for Journalism in the Americas (University of Texas), but since the beginning of the pandemic we have needed to reinvent all of our methodologies, courses and community activities for the online environment. Although we miss in-person meetings, this change has allowed us to work with people all across Brazil. We had excellent results with the last conference, which was also supported by R Consortium.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
Talking about Coda.Br and R-Ladies, I will mention the presentation by Gabriela de Queiroz on AI bias. She introduced the topic by explaining to a broad audience some critical implications of AI nowadays. In the talk, Gabriela presented the general issue of dealing with bias in data and went over how to mitigate these biases using open-source programs. She introduced toolkits, software, and libraries that are used to address this problem, like Fairlearn, Lift, What-If Tool and AI Fairness 360.
RC: What trends do you see in R language affecting your organization over the next year?
Thinking about not only my organization, but the entire field of data journalism and the use of the R language by journalists, I would say that with the general election coming up, access to election data and government budget data might be an important topic. While we have access to this data, it is not always easy to query. There is a lot of work in collecting, preparing, cleaning, and transforming the data so it can be analyzed for public use. That’s why the work done by initiatives such as Brasil.IO or Base dos Dados is so important. They offer open data that is well structured, cleaned and ready to use. The latter also has a package on CRAN. So we might see some impact in terms of academic research and data journalism, since it is becoming easier to carry out more complex analyses that merge several datasets. Finally, along with the election, environmental data will be important, as climate change is an urgent topic globally and locally. At School of Data Brazil, we have created a course about environmental data journalism, in partnership with the Earth Journalism Network.
RC: Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?
Since the School of Data is focused on data journalism and data literacy, there are a lot of people across the country in our network with experience in these fields. We also have a membership program with hundreds of journalists, researchers and developers. The program offers benefits such as webinars, dedicated channels and free entry to Coda.Br for those who support our activities through a fee, as we are a nonprofit organization.
Another way we support data journalists is through the Claudio Weber Abramo Award. This award recognizes and encourages excellence in data journalism in Brazil. The award and this open forum we’ve created help to highlight the best work in the country and keep the community engaged. One of the highlights of the Award is that summaries of all entries from the last edition are available on the website.
RC: When is your next event? Please give details!
Our conference, Coda.Br, is going to be held online, November 8-13. It will be our second online edition due to the pandemic.
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
I would love to see an effort to promote data literacy. Not just for journalists, but for everyone. We need to make sure that most people have access to the basic mindset to properly understand, analyze, and criticize data.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!
R Consortium’s R User Group and Small Conference Support Program (RUGS) supports R groups around the world by providing grants to help them organize, share information and support each other. Unlike many groups over this past year, the R groups in Taiwan did not face a long disruption. Can they provide a glimpse of how the rest of the world can hold regular meetings? Do they have a way to mix the wave of virtual meetings with local meetings? We talked with Kristen Chan, current R-Ladies Taipei organizer, to find out more.
RC: What is the R community like in Taiwan?
In Taiwan, we have two R communities: one is R-Ladies Taipei, which I host, and the other is the Taiwan R user group. Both communities promote and build venues for friendly data discussion. We welcome talks on any data topic, such as Machine Learning, Deep Learning, Data Analytics, and Data Engineering, and meet every Monday night. The last Monday of every month is reserved for women.
RC: In the past year, did you have to change your techniques to connect and collaborate with members?
In Taiwan, at the very beginning of 2020, we stopped the face-to-face meetups and started thinking about how we could keep them going. But how lucky we are! Taiwan’s CDC has the pandemic under control, so we can do everything as normal. We still have the regular meetups, but we require everyone to wear masks to protect each other during the event.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
In October 2020 we held a satRday conference. It was our first satRday event and also the first one in Asia. It was tough to make this conference happen during the COVID-19 pandemic. Fortunately, we had invited Yihui Xie, who works for RStudio, to give us a wonderful talk on R Markdown, which we held through an online meeting. It was an interesting and unforgettable presentation because most of us had never had the chance to talk to the author directly, so all the attendees learned a lot.
RC: Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?
Yes, some of our members are reporters. They are learning R and some BI tools to help them with data cleaning and to make graphs that help people understand the information.
We also have a community called “知了新聞 Cicadata.” It’s a community about data journalism, and its goal is to make the data journalism field more open. By the way, one of the founders of 知了新聞 is also the Taiwan R user group’s current organizer.
RC: When is your next event? Please give details!
We are still planning the event. But I think we will have a series of beginner tutorial meetups to introduce more people to R.
My favorite is “Distributed Computing.” Data keeps getting bigger, and we need to deal with that while also saving time, so what we need is Distributed Computing.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!