All Posts By

R Consortium

R Consortium Grants Committee Announces New Chair

By Blog

The R Consortium is pleased to announce that Katherine Jeschke has been appointed Chair of the Grants Committee.

She will oversee day-to-day grant processes for both pre- and post-awards, ensuring grants meet the R Consortium’s strategic objectives. Katherine will report to the Executive Director and work closely with the Infrastructure Steering Committee and the RUGS program to track grants and help improve the effectiveness of R Consortium grants in supporting the R Community.

Katherine’s non-profit grants and administration skills were honed over more than 25 years of experience in marketing communications, and staff, budget, development, and grants management for non-profits and trade organizations, as well as corporate and public sector consulting organizations. 

She holds an MA degree in American History and Museum Administration from George Washington University and a BA degree in Fine Arts and Art History from the University of Maryland.

“Getting acquainted with our various grants and procedures while evaluating the strategic effectiveness of R Consortium grants is a big undertaking, but her background and years of experience should ease her way,” said Joseph Rickert, Executive Director of the R Consortium.

She may be reached at kj.jeschke@posit.co.

Pharma RUG: The Rise of R in China’s Pharmaceutical Industry

By Blog

Pharma RUG China organizer Joe Zhu spoke with the R Consortium about the growing R community and the increasing use of R in the pharmaceutical industry in China. The group has contributed to the pharmaceutical R community through several R packages. Since its establishment last year, the group has organized large-scale hybrid events. Joe also shared some tools and techniques for smoothly organizing and running hybrid events.

Please share about your background and involvement with the RUGS group.

I have a PhD in statistics and studied in New Zealand for my undergraduate and postgraduate degrees in statistics. My PhD work focused on coalescent theory and probabilistic modeling for phylogenetic models. I also completed a postdoc at Oxford, focusing on statistical genomics for the human genome and malaria parasite genome projects. During this time, I developed open source software tools for statistical genomics, primarily using R as a front end and developing C++ software.

For the past four years, I’ve worked at Roche, where I started leading a major collaboration initiative in pharma three years ago. I’ve created TLGs (tables, listings, and graphs) for regulatory submissions to the FDA. Throughout this initiative, we have open sourced around 30 software packages, including `formatters`, `rtables`, `rlistings`, and `tern`. Last year, we submitted these packages to CRAN.

At first, we open sourced the project on GitHub and then submitted it to CRAN. I’m heavily involved in one of China’s R user groups, PharmaRUG. We use the group to share posts about developments in the area, and we organize events and conferences. In March last year, we hosted the first event with over 100 people on-site and around 100 online. The event covered topics like R package usage in the pharma industry. Later that year, we organized another event called “Open Source Clinical Reporting summeR”.

Lately, I have been busy organizing several events. I recently gave a talk (about R package dependencies as directed acyclic graphs) at a conference hosted by the R community in China. Early next month, on August 1st, I will attend a pharma conference where I will conduct a workshop on good practices in software package development. The conference schedule is quite packed for me, as I also have a session on how teams operate and collaborate within the pharma industry to develop R packages. On the third day of the conference, I will organize a series of 11 data visualization talks. All but one, which covers Python, will focus on using R.
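To make the idea of package dependencies as a directed acyclic graph concrete, here is a minimal sketch in Python using the standard library's `graphlib`. It is not taken from Joe's talk. The dependency map below is a hypothetical illustration built from the package names mentioned above, and the helper names `install_order` and `has_cycle` are made up for this example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map for illustration: each package maps to the
# set of packages it depends on. These edges are assumptions, not values
# read from the packages' DESCRIPTION files.
DEPENDENCIES = {
    "formatters": set(),
    "rtables": {"formatters"},
    "rlistings": {"formatters"},
    "tern": {"rtables", "formatters"},
}


def install_order(graph):
    """Return packages ordered so each one follows all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    """Return True if the dependency graph contains a cycle (is not a DAG)."""
    try:
        install_order(graph)
    except CycleError:
        return True
    return False


order = install_order(DEPENDENCIES)
print(order)                    # every package appears after its dependencies
print(has_cycle(DEPENDENCIES))  # False: this map is acyclic
```

A topological order like this is what lets a set of interdependent packages be built or installed one at a time. A cycle would mean no such order exists.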

Can you share what the R community is like in China? 

We have opened up seats for students to join our events in the pharmaceutical industry. In the past, fewer than 20 students, mostly from academia, have joined us for these conferences. The events include big names like Roche, Johnson & Johnson, Novartis, Boehringer Ingelheim, and Sanofi and local companies such as Fosun, Hengrui, and Legend Biotech. There is a big R community in China across academia and industry. Our user group primarily focuses on the pharma industry. Our WeChat channel has nearly a thousand subscribers, and our group chat has almost 500 members. It’s a very active community. 

Later this year, we will collaborate with “R in Pharma” for the October conference. Daniel Sabanes Bove and I have contacted Harvey and Phil, and we will organize an APAC track, including India, China, Japan, Australia, Singapore, and Korea.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

We have created a GitHub account called PharmaRUG. We use this platform to share websites, posts, slides, and videos related to our events. The Pharma RUG 2024 conference was particularly successful this year, thanks to the support from the R Consortium. We also utilize WeChat groups to call for speakers and interact with others. In addition to GitHub and WeChat, we use Tencent Docs to share documents. This is particularly useful in China, where using company-specific platforms like Google or Microsoft can be hindered by firewalls. Tencent Docs works perfectly in China, making sharing and synchronizing documents easy.

Can you share some valuable tips for organizing successful hybrid events?

We have a series of planning sessions where we actively communicate using WeChat. We meet at a community center where everyone is open, and we have preset meetings. We test the audio and everything beforehand. This is our second year organizing these events, so we have gained more experience. We are now familiar with the standards and know what needs to be done. For example, when two companies, like MNCs, use different systems, we find it better to use one shared system to ensure everything is synchronized.

We’ve found that Microsoft Teams is easy to use for setting up meetings and scheduling them ahead of time. For live demos, we recommend pre-recording the demos and taking questions. In the case of hybrid sessions with multiple locations, we prioritize asking and answering questions based on the primary and secondary locations, as well as online participation. If we cannot answer questions quickly, we host Q&A sessions afterward and share them online.

I believe that for the event to be successful, timing is crucial. We must stick to the schedule because it’s a hybrid event. However, we should also allow for some flexibility when unexpected things come up. We haven’t created a YouTube account yet because YouTube isn’t accessible in China. One alternative could be setting up a Bilibili web page and account to share the videos. All our files are currently on GitHub, which is convenient. We need to trim the videos to smaller sizes to fit GitHub’s file size limits, maybe at four and a half speeds or similar.

What trends do you currently see in R language and your industry?

So, SAS has dominated the software space for the pharma industry for decades. While R used to be used for exploratory and research purposes, there have been successes with using it to support submissions in recent years. Roche also has success stories in this area. There are several initiatives, with PharmaVerse being a significant player. Roche is part of PharmaVerse, taking inspiration from the tidyverse multiverse concept. The end-to-end clinical reporting process is considered in this space, from data preparation to TLG generation. A lot has happened in the past three to four years, especially in China last year. There’s been significant development in China, and you can see a shift from SAS to R in the tools used. At the PharmaSUG meeting, which was previously dominated by SAS users, a quarter to one-third of the tools in the past few years have used languages other than SAS. It’s clear that things are moving away from SAS towards software languages like R.

I don’t have the complete statistics for this year with me right now, but you do see a lot of topics. In my session, many talks use visualization because it’s very likable. So, the trend is that R is becoming more accepted than before, from PoCs to things in production. There are very high standards for code and validation.

In the end, I would like to thank my dear friends and colleagues for their support and for making this happen:

  • Yan Qiao, Associate Director of Scientific Programming, BeiGene
  • Baoqin Li, China Head Clinical & Statistical Programming, Johnson & Johnson
  • Dong Guo, China Head of Stats Analyst, Eli Lilly and Company
  • Yun Ma, Director, Clinical Data Sciences, Boehringer Ingelheim (China) Investment Co.
  • Yanli Chang, Head of Data Operations China, Novartis

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

R-Ladies Rome: Empowering Women in Data Science Through Collaboration and Innovation

By Blog

Federica Gazzelloni, co-organizer of R-Ladies Rome, recently spoke with the R Consortium about the fast-growing R community in Rome. The group collaborates with R user groups worldwide and has successfully attracted a diverse audience to its events. Federica also contributes to the R community through package development and is currently working on a book about using R for health metrics and tracking the spread of infectious diseases.

The group is hosting an online event titled “Building Reproducible Pipelines with R, Docker, and Nix” on the 29th of July. R users from around the world are invited to attend this event.

Please share about your background and involvement with the RUGS group.

I am Federica Gazzelloni, an actuary and statistician interested in health studies. I am writing a book about health metrics and the spread of infectious diseases. As the lead organizer of R-Ladies Rome, one of the R user groups sponsored by the R Consortium, I am grateful for the support that enables us to organize monthly talks, tutorials, and workshops. Our events provide an inclusive and accessible learning environment free of charge, featuring exciting speakers and various engagement opportunities. Additionally, we have held events in partnership with R-Ladies New York, R-Ladies Paris, and Tunis R User Group and created a branded website: rladiesrome.org!

R-Ladies Rome started in 2023 and has grown significantly since then. Our events consistently reach a substantial audience. For instance, our latest event with Isabella Velasquez garnered over 100 RSVPs. During Data Viz Month, we received unexpected attention from the open tech community of learners. Our followers on Meetup have reached over 1.1k, and our social media presence is steadily growing.

Can you share what the R community is like in Rome? 

Since the kick-off of the chapter in 2023, R-Ladies Rome has played a pivotal role in fostering a dynamic community. We have successfully brought together an international group of R enthusiasts, ranging from beginners to experienced data scientists, creating a supportive and engaging environment for all. The popularity of the R language within the open source community, particularly for statistical analysis and medical research, is evident in Rome. R offers a wide range of libraries that can be easily applied to various topics, making it very convenient for users. Although Python is gaining attention in research and providing another accessible option for statistical analysis, the medical statistical community seems to prefer R over Python due to its extensive capabilities and strong community support.

Our events have consistently attracted significant attention and participation. For example, our latest event with Isabella Velasquez garnered over 100 RSVPs. We have also received unexpected attention during Data Viz Month from the open tech community of learners. Our Meetup followers have reached over 1.1k, and our social media presence is steadily growing.

The R community in Rome is expected to grow, with R-Ladies Rome at its heart, driving engagement and promoting the use of R for various applications. We are excited to continue growing and evolving, providing valuable learning opportunities and fostering connections within the community.

What trends do you currently see in R language?

We have reviewed all the events that R-Ladies’ groups have organized in the past years and, after considering the rebranding of RStudio to Posit PBC and RMarkdown to Quarto, realized that several exciting trends in the R language are shaping its usage and development within the data science community. Analyzing event titles, attendee numbers, and activity on past recordings, we found that events containing “Introduction” or “Tutorials” impact learners most, highlighting the growing interest in learning R for data analytics, reproducible research, and dynamic reporting.

Moreover, integrating R with other programming languages and platforms is gaining significant attention. The interoperability between R and languages like Python, HTML, or Java allows users to enhance their skills across multiple tools within a single environment. This has expanded R’s capabilities, making it a versatile choice for a diverse range of users.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

We’ve been using Canva extensively for various tasks and found ChatGPT very helpful in crafting storytelling content. Copilot saves time in planning and running events with its excellent collaborative typing. Additionally, having a Meetup Pro account is valuable, mainly because R-Ladies Rome is part of the broader R-Ladies group. It helps us connect with a wider audience, while Google Forms aids communication and prevents missing information. We also have our own YouTube channel, which is very useful for sharing recordings of past events and making them accessible to those unable to attend live.

You have a Meetup on “Building Reproducible Pipelines with R, Docker, and Nix.” Can you share more about the topic covered? Why this topic?

We have an upcoming Meetup titled “Building Reproducible Pipelines with R, Docker, and Nix,” featuring speaker Bruno Rodrigues. This topic was chosen based on feedback from our organizers, Silvana Acosta and Rafael Ribeiro, who polled our audience to identify a favorite speaker. Bruno Rodrigues emerged as a popular choice, highlighting the growing interest in robust and reproducible data science workflows.

In this session, Bruno Rodrigues will guide us through setting up reproducible data pipelines using R, Docker, and Nix. These tools ensure that data analyses are consistent and can be easily shared and replicated across different environments. By learning to use Docker and Nix alongside R, our community members will gain valuable skills to enhance the reliability and reproducibility of their data science projects. This event aligns with our mission to provide practical and impactful learning opportunities that meet the evolving needs of the data science community.

Please share about a project you are currently working on or have worked on using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

One of the key projects I am currently working on is a Quarto book titled “Health Metrics and the Spread of Infectious Diseases with R”, which CRC Press will publish at the end of this year. This book aims to provide comprehensive insights into the intersection of health metrics, such as DALYs, and infectious disease dynamics, using advanced statistical methods and machine learning techniques in R. The goal is to equip readers with the knowledge and tools to analyze and interpret health data effectively, thereby contributing to the broader field of public health.

In addition to the book, I have developed a couple of R data packages to aid in data analysis and visualization. One is “oregonfrogs,” which is expected to go on CRAN very soon. This package focuses on classification modeling for detecting frog habitats, utilizing spatial techniques. It provides a valuable function, longlat_to_utm(). The development of these packages showcases R’s versatility in handling complex ecological data and emphasizes the importance of open source tools in advancing scientific research. Through these projects, I aim to demonstrate the practical applications of R in public health and environmental science, fostering a deeper understanding and appreciation of data-driven methodologies.
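For readers unfamiliar with what a longitude/latitude to UTM conversion involves, the first step is usually choosing the UTM zone for a point. The sketch below, written in Python, shows only that step. It is not the implementation or signature of `longlat_to_utm()` in oregonfrogs. The function names `utm_zone` and `utm_epsg` are hypothetical, the sample coordinates are chosen purely for illustration, and the special zone exceptions around Norway and Svalbard as well as UTM's latitude limits are ignored.

```python
def utm_zone(longitude):
    """Return the standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide and numbered eastward from 180 degrees west;
    # min() folds the +180 edge into zone 60.
    return min(int((longitude + 180.0) // 6) + 1, 60)


def utm_epsg(longitude, latitude):
    """Return the EPSG code of the WGS 84 / UTM zone containing the point."""
    base = 32600 if latitude >= 0 else 32700  # northern vs. southern hemisphere
    return base + utm_zone(longitude)


# A point in central Oregon (coordinates chosen for illustration)
print(utm_zone(-121.8))        # 10
print(utm_epsg(-121.8, 43.8))  # 32610
```

The resulting EPSG code identifies the projected coordinate system that a spatial library would then use to transform the coordinates into easting and northing values in meters.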


Empowering the R Community: Insights from Myles Mitchell of the Leeds Data Science Group

By Blog

The R Consortium recently interviewed Myles Mitchell, co-organizer of the Leeds Data Science group, to discuss the local R community and the group’s recent activities. Myles highlighted the group’s efforts to create an inclusive and welcoming environment for all participants. The group is dedicated to creating networking opportunities for students interested in pursuing a career in data science and sharing job openings.

The Leeds Data Science group is hosting an in-person event titled “Improving the Fidelity and Stability of Large Language Models” on the 23rd of July. 

Please share about your background and involvement with the RUGS group.

I am a data scientist at Jumping Rivers, a data science consultancy. We collaborate with various companies on data-related projects, such as data storage, modeling, developing data visualization dashboards, and offering data science training. Initially, I had a background in Python, but I learned R while working at Jumping Rivers, where many of our staff are proficient in R, and much of our infrastructure is written in R.

At Jumping Rivers, we receive funding from the R Consortium. We organize the Leeds Data Science Meetups every two months and the North East Data Science Meetups every three months. Additionally, we hold annual conferences, such as Shiny in Production (October) and SatRdays London (April). I organize the North East and Leeds Data Science Meetups for Jumping Rivers.

Can you share what the local R Community is like? 

I am located in Newcastle, in the northeast of England, where a large community is keenly interested in data science. Our community includes Newcastle University and Northumbria University students, many of whom are studying data science or statistics. There are also professionals from various industries looking for data science jobs. Our meetups are attended by prospective data scientists and students eager to network and learn more about the field.

Both universities teach R, and many industries in the area employ data science techniques, including Northumbrian Water and Nissan. These companies use data science to solve everyday problems, such as detecting water leaks and optimizing manufacturing processes.

Data science is relevant across almost all industries, and R, along with Python and other languages, is a crucial tool in solving data science problems. In the Northeast, consultancies like Jumping Rivers specialize in data science. In summary, we have a large community of students and industry professionals in the Northeast, and it’s a similar story in Leeds.

You have a Meetup on “Improving the Fidelity and Stability of Large Language Models.” Can you share more about the topic covered? Why this topic?

During our Meetup on “Improving the Fidelity and Stability of Large Language Models,” we will explore how to enhance software solutions with AI capabilities, focusing on improving the accuracy and reliability of these models. Drawing from real-world experiences, we will discuss successful strategies for development, tackle the challenge of model ‘hallucinations,’ and address other significant obstacles. This topic is essential as the AI sector continues to grow rapidly, and integrating AI effectively is crucial for developers to achieve robust performance and innovative functionality in their projects. The session is designed for developers of all skill levels interested in incorporating AI into their work, ensuring they can implement practical and effective methodologies for positive outcomes.

Ryan Mangan will be presenting this meetup. Ryan is a seasoned technologist with over 18 years of experience in cloud computing, AI, and virtualization. He founded Efficient Ether Ltd, a Microsoft startup specializing in AI, cloud optimization, and sustainability. Ryan is a recognized Microsoft MVP, VMware vExpert, and Chartered Fellow of the British Computer Society. He has authored several e-books and publications, including “Mastering Azure Virtual Desktop,” and is active in public speaking and blogging within the tech community. 

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

Regarding techniques, I’m currently reviewing how we organize our meetups. Our meetups are free to attend for all participants, and we aim to create a welcoming and accessible environment for everyone to network and meet like-minded individuals in the area. The meetups are held every two to three months on weekdays in the evenings, providing attendees with time to travel from their place of work to the venue. We offer refreshments at the start, including pizza and soft drinks, and we ensure that vegan, gluten-free, and halal options are included to cater to a wide range of dietary preferences. 

We often run interactive workshops at the North East Data Science Meetups, including a recent meetup on the Apache Arrow interface for R, led by Nic Crane on July 18th. To make our workshops as inclusive as possible, we provide attendees with all necessary materials and dependencies via a cloud environment created using Posit Workbench. It allows participants without prior installation of RStudio IDE to contribute and interact with the workshop materials. Our goal is to make our workshops accessible to a broad audience, including those from non-R backgrounds. In general, we aim to create an event where attendees can participate without the burden of installing multiple packages or downloading data.

Most attendees attend our meetups to network and meet industry professionals, especially students exploring post-graduation career opportunities. With this in mind, we always reserve a part of the meetup for advertising similar meetups and conferences in the area and job opportunities in data science. Many attendees regard these meetups as a regular source of news, so we try to provide a central hub of information and a place to enjoy high-quality live talks and workshops.

These are in-person meetups. We could attract more people if we recorded and live-streamed them on Zoom. However, managing a Zoom call and recording would create more work for the organizers, and an in-person event provides better networking opportunities than an online one. That said, we will continue to look at ways to make these meetups accessible and appealing to people from a wide range of backgrounds, and we will always take feedback from the community seriously.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

At the North East and Leeds Data Science Meetups, there is significant interest in machine learning, training and deploying machine learning models, and productionizing these models (MLOps). Attendees often expect talks on these topics and are particularly interested in ChatGPT, generative AI, and other related topics. However, data science encompasses a broader range of areas, including visualizing data and creating dashboards, and we try to cover all of these areas in our talks and workshops. Despite our efforts, there is a clear trend toward machine learning-focused discussions, with many talk submissions focusing on MLOps and deploying models on the cloud.


Kolkata R User Group: A Rich History with Statistics 

By Blog

The R Consortium recently spoke with Samrit Pramanik of the Kolkata R User Group about his experience starting a new R User Group in India. Samrit highlighted Kolkata’s rich history with statistics and talked about the diverse local R community.

The Kolkata R User group is organizing its second online event titled “A New Approach for Teaching Data Analytics with R” on July 13th. R users from around the world are invited to join this event.

Please share your background and involvement with the RUGS group.

My name is Samrit Pramanik. I work as a data scientist at a US-based private firm and have a post-graduate degree in statistics from the University of Calcutta. I have been using R since my post-graduate days in 2018 and used it extensively in various projects during my studies. Since 2022, I have also been an R instructor for a non-profit organization. Additionally, I have been involved in several short projects working with R. Since April 2024, I have managed the Kolkata R User group.

This is the third city-based R user group in India that is affiliated with the R Consortium. I plan to arrange virtual meetups monthly and in-person meetups annually. I enjoy helping and teaching people from diverse backgrounds, not only in statistics, mathematics, and data science but also in other areas. I want to teach them to use the R language to add value to their professional and personal projects.

Can you share what the R community is like in Kolkata? 

The Kolkata R User Group has been formed with a broader perspective that I would like to share with you. Kolkata is known for its reputation in statistical research and education. The city is recognized as the birthplace of modern statistics in India, with the establishment of the Indian Statistical Institute (ISI) in 1931 by Prasanta Chandra Mahalanobis. The University of Calcutta, where I graduated, was the first in Asia to offer a post-graduate degree in statistics in 1941. This rich history made the formation of the Kolkata R User Group inevitable. Our community consists of academics and professionals from diverse fields such as life sciences, healthcare, the public sector, physics, astrophysics, and other industries. This diverse background facilitates robust exchanges of ideas and techniques related to R and data, making our R community in Kolkata truly unique.

Please tell us about your recent and upcoming events.

I would like to highlight a recent event. Last month, in June, we had our inaugural session where we introduced Quarto, a recently released reporting tool by Posit. Our goal was to make the participants aware of this tool and its outstanding features, such as website building, ebook writing, creating thesis papers, manuscripts, and blogging sites. We aimed to show participants, including early graduate students, professionals in the industry, and researchers from academia, that they can use Quarto in their projects and studies for reporting. This was our first organized session.

The upcoming session is scheduled for July 13th. It will focus on a new approach to teaching R to students with non-technical backgrounds such as business students. Dr. Abhimanyu Gupta from Saint Louis University will be the speaker at this event.

We have received very positive feedback from participants, who are showing interest in the upcoming events and want us to organize such events frequently. People are very much aware of these events and this community, and they are very responsive. Two esteemed Economics professors have expressed interest in joining our organizing team and working with us.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

Currently, I am working on two projects. The first project involves cricket analytics, where I extensively use R for cleaning up messy raw data and conducting exploratory data analysis at both the team and individual player levels. I have also published a Shiny dashboard on the performance analysis of T20I players. I’m also building a statistical model to predict the total score of an innings, the winner of the match, and the winner of the tournament. Lastly, I aim to compile all the findings into an ebook format.

 Cricket Performance Analysis Shiny Dashboard

The second project revolves around converting the functions and features of AstroPy, an open source software package for astronomy and astrophysics, into R. Our goal is to enhance its popularity among researchers and scientists in the astronomy, astrophysics, and cosmology domains. I am collaborating with another individual from a physics background on this open source project, and we plan to publish it on GitHub soon for public access.


Diving into R with Isabella Velasquez: Perspectives from R-Ladies Seattle

By Blog

Isabella Velasquez, co-organizer of R-Ladies Seattle, recently spoke with the R Consortium about her journey with R and the group’s recent activities. Isabella started as a beginner but has become a key figure in the R community thanks to the supportive and collaborative learning environment. R-Ladies Seattle regularly hosts in-person, hybrid, and online events, such as casual happy hours, lightning talks, and collaborations with other user groups. The group engages its members through creative activities and uses tools like GitHub for event planning. Their commitment to inclusivity and continuous learning helps maintain a dynamic and supportive community for R users in Seattle.

R-Ladies Seattle is seeking speakers for an upcoming lightning talks session. If you are interested in presenting, please contact Isabella.

Personal website | Twitter | Mastodon

Please share your background and involvement with the RUGS group.

I first encountered R when I started my graduate program in 2014. I was pursuing a Master’s in Analytics in Chicago. The program mainly revolved around using R. At that time, I was a complete beginner and had to start from the basics, like installing R. Since the program was fairly new, there wasn’t a well-structured curriculum for introducing R. It was assumed that students would either pick it up or already have some knowledge. The coursework focused on R from there.

My older brother Gustavo was a great resource when I started learning R and picking up the necessary skills. He is a highly proficient R user, so I asked him for help. He introduced me to many tools that made it much easier for a beginner to work in RStudio and pick up the tidyverse syntax. The main course curriculum was open to different approaches to using R, which provided flexibility in learning the tools and skills that interested me the most.

After completing my program in 2016, I landed my first job as a data analyst. I began using R and regularly working with data. Back then, Twitter was buzzing with activity. I stayed enthusiastic and continued learning from the community. My brother and I collaborated to solve problems and acquire new skills. At work, my team had diverse tool proficiency; some were adept in Excel, while others had data expertise. Eventually, we formed a learning community and collectively mastered R. We used R to generate presentations, reports, and visualizations, and to clean up data. It was fantastic to have a small community at work and a larger one outside through social media.

Chaya Jones, a colleague at my previous workplace, where I worked as a data analyst, was one of the original co-organizers of R-Ladies Seattle. The R user group had just started in 2018, and she invited me to join and give one of the earliest presentations for R-Ladies Seattle. Over time, the membership grew, and eventually, I became one of the co-organizers. My role involved coordinating events, finding speakers, and other related tasks.

I am very fortunate because I got into R in a friendly and collaborative environment. During graduate school, I collaborated with classmates and gained valuable knowledge from my brother. Later, I landed a new data analyst job and had a whole team of people who were interested in learning and using R. It has truly been a joy, and I am grateful for how well things have worked out and for the length of time that I’ve been able to use R.

Can you share what the R community is like in Seattle?

As I mentioned, my workplace involved various programming languages, but there were quite a few R users. We used to have small study groups where we discussed creating an R Markdown template for our company and shared Shiny apps and similar things. The field I worked in was education, but in Seattle, you see a lot of R being used in bioinformatics and scientific research related to diseases. It’s very popular among those groups, and they have solid user groups where they grow and learn together. Many members of R-Ladies Seattle are from organizations like Fred Hutch, where the emphasis on using R is very strong, which is pretty great.

Every month, R-Ladies Seattle hosts a casual happy hour. We have good chips and salsa, and it’s a great opportunity for members to join, chat, and have a good time. Additionally, after the Cascadia R Conference in June, we will have a social hour where people can keep the conversation going in a relaxed setting. We will also host a social hour at the end of posit::conf in August, and R-Ladies who didn’t attend the conference are more than welcome to join and hang out. We organize many social events, so there are plenty of opportunities to connect with us.

We have some exciting events related to R coming up. We are currently looking for speakers for a lightning talk session, where individuals can quickly share the projects they are working on or a tool they love. It’s a low-pressure way to join in, and we welcome anyone who would like to sign up and participate.

Our focus is primarily on in-person and hybrid events. While we have seen an explosion of online events after COVID-19, it’s important to uphold the Seattle community by providing opportunities for local participation. Generally, our events are held in person, with occasional hybrid events. However, we are also excited about organizing online events with Seattle residents in mind at a convenient time for the Pacific Time Zone. Our offering of in-person, hybrid, and online events provides a unique experience for our user group in Seattle.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

One thing we implemented was to think of events that could generate high engagement. For example, with our hex sticker, we organized a competition in which all R-Ladies members could submit their designs, followed by a voting process. It was a lot of fun, both creatively and in terms of getting everyone involved. We strive to come up with similarly engaging activities. Additionally, when appropriate, we reach out to other user groups in the area to explore collaboration opportunities, co-host events, or simply support each other’s promotions to foster a strong sense of community.

One method we have tried for planning events is using GitHub discussions to log our ideas for events and determine which events are the most popular based on comments and upvotes. This helps guide our future event planning.

What trends do you currently see in R language and your industry? Any trends you see developing shortly?

Every year, we send out a survey to ask about people’s interests and what they would like to see. The responses usually call for a mix of technical content, with many requests for intermediate-level information. There’s a lot of interest from people in Seattle who use R in some capacity or are members of R-Ladies Seattle. They are looking for opportunities to upskill based on their existing knowledge of R. Additionally, there are many requests for information about career advancement and available positions.

There are various job titles related to data, such as data scientist and data engineer. Many people have questions about the career prospects in this field, including the potential for advancement and available options. These are common topics of discussion.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in? 

I now work at Posit, formerly RStudio, in a marketing role. I still get to work with R a lot, and it is great for creating dashboards to track various metrics. I’ve been focusing on defining metrics of success and similar tasks.

Recently, I created a dashboard in Shiny that refreshes daily to compile the information I need for my to-do list. Every morning, I check my Shiny dashboard to see my daily tasks. It pulls information from my project management tool, so I only have to update one place to see an aggregated view of my month. It was fun to do this in R with Shiny. 

Recently, I worked on creating a custom template in Quarto for the upcoming R Medicine Conference website. The website is built entirely on Quarto, a new tool similar to R Markdown. My work specifically involved designing the events page to display previous events and provide links to the event page and YouTube playlist. It’s exciting to learn and work with new tools like Quarto.


R Consortium’s Submission Working Group: Advancing R for Regulatory Success at PharmaSUG 2024

By Blog

The R Submission Working Group is making significant strides in promoting the use of R for regulatory submissions in the pharmaceutical industry. At PharmaSUG 2024, held May 19-22 in Baltimore, MD, the group’s impact was evident in a range of presentations and discussions.

One highlight was Ben Straub’s presentation, “Piloting into the Future: Publicly available R-based Submissions to the FDA,” which showcased the growing adoption of R in both industry and regulatory settings. Straub emphasized the vibrant R community and its diverse packages that enhance statistical analysis and data visualization, highlighting R’s role in facilitating efficient and transparent FDA submissions.

Additionally, André Veríssimo and Ismael Rodriguez’s presentation, “Automating SDTM Using R: A Practical Guide,” demonstrated the advantages of using R to automate the creation of Study Data Tabulation Model (SDTM) datasets. They provided a detailed guide to implementing automation techniques and shared best practices and real-world applications for improving data management workflows with R.

These presentations underscored the R Submission Working Group’s contributions to advancing the use of R in regulatory processes, promoting greater efficiency, reproducibility, and transparency in pharmaceutical data management.

R Addicts Paris: Promoting Diversity in R

By Blog

Vincent Guyader, organizer of the R Addicts Paris and president of ThinkR, recently updated the R Consortium on the group’s activities. Last year, Vincent discussed the application of R in developing solutions for industrial problems. He emphasized the importance of helping people become fluent in R and leveraging the language to add value to their work. ThinkR is dedicated to enhancing R proficiency in various industries. The R Addicts Paris, one of France’s oldest and largest R user groups with 1,800 members, continues to foster a strong R community under Vincent’s leadership. 

Please share your background and involvement with the RUGS group.

My name is Vincent, and I have been using R since my student days. During my studies, I took on freelance R projects for various companies. Currently, I head a company called ThinkR, where we have a team of over 10 experts specializing in everything related to R. Our services include training, consulting, developing Shiny applications, creating R packages, and more. We also collaborate with Posit and handle hardware installations for clients, primarily in France but also in Switzerland, Belgium, and other parts of Europe.

Since 2018, I have been managing the R user group in Paris, known as the R Addicts Paris. It’s one of the oldest and possibly the largest R user groups, with 1,800 members. I aimed to organize meetups every three months, but the next one has been delayed due to internal organizational issues. I genuinely enjoy helping people become fluent in R and use the language to add value to their work.

What challenges do you face in organizing the R Addicts Paris group, and how do you overcome those challenges?

One of the main challenges is that our users are not professional programmers or developers; they are specialists in fields like biology and finance. They have to shift their mindset to use programming languages. My daily job involves helping these individuals embrace software development. Coming from a genetics and biochemistry background, I understand how challenging this can be for non-developers. However, I love doing this, and I have a dedicated, competent team to assist.

Based on your work with ThinkR, which industries in France do you see using R?

We have clients in various fields across France, including finance, retail, and research. The health sector is particularly prominent. For instance, a company that used SAS a few years ago now uses R and Python. About half of our clients currently use Python. While we provide Python installation on hardware, we don’t offer Python training yet.

We are committed to being the sole organization in France that can certify R users and developers. The French government has authorized us to issue an official certification akin to a diploma. Our goal is to elevate R proficiency across various fields in France. Our clients include businesses and individuals, with many investing their resources to learn proper software and programming skills.

Do you host online or in-person events?

I chose not to host online events. It’s a very opinionated choice because most meetups switched to online formats during the pandemic. At ThinkR, we are a fully remote company, and I spend my day on Zoom. While remote training is effective, I’ve found that in-person events work better for our user group.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

One of the main challenges we face as a group is finding female speakers. I try to avoid having only male speakers, but I only get female speakers every fifth or sixth event, which is not enough. I encourage other R user group organizers to recognize our power to give a voice to different kinds of people. I push myself to include more female speakers. Sometimes, I encounter highly qualified women who hesitate to speak, while less experienced men are more willing. It’s challenging, but I strive to maintain a balanced representation.

I consciously avoid engaging with speakers who lack substance, ensuring I have time to encourage qualified women to share their knowledge. Despite my efforts, female representation remains below 20%. A few years ago, my colleague Diane and I tried to connect with the R-Ladies Paris group. Many men are actively engaged there, and I wonder why that is.

There are many skilled women in the R community, which includes biologists and geneticists. There’s no excuse for the lack of female representation. We must remember our influence and endorse individuals who truly represent our values.

What are some trending topics in R in your R User Group?

I’ve noticed a decline in interest in statistics over the past two to five years. During meetups, we rarely discuss statistics. The machine learning and AI fields aren’t well-represented in R, possibly because most people in these fields use Python. It could also be due to regional differences or my network.

You had a Meetup “Raddicts x RTE – {webr} – Shinyproxy and return of the Rencontres 2024” on 19th June. Can you share more on the topics covered? Why these topics?

For this event, we had two male speakers. Colin Fay discussed {webr}, a new JavaScript capability for launching R instances in the browser. This is powerful for deploying Shiny applications. Valentin Cadoret talked about new Shinyproxy functionalities and tools that enhance the deployment of Shiny applications. So we focused heavily on Shiny once again.


The Crucial Role of Release Control in R for Healthcare Organizations

By Blog

Guest blog contributed by Ning Leng, People and Product Leader, Roche-Genentech; Eric Nantz, Director, Eli Lilly and Company; Ben Straub, Principal Programmer, GSK; Sam Parmar, Statistical Data Scientist, Pfizer.

Supporting the science of drug development requires computational tools with careful implementations of core statistical functions and data structures. The R programming language, a general-purpose language developed by statisticians that grows dynamically through the contributions of a worldwide community of developers, is a common choice for serious statistical work. However, managing new versions of the core R language and the hundreds of specialized libraries (called packages in R) necessary to support multiple development groups, in a way that ensures the consistency, reproducibility, and reliability of results, poses many practical challenges.

The FDA, for example, requires that the software and tools supporting a clinical trial submission are capable of producing reproducible results for an extended period of time. This means submitting code based on a version of R that is sufficiently tested and stable yet new enough to support the critical R packages over the required FDA time horizon.

So, how are R environment releases managed across different healthcare organizations? We interviewed individuals from different pharma companies to learn about their internal approaches to keeping their R environments up to date and secure.

  • Roche’s Scientific Computing Environment is container-based. Clinical reporting is done from managed, qualified images that are released twice per year, roughly timed to capture the last update to an R major version (April release) and an update six months later (September release). For each image, R packages undergo a mostly automated risk assessment to document package quality. Automated indicators of package quality include test coverage, thoroughness of documentation, and test coverage of exported objects (using covtracer); these may be supplemented with package adoption measured by download counts, author reputation, or other peripheral knowledge of the package’s history. Prior to internal publication, a representative sample of reverse dependencies is re-evaluated to safeguard against breaking changes. If the package meets our quality criteria, it is published to a continuously updated repository of validated packages corresponding to the image’s R minor version (e.g. x.x). This gives teams the flexibility to roll forward to newer package releases within a managed release by moving their renv snapshot to a later date, easing the transition between the twice-yearly image releases. A generalized version of Roche’s automated process has been open sourced as ‘theValidator’, and more details on the Roche process were shared within the R Validation case studies series.
  • Eli Lilly currently updates its qualified R environment only after a new major release of R is available and the corresponding release of Bioconductor (utilizing that version of R) is also available. In a new release, all packages currently installed from the CRAN and Bioconductor repositories are refreshed to their latest versions at the time of the release. Once the new R version is deployed, all packages are frozen for that particular release to ensure stability and reproducibility. Lilly maintains multiple R versions for backward compatibility. Only packages available on CRAN or Bioconductor are permitted for installation in the central package library. Lilly uses a hybrid approach of automation and risk-based assessment when a new package is requested for installation. In the event that a new version of a package is necessary for a project (such as a new Shiny application), the users are encouraged to leverage the renv package created by Posit to create a project-based environment which will not impact the central package library. As technology evolves and the R language becomes more prominent in clinical data analysis, Lilly continues to assess the current and future possibilities of a robust clinical computing environment primed for innovation while adhering to the foundational principles of reproducibility and transparency.
  • GSK releases “frozen R environments” for clinical reporting work on a 6-12 month cycle. The R version chosen is the latest stable release with at least one patch release, e.g. 4.3.1 rather than 4.3.0. As R itself is stable with decades of use, we prefer to focus on package assessment when building our “frozen R environments.” Packages for this environment can come from external sources (CRAN, Bioconductor) or be built internally at GSK; regardless of origin, we assess them the same way. We pay close attention to author qualification and institutional backing, types and breadth of testing, documentation and examples, and adherence to software development life cycle practices. Once a package is approved in this process, it is included in the frozen environment. Packages change over time; if substantial changes are implemented in a package, we re-assess it with a focus on those changes before allowing the package to be up-versioned in the frozen environment. These frozen environments ensure that clinical reporting can be easily reproduced if needed, as all package versions and the version of R used during the analysis are contained in the frozen environment.
  • Pfizer releases one new R version every year. We typically target R-x.y.1 releases to pick up patches – so we might consider this a “stable” release. The process of testing, documenting, and deploying R into validated containers is performed every 6 months, with a new release of R once per year, e.g. R-4.3.1, and an update to package set and package versions 6 months later (for the same R version). We take a snapshot date of CRAN to form the basis of our package set for the container build. We try to balance competing priorities of getting latest versions of packages and newest package releases, while maintaining a snapshot and version-controlled release to ensure reproducibility and stability.
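The preference that GSK and Pfizer describe above, choosing a release that has already received at least one patch (4.3.1 rather than 4.3.0), can be sketched in a few lines. This is only an illustration, written in Python for brevity; the function names and the list of releases are hypothetical and do not represent any company's actual tooling.

```python
def parse_version(text):
    """Split an 'x.y.z' string into a tuple of integers (major, minor, patch)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def latest_patched_release(available):
    """Return the newest release whose patch number is at least 1, or None.

    Assumption: "stable" is taken to mean "has had at least one patch release",
    which is how the GSK and Pfizer descriptions above phrase it.
    """
    patched = [v for v in available if parse_version(v)[2] >= 1]
    # Compare numerically, not as strings, so that 4.10.1 would sort above 4.9.1.
    return max(patched, key=parse_version, default=None)


# Illustrative list only, not an official inventory of R releases.
releases = ["4.2.3", "4.3.0", "4.3.1", "4.4.0"]
print(latest_patched_release(releases))
```

With this input the sketch skips 4.4.0 and 4.3.0 (no patch yet within those entries) and selects 4.3.1.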

Here is what we have: four companies and four somewhat complex bespoke solutions. It seems likely that if we interviewed a hundred representatives from a hundred different companies, we would get at least a hundred different solutions. It is also not difficult to imagine that multiple protocols for managing R and package versions impose a fairly complex project management problem on the FDA as it simultaneously deals with submissions from multiple sponsors.

In the R Consortium’s R Submissions Working Group meetings, we have been discussing whether there might be a simple solution for at least the R versioning problem that could serve as a de facto standard for the industry. One suggestion that has gained some traction is that sponsors use the latest patched release of the previous minor version for a submission. For example, if R version 4.4.0 is currently available, then it is suggested that a sponsor use the latest patch version of 4.3 (4.3.z). If R version 4.5.0 becomes available, then it is suggested that a sponsor use the latest patch version of 4.4 (4.4.z). This ensures that the minor version is stable and most likely available to all stakeholders. Of course, if a version change eliminates a security problem, that version might be preferred. (Note that R versions are organized as R x.y.z, where x is the major version, y is the minor version, and z is the patch version.)
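To make the suggestion concrete, here is a minimal sketch of the "previous minor, latest patch" rule, written in Python for brevity. The function names and version lists are hypothetical. How the rule should behave across a major version boundary (for example, when a 5.0.0 appears) is not specified in the discussion above, so the sketch simply assumes the second-newest x.y series is the "previous minor" one.

```python
def parse_version(text):
    """Split an 'x.y.z' string into a tuple of integers (major, minor, patch)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def suggested_submission_version(available):
    """Return the latest patch release of the previous minor series, or None.

    Assumption: the "previous minor" series is the second-newest (major, minor)
    pair among the available releases.
    """
    versions = [parse_version(v) for v in available]
    series = sorted({(major, minor) for major, minor, _ in versions})
    if len(series) < 2:
        return None  # no previous minor series to fall back on
    previous = series[-2]
    latest_patch = max(p for major, minor, p in versions if (major, minor) == previous)
    return f"{previous[0]}.{previous[1]}.{latest_patch}"


# Illustrative release lists only.
print(suggested_submission_version(["4.3.0", "4.3.1", "4.3.3", "4.4.0"]))
print(suggested_submission_version(["4.3.3", "4.4.0", "4.4.1", "4.5.0"]))
```

In the first call the newest series is 4.4, so the sketch returns the highest 4.3.z in the list; in the second, the arrival of 4.5.0 moves the suggestion to the highest 4.4.z.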

We would love to hear what you think. Please go to Issue number 117 on the GitHub repository of our working group and leave a comment.

Bridging the Digital Divide: Umar Isah Adam on Expanding R Access for Kano, Nigeria Students

By Blog

Umar Isah Adam, the founder and organizer of the R User Group Kano, Nigeria, spoke with the R Consortium during the pandemic about his efforts to engage the next generation of students in the R community. Recently, the R Consortium followed up with Umar to discuss the group’s progress over the past few years. He discussed the increasing acceptance and interest in R within academia. The user group is working with various colleges in Kano state to introduce R to students and teach them the fundamentals. Umar also shared his experience using R for managerial tasks related to student data. He hopes to persuade college management to use R for data handling instead of the current manual processes.

Please share your background and involvement with the RUGS group.

My name is Umar Isah Adam, and I’m from Kano State, Nigeria. I studied mathematics at the Federal University Dutse, Jigawa State. During my studies, I became interested in statistics and technology. One of my lecturers mentioned R as a statistical analysis tool, which piqued my interest. I learned it by researching online and watching videos. Later, a friend introduced me to R User Groups. I found that I was interested in R and noticed there wasn’t a group in Kano State, so I applied to start a chapter there, and it was approved.

Can you share what the R community is like in Kano, Nigeria? 

The use of R is relatively new in Kano State. Most academics in the area use SPSS in their work, which makes it challenging for R to gain traction in this environment. Despite the challenges, we have been making progress with the support of our user group. Currently, I work as an assistant lecturer at a college in Kano State. I recently organized a well-attended seminar for lecturers and students at the Kano State College of Education and Preliminary Studies. I also posted a video of the workshop on YouTube and have received requests for more information.

There’s room for improvement. We’ve received requests from academic institutions to host events or provide information about the power of R. We cannot do so now due to the nature of my work and inadequate funding, but we plan to start a 10-week training session soon. It will likely be free, as we are collaborating with the Kano State College of Education and Preliminary Studies to organize it. R isn’t very popular here, and more than 70% of academicians need help understanding what it is and how to use it effectively. However, those introduced to it have shown a high interest in learning and utilizing it.

We aim to introduce R to the academic community, and after this, we plan to move on to another college and launch a new program. In summary, R is not widely known in our society, but we are progressing. There has been an increase in the acceptance of R and growing interest from different people in academia. Still, there needs to be more awareness: most people have yet to learn what R is and how to use it. Therefore, most of our upcoming programs will focus on introducing the R language.

Additionally, there is an issue with student access. Most of our students don’t have personal computers and can only access them on campus, usually at the ICT department. This lack of access also affects student engagement. However, among academics and lecturers in our colleges, there is growing interest in R.

Do you host in-person or online events? How do you make your events inclusive? 

It’s important to remember that online events became essential during the pandemic. However, due to internet connectivity issues, we avoid online meetings or events most of the time. As a result, our sessions are usually held offline. We have been hosting events within colleges and other institutions to make them easily accessible to students and academics. It is also more cost-effective and popular than hosting in private locations. Advertising these events has proven effective, as interested individuals are usually willing to attend when they see the advertisement. 

We have tried to move our events between colleges, such as those owned by the state government. The majority of the data and processes there are research-based. Therefore, we strive to incorporate more R programming aligned with academic requirements. We aim to limit topics to the use of R in academia to ensure that attendees feel more connected and can see the practical applications of using R. For instance, compared to using SPSS, where one often needs to use code or convert data into another format, with R, one can easily import data into the working environment and manipulate it as needed.

Please share about a project you are currently working on or have worked on using the R language. What is the goal/reason, result, or anything interesting, especially related to your industry?

I usually demonstrate to people around me, including the school management, how easy it is to use R. For example, we have a problem with the examination office potentially losing some of its data, although it has a backup on an external drive. I am importing the data from the old template to the new one in Excel format. I am also working on calculating the student results and loading them into the new portal we have developed. Doing this job manually might take a month, but if I successfully create this program, it will complete the job in two to three days. It will demonstrate to the school management the importance and impact of using R.

I am proposing that the college management introduce a certified course of study, “Introduction to R,” within the ICT department. Showcasing how this programming language can impact the working environment will help them understand the need for this course. Many students rely on basic analyses of questionnaires, using frequencies and percentages, without exploring visualization techniques. As a supervisor, I encourage using R for data analysis in student projects, as it provides a more comprehensive approach. However, many students lack access to computers. By offering this course, we can equip them with valuable skills and knowledge to benefit their future careers.
