All Posts By

R Consortium

Aligning Beliefs and Profession: Using R in Protecting the Penobscot Nation’s Traditional Lifeways

By Blog
Angie Reed sampling Chlorophyll on the Penobscot River where a dam was removed 

In a recent interview by the R Consortium, Angie Reed, Water Resources Planner for the Penobscot Indian Nation, shared her experience learning and using R in river conservation and helping preserve a whole way of life. Educated in New Hampshire and Colorado, Angie began her career with the Houlton Band of Maliseet Indians, later joining the Penobscot Indian Nation. Her discovery of R transformed her approach to environmental statistics, leading to the development of an interactive R Shiny application for community engagement. 

pαnawάhpskewi (Penobscot people) derive their name from the pαnawάhpskewtəkʷ (Penobscot River), and their view of the Penobscot River as a relative guides all of the Water Resources Program’s efforts. This perspective is also reflected in the Penobscot Water Song, which thanks the water and expresses love and respect.  Angie has been honored to:

  • work for the Water Resources Program, 
  • contribute to the Tribal Exchange Network Group,
  • engage young students in environmental stewardship and R coding, blending traditional views with modern technology for effective environmental protection and community involvement, and
  • work with Posit to develop the animated video about Penobscot Nation and show it at the opening of posit::conf 2023

Please tell us about your background and how you came to use R as part of your work on the Penobscot Indian Nation.

I grew up in New Hampshire and completed my Bachelor of Science at the University of New Hampshire, followed by a Master of Science at Colorado State University. After spending some time out west, I returned to the Northeast for work. I began by joining the Houlton Band of Maliseet Indians in Houlton, Maine, right after finishing my graduate studies in 1998. Then, in 2004, I started working with the Penobscot Indian Nation. Currently, I work for both tribes, full-time with Penobscot and part-time with Maliseet.

My first encounter with R was during an environmental statistics class taught by Dennis Helsel, a former USGS employee, through his business, Practical Stats. He introduced us to a package in R called R Commander. Initially, I only used it for statistics, but soon, I realized there was much more to R. I began teaching myself how to use ggplot for graphing. I spent months searching and learning, often frustrated, but it paid off as I started creating more sophisticated graphs for our reports.

We often collaborate with staff from the Environmental Protection Agency (EPA) in Region One (New England, including Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont and 10 Tribal Nations). One of their staff, Valerie Bataille, introduced us to R Carpentries classes. She organized a free class for tribal staff in our region. Taking that class was enlightening; I realized there was so much more I could have learned earlier, making my journey easier. This experience was foundational for me, marking the transition from seeing R as an environmental statistics tool to recognizing its broader applications. It’s a bit cliché, but this journey typifies how many people discover and learn new skills in this field.

The Penobscot Nation views the Penobscot River as a relative or family. How does that make water management for the Penobscot River different from other water resource management?

If you watch The River is Our Relative, the video delves deeper into the beautiful and challenging perspective of seeing the river as a relative. This view fundamentally shifts how I perceive my work, imbuing it with a deeper meaning that transcends typical Western scientific approaches to river conservation. It’s a constant reminder that my job aligns with everything I believe in, reinforcing that there’s a profound reason behind my feelings.

Working with the Penobscot Nation and other tribal nations to protect their waters and ways of life is an honor and has revealed the challenges of conveying the differences in perspective to others. Often, attempts to bridge the gap get lost in translation. Many see their work as just a job, but for the Penobscot people, it’s an integral part of their identity. It’s not merely about accomplishing tasks; it’s about their entire way of life. The river provides sustenance, acts as a transportation route, and is a living relative to whom they have a responsibility. 

How does using open source software allow better sharing of results with Penobscot Nation citizens?

My co-worker, Jan Paul, and I had the pleasure of attending and presenting at posit::conf 2023 and working with Posit staff to create an animated video that describes what we do and how open source and Posit tools help us do it. It was so heart-warming to watch the video shown to all attendees at the start of conf, and it was a great introduction to my shameless ask for help during my presentation and at a table where I offered a volunteer sign-up sheet/form. I was humbled by the number of generous offers and am already receiving some assistance on a project I’ve been eager to accomplish. Jasmine Kindness of One World Analytics is helping me recreate a Tableau viz I made years ago as an interactive, map-based R Shiny tool.

I find that people connect more with maps, especially when it comes to visualizing data that is geographically referenced. For instance, if there’s an issue in the water, people can see exactly where it is on the map. This is particularly relevant as people in this area are very familiar with the Penobscot River watershed.  My aim is to create tools that are not only interactive but also intuitive, allowing users to zoom into familiar areas and understand what’s happening there. 

This experience has really highlighted the value of the open source community. It’s not just about the tools; it’s also about the people and the generosity within this community. The Posit conference was a great reminder of this, and the experience of working with someone so helpful and skilled has truly reinforced how amazing and generous the open source community is.

How has your use of R helped to achieve more stringent protections for the Penobscot River?

Before we started using open source tools, my team and I had been diligently working to centralize our data management system, which significantly improved our efficiency. A major shift occurred when we began using R and RStudio (currently Posit) to extract data from this system to create summaries. This has been particularly useful in a biennial process where the State of Maine requests data and proposals for upgrading water quality classifications.

In Maine, water bodies are classified into four major categories: AA, A, B, and C. If new data suggests that a water body currently classified at a lower grade could qualify for a higher classification, we can submit a proposal for this upgrade. In the past we have facilitated upgrades for hundreds of miles of streams; however, it took much longer to compile the data. In 2018, we used R and RStudio for the first time to prepare a proposal to the Maine Department of Environmental Protection (DEP) to upgrade the last segment of the Penobscot River from C to B. Using open source tools, we were able to quickly summarize and compile data into a format that could be used for this proposal, a task that previously took significantly longer. DEP accepted our proposal because our data clearly supported the upgrade. In 2019, the proposal was passed, and we anticipate this process continuing to be easier in the future.

You are part of a larger network of tribal environmental professionals, working together to learn R and share data and insights. Can you share details about how that works?

Jan Paul, Water Quality Lab Coordinator at Penobscot Nation, sampling in field

I’m involved in the Tribal Exchange Network Group (TXG), which is a national group of tribal environmental professionals like myself and is funded by a cooperative agreement with the Office of Information Management (OIM) at the Environmental Protection Agency (EPA). We work in various fields, such as air, water, and fisheries, focusing on environmental protection. Our goal is to ensure that tribes are well-represented in EPA’s Exchange Network, and we also assist tribes individually with managing their data.

Since attending a Carpentries class, I’ve been helping TXG organize and host many of them. We’ve held one every year since 2019, and we’re now moving towards more advanced topics. In addition to trainings, TXG provides a variety of activities and support, including small group discussions, 1-on-1 assistance and conferences. Although COVID-19 disrupted our schedule, we are planning our next conference for this year.

Our smaller, more conversational monthly data drop-in sessions always include the opportunity to have a breakout room to work on R. People can come with their R-related questions, or the host might prepare a demo.

Our 1-on-1 tribal assistance hours allow Tribes to sign up for help with issues related to their specific data. I work with individuals on R code for various tasks, such as managing temperature sensor data or generating annual assessment reports in R Markdown format. These one-on-one sessions have significantly improved skill building and confidence among participants and are particularly effective because they use real data and often result in a tangible product, like a table or graph, which is exciting for participants. We’ve also seen great benefits, especially in terms of staff turnover. When staff members leave, the program still has well-documented code, making it easier for their successors to pick up where they left off.

Additionally, I’ve been involved in forming a Pacific Northwest Tribal coding group, which still doesn’t have an official name as it is only a few months old. It began from discussions with staff from the Northwest Indian Fisheries Commission (NWIFC) and staff from member Tribes. And I am thrilled to say we’ve already attracted many new members from staff of the Columbia River Inter-Tribal Fish Commission (CRITFC). This group is a direct offshoot of the TXG efforts with Marissa Pauling of NWIFC facilitating, and we’re excited about the learning opportunities it presents.

Our work, including the tribal assistance hours, is funded through a grant that reimburses the Penobscot Nation for the time I spend on these activities. As we move forward with the coding group, planning to invite speakers and organize events, it’s clear there’s much to share with this audience, possibly in future blogs like this one. This work is all part of our broader effort to support tribes in their environmental data management endeavors. If anyone would like to offer their time toward these kinds of assistance, they can use the TXG Google form to sign up.

How do you engage with young people?

I am deeply committed to engaging the younger generation, especially the students at Penobscot Nation’s Indian Island school (pre-K through 8th grade). In our Water Resources Program at Penobscot Nation, we actively involve these students in our river conservation efforts. We see our role as not just their employees but as protectors of the river for their future.

Sampling for Bacteria 

Our approach includes hands-on activities like taking students to the river for bacteria monitoring. They participate in collecting samples and processing them in our lab, gaining practical experience in environmental monitoring. This hands-on learning is now being enhanced with the development of the R Shiny app I’m working on with Jasmine, to make data interpretation more interactive and engaging for the students.

Recognizing their budding interest in technology, I’m also exploring the possibility of starting a mini R coding group at the school. With students already exposed to basic coding through MIT’s Scratch, advancing to R seems a promising and exciting step.

Beyond the Penobscot Nation school, we’re extending our reach to local schools like Orono Middle School. We recently involved eighth graders, including two Penobscot Nation citizens, in our bacteria monitoring project. This collaboration has motivated me to consider establishing an R coding group in local high schools, allowing our students continuous access to these learning opportunities.

Processing bacteria sample

My vision is to create a learning environment in local high schools where students can delve deeper into data analysis and coding. This initiative aims to extend our impact, ensuring students have continuous access to educational opportunities that merge environmental knowledge with tech skills and an appreciation of Penobscot people, culture and the work being done in our program.

Over the years, witnessing the growth of students who participated in our programs has been immensely gratifying. A particularly inspiring example is a young Penobscot woman, Shantel Neptune, who did an internship with us through the Wabanaki Youth in Science (WaYS) Program a few years back, then a data internship through TXG, and is now a full-time employee in the Water Resources Program. Shantel is also now helping to teach another young Penobscot woman, Maddie Huerth, about data collection, management, analysis and visualization while she is our temporary employee. We’re planning sessions this winter to further enhance their R coding skills, a critical aspect of their professional development.

It’s essential to me that these women, along with others, receive comprehensive training. Our program’s success hinges on it being led by individuals who are not only skilled but who also embody Penobscot Nation’s values and traditions. Empowering young Penobscot citizens to lead these initiatives is not just a goal but a necessity. Their growth and development are vital to the continuity and integrity of our work, and I am committed to nurturing their skills and confidence. This endeavor is more than just education; it’s about preserving identity and ensuring our environmental efforts resonate with the Penobscot spirit and needs.

Elevate Your R Community with the 2024 RUGS Grant Program

By Announcement, Blog

The R Consortium is rolling out its 2024 R User Groups (RUGS) Grant Program, and it’s an opportunity you don’t want to miss. The program, which aims to foster vibrant R communities worldwide, is in full swing, and we are eagerly awaiting your application!

Apply here!

Why Apply and… For What?

User Group Grants: Boost engagement and initiate user-focused activities.

Conference Grants: Support for R-related events, either hosting or attending.

Special Projects Grants: Kickstart innovative projects with the potential to impact the R community.

With 74 active groups and a thriving community of over 67,000 members, the RUGS network is a hub of innovation and knowledge sharing. Your participation could be the next milestone in this growth journey.

Examples of some recent R Consortium sponsored RUGS activities:

Key Information

Application Deadline: September 30th, 2024. Don’t delay!

Eligibility: Open to initiatives aimed at community building, not software development (for that, see ISC Grant Program).

Be part of shaping the future of R. Visit here for more details and to apply. Your contribution matters to the global R narrative. Apply now, and let’s grow together!


For details and to apply, visit here.

Offa R Users Group: Empowering Data-Driven Education in Nigeria

By Blog

The R Consortium had a conversation with Anietie Edem Udokang, who is the founder and organizer of the Offa R Users Group (ORUG). He discussed the emerging local R community and the use of R for his research in time series analysis. 

The Offa R Users Group has a Meetup coming up on March 26th, 2024, titled “Test for the Assumptions of Linear Regression Using R.” The group is also seeking individuals to serve as guest speakers for their online events.

Please share about your background and involvement with the RUGS group.

My name is Anietie Edem Udokang, and I am a chief lecturer at the Federal Polytechnic Offa. I hold a Master of Science degree in Statistics. It was during my postgraduate studies that my supervisor introduced me to R, which was around 2012. Since then, I’ve been using R and have discovered that it’s far superior to some of the other software programs I had previously used.

I have found that interacting with others and utilizing specific features, such as the ability to download applications, has been incredibly beneficial to my analysis work. These special packages have helped me greatly, and I believe it is important to attach relevant packages when organizing data. This experience has made me passionate about using R for data analysis.

Ever since I began using R, I have had the privilege of engaging with a diverse group of individuals, including data scientists and software users. These interactions have led me to the realization that to continue growing and learning, it would be beneficial to establish a user group within our community. Initially, we called it the “Fedpofa R Users Group,” but later changed the name to “Offa R Users Group.” We have been organizing meetings, providing training, and engaging in other activities to keep the community vibrant.

Can you share what the R community is like in Offa?

R is not limited to academic use, but it is also used in industry. The reason for this is that polytechnics act as a bridge between the industry and academic institutions. If the students have a good grasp of how to use R, it means that industry will be directly or indirectly affected. Consultants often visit our ORUG and ask for some analysis, which we provide using R. Additionally, students also use R for their projects.

I use R for many of my publications. R has gained a lot of popularity, not only within our institution but also among sister institutions in the area. Some departments have even made R the only software that students are required to use for analysis. 

What industry are you currently in? How do you use R in your work?

I am in the education sector, and I use R for my work in time series analysis, which is my area of specialization. I rely on TSA, tseries and other related time series packages to carry out my work. For example, I used R for Modeling the Residuals of Financial Time Series with Missing Values for Risk Measures, which was my MSc project. I have also used R in the Application of the Seasonal Autoregressive Moving Average Model to Analyze and Forecast the Food Price Index (free registration required). Additionally, I used R in a paper titled “Volatility of Exchange Rates in Nigeria: An Investigation of Risk on Investment.” Another innovative project was Modelling Circular Time Series with Applications. These are just a few examples of the papers and research where I’ve personally used R.

You have a Meetup titled “Test for the Assumptions of Linear Regression Using R.” Can you share more on the topic covered? Why this topic?

Some authors use regression models without checking whether the assumptions hold. Instead of carrying out tests to confirm this, they simply assume that the assumptions are fulfilled and that the model is therefore valid. This topic aims to highlight the importance of carrying out such tests to ensure reliable and comprehensive results. Lack of adherence to the assumptions may lead to inaccurate conclusions. The focus will be on commonly used tests for normality, linearity, autocorrelation, heteroscedasticity/homoscedasticity, and multicollinearity, with illustrative examples using R.
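As a rough illustration of the kinds of checks listed above, here is a minimal sketch in base R. It is not taken from the Meetup materials: the built-in `mtcars` dataset, the model `mpg ~ wt + hp`, and the hand-computed statistics are all assumptions chosen so the example runs without extra packages. Packages such as lmtest and car offer packaged versions of several of these tests.

```r
# Fit an illustrative model on a dataset that ships with R.
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test from the stats package.
normality <- shapiro.test(res)

# Autocorrelation: Durbin-Watson statistic computed by hand.
# Values near 2 suggest little first-order autocorrelation.
dw_stat <- sum(diff(res)^2) / sum(res^2)

# Heteroscedasticity: studentized Breusch-Pagan statistic by hand.
# Regress squared residuals on the predictors; n * R^2 is compared
# to a chi-squared distribution with one df per predictor.
aux <- lm(I(res^2) ~ wt + hp, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Multicollinearity: variance inflation factor for each predictor,
# VIF_j = 1 / (1 - R^2_j), from regressing predictor j on the others.
vif_wt <- 1 / (1 - summary(lm(wt ~ hp, data = mtcars))$r.squared)
vif_hp <- 1 / (1 - summary(lm(hp ~ wt, data = mtcars))$r.squared)

# Linearity: a residuals-vs-fitted plot is the usual visual check;
# a clear curve or funnel shape would be a warning sign.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Printing `normality`, `dw_stat`, `bp_p`, and the two VIF values gives a quick first screen; how to interpret borderline results still depends on the data and the purpose of the model.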

I appreciate the R Consortium for their valuable RUGs grant assistance in 2022. With this grant, I could open two other user groups: the Ilorin R Users Group and the Kwara Environmental Statistics R Group. I also want to express my gratitude to the R Consortium for sponsoring my Meetup subscription and covering other minor expenses in 2022. The subscription is still ongoing, and I hope that we can continue our partnership to promote the use of R in our community. 

I would like to invite speakers to present at our R User Group. We are currently seeking speakers for our upcoming events and would be delighted to welcome speakers from all over the world to share their R-related knowledge with us.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

R-Ladies Goiânia: Promoting Diversity and Inclusion in Local R Community

By Blog

Fernanda Kelly, founder and organizer of the R-Ladies Goiânia, recently talked to the R Consortium about the group’s efforts to provide a learning and networking platform to gender and ethnic minorities in the local R community. She discussed the group’s successful transition to virtual events, which has helped increase its visibility and reach. 

Please share about your experience and involvement with the RUGS group.

My name is Fernanda Kelly, and I’m 28 years old. I graduated from the Federal University of Goiás with a degree in statistics. During my studies, I became familiar with the R programming language. However, it wasn’t until 2019 that I realized how underrepresented women and black people are in this field. This led me to establish a new R-Ladies chapter that same year to promote diversity and inclusion in the industry.

I worked as a statistician for 4 years at Hospital Moinhos de Vento, where I was involved in the Pfizer Project that analyzed the effectiveness of the COVID-19 vaccine in Brazil. After that, I worked as a Data Scientist at Accenture Brasil. I hold a degree in Machine Learning and an MBA from the University of São Paulo. Recently, I completed a specialization in project management, and I am currently pursuing a master’s degree in Intelligent Systems and their applications in the Healthcare sector.

I have incorporated the R language extensively in my work and studies. Its versatility in interacting with other languages and its diverse range of tools for creating reports, such as R Markdown and Quarto, provide users with multiple ways to develop high-quality models and reports. I have used R for various tasks, such as modeling, data processing, manipulation, and writing with blogdown and R Markdown. R’s constant updates and information dissemination about its features have increased my usage of the language even more.

I haven’t been involved with the RUGS community lately, but I found out about the initiative on LinkedIn and thought, ‘Why not apply?’ Sometimes we hold ourselves back and don’t even consider applying for opportunities, but I applied this time and succeeded.

Our group is a chapter of the global R-Ladies community that strives to promote awareness of R programming language among individuals belonging to minority genders. We cover a broad range of topics helping facilitate individuals entering the job market. Some topics we cover include public speaking, mentoring on how to fill out LinkedIn, and occasionally Excel. We believe R programming goes beyond just coding and that is why we emphasize the development of soft skills as well. We view the community as a trainer of future professionals. To date, we have held over 40 meetings, and this year we plan to offer over 20 workshops. These workshops will cover an array of diverse topics, but our primary goal is to showcase the functionalities of the R programming language in comparison to other programming languages like Python.

I want to emphasize that the work I have accomplished for the R community since 2019 with R-Ladies Goiânia was not a solo endeavor. I have great admiration for the exceptional women I have walked alongside, and currently, I am fortunate to have Jennifer Lopes (a remarkable black woman) by my side since 2023, who has been helping me with R-Ladies Goiânia.

Can you share what the R community is like in Goiânia?

When I talk about community, I also refer to the city of Goiânia, located in the heart of Brazil, where the population is a mix of different ethnicities. The R community in Goiânia is huge, especially within the university. Many degree programs use R as their primary programming language. I fell in love with the power of R during my undergraduate studies in statistics. However, during my master’s degree, I realized that there was a lack of representation for minority genders and ethnic groups. This led me to search for communities that catered to this audience. As a result, I discovered the R-Ladies community and founded the R-Ladies Goiânia chapter in mid-May 2019. Since then, the chapter has grown and reached out to many women, black people, and members of the LGBT community.

The R language is widely used in Brazil across various sectors, including health, agriculture, and financial institutions. The primary reason for its popularity is the vast range of packages it offers and the structured control offered by CRAN, which enhances the language’s credibility and security.

Do you have an upcoming event planned? Can you share more about the topic covered? Why this topic?

We have an upcoming event planned which will be presented by Julia Helen, who lives in Rio de Janeiro. She is a statistician by profession and works as a data scientist at a large television station in Brazil. The meetup will cover the connection between R and Python. This event will take place on March 16th, and everyone is invited to attend. The primary focus of the meeting will be to teach R programmers how to use Python within RStudio effectively. By leveraging both languages, programmers will take advantage of their combined functionalities. The choice of this topic is because of the high demand among R programmers to learn about the use of Python and how to make both languages work seamlessly within RStudio.
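For readers who have not seen the two languages working together, the sketch below shows one common route. It assumes the reticulate package and a working Python installation, neither of which is named in the interview, so treat it as an illustration rather than the content of the meetup. The variable names are made up for the example.

```r
library(reticulate)

# Run a line of Python in the embedded interpreter.
py_run_string("squares = [n * n for n in range(5)]")

# Objects in Python's main module are reachable from R through `py`;
# the Python list is converted to an R vector automatically.
squares <- py$squares

# Import a module from the Python standard library and call it from R.
py_stats <- import("statistics")
avg <- py_stats$mean(squares)
```

Here the list is built in Python, read from R, and then passed back to a Python function, which is the kind of round trip that lets each language handle the part it suits best.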

Any techniques you recommend using when planning or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive for people who may not be able to attend physical events in the future?

R-Ladies Goiânia is present on diverse networks, but we recommend using GitHub to access our course material. We have complete control over the material available on GitHub, and it helps participants gain knowledge on the platform, which is often required by companies in Brazil. We are currently using Zoom through the Sympla platform, which is free and offers many control options over the event. The platform allows us to manage registration, accept the code of conduct, and send certificates to attendees.

We have hosted our meetings online since 2020, and it has worked well for our group. In our meetings we have people from various states in Brazil and, sometimes, we have people from other countries participating. This is incredible and this way we can reach more people, making the chapter decentralized. We have already reached 100 people watching the Introduction to R meetup. All of our events are recorded, and this gives people who could not attend the live event and people in career transition the opportunity to access the content. Currently, our YouTube channel has more than twenty videos.

For the future, we are planning more accessibility, but we know how poorly developed the accessibility of broadcasting platforms is. R-Ladies Goiânia aims to achieve real diversity in its meetings and has been working towards this with campaigns on Instagram, LinkedIn and Twitter. We are seeking innovation and managed to open a mentoring program, which is a big step for the chapter and we are extremely happy.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

As I mentioned before, my most recent project involved using the R programming language to analyze the effectiveness of the Pfizer vaccine in Brazil. The project was conducted in collaboration with Pfizer, a large pharmaceutical company, and the results have been published in an article titled “BNT162b2 mRNA COVID-19 against symptomatic Omicron infection following a mass vaccination campaign in southern Brazil: A prospective test-negative design study”. The code used for this project is available on my GitHub.

What resources/techniques do/did you use? (Posit (RStudio), Github, Tidyverse, etc.)

In this project, we utilized a range of techniques. R Markdown was the most frequently used tool for generating reports in both HTML and PDF formats. Apart from the tidyverse package, we also employed a variety of packages for analyzing PDF data (such as pdftools), data analysis (including lme4 and sampling), and data tabulation (such as reactable, DT, and qwraps2). We utilized GitHub solely to host the code for the published article.


ISC-funded Grant: Secure TLS Connections in {nanonext} and {mirai} Facilitating High-Performance Computing in the Life Sciences

By Blog

Contributed by Charlie Gao, Director at Hibiki AI Limited

{nanonext} is an R binding to the state-of-the-art C messaging library NNG (Nanomsg Next Generation), created as a successor to ZeroMQ. It was originally developed as a fast and reliable messaging interface for use in machine learning pipelines. With implementations readily available in languages including C++, Go, Python, and Rust, it allowed individual modules to be written in the most appropriate language and for them to be piped together in a single workflow.

{mirai} is a package that enables asynchronous evaluation in R, built on top of {nanonext}. It was initially created purely as a demonstration of the reliable RPC (remote procedure call) protocol from {nanonext}. However, open-sourcing this project greatly facilitated its discovery and dissemination, eventually leading to a long-term, cross-industry collaboration with Will Landau, a statistician in the life sciences industry, author of the {targets} package for reproducible pipelines. He ended up creating the {crew} package to extend {mirai} to handle the increasingly complex and demanding high-performance computing needs faced by his users.
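To make "asynchronous evaluation" concrete, here is a minimal sketch that assumes the {mirai} package is installed. The expression and the values of `x` and `y` are made up for illustration, and the example deliberately leaves out daemons, TLS, and {crew}.

```r
library(mirai)

# Start the evaluation in a background R process; the call returns at once.
# x and y are passed explicitly because the expression runs in a separate
# process that cannot see the calling session's variables.
m <- mirai(
  {
    Sys.sleep(0.5)  # stand-in for a slow computation
    x + y
  },
  x = 1,
  y = 2
)

# The main session stays free for other work while the task runs;
# unresolved() reports whether the result is still pending.
still_running <- unresolved(m)

# Block until the result is available, then read it from `$data`.
call_mirai(m)
result <- m$data
```

The point of the pattern is that the main R session is not blocked between creating the task and collecting its result.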

As this work was progressing, security was still a missing piece of the puzzle. The NNG library supported integration with Mbed TLS (an SSL/TLS library developed under the Trusted Firmware Project); however, secure connections were not yet a part of the R landscape.

The R Consortium, by way of an Infrastructure Steering Committee (ISC) grant, funded the work to implement this functionality from the underlying libraries and to also devise a means of configuring the required certificates in R. The stated intention was to provide a user-friendly interface for doing so. The end result somewhat exceeded these goals, with the default allowing for zero-configuration, single-use certificates to be generated on-the-fly. This affords an unparalleled level of usability, not requiring end users to have any knowledge of the intricacies of TLS.

Will Landau talks about the impact TLS has had on his work:

“I sought to extend {mirai} to a wide variety of computing environments through {crew}, from traditional clusters to Amazon Web Services. The integration of TLS into {nanonext} increases the confidence with which {mirai} can be deployed in these powerful environments, accelerating downstream applications and {targets} pipelines.”

The project to extend {mirai} to high-performance computing environments was featured in a workshop on simulation workflows in the life sciences, given at R/Pharma in October 2023 (video and materials accessible from https://github.com/wlandau/rpharma2023).

With the seed planted in {nanonext}, {mirai} and {crew} have grown to form an elegant and performant foundation for an emerging landscape of asynchronous and parallel programming tools. They already provide new back-ends for {parallel}, {promises}, {plumber}, {targets}, and Shiny, as well as high-level interfaces such as {crew.cluster} for traditional clusters and {crew.aws.batch} for the cloud.

The Cleveland R User Group’s Journey Through Pandemic Adaptations and Baseball Analytics

By Blog

Last year, R Consortium talked to John Blischak and Tim Hoolihan of the Cleveland R User Group about their regular structured and casual virtual meetups during the pandemic. Recently, Alec Wong, another co-organizer of the Cleveland R User Group, updated the R Consortium about how the group provides a networking platform for a small but vibrant local R community. Alec shared details of a recent event from the group regarding the use of R for analyzing baseball data. He also discussed some tools for keeping the group inclusive and improving communication among group members.

Please tell us about your background and involvement with the RUGS group.

I completed my Bachelor of Science degree in Fisheries and Wildlife from the University of Nebraska-Lincoln in 2013, and my Master of Science degree in Statistical Ecology from Cornell University in late 2018. During my graduate program, I gained extensive experience using R, which is the de facto language of the ecological sciences. I discovered a passion for the language, as it is extremely intuitive and pleasant to work with.

After completing my program in 2018, I moved to Cleveland and immediately began attending the Cleveland R User Group in 2019, and have been a consistent member ever since. I eagerly look forward to each of our events. 

After completing my graduate program, I started working at Progressive Insurance. Working for a large organization like Progressive provides me with many diverse opportunities to make use of my extensive experience with R. I was happy to find a vibrant R community within the company, which allowed me to connect with other R users and share knowledge, and I enthusiastically offer one-on-one assistance to analysts from all over Progressive.

Starting in 2022, I accepted the role of co-organizer of the Cleveland R User Group. As a co-organizer, I help with various tasks related to organizing events, such as the one we held last September. I am passionate about fostering the growth of these communities and helping to attract more individuals who enjoy using R.

Our group events are currently held in a hybrid format. When we manage to find space, we meet in person, such as when we met in October to view the 2023 posit::conf; several members attended in person and watched and discussed videos from the conference. Most of our meetups continue to be virtual, including our Saturday morning coffee meetups, but we are actively searching for a more permanent physical space to accommodate our regular meetups.

I am only one of several co-organizers of the Cleveland R User Group. The other co-organizers include Tim Hoolihan from Centric Consulting; John Blischak, who operates his consulting firm, JDB Software Consulting, LLC; and Jim Hester, currently a Senior Software Engineer at Netflix. Their contributions are invaluable and the community benefits tremendously from their efforts.

Can you share what the R community is like in Cleveland? 

I believe interest in R has been fairly steady in Cleveland since 2019. We have a handful of members who attend regularly, and at each meeting one or two new attendees typically introduce themselves.

I would venture to say that R continues to be used frequently in academic settings in Cleveland, though I am unfamiliar with the standards at local universities. At least two of our members belong to local universities and use R in their curricula.

As for industry usage, many local companies, including Progressive, use R. At Progressive, we have a small but solid R community; although it is not as large as the Python community, I believe that the R community is more vibrant. This seems characteristic of R communities in varying contexts, as far as I’ve seen. Another Cleveland organization, the Cleveland Guardians baseball team, makes use of R for data science. In September 2023 we were fortunate to invite one of their principal data scientists to speak to us about their methods and analyses. (More details below.)

Typically, our attendance is local to the greater Cleveland area, but with virtual meetups, we’ve been able to host speakers and attendees from across the country; this was a silver lining of the pandemic. We also hold regular Saturday morning coffee and informal chat sessions, and it’s great to see fresh faces from outside Cleveland joining in.

You had a Meetup titled “How Major League Teams Use R to Analyze Baseball Data.” Can you share more about the topic covered? Why this topic?

On September 27th, 2023, we invited Keith Woolner, principal data scientist at the Cleveland Guardians baseball team, to give a presentation to our group. This was our first in-person meetup after the pandemic, and Progressive generously sponsored our event, affording us a large presentation space, food, and A/V support. We hosted a mixed audience of Progressive employees and members of the public.

Keith spoke to us about “How Major League Baseball Teams Use R to Analyze Baseball Data.” In an engaging session, he showcased several statistical methods used in sports analytics, the code used to produce these analyses, and visualizations of the data and statistical methods. Of particular interest to me was his analysis using a generalized additive model (GAM) to evaluate catchers’ relative ability to “frame” a catch; in other words, their ability to convince the umpire that a strike occurred. The presentation held some relevance for everyone, whether they were interested in Cleveland baseball, statistics, or R, making it a terrific option for our first in-person presentation since January 2020. His presentation drove a lot of engagement both during and after the session.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

One of our co-organizers, John Blischak, has created a slick website using GitHub Pages to showcase our group and used GitHub issue templates to create a process for speakers to submit talks. Additionally, the Cleveland R User Group has posted recordings of our meetups to YouTube since 2017, increasing our visibility and accessibility. Many people at Progressive could not attend our September 2023 meetup and asked for the recording as soon as it was available.

Recently, we have also created a Discord server, a platform similar to Slack. This was suggested by one of our members, Ken Wong, and it has been a great addition to our community. We have been growing the server organically since October of last year by marketing it to attendees who visit our events, particularly on the Saturday morning meetups. This has opened up an additional space for us to collaborate and share content asynchronously. Ken has done an excellent job of organizing the server and has added some automated processes that post from R blogs, journal articles, and tweets from high-profile R users. Overall, we are pleased with our progress and look forward to continuing to improve our initiatives.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Apply Now! R Consortium Infrastructure Steering Committee (ISC) Grant Program Open for Proposals!

By Announcement, Blog

Help build R infrastructure! We’re opening the call for proposals for the 2024 Infrastructure Steering Committee (ISC) Grant Program. The R Consortium is dedicated to enriching the R Ecosystem, directly supporting projects that strengthen both its technical and social infrastructures.

Apply here!

What We Fund:

Our grants target projects that make a difference in the R community, focusing on:

Technical Infrastructure: Enhancements in key R packages, development tools like R-hub, and improvements for R on various operating systems.

Social Infrastructure: Projects like SatRDays that promote local engagement and initiatives for better tracking of R Consortium activities.

We’re eager to see your innovative ideas and how they can propel the R ecosystem forward. This is a call to action for all who wish to contribute to the growth and enhancement of R. Let’s build a stronger R community together!

Submit your proposal now and be a part of shaping the future of the R Ecosystem. Learn more about how to apply here.

We look forward to your submissions and furthering the R community’s advancement together!

Apply now!

The R Consortium 2023: A Year of Growth and Innovation

By Announcement, Blog

Excerpted from the Annual Report

Access the annual report here!

Letter from the Chair — Mehar Pratap Singh, Chairman

Welcome to the 2023 Annual Report of the R Consortium. This document reflects a year of significant growth, innovation, and community engagement within and beyond the R ecosystem. As we present the accomplishments and milestones of the past year, we also set our sights forward, laying out the path for an even more collaborative and impactful future.  

The R Consortium serves as a central hub for the R community, bringing together industry leaders, academic institutions, and individual contributors to foster the development and proliferation of the R language. Our mission is to support the R community through funding, infrastructure improvement, community initiatives, and global outreach.  

In 2023, the R Consortium played a pivotal role in shaping the development of the R ecosystem. It did so through monetary grants (nearly $200,000 to develop R packages and other technical infrastructure), by fostering industry-wide collaborative working groups, and by supporting R-Ladies, R user groups, and several important industry conferences, including the Latin-R, New York R, and Bioconductor conferences. This report highlights some of these achievements, showcasing the collective effort of our members and the broader community.

Recognizing the dynamic nature of data science technologies and the evolving needs of industry, we also recognize the responsibility of the R Consortium to help set a vision for the evolution of the R ecosystem. As you read through this report, we hope you’ll appreciate the strides we’ve made together and feel inspired by the potential of what we can achieve in the future. The R Consortium is more than just an organization: it’s a vibrant community of innovators, problem-solvers, and thought leaders. Together, we are shaping a future where the power of R is accessible to all and continues to drive progress across industries worldwide.   

Thank you for your continued support and dedication to the R Consortium and the wider R community. 

Moffitt Cancer Center Bio-Data Club’s New Chapter in Spatial Data Analysis and Enhanced Hackathon Collaboration

By Blog

The R Consortium recently reconnected with Paul Stewart, founder of Moffitt Cancer Center Bio-Data Club in Tampa, Florida. Since the last update on January 6th, 2023, the Moffitt Cancer Center Bio-Data Club has hosted special guest Dr. Josh Starmer of StatQuest, and it has welcomed new co-organizers Rodrigo Carvajal, Nathan Van Bibber and Dr. Alex Soupir. The club has maintained its momentum with monthly meetings that have featured enriching discussions, educational talks, and practical tutorials.

One big change they have made this year is revamping their annual hackathon to broaden its scope and encourage greater participation from external academic institutions and industry partners. This expansion aims to enrich the event with diverse perspectives and innovative ideas, marking a significant step forward for the club and its contributions to bioinformatics and cancer research.

What is new with Moffitt Cancer Center Bio-Data Club since last we spoke on Jan 6th, 2023?

Something new is that we hosted sessions on spatial data analysis. Our work at Moffitt often involves big molecular data, delving into patients’ tumor samples or blood to uncover insights about genes, proteins, and metabolites. This exploration aims to unravel the intricacies of cancer, paving the way for new treatments or early detection methods. Traditionally, we analyze patient tumors in bulk, meaning the entire sample is processed at once, and molecules of interest are extracted and profiled. However, the resulting data are just numbers in a matrix, and we lack the ability to define what part of the tumor the numbers are coming from. New spatial technologies have recently revolutionized our understanding of cancer and other diseases. We can now spatially resolve where genes, proteins, and metabolites come from in the tumor and neighboring cells. This advancement adds a crucial spatial dimension to our research, necessitating novel methods for data processing, quality control, and interpretation. Not to mention, these approaches generate some cool pictures. For example, here is an image from a spatial assay run at Moffitt for a project that I lead (funded by the Cancer Research Institute):

I also want to touch on our hackathon. We’ve decided to broaden its scope this year, extending an open invitation to foster greater participation. Previously, attendance was mainly limited to the Bio-Data Club Meetup and our immediate connections at Moffitt. This year, we’re reaching out more actively to industry partners and to other academic institutions like the University of South Florida. We hope to exceed last year’s 50 participants and to enrich the event with diverse perspectives and innovative ideas.

Please tell us about your background and your involvement in the R Community. What is your level of experience with the R language?

I helped initiate the Bio-Data Club at Moffitt back in 2018. It began as an internal group but soon gained interest from beyond Moffitt, leading us to secure funding from the R Consortium. Since then, I’ve been dedicated to leading the club. In addition to this, I mentor trainees at Moffitt, including Moffitt research staff and students from the University of South Florida.

I’m actively engaged in the local data science community; I’ve delivered lectures at the Tampa Bay R Users Group, the Tampa Bay Data Science Meetup, and, notably, at the 2023 D4CON Data Science Conference in Tampa, organized by Lander Analytics. (Editor’s note: Lander Analytics is an R Consortium member.) While my talks aren’t exclusively about the R programming language, they are intended to cater to the Tampa data science community.

My experience with R spans over a decade. As a Moffitt Cancer Center faculty member, I extensively leverage R in my research. I’d classify my proficiency as advanced, though I wouldn’t label myself an expert because I still learn new things about this great language daily.

Why do industry professionals come to your user group? What is the benefit of attending?

Because we are part of Moffitt, located on the University of South Florida campus, our focus naturally gravitates toward biomedical academic research, and showcasing how data science operates within an academic research setting is beneficial. It offers a unique perspective and exposes attendees to cutting-edge techniques, like spatial omics analyses, which might not be part of the typical workload in a standard 9-to-5 job. However, our meetings must cater to a broad audience. Our meeting topics apply across many interests; one that comes to mind is a presentation and demo by ComplexHeatmap author Dr. Zuguang Gu. We’re committed to broadening our discussions and introducing various topics and libraries relevant to R users and the broader data science community. My aim is to ensure that our meetings are inclusive, informative, and beneficial for everyone involved, irrespective of their field of work.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

The realm of spatial omics and spatial data analysis, especially in the context of big biological data like genomics, proteomics, and metabolomics, is rapidly evolving. It’s fascinating to see the development of numerous packages, including spatialGE and scSpatialSIM, which were pioneered right here at Moffitt. These tools are game-changers because they allow individuals who aren’t necessarily experts in imaging or spatial data analysis to engage in and benefit from this research.

As a bioinformatics or biological data science researcher, my research focuses on mass spectrometry data, which involves comprehensive profiling of proteins, metabolites, and lipids in tumors or blood. This is a fairly specialized field, yet even here, there’s the Cardinal R package tailored for spatial analyses. This progress is exciting and indicative of a significant trend in our field. This trajectory is not just a fleeting moment but a substantial shift that will persist and evolve, shaping the future of bioinformatics.

Please share any additional details you would like included in the blog. 

If you have a neat package or tool you would like to showcase, please feel free to reach out at paul.stewart@moffitt.org. This is a great way for trainees or junior data scientists to get a presentation on their CV.

Moffitt is consistently looking for talent on the academic research side and the operational side. For anyone who is interested, I’d recommend visiting the Moffitt website.

I’m also excited to share that I’ll be presenting again at this year’s D4 conference in Tampa, scheduled for June 5th and 6th and hosted by Lander Analytics. Additionally, I want to give a shout-out to the Tampa Bay Data Science Meetup and the Tampa Bay Data Engineering Meetup.

Our annual hackathon is set for December 12th and 13th, 2024. Details about the hackathon are forthcoming, but for those eager to stay informed, the best approach is to join our Bio-Data Club Meetup. We consistently post all the relevant updates there, ensuring you’re well-informed and prepared for the event. Mark your calendars for December 12th and 13th – it’s shaping up to be an enriching and exciting experience!

Recap: R Validation Hub Community Meeting

By Blog

Join the R Validation Hub mailing list!

The recent R Validation Hub Community meeting brought together around 50 participants to explore the concept of “validation” within the R programming ecosystem. This session highlighted the diversity of validation perspectives, emphasizing the importance of tailored definitions across different roles, such as users, administrators, package developers, and regulatory agencies. Here are the key takeaways:

Key Insights:

  • Validation Perspectives: The meeting underscored the need for each organization to define “validation” in a way that suits its context, while the R Validation Hub offers a baseline for common understanding.
  • Statistical Methodology Challenges: Discussions acknowledged the challenges in achieving exact results across different programming languages due to inherent differences in statistical methodologies.
  • Open Source Contributions: The importance of returning testing code to package developers was highlighted, reinforcing the open-source ethos of collaboration and quality enhancement.
  • Resource Availability: The slides from the meeting are accessible on GitHub here. Although the meeting wasn’t recorded, the community is encouraged to join the R Validation Hub mailing list for future updates and meeting invites here.

Looking Forward:

The meeting reiterated the significance of the R Validation Hub as a central point for validation discussions and resources. Future community meetings are tentatively scheduled for May 21, August 20, and November 19, offering opportunities for further engagement and contribution to the evolving conversation around R validation.
