
Join our R/Medicine Webinar: Quarto for Reproducible Medical Manuscripts

By Blog, Events

Join the R Consortium for an enlightening webinar on March 20th, 2024, at 4:00 PM ET, featuring Mine Cetinkaya-Rundel, Professor of the Practice of Statistical Science at Duke University. Discover the innovative Quarto tool to streamline the creation of reproducible, publication-ready manuscripts.

Register here!

Key Highlights:

  • Quarto Manuscripts Introduction: Learn how to easily integrate reproducibility into your research with Quarto’s user-friendly features, creating comprehensive bundled outputs ready for journal submission.
  • Interactive Demo: Witness a live demonstration of Quarto in action, showcasing how to enhance your current manuscript preparation process and address common challenges.
  • Expert Guidance: Gain insights from Mine Cetinkaya-Rundel’s extensive experience in statistical science and reproducible research, offering valuable tips for improving your workflow.

Event Details:

When: March 20th, 2024, at 4:00 PM ET

Don’t miss this opportunity to refine your manuscript preparation process with the latest advancements in reproducibility technology.

Register now!

R-Ladies Cotonou – A Community that Makes R Accessible for French-Speaking African Women

By Blog

Nadejda Sero, the founder of the R Ladies Cotonou chapter, shared with the R Consortium her experiences learning R, the challenges of running an R community in a developing country, and her plans for 2024. She also emphasized the importance of considering the realities of the local R community when organizing an R User Group (RUG). 

Please share about your background and involvement with the RUGS group.

My name is Nadejda Sero, and I am a plant population and theoretical ecologist. I have a Bachelor of Science in Forestry and Natural Resources Management and a Master of Science in Biostatistics from the University of Abomey-Calavi (Benin, West Africa). I discovered R during my Master’s studies in 2015. From the first coding class, I found R exciting and fun. However, as assignments became more challenging, I grew somewhat frustrated due to my lack of prior experience with a programming language. 

So, I jumped on Twitter (now X) and tweeted, “The most exciting thing I ever did is learning how to code in R!” The tweet caught the attention of members of the R-Ladies Global team. They asked if I was interested in spreading #rstats love with the women’s community in Benin. I was thrilled by the opportunity, and thus began my journey with R-Ladies Global.

The early days were challenging due to the novelty of the experience. I did not know much about community building or organizing social events. I started learning about the R-Ladies community and available resources. The most significant work was adjusting the resources and tools used by other chapters to fit my realities in Benin, a small French-speaking developing country with poor internet access and few organizations focused on gender minorities. (We are doing slightly better now.) On top of that, I often needed to translate materials into French for the chapter.

As I struggled to make headway, the R-Ladies team launched a mentoring program for organizers. I was fortunate enough to participate in the pilot mentorship. The program helped me understand how to identify, adjust, and use the most effective tools for R-Ladies Cotonou. I also gained confidence as an organizer and with community work. With my fantastic mentor’s help, I revived the local chapter of R-Ladies in Cotonou, Benin. I later joined her in the R-Ladies Global team to manage the mentoring program. You can read more about my mentoring experience on the R-Ladies Global blog.

Happy members of R-Ladies Cotonou sharing some pastries after the presentation. At our first official meetup, the attendees discovered and learned everything about R-Ladies Global and R-Ladies Cotonou.

I am grateful for the opportunity to have been a part of the R-Ladies community these last six years. I also discovered other fantastic groups like AfricaR. I am particularly proud of the journey with R-Ladies Cotonou. I am also thankful to the people who support us and contribute to keeping R-Ladies Cotonou alive. 

Can you share what the R community is like in Benin? 

R is commonly used in academia and, over the past two to three years, has seen more moderate adoption in the professional world. For example, I worked with people from different areas of science. I worked in a laboratory where people came to us needing data analysts or biostatisticians. We always used R for such tasks, and many registered for R training sessions. The participants in these sessions also came from the professional world and public health. I have been out of the country for a while now, but the R community is booming. More people are interested in learning and using R in different settings and fields. I recently heard that people are fascinated with R for machine learning and artificial intelligence. It is exciting to see people integrating R into various fields. There are also a few more training opportunities for R enthusiasts.

Can you tell us about your plans for the R Ladies Cotonou for the new year?

More meetups from our Beninese community, other R-Ladies chapters, and allies. 

We are planning a series of meetups that feature students from the training “Science des Données au Féminin en Afrique,” a data science with R program for francophone women organized by the Benin chapter of OWSD (Organization for Women in Science for the Developing World). We have three initial speakers for the series: the student who won the excellence prize and the two grantees from R-Ladies Cotonou. The program is an online training requiring good internet, which is unfortunately expensive and unreliable. If you want good internet, you must pay the price. 

R-Ladies Cotonou supported two students (from Benin and Burkina Faso) by creating a small “internet access” grant using the R Consortium grant received in 2020. 

The meetup speaker is taking us through a review of the most practical methods of importing and exporting datasets in R. The attendees are listening and taking notes.

This next series of meetups will focus on R tutorials with a bonus. The speakers will additionally share their stories embracing R through the training. The first speaker, Jospine Doris Abadassi, will discuss dashboard creation with Shiny and its potential applications to public health. I hope more folks from the training join the series to share their favorite R tools. 

I believe these meetups will assist in expanding not only the R-Ladies but the entire R community. I particularly enjoy it when local people share what they have learned. It further motivates the participants to be bold with R. 

As for “Science des Données au Féminin en Afrique,” it is the first data science training I know of that is free specifically for African women from French-speaking areas. Initiated by Dr. Bernice Bancole and Prof. Thierry Warin, the program trains 100 African francophone women in data science using R, emphasizing projects focused on solving societal problems. The training concluded its first cohort and is now recruiting for the second round. So, the community has expanded, and a few more people are using R. I appreciate that the training focuses on helping people develop projects that address societal issues. I believe that it enriches the community.

As I said in my last interview with the R Consortium, “In some parts of the world, before expecting to find R users or a vivid R community, you first need to create favorable conditions for their birth – teach people what R is and its usefulness in professional, academic, and even artistic life.” This is especially true in Benin, whose official language is French. English is at least a third language for the average multilingual Beninese, and many people are uncomfortable or hesitant using R since most R materials are in English. I hope this OWSD Benin training receives the contributions it needs to keep running long-term. You can reach the leading team at owsd.benin@gmail.com.

Our other plan is to collaborate with other R-Ladies chapters and RUGS who speak French. If you speak French and want to teach us something, please email cotonou@rladies.org.

 Otherwise, I will be working on welcoming and assisting new organizers for our chapter. So, for anyone interested, please email cotonou@rladies.org.

Are you guys currently hosting your events online or in-person? And what are your plans for hosting events in 2024?

We used to hold in-person events when we started. Then, the COVID-19 pandemic hit, and we had to decide whether to hold events online. Organizing online events became challenging due to Cotonou’s lack of reliable internet access or expensive packages. As a result, we only held one online event with poor attendance. We took a long break from our activities.

Going forward, our events will be hybrid, a mix of in-person and online events. In-person events will allow attendees to use the existing infrastructure of computers and internet access of our allies. It also offers an opportunity to interact with participants. Therefore, I am working with people in Cotonou to identify locations with consistent internet access where attendees can go to attend the meetups. Online events will be necessary to accommodate speakers from outside of the country. It will be open to attendees unable to make it in person.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

The techniques and tools should depend on the realities of the community. What language is comfortable for attendees? What meeting modality, online or in person, works best for participants? 

As mentioned earlier, I was inexperienced, and organizing a chapter was daunting. My mentoring experience shifted my perspective. I realized that I needed to adjust many available resources/tools. Organizing meetups became easier as I integrated all these factors. 

For example, our chapter prioritizes other communication and advertising tools, like regular emails and WhatsApp. The group is mildly active on social media, where the R community is alive (X/Twitter, Mastodon). It is easier to share information through a WhatsApp group because of its popularity within our community. We recently created an Instagram account and will add LinkedIn and Facebook pages (once we have more co-organizers). I would love a website to centralize everything related to R-Ladies Cotonou. Relying on email is our adjustment to Meetup, which is unpopular in Benin. Getting sponsors or partners and providing a few small grants for good internet would tremendously help our future online events.

Adjusting helps us to reach people where they are. It is imperative to consider the community, its realities, and its needs. I often asked our meetup participants their expectations, “What do you anticipate from us?” “What would you like to see in the future?” Then, I take notes. Also, we have Google Forms to collect comments, suggestions, potential speakers, contributors, and preferred meeting times. It is crucial to encourage people to participate, especially gender minorities less accustomed to such gatherings.

I have also attempted to make the meetups more welcoming and friendly in recent years. I always had some food/snacks and drinks available (thanks to friends and allies). It helps make people feel at ease and focus better. I hope the tradition continues for in-person meetups. It is valuable to make the meetups welcoming and friendly. How people feel is essential. If they come and feel like it is a regular lecture or course, they may decide to skip it. But, if they come to the meetup and learn while having fun, or at the very least, enjoy it a little, it benefits everyone. 

These are some of the key aspects to consider when organizing a meetup. It is critical to consider the people since you are doing it for them. Also, make sure you have support and many co-organizers if possible.

All materials live on our GitHub page for people who can’t attend physical events. Another solution would be recording and uploading the session on the R-Ladies Global YouTube or our channel. 

What industry are you currently in? How do you use R in your work?

I am now a Ph.D. student in Ecology and Evolutionary Biology at the University of Tennessee in Knoxville. 

R has not been my primary programming language since I started graduate school. I still use R for data tidying and data analysis, but less extensively. I worked a lot with R as a master’s student and biostatistician. It was constant learning and growth as a programmer, and I had a lot of fun writing my first local package. However, I now work more with mathematical software like Maple and Mathematica. I wish R were as smooth and intuitive as those programs for mathematical modeling. I like translating Maple code to R code, especially when I need to make visualizations.

I am addicted to ggplot2 for graphs. I love learning new programming languages but am really attached to R (it’s a 9-year-old relationship now). I developed many skills while programming in R. R helped me become intuitive, a fast learner, and sharp with other programming languages. 

My most recent project that used R from beginning to end was a project in my current lab on the evolutionary strategies of plants in stochastic environments. We used R for tidying and wrangling the demographic data. Data analysis was a mix of statistical and mathematical models. It was a good occasion to practice writing functions and to use new packages. I enjoy writing functions to automate repetitive tasks, which reduces the need to copy and paste code. I also learned more subtleties of analyzing demographic data from my advisor and colleagues who have used R longer.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Unlocking the Power of R for Insurance and Actuarial Science: Webinar Series Recap

By Announcement, Blog

The R Consortium recently hosted a webinar series tailored specifically for insurance and actuarial science professionals. The series, called the R/Insurance webinar series and led by experts Georgios Bakoloukas and Benedikt Schamberger, was crafted to guide attendees through transitioning from Excel to R, implementing R in production environments, fostering a performance culture with R, and mastering high-performance programming techniques.

Whether new to R or looking to deepen your expertise, these webinars offer valuable insights into leveraging R’s capabilities in your field. All sessions are now accessible on YouTube, providing a fantastic resource for ongoing learning and development. 


For further details and to watch the webinars, visit the R Consortium’s website.

Ann Arbor R User Group: Harnessing the Power of R and GitHub

By Blog

The R Consortium talked to Barry Decicco, founder and organizer of the Ann Arbor R User Group, based in Ann Arbor, Michigan. Barry shared his experience working with R as a statistician and highlighted current trends in the R language in his industry. He also emphasized the significance of organizing regular events and communicating effectively when managing an R User Group (RUG).

Please share about your background and involvement with the RUGS group.

Throughout my professional career, I have gained extensive experience in various industries as a statistician. Statisticians are often thought of as either staying in one industry for their entire career or frequently transitioning between them. I have followed the latter path, having held positions at Ford Motor Company, their spinoff Visteon, the University of Michigan School of Nursing, the University of Michigan Health System, Nissan Motor Company, Volkswagen Credit (as a contractor), Michigan State University, and currently Quality Insights.

I have been using the R programming language consistently for several years now. I have extensively worked with R during my tenure at Michigan State University as a member of the Center for Statistical Training and Research (CSTAT). CSTAT serves as the university’s statistical laboratory. Our team heavily relied on R as our preferred software for statistical analysis.

Our reporting process involved using R Markdown reports. Steven Pierce, the assistant director, developed a highly complex and upgradeable system using R Markdown to process data. This system allowed us to initiate a report and then trigger the R Markdown file to process the data and generate the final datasets for each report. Another R Markdown file was then called to render the report. This streamlined process enabled us to produce about 40 PDF reports within 45 minutes. The process remained relatively straightforward when we needed to make modifications, such as changing the reporting period from fiscal years to calendar years or adding or subtracting individuals, units, or departments.
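
A minimal sketch of what such a two-stage, parameterized pipeline can look like with rmarkdown::render(); the file names, parameters, and units below are hypothetical, not the actual CSTAT system:

library(rmarkdown)

units <- c("unit_a", "unit_b", "unit_c")  # hypothetical reporting units

# Stage 1: run the data-processing document once to build the per-report datasets
render("process_data.Rmd", params = list(period = "calendar_year"))

# Stage 2: render one PDF report per unit from the prepared datasets
for (u in units) {
  render("report_template.Rmd",
         params = list(unit = u),
         output_file = paste0("report_", u, ".pdf"))
}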

I have recently started a new job primarily working with the SAS programming language. Initially, I will focus on gaining proficiency in this area. After that, I will transition to performing more in-depth analysis and ad hoc reporting, requiring me to use additional tools and resources. I have also moved to a new system where we use Hive or Hadoop through Databricks. As part of my role, I am responsible for taking over the current reporting system and identifying future reporting needs. This will require me to use R extensively.

Before the COVID pandemic, the R users group met in Ann Arbor. However, the pandemic dealt a major blow to the group, and we are still recovering from its impact. In our efforts to revive the group, we continued with the same theme as before: a mix of programming and statistics. However, we have been focusing more on programming and simpler analysis to make it easier to get the group restarted. We have also introduced some new presenters covering topics such as machine learning pipelines in their presentations.

Can you share what the R community is like in Ann Arbor?

R has become a popular programming language in academia and will likely remain relevant in this field. However, general coding and applications are more prevalent in the industrial sector. Python is gaining popularity because it attracts a broader range of programmers, including those who are not data or analytics specialists. Therefore, R will continue to be a significant but specialized tool.

Currently, I have noticed a significant decrease in the usage of SAS. This trend is driven by the dislike of license fees among individual and corporate users. The matter is further complicated by corporate accounting practices, where different funding sources may have varying spending restrictions. As a result, organizations may end up incurring higher salary expenses because of the complexity of corporate accounting processes.

If a company spends a fixed amount, say $10,000 a year, on SAS licenses, it might resent that cost. It may then hire additional staff to do the work SAS did earlier, with their salaries and other associated costs coming from a different funding source. As a result, the company may spend $120,000 to $150,000 annually to replace an expense of $10,000 to $20,000 a year. Whether that trade-off is acceptable depends on the funding source.

Do you have an upcoming meeting planned? What are your plans for the RUG for this year?

Our next presenter, Brittany Buggs, Staff Data Analyst at Rocket Mortgage, will demonstrate how to use the gt package for generating tables. Additionally, we are striving to establish closer integration with the Ann Arbor chapter of the American Statistical Association to foster mutual support and collaboration between the groups. We have been conducting hybrid meetings catering to both in-person and virtual attendees, recognizing the convenience and accessibility this format offers. Ann Arbor Spark, a local startup business development organization, has generously provided us with a physical meeting space.

This year, I aim to have more presenters as I have been doing all the presentations by myself. I plan to raise awareness about R, R Markdown, and Quarto and show people how these tools can be useful. I will promote these tools at the University of Michigan and other companies.

What trends do you currently see in the R language?

When it comes to data analysis, R has a clear advantage. The tidyverse syntax is easy to understand, even for those unfamiliar with data tables or Pandas-like programming paradigms.
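
As a small illustration of that readability difference, here is the same summary written with dplyr and in base R (a generic sketch using the built-in mtcars data, not an example from the interview):

library(dplyr)

# tidyverse: the pipeline reads almost like a sentence
mtcars |>
  filter(cyl == 6) |>
  group_by(gear) |>
  summarise(mean_mpg = mean(mpg))

# base R equivalent of the same summary
aggregate(mpg ~ gear, data = mtcars[mtcars$cyl == 6, ], FUN = mean)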

When working with data tables, both base R and Pandas use syntax that differs significantly from plain English, which can make them harder to follow. R Markdown, on the other hand, has a notable advantage in that it makes generating HTML documents quick and easy. For instance, my former supervisor at CSTAT spent a lot of time creating visually appealing PDF documents because his reports were highly customized. However, if your main goal is to produce polished reports relatively quickly, R Markdown is the better option.

My main focus right now is the transition to Quarto. As someone who used to work with R Markdown, I have been learning more about Quarto and adjusting to its features. However, I am concerned about how new users may react to Quarto, so I plan to give presentations throughout the year to gauge their responses and better understand any potential issues that may arise.

Moreover, I’ve noticed that many people are unaware of R Markdown’s capabilities. To address this, I conducted an introductory session on R Markdown for a group at the University of Michigan. During my thirty-minute presentation, the participants were surprised by R Markdown’s diverse functionality, as they were used to working with JavaScript and basic R. Although some individuals in the group knew more than I did, my ability to perform certain tasks using R Markdown impressed them.

One of the benefits of R Markdown is its ability to run multiple languages, with each language being executed chunk by chunk. I hope Quarto will also support this feature.
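
For illustration, a minimal R Markdown source file mixing engines might look like the sketch below; the Python chunk relies on knitr’s python engine, which runs through the reticulate package when it is installed (an illustrative example, not from the presentation):

---
title: "Multi-language example"
output: html_document
---

```{r}
# R chunk, executed by R
summary(cars$speed)
```

```{python}
# Python chunk, executed chunk by chunk via knitr's python engine
print("hello from Python")
```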

In the past, I have presented on calling R from SAS and SAS from R. During these presentations, I demonstrated how to run a SAS job within an R chunk. However, this approach has a limitation: for it to work, SAS must be accessible from the computer running the R code. This means the SAS installation must be on the computer or on a network drive that the computer recognizes as a local drive. On one occasion, while using Enterprise Guide on a Linux machine, I ran into a problem: I couldn’t locate the SAS executable from my computer, which prevented me from executing a SAS job.
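
A rough sketch of that pattern, launching SAS in batch mode from an R chunk with base R’s system2(); the executable path and program name below are hypothetical, and SAS must be installed on, or visible to, the machine running R:

sas_exe <- "C:/Program Files/SASHome/SASFoundation/9.4/sas.exe"  # hypothetical local SAS path
status <- system2(sas_exe, args = c("-sysin", "analysis.sas"))   # run the SAS program in batch mode
if (status != 0) warning("SAS job returned a non-zero exit status")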

It is now possible for individuals to use R Markdown with their preferred programming languages. For instance, R Markdown can be used with Python and Pandas for most tasks, which helps people produce visually appealing reports quickly. With this approach, nearly all the work can be done in Pandas, and users need only basic knowledge of R. In that sense, R Markdown, and now Quarto, can be seen simply as a report-writing layer. I will keep an eye on this situation and evaluate its effectiveness.

I want to highlight how smoothly Git and GitHub combine with R. I use GitHub frequently in my work, though I am not very skilled with it because the RStudio IDE fulfills most of my requirements. I rarely face conflicts; when I do, usually because of my own carelessness, I have to interact with Git and GitHub manually.

I highly recommend the book “Happy Git with R” as an essential resource for beginners. This comprehensive guide provides a step-by-step approach to setting up and using Git and GitHub effectively in R.

When using Git in conjunction with R, you can access a detailed transaction history that can be reviewed anytime. I have found this feature incredibly useful and have been able to recover important work using it. As a data management instructor at MSU, I have also taught my students how to execute this process manually. However, having RStudio handle this task automatically is much more convenient.

In fact, I used SPSS to conduct a project and leveraged GitHub as an experiment. I utilized the data management capabilities of RStudio and found the results satisfactory.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

I suggest that RUG organizers should arrange regular monthly meetings. It would be advantageous to fix these meetings on the same day and time every month, as it will help attendees get accustomed to the routine and know when to expect them.

In my years of working with different groups, I have noticed that if we don’t consciously communicate regularly, our communication will become less effective over time. This can lead to a lack of new ideas and engagement, and we may unintentionally exclude potential participants.

For almost 20 years, I have been part of a group that communicated through a university mailing list. However, we faced difficulties as the list was not easily discoverable through search engines like Google. This made it challenging for new individuals to find or contact us. We have taken steps to tackle this problem by introducing Meetup as a new tool that can be used alongside or instead of our traditional mailing list. The main benefit of Meetup is that it is easily searchable on Google, which makes it simple for anyone to locate and get in touch with our group.

I want to emphasize the importance of effective communication. Neglecting communication efforts can cause a decline in communication quality. I have personally witnessed this happening in different groups, and I have seen others experiencing similar challenges.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Unraveling the term “Validation”: Join the Discussion at the R Validation Hub Community Meeting on February 20, 2024 

By Announcement, Blog

Dive into the world of validation at the first R Validation Hub community meeting of the year! What defines a validated R package? Is it ensuring reproducibility across systems? Prioritizing bug-free and well-maintained packages? We want to hear YOUR take!

Join the community call! (Microsoft Teams meeting) 

Meeting Details

  • When: February 20, 12:00 EST
  • Where: Virtual meeting

Why Attend?

This is your chance to share your perspective, learn from diverse viewpoints, and help shape the future of validation in the R ecosystem. Whether you’re a developer, user, or enthusiast, your insights are valuable.

Let’s Discuss

What does validation mean in the R world to you? Join us to debate, learn, and network. Mark your calendars and prepare to contribute to shaping the standards of R package validation.

Join the call here! 

R Consortium Infrastructure Steering Committee (ISC) Grant Program Accepting Proposals starting March 1st!

By Announcement, Blog

The R Consortium is excited to announce the opening of our call for proposals for the 2024 Infrastructure Steering Committee (ISC) Grant Program on March 1st, 2024. This initiative is a cornerstone of our commitment to bolstering and enhancing the R Ecosystem. We fund projects contributing to the R community’s technical and social infrastructures.

Submit your proposal here!

Enhancing the R Ecosystem: Technical and Social Infrastructures

Our past funding endeavors have spanned a variety of projects, illustrating our dedication to comprehensive ecosystem support:

  • Technical Infrastructure: Examples include R-hub, a centralized tool for R package checking, enhancements in popular packages like mapview and sf, and ongoing infrastructural development for R on Windows and macOS.
  • Social Infrastructure: Initiatives such as SatRDays, which facilitates local R conferences, and projects for data-driven tracking of R Consortium activities.

Focused Funding Areas

The ISC is particularly interested in projects that align with technical or software development that aids social infrastructure. It’s important to note that conferences, training sessions, and user groups are supported through the RUGS program, not the ISC grants.

Ideal ISC Projects

We look for proposals that:

  • Have a broad impact on the R community.
  • Possess a clear, focused scope. Larger projects should be broken down into manageable stages.
  • Represent low-to-medium risk and reward. High-risk, high-reward projects are generally not within our funding scope.

Projects unlikely to receive funding are those that:

  • Only impact a small segment of the R community.
  • Seek sponsorship for conferences, workshops, or meetups.
  • Are highly exploratory.

Important Dates

  • First Grant Cycle: Opens March 1, 2024, and closes April 1, 2024.
  • Second Grant Cycle: Opens September 1, 2024, and closes October 1, 2024.

You can learn more about submitting a proposal here. 

We eagerly await your proposals and are excited to see how your ideas will propel the R community forward. Let’s build R together!

Improving with R: Kylie Bemis Unveils Enhanced Signal Processing with Matter 2.4 Upgrade

By Blog

The R Consortium recently connected with Kylie Bemis, assistant teaching professor at the Khoury College of Computer Sciences at Northeastern University. She has a keen interest in statistical computing frameworks and techniques for analyzing intricate data, particularly focusing on datasets with complex correlation patterns or those that amalgamate data from various origins. 

Kylie created matter, an R package that offers adaptable data structures for out-of-memory computing on both dense and sparse arrays, incorporating multiple features tailored to processing nonuniform signals, including mass spectra and various other types of spectral data. Recently, Kylie upgraded matter to version 2.4. Since our May 2023 discussion, Kylie has enhanced its signal processing capabilities, focusing on analytical tools like principal component analysis and dimension reduction algorithms, which are crucial for imaging and spectral data. A grant from the R Consortium supports this project.

We talked with you about matter in May 2023. You were providing support for matter and looking to improve the handling of larger data sets and sparse non-uniform signals. matter has been updated to version 2.4. What’s new?

Last time we spoke, I had already rewritten most of the matter infrastructure in C++ for better maintainability. Since then, my focus has been on enhancing our signal processing capabilities. This summer, I’ve been adding essential signal processing functions and basic analytical tools, which are particularly useful in fields dealing with spectra or various types of signals.

I’ve incorporated fundamental techniques like principal component analysis, non-negative matrix factorization, and partial least squares. I’ve also added several dimension reduction algorithms and a range of signal processing tools for both 1D and 2D signals. This includes smoothing algorithms for images and 1D signals and warping tools applicable to both. 

These enhancements are crucial for working with imaging and spectral data and include features like distance calculation and nearest neighbor search.

My aim has been to augment matter with robust signal processing tools, particularly for sparse and non-uniform signals. This is inspired by my experience in augmented reality (AR) and my desire to integrate tools similar to MATLAB’s Signal Processing Toolbox or SciPy in Python. As someone primarily analyzing mass spectrometry imaging data, I originally had these tools in my Cardinal package. I wanted to move them to a more appropriate platform, one not specific to mass spec imaging, and reduce Cardinal’s reliance on compiled native code for easier version updates.

Additionally, I’ve been building a more robust testing infrastructure for these tools and documenting them thoroughly, including citations for the algorithms I used for peak picking and smoothing. The documentation details the implementation of various algorithms, such as guided filtering and nonlinear diffusion smoothing, and cites their sources.

By supporting non-uniform signal data, matter provides a back end for mass spectrometry imaging data. But working with large files is relevant in many domains. What are some examples?

I deal with large files and data sets across various fields. Matter can be particularly impactful in areas dealing with signal, spectral, or imaging data. One field that comes to mind is remote sensing, where the imaging tools I’ve incorporated would be highly beneficial. That’s one key application area.

Another field is biomedical imaging, especially MRI data. For instance, a data format we often use for mass spectrometry imaging was originally developed for MRI – it’s called Analyze, and there’s a more recent variant known as NIfTI. This format is also supported in Cardinal for mass spec imaging data, but it’s primarily used in MRI and fMRI data analysis. While matter doesn’t directly offer plug-and-play functionality for MRI data, with some modifications, it could certainly be adapted for importing and processing MRI data stored in these formats.

We don’t have a specific function to read NIfTI files directly, but the structure of these files is quite similar to the mass imaging files we commonly work with. They consist of a binary file organized in a particular format, with a header that functions like a C or C++ struct, indicating where different image slices are stored. Understanding and interpreting this header, which is well-documented elsewhere, is key.

So, with some effort to read and attach the header file correctly, it’s entirely feasible to build a function for reading and importing MRI data. We’ve already done something similar with the Analyze format. Someone could definitely examine our approach and develop a method to handle MRI data effectively.
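
Purely as an illustration of that approach (the field sizes and order below are invented, not the actual Analyze or NIfTI layout, so consult the format specification), reading a binary header in base R comes down to walking the file with readBin() according to the documented struct:

con <- file("scan.img", "rb")                                   # hypothetical binary image file
header_size <- readBin(con, what = "integer", n = 1, size = 4)  # e.g., a 4-byte header-size field
dims <- readBin(con, what = "integer", n = 8, size = 2)         # e.g., eight 2-byte dimension fields
close(con)
# Once the header is decoded, the pixel data that follows can be attached
# to an out-of-memory structure instead of being read fully into RAM.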

Previously, you indicated you wanted to improve R data frames and string support. You have a prototype data frame in the package already? What’s the schedule for improvements?

I’m currently evaluating how far we’ll expand certain features in our project. One of these features is supporting strings, which is already implemented. Regarding data frames, I believe there might be better solutions out there, but it’s quite simple to integrate our data with them. For instance, taking a vector or an array, whether a matter matrix or a matter vector, and inserting it into a data frame column works well, particularly with Bioconductor data frames.

I’m not entirely convinced that developing standalone, specialized data frame support in matter is necessary. It seems that other platforms, especially those like Bioconductor, are already making significant advancements in this area. For now, it seems sufficient that users can easily incorporate a matter vector or array into a data frame column. I’m hesitant to duplicate efforts or create overlapping functionalities with what’s already being done in this field.

What’s the best way for someone to try matter? How should someone get started?

Like any Bioconductor package, we offer a vignette on the Bioconductor website. This vignette provides a basic guide on how to start using our package, including creating matrices and arrays. It shows how these can serve as building blocks to construct larger matrices, arrays, and vectors. This is a straightforward way for users to begin.
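
For orientation, a first session might look like the sketch below; it assumes the matter_mat() constructor described in the vignette, so check the vignette for current usage:

# install.packages("BiocManager"); BiocManager::install("matter")
library(matter)

# Create a file-backed (out-of-memory) matrix from ordinary R data
x <- matter_mat(matrix(rnorm(10000), nrow = 100, ncol = 100))

x[1:5, 1:3]   # subsetting reads only the requested values from disk
mean(x[, 1])  # summaries can be computed on the out-of-memory data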

Regarding the applicability of our package, it really depends on the specific data needs of the user. For instance, our package provides out-of-memory matrices and arrays. If that’s the primary requirement, then our package is certainly suitable. However, other packages, both in Bioconductor, like HDF5Array, and on CRAN, such as bigmemory and ff, offer similar functionality.

The real advantage of our package becomes apparent when dealing with specific data types. If you’re working with data formats like MRI, where you have a binary file and a clear understanding of its format, our package can be very helpful. It simplifies attaching different parts of the file to an R data structure.

Moreover, if your work involves signal data, particularly non-uniform signals like those in mass spectrometry or imaging data, our package becomes even more beneficial. Over the summer, I’ve added extensive support for preprocessing, dimension reduction, and other processes that are crucial for handling these types of data. So, in these scenarios, our package can be a valuable tool.

Anything else you would like to share about matter 2.0?

I’ve spent much of the summer working on improvements to the matter package, and it’s now in a good place, particularly regarding signal processing capabilities. These enhancements are largely aligned with the needs of mass spectrometry, an area I closely focus on. As new requirements emerge in mass spectrometry, I’ll look to add relevant features to matter, particularly in signal and image processing.

However, my current priority is updating the Cardinal package to support all these recent changes in matter. Ensuring that Cardinal is fully compatible with the new functionalities in matter is my next major goal, and I’m eager to get started on this as soon as possible.

About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure.

Join Our Upcoming Webinar: Master Tidy Finance & Access Financial Data with Expert Christoph Scheuch

By Announcement, Blog

Are you passionate about financial economics and eager to learn more about empirical research methods? Then our upcoming webinar is an unmissable opportunity for you!

Discover Tidy Finance: A Revolutionary Approach in Financial Economics

Tidy Finance isn’t just a method; it’s a movement in financial economics. This webinar will introduce you to this innovative approach, which is grounded in the principles of transparency and reproducibility. With a focus on open source code in both R and Python, Tidy Finance is changing the game for students and instructors alike. You’ll learn about its applications in empirical research and how it’s reshaping the way we teach and learn in the financial domain.

What You’ll Learn

  • Introduction to Tidy Finance (10 mins): Get an overview of Tidy Finance principles and its significance in the field of financial economics.
  • Accessing and Managing Financial Data (20 mins): Dive into the practical aspects of using R to import, organize, and manage various data sets.
  • WRDS & Other Data Providers (10 mins): Explore different data providers, including open source and proprietary options.
  • Q&A Session (15 mins): Have your queries addressed directly by Christoph in an interactive Q&A session.

Who Should Attend?

This webinar is tailored for students, professionals, and anyone with an interest in financial economics, data management, and empirical research. Whether you’re just starting or looking to deepen your understanding, this webinar will provide valuable insights and practical knowledge.

Register now to secure your spot in this enlightening session. Embrace the opportunity to learn from a leading expert and elevate your understanding of Tidy Finance and financial data management.

Register here! 

📅 Mark your calendars and join us for this educational journey! 🚀

Natalia Andriychuk on RUGs, Pfizer R Center of Excellence, and Open Source Projects: Fostering R Communities Inside and Out

By Blog

The R Consortium recently talked with Natalia Andriychuk, Statistical Data Scientist at Pfizer and co-founder of the RTP R User Group (Research Triangle Park in Raleigh, North Carolina), to get details about her experience supporting the Pfizer R community and starting a local R user group. 

She started her R journey over 7 years ago, and since then, she has been passionate about open source development. She is a member of the Interactive Safety Graphics Task Force within the American Statistical Association Biopharmaceutical Safety Working Group, which is developing graphical tools for the drug safety community. 

Natalia Andriychuk at posit::conf 2023

Please share your background and involvement with the R community at Pfizer and beyond.

From 2015 to 2022, I worked at a CRO (Contract Research Organization) in various roles, where I discovered my passion for Data Science after being introduced to R, JavaScript, and D3 by my talented colleagues. I became a part of an amazing team where I learned valuable skills.

Later, when I began looking for new career opportunities, I knew that I wanted to focus on R. I sought a role that would deepen my R skills and further advance my R knowledge. This is how I came to join Pfizer in 2022, where I became part of another amazing team. I am a Statistical Data Scientist in the R Center of Excellence SWAT (Scientific Workflows and Analytic Tools) team.

Pfizer SWAT team at posit::conf 2023 (left to right: Natalia Andriychuk, Mike K Smith, Sam Parmar, James Kim)

The R Center of Excellence (CoE) supports various business lines at Pfizer. We provide technical expertise, develop training on R and associated tools, promote best practices, and build a community of R users within Pfizer. Our community currently consists of over 1,200 members. 

I will present Pfizer’s R CoE progress and initiatives during the R Consortium R Adoption Series Webinar on February 8th at 3:00 pm EST. 

My first introduction to the R community was through the posit::conf (previously known as rstudio::conf) in 2018. Attending the conference allowed me to witness the welcoming nature of the R community. Five years later, in 2023, I made it to the speakers’ list and presented at the posit::conf 2023. It was an incredible experience!

I also follow several other avenues to connect with R community members. As the name suggests, I read R Weekly weekly and attend the Data Science Hangout led by Rachael Dempsey at Posit. Every Thursday, Rachael invites a data science community leader to be a featured guest and share their unique experiences with the audience. Fortunately, I was invited as a featured guest to one of the Posit Data Science Hangouts. I shared my experience organizing and hosting an internal R at Pfizer Hangout. 

Can you share your experience of starting the RTP (Research Triangle Park) R User Group?

Nicholas Masel and I co-organize the RTP R User Group in our area. We formed the group in 2023 and have held three meetings: a meet-and-greet, a social hour, and a posit::conf 2023 watch party.

RTP R User Group Social Hour Gathering.

We hope to expand and increase attendance at our meetups in 2024. We currently have approximately 74 members who joined the online meetup group, and we look forward to meeting all of them in person moving forward. 

Can you share what the R community is like in the RTP area? 

Nicholas and I both work in the pharmaceutical industry, and thus far, our in-person user group meetings have predominantly included individuals from this field. However, we want to emphasize that our user group is open to everyone, regardless of industry or background. 

The RTP area has great potential for a thriving R User Group. We are surrounded by three major universities (the University of North Carolina at Chapel Hill, Duke University, and North Carolina State University), a growing high-technology community, and a notable concentration of life science companies. We anticipate attracting more students in the coming year, especially those studying biostatistics or statistics and using R in their coursework. We also look forward to welcoming individuals from various industries and backgrounds to foster a rich and collaborative R user community.

Please share about a project you are working on or have worked on using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

I am an open source development advocate, believing in the transformative power of collaborative innovation and knowledge sharing. I am a member of the Interactive Safety Graphics (ISG) Task Force, part of the American Statistical Association Biopharmaceutical Safety Working Group. The group comprises volunteers from the pharmaceutical industry, regulatory agencies, and academia to develop creative and innovative interactive graphical tools following the open source paradigm. Our task force is developing a collection of R packages for clinical trial safety evaluation. The {safetyGraphics} package we developed provides an easy-to-use shiny interface for creating shareable safety graphics for any clinical study. 

{safetyGraphics} supports multiple chart types, including web-based interactive graphics built with {htmlwidgets}.
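
A minimal way to try the interface, assuming the safetyGraphicsApp() launcher exported by the CRAN package (see the package documentation for details):

# install.packages("safetyGraphics")
library(safetyGraphics)

# Launch the Shiny interface; study data can then be loaded and mapped
# to the safety charts interactively within the app.
safetyGraphicsApp()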

We are preparing to share three new interactive visualizations we developed in 2023 during the upcoming ASA-DIA Safety Working Group Quarterly Scientific Webinar – Q1 2024 on January 30 (11:00 – 12:30 EST). Participating in the ISG Task Force has been an invaluable experience that allowed me to learn from talented data scientists and expand my professional network. 

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Webinar for R and Databases!  How Oracle Machine Learning for R Helps with ML and Massive Datasets

By Announcement, Blog

Are you seeking faster R data processing and enhanced machine learning capabilities for massive datasets in databases? Look no further. Join us in our upcoming webinar to discover how Oracle Machine Learning for R (OML4R) can transform your data analysis and machine learning endeavors.

Webinar Highlights:

  • Seamless In-Database Access: Engage directly with your data, eliminating the need for time-consuming extractions.
  • High-Performance Data Processing: Tackle massive datasets with unmatched ease and efficiency.
  • Integrated Machine Learning: Develop and deploy powerful models within your database, streamlining your data science workflow.
  • Simplified Production Deployment: Transition your R scripts to production effortlessly, making your projects more impactful.

We’ll also demonstrate real-world applications, including product bundling, demand forecasting, and customer churn prediction, showcasing OML4R’s potential to revolutionize your R workflows.

Don’t Miss Out!

Elevate your data science skills and streamline your processes. Register now for our webinar and unlock the full potential of in-database analytics with OML4R. Take your R to the next level.

Register here! 

Register Now and transform your approach to data analysis with Oracle Machine Learning for R.