
Bridging Ecology, Statistics, and Data Science with R for Biodiversity and Climate Change Research


Francisco Rodríguez Sánchez shared with the R Consortium how the R language has changed and improved his work as a computational ecologist, leading him to become a passionate advocate and promoter of R, especially in the context of environmental causes.


Francisco is a scientist specializing in computational ecology who works at the intersection of various disciplines, such as ecology, biogeography, statistics, and data science. His research focuses on understanding and predicting the effects of climate change on biodiversity. To achieve this, he combines field observations with computational approaches that rely on analyzing large datasets, complex statistical models, and reproducible workflows. He is also interested in developing new quantitative methods and computer tools that facilitate reproducible research.


Besides his research, he also teaches ecology, statistics, and programming, playing an important role as a founding member and coordinator of two groups: the ‘R Users Group’ in Seville and the Ecoinformatics working group of Asociación Española de Ecología Terrestre (AEET), both with the objective of promoting good statistical and programming practices among ecologists.

Among his hobbies, he enjoys being in nature, listening to flamenco music and playing the guitar, spending time with friends and family, traveling, and reading.

How was the creation of Seville R carried out? Was it well accepted by the community? What were the challenges you faced during the pandemic?

It has been a wonderful experience, full of ups and downs, as happens in many groups. Our first meeting took place at a cocktail bar here in Seville, where we were given space. Approximately 15 to 20 people attended, which was quite good considering Seville has around 700,000 inhabitants. Most of the attendees were colleagues, but we also had people who discovered the event through Twitter or through our blog. For several years, we gathered there with great interest and turnout. Among other great talks, we were lucky to have Karthik Ram from rOpenSci and Romain François from Posit (RStudio), who came together in the first year of the group’s foundation. At that time, dplyr had just been released and was growing quickly, so we had the scoop on how it worked, and it was very impressive. Karthik also gave an amazing talk on the importance of reproducible research. The news spread, and many people in Spain started to closely follow the Sevilla R Users Group.

We held monthly or quarterly meetings, attracting people from both academia and the private sector, until the pandemic hit and everything came to a halt. The pandemic times were challenging, and we had only one or two online meetings. We highly valued the opportunity to gather in person: usually, after the talks, we would go to a bar for a drink and continue discussing our projects and what we were doing. Online meetings are more distant and colder; they do not motivate us as much.

In the online mode, during the live broadcasts, there were very few people connected, and there were rarely any questions. It was not so easy to achieve that sense of community. That is why we decided to stop the online meetings since we were all busy and did not enjoy that format as much. We were on pause for a while, but fortunately, we have resumed in-person meetings. Thanks to the addition of new people with a lot of energy and initiative, we are now quite active, and everything is going very well.

What’s your level of experience with the R language?

I am not a professional programmer; I am self-taught. I dedicate a significant amount of time to learning and enjoying programming, as it is very useful for my work and helps me solve many of my problems. Over time, I have acquired intermediate to advanced knowledge through self-teaching. There are still many things that elude me, but I consider myself quite competent in R. In my community, people usually come to me when they have programming or analysis problems; they know me as “Paco, the R guy.”

Please share about a project you are currently working on or have worked on in the past using the R language.

I use R mostly for teaching or developing research projects, but I have also developed several packages. One of the most recent is called CityShadeMapper. We worked on it last year, and we are currently in the final phase. This package, available on my GitHub, is used to create shadow maps in urban environments. Basically, it works as follows: LIDAR (Light Detection and Ranging) data, produced by remote sensing technologies that accurately measure terrain heights, are downloaded from various sources; many countries offer these data for free on the internet. CityShadeMapper takes this remote sensing data and generates high-resolution shadow maps, at the one-meter level, for every hour of the day and throughout the year.

In other words, with this package, we can obtain detailed information about shadow intensity or lighting in every square meter of a city, both on building rooftops and at street level where pedestrians and cyclists are. It is important to highlight that CityShadeMapper utilizes infrastructure from other essential packages for its operation, but integrates them in a way that any user, citizen, or municipal administrator can easily generate shadow maps.
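To make the workflow concrete, here is a minimal sketch in R of the steps described above. The function names (read_lidar(), compute_shademap(), shade_at()) are hypothetical placeholders for illustration only, not the actual CityShadeMapper API; see the package’s GitHub repository for the real interface.

```r
# Hypothetical sketch of the described workflow -- these function names
# are placeholders, not the real CityShadeMapper API.
library(CityShadeMapper)

# 1. Read freely available LIDAR data measuring terrain and building heights
heights <- read_lidar("seville_lidar.laz")

# 2. Generate hourly shadow maps at 1-m resolution for a given date
shade <- compute_shademap(heights, date = "2023-07-15")

# 3. Map street-level shade at 17:00, where pedestrians and cyclists are
plot(shade_at(shade, hour = 17, level = "street"))
```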

This tool is particularly useful in the context of climate change, as it allows us to identify areas with a lack of shade, which leads to high temperatures. I live in Seville, a very hot city in summer, and every year we face the issue and debate about the lack of shade. That is why developing this package has been a goal for us for years, as we believe it can be very useful. In fact, I know that some municipalities are already using it to improve urban planning and address the problem of shade deficiency. The CityShadeMapper package excites us a lot because we consider it a fundamental tool for climate change adaptation, not only in the Mediterranean but also in other parts of the world.

What was the objective and outcome of this project?

The project is still incomplete; one piece of functionality remains at the prototype stage: the creation of shadow routes. Shadow routes work like Google Maps or another map service, but instead of taking the user along the shortest path, they guide you along a route that maximizes shade. In this way, you can walk through cooler areas during the summer and avoid the heat. Currently, we are working on finalizing this part of the project. We plan to start next month with tutorials and workshops for those interested in learning how to use it, and we will publish all the information online. There is still more outreach needed, but I am confident that it will be useful, as several people have contacted me expressing their interest in starting to use it.

Would you like to mention something interesting about the R language especially related to the industry you work in?

Currently, I hold the position of university professor and researcher in ecology, and R has become by far the dominant language in our field. It has therefore become indispensable for research, both for doctoral students and for researchers, who are learning and using it to carry out their work.

Furthermore, I have always had a great interest in teaching. Although I did not learn R through formal education and had to learn it on my own, I am passionate about teaching and about how to provide effective instruction in programming and data analysis to people who are not programmers or computer scientists, who have no prior experience with code, and who may feel some fear or apprehension towards programming. Once they start working with R, however, they usually love it and realize that they can acquire skills quickly. In both teaching and research, R has become an invaluable tool.

As mentioned earlier, I also use R in all aspects of my work, whether it be creating graphics, designing slides for my classes, or preparing presentations. In summary, R is a versatile tool that I use in all stages of my academic work.

How well accepted is R in your community?

In my experience, most people get excited and really enjoy using R once they start using it. It is true that they may encounter difficulties at first, as it is their first time dealing with code, and they may feel some frustration with things like commas or typographical errors. However, in just a few hours, they usually perceive its potential and, despite the initial challenges, they become motivated to learn and explore all the possibilities that the language offers. In my role as a teacher, I teach both programming and data analysis, and I have observed that the R language, thanks to its design and the freely available teaching resources on the internet and in books, facilitates the progress and advancement of those who are starting out.

The R language is relatively easy to grasp, especially for those taking their first steps. However, when it comes to statistics and data analysis, I believe the complexity increases and more insecurities arise, requiring greater effort. Despite that, in general, the R language is very well received. In fact, there is a growing demand from students for R to be used in many universities, as they recognize its great potential.

That being said, I believe it is important to use teaching techniques that facilitate the first steps in R, especially for non-programmers. Teaching R to a computer scientist who is not afraid to open a terminal and start programming is different from teaching it to a biologist who has never written code. Therefore, I think it is essential to approach those first steps in a pedagogical way so that they can truly perceive the potential and make progress without feeling too frustrated. In my experience, the reception of R has been excellent.

What resources do you use?

I have been using Posit (RStudio) since it came out on the market. I was very attentive to R-bloggers and also followed the updates on Twitter. When I saw that RStudio was released, I decided to give it a try and was delighted with its functionality. Previously, I used a program called Tinn-R, but I liked RStudio so much that I have used it as my main working tool ever since.

I also used Visual Studio for a few years, but currently, RStudio is my primary choice. In 2012, I started using GitHub. I discovered its usefulness on the internet, although it took me some time to learn how to use it fluently. Since then, I have been using it almost daily.

I became a big fan of RMarkdown many years ago and use it to create class slides and presentations, and to write papers, scientific articles, and theses. Recently, we have also started using Quarto.

As for Tidyverse, I also use it extensively. I do not consider myself an expert in Tidyverse, as it is a broad and constantly evolving ecosystem, making it challenging to stay fully updated. However, I am in love with dplyr and dbplyr as they offer a lot of functionality for working with databases. I believe they have great potential and work excellently. Of course, I also make use of the entire ggplot ecosystem.
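As a concrete illustration of the dplyr/dbplyr workflow mentioned above: the same dplyr verbs that work on in-memory data frames also work on database tables, with dbplyr translating them to SQL behind the scenes. The sketch below uses an in-memory SQLite database with a toy table.

```r
library(DBI)
library(dplyr)

# Toy table in an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "trees", data.frame(species = c("oak", "oak", "pine"),
                                      height  = c(12, 15, 20)))

# Ordinary dplyr verbs; dbplyr translates them to SQL for the database
tbl(con, "trees") |>
  group_by(species) |>
  summarise(mean_height = mean(height, na.rm = TRUE)) |>
  show_query()  # print the generated SQL instead of executing it

dbDisconnect(con)
```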

What would you say to anyone interested in learning R?

I would like to make a special invitation to include the teaching of programming languages to biology and environmental sciences students, as I consider it highly beneficial. Learning R provides students with numerous opportunities and opens many doors for them. Additionally, I would like to extend an invitation to everyone in Sevilla to get in touch with the Sevilla R group. We are delighted to welcome new people and share our knowledge.

Lastly, I would like to express my deep gratitude to the R Consortium for their ongoing support to local groups over the years. In Sevilla, our group has received various forms of support that have been extremely valuable in maintaining and strengthening these local communities, which I consider vital in energizing the R community as a whole. Our sincere appreciation for all the support provided.

Francisco was also interviewed in 2022; read more.


How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!

Comparing Analysis Method Implementations in Software (CAMIS)


CAMIS is a PHUSE working group in collaboration with PSI and the R Consortium. Initially, the repository contains comparisons of analysis results between R and SAS; however, the team hopes to extend it to other software and languages in the near future. Their white paper will soon be available on the website. Please help the CAMIS team build a high-quality and comprehensive repository. Learn more from the PHUSE working group below!


Are you trying to replicate results using different software/languages and struggling to find out why you can’t match the results? 

Check out the CAMIS repository! https://psiaims.github.io/CAMIS/

The CAMIS repository stores documentation detailing the reasons for observed differences when performing statistical analysis in SAS and R. The repository is housed on GitHub and will be populated through open-source community contributions.

Differences between software can be due to different defaults and available options, including the methods being used. By documenting these known differences in a repository, we aim to reduce time-consuming duplicated effort within the community, where multiple people investigate the same issues. If you find an issue not already investigated, please log an Issue on GitHub. If you have time to investigate and document the reason for the issue, then please submit a pull request with the new content in a Quarto file. Details of how to contribute can be found on the website.
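As a concrete illustration of how defaults alone can change results: R’s quantile() function implements nine algorithms, and its default (type = 7) differs from the one its own documentation labels the “SAS definition” (type = 3), so the same data can yield different percentiles in the two languages.

```r
# Same data, different percentile definitions
x <- c(1, 2, 3, 4, 10)

quantile(x, probs = 0.25, type = 7)  # R default: 2
quantile(x, probs = 0.25, type = 3)  # "SAS definition" per ?quantile: 1
```

Documenting exactly this kind of discrepancy, and which options reconcile it, is what the repository is for.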

R Consortium-Funded R Package “matter” Helps Solve Larger-Than-Memory Issues


Have you ever run into the problem of trying to write a vector or matrix that R cannot store? Dr. Kylie Bemis, faculty in the Khoury College of Computer Sciences at Northeastern University, ran into this problem during graduate school and wrote a package called matter that solves it. Development for matter was supported by a grant from the R Consortium. 

Dr. Bemis holds a B.S. degree in Statistics and Mathematics, an M.S. degree in Applied Statistics, and a Ph.D. in Statistics from Purdue University. She’s run the Boston Marathon twice and has won numerous academic awards including the John M. Chambers Statistical Software Award from the American Statistical Association.


Dr. Bemis is currently working on providing support for matter and better handling of larger data sets and sparse non-uniform signals. Sparse data in mass spectrometry means handling values that are zero or missing entirely; the goal is to better interpolate or resample that data.

In particular, by providing support for non-uniform signal data, matter will be able to provide a back end to mass spectrometry imaging data. But working with large files is applicable in a lot of domains, covering other kinds of hyperspectral data. It is a problem in digital signal processing without many solutions.

What is matter and what does it do? What problem are you solving?

matter is an R package designed for rapid prototyping of new statistical methods when working with larger-than-memory datasets on disk. It provides memory-efficient reading, writing, and manipulation of structured binary data on disk as vectors, matrices, arrays, lists, and data frames. 

Data sets might be larger than available memory. matter is completely file based and does its interpretation on the fly. matter aims to offer strict control over memory and maximum flexibility with file-based data structures, so it can be easily adapted to domain-specific file formats, including user-customized file formats.

matter has completed most of its major milestones and is currently on version 2.2. To download it and get started now, see: http://bioconductor.org/packages/matter/. The matter 2 User Guide is here: http://bioconductor.org/packages/release/bioc/vignettes/matter/inst/doc/matter-2-guide.html
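As a quick taste of the package, here is a minimal sketch based on the matter 2 user guide: creating an on-disk matrix backed by a temporary file and treating it like an ordinary R matrix. Exact signatures may vary between versions, so consult the guide linked above.

```r
library(matter)

# An on-disk matrix: the data are written to a temporary binary file,
# not held in RAM
x <- matter_mat(data = runif(1e6), nrow = 1000, ncol = 1000)

dim(x)            # behaves like an ordinary 1000 x 1000 matrix
x[1:5, 1:5]       # only the requested chunk is read from disk
head(colMeans(x)) # summaries are computed chunk-by-chunk
```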

What type of formats are you extending matter to?

Technically, we’re not extending to new formats but rather improving support for existing formats, in particular sparse matrix support. We do have sparse matrices in matter, but it’s not as easy to work with them as with dense matrices. The main idea of matter is to work with larger-than-memory matrices without loading them into memory. We have a little of that for sparse matrices, but that part is written in R, so it’s not the fastest part of matter. The dense matrix code is written in C and C++, so it’s efficient. That also means we can use the alt-rep framework that R has introduced: you can have something that looks like an ordinary R matrix or array, and in the background it’s backed by a matter matrix or array.

A few packages make use of this, and it’s something that we are working on and improving for the dense matrices. We can’t do that yet with the sparse matrices because the way alt-rep works is through the C layer of R. We have to implement alt-rep through that level and since the sparse matrix representation is in R, we can’t currently have a sparse matrix alt-rep. That’s another reason that we want to have a sparse matrix in C and C++. Not only will it become faster, but as alt-rep becomes more mature and more efficient, we can use alt-rep with sparse matrices. The main thing that comes out of that is to hopefully use matter matrices in places that you wouldn’t normally get to use them because with an alt-rep object, the function doesn’t have to be aware that this isn’t a regular R matrix or array.

Beyond that, I want to improve data frame and string support. Right now, we have a prototype data frame in the package, and that’s something people will be interested in down the line. So, looking at the best way to make an out-of-memory data frame more mature and fully featured is another thing we are working on now. Lastly, string support: we have some built-in support for strings. It’s not the most intuitive to use, and I’m not entirely sure how useful it will be for some people, but based on the way that matter is built to be flexible, I realized it is something we could do. I wanted to explore what we could do in terms of storing string and character vectors and reading them from files. One of the bigger challenges going forward is that strings are never going to be as efficient as the other types, by their nature: in a character vector, each string might be a different length. With many other formats, we assume we have an idea of what the file structure looks like; with strings that won’t be the case, so we have to start by parsing the document and figuring out what we are breaking on, whether it’s newlines or something else.

How exactly does matter allow the use of such large files?

The way that matter works with these large files is actually quite simple. Any programming language (R, C, C++, Java) has a function somewhere that allows you to read some small chunk of a file, whether text or binary. matter calls a C++ function that reads a small chunk of a file. Where the magic happens is in the bookkeeping: we assume we have a very large matrix or array, and a column of that matrix, or a section of that array, might come from different parts of a file, or from different files entirely. That’s where matter comes in. matter stores a blueprint, a dictionary of where the different columns are stored in the files, maps them to their locations in the matrix or array, and, depending on what part of the matrix we are accessing, figures out which parts of which files to read and the most efficient way to read them, so it does not do too many reads (since that’s the slowest part of the process).

matter goes and figures this out, reads it into memory, rearranges it if needed so it’s in the desired shape, and returns it to you as an ordinary in-memory matrix or array.
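The underlying primitive is available in base R itself; what matter adds is the bookkeeping and the optimized C++ read scheduling on top of it. A minimal base-R illustration of that primitive, reading a small chunk of a binary file at a known offset:

```r
# Write one million doubles (8 MB) to a scratch file
path <- tempfile()
writeBin(as.double(1:1e6), path)

# Read 100 values starting at the 501st double, without touching the rest
con <- file(path, open = "rb")
seek(con, where = 500 * 8)                      # each double is 8 bytes
chunk <- readBin(con, what = "double", n = 100)
close(con)

chunk[1]  # 501
```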

A lot of packages like these use this kind of thing. They use memory mapping, which maps directly onto some big file. Operating systems are really good and efficient at doing that sort of thing. The problem we have is that a lot of time our data comes from multiple files and different places in the file. So, we needed something more flexible. The main thing I wanted to do was figure out how to map between where the different columns were based on where they were in memory and how I wanted to interact with them as a programmer. That’s the main thing that matter does.

matter is hosted on Bioconductor, but other sectors (such as sentiment analysis) also use large datasets. How can matter be used in these applications?

Right now it is hosted on Bioconductor because we come from a Bioconductor background. My work comes from a mass spectrometry background with imaging and proteomics. So, Bioconductor was a natural home for matter and our other packages were hosted there. As a system, it made sense to host it there. There is no reason it can’t be used by other domains. 

With Bioconductor, there is no reason that it’s restricted to just Bioconductor packages. Most of the packages are based on bioinformatics itself, but they might have applications outside of bioinformatics. And matter is one of those. So, if you are working in a different area, downloading and installing isn’t that hard and not really any more difficult than downloading and installing a package from CRAN. 

So, right now we think Bioconductor is the best place to host matter, and we think people should be able to use it in most domains. The use of sparse data is more useful for mass spectrometry and proteomics, so that is why we are working on these, but we hope that it is usable across other domains.

Why did you realize you had to expand?

The main reason was the sparse matrix format: we had a prototype in R, but we needed it to be as fast as the dense one. To do this, we needed to implement it in C and C++ and make sure it’s compatible with alt-rep and some of the other Bioconductor big data packages. A lot of that came from mass spectrometry imaging. We work a lot with the imzML format, which is specific to mass spectrometry imaging. It has two files for every data set. One is an XML file, a text file that describes the experiment and the layout of the binary file (the metadata). The second is a binary file that holds all the mass spectra: the intensity arrays and mass-to-charge ratios from the dataset. There are two subformats, one sparse and one dense. Many recent experiments, at higher mass and spatial resolution, require storing data in the sparse format, and we are seeing more datasets come in using it. I have another package called Cardinal that uses matter as a backend, and we didn’t want it to be slower on these large sparse data sets. So, to deal with that, we needed to improve support for the sparse data sets (processed imzML). That was the main driver: making sure things were fast for the new data sets. Also, there is a lot of potential for sparse data and sparse matrices generally. That was important for me.

How did you get involved?

My background is in bioinformatics and my Ph.D. is in statistics. I started working on a collaboration with Graham Cooks’ lab at Purdue. They worked on an ionization technique called DESI for mass spectrometry imaging. The idea is that with mass spectrometry, we can collect a spectrum from a sample, and that spectrum tells us the abundances of different chemicals in the sample. With imaging mass spectrometry, we collect hundreds of thousands of mass spectra from across the surface of the sample. If we choose a particular chemical, we can reconstruct an image of where that chemical occurs and at what abundances across the sample. My Ph.D. was about developing statistical methods for analyzing this type of mass spectrometry imaging data, which is really interesting data: the mass spectra are very high dimensional, and there is also a spatial component, the x and y coordinates of the image. So it’s a very interesting, complicated problem.

During my Ph.D. I developed some statistical methods for working with this data, and I also developed the Cardinal package for working with and analyzing it. One of the main difficulties was importing this data and working with it in R at all; that capability didn’t exist, so it was implemented in Cardinal. Then, as the experiments got bigger, the data files got larger, and I realized I couldn’t just pull a whole file into memory: I needed a way to work with these larger-than-memory files, especially because a lot of our users, labs in the life sciences, chemists, and others, don’t necessarily have access to the cloud or clusters, or know how to work on them. So a lot of work is done on PCs.

At that point I developed matter, primarily as a back end for Cardinal, but also as something that can be used by anyone who works with larger-than-memory files, without converting to other formats. One of the main ideas of matter was that we didn’t want you to convert to another format if it’s easier to work with the original one. That’s what we did with imzML: a lot of these instruments collect data in a proprietary format, and users already have to convert it to imzML. We didn’t want to make them convert to yet another format to use matter.

What was your experience working with the R Consortium? Would you recommend applying for a grant to others?

The R Consortium was very understanding of delays due to COVID and personal issues, and I’m grateful for that. The grant process provides a structure for building a complete application. That is useful, and I would recommend it to others.

I also presented at a useR! conference early on, and at Bioconductor, and I am considering doing more. I think connecting with the community, presenting your ideas, and getting feedback is a key part of the process.

What do you do for your day job?

I am teaching faculty at the Khoury College of Computer Sciences at Northeastern University in Boston. My teaching duties include teaching R for the master’s program in data science. The classes I teach are an introductory data science course for our master’s students, Introduction to Coding for master’s students, and the capstone, a project-based course where the students develop a project, work on it as a group, and hopefully include it in their portfolio for industry interviews.

What non-project hobbies do you have?

I am a writer and a runner. I write science fiction and fantasy. I haven’t been able to write too much during COVID. I have published two short stories and am working on revising a novel. One of my short stories was nominated for an Otherwise award, which is an award for LGBTQ+ authors. That is something that I am proud of. I also have started running again, and I’m trying to get faster.


About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure. 

{fusen}: Simplifying Writing Packages for R Users


The R Consortium recently talked to Sébastien Rochette, organizer of the Meetup R Nantes, about his involvement in the R community. The group hosts a mix of physical and online events targeting the full range of R users, from beginners to experts. Sébastien works for ThinkR and has recently developed a package named {fusen}, which lets developers use a single notebook file to write a package’s documentation, code, and tests in the same place.

Sébastien Rochette, Head of Production at ThinkR and Organizer of Meetup R Nantes


Please share your background and your involvement in the RUGS group or in the R Community.

I began using R for research and spatial data modelling 15 years ago. I started in high school and continued while I was a fisheries science researcher. Now I continue to use R every day as the Head of Production at ThinkR, where we work as consultants and teachers of R. Our work entirely revolves around R, from installing infrastructure, to teaching and certifying public- and private-sector users, to building packages and Shiny applications for them. We are also certified full-service partners of Posit (RStudio).

I am also deeply invested in the open source and R community. At ThinkR, we develop many open-source R packages for our internal use, which we are happy to share with the community. We believe that building a business over an open-source project like R requires giving back to the community. I am also the organizer of Meetup R Nantes, which alternates between physical and online events. Physical events are great for networking, while online events ensure inclusivity for those unable to attend in person.

At all of our meetups, we have two presentations. One is aimed at beginners and the other is aimed at more experienced users or presenting a personal experience. This is to make sure that members of all experience levels feel included and can learn from the events.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

I am currently working on a project called {fusen}, an open-source package designed to make package building easier. While developing packages, developers usually prioritize writing code and overlook documentation. This can cause problems when reusing or sharing code with colleagues or the community.

With {fusen}, we start by opening a template notebook file (R Markdown or Quarto). This encourages writing about the code’s purpose in plain, readable text. Then, the user writes the code and an example within the same file. The example script can also be used to write a unit test. The template proposed by {fusen} encourages people to document everything they have in mind regarding their project. This reduces the risk of forgetting their goal the next time they code, and also sets the perfect basis for sharing their work: documentation and examples are already there. The {fusen} notebook template has separate script parts for code, function, and test. {fusen} then inflates the file into a full package, with the documentation and examples in a vignette, the functions in the R directory, and the unit tests in the tests directory. This approach eliminates worries about the package structure and allows developers to write everything in a single notebook file.

Figure 1. The classical code of a project is written in one or multiple notebook files, called flat files, which {fusen} inflates as a full R package. A complementary file called “dev_history.Rmd” contains steps for package level documentation, and a list of tools for collaborating and sharing the project. See {fusen} documentation for more information.
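In code, the basic round trip looks like the sketch below, following the {fusen} documentation (argument details may differ between versions):

```r
library(fusen)

# Create a package skeleton containing a flat notebook template
create_fusen(path = "my.package", template = "full")

# ... fill in dev/flat_full.Rmd: prose, function, example, and test chunks ...

# Inflate the flat file into vignette, R/, and tests/ directories
inflate(
  flat_file = "dev/flat_full.Rmd",
  vignette_name = "Get started",
  check = TRUE
)
```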

We had two goals in mind when developing {fusen}. The first is that developers no longer need to open multiple files to write their packages and can focus on their package’s purpose. The second is that it encourages developers to document and test their work while making it easier for new developers to write R packages.

What resources/techniques do/did you use? (Posit (RStudio), Github, Tidyverse, etc.)

As RStudio-certified resellers, we prefer working with Posit (previously RStudio). It is much easier for beginners to use, which makes it an excellent tool for teaching R. The {fusen} project is available on GitHub since it is an open-source package, making it easily accessible to the community.

Within the {fusen} package, I use base R code, as I had been writing R code long before the Tidyverse arrived. However, from the Tidyverse, I use the ‘tibble’ format to benefit from its consistency in data frame structures. {fusen} also relies on a package called {parsermd}, which reads and parses R Markdown files as ‘tibbles’.

Our package {attachment} also plays an important role in {fusen}, reducing the difficulty of writing R packages by helping declare all package dependencies. {attachment} reads the ‘importFrom’ declarations and all the different ways of calling a package in your code, and fills in the ‘DESCRIPTION’ file in the correct place, as Imports or Suggests, depending on where the dependency was declared in the package. This ensures beginners do not need to worry about dependencies or open the ‘DESCRIPTION’ file to write it by hand.
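In practice this is typically a single documented call, run from the package root; a minimal sketch:

```r
# Scan R/, vignettes, and tests for dependencies, then update the
# Imports and Suggests fields of DESCRIPTION accordingly
attachment::att_amend_desc(path = ".")
```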

Because sharing and collaborating is central in the philosophy of {fusen}, I also encourage developers to use ‘git’ platforms like GitLab or GitHub, and use packages like {pkgdown}, {covr} and {testthat} to share documentation and the quality of their work.

Is this an ongoing project? Please share any details or CTA for who should get involved!

Yes, it is indeed an ongoing project that we use every day in our work to build packages for our customers. Although I initially created {fusen} to assist beginners in writing packages, we have discovered that it is equally valuable for experts, as it allows you to write the entire package in a single file (or groups of files when needed). This also makes it much easier to reuse and re-factor your code: if you split your package, you only have one file to worry about.

We are constantly improving {fusen}, and my colleague Yohann is currently working on a new feature allowing us to ‘inflate’ multiple flat files at the same time. With big projects, like {golem} applications, developers are used to separating families of functions into different flat files. This upcoming functionality will allow you to inflate them all at once.

My call to action for the community is to give this package a try. Regardless of your level of expertise in R programming, {fusen} can make package writing much more accessible to you. We have been using it for testing purposes, but over time, we have realized that it is also quite convenient for experts. For instance, you can try {fusen} to add new functionalities to your already existing package, and it will work smoothly. You won’t have to change anything in the existing structure.

I would also like to encourage beginners to try {fusen}, as they are often hesitant to write packages. Writing a package can add complexity to your code due to CRAN-specific checks and other requirements, even if you do not plan to send it to CRAN. However, {fusen} can simplify this process for you. We also have another package, not on CRAN, called {checkhelper}, which will help you identify some sources of package-building problems. And if, finally, you decide to share your work on CRAN, you can follow our ‘Prepare for CRAN’ guide, which we regularly update.

Note that {fusen} has a teaching flat template included, which shows a full example of a working package that you can use to explore its possibilities. You will see that you can build your package locally and have its documentation shared as a web page on GitHub in one command.



Welcome to our newest member Parexel!


The R Consortium, a Linux Foundation project supporting the R Foundation and R community, today announced that Parexel has joined the R Consortium as a Silver Member. 

Parexel is a clinical research organization providing Phase I to IV clinical development services. Parexel uses R for a wide range of internal decision-making and regulatory interactions. They have a team of more than 21,000 global professionals collaborating with biopharmaceutical leaders and emerging innovators to deliver clinical trials worldwide.

“We are thrilled to welcome Parexel as a member of the R Consortium and connect more closely with the rest of the R Consortium community,” said Joseph Rickert, R Consortium Board Chair and Posit’s R Community Ambassador. “Parexel brings expertise in using R as a tool to perform analyses as part of clinical trials for their sponsors. We look forward to collaborating with the Parexel team to expand the use of R in drug development.”

“Upskilling our personnel in the use of R, coupled with enhancing our computing environment to incorporate R holistically with our other, more established software options, provides Parexel with additional tools in performing analyses of clinical trial data and opens up new avenues to innovative techniques,” said Michael Cartwright, Associate Biostatistics Director, Parexel.

About The R Consortium 

The R Consortium is a 501(c)6 nonprofit organization and Linux Foundation project dedicated to the support and growth of the R user community. The R Consortium provides support to the R Foundation and to the greater R Community for projects that assist R package developers, provide documentation and training, facilitate the growth of the R Community and promote the use of the R language.

About Linux Foundation 

Founded in 2000, the Linux Foundation is supported by more than 1,000 members and is the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Linux Foundation projects like Linux, Kubernetes, Node.js and more are considered critical to the development of the world’s most important infrastructure. Its development methodology leverages established best practices and addresses the needs of contributors, users and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org

Better Understanding Your Tools Choices with Online Book HTTP Testing in R


Working with internet sources can be tricky. For modern packages and workflows, a resource on testing packages that interact with online services is very useful. The R Consortium talks to Maëlle Salmon about the online book HTTP Testing in R, which she co-authored with Scott Chamberlain.

Photo of Maëlle: © Julie Noury Soyer


Maëlle Salmon is an R(esearch) Software Engineer, working part-time with rOpenSci. She regularly contracts for other organizations, often to develop or strengthen R packages. She also enjoys blogging about R, in particular about R package development (on her personal blog, the rOpenSci blog, and the R-hub blog). She is a volunteer software review editor for rOpenSci and an R-Ladies Global team member. She lives in Nancy, France.

Why is it so important to have this reference book?

For people writing packages that interact with internet sources, testing is always a challenge: for instance, you don’t want your tests to burden the online resource, and you can’t trigger an API error on demand to test how your package behaves in that case. Before this book, there was no central place for learning about the tools one can use to help with this process: vcr and webmockr, httptest, httptest2, and webfakes. Also, since we compare the tools used in HTTP testing, the book helps people make an informed choice.
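As a flavor of what these tools do, here is a minimal test in the style of the vcr documentation (configuration such as vcr_configure() and the test file layout are omitted): the first run records the real HTTP interaction to a local “cassette” file, and later runs replay it, so the test no longer hits the live API.

```r
library(testthat)
library(vcr)

test_that("the GitHub API call succeeds", {
  use_cassette("github-org", {
    # crul is one of the HTTP clients vcr can intercept
    res <- crul::HttpClient$new("https://api.github.com")$get("orgs/ropensci")
  })
  expect_equal(res$status_code, 200)
})
```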

Who should use the HTTP Testing in R book? Is it aimed at developers, or does it have applications for individual users?

It is aimed at people writing packages, but these days many people write packages, as modern tools and guidance have lowered the barrier to package development.

How responsive are you to new content and packages that may affect your book?

If someone opens a new issue in the book repository, I try to respond quickly, and I can add content based on these issues if or when I have time. I also try to follow the changes to the packages that are explained in the book. Since the first release of the book, for example, I have added a chapter about httptest2, a package for testing packages that use the more modern httr2 rather than httr.

What format and languages is the book available in?

It is an online book. There is a website, a PDF version, and an ePub version. There is no current aim to publish in other languages. I am not against the idea, but the issue is that you would have to both translate and maintain the various versions. At this point, I do not think it would be a good idea unless there were a plan to maintain the book in the language(s) into which it has been translated.

However, because of the work I’ve been doing for rOpenSci multilingual publishing project, including some funded by the R Consortium, I’m more open to the idea! For instance, we have a package for rendering a Quarto book in several languages https://docs.ropensci.org/babelquarto/ and another one for getting an automated translation via DeepL API, that humans can build upon: https://docs.ropensci.org/babeldown/ 

How did you get involved?

The first version of the online book was created by my former rOpenSci colleague Scott Chamberlain. He started the book to document his packages vcr and webmockr, and I was interested in that, as it related to my work maintaining rOpenSci dev guide. I applied for funding to dedicate more time to the project and decided it would also be interesting to cover other packages that provide the same functionalities. Scott reviewed and contributed to all the new chapters.

What do you do for your day job?

I am a part-time research software engineer for rOpenSci, which supports open and reproducible science through tooling and community building. I also take on various missions. For instance, I received funding from the R Consortium to work on developer advocacy for R-hub a few years ago, and more recently to consolidate R-Ladies Global guidance and wisdom for advisors into an online book. My work is often related to package development.

What was your experience working with the R Consortium? Would you recommend applying for a grant to others?

I’ve now received funding from the R Consortium several times, for which I am very grateful. Getting funding for an idea really helps with being able to focus on its execution!

I’d recommend getting feedback from collaborators or less close contacts, especially those who got proposals funded or those who’d be the target audience of a proposal, as they might help you clarify your ideas. I’m indebted to all those who reviewed my own proposals!



The 15th Annual R/Finance Conference 2023


The fifteenth annual R/Finance conference for applied finance using R will be held on May 19 and 20, 2023 in Chicago, IL, at the University of Illinois at Chicago. 

The conference brings together experienced R users in the field to discuss quantitative finance – covering R (or Python or Julia!), portfolio construction, statistics, and more! Some of the topics that will be covered include advanced risk tools, decentralized finance, econometrics, high-performance computing, market microstructure, portfolio management, and time series analysis. 

All will be discussed within the context of using R and other programming languages as primary tools for financial model development, portfolio construction, risk management, and trading.

The keynote speakers for this year’s conference are: 

Dr. Chandni Bhan

Dr. Chandni Bhan serves as the Global Head of Quantitative Research and Model Risk at Morgan Stanley Investment Management (MSIM).

Dr. Carlos Carvalho

Carlos Carvalho is the La Quinta Centennial Professor of Statistics at The University of Texas McCombs School of Business. His research focuses on Bayesian statistics in high-dimensional problems, with applications ranging from finance to genetics.

Dr. Peter Cotton

Peter Cotton currently leads data science for Intech Investments, where he works on the theory and practice of portfolio construction. Peter is also the creator and maintainer of various “microprediction” projects (packages, books, and a live probability exchange) and the author of a book on the topic published by MIT Press.

Dr. Sam Savage

Dr. Sam L. Savage is the author of The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty (John Wiley & Sons, 2009, 2012) and Chancification: How to Fix the Flaw of Averages (2022). He is a cofounder of the Discipline of Probability Management, Executive Director of ProbabilityManagement.org, a 501(c)(3) nonprofit devoted to the communication and calculation of uncertainty, and the inventor of the SIP (Stochastic Information Packet), a standardized data structure for the communication of uncertainty.

{riskassessment} app from the R Validation Hub voted best Shiny app at shinyConf 2023! 🎉


The {riskassessment} app, presented by Aaron Clark from the R Validation Hub, was voted best Shiny app at shinyConf 2023. Congratulations!

The 2nd Annual Shiny Conference was held March 15-17, 2023. It was fully virtual, with over 4,000 global registrants. Aaron Clark, from the R Validation Hub Executive Committee, presented the {riskassessment} app.

The app provides several key features, such as a framework to quantify risk via metrics that evaluate package development best practices, code documentation, community engagement, and sustainability. It aims to be a platform for quality assessment within organizations that operate in regulated industries, but it can be leveraged in various contexts.

More Information

The {riskassessment} app extends the functionality of riskmetric by allowing the reviewer to: 

  • Analyze riskmetric output without the need to write code in R
  • Categorize a package with an overall assessment (i.e., low, medium, or high risk) based on subjective opinion or on tabulated user consensus after evaluating the metric output
  • Download static reports with the package risk, metrics outputs, review comments, and more
  • Store assessments in a database for future viewing and historical backup
  • Manage user authentication, with admin roles for managing users and metric weighting
Here is the {riskassessment} demo app’s example dashboard, in which several packages have been uploaded and evaluated for risk.
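For comparison, the first bullet above refers to the programmatic riskmetric workflow that the app wraps; from riskmetric’s own README, it looks like this:

```r
library(dplyr)
library(riskmetric)

# Assess a package and collapse the metric results into a risk score
pkg_ref("riskmetric") %>%
  pkg_assess() %>%
  pkg_score()
```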

Note: Development of both riskassessment and riskmetric was made possible thanks to the R Validation Hub, a collaboration to support the adoption of R within a biopharmaceutical regulatory setting.

For more information, the talk is currently available on Appsilon’s YouTube channel.

Congratulations! 🎉

Teaching and Translating R Resources in Nepal


Binod Jung Bogati, the organizer of the R User Group Nepal, discussed his experience of fostering the budding R community in Nepal. He shared the details of a recent two-day beginner workshop and some useful techniques for organizing events. Besides using R in his work validating clinical trial programming, Binod is actively involved in translating R resources into Nepali.

Binod Jung Bogati, Data Analyst / Statistical Programmer at Nimble Clinical Research 


Please share about your background and involvement with the RUGS group.

I work as a Data Analyst/Statistical Programmer at a partner company, Nimble Clinical Research, which is based in the US. My work involves clinical trial programming, and we use SAS to develop CDISC-compliant SDTM/ADaM datasets, including generating tables, listings, and figures (TLFs). These datasets and documents are used for submission to regulatory bodies like the FDA in the US. We have now started using R, wherever possible, to validate these datasets (SDTM/ADaM) and TLFs, which was previously done in SAS. Additionally, our partner company has built a tool called Nimble Workspace (an R-based web data visualization and reporting tool) to generate tables, listings, and figures from clinical data, which will make our team more efficient.

Regarding my background, I have a Bachelor’s in Computer Science and IT from Tribhuvan University. I started using R during a college group project. We felt that the lack of guidance and assistance for using R was a big issue, so we (along with Diwash Shrestha) came up with the idea of starting a group where we could share resources and learn from each other. We conducted a lot of events before the pandemic.

On a personal level, I also conduct R sessions in my local language, and I have contributed to translating R resources into Nepali. For my next project, I have applied to volunteer with OAK-SDTM on package development for automating SDTM generation and generating raw synthetic data.


Can you share what the R community is like in Nepal? 

R is fairly new in Nepal, and it’s currently being used more in the public health and research sector. It is also being used in academia for teaching. Most of the members of the R community are students and a few companies like the one I work for are using it at a professional level. It is a diverse group of people, but as far as my knowledge goes, the use of R is more dominant in health and academia. 

You recently had a Meetup event on Overview of R programming, can you share more on the topic covered? Why this topic? 

We conducted a two-day event, Overview of R Programming and Getting Started with R, on the 1st and 2nd of April. It was a beginner-friendly session, and we had diverse participants from fields like engineering, health, IT, and computing.

On the first day, we showcased two of our previous projects. The first project was about vaccine updates in Nepal: we scraped PDF data published by the government and posted it to Twitter with visualizations and daily statistics.


The second project is a recent one we are working on with census data. We used census data published by the Central Bureau of Statistics of Nepal to create visualizations and dashboards with the help of R. After that, we had a Q&A session.

On the second day, we had a hands-on workshop for the participants. We used the census data to create visualizations: we gave a 5-minute demo, which participants then followed for the next five minutes, and we helped anyone who ran into issues. It was an interactive session, and we received really great feedback. We are now planning another event soon.

These events aim to help beginners learn about the tools and their use cases.

Any techniques you recommend for planning or running events (GitHub, Zoom, other)? Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

For this event, we used several tools, including Google Meet (or Microsoft Teams), Google Slides, and Posit Cloud. Google Slides proved to be an excellent tool for sharing presentation materials with attendees. We also used Google Forms for gathering feedback from participants after the event, which helps us tailor future events according to their suggestions.

GitHub is another tool we use, although only some of our participants are familiar with it. We primarily use it to publish slides and other materials.

We used the Posit Cloud to share all relevant materials during this event. It proved to be extremely helpful, particularly during hands-on workshops. In the past, we’ve faced difficulties with installing packages on participants’ systems, but with Posit Cloud, we avoided this issue entirely. For this reason, we highly recommend it for hands-on workshops.

Overall, we strive to ensure inclusivity for all participants, regardless of their ability to attend physical events. By utilizing tools like Posit Cloud and Google Forms, we can create a more inclusive experience for all attendees.



Sketch Package looks to add JavaScript to R packages


Jackson Kwok, a 2020-cycle Infrastructure Steering Committee (ISC) grant recipient, discusses the Sketch project and its 2021 release as an R package on CRAN. The Sketch package was developed with the goal of translating R to JavaScript. Building on this experience, Jackson later developed the Animate package, which requires no prior knowledge of JavaScript. Jackson discusses Sketch’s origins in JavaScript (JS) and data visualization, its potential applications, and the level of expertise required to use the package effectively.


RC: Where did you come up with the idea of a program that translates R into JS?

JK: It started with a GitHub issue by Jonathan Carroll at the rOpenSci OzUnconference 2017. The idea was to use the JS library P5 for visualization in R, and a few of us came together to work on a prototype package called “realtime”. After the hackathon, I continued to pursue the idea. As I was reading the book The Nature of Code, I realized it took only a few rewriting rules to translate JS into syntactically correct R code. After some manual experiments, I wrote the package to test the idea in the reverse direction, transpiling R to JS, and it worked! The package got its name because P5 refers to digital drawings as “sketches”; I named the folder ‘Sketch’ when I studied P5 and later used the same folder for the package. There is nothing specific to P5 that makes the conversion work. I tried other JS libraries and it also worked very well, so I refactored the package into a general-purpose R-to-JS transpiler.

Examples of a physics engine and 3D models built with Sketch

RC: How can I use Sketch in a shiny app?

JK: The documentation website has a page on how to use Sketch in a Shiny app. Once you develop your R script and transpile it into JS, you can include the resulting file in the Shiny app as usual.
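As a rough illustration of that workflow, the sketch below transpiles an R script to JavaScript and serves it from a Shiny app like any other JS asset. It is based on the package documentation (compile_r() is Sketch’s documented transpile function), but treat the details as an assumption and check the website’s Shiny page for the canonical recipe.

```r
library(shiny)
library(sketch)

# One-off step: transpile the sketch R script into a JS file under www/
compile_r(input = "visual.R", output = "www/visual.js")

ui <- fluidPage(
  tags$div(id = "canvas_div"),
  tags$script(src = "visual.js")  # include the transpiled JS as usual
)

server <- function(input, output, session) {}

shinyApp(ui, server)
```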

RC: How much JS knowledge do you need to use this package?

JK: A little bit, to get started. You don’t need to know the syntax, but you do need to know how JS works, because it works differently from R. Vector and list in R correspond roughly to Array and Object in JS; both are passed by reference rather than by value, and Array uses 0-based indexing. That is what is needed to get started; the Pitfalls section of the documentation has the complete list.

There are some tutorials in the Tutorial section that users can follow with no knowledge of JS. They will be able to pick up some JS along the way. What Sketch does is to let users call JS libraries like R packages. You still need to learn the commands in the package to use it, just like any new R package. 

RC: Can I use different R Packages with Sketch? (aka, could I use rQTL to analyze and visualize the data?)

JK: Yes. That was one of the key milestones in the second proposal of this project. Sketch uses WebSocket to establish a connection between R and the browser, so you can use any R package to perform calculations and then pass the results to the browser. The connection is live, so you can also perform operations in the browser and get the corresponding data back into R. This is very handy when you need an interaction that is hard to express in code: for instance, selecting points with a lasso tool (an irregular shape) is much easier when you can just draw it rather than figure out the coordinates.

If you choose to use an R package with Sketch, then the application will no longer be stand-alone. A targeted use case is to let users add a customizable domain-specific interface to an existing analysis performed using many R packages. If you don’t use any R package at all, then the Sketch application can be deployed as a stand-alone website.

RC: What is the progress on Sketch?

JK: The second proposal is now complete, with many new features added. Among other things, there is support for R6-style OOP, which helps you structure larger programs; WebSocket support, which lets you do bidirectional updates; and a knitr engine for R Markdown, with publishing support. On the package side, I do not foresee substantial changes from here on; later updates will mostly be fixes and patches.

One interesting direction that I have been looking into is transpiling R to AssemblyScript, which in turn can be compiled into WebAssembly. I have done some preliminary studies, and it seems this path is viable. AssemblyScript is still maturing, so this will not be worked into Sketch for now, but I will keep an eye on it.

The next thing Sketch looks to expand on is use cases. Lately, I found that it is not difficult to use Sketch to create a web-based graphics device for animated plots in R, so I have been working on that, and the results are encouraging. Another thing I discovered is that it is quite easy to go from ggplot2 to rayshader to VR, and it works well with 3D histograms on maps. An application that came as a surprise is that with Sketch, you can control Excel from R: Excel has a JavaScript API and supports the WebSocket protocol, and since Sketch transpiles R to JS and speaks WebSocket, controlling Excel from R just works!

I discover new possibilities with Sketch every now and then, and that makes me realize this work is really a good step in strengthening the R-Web integration and expanding the R application landscape. 

The ISC project was delivered in early 2021. Since then, we have released a package called Animate on CRAN. It would have been difficult to build without the Sketch package, since it uses a heavy amount of JavaScript. Animate provides animated diagrams beyond XY plots. Unlike Sketch, which needs the user to have some background knowledge of JavaScript integration, Animate lets users manipulate graphical elements on the screen using native R commands, without knowing that the animations are powered by JavaScript.

Lorenz system [code] on Animate
 Maze generation [code] [tutorial] on Animate

RC: Has Sketch expanded the scope of R visualizations for the R community?  

JK: It has definitely expanded the scope for JavaScript and R. The Sketch website features a showcase page showing some of the new possibilities, including more advanced 3D model animations and agent-based visualizations. In general, Sketch is well-suited for cases in which you need to use a JavaScript library beyond direct API calls or where the API has an imperative style. 

Looking back, two years after completing the package, I think Sketch succeeded in producing executable transpiled JavaScript and in exploring how far one can control JavaScript from R, but it fell short in abstracting away the JavaScript side of things. These shortcomings are addressed by Animate.

RC: How did you get involved?

JK: I first got into interactive visualizations back in 2016 after seeing a few great talks and demos online, e.g., Inventing on Principle and Stop Drawing Dead Fish by Bret Victor, the Parable of the Polygons by Vi Hart and Nicky Case, and ConvNetJS by Andrej Karpathy. They got me started learning JS, but I am an R user at heart. I wanted to make interactive visualizations in R; then the R OzUnconference came, and you know the rest. Looking back, it has been quite a journey picking up the skills needed to deliver this project.

I found out about the R Consortium ISC program when the research fund that I was under ran out early. I reckoned it was a great opportunity to contribute to the R community and get some financial support, so I put in an application with Kate Saunders, a good friend who loves data visualization and R programming and has expertise in spatial statistics, one of the key areas I want Sketch to develop into.

RC: What was your experience working with the R Consortium? Would you recommend applying for a grant to others?

JK: I had a great experience and highly recommend it to others looking to develop packages that can help the R community. I personally learned a lot about the grant application process, like writing a proposal, arguing for the benefits of the project, and doing deeper research on what you want to solve and how your solution can be successful.

The process was also great for picking up medium- to long-term software planning, like structuring the development with milestones, which I previously had no experience with. Overall it has been a great learning and rewarding experience! What helped me plan the proposal were the guidelines provided by the R Consortium. I was able to take bigger infrastructure problems and group them into small solvable groups of tasks. 

RC: What do you do for your day job?

JK: I am finishing up my postdoc at the St. Vincent’s Institute of Medical Research. My research group is working on a translational project called BRAIx. It’s about transforming breast cancer screening in Australia using AI. In the project, I have been using Sketch to create customized data visualization tools for data and model diagnostics.

