The R Consortium’s R Submissions Working Group is spearheading an innovative approach to clinical trial data sharing, according to a feature in Nature. This initiative, led by Eric Nantz, a statistician at Eli Lilly in Indianapolis, Indiana, involves a pilot project with the US Food and Drug Administration (FDA). Sharing clinical trial data traditionally requires each scientist to install custom computational dashboards, a cumbersome and error-prone process.
Nantz elaborates on the benefits of using webR and WebAssembly in this context: “Using WebAssembly, [it] will minimize, from the reviewer’s perspective, many of the steps that they had to take to get the application running on their machines.” This technology not only simplifies the data sharing process but also has the potential to accelerate drug approval timelines and enhance collaborative research across various fields.
For more details, read the full article on Nature’s website (paid subscription required).
Last year, the R Consortium had a conversation with Amal Tlili, the co-organizer of the Tunis R User Group, regarding the use of R for marketing and CRM in Tunisia. This year, Amal Boukteb and Hedia Tnani spoke to the R Consortium about the use of R for bioinformatics research in Tunisia and discussed the group’s efforts to bridge the gap between academia and industry. The Tunis R User Group hosts engaging virtual events to connect R enthusiasts across the MENA region and worldwide. Their events promote the use of R and foster knowledge and skill development in data science and bioinformatics.
Amal Boukteb is a PhD student at the National Institute for Agricultural Research of Tunisia (INRAT). She holds master’s degrees in Molecular Genetics and Biostatistics. Her PhD project focuses on Orobanche foetida, a parasitic plant threatening faba bean crops. She analyzed O. foetida genetic diversity in Tunisia with RADseq and studied faba bean gene expression during attack by this parasitic plant using RNA-seq. With a passion for integrating bioinformatics and plant biology, Amal is determined to make significant contributions to the implementation of sustainable agricultural practices.
Dr. Hedia Tnani is a Staff Scientist at Lieber Institute for Brain Development (LIBD). She did a PhD in molecular biology and genetics. Her current work focuses on addressing the complex challenge of RNA degradation in postmortem brain tissue samples. She’s also the co-founder of R-Ladies Tunis and Tunis R User Group. Through the Tunis R User Group, she wants to democratize bioinformatics and data science.
in 2023. With a deep commitment to inclusivity and empowerment, they’ve dedicated themselves to breaking down barriers faced by women and individuals from low-income countries when accessing education in these cutting-edge areas. By organizing workshops tailored to these communities, they aim to provide valuable skills and knowledge and foster a more diverse and equitable future in the bioinformatics field.
Please share about your background and involvement with the RUGS group.
Amal: We are biologists, and our academic curriculum did not include any programming courses. However, with the advancement of sequencing technologies, biologists are now facing the challenge of analyzing vast amounts of genomic data. This is a significant challenge for us. For my PhD project, I was involved in RNA-Seq and RAD-Seq projects. To overcome this challenge, I attended a course on analyzing genomic data using Unix, where I met Hedia for the first time. Additionally, in the framework of my thesis project, I had the opportunity to visit the Plant Immunity Group at RIKEN Yokohama in Japan for an internship. While there, I learned a lot from the talented scientists and their exciting research in bioinformatics.
When it comes to learning R programming for biologists, no specific courses are available. The only courses that exist are general ones. So, to overcome this gap, I started learning by myself. I tried to understand the concepts by reading error messages and package tutorials and by watching YouTube tutorials.
We realized we faced the same challenge after discussing this issue with our colleagues. We have genomic data that we need to analyze, but the available courses are located outside of Tunisia, primarily in Europe. Unfortunately, we lack the financial support to attend these courses. Additionally, obtaining a student visa for a temporary stay to attend such courses is a complex process. This challenge is not unique to Tunisians; it is also a struggle for Africans and many biologists from middle and low-income countries. Our Tunis R user group aims to help others overcome this challenge and bridge this gap.
Hedia: I studied agronomy first and then pursued a master’s degree in plant breeding from Spain. Later, I completed my PhD in genetics. I did not know programming or R during my studies in Tunisia and Spain. However, when I started my postdoc at the International Rice Research Institute (IRRI) in the Philippines, especially when I first faced analyzing genomics data, I felt out of my depth. With no programming experience, learning R seemed like a mountain too steep to climb. This is a familiar story for many biologists transitioning from wet to dry labs, where code replaces beakers. Despite the daunting challenge, I persevered and taught myself R; eventually, it became an invaluable tool for my research. I’m also thankful to the great mentors I had at IRRI who helped me accelerate my learning curve. My journey wasn’t easy, but it was incredibly rewarding.
Learning bioinformatics can be challenging, especially in regions like Tunisia where resources are scarce and training abroad is so costly. Moreover, the need for bioinformatics training to solve biological problems has left many highly skilled biologists struggling to find a job in their field. Recognizing these obstacles, we formed a supportive community to facilitate collective learning and growth in bioinformatics and related fields such as data science and artificial intelligence.
Our community is a friendly, inclusive, and welcoming space for anyone passionate about bioinformatics, data science, artificial intelligence, and beyond. We’re all about growing together and learning from each other in a supportive environment. Whether you’re just starting out or have lots of experience, we encourage you to dive in, ask questions, and share your insights. We all rise by lifting others. Don’t worry about asking the “wrong” question. Every question is a chance to understand and learn something new. Come join us and be part of our journey of discovery and growth. We can’t wait to learn with you!
Can you share what the R community is like in Tunisia?
Hedia: In Tunisia, programming is mainly used in the industry, but it is not widely taught in the curriculum for biologists. This creates a gap between what is taught in the academic courses and what is required in the industry. As a result, individuals are expected to have programming skills when they enter the industry, even though they may never have had the chance to learn programming during their studies. This gap must be addressed to better prepare individuals for the job market.
Can you please update us about the group’s recent activities?
Amal: First, it is important to mention that Arabic is our native language in Tunisia. However, French is the predominant teaching language in many subjects, including biology and informatics. Despite this, we have decided to conduct our workshop in English for the Tunis user group for two main reasons. Firstly, we aim to bridge the gap between the academic skills acquired in French and English resources. Secondly, by using English as our teaching language, we can reach a broader audience of scientists who share our needs.
This choice also gives us the flexibility to choose speakers without language barriers. Our main goal is to reach a broad audience worldwide. During our workshops, we noticed participants from around the world, not just Tunisians. This is very important to us. We conducted workshops for biologists, such as the Genome-Wide Association Studies (GWAS) workshop, and we already have 5k views on our YouTube channel. It is interesting to see that people are very interested in our workshops. We also had the opportunity to collaborate with highly qualified researchers in their respective fields. Within our community, we were privileged to learn from Prof. Emerson Del Ponte, who generously shared his expertise in using R for plant disease epidemiology.
We aim not only to cover biological subjects but also those related to artificial intelligence. Recently, we conducted two successful workshops: “Building a Chatbot with OpenAI, Shiny and R” and “Bioinformatics Analysis using Chatlize and ChatGPT”. We strive to have a balance between biological and AI-related subjects to make the experience easier for our participants with the help of artificial intelligence.
What trends do you currently see in R language?
Hedia: In bioinformatics, there is a growing trend towards single-cell and spatial transcriptomics. Our latest event was an introduction to single-cell RNA-seq analysis. Additionally, packages based on OpenAI API are increasingly being used. For instance, many of those packages can be used by people who lack coding skills. This is particularly helpful because not all biologists possess coding skills, and it makes their work easier. Another trend we have noticed is using Quarto instead of R Markdown. Shiny is also gaining popularity in this field.
We have been receiving a lot of queries about bioinformatics workshops lately, particularly because they offer a diverse range of events, such as user groups. However, it can be challenging to find a specific topic. For instance, some R user groups may only hold one or two events yearly, whereas we host monthly bioinformatics events.
We value feedback from our attendees and gather suggestions from our latest events to improve our upcoming ones. Our events are designed to stay current with trends in the industry, and we often invite guest speakers to talk about relevant topics. For instance, during one of our workshops about Building a Chatbot with OpenAI, we had 200 participants whom we taught how to use R and create their chatbots. We learn from our experiences, and when we notice an interest in a particular area, we look to bring in speakers to teach on that subject.
Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?
Hedia: Our organization had a sponsorship for our Zoom account, an important tool for hosting events. One of the features that we utilize is the captions option, which allows participants from all over the world to have captions in their language and helps them follow the workshop. This is particularly helpful for those who may have difficulty understanding English. We are very grateful to Appsilon for their sponsorship of our Zoom account.
Amal: Thanks to Appsilon’s sponsorship, we have accepted more participants for our events. Previously, the number of participants was limited due to the capacity of our Zoom account. However, with this sponsorship, we can now handle up to 100 participants per event. This has made it easier for us to accept more subscribers and host successful workshops. We recently had an event with over 200 participants, which was a great success.
Hedia: We provide teaching materials for our speaker sessions on GitHub. You can find all the materials on YouTube and use them to reproduce what the speaker did during their session. We are always open to questions, especially if you encounter bugs while trying to reproduce the speaker’s work. Recently, we received an email from a participant experiencing a bug, and we had a great time figuring it out together. If you have any questions or problems, feel free to ask us for help, and we’ll do our best to assist you.
Are your events online, in-person, or hybrid?
Hedia: We are considering organizing hybrid events in the future, and we are searching for funding. We only have sponsorship for our Zoom, so we need additional funds to make this happen. We plan to organize events at multiple universities across the MENA region so important speakers can be followed in person and online. Amal, who is based in Tunis, has been in contact with many universities and academic professionals in the area. We’re currently exploring the best ways to make these hybrid events a reality, ensuring a seamless and enriching experience for everyone involved. Our goal is to make these events as engaging and accessible as possible, fostering a true sense of community.
We want to organize in-person events alongside our online ones and to provide something valuable to our community. When we meet in person, we can better understand their needs and challenges, which helps us to build and organize workshops that cater to their specific needs. Recently, Amal mentioned that some courses are not free in Tunisia, which can be a barrier for some people. Therefore, we aim to organize a free hybrid event for everyone who wants to join and learn with us. We hope to get funding for this initiative to provide this opportunity to all.
Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?
Amal: For my PhD project, I conducted research on population genomics and RNA-seq to investigate the interaction between plants and parasitic plants. Our work shed light on the genetic diversity of Orobanche foetida, a parasitic plant posing a significant threat to faba beans in Tunisia. Additionally, through RNA-seq analysis, we identified a potential target gene for developing resistant varieties of faba beans against this parasitic plant. Furthermore, I recently completed a bachelor’s degree in biostatistics, specifically focusing on Aphid diversity in Tunisia.
During my academic journey, R has been my primary tool for conducting comprehensive data analysis across all my research projects. After finishing my PhD, I aim to develop my expertise in bioinformatics further, specifically focusing on wheat genomics.
Hedia: I primarily use R as the main software for all my research projects. I am currently working on maintaining and improving a package called qsvaR. qsvaR is a tool that generates quality surrogate variable analysis for degradation correction in RNA-seq data. It contains functions that help remove the degradation effect in post-mortem brain tissue, making it a useful tool for generating basic data. We are currently working on a publication based on this work.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.
In September 2022, Mario Annau, co-organizer of the Vienna R User Group, talked to the R Consortium about the role of the local financial industry in the robust Vienna R community. Recently, the R Consortium reached out to Mario again for a detailed discussion about the use of R in the finance and pharmaceutical industries in Vienna. He shared his insights into the latest and upcoming trends in using R in these sectors and tips for organizing successful hybrid meetups with minimal overhead.
Mario is the Founder and CEO of Quantargo, a platform that provides professional training and consulting in data science focusing specifically on R programming. Before establishing Quantargo, Mario worked in market risk management, proprietary trading, and advanced analytics. He is an active member of the R community, having contributed to various R packages and given conference talks over several years.
Please share about your background and involvement with the RUGS group.
I became interested in R during my university studies in computer science. I earned a bachelor’s degree in software engineering and a master’s in intelligent systems or computational intelligence. During my master’s studies, I began using R. I also found out that Kurt Hornik was at a different university in Vienna and was also using R. Together with other R core developers, he created R with its package repository and many features. Although I am not a trained statistician, I became more involved in statistics and machine learning, which are closely related. I did my master’s thesis with Kurt Hornik.
During my second thesis, I became increasingly involved with R, which led me to explore text mining and sentiment analysis with this language. This interest ultimately kick-started my career. I am proud to say that I am one of the few people who have truly benefited from using R in my professional life. Back then, using open source software in companies was uncommon, and many people preferred Matlab and other professional tools. People would often ask me who supported R and why it was free. However, I found that having this skill set was very beneficial.
The experience of using open source languages and technologies has been really helpful for me. Over the years, I have switched jobs and worked for different employers, but the knowledge I gained has always been useful in other settings and companies. Unlike bigger corporations, I never had to worry about buying licenses or running into budget issues. For example, Matlab is expensive, so it’s always a concern for some companies. But since I’ve had experience with open source technologies, I never had to deal with those issues.
I learned about open source technologies during my university studies and discovered that they are free to use even in my professional career. This has been very helpful to me, and I am amazed at how far I have been able to go with it. Although R is not as widely used in the professional field as other languages, it has served me very well, and I am happy to be able to use it in my career. The Vienna R User Group allows me to bring it to the local R community.
Can you share what the R community is like in Vienna?
It’s evident that the industry has started accepting open source, including R. I work primarily in the financial sector and pharma, which are industries where R is widely used. R is also a strong contender, alongside Python, in these fields.
The acceptance of using R in production environments is increasing, but some companies still view it as just a tool for creating graphs and nothing else. Despite this perception, I still use R a lot in production, and it works well. However, some wrong assumptions about using R in production are still present, which makes it challenging to deploy. Since R is a dynamic language and not compiled, some issues need to be addressed. Python also faces similar issues but is seen as easier to use. Although it is possible to use R in production, it depends on the department, as IT departments tend to be less accepting of R compared to the statistics or math departments.
There are always discussions regarding the best programming language to use in various industries. However, with the emergence of cloud technology and containerization, it is possible to package everything up into a nice container, making it work well. R is an industry standard, and many risk departments in the financial industry use it to develop core models. Although people may complain and want to learn other languages like Python, R is still widely used.
What industry are you currently in? How do you use R in your work?
We apply our expertise to various industries, including finance and pharmaceuticals. As external consultants, we assist clients in setting up proper procedures and creating useful dashboards and applications. We often work with existing R code or other resources to improve their functionality and create helpful add-ons. Our focus is on maximizing existing knowledge and leveraging the existing code base. Our services often involve package creation, documentation, containerization, and dashboard framework development. We tailor our approach to suit the unique needs of each project.
Nowadays, we are developing more and more frameworks to set up departments in the industry with the right infrastructure. This includes developing R packages and connecting everything with the rest of the organization. Initially, we started by creating small models and calculations, but it gradually became more significant, and now we are mostly helping entire departments set themselves up in the right way and make the most of R and their people.
What trends do you currently see in R language and your industry? Any trends you see developing in the near future?
The trend of containerization has been around for some time now, where you package your app, REST API, or dashboard in a Docker container and deploy it in an environment such as the cloud. This trend is prevalent in both R and Python. As for upcoming trends, I am excited about the WebAssembly initiative, which makes it possible to run a Shiny app within a browser without a server. This initiative has great potential and can bring R to people who are unaware of its existence. It is exciting to see R bring data and statistics to life in various applications. I hope that this initiative can go further and reach more people.
Regarding the deployment of our Shiny projects, it is always surprising to see how complicated it can be depending on the environment. The WebAssembly approach aims to make deployment easier and accessible to a broader audience. Currently, the loading times are still too long, but this can be improved with further optimization.
I have noticed another trend in certain industries, which is the increasing demand for regulatory compliance. For example, the FDA regulates the pharmaceutical industry, while finance has its own regulatory authorities. This trend encompasses ensuring that packages and code are properly regulated and reviewed. I am seeing this trend in both the finance and pharmaceutical industries.
Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?
We have a GitHub page and a Meetup page, which is our setup. We tried to ensure that everything we present is also available, such as code and slides on GitHub, so that it’s easy for everyone to access. However, finding speakers and rooms is always a challenge. The good news is that finding rooms is getting easier than finding speakers. Some companies are always willing to host an hour-long meetup and have some online meetings. We are a group of smart people who like to talk about interesting things.
The most challenging aspect is locating speakers, particularly female speakers. I am pleased that initiatives like R-Ladies provide a dedicated space for women in this field. Generally, finding speakers is a difficult task for us, and we rely heavily on referrals from friends and acquaintances. However, as a community, we always work to overcome this obstacle.
It’s important to always have a stream of topics and speakers available for events, but this can be difficult, especially when finding female speakers. Creating a welcoming and safe community where everyone feels comfortable sharing their knowledge is essential. Organizing these events is worth the effort, as you get to meet many like-minded people in your industry, and it can help you professionally. You’ll learn a lot and get to know people in your field, which is always an advantage. So, if you’re thinking of organizing meetups, just do it, and you’ll see how far it can take you.
Before COVID, our meetings were always in person. We tried recording them, but it didn’t work out. During COVID, we had to switch to online meetings only, and afterward, we started having hybrid meetings. I don’t find online meetups very satisfying because you miss out on the networking and socializing aspects. Going out to a bar or a pub and talking with people is an important part of the experience for me. That’s why I still prefer in-person meetups. However, thanks to COVID, things have changed, and I think we can now find more ways to combine the benefits of in-person and online meetings.
You are creating a lot of content that some people miss due to various reasons. There may be people who wanted to attend but couldn’t due to certain difficulties. To address this issue, we have now set up hybrid meetings, which require more equipment, like microphones and cameras. Most of the time, I have to carry this equipment. However, it makes sense to have this kind of content and share it with your community. Sometimes, speakers may not be happy about it, but it’s rare. Most of the time, it makes sense to do it hybrid.
As for the hybrid, I can say that recording can be difficult, and it rarely works out perfectly the first time around. However, I would recommend setting up a system that reduces your overhead for platforms like YouTube Live. Strive for minimal overhead to make your life easier. Don’t make the mistake we did once when Hadley Wickham was in town and we had to do a lot of editing and cutting because the recording wasn’t perfect. Instead, aim for a setup that works seamlessly and consider doing live streams instead.
The most practical way to share content with YouTube is to stream it live. This automatically uploads the content online, eliminating the need for further actions. As a result, when users visit the platform, the content is readily available for viewing.
I have realized that delaying uploading our content to perform tasks such as editing and rearrangement is a time-consuming process that does not offer significant benefits. Therefore, we are working towards improving our setup by acquiring high-quality microphones and mobile cameras to make the process more efficient and provide our viewers with a seamless experience.
I am often amazed by the gratitude expressed by individuals around the world who get the opportunity to participate. Without the necessary infrastructure, achieving this would be impossible. However, some members of my community believe that it requires an excessive amount of work.
In light of the current global situation, people are less likely to travel or move to different cities for work or other purposes. Therefore, hybrid events are the most suitable way to improve accessibility and encourage community participation. Event organizers should consider using hybrid formats to provide a more inclusive and efficient experience for all participants.
Kirill Müller is the author of the {DBI} package, which helps to connect R and database management systems (DBMS). The connection to a DBMS is achieved through database-specific backend packages that implement this interface, such as RPostgres, RMariaDB, and RSQLite. Most users who want to access a database do not need to install DBI directly. It will be installed automatically when one of the database backends is installed. If you are new to DBI, the introductory tutorial is an excellent starting point for familiarizing yourself with essential concepts.
{DBI} supports about 30 DBMS, including:
MariaDB and MySQL, using the R packages RMariaDB and RMySQL
Postgres, using the R package RPostgres
SQLite, using the R package RSQLite
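As a rough illustration of how a backend plugs into the DBI interface, here is a minimal sketch that assumes the DBI and RSQLite packages are installed. The in-memory database, the table name, and the query are illustrative choices, not something taken from the interview.

```r
# Sketch only: assumes the DBI and RSQLite packages are installed.
library(DBI)

# Open a throwaway in-memory SQLite database through the RSQLite backend.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a table (the table name is illustrative).
dbWriteTable(con, "mtcars", mtcars)

# Run a parameterized query; the value is bound to the "?" placeholder.
heavy <- dbGetQuery(
  con,
  "SELECT COUNT(*) AS n FROM mtcars WHERE cyl = ?",
  params = list(8)
)
print(heavy)

dbDisconnect(con)
```

With another backend, the connection line would presumably be the main change (for example, passing RPostgres::Postgres() plus connection details to dbConnect()), though SQL dialects and placeholder syntax can differ between databases.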
Kirill Müller is passionate about building, applying, and teaching tools for working with data and has worked on the boundary between data and computer science for more than 25 years. He has been awarded five R Consortium projects over the past 8 years to improve database connectivity and performance in R and another one to investigate profiling of R and native code. Kirill is a core contributor to several tidyverse packages, including dplyr and tibble, and the maintainer of the duckdb R package. He holds a Ph.D. in Civil Engineering from ETH Zurich and is a founder and partner at cynkra, a Zurich-based data science consultancy with a heavy focus on R. Kirill enjoys playing badminton, cycling, and hiking.
Your latest work with the R Consortium was focused on the maintenance and support for {DBI}, the {DBItest} test suite, and the 3 backends to open source databases ({RSQLite}, {RMariaDB} and {RPostgres}). You stated that “Keeping compatibility with the evolving ecosystem (OS, databases, R itself, other packages) is vital for the long-term success of the project.” What’s the current status?
DBI and the other projects are available for use. Please try them!
I always strive for a healthy, “green” build, prioritizing clean and efficient outcomes. However, given the complexity of the projects, with their many moving parts and the continuous influx of new developments, achieving perfection at all times can be challenging. My goal is to ensure that everything we build meets a standard of functionality, even if there are moments when the builds don’t meet every expectation.
Fortunately, the generous funding provided by the R Consortium empowers us to address and rectify any issues as they emerge. This financial support is crucial, as it allows for the immediate tackling of problems, ensuring that our projects remain on the cutting edge and continue to serve the community effectively. Despite the occasional imperfections, my commitment is to promptly and efficiently solve these problems, maintaining the high quality and reliability of our builds.
Is performance an issue with big data sets? Does R have performance advantages or disadvantages compared to other languages?
R has unique strengths as a powerful interface language. R treats data frames as first-class data structures. Functions and expressions are first-class objects, enabling easy parsing, computing, and emitting code, fostering robust programming practices. Moreover, R’s “pass by value” semantics (to be more accurate, “pass by reference and copy on write”) ensure that functions do not inadvertently alter your data. This eliminates concerns over state management and makes data manipulation both predictable and secure.
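The copy-on-write behavior described above can be seen in a small base R sketch. The function and column names are made up for illustration.

```r
# Base R only; the function and column names are illustrative.
add_total <- function(df) {
  # Modifying df inside the function triggers a copy,
  # so the caller's object is left untouched.
  df$total <- df$x + df$y
  df
}

original <- data.frame(x = 1:3, y = 4:6)
result <- add_total(original)

names(original)  # still "x" "y"
names(result)    # "x" "y" "total"
```

Because the function works on its own copy, the caller has to assign the return value to keep the change.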
Despite performance considerations, R is adept at efficiently handling bulk data operations. For example, working with columnar data frames that contain anywhere from 100,000 to 3 million rows is smooth due to R’s vectorized approach, allowing for efficient column-wise addition or multiplication. However, the performance can decline if large data frames are processed cell by cell.
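To make the contrast concrete, the following base R sketch compares a column-wise addition with a cell-by-cell loop over the same data frame. The row count and column names are arbitrary assumptions, and timings will vary by machine, so none are quoted here.

```r
# Base R only; the size and column names are illustrative.
n <- 1e5
df <- data.frame(a = runif(n), b = runif(n))

# Vectorized: a single call operates on whole columns.
vec_time <- system.time(
  sum_vec <- df$a + df$b
)

# Cell by cell: every iteration indexes into the data frame.
sum_loop <- numeric(n)
loop_time <- system.time(
  for (i in seq_len(n)) {
    sum_loop[i] <- df[i, "a"] + df[i, "b"]
  }
)

# Both approaches give the same values; only the cost differs.
all.equal(sum_vec, sum_loop)
print(rbind(vectorized = vec_time, loop = loop_time))
```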
And here’s the true power of R: As an interface language, R enables the use of external, high-speed engines—be it DuckDB, Arrow, traditional databases, or data processing tools like data.table and collapse—for computation, while R itself is used to compose the commands for these engines. This integration showcases R’s versatility and efficiency by leveraging the strengths of these specialized engines for heavy lifting, thereby bypassing its performance limitations in certain areas.
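One way to picture R composing commands for an external engine is the sketch below, which assumes dplyr, dbplyr, DBI, and RSQLite are installed. SQLite stands in here for the engines named above purely to keep the example small. The table and column names come from a built-in dataset.

```r
# Sketch only: assumes dplyr, dbplyr, DBI, and RSQLite are installed.
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# A lazy reference to the table: no rows are pulled into R yet.
cars_db <- tbl(con, "mtcars")

# dplyr verbs are translated to SQL; nothing runs until collect().
query <- cars_db %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

show_query(query)         # inspect the SQL that R composed
by_cyl <- collect(query)  # the database engine does the aggregation

DBI::dbDisconnect(con)
```

R only describes the computation here. The grouping and averaging happen inside the database, and only the small summary table travels back to R.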
Therefore, the focus should not be just on performance in isolation but rather on what can be achieved through R’s integration with other systems. This flexibility allows R to extend its utility well beyond mere data processing, making it a potent tool not only for technical users but also accessible to those with less technical expertise. The intuitive syntax of R, especially with domain-specific languages like dplyr, makes it exceptionally user-friendly, resembling plain English and facilitating a smoother workflow for a wide range of tasks.
Who uses databases and R most? Are they already using R and need to connect to different types of DBMS?
As an interface language, R is remarkably versatile. It is designed to facilitate connections with a broad spectrum of tools and systems. This versatility positions R as a central hub for orchestrating a wide range of tasks, enabling users to maintain their workflow within the platform without wrestling with complex interfaces. Command-line interfaces are acceptable, offering a decent level of control and flexibility. File-based interfaces, on the other hand, can be cumbersome and inefficient, making them far from ideal for dynamic data management tasks.
The spectrum of interfaces available for database interaction varies. The most effective solution is an R package that includes bindings to a library. This setup provides a direct conduit to the necessary functionality, streamlining the interaction process. Examples are DBI backends for PostgreSQL, SQLite, MySQL, and ODBC, or the new ADBC (Arrow Database Connectivity) standard (more on that later). These backends facilitate direct, low-friction access to databases from within R.
Focusing on native solutions, I want to emphasize the potential of the dm package, which I see as offering substantial benefits beyond what the DBI backends might provide. The dm package closely integrates database concepts with R. It enables sophisticated operations, such as the management of data models with primary and foreign keys, execution of complex joins, and the transformation of data frames into a fully operational data warehouse within R. These capabilities extend and enhance the functionalities provided by dplyr, dbplyr, and DBI, offering a comprehensive toolkit for database management directly through R.
RMySQL is being phased out in favor of the new RMariaDB package. Why?
When I first got involved with the DBI Library, it was after being awarded my first contract, which focused on connecting R to SQLite, PostgreSQL, and MariaDB. It’s important to note that MariaDB and MySQL are essentially related; MariaDB is a fork of MySQL. Despite their independent evolution, they remain largely interchangeable, allowing connections to either MariaDB or MySQL databases without much trouble. This similarity can sometimes cause confusion.
In terms of technical specifics, the RMySQL package uses C to create bindings to its underlying library, while the RMariaDB package prefers C++, which I find more user-friendly for these tasks. When I took charge of the project, these packages were already separate, and I didn’t challenge that decision. Starting anew offers the benefit of not needing to maintain backward compatibility for existing RMySQL users, which has posed significant challenges, especially with the RSQLite package. That package’s widespread use across several hundred other packages meant we had to conduct reverse dependency checks, running tests from those packages against modifications in ours to ensure compatibility. This process, essentially an enhanced test suite, required considerable effort.
Reflecting on it now, I would have preferred to begin a project like RSQLite with a clean slate as well. Starting fresh allows for quicker progress without the constraints of backward compatibility or the expectation of maintaining behaviors that may no longer be relevant or supported. However, you also want to avoid alienating your user base. So, transitioning to C++ and starting from scratch was a strategic choice, and it was one that the maintainer of RMySQL and I agreed upon.
I should mention the odbc package, which isn’t included in the scope of R Consortium projects but is essential to our work. We use the odbc package extensively to connect with a variety of commercial databases and specialized database systems, some of which might not offer a straightforward method for direct interaction. In our setup, the odbc package acts as a crucial database interface, bridging the gap between the database itself and DBI.
There’s been a significant new development in this space, known as ADBC, accompanied by an R package named adbi. This initiative, spearheaded by Voltron Data, represents a modern reimagining of ODBC, specifically designed to enhance analytical workflows. While traditional databases have been geared towards both reading and writing data, ADBC focuses on optimizing data reading and analysis, recognizing that data science and data analysis workflows predominantly require efficient data reading capabilities. This focus addresses a gap left by ODBC, which wasn’t originally designed with high-speed data analysis in mind.
These developments are exciting, and I’m keen to see what the future holds for us in this evolving landscape.
What’s the difference between DBI and dbplyr?
I could describe it as a relationship between DBI and dbplyr, where dbplyr acts as a user of DBI. DBI supplies the essential functionality that enables dbplyr to operate effectively. This setup allows dbplyr to concentrate on constructing SQL queries, while other packages handle the responsibility of connecting to the individual databases.
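The division of labor can be illustrated with a small sketch. It assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed; the in-memory SQLite database and the built-in `mtcars` data set are illustrative choices only:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# DBI handles the connection and data transfer.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# With DBI alone, you write the SQL text yourself.
by_dbi <- dbGetQuery(
  con,
  "SELECT cyl, AVG(mpg) AS mean_mpg FROM mtcars GROUP BY cyl"
)

# With dbplyr, you compose the query in dplyr syntax; dbplyr generates
# the SQL and relies on DBI to send it to the database.
query <- tbl(con, "mtcars") |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

show_query(query)            # prints the generated SQL
by_dbplyr <- collect(query)  # runs the query and returns a data frame

dbDisconnect(con)
```

Both routes return one row per cylinder count; the difference is only in who writes the SQL.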
What are the biggest issues with using R and databases moving forward?
The current DBI project faces challenges that are tough to solve within its existing scope. These challenges could significantly impact many dependent components, which is why this repository has little code and serves mainly as a placeholder for ideas we think DBI is missing. However, these ideas have the potential to become significant enhancements.
One major technical challenge I’ve faced is with query cancellation. If a query runs too long, the only option is to terminate the process, which stops our entire session. This issue is closely related to the concept of asynchronous processing, where a query is sent off, and other tasks are done in parallel until the query results are ready. This would be especially useful in applications like Shiny, allowing it to handle multiple user sessions simultaneously within the same R process. Finding a solution to this problem is crucial due to the current lack of effective alternatives in our infrastructure.
While not every issue signifies a major problem, there are certainly areas that DBI does not address, some of which may be beyond its intended scope. Still, there are notable gaps that require attention.
As for our original plan, we’re taking a different direction thanks to the introduction of ADBC via the adbi package. ADBC offers a stronger foundation for achieving our goals. With ADBC, all data is funneled through the Arrow data format, which means we no longer need individual backends to translate data into R data frames separately, and at the same time other ecosystems can be integrated more easily. In addition, a substantial part of the known challenges for DBI, including query cancellation and asynchronous processing, are already solved by ADBC. Using ADBC as a bridge between databases and ecosystems reduces the complexity from a many-to-many (n × m) problem to a more manageable one-to-one (n + m) problem. This reduces duplication of effort and makes it easy to support new databases or new ecosystems. More information here.
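To make the n × m versus n + m arithmetic concrete, here is a small worked example. The counts of five databases and four ecosystems are hypothetical numbers chosen only for illustration:

```r
# Hypothetical counts, for illustration only.
n_databases  <- 5
n_ecosystems <- 4

# Without a shared format, each ecosystem needs its own adapter per database.
direct_adapters <- n_databases * n_ecosystems  # 20

# With a shared bridge, each database and each ecosystem
# connects to the common format once.
hub_adapters <- n_databases + n_ecosystems     # 9
```

Adding a sixth database would require four more adapters in the first setup but only one in the second.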
How has it been working with the R Consortium? Would you recommend applying for an ISC grant to other R developers?
This is an excellent opportunity for young professionals to secure funding for their ideas or explore areas that haven’t been fully addressed yet. R is a fantastic tool, but it constantly evolves with new technologies being introduced. I’m particularly impressed by how the consortium supports various projects, including R-Ladies and SatRdays, which promote inclusivity within the community. I was approached with the idea of applying for a project, something I might not have considered alone. This makes me curious whether there’s a list of challenges similar to what the Google Summer of Code offers, where potential mentors submit project ideas for students to work on under their guidance. I haven’t looked into this possibility for the consortium in detail yet, but the thought of it excites me. I thoroughly enjoy being part of this process and am eager to see what long-term collaborations might emerge from it.
About ISC Funded Projects
A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure.
Come celebrate 10 YEARS of the New York R Conference! This year’s conference takes place May 16th & 17th with workshops May 15th. We are taking a trip down memory lane and looking back over the past nine years. Come listen to some of the all-time greats who will be gracing our stage once again, and we’re also adding some fresh and exciting new voices to the mix!
This year’s conference features an A-list lineup of speakers who will be sharing their expertise on a wide range of topics, including data visualization, machine learning, programming, AI and more.
The R Consortium recently spoke with the organizing team of the R User Group at the University of Manchester (R.U.M.). R.U.M. aims to bring together R users of all levels to share R best practices, expertise and knowledge. The group is open to all staff and postgraduate researchers at the University of Manchester, UK.
During the discussion, the team shared details about their recent events and their plans for this year. They also discussed the latest trends in the R programming language and how they are utilizing it in their work.
Please share about your background and involvement with the RUGS group.
Martin: My name is Martin, and I joined the University of Manchester a year ago. They assigned me to manage the R user group, which was previously under Camila’s leadership. Although I am officially in charge, this is a collaborative effort between all of us who are present in this meeting, along with some others who couldn’t join. I work in Research IT and mainly use R for projects assigned to me by other people.
Anthony: My name is Anthony and I work at Research IT with Martin at the University of Manchester. I first came into contact with R when I was a student. Later, I became a helper at many of the university’s R training courses based on the Carpentries training courses. Camila, who was Martin’s predecessor, was also an R trainer, and she formed the R Users Manchester group. I volunteered to help her with the group a year ago, and it just turned a year old. After that, I continued to be a part of the group.
Lana: Hi there, my name is Lana. I am a PhD student and research assistant at the Division of Psychology and Mental Health at the University of Manchester. I have been using R for the past six years, ever since my Master’s degree. I have been a part of the group since its inception and have been running R introduction sessions for beginners within my division for a couple of years now. When I learned the group was being formed, I contacted Camila a year ago. This makes us founding members of the group.
Rowan: Hello, my name is Rowan Green. I am currently a PhD student in the Department of Earth and Environmental Sciences. For my research work, I use R extensively for simulation modeling of bacteria, analyzing lab data, and creating visualizations. The best thing about using R is that it produces much prettier visualizations than other options available to us as biologists. We have a lot of master’s and undergraduate students coming through the lab. I often give them pre-written scripts they can tweak to create their plots. It’s exciting to see them working hard to produce their plots.
Camila mentioned starting a group to share knowledge about R on a university-wide level. I found this a great opportunity to participate and learn from others’ presentations during the meetings. It has been an enriching experience so far.
Can you share what the R community is like in Manchester?
Anthony: In industries such as banking and finance, R is frequently used to create graphs to showcase econometric data in an easy-to-understand manner. The graphical capabilities of this programming language make it a popular choice in these fields. The university we’re in has access to the Financial Times, which is known for producing visually stunning graphs. Interestingly, they also use an R package called FT plot tools, which is a specialized package solely for their use. So, it’s safe to say that R has a significant presence in the banking and finance sectors.
Are your meetups virtual or in-person? What topics have you covered recently? What are your plans for the group in the future?
Martin: Our events are a mix of in-person and online meetings. There have been talks about developing packages, data visualization, automating reports, and working with tables. We usually cover topics we are confident about or know people from the university are working on. However, we are also trying to get external speakers to come and talk. It’s challenging, but we are doing our best to make it happen. We are currently accepting proposals from potential speakers.
Our book club has mostly or completely taken place online.
Lana: The book club was mostly online. During the summer book club, we were reading R for Data Science. We covered one or two chapters each time. We had the book’s second edition, and people from all over the university joined the club.
We were discussing the possibility of changing the format of Tidy Tuesdays. We received feedback that people don’t have enough time to come up with something extra creative every month. Additionally, there has been a need for more practice. Therefore, we plan to redesign Tidy Tuesdays to be more practice-oriented than creativity-oriented. We will be implementing these changes this year.
Anthony: We’ve recently had several discussions on useful packages, particularly in R. Some packages that were developed and published were custom-made. We also had presentations on the cosinor and cosinor2 packages, which are used for fitting curves, and an R update package for validating clinical prediction models.
There are two other R groups in Manchester. Our aim for this year is to establish communication with them and collaborate in a coordinated manner. (Editor’s Note: We recently talked with the Manchester R User Group.) Currently, our group solely focuses on the internal R community at the University of Manchester.
Any techniques you recommend using for planning for or during the event?
Rowan: I’m not sure if everyone would agree with me, but I think we did well in the format of our meetings. We started with brief talks (within an hour), followed by questions and discussions, which worked well.
However, the harder part has been promoting and informing people about the meetings. Sometimes, word of mouth has been more effective than emails and posters. I noticed that they were interested in attending when I encouraged my lab group, who all use R. But without any scheduled reminders and someone to encourage them, it may be difficult to get people to come.
Lana: It’s important to identify everyone’s strengths or specialties within the organizing group, as they will probably be useful in the first few events. After that, you can expand your network within the community, which is easy to do since people are easily reachable. This will allow you to find interesting topic ideas and strengths to draw from.
What trends do you currently see in R language?
Martin: I’ve noticed a growing interest in Shiny lately, as I manage a pilot server for the university and have seen an increase in users over time. There have also been several inquiries about using R within our high-performance computer cluster, which may be something we can offer to the university. This interest is not surprising, given the current hype around machine learning.
A trend that applies to multiple platforms, not just R, is the move towards reproducible research and compatibility between different programming languages. This means that R can be integrated with Python and other languages to create a documented and integrated pipeline. I’ve been experimenting with Snakemake, which works well with R, but it would be great to see more integration from the R side, perhaps through the Common Workflow Language or another similar tool.
Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?
Rowan: Recently, I wrote a preprint of a paper where we simulated the growth and mutation of bacteria using differential equations and the R programming language. We simulated various ways the bacteria could grow by adjusting the rates of reactions occurring within the cells. Running that many simulations was only feasible with high-performance computing.
After running simulations, we came up with some ideas to test in the lab. Our focus was on measuring mutation rates, and we used statistical analysis to estimate them through R. We have been striving to ensure reproducibility, and as a result, we have annotated all the data tables and R scripts with the paper.
It has been an interesting journey for me. I had to tidy up my messy scripts and think about how someone else would perceive them. I had to ensure they made sense. However, the project was fascinating as I generated hypotheses using R, tested them, and analyzed and visualized them with the same tool. R is a complete tool that can handle all aspects of the process, making it a brilliant choice.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.
Contributed by Abbie Brookes, Senior Data Analyst at Datacove
Datacove is pleased to announce the availability of tickets for the upcoming EARL (Enterprise Applications of the R Language) conference.
The EARL conference is a cross-sector event that will be held at the Grand Hotel in Brighton. This venue promises to provide attendees with a blend of Victorian elegance and modern conference facilities over three days, from the 3rd to the 5th of September 2024. The conference schedule includes high-quality workshops on the first day (3rd September) and two days of presentations and talks (4th – 5th September). An evening networking event is planned for the 4th of September at the British Airways i360 venue, offering attendees the opportunity to connect with peers and speakers in a relaxed setting.
We are offering tickets at a reduced early bird rate. Additionally, we provide discounts for government employees, NHS staff, charity workers, academics, and those making bulk purchases. For more detailed information on ticket pricing and discounts, contact Abbie Brookes at abbie.brookes@datacove.co.uk.
The EARL conference draws attendees from across the globe and from a variety of sectors. Previous participants have included notable organizations such as The Dogs Trust, BBC, Microsoft, Swiss Re, Posit, Sainsbury’s, and Bupa.
This year’s keynote speakers include:
In addition to the main conference, a selection of pre-conference workshops will be available, offering in-depth training opportunities. For more information on the conference venue, schedule, and registration, please visit our website. We invite you to join us for what promises to be an informative and engaging event for the R and Python communities.
The R/Medicine conference provides a forum for sharing R-based tools and approaches used to analyze and gain insights from health data. Conference workshops provide a way to learn and develop your R skills. Midweek demos allow you to try out new R packages and tools, and our hackathon provides an opportunity to learn how to develop new R tools. The conference talks share new packages and successes in analyzing health, laboratory, and clinical data with R and Shiny. Because talks are pre-recorded, speakers take part in a vigorous ongoing discussion in the chat.
Stephanie Hicks
Statistical Challenges in Single-Cell and Spatial Transcriptomics
Thursday, June 13
Biography
Stephanie Hicks, PhD, MA, is an Associate Professor of Biomedical Engineering and Biostatistics at Johns Hopkins University and an applied statistician working at the intersection of genomics and biomedical data science.
Gundula Bosch
Reproducibility in Medical Research
Friday, June 14
Biography
Gundula Bosch, PhD, MEd ’16, MS, is a scientist and educator leading global education reform through training programs in critical, broad, and interdisciplinary scientific thinking. She is the director of the R3 Center for Innovation and Science Education at the Johns Hopkins School of Public Health.
Lightning talks (10 min, Thursday, June 13, or Friday, June 14) Can pre-record so that you can be live on chat to answer questions
Regular talks (20 min, Thursday, June 13, or Friday, June 14) Can pre-record so that you can be live on chat to answer questions
Demos (1 hour demo of an approach or a package, Wednesday, June 12) Done live, preferably interactive
Workshops (2-3 hours per topic, Monday, June 10, or Tuesday, June 11) Usually with a website and a repo; participants can choose to code along. Usual 5-10 min breaks each hour.
Posters for poster session on Wednesday, June 12. Can include live demos of an app or a package.
Confirmed Workshops (Monday, June 10, and Tues, June 11)
Note: Final dates and times TBD. More workshops being added. Check the R/Medicine website for updates.
Causal Inference with R – Lucy D’Agostino and Malcolm Barrett
Tidying your REDCap data with REDCap Tidier – Stephan Kadauke and Will Beasley
Next Generation Shiny apps with bslib – Garrick Aden-Buie
Are you ready to delve into the world of finance through the lens of R? Look no further than the R Finance Conference (May 18, 2024, University of Illinois Chicago) – your gateway to cutting-edge insights, advanced methodologies, and unparalleled networking opportunities. As an enthusiast of data-driven finance or an R programming aficionado, this single-track, one-day event promises to be an enlightening experience. R Finance is the must-attend event in the realm of financial technology.
Founded in 2009, the R Finance Conference quickly evolved into the premier event in the financial technology landscape. It originated from the shared enthusiasm of a loosely connected group of R users in the Chicago financial center who were seeking to improve financial analysis. From its humble beginnings to its current stature, it remains committed to fostering knowledge exchange and driving advancements in R-based finance.
Why Choose a Single-Track Event?
One distinctive feature of the R Finance Conference is its single-track format. Unlike multi-track conferences, where attendees must choose between concurrent sessions, a single-track event offers a shared group experience. Single track offers:
Focused Learning:
Attendees can fully immerse themselves in each session without the distraction of conflicting schedules. This focused approach enhances learning and ensures that participants extract maximum value from every presentation.
Enhanced Networking:
The single-track format encourages interaction among attendees as everyone gathers in the same sessions. This facilitates meaningful discussions, idea exchange, and networking opportunities with like-minded professionals, fostering a sense of community and collaboration.
Comprehensive Coverage:
By following a single track, attendees gain exposure to a diverse range of topics and perspectives within the realm of R-based finance. From quantitative modeling and algorithmic trading to risk management and data visualization, each session contributes to a holistic understanding of the subject matter.
Key Highlights of R Finance Conference
Expert Speakers: Renowned experts and thought leaders in finance and data science share their insights, best practices, and real-world experiences. In 2022, speakers included Matthew Dixon, Associate Professor, Department of Applied Math and Affiliate Professor, Stuart School of Business, Illinois Tech; Veronika Rockova, Professor of Econometrics and Statistics, University of Chicago, Booth School of Business and James S. Kemper Foundation Faculty Scholar; and Thomas P. Harte, Head of Fixed Income & Liquidity Strats at Morgan Stanley.
Interactive Workshops: Hands-on workshops provide attendees with practical skills and techniques to implement R-based solutions in their professional endeavors.
Networking Opportunities: Engage with industry peers, establish valuable connections, and exchange ideas during networking breaks, social events, and interactive sessions.
Exhibition Showcase: Explore cutting-edge technologies, tools, and services offered by exhibitors and sponsors, offering valuable insights into the latest innovations in financial technology.
Join Us at R Finance 2024
Don’t miss out on the opportunity to elevate your finance skills and network with industry leaders at the R Finance Conference 2024. Reserve your spot today and embark on a transformative journey in R-based finance.
SatRDays London 2024 is set to ignite the data science community with a vibrant lineup of speakers and a rich array of topics ranging from survival analysis to geospatial data. This inclusive event, designed for R enthusiasts at all levels, emphasizes networking and collaboration amidst the backdrop of King’s College London’s iconic Bush House. Keynote speakers like Andrie de Vries, Nicola Rennie, and Matt Thomas bring unparalleled expertise, offering attendees a unique opportunity to deepen their knowledge and connect with peers. As a hub of innovation and learning, SatRDays London promises to be a cornerstone event for anyone passionate about R and its applications in the real world.
How does this year’s satRDays in London compare to last year’s event? What’s new and different?
After a successful SatRdays London in 2023, we are keeping the format the same, but with a whole new lineup of speakers! This year we’re excited to welcome:
Andrie de Vries – Posit
Hannah Frick – Posit
Charlie Gao – Hibiki AI Limited
Michael Hogers – NPL Markets Ltd
Matthew Lam & Matthew Law – Mott MacDonald
Myles Mitchell – Jumping Rivers
Nicola Rennie – Lancaster University
Matt Thomas – British Red Cross
Talk topics for the day include survival analysis, geospatial data, styling PDFs with Quarto and using R to teach R, as well as a range of other exciting themes! The talks are aimed at a varied audience, from aspiring data scientists to experienced practitioners.
Take a look at the full list on the conference website for more information.
Who should attend? And what types of networking and collaboration opportunities should attendees expect?
Anyone and everyone with an interest in R! The SatRdays conferences are designed to be low cost, to allow as many to attend as possible, and they’re on a SatRday, so you don’t have to worry about getting time off work if your job isn’t necessarily R focussed.
Networking is the main focus of the event. We have multiple coffee breaks to give attendees the opportunity to interact with fellow R enthusiasts. If you’re brand new to this kind of event, and are not sure where to start, don’t worry! Find one of the attendees from JR, and we’ll be happy to help you make introductions!
Can you share some insights into the keynote speakers, their areas of expertise, and how they will contribute to the overall experience at SatRDays?
At this year’s event, we have talks from three invited speakers – Andrie de Vries of Posit, Nicola Rennie from the University of Lancaster and Matt Thomas of the British Red Cross.
Andrie is Director of Product Strategy at Posit (formerly RStudio) where he works on the Posit commercial products. He started using R in 2009 for market research statistics, and later joined Revolution Analytics and then Microsoft, where he helped customers implement advanced analytics and machine learning workflows.
Nicola is a lecturer in health data science based at the Centre for Health Informatics, Computing, and Statistics at Lancaster University. She is particularly interested in creating interactive, reproducible teaching materials and communicating data through effective visualisation. Nicola also collaborates with the NHS on analytical and software engineering projects, maintains several R packages, and organises R-Ladies Lancaster.
Matt is Head of Strategic Insight & Foresight at the British Red Cross. His team conducts research and analysis to understand where, how and who might be vulnerable to various emergencies and crises within the UK.
Could you elaborate on the types of sessions and workshops available and how they cater to different interests and skill levels within the R community?
The day will consist of eight 25-ish minute talks, plus Q&A, from a variety of speakers across various sectors.
The talks are on a wide range of topics. For example, last year we had speakers talking about everything from using R for mapping air quality, to EDI and sustainability in the R project, and why R is good for data journalism. If you want to take a look at what you can expect, we have a playlist of last year’s talk recordings available on our YouTube channel.
With the event being hosted at King’s College London, how does the venue enhance the experience for attendees, both in terms of facilities and location?
We’re very excited to be partnering with CUSP London again this year, who provide the amazing Bush House venue at King’s College London. The venue is a beautiful listed building, right in the heart of London, only a few minutes’ walk from Covent Garden.
Being in the center of London means easy access to multiple public transport links, both for national and international attendees!
The venue facilities and supporting technology provide a great space for sharing insights and networking.