All Posts By

R Consortium

R Applied to Epidemiology and Infectious Disease in Glasgow

By Blog

Antonio Hegar, organizer of the R Glasgow user group (also on Twitter), shared with the R Consortium his efforts to build an R community in Glasgow. He discussed the widespread use of R in Glasgow across a broad range of fields and stressed the need to bring together R users for knowledge sharing. He also shared his work as an epidemiologist with the Ministry of Health in Belize for reporting COVID-19-related data for public policy and planning. 

Antonio Hegar, Epidemiologist | Health Data Scientist | Public Health Researcher


Please share about your background and involvement with the RUGS group.

I am a Ph.D. student at Glasgow Caledonian University here in Scotland. My Ph.D. is focused on Machine Learning applied to large health databases, in other words, big data. Before starting my Ph.D. I worked for over five years at the local Ministry of Health in Belize, which is where I am from. R has been my primary statistical tool because of all the functionalities it offers. I come from an epidemiology/public health background and use R for my analyses. 

I came over to the UK at the end of the pandemic in late 2021, and like many people, I was stuck at home due to all the restrictions. Nevertheless, I wanted to reach out and learn as much as possible about R from the respective experts across a broad range of fields and I knew that in Glasgow there is a very large R community. So I started looking around and found the R Glasgow group on Meetup and decided to give it a try. I joined, and luckily enough Andrew Baxter was organizing it at that time along with other members. I attended a few online meetings which were very productive and I have been following it up ever since. Since the beginning of this year, I have tried to be more active in the group by organizing events.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

I have worked at the Ministry of Health in Belize using R and R Markdown to generate reports for COVID-19 outbreaks. So basically I applied mathematical modeling to infectious diseases which in this case was COVID-19, and made forecasts. My mathematical model took data from the local Ministry of Health and forecasted hospitalization rates, infection rates, and mortality rates. All this information was compiled into a report which was used by local officials and the Ministry of Health for planning. 
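The interview does not say which model was used. Purely as an illustration of the kind of forecast described above, here is a minimal sketch of a basic SIR (susceptible, infected, recovered) compartmental model in base R. The function name `simulate_sir` and every parameter value are hypothetical and are not taken from the Belize reports.

```r
# Minimal discrete-time SIR sketch (illustrative only; all values are assumptions).
simulate_sir <- function(population, initial_infected, beta, gamma, days) {
  s <- numeric(days + 1)
  i <- numeric(days + 1)
  r <- numeric(days + 1)
  s[1] <- population - initial_infected
  i[1] <- initial_infected
  r[1] <- 0
  for (t in seq_len(days)) {
    # beta: daily transmission rate, gamma: daily recovery rate
    new_infections <- beta * s[t] * i[t] / population
    new_recoveries <- gamma * i[t]
    s[t + 1] <- s[t] - new_infections
    i[t + 1] <- i[t] + new_infections - new_recoveries
    r[t + 1] <- r[t] + new_recoveries
  }
  data.frame(day = 0:days, susceptible = s, infected = i, recovered = r)
}

# Hypothetical scenario: 100,000 people, 10 initial cases, 120-day horizon.
forecast <- simulate_sir(
  population = 100000, initial_infected = 10,
  beta = 0.3, gamma = 0.1, days = 120
)
peak_day <- forecast$day[which.max(forecast$infected)]
```

A real forecast would estimate the rates from surveillance data and would typically add compartments for hospitalization and death; this sketch only shows the basic mechanics.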

What resources/techniques do/did you use?

As I mentioned, I used R Markdown for generating reports. As you might be aware, at the height of the pandemic every country had a dashboard. I used Shiny to create private dashboards displaying public health data for the Ministry of Health. Besides that, I also used the tidyverse, and dplyr in particular, a lot. 

I also did data imputation because whenever you are working with real-life data, especially public health data, there are a lot of gaps. Data imputation using mice and other R packages helps you fill in those gaps. 
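As a rough sketch of what imputation with mice can look like, the example below uses `nhanes`, a small example data set that ships with the package, rather than any Ministry of Health data. The choice of predictive mean matching (`"pmm"`), five imputations, and the seed are assumptions made for illustration.

```r
library(mice)

# nhanes is an example data set bundled with mice; several columns contain NAs.
missing_before <- colSums(is.na(nhanes))

# Create five imputed data sets using predictive mean matching.
imputations <- mice(nhanes, m = 5, method = "pmm", seed = 2023, printFlag = FALSE)

# Extract the first completed data set, with the gaps filled in.
completed <- complete(imputations, action = 1)
missing_after <- sum(is.na(completed))
```

In a full analysis one would usually fit the model on every imputed data set and pool the results rather than rely on a single completed copy.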

ggplot is another tool I used a lot for this project. When you are dealing with a non-technical audience you need really easy-to-understand charts and graphs which will help them easily and quickly understand what you are trying to display. So I did a lot of data visualization with ggplot and was constantly trying to look at new techniques to make data as attractive as possible. 

Can you share what the R community is like in Glasgow and Scotland in general? 

To be very honest with you, my response would be that I cannot really speak about it. As much as I have tried engaging, and of course, I am a member of the local R group, it’s proven to be much more difficult than I anticipated to actually have a cohesive understanding of the wider R community. 

What I could say from what I have noticed from looking at university websites and looking at the profiles of different lecturers and researchers is that R is definitely used across the board in all of the major universities in Glasgow. I imagine it’s the same in many other major cities like Edinburgh in Scotland. So there are people using it for modeling, geospatial analysis, public health, epidemiology, finance, and economics. 

I have met online or seen the profiles of many people who claim to be using R. But in terms of community or the lack of community, it’s all very dispersed at the moment. Which is another point that I wanted to discuss. On the surface, it appears that there is a lot of support and a lot of enthusiasm for using R at the individual as well as research department levels. But in terms of forming a cohesive group where people will come together and share ideas, that hasn’t been as forthcoming as I would have wanted. It’s less of an R community, in my point of view, and more of a network of R users with different nodes around the place. But not necessarily a functioning complete organization. 

I would like to take this opportunity to reach out to R users living in Glasgow. R Glasgow can provide R users in Glasgow with a great opportunity to learn and grow together. I would also like to give a call for speakers. As we are hosting our events online, we would love to have speakers from around the globe join our events. 


How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!

Using R to Develop Solutions for Industrial Problems

By Blog

Vincent Guyader talked to the R Consortium about his background with the R language, describing how he acquired experience and skills in developing solutions to scientific and industrial problems. He also spoke about the founding of his company, ThinkR, and his role as organizer of R Addicts Paris.

Vincent Guyader is the President and CTO of ThinkR, a company founded in 2015 dedicated to the development of tools and solutions based on the R language to solve different scientific and industrial problems.

Vincent is a specialist in applied statistics and also has skills in system administration, which allows him to intervene on the IS (Information Systems) side for the implementation of compute servers, the deployment of Shiny applications, and the installation of Posit Workbench and Posit Connect. Vincent is the organizer of the R Addicts Paris group, the first R group of its kind in France and currently the largest with over 1,800 members. The group offers training in the use of R software for business professionals, which allows him to develop his Data Science skills. In his spare time, he enjoys cycling in Paris and practicing karate.


Please share your background and your involvement in the RUGS group or in the R Community. 

It all started during my scholarship at Agrocampus Ouest, located in Rennes, France, a distinguished institute known for its exceptional curriculum focused on statistics and agronomy, where I gained proficiency in the R programming language.

Subsequently, I ventured into the consulting field immediately after my graduation without any previous experience in running a business. At first, I brought my expertise in statistical analysis, but thanks to my rapid progress in R, I soon became an outstanding R specialist. As a result, my clients did not only approach me for my statistical competence but also for my exceptional command of the R programming language.

📸 Credits: Diane B., 2016

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

The main project I have been involved in is the golem package, which allows the creation of dynamic and interactive Shiny applications. My colleagues and I have written a book entitled Engineering Production-Grade Shiny Apps, which has become one of our most significant contributions to the community. This project is ongoing and has been continuously growing since its inception four years ago.

In our line of work, we collaborate with various industries, such as pharmacology, energy and banking, among others. Our main goal is to demonstrate the usefulness of R as a language that is not limited to trivial applications but can also be used to address complex business challenges. We offer this orientation to all sectors, regardless of their nature.

What is your level of experience with the R language?

Although R was not the programming language I initially started with, I have consistently used it since 2008 and have developed a good level of proficiency in the language. I am able to complete any required task with relative ease, and my experience goes beyond personal use.

I am confident in my ability to share my knowledge with others and help them improve their R programming skills. I believe that with my experience and knowledge of R, I can offer some value to any project or initiative. While I have developed some mastery of R, I still see myself as a learner and continuously seek to improve.

What resources or techniques do you use? 

At ThinkR, we leverage a wide range of technologies to achieve our goals. While we do not use Spark extensively for big data, we do use other powerful tools such as Docker, the Posit products, GitHub, GitLab, the tidyverse, and data.table. With our team of 13 people at ThinkR, we are very fluent in all aspects of the language. It is worth noting that not all team members rely on RStudio as their IDE, as some prefer VS Code, either locally or on a remote desktop. As a result, we have a wealth of options to carry out our work effectively.

Do you have an ongoing project? Please share any details or CTA for who should get involved!

We have an imminent appointment in Avignon, a picturesque city in the south of France. The event, known as Rencontres R, will be held in a few weeks and promises to be a great event, with an expected attendance of 250 people. The two- to three-day event is dedicated exclusively to all things R in France and marks an important moment for the French R community. Excitement is growing as the date approaches and we expect it to be a success!



The 2023 RUGS Program is awarding grants for 2023!

By Announcement, Blog

The R Consortium gives grants to help R User Groups (RUGS) around the world organize, share information, and support each other. We are currently accepting applications! 

The R Consortium RUGS Program has grown from being a relatively modest R user group support program to being the primary vehicle for the R Consortium to award Social Infrastructure Grants. Social Infrastructure includes meetings, events, conferences and any other activity intended to strengthen the social, organizational, and identity structures of the R Community. 

In 2023, there will be three categories of RUGS Program grants:

  1. User Group Grants
    1. The intent of user group grants is to facilitate person-to-person exchange of R knowledge in small group settings on a global scale.
    2. Cash grants typically vary between $200 and $1,000 and depend on group size and special needs.
    3. All groups who are accepted into the RUGS program who are not already participating in the R-Ladies Meetup.com Professional Account program are enrolled in the RUGS program which covers Meetup.com dues.
  2. Conference Grants
    1. To qualify for a RUGS program conference grant, an event must be focused on the R language and offer at least one full day of technical talks and presentations and aim to attract participants with diverse backgrounds. These grants are for conferences organized by non-profit or volunteer groups.
  3. Special Project Grants
    1. With our Special Projects Grants categories, we hope to stimulate the imagination of local R community builders. 

Full details here: https://www.r-consortium.org/all-projects/r-user-group-support-program. Please help support R language. Submit your proposals!

R User Groups

There are currently 98 R User Groups (RUGS) organizing and learning and spreading the use of R globally. These groups welcome individuals from any background, from beginner-level users to experts. 

Check out our recent blog interviews by organizers of the R User Groups across all industries:

Get Involved!

The 2023 RUGS Program is currently taking applications and will close at midnight PST on September 30, 2023. 

These grants do not include support for software development or technical projects. Grants to support the R ecosystem’s technical infrastructure are awarded and administered through the ISC Grant Program which issues a call for proposals two times each year.

Global Fishing Watch Helps Protect Critical Marine Ecosystems Through Open Data and Innovative Technology

By Blog

Interactive map and R package enable the research community with new insights about the ocean

Global Fishing Watch is an international nonprofit organization dedicated to advancing ocean governance through increased transparency of human activity at sea. To help meet that goal, they created an interactive online map that uses satellite technology and machine learning to track and visualize the activity of fishing vessels around the world.

The map allows anyone to view the movements of fishing vessels in near-real time and explore historical data on fishing activity. Global Fishing Watch uses automatic identification system data and other streams of information to build knowledge about the ocean and the activities taking place across it. These key insights help promote transparency and accountability in global fisheries, enabling authorities to identify illegal, unreported, and unregulated fishing. The map also supports efforts in understanding the environmental impact of fishing activity and helps inform sustainable fisheries management. Since its launch in 2016, Global Fishing Watch has bolstered the work being done by researchers, policymakers, and conservationists to promote sustainable fishing practices and protect marine ecosystems.

In July 2022, Global Fishing Watch released gfwr, an R package designed to enable the research community to access data from their API portal.

The gfwr package allows R users to request data from Global Fishing Watch’s application programming interfaces and receive data in a tidy format suitable for incorporation into new or existing R workflows. Users have the ability to pull data for analysis without any prior API experience. Learn more about how Global Fishing Watch empowers others to use their data. 

Check out the Global Fishing Watch map. 🗺

A screenshot of the Global Fishing Watch map shows an interactive heat map of fishing efforts. Variations in color represent different data sources. The brighter grid cells indicate areas with more activity. 

The gfwr authors and maintainers are listed below. ✍️


Tyler Clavelle (he/him)

Rocío Joo (she/her)

Nate Miller (he/him)


Adoption of R by Actuaries Community in Melbourne

By Blog

Dr. Maria Prokofieva of the Business Analytics and R Business User Group, Melbourne, recently talked to the R Consortium about the growing use of R and Data Science in Australia. The group feels strongly about contributing to the R community by building resources for R users at all levels of expertise. They are currently looking for volunteers to help put their plans into action.

Maria is a Senior Lecturer at the Victoria University Business School. She has a Bachelor’s in IT and Graduate Certificate in Accounting. Her Ph.D. is in the area of Business Information Systems (in Russia). With a background in IT and Business, she is passionate about using R in academic research and everyday life.


How did you get introduced to R? 

I work as an academic at Victoria University. My work is at the intersection of Business, Social Sciences, and Technology. I got involved with the R community in Melbourne ten years ago. My first exposure was through R-Ladies Melbourne. I am really grateful for all the support I was provided in my early years, and I am trying to give back. Being an academic, I take great joy in teaching my students and colleagues and helping other people get insights from their data. It’s an amazing feeling to watch people progress from being afraid to do anything to feeling empowered that they can do anything. 

What is the R community like in Melbourne? What is the most interesting thing about the community?

The R and Data Science community in Melbourne is very supportive. I feel that this is not just limited to the community in Melbourne. The strongest point of the R community is that if you are struggling with a problem you can reach out for help and people will try their best to help you. 

People in R and Data Science are very welcoming and you feel at home when you talk to these people. And this is what I really love about the local community here and the community overall. You feel that these are my birds of a feather and I want to be around them.  

What industries do you see using R in Melbourne?

The use of R in industries in Australia is still in the early stages. We can say that R in Data Science and business is still growing. In some areas of business, the use of R is growing rapidly. For example, the actuaries community in Australia has adopted data science. On the other hand, if we look at CPA Australia publications, they still promote the use of pivot tables in Excel, and it is considered a really advanced skill. 

So overall businesses are at different stages when it comes to adopting the use of R and data science in general. We are trying to provide all these businesses at different levels, a platform to share their struggles and learn from others. This group acts as a uniting force that can put people in dialogue and they can talk to each other for solving issues. 

How has COVID affected your ability to connect with members? What techniques (Github, zoom, other) have you used to connect and collaborate with members? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?   

I think the pandemic showed us new ways to work and some new ways to communicate with each other. As meetings have shifted online and people work from home, they have become more rational with their time. They no longer want to waste any time in commute. Many businesses only require their employees to come to work three days a week and they work from home for the rest of the week. 

So even for group events people prefer the online format. Initially, they show enthusiasm for physical events as they are sick of staying at home. But when the actual time for the event comes everyone prefers attending virtually. 

Even though Zoom works really well for presentations and workshops, nothing can replace physical events for networking and building relationships. In my opinion, as a participant and as an organizer, physical events work best for any networking activities. Just having a coffee together with a person can change your entire perception. And it allows you to build working relationships that can never happen on Zoom. And while there is the option of hosting hybrid events, the logistics can be very challenging.  

What trends do you see in R language over the next year?

I really love the transition from RStudio to Posit, and I think it will open new horizons for all R users. In our group, we try to invite all people in the data science community to our events, regardless of the programming language they use. We believe that coming from different backgrounds is really great for information sharing and learning new approaches to solving a problem. So I think the integration of R with other programming languages is one of the biggest trends that I will be looking forward to. 

What is your favorite R event you have attended?

The RStudio conference is my favorite R event as you get to know all the new trends there. You get to see old friends, make new friends, and get lots of insights about what is going to happen. What I really appreciate are the contests they have started recently. And it’s not about the prize but the intellectual challenge. You get involved to see what you can do. It’s equally exciting for me and my students. 

When is your next event? What are your plans for the group for the coming year? Please give details!

We are building an agenda for this year. We are planning to publish blog posts twice a month. We are also planning a series of webinars which we will be recording and sharing on our YouTube channel.

We are also building up our GitHub repository and have developed a plan for it. We will be making a list of all packages available for accounting and actuaries. We will also support them with educational materials and tutorials. Everyone will be able to find and use what interests them based on their level of expertise. 

We will also generate data sets from data available from businesses. A lot of times businesses are not willing to share their data. Convincing them to share their data while preserving their privacy is a valuable task. This is all part of our efforts to give back to the community, as we have all used resources made available by the community to us. I feel that it is our responsibility to contribute and pay back to the community.

📣 We are also looking for new members and have published a call for volunteers in our recent R blog post. We need passionate volunteers who can contribute to the group with their time and effort. I would also like to take this opportunity to spread the word. It will be a really good opportunity for someone who wants to contribute to the development of resources and pay back to the R and Data Science community.



Use of R in Agricultural Chemical Industry in Chile

By Blog

Daniel Fischer of the UseR Chile Group recently talked to the R Consortium about the use of R for Data Analysis in Chile. He shared that even though there is a constant comparison of R with Python, R has a very loyal user base in Chile. Daniel also uses R for his work in the agricultural chemicals industry and teaches R through his YouTube channel, blog, and University of Andes.

Daniel Fischer,
Corporate Head of Advanced Analytics and Data Engineering at Anasac.


Please share about your background and your involvement in the R Community? What is your level of experience with the R language?

I am an industrial engineer by education. Here in Chile, it’s a six-year degree that is a mix of engineering and management. I have been working for the last 12 years in data analysis-related jobs. I am currently the Corporate Head of Advanced Analytics and Data Engineering at ANASAC. I am also a Business Analytics and Optimization Professor at the Universidad de los Andes (CL). I also use R for teaching. 

Around 10 years ago, I did some research and I realized R is a really powerful and easy tool. I started looking for other people using R for their work and founded the useRChile. We were a small group and used to meet in a pub to talk about what we can do as a group. After some time, with help from MetricArts by EY, we organized monthly meetups. Most people with a scientific background, statisticians, and people interested in machine learning or networking attended our meetups. 

We used to have two presentations in each meetup, which we recorded and uploaded to our YouTube channel. We stopped having meetings a year ago because of the pandemic. But the whole organizing team is still in contact, and we will start organizing meetups again soon. Besides the group, I also teach R through my YouTube channel and blog.

What industry are you currently in? How do you use R in your work?

I currently work for an international agricultural chemicals-producing company. We use R for everything, as it is our main Extract, Transform, Load (ETL) tool. When I started working for the company, I told the team that we could use R or Python. We should not use SQL because it is hard to debug. I wanted something you could run line-by-line like R or Python. The whole team decided to use R because it is much easier. A couple of them already knew Python. But when they saw how much easier R was using dplyr and the whole tidyverse, they just said Python is too complicated. So we have all our processes in R and various small libraries in Python which are shared with R using reticulate.
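To illustrate the style of ETL step that can be run and checked line by line, here is a small dplyr sketch. The `raw_sales` data, the column names, and the CSV output standing in for a warehouse table are all hypothetical and do not describe the company's actual pipeline.

```r
library(dplyr)

# Extract: hypothetical raw records (in practice read from a database or file).
raw_sales <- data.frame(
  product = c("A", "A", "B", "B", "C"),
  region = c("north", "south", "north", "north", "south"),
  units = c(10, NA, 5, 7, 3),
  unit_price = c(2.5, 2.5, 4.0, 4.0, 10.0),
  stringsAsFactors = FALSE
)

# Transform: each step can be executed and inspected on its own.
sales_summary <- raw_sales %>%
  filter(!is.na(units)) %>%                 # drop rows with missing quantities
  mutate(revenue = units * unit_price) %>%  # derive revenue per row
  group_by(product) %>%
  summarise(
    total_units = sum(units),
    total_revenue = sum(revenue),
    .groups = "drop"
  ) %>%
  arrange(desc(total_revenue))

# Load: write the result to a temporary CSV as a stand-in for a warehouse table.
output_path <- tempfile(fileext = ".csv")
write.csv(sales_summary, output_path, row.names = FALSE)
```

Because every verb returns a data frame, a failing step can be isolated by running the pipeline up to that line, which is the debugging advantage described above.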

We built most of the data warehouse data pipeline in R and we built the web applications in Shiny. There is also ongoing work on a real-time recommendation system. We have also built a set of tools to interact with databases which I want to release as open source on CRAN. 

In Chile, Databricks is very popular and widely used in industry. Most people use it with Python because it’s cheaper, but there are many bricks built in R too. 

Process panel built in R+Shiny. The form allows users to trigger processes, which saves time for the analytics team.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

There is a lot of comparison between R and Python and many people believe that Python is much superior. But I think R is much superior for data analysis. People who already use R for data analysis stick to it, and when new users learn about the power of R they also prefer it over Python. Even Julia says it wants to be as good for analysis as R, as fast as C, and as flexible as Python.

So I think while R might not be as big as Python, it’s like comparing a Swiss pocket knife to a big knife used for chopping meat. If you want to cut the meat you are going to need a big knife. R is the big and specialized knife for data analysis. I am hopeful that as new people get introduced to R, the user base will also increase with time. 

Why do industry professionals come to your user group? What is the benefit for attending?

Our group is appealing to industry professionals as it helps them learn about interesting topics presented by speakers who are experts in their fields. Besides learning, they also get to meet people with similar interests and build their network. We also used to hang out after the meetings at a pub. 



Uniting R Professionals Across Disciplines for Data-Driven Insights in Maine

By Blog

Donald Szlosek, the MaineR User Group organizer, recently highlighted the significant growth and impact of the R community in Maine to the R Consortium. Donald emphasizes the crucial role of the R language in life sciences and showcases the group’s remarkable work in bridging the gap and empowering data professionals through collaboration and insights throughout the state of Maine.

Donald Szlosek has a great track record as a biostatistician, shaped by his collaboration with Beth Israel Deaconess Medical Center, Harvard Medical School, and his current work at IDEXX Laboratories. His job consists of diving into big data to find new actionable medical insights in collaboration with clinicians. He has worked on more than 40 clinical and real-world evidence studies ranging from pre-clinical toxicology studies to Phase III randomized clinical trials in oncology, nephrology, cardiology, dermatology, infectious diseases, parasitology, anesthesiology, and medical imaging.


Why did you personally get interested in learning R? How do you use it in your work? What do you do when you’re not programming?

If I remember correctly, my first foray into R started while I was working on interventional radiological clinical trials at Harvard Medical School. The goal was to recreate the results of a paper on bivariate analysis – a visual assessment of the safety-efficacy profile of antithrombotic drugs. In order to perform the task, I had to learn how to recreate the example code that they had using R to make the figures for our publication.

Currently, I work as a biostatistician for a medical device company called IDEXX Laboratories, and use R in every facet of my work. My job is divided into three areas. The first is everything related to clinical studies, like classic biostatistical work. I use R for statistical analysis using the tidyverse suite of packages and some specific packages developed for method comparisons like the mcr package. The second is big data real-world evidence studies using data collected from our reference laboratories and electronic medical records. This requires the use of some R packages that allow easier manipulation of large data sets like sparklyr and arrow. And lastly, the design and analysis of external validation studies for our machine learning algorithm projects. For this work, I lean heavily on some excellent groundwork on the assessment of clinical prediction models by Ewout Steyerberg’s team and Frank Harrell’s wonderful rms package.

When I am not programming, I am usually found outdoors hiking, biking, and skiing around New England! Currently, I am trying to finish the last 14 mountains on the New Hampshire 48 4,000-footers. When I am not outdoors I usually am reading (a little bit of everything) and playing the piano.

What is the R community like in Maine? What was most surprising to you about the community?

We were initially the Greater Portland R User Group when we started our activities in 2018, but when COVID happened, it completely changed the way we worked. During the pandemic, we tried to keep things going, but we only had three meetings in 2020, nine meetings in 2021, and unfortunately none in 2022. We knew we needed to think of ways to restart ourselves this year, and the first action we took was to rename the Greater Portland R User Group to the MaineR User Group to be more statewide and to be able to include all the different universities, hospitals, and research institutes across the state of Maine!

Definitely, every step has been a process, and 2023 is showing a lot of promise as we currently have speakers lined up for the entire year, which is really exciting. We are slowly becoming more and more active in trying to find the rest of our users in Maine.

The thing that surprised me the most about the Maine R community is that there are a lot of research institutes and hospitals that use R throughout the state, and that there are a lot of people who seem motivated and excited to be a part of a statewide community.

Who comes to these meetups? What industries do you see more in Maine?

Specifically in our group, the medical and life science industries are the most dominant. I would say that around 90% of the people who join our meetups are in some part of the healthcare field. Maine also has a lot of research institutes and non-profit organizations focused on preservation and conservation of our marine and forest ecology. 

How has COVID affected your ability to connect with members? What techniques (Github, zoom, other) have you used to connect and collaborate with members? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?

Actually, this is something we have been discussing recently. Since we decided to make the group statewide, and since Maine is a large state, we decided to hold hybrid events, starting on March 30th. Everything will be conducted simultaneously in person and online, using the Zoom platform. This will be our first hybrid event and there are bound to be a few hiccups, but we are hoping that over time we can get things running smoothly. The group hopes to get feedback from those attending online to make things run as efficiently as possible, so that both in-person and online attendees feel like they walked away with a great experience.

What trends do you see in R language over the next year?

This is truly an exciting time to be an R user in the clinical study space. Attending the R/Medicine and R/Pharma conferences over the last few years has shown me that there has been a big push to develop open-source software for regulatory submissions to agencies such as the United States Department of Agriculture (USDA) and the Food and Drug Administration (FDA). Packages such as admiral, validatoR, and others that are part of the pharmaverse are changing the landscape for regulatory submissions. All in all, I am very excited to see what the future holds!

What is your favorite R event that you have attended? From a small meetup to a big conference!

Picking one event in particular is difficult, but I would definitely choose the first conference I attended, the Nor’EastR Conference, which was held in 2018.

At that time, I only had three years of experience in R. I sat with three people I did not know, but who were very welcoming to me at my first R conference. Little did I know at the time that the three people, who worked at Revolution Analytics, were JD Long, who is going to be one of the keynotes for the Posit 2023 conference; David Smith, R developer advocate for Microsoft; and Kirk Metler, Chief Data Scientist at IBM. It was just amazing to be in a place where everyone was talking passionately about R.

When is your next event? What are your plans for the group for the coming year? Please give details!

As I previously mentioned, we are very excited about our first hybrid event to be held on March 30, titled “Data Mining Methods for Improving Health Outcomes”. The event will feature Dr. James Quinlan, Associate Professor of Mathematics and Data Science at the University of New England, who will provide a presentation on the use of R for frequent dataset mining, with an emphasis on the application of these techniques to improve health outcomes.


How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!

R in Finance and Accounting Sector in Korea

By Blog

Woo June Jung, Founder of the R Korea Group (also on Facebook), recently talked to the R Consortium to discuss his efforts to promote the use of R in Korea. He stressed the importance of communities and also shared the group’s experience of hosting an annual R User Conference in Korea for six consecutive years. The R Korea Group stopped hosting the conference during the pandemic but is hopeful to start again this year. He also discussed his work on two accounting projects developed in R.

Woo June has played a vital role in building the R community in Korea and now hopes to start an R-Ladies chapter in Korea.


Please share about your background and your involvement in the R Community. What is your level of experience with the R language?

I don’t remember exactly, but I think it was around 2005 when I first encountered R, through my econometrics course in graduate school. With some experience in languages like C or Basic, I was fascinated by the amazing capabilities of R and began using it for research from that time onwards. Of course, I knew how to use SAS and SPSS, but I thought R was much better in many aspects.

Around 2010, the big data boom in Korea began and R started receiving attention as well. I think this was around the time when I started using R for analytical work. At first, I analyzed survey data, and later worked on web log analysis, text mining, sentiment analysis, anomaly detection, recommendation, etc., in various fields such as finance, commerce, communication, and manufacturing.

In addition to my two roles, as an analyst and researcher using R, I also founded and manage the Korean R user community (R Korea). The reason for this is that I hoped other people would learn about this amazing programming language. I had been using a Facebook group for building the community, but I am currently working on moving it to a web-based platform. In the early days of the community, I held free and open R seminars every other week for 2-3 years to expand our user base. With these seminars, the community grew rapidly and now has over 10,000 users. From 2014 to 2019, R Korea also held an annual conference called R User Conference in Korea (RUCK), which attracted at least 400 attendees each year, with some years exceeding 700 attendees. In 2020, we invited Hadley Wickham, but due to the COVID-19 situation, the conference was canceled, and we have not been able to hold it again since.

R User Conference in Korea (RUCK) 2015

Unfortunately, during this period, people’s interest in R declined as artificial intelligence rapidly rose in popularity. However, some people who had positive memories of the RUCK conference contacted me and expressed their desire to keep the tradition going (special thanks to Kim Jin-seop and Na Young-jun). I hope we can hold RUCK again in 2023. Personally, I have made good friends through community activities. One of my closest friends is Jeon Hee-won, who developed the Korean morphological analysis package KoNLP. (Currently, the KoNLP package is not supported on CRAN, and I would like him to maintain the package again if his circumstances allow.)

The community provides a communication channel with people in related fields. Community participants can solve their own problems within the community and also help others. Sometimes, they can relieve daily fatigue with witty jokes. These activities can help develop capabilities as analysts. I think these are the reasons why people seek out communities.

I consistently receive a lot of help on the new trends in R through R-bloggers, and I am very interested in the activities of R-Ladies. I would like to launch R-Ladies Korea through a newly opened community website.

What industry are you currently in? How do you use R in your work?

Currently, I have two roles. One is a DX researcher in the finance industry, and the other is a researcher in the accounting field. In my job, I mainly deal with the valuation modeling of unlisted companies and startups, which fortunately is also one of the research topics in accounting. As an accounting researcher, I am interested in the DX field of accounting such as structured accounting information like XBRL and digital reporting. Since accounting information is provided to the market mostly in the mixed form of data types such as numbers, text, etc., text mining can also be used in accounting research. I use R for both work and research.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

Currently, I am working or researching in various fields, but artificial intelligence is the trend. For this reason, the demand for R in Korea has significantly decreased, and it is in a precarious state. This phenomenon is thought to be due to Korea’s sensitivity to trends and small market size.

The scope of (statistical) data analysis is quite broad. However, data analysis is often simply classified into data analysis (descriptive statistics from a statistical perspective) and machine learning (there is also a machine learning field in statistics). Special thanks to the authors who wrote ISLR. Many people claim that using traditional but widely used models such as regression is outdated, and that machine learning is a new approach. This is obviously wrong.

Nevertheless, as the IT industry grows and the significant achievements of artificial intelligence and machine learning have become a major trend, interest in statistics has decreased significantly. People seem to be more interested in creating IT services or products and believe that using artificial intelligence can create better services. In this environment, Python, a general-purpose language, has become the dominant language. This trend is likely to continue for some time to come. R, which is strong in what is commonly called statistics, is likely to face more difficult times in Korea. However, to obtain analysis results at the same level as those output by R, Python requires much more code to be written. R just lacks deep learning packages.

Please share about a project you are currently working on or have worked on in the past using the R language.

Currently, I am not working on any projects like developing packages. Instead, I am focusing on research papers related to accounting. Previously, I worked on projects called WARD (Wrangling Accounting-Related Data) and AIA (Accounting Information Analysis) related to accounting, but they are currently private.

WARD is a project that structures and digitizes accounting-related information. Accounting information has diverse ranges and types. Accounting information does not simply mean financial statements, and it has been said that all publicly available data can be a subject of accounting research such as privacy, ESG, etc. 

For example, one of the important research topics in accounting is firm value relevance, which requires stock price data. If it is an evaluation of the value of an unlisted company, it may be necessary to crawl funding news. Forensic techniques are also used to investigate accounting fraud, and recently, AI is also used for continuous auditing. To conduct accounting research for these purposes, various types of data such as tabular data, text data, and numerical and string data stored in databases or web must be handled, and various analysis techniques such as visualization, analysis using numerical values, and text mining can be applied. 

WARD aimed to develop a data package for accounting that provides these various types of data. However, as the data size grew larger, it became difficult to handle on GitHub, and it continues to be used only for private purposes. AIA provided functions to analyze the data provided by the accounting package, but it is also being operated privately.

What resources or techniques did you use?

I am performing all the work for my research papers in the Posit (RStudio) environment. Recently, I have been conducting research on information security, internal controls, ESG, and the valuation of unlisted companies. Of course, tidyverse is always one of the first packages I load, and I even use Python in the Posit environment. I use several other packages as well, but there are too many to list in detail. 



Announcing R/Medicine 2023!

By Announcement, Blog, Events

Join us at R/Medicine 2023! The 6th annual conference will be fully virtual from June 5 through 9 and feature two days of workshops followed by a day of demos, a Hackathon, and a poster session. The last two days will be filled with speaking sessions, presentations, and lightning talks. This year’s keynotes include Jeff Leek, Vice President and Chief Data Officer at Fred Hutch Cancer Center, and Neale Batra, President of Applied Epi.

The R/Medicine conference provides a forum for sharing R-based tools and approaches used to analyze and gain insights from health data. Conference workshops provide a way to learn and develop your R skills. Midweek demos allow you to try out new R packages and tools, and our hackathon provides an opportunity to learn how to develop new R tools. The conference talks share new packages and successes in analyzing health, laboratory, and clinical data with R and Shiny. Talks are pre-recorded, so speakers can join a vigorous ongoing discussion in the chat.

Check out some highlights from the 2022 conference on our YouTube channel!

Here’s a glimpse of the 2023 R/Medicine workshops:

  • Using REDCap and R to rapidly produce biomedical publications
  • R/Medicine 101: Introduction to Clinical Data Analysis with R


🐦 Early Bird Registration is now open until May 5th so sign up for the conference now! We are accepting proposals for 30 minute talks, 30 minute panel discussions, and 10 minute lightning talks. 

📣 Interested in sponsoring R/Medicine? Please take a look at our sponsorship brochure.

Kaizen Project for R Package Documentation

By Announcement, Blog

Anyone familiar with R will know that in addition to being a superb language for statistical computing, it comprises an ecosystem and community of extraordinary depth and commitment. Because of the tradition of providing multiple levels of documentation within contributed R packages, CRAN and Bioconductor have become great repositories of statistical knowledge. However, as with any human enterprise that is meant to persist over time, there is a need not only for ongoing maintenance but also for continuous improvement. It is in this spirit that the R Consortium’s Infrastructure Steering Committee (ISC) would like to announce the Kaizen project for R package documentation. 

Beginning with the present open call for proposals, the ISC will award grants for projects to improve the documentation of “essential” R or Bioconductor packages. By essential, we mean packages that help to form the backbone of R’s capabilities in some area of statistical or computational analysis and are important to an identifiable segment of the R Community. It is likely that a significant proportion of the packages in CRAN Task Views and on Bioconductor will meet these criteria. 

Documentation projects might include providing a missing vignette, updating the examples in the package HTML and PDF help, or writing a tutorial (not necessarily in English) to be published in a publicly accessible web page.

To apply, please follow the procedure on the Call For Proposals webpage. In your proposal, please include statements affirming that you have already contacted the package maintainers, that they would support your efforts and will incorporate your contributions where applicable, and that you will publish your work under a license that is agreeable to the package authors.

Note: Kaizen 改善 is a Japanese word that describes the concept of continuous improvement. Look here and here for discussions of kaizen philosophy and history.