Peter Solymos of the Edmonton R User Group (Yegrug) recently spoke to the R Consortium about the growing acceptance of R in the industry in Edmonton. He also discussed his professional journey with R and the challenges of organizing an R User Group in the post-pandemic era. Peter actively contributes to the R community through his work as an open source package developer. He also maintains a blog about hosting data apps, aiming to provide all the required information in one place.
Peter is an ecologist and Data Scientist with a PhD in Biological Sciences from the University of Debrecen, Hungary. He currently works as a Senior Data Scientist at E Source.
Please share your background and your involvement in the RUGS group or the R Community.
My background is in ecology, and I used to study animals in the field. Then, I became interested in multivariate statistics and started using R. At that time, I was teaching at the University of Veterinary Medicine in Budapest, Hungary. All the students were learning R as a part of their stats course. I felt a bit behind because I was still working with Excel. So, I had to upskill and learn R, which was beneficial as it helped me secure a job in Canada.
I’m also a package developer, and I have published many statistically focused packages. I have also developed some non-statistical packages, such as one that unifies progress bars in R, which is now my most downloaded package. I love developing packages as an open source developer and focusing on interesting questions. In my day job, I am a senior data scientist working with utility companies to assess and mitigate risk. On the side, I am also involved in the R community and do a bit of consulting here and there. I also like developing Shiny apps.
I got involved with R because I needed it for my personal development. In academia, I haven’t met anyone who is not using R or is not at least familiar with it. We started the R user group at the University of Alberta in 2012. It took a hiatus for a couple of years, and we restarted it in 2021 after the pandemic. Last year, we hosted seven meetups, and this year’s meetups have just started. You can find more about the Edmonton R User Group events and links to past recordings on our GitHub.
I think after the pandemic this synchronous way of meeting, especially if it is online first, is becoming a challenge. The reason I am saying this is that sometimes very few people show up at the physical events, and it’s not because they are not interested, but because they think of it like a YouTube video that they can watch later in their own time. This is an interesting change and I am not saying it is bad. I think this is expected and I would do the same in their position. But as an organizer, it puts you under pressure to make the events more interesting and to be able to attract more attendees. This might just be the new way of life post-pandemic that we have to accept. Maybe we should talk about interesting topics with those who show up for the events and then the rest can watch it later.
Can you share what the R community is like in Edmonton?
In Edmonton, the academic community is utilizing R in various fields. Ecologists are using it for their conservation-related or resource management questions. Alberta is a big province with a comparatively small population. It’s mostly remote and covered with forests. And in that forest, there is a lot of resource development going on, for example, oil and gas, forestry, mining, etc. It is really important to understand the effects, and there is a lot of spatial data analysis going on and R is well-suited for that. So this is one area that I know particularly well.
There has been a growing acceptance of R in the industry. In the past, companies only wanted their employees to work with Python. Right now, companies don’t mind which language you use to get your job done, and R is one of those languages. This change in how the industry approaches its data science stack is fortunate for R developers and data scientists. In government and health sciences, R is heavily used and a lot more prevalent than Python. SAS and other tools were more common in these areas in the past, and right now R is the dominant tool.
Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?
A few years after Shiny was introduced, I started playing around with it and got into app development, mostly for teaching. In the beginning, we had some workshops, and it was a cool way to demo things without having to write code. Later, I started using it for consulting and I started looking for ways to host apps so that they are not visible to everyone as I built them for a client.
There are various ways to host Shiny, like shinyapps.io. It is nice to begin with, but people tend to outgrow it at some point if they have particular requirements. In 2017, I started using ShinyProxy, which was almost brand new. ShinyProxy is used to deploy Shiny and other apps in a dockerized environment. It lets you authenticate and authorize users so that they see only the information you want them to see. Over time, ShinyProxy has matured and I have gotten better at working with it.
I believed this was common knowledge because I could find the information and learn it myself. But at some point, I realized that people might struggle with the setup because all the information is pretty scattered on the Internet. Usually, you find tutorials that have a narrow focus. So I started writing blog posts about Shiny hosting, and I think now it is past 60 posts. I titled the blog Hosting Data Apps. I started following a table of contents that I had in mind at that point. What is Shiny? What are the various ways of hosting it, and how can you learn how to use Docker or ShinyProxy? How do you set up a custom domain, HTTPS, and authentication, and how do you scale your apps? People have been interested, and we have received very positive feedback, which has led to a book deal with CRC Press for me and my co-author Kalvin Eng.
This coming year, we are going to focus on writing this book titled “Hosting Shiny Applications for R and Python.” We are going to cover questions like how do you pick the best hosting option? Once you pick the best option for your use case, how can you implement it, and what are the considerations you need to think about?
I find it interesting that it is so easy to learn Shiny these days. I know people who have never touched it but are good R programmers and could learn it in a few days. But once you want to share your app, eventually you start thinking about what else you can do. It’s hard to find everything in one place. This is what motivated me to start writing the blog with a book in mind. And right now we are getting closer to the end goal.
What trends do you currently see in R language?
I see a huge interest in cross-language reproducibility tools like Quarto and R Markdown. There is an interest in how to build beautiful things for the web, such as products, books, and websites, and how to make everything reproducible. Even though Quarto’s ecosystem is not as well developed as that of knitr and R Markdown, people are using it for everything. They don’t care about the lack of tools, and that’s how the ecosystem is developing. If something is missing, they just figure it out, and now it exists and everyone can use it. I think this drive, “if I can’t find something, I will build it,” is what’s taking the R community forward. This approach is what makes the community so valuable and welcoming.
How do I join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups worldwide organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.