
R in Action: Madrid R Users Group Insights on Navigating R’s Evolution and Future Trends in Madrid’s Dynamic Industry

By Blog

Madrid R Users Group (RUG) is a community of industry professionals from a diverse range of backgrounds that provides a learning and networking platform. The organizing committee of the Madrid RUG recently engaged in a conversation with the R Consortium regarding the use of R in their respective industries. The organizers, Carlos Ortega, Pedro Concejero, Francisco Rodriguez, and José Luis Cañadas shared their valuable insights about the industry’s evolving landscape of R applications.

Please share about your background and involvement with the RUGS group.

Francisco: I began learning the R programming language in 1999 when I started my Ph.D. program. R was primarily used for command-line programming, but I did not use it seriously until 2012, more than a decade later. Since then, I have used R in various enterprise settings, including insurance companies and banks. I have used R to process data and build models, and I continue to use it in my current role in the financial sector.

Carlos: I began my career at AT&T Microelectronics, a semiconductor plant in Madrid. We collaborated with Bell Labs, where John Chambers and his team created the S language, the predecessor of R, and used it heavily in the manufacturing plant. We collaborated with many colleagues there, and some members of the statistical group at Bell Labs were in our plant, sharing their code. When they started using R, we also started using it. However, the factory has since closed.

I transitioned to the consulting industry, working independently without formal affiliation to a particular sector. I analyzed data in various industries, including finance, manufacturing, and telecommunications. I worked in consulting for several years, and later with Francisco at a Spanish telecommunications company called Orange, and previously in the banking sector. We applied various models, analyses, and algorithms using distributed computing on servers and clusters. 

I am currently employed in the service sector for a multinational human resources company. We place thousands of people into employment. Since the 1990s, I have used R to model, analyze, and clean data.

Jose Luis: I am a statistician at the telco Orange in Spain, where I have been working for approximately five years. My primary job is to work with a Spark cluster using “sparklyr” and the H2O “Sparkling Water” library. I am a big fan of “sparklyr,” the tidyverse, and dplyr; dplyr and sparklyr are the most useful tools for my work.
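
For readers who have not used this stack, here is a minimal, hypothetical sketch of the sparklyr-plus-dplyr workflow Jose Luis describes; the connection settings and example table are illustrative, not taken from the interview.

```r
library(sparklyr)
library(dplyr)

# connect to Spark; "local" is enough for trying this out,
# a real cluster would typically use master = "yarn"
sc <- spark_connect(master = "local")

# copy a sample data frame into Spark
# (a production job would read from HDFS, Hive, or Parquet instead)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)

# ordinary dplyr verbs are translated to Spark SQL and run on the cluster
flights_tbl %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(mean_delay)) %>%
  collect()          # bring only the aggregated result back into R

spark_disconnect(sc)
```

The appeal is exactly what he describes: the same dplyr code works whether the data is a laptop-sized data frame or a cluster-sized Spark table.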

Pedro: My academic background is in psychometrics, and I have used nearly all the major statistics software, including SAS and BMDP. I came to R when I started working in data science for my previous company, Telefónica. In 2011, I began learning R on my own, and I enjoyed it immensely. I then met some of the people I am working with now when I started teaching psychometrics with R at the university. They thought I was crazy, but I enjoyed it, and it was a success. I then worked on many projects at Telefónica, including social network analysis with igraph. I particularly enjoyed making prototypes with Shiny. When Shiny first appeared, it was a marvel, especially for those of us in data science who wanted to create systems, web pages, and prototypes. It was quite successful. I left Telefónica four years ago, and I miss it sometimes. Now I teach text mining and artificial intelligence at the university. I use Python for artificial intelligence, as I find it a bit easier. But I teach the rest of the machine learning and text mining curriculum with R.

Can you share what the R community is like in Madrid? 

Carlos: As I mentioned before, we meet monthly. We work in different sectors: Francisco in insurance and finance, Jose Luis in telecommunications, me in services, and Pedro in education. Pedro previously worked in telecommunications as well. Most of the people we invite to our sessions are from industry. We invite a few people from academia from time to time, but most of our activities and recent developments are from industry. Therefore, industry is currently the focus of our group.

I joined this group because of its diversity, especially in the applied world. Many people are working on different projects, and their diverse ideas can be inspiring. In this sense, I believe that academics sometimes become too engrossed in their circles. The real-world applications of R are found by meeting people with diverse backgrounds.

Pedro: All of our meetings have been recorded and uploaded to our website since 2010 or 2011. The credit for this goes entirely to Carlos, who has consistently maintained the website.

Why do industry professionals come to your user group? What is the benefit of attending?

Carlos: The most important part of the meeting is the social events that take place afterward. The meeting itself typically lasts 40-45 minutes, during which we present new things, such as applying the Spark package or H2O library. Recently, Francisco presented how to apply a scorecard method for risk models in banking. Pedro presented how to use Shiny or Shiny mixed with GLMs or other types of models. Many people attend these meetings to see how effective R is in the real world. After the meetings, we discuss the state of the industry, such as which companies are betting on R and where the good projects are. This is a great way to socialize and discuss our issues.

Pedro: Now that Carlos has introduced the topic, I believe that the debate between Python and R is pointless. In my experience, both languages can be used seamlessly, and there is no difficulty in switching between them. My experience at Telefonica has shown that the choice of language can depend on the background of the project team, but ultimately, the results are the same. However, I must mention that the documentation available for R is excellent. I believe that R has a significant advantage over any other framework in the industry for statistical modeling.

The vignettes for packages are particularly helpful, as they provide detailed information on how to use the packages. Additionally, I believe that it is much easier and faster to start doing data science in R than in Python. Python is chaotic, with different versions of the language and packages being released frequently. R is more homogeneous, with backward compatibility being a priority for the R Foundation. This makes it much easier to maintain a consistent environment in R.
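
As a concrete illustration of the vignette system Pedro praises, vignettes can be browsed directly from the R console; dplyr is used here purely as an example package.

```r
# list all vignettes shipped with an installed package
browseVignettes("dplyr")

# open a specific vignette by name
vignette("dplyr", package = "dplyr")
```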

While Python may be easier to use for some tasks, I believe that R is the better choice for professional data science work. It is more stable, has better documentation, and is more widely used in the industry.

Shiny in Python was released this year. However, when I teach about Shiny in Python, I tell my students to read the documentation in R. This is because the two languages are very similar, with only minor differences. For example, ggplot and Shiny are both available in both languages. As a result, students can simply copy and paste code from the R documentation and use it in Python. This will allow them to quickly and easily create powerful applications.
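
A minimal R Shiny app of the kind Pedro describes teaching looks like the sketch below; the Python version mirrors the same ui/server structure, which is why he points students to the R documentation.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    # histogram of waiting times from the built-in faithful dataset
    ggplot(faithful, aes(waiting)) +
      geom_histogram(bins = input$bins)
  })
}

shinyApp(ui, server)
```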

Francisco: The process of creating a training dataset, connecting to a database, extracting data, taking a quick look at it, creating a button, and distributing the application to everyone in your enterprise is relatively straightforward. However, the security of the data must be taken into account. If the data is GDPR-compliant and can be shared, then it can quickly be turned into an application that can be viewed on a mobile phone. This can be done in Python, but it is more difficult. I prefer to use R for training.

Jose Luis: The use of plumber or other API packages has made it easier than ever to deploy R models in production. Kubernetes and Docker have made this possible. 
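
As a rough sketch of what Jose Luis describes, a plumber file can expose a saved model as an HTTP endpoint; the model file and parameters below are hypothetical, and in practice this script would be baked into a Docker image and run on Kubernetes.

```r
# plumber.R
library(plumber)

model <- readRDS("model.rds")   # assumed: a previously trained model object

#* Predict for a single observation passed as query parameters
#* @param x1 numeric predictor (illustrative)
#* @param x2 numeric predictor (illustrative)
#* @get /predict
function(x1, x2) {
  newdata <- data.frame(x1 = as.numeric(x1), x2 = as.numeric(x2))
  list(prediction = predict(model, newdata))
}

# serve it locally with:
# plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)
```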

Carlos: The approach to production is changing significantly with Posit (formerly RStudio). Rather than having packages in isolation, the focus is shifting to how to put models into production in the enterprise. This means bringing R and Python closer to a real production environment. These packages are making things much easier.

Pedro: For example, when I teach machine learning, I often use Python. However, when I get to mixed models, which I know José Luis is interested in, I have to recommend that students install an R library. This is because no Python library for mixed models is as comprehensive or well-developed as the ones available in R. At least, that was the case two years ago. I believe that R is currently years ahead of Python environments for statistical analysis.
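
To make the mixed-models point concrete, the canonical example from R’s lme4 package fits a random-intercept-and-slope model in a single line, using the sleepstudy data shipped with the package.

```r
library(lme4)

# reaction time as a function of days of sleep deprivation,
# with a random intercept and slope for each subject
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit)
```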

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

Francisco: In my sector, banking, I remember that 15 years ago SAS totally dominated the modeling part. Before 2010, it was the best option. It was not only a language but also a solution that allowed you to create a core model easily without coding. However, it was very expensive. Nowadays, I have noticed that R methods have been appearing in the industry since around 2015 or 2016. For example, in 2017, I was a consultant and taught a course to the regulator here in Spain. The European Central Bank wanted to use R to inspect bank entities, which was surprising to me, as I had not expected a regulator to use software other than SAS. Currently, we are testing R using a library created in 2020 called “scorecard,” which I believe is a powerful library.

This library, with only a few lines of code, enables building a complex model step-by-step and putting it into production easily. In my current job, I can use SAS or R. Here, given the choice, I use the easier and faster one, which is R. You can access data quickly with R, without any issues with the network. You use your computer’s memory, which gives you a lot of freedom to use the data you want.
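
A hedged sketch of that step-by-step workflow, using the scorecard package’s own germancredit example data; the function names follow the package documentation, and all cutoffs are left at their defaults.

```r
library(scorecard)

data("germancredit")

# 1. filter out weak variables (by information value, missing rate, etc.)
dt <- var_filter(germancredit, y = "creditability")

# 2. weight-of-evidence binning, then apply the bins to the data
bins   <- woebin(dt, y = "creditability")
dt_woe <- woebin_ply(dt, bins)

# 3. an ordinary logistic regression on the WOE-transformed data
m <- glm(creditability ~ ., family = binomial(), data = dt_woe)

# 4. turn the model into a points-based scorecard and score the data
card   <- scorecard(bins, m)
scores <- scorecard_ply(dt, card)

# 5. performance diagnostics (KS, ROC/AUC)
perf_eva(pred = predict(m, type = "response"), label = dt_woe$creditability)
```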

I believe that the flexibility of this library in particular is causing many data scientists to transition to open source. That includes people who use Python, since the “scorecard” library has been ported to Python as well. Previously, one had to pay for commercial software to do this. The quality of the model is comparable, if not better. What you are using here is a logistic regression.

Carlos: In the service sector, the use of dashboards has been on the rise. Power BI, in particular, has seen a significant increase in popularity, surpassing Tableau. QlikView has all but disappeared from the market. Now, the two leading dashboarding platforms are Power BI and Tableau. However, I believe that the industry is poised for a major change. R is a powerful tool, but it requires a machine with the R engine installed. Imagine being able to use R in a serverless mode, like in WebR, even through a web browser. This would revolutionize the industry by eliminating the need to pay for licenses. Dashboards could be published and shared with colleagues for free.

In essence, the way we expand will change significantly, as we will use R or Python to make it very easy to create and distribute dashboards at no cost. I believe this change, as well as the work now underway at Posit, could have a major impact on our industry, especially in the service sector, where we use a lot of different dashboards.

Jose Luis: I agree with Carlos, but I must point out that QlikView is still in operation. As you know, I worked with you at Orange, and I can assure you that Qlik is alive and well. Recently, I have been using Quarto to create reports for my customers and business owners. I have found Quarto to be an effective tool for communicating my results. I have used Quarto to create slides, reports, and interactive documents. I am very pleased with Quarto’s ability to help me share my analysis. I may also use the “conflr” library to create analyses in R Markdown and publish them directly to Confluence. This would allow me to create an analysis and immediately publish the documentation, which would be a great time-saver.

Pedro: You inquired about what we miss or would like to see in the R environment. I am not familiar with the web services provided by Posit or RStudio. However, I would like to see something similar to Google Colab. Google Colab for universities is a marvel. You have access to really powerful machines and a lot of RAM for free. It is a freemium service, but you can still use it. I am not aware of an equivalent in the R ecosystem.

Carlos: Posit Cloud’s free tier provides a very small account with only one gigabyte of storage. This is insufficient for many users, especially those who need to use GPUs for machine learning or data science. Posit is not as powerful as the Google Colab Platform. It would be beneficial for Posit to offer a GPU-enabled tier for users who need more powerful hardware.

Carlos: The use of GPUs in deep learning is a topic that is open to debate. Python dominates the field, and has a wide range of packages and libraries available for deep learning, such as PyTorch. R has an older library for deep learning, but it is not as widely used as Python. Many of the latest developments in deep learning, such as generative AI, are primarily done in Python. Therefore, for R to be used more widely for deep learning, it needs to have faster and more compatible libraries, as well as a more user-friendly interface.

Pedro: The development of R is progressing at a rapid pace, perhaps too quickly for the requirements of R libraries. This is because the libraries are constantly changing, with major changes occurring every six months. As a result, users must keep a close eye on the third number in the library versions to ensure that they are using the latest and greatest version. This can be a bit messy, but it is also exciting to see the language evolving so quickly.

Jose Luis: I used igraph and visNetwork to conduct social network analysis on mobile phone data and to communicate my findings. I also used Spark to perform the social network analysis (with the GraphFrames library), but I presented the results to my customers using R. It was a large project.

Any upcoming projects that you might want to discuss?

Jose Luis: I have an upcoming talk planned in which I will discuss the use of R in production, covering APIs, Docker, and other technologies, and perhaps Azure, AWS, and Google Cloud. I may be trying to cover too much. However, it may be beneficial to teach others how to use R in production. I would start with simple automation, such as using the cronR package, or taskscheduleR on Windows. The next step would be to use an API, then Docker, then Kubernetes, and finally to publish and deploy on a cloud platform.
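
A small sketch of the first step in that progression, assuming cronR on Linux or macOS (taskscheduleR plays the same role on Windows); the script path and schedule are illustrative.

```r
library(cronR)

# wrap an existing R script in an Rscript command
cmd <- cron_rscript("/home/user/score_customers.R")

# register it with cron to run every morning
cron_add(cmd, frequency = "daily", at = "07:00",
         id = "daily_scoring",
         description = "Refresh model scores every morning")

cron_ls()                        # list scheduled jobs
# cron_rm(id = "daily_scoring")  # remove the job again
```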

Carlos: In my case, I calculate projections for different types of time series on a monthly basis. This can involve up to 3,000 to 4,000 different time series, which are calculated automatically and in a distributed manner. These projections are already in production, and I use them to track different KPIs for different economic sectors in Spain. This is of great importance to my industry, as it allows us to identify which sectors are growing and which are declining so that we can invest in the right areas. This is a live system that is of great value to us.
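
This is not Carlos’s actual pipeline, but a hypothetical sketch of how thousands of series can be modelled in one pass with the tsibble/fable ecosystem; monthly_kpis and its columns are invented for illustration.

```r
library(tsibble)
library(fable)
library(dplyr)

# assume monthly_kpis has columns: sector, month (a yearmonth), value
fits <- monthly_kpis %>%
  as_tsibble(key = sector, index = month) %>%
  model(ets = ETS(value))               # one ETS model fitted per sector

fc <- forecast(fits, h = "12 months")   # 12-month projections for every sector
fc
```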

Francisco: In my case, I am working on the first level of the problem that Jose Luis mentioned. My objective is to predict which customers are likely to default on their payments. After a customer makes a purchase online, they have to pay the first installment. My goal is to predict the probability that the second installment will not be paid.

I have developed a fully automated process that collects all the data from a SAS database. The process extracts the data, loads it into a model, applies the model, and then prepares the data for another application. The other application uses a model to prioritize customers who are likely to pay, and then selects a subset of customers to send SMS or email reminders to.

This process is fully automated and runs every day between 10am and 6pm. It takes only five minutes to complete. 

Pedro: In my case, I will be presenting a paper on psychometrics at the Barcelona Congress. Psychometrics is a niche area of statistics that focuses on psychological measurement. R is the only software that offers advanced psychometric models. I will be using one of the oldest, but still classic, models. I have prepared a table of the current status of psychometric models in R, which I may include in my presentation. R is essentially the only option for psychometricians nowadays; the big commercial suites do not offer this functionality. This is another example of why R is the only way to go for these niche areas.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

deposits R Package Delivers a Common Workflow for R Users

By Blog

Mark Padgham, a Software Research Scientist for rOpenSci, has decades of experience in R, C, and C++, and maintains many packages on CRAN. Mark is leading the development of the deposits R package. Mark has been supported throughout this project by rOpenSci staff.

Publicly depositing datasets associated with published research is becoming more common, partly due to journals increasingly requiring data sharing and partly through more general and ongoing cultural changes around data sharing. Yet data sharing is often seen as time-consuming, particularly when meeting the expectations of individual data repositories. While documentation and training can help familiarize users with data-sharing processes, browser-based data and metadata submission workflows can only be so fast, are not easily reproduced, and do not facilitate regular or automated data and metadata updates. Better programmatic tools can transform data sharing from a mountainous climb into a pit of success.

deposits is a universal client for depositing and accessing research data in online deposition services. It provides a unified interface to many different research data repositories and, much like dplyr, works through “verbs” that behave identically across the supported backends.

Currently supported services are Zenodo and Figshare. These two systems have fundamentally different APIs and access to deposition services has traditionally been enabled through individual software clients. The deposits package aims to be a universal client offering access to a variety of deposition services, without users having to know any specific details of the APIs for each service.

The deposits package works seamlessly with the “frictionless” data workflow, to enable unified documentation of all aspects of datasets in one place. 

Outside of his work at rOpenSci, Mark has a passion for urban environments and understanding how cities can be improved. He is the lead developer of the Urban Analyst platform, ‘a platform for open-source, open-access interactive visualizations of urban structure and function.’ Mark says, “Cities cannot learn; therefore, I built a data platform for cities to learn from one another.”

RC: Doesn’t data sharing take too much time and too much effort? What is Deposits and what does it do? What problem are you solving?

Data sharing takes time and effort; everyone sharing from different places makes it hard to sync up. However, the deposits R package creates a common workflow for R users. It aims to streamline the data-sharing process. 

It addresses the issue of disparate data-sharing locations by creating a standardized workflow, simplifying the process for researchers. All deposits are initiated on the nominated services as “private” deposits, meaning they can only be viewed by the deposit owner until the owner decides to publish them. A deposit can only be publicly viewed once it has been published. The process of using deposits to prepare one or more datasets for publication involves multiple stages of editing and updating.
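
A hedged sketch of that workflow in code, based on the deposits package’s documented R6 client; the metadata, file name, and the final publish step are assumptions made for illustration, and the Zenodo sandbox is used so nothing real is published.

```r
library(deposits)

metadata <- list(
  title       = "Example dataset",
  description = "Illustrative deposit created from R",
  creator     = list(list(name = "Doe, Jane"))
)

# a client for one service; deposits speaks the same "verbs" to all of them
cli <- depositsClient$new(service = "zenodo", sandbox = TRUE,
                          metadata = metadata)

cli$deposit_new()                       # creates a *private* deposit on Zenodo
cli$deposit_upload_file("mydata.csv")   # attach or update data as it evolves
# cli$deposit_publish()                 # assumed final step: make the deposit public
```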

RC: How far along are you on the project? Currently supported services are Zenodo and Figshare. Will you be adding more?

Currently, the project provides support for Zenodo (CERN’s Data Centre) and Figshare (Open Data) as the initial services. There are plans to expand and include more repositories in the future. 

The team is working on integrating additional services, and there is a possibility of securing further funding for the Harvard Dataverse system, which operates as a federated rather than a centralized system. Integrating the Dataverse system presents additional complexities due to its federated nature, requiring more intricate API handling with greater flexibility but potentially posing challenges in adopting the workflow. 

RC: Have users contributed their own plugins to extend functionality to other repositories?

deposits is implementing a modular/plugin system to enable users to contribute their own plugins to extend functionality to other repositories. Users will be able to authenticate, prepare data and metadata, and finally submit, fetch, and browse data.

Actual activity around plugins has been a little slow so far. We are writing a JSON Schema for the system which will improve the process. We will be seeking people to build plugins after more adaptations are done and documented. Actually, I would not recommend regular users try to extend deposits to other repositories yet. But that is coming soon!

Preparing the data is the main barrier. deposits is designed to run workflows that document all the columns in a data table. With deposits, updating a deposit becomes a single, completely painless function call. 

RC: What was your experience working with the R Consortium? Would you recommend applying for a grant to others?

Yes! The application process was painless and straightforward. In fact, I got a second grant recently. I’m very thankful for the support.

The only off-putting part of the process was that there was no guidance on how much to ask for. You are fully enabled to submit it on your own. This is good, and I appreciate the outcome of getting financial support. But giving applicants better overall guidance would be very helpful. The R Consortium should make the application process more inclusive with more consultation. 

2023: A Year of Progress for the PHUSE CAMIS Working Group

By Blog

This blog post was contributed by PHUSE CAMIS.

As we draw towards the end of 2023, the PHUSE CAMIS Working Group reflect on their key progress and successes this year.

The CAMIS repository went live in January 2023, drawing on content from the PHUSE CSRMLW project. This searchable repository compares analysis method implementations in software (CAMIS) such as SAS, R, and Python.

The white paper, published in June, highlighted the importance of specifying your analysis clearly enough that it can be replicated in different software, rather than relying on default options, which can differ between packages.

For more complex analyses, it can still be hard to understand what defaults and algorithms your software is using, so in 2023 the team focused on expanding the repository content, comparing SAS and R methods. By August, we had covered the following topics in the repo: quartiles, rounding, ANOVA, MMRM, CMH, log-rank, Cox PH, McNemar’s test, the Kruskal-Wallis test, and logistic regression.

October saw the launch of the sub-working group CAMIS-Oncology, led by Somasekhar Sriadibhatla (AstraZeneca). This team will focus specifically on oncology endpoints and analyzing them in SAS, R, and Python. The CAMIS team has expanded its membership during 2023 and presented at numerous conferences around the world. In November, we welcomed Harshal Khanolkar (NovoNordisk) to the leadership team alongside Christina Fillmore (GSK) and Lyn Taylor (PAREXEL).

Our focus for 2024 will be on creating additional content for the repo and raising awareness of the project across the medical research and wider community. We’d like to take this opportunity to thank all of our team members and contributors, and we encourage everyone to check out the repository and help us grow its content: CAMIS (psiaims.github.io). If you would like to join the team, please get in touch through the repo.

Igniting Innovation: Bilikisu Wunmi Olatunji’s Journey with Abuja’s Thriving R User Community

By Blog

The R Consortium recently caught up with Bilikisu Wunmi Olatunji, founder of the Abuja R User Group and R-Ladies Abuja. Bilikisu shared that the group resumed in-person meetups last month and has added new types of gatherings, such as focus groups, quarterly workshops, and more. The October event was a success, and the group now has members from diverse fields, including a lawyer. Currently, the group is planning more exciting events for the upcoming year. 

We interviewed you in September 2021 and published Creating Successful R User Groups in Abuja, Nigeria. It’s nice to connect with you again! What’s new?

It’s great to reconnect with you too! Since our last interview in September 2021, our local meetup group has seen some exciting developments. We’ve continued to grow our community with members from within and outside our local area, and both local and international speakers have honored our invitations to speak at our meetups, leading to more engaging and beneficial experiences for our members.

Last month, we had our first in-person meetup since the COVID-19 lockdown. This has led to a significant expansion of our event offerings. We’ve added new types of gatherings, such as focus groups (Data Science and Analytics with R, Epidemiology, Econometrics), quarterly workshops, and hybrid meetups, to cater to our community’s broader range of interests. These additions have received a very positive response, and we’re looking forward to increased participation and impact for our members.

To maintain momentum, we’ve utilized WhatsApp groups and established focused subgroups for Data Science and Analytics, Epidemiology, and Econometrics, allowing continuous engagement and specialized discussions.

We’re piloting an on-site group starting today and plan to expand access in January. The next six weeks, leading up to Christmas, are a trial period for this new approach. Beginning in January, we aim to scale this prototype, enhancing our support for the community.

We also appreciate the grant we got from R Consortium this year to support our group. Thank you, R Consortium.

In summary, we’re moving towards study groups, quarterly hybrid meetups, and workshops that will continue to serve our local community effectively. We’re excited about what’s ahead and look forward to the positive changes these plans will bring.

We want to get to know you more on the personal side. Can you please tell me about yourself? For example, hobbies/interests or anything you want to share about yourself. 

Sure, I’d be happy to share a bit about myself. I am Bilikisu Wunmi Olatunji. I’m happily married and blessed with wonderful children, both sons and daughters. Although my work often keeps me very busy, in my downtime, I have a passion for the arts — I particularly enjoy drawing. Another one of my pleasures is savoring delicious food. I’m not much of a cook, so I relish the opportunity to dine out with my family and explore different cuisines.

When relaxing at home, I love spending quality time with my kids, often watching cartoons or movies. These moments are special to me as they allow us to laugh and enjoy each other’s company.

These are aspects of my life that I usually keep private. Professionally, I’m a data scientist and the founder of Abuja R User Group and R-Ladies Abuja. I work closely with other co-organizers to help develop and expand our community.

You held a Meetup on Abuja R User Group Meet & Greet on Oct 28th, 2023. Can you share how the event went? What kind of topics were covered? Why those topics?

Yes, we had our first in-person meetup since the COVID-19 lockdown. The primary goal of the meetup was to meet each other and learn what the members want from the group moving forward. New members attended the meetup, and most of our longtime members also called in to show their support. We discussed the challenges the group and its members face, e.g., attending in-person meetups. I found the event to be a positive experience that will motivate me and others to help improve the group’s meetup attendance. A fun suggestion from our members was to involve dancing and more enjoyment in our meetups. 

Who was the target audience for attending this event? 

Our focus is on academics, particularly those in statistics and computer science, which aligns well with our area’s abundance of educational institutions. However, integrating our work with schools has been challenging due to bureaucratic hurdles, making it time-consuming and stressful.

Despite these obstacles, we’ve made some progress. We managed to have academic participation in our events—for instance, Muhammed Tahir Muhammed, a statistics lecturer from our network. We also had a representative from the field of epidemiology named Isaac Joseph, the department director. 

We’re excited about the potential for collaboration with these academics and professionals. At a recent event, we were pleased to see participation from three professionals from different fields, a new record for us. Among them was a lawyer, Michael Ezeh, who has experience with data visualization tools like Power BI and Tableau. He’s keen to expand his programming knowledge, starting with R, which is encouraging. We also welcomed the following new members: Obaniyi Fidelis (Economics), Mafiana Ifechi (Data Analyst), and Benedicta Amarachukwu (Data Analyst, new to R).

We’re running small focus groups as a pilot, which we hope to launch by January. The diverse backgrounds of our attendees are promising, and they will likely attract even more participants to our community.

Any techniques you recommend using for planning for or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive to people unable to attend physical events in the future?  

I recommend Zoom meetings and GitHub. They both make collaboration and resource sharing easy.

When is your next event? Please give details!

We will be starting our focus groups in January. We are currently using the remaining days of this year to plan for 2024.

Please share any additional details you would like included in the blog. 

We’re excited and hopeful that we will be able to impact the community with much more coming next year. We’ll be able to reach out to more people. As shown in our pictures, we have a fun time. I can also say that our numbers on the Meetup group are growing, which makes me happy. 

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups worldwide organize, share information, and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

R Validation Hub: Wrapping up 2023 and Welcoming 2024!

By Blog, Events

Abstract:

Join us on November 28th for our last Community meeting of 2023. The R Validation Hub’s Executive Committee will summarize this year’s achievements, our presence at conferences, and other key moments to celebrate.

Agenda:

  • Looking Back on 2023 (20 minutes): Review the year’s highlights, including conferences and working group summaries.
  • Focus of the Reg R Repo Working Group (20 minutes): Insights into the group’s initiatives and achievements.
  • Open Discussion (20 minutes): An interactive session for feedback and community engagement.

Conclusion:

Don’t miss this opportunity to reflect on 2023 and contribute to shaping our direction for 2024. Join us for a session filled with insights, networking, and community spirit.

Join the community call! (Microsoft Teams meeting)

From Local Roots to Global Reach: The Collaborative Expansion of R-Ladies Gaborone

By Blog

The R Consortium recently spoke to Simisani Ndaba, Founder and Organiser of R-Ladies Gaborone. Since the last time we spoke, Simisani has expanded the group’s reach to other parts of the world using platforms like WhatsApp and Fosstodon. Moreover, R-Ladies Gaborone has collaborated with chapters from Brisbane, Nairobi, Rabat, Cologne, and Dammam. 

Last year, we published Learning the Fundamentals of R, Workshop with R-Ladies Gaborone and Botswana R User Group. What’s new with the R-Ladies Gaborone Chapter?

Since last year’s event, we started an R-Ladies WhatsApp group based on the feedback we got from the workshop. Participants said that a lot more people would have known about our activities and could have gotten in touch with us directly if we had a WhatsApp group. Currently, we have members from around the world in this group, and it’s been a successful way to keep everyone informed. The chapter also has a Fosstodon account, which lets us reach people in the open science community. This initiative has played a significant role in growing our chapter both in Botswana and regionally in Namibia and Zimbabwe.

The chapter also has a new website, which has all our material from past events, our activities, and R-related work around Botswana. The website is part of our focus on raising awareness of how R is used in Botswana and how people can get regional online content. It is online and updated regularly.

Our founder and co-organizer, Simisani Ndaba, was chosen as an opportunity scholar and represented the chapter at the 2023 Posit Conference in Chicago. She got to meet organizers from the other R-Ladies chapters we had met online, collaborated with, and communicated with on the R community Slack channels.

Posit Conference 2023 Opportunity Scholars

Please share your background and involvement with the R-Ladies Gaborone group.

I have a Masters in Computer Information Systems and a Bachelors in Business Information Systems. I also have a Post Graduate Diploma in Teaching in Computer Science. I have been working as a Teaching Assistant since 2016 at the University of Botswana and have been in research for the same amount of time. My interests are in Machine Learning and Data Science. I recently wrote a paper, A Review of the Use of R Programming for Data Science Research in Botswana, to raise awareness of the use of R in Botswana academia. I believe it is an interesting read for people who are interested in R’s use in the rest of the world and could lead to potential collaborations. I became a Carpentries Instructor in June 2022, which allowed me to use and reference the Carpentries material for last year’s workshop. 

Shifting gears to more personal endeavors, I founded R-Ladies Gaborone in September 2021. At that time, I was between jobs and seeking a productive way to spend my time. My exposure to R came in 2019 at the CODATA-RDA summer school in Italy. Leveraging that experience, I began hosting events and collaborating with the Botswana R User Group. I have organized events with chapters from Brisbane, Nairobi, Rabat, Cologne, and Dammam.

Can you share what the R community is like in Gaborone, Botswana? 

There are only two R community organizations, the Botswana R User Group and R-Ladies Gaborone, so the R community is not large. However, there is a passionate interest among R enthusiasts. I believe the enthusiasm comes from knowing about R’s use in data science, a growing skill in the job market. They want to improve their data analysis skills and appreciate the local R communities that provide beginner-friendly teaching material, project work, and R-related webinars from around the world. 

You had a Meetup on How ChatGPT may affect the future of Data Science on November 4, 2023. Can you share more on the topic covered? Why this topic? 

ChatGPT is a generative AI app that generates new content from user prompts. Generative AI is now ubiquitous, with ChatGPT, DALL-E, Perplexity, Bard, and others. All these apps use large language models, which involve training deep learning models, particularly neural networks, to generate new data similar to the data they were trained on. There have been many workshops and talks on how generative AI has been used to create stories, in student assignments, and for coding in different languages, but its impact on data science has not been discussed enough. Questions remain, such as: will anything be added to or removed from the data science workflow because of simplification or redundancy?

Who is the target audience for attending this event? 

Anyone interested, who works with and is enthusiastic about Generative AI. Students, researchers, the academic community, and anyone with access to Zoom (Laugh Out Loud). We encourage people from all walks of life to attend because such discussions affect them in their daily lives and are not just for people who are in STEM. The session recording can be found below. 

How ChatGPT May Affect the Future of Data Science

Any techniques you recommend using for planning for or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive to people unable to attend physical events in the future?   

As a chapter of R-Ladies Global, we are encouraged to use Meetup.com to create and announce events and register participants. There is also a section on Meetup where we can add the Zoom link for online events and the location for in-person activities. The Zoom account is an R-Ladies account, which allows sessions of up to two hours rather than the 40-minute limit of a free account. I would highly recommend using Meetup.com because it also lets you communicate with all the members through email and shows you other related activities happening in your area. 

During the events, we record our sessions and upload them either to the R-Ladies Global YouTube channel or our own R-Ladies Gaborone YouTube channel. We are thankful to R-Ladies Global for making these resources available; unfortunately, communities that do not have such support may have to budget for these tools themselves. Since our chapter started during COVID-19, we began and have continued holding events online. We can reach many people worldwide through online events and gain incredible insight from different perspectives and project experiences. Our future plan is to have in-person meetups to engage with our local community.

Please share any additional details you would like included in the blog. 

Simisani Ndaba at the 2023 Posit Conference. From the far left: Riva Quiroga (R-Ladies Santiago & Valparaíso co-founder), Mouna Belaid (R-Ladies Global team member & R-Ladies Paris organizer), Bilikisu Wunmi Olatunji (R-Ladies Abuja founder and co-organiser), Simisani Ndaba (R-Ladies Gaborone), and, far right, Yanina Bellini Saibene (R-Ladies Leadership Team & rOpenSci Community Manager)

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups worldwide organize, share information, and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Webinar: Discover the Future of R in Regulatory Submissions

By Blog, Events

Are you an analyst, statistician, or data science enthusiast keen on understanding how open source software like R shapes the future of regulatory submissions? This webinar is for you!

“R and Shiny in Regulatory Submission”

  • When: Dec 11, 2023
  • Time: 3:00 p.m. – 4:30 p.m. EST
  • Duration: 1.5 hours
  • Where: Zoom – Virtual Event
  • Hosts: FDA Statistical Association and R Consortium

Agenda Highlights:

  • Opening Remarks by Ning Leng, People and Product Lead in Product Development Data Sciences at Roche.
  • Presentation on Open Source Software for Regulatory Submissions by Paul Schuette, FDA. 
  • Dive into R Consortium R Submission Pilot 2 with Eric Nantz from Eli Lilly. Discover how an R-based submission with a Shiny component unfolded.
  • Reviewing Experience of R-based Submissions with Hye Soo Cho, FDA. Understand the nuances and challenges faced during the review process.
  • Interactive Panel Discussion moderated by Ning Leng. Join Paul Schuette, Hye Soo Cho, and Eric Nantz as they delve deep into the R adoption journey and discuss practical challenges and solutions.

Why Should You Attend?


The data science world is rapidly evolving, and R is at the forefront of this transformation. With a robust open source community backing it, R brings many cutting-edge statistical tools, a standout feature. Shiny offers unparalleled flexibility and interactivity, revolutionizing how data scientists operate. Recently, the R Consortium broke new ground by including a Shiny component in a submission package, a pivotal moment marking the fusion of open source capabilities with formal regulatory processes.

In the upcoming webinar, FDA and industry speakers will share their unique experiences with R-based and Shiny-based submissions. Whether you’re an industry professional or an aspiring data scientist, this is an opportunity to stay ahead of the curve.

Conclusion


Integrating open source software in regulatory processes represents a leap toward more transparent, efficient, and adaptable systems. Don’t miss this golden opportunity to learn, interact, and contribute to this transformative journey.

Register for the Webinar Now!

About the R Consortium R Adoption Series


The R Consortium R Adoption Series is a curated set of webinars focusing on the growing adoption of the R programming language in the data science community. Each webinar provides insights through a compelling case study and offers an interactive platform for attendees to pose questions and learn. This series is a collaborative effort by the R Consortium, PHUSE, and PSI. Dive deep into the series here: R-Adoption Webinar Series.

Highlights from R-Ladies Paris Hybrid Meetup Empowering Community Outreach

By Blog

This blog post was contributed by Mouna Belaid on behalf of the R-Ladies Paris team.

We were delighted to host a recent hybrid event, Feedback from Experiences in Data Science and Scientific Consultation (in French), on October 19 at R-Ladies Paris. We welcomed two amazing speakers: Anna Doizy, a researcher, consultant, and trainer specializing in experimental methodology and data analysis using R, and Kim Antunez, a public statistician at INSEE, the National Institute of Statistics and Economic Studies. Both speakers delivered their presentations in French, providing valuable insights and knowledge to our diverse audience.

We regularly post updates about our upcoming events on our Meetup group. Everyone is welcome to join us there. 

We are grateful to datacraft for hosting our event. The venue was well equipped with large screens for the virtual part of the meetup, ensuring a smooth experience for our attendees.

Before starting the presentations, we initiated the recording to make the content accessible. We also went live on our Facebook page, underlining our commitment to inclusivity and ensuring that a wider audience could engage with the event.

Many Contributions Made an Excellent Event

Chaima Boughanmi, a Data Scientist at ExpandedBM – BVA and organizer at R-Ladies Paris, opened the conference and introduced the R-Ladies Paris community and our guest speakers.

Then Xavier Lioneton, COO at datacraft, provided an insightful introduction about datacraft, which is a learning and coworking club for data scientists. Furthermore, they offer an exciting agenda with upcoming data science-related events. If you’re interested, we highly recommend exploring their agenda to stay informed about their upcoming events.

Following the warm introduction to our community, Anna Doizy began her presentation virtually, discussing her scientific projects and her journey as a freelancer over the past three and a half years. She shared her transition from the role of “biostatistician” to that of “scientific consultant” and delved into the challenges she faced while establishing her own company. Anna’s talk also covered her valuable contributions in helping researchers enhance their experimental approaches.

Her talk was truly impressive.

To ensure the smooth flow of the event, the question and answer session was moderated by Mouna Belaid, an R Consultant at ArData and an organizer at R-Ladies Paris, alongside Chaima Boughanmi.

Then came the turn of our second invited speaker, Kim Antunez, who delivered a live presentation titled “Mon roman d’appRentissage (My LeaRning Novel).” She began by introducing herself as an R Lady, a title she has proudly earned. Kim then took us on a journey through the eight years she spent developing her skills in the field of data science.

Kim impressed us with stories of her personal experiences, including her work with R Shiny applications, the development of R packages, and her contributions to community presentations, which showcased her dedication and accomplishments.

Following Kim’s presentation, we facilitated a second round of questions and answers, which provided an invaluable opportunity for the audience to delve even deeper into Kim’s experiences.

In the closing session, Chaima Boughanmi gave an overview of the upcoming meetups at R-Ladies Paris:

  • Afterwork collaboration with PyLadies Paris: We got together at a cozy bar in central Paris on October 24 to discuss all things Python and R. We always look forward to collaborating with PyLadies communities. You can also find a collection of replays of talks and workshops where we brought both communities together to explore and execute similar tasks in Python and R.
  • Upcoming presentation, “Glitter Makes SPARQL: A French R Package for Exploring and Collecting Data from the Semantic Web”: The presentation will be in French and led by Lise Vaudor on November 14 at 12:30 p.m. CET.

Following the talks, we set aside some time for networking and meaningful exchanges among our community members.

Our gratitude goes out to Posit for their contribution of fantastic R stickers, which we were thrilled to offer to our community members.

We also had the pleasure of enjoying delicious sandwiches from Chlew and sushi from Bozen, which we highly recommend.

We want to express our appreciation to the R Consortium, whose grant made it possible for us to host this event and provide an opportunity for the exchange of knowledge and ideas. 

This event provided the perfect setting for R enthusiasts to meet each other and share their experiences.

For those who missed the event or wish to review the talks, a video recording is available. 

You may also find replays of our previous meetups organized into playlists on our YouTube channel.

If you’d like to join us as a speaker or have any questions, please don’t hesitate to reach out to us at paris@rladies.org. We look forward to hearing from you! 

Dive Into R: Collaborate with the Indy UseR Group as a Newsletter Contributor!

By Blog

R Consortium recently talked to Sam Parmar of the Indy UseR group about using R in public health and pharma in Indianapolis. Sam also spoke about his volunteer work with the R Weekly newsletter. The newsletter aims to provide subscribers with a comprehensive list of the latest R resources from around the web. Sam also discussed his short online book with tips and tricks for using AI assistant tools such as ChatGPT or GitHub Copilot. 

Please share about your background and involvement with the RUGS group.

I am a Statistical Data Scientist at Pfizer and a member of the RCoE SWAT team, which stands for Scientific Workflows and Analytical Tools. We consult with other Pfizer business lines to provide technical expertise on using the R programming language and its various packages. We are also building a community around using R, which has over 1000 members at Pfizer.

From 2017 to 2020, as an epidemiologist for a local health department, I came across R, which we used in conjunction with SAS. This is how I first got involved with the R community. I used to read the R Weekly newsletters and started participating in the Indy UseR group, which Shankar Vaidyaraman and Derrick Kearney organized. It helped me discover many new tools and grow my network. I used R for a few years as an epidemiologist before working in the Pharma sector.

In my experience with the Indy UseR group, the R community is very welcoming. R has excellent documentation, including Quarto books and R Markdown, which makes learning the language easier. I love that many people are willing to create excellent free resources and present their work. We’ve been lucky to have the creators of the gt and targets packages present at our previous user group meetings. I wouldn’t have been able to change careers successfully without this welcoming community.

Can you share what the R community is like in Indianapolis? 

The community is diverse, with folks from many backgrounds, such as pharma, public health, and academia. Seeing new faces in our user group meetings this year has been great. Many recent graduates and students attending our meetings are familiar with the tidyverse. At a meeting earlier this year, we connected with a professor teaching analytics at a local university. We enjoyed learning his perspective on educating students in R.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

As I stated previously, I was an avid reader of the R Weekly newsletter. I joined the curation team that generates these issues online just over a year ago. We have an engaged community of readers and listeners and a podcast accompanying the newsletter.


R Weekly is an open source project launched in 2016 and is still actively maintained today. This achievement is remarkable, as very few open source projects last and are actively maintained for a long time. A volunteer curation team oversees the publication of R Weekly. We aggregate information from various RSS feeds, as well as contributions via pull requests made directly to the GitHub repository we host. Our content is public, so it is possible to view historical issues and suggestions we have integrated into the platform. 

It is an amazing resource, and we are looking for more members for our team. So, if anyone reading this interview is interested, please submit resource links or make a few pull requests, and fill in this form to join our team.

Another thing I’m working on is a short tips and tricks book on using AI assistant tools for programmers. This book aims to guide anyone interested in integrating tools like ChatGPT or GitHub Copilot into their workflow and setting up guidelines. It’s not perfect, but I wanted to share it with the community to generate interest and get support from anyone interested in contributing to the book. 

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

There is a lot of interest in Shinylive for R and Python in the pharma space. The developments in WebR technology are truly amazing. I’m involved in a Submissions Working Group Pilot that is looking into the use of WebR and Shinylive for regulatory submissions. Hopefully, in the future, we’ll see it being used for submissions. Additionally, the integration of GitHub Copilot in the RStudio IDE is an exciting new release that I think many people are looking forward to.
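
For context, here is a minimal sketch of the Shinylive-for-R export step mentioned above: an ordinary Shiny app directory is turned into a static site that runs entirely in the browser via WebR, with no R server behind it. The directory names are illustrative.

```r
# install.packages(c("shinylive", "httpuv"))

# export the app in ./myapp to a static site in ./site
shinylive::export(appdir = "myapp", destdir = "site")

# preview the exported static site locally
httpuv::runStaticServer("site")
```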

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups worldwide organize, share information, and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Pfizer Leaders Discuss the Adoption of Open Source, Predominantly R, in Clinical Trial Reporting

By Blog

In this PHUSE video interview, Pfizer leaders discuss how the company is making strides in incorporating open source tools into its clinical reporting processes. Michael Rimler, PHUSE Open Source Technologies Director, speaks with Mike Smith, Senior Director at Pfizer, head of its R Center of Excellence, and R Consortium board member, and Patti Compton, Vice President and Head of Statistical Data Science and Analytics at Pfizer, about Pfizer’s open source journey. 

While Pfizer has used R for many years, primarily in clinical pharmacology and QA metrics, the company is now looking to transition more of its clinical trial reporting to be predominantly R-based within the software development lifecycle. Smith notes some challenges of “changing the wheels on the bus while it’s rolling down the highway,” but both leaders emphasize the importance and benefits of collaboration through open source. 

Patti Compton sees opportunities to build Pfizer’s data science community, leverage the rich library ecosystem, and gain platform independence to take time out of drug development timelines. Both envision a future where submissions to regulators use interactive data tools to allow deeper exploration of clinical trial results.

Smith predicts more interactive platforms between industry and regulators for activities like label negotiations within the next two to three years. Compton agrees that open source will be applied in new ways across drug discovery and development and expects expectations for data science skills to increase. The interview provides insight into Pfizer’s motivations and goals for advancing open source adoption within clinical reporting and analytics.

You can watch the full interview here.