Sep 09

R Community Explorer – R User Groups

By josephrickert Blog

By Ben Ubah, Claudia Vitolo and Rick Pack

We recently announced an R-Ladies-focused open-source dynamic dashboard built using R and JavaScript. That work has now been extended to encompass all R user groups organized through Meetup.com. You can find this new dashboard at this link and its code, here.

The R user group support program and the R-Ladies project are featured in two out of three top-level R Consortium projects.

How We Identified R User Groups on Meetup

Identifying all R user groups on Meetup.com required more effort than identifying R-Ladies groups. While R-Ladies groups are centrally created and their names follow a standard convention, the names of other R user groups are more difficult to predict.

We extended Curtis Kephart’s technique for using string matching to retrieve upcoming R events to:

Match among all data science groups on Meetup (7,700+) those with strings like “r user”, “r-user”, “r-lab”, “phillyr”, “rug”, “bioconductor” and “r-data” in their Meetup URL names. We then performed a second round of string matching to search for strings like “programming-in-r”, “r-programming-”, “-using-r”, “r-language”, and “r-project-for-statistical” in the groups’ topics field.
Retrieve all user groups that mention “r-project-for-statistical-computing” in their topics separately.
Retrieve all R-Ladies groups separately, which was necessary to avoid missing some groups.
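The matching step above can be sketched in a few lines of base R. This is only an illustration, not the project's actual code: the sample data frame, its column names (urlname, topics) and the helper matches_any() are hypothetical, and combining the two rounds as a union is our assumption.

```r
# Hypothetical sample of Meetup groups; `urlname` and `topics` are assumed column names.
groups <- data.frame(
  urlname = c("phillyr", "Berlin-R-Users-Group", "nyc-python-meetup",
              "Data-Science-Lagos", "rladies-paris"),
  topics  = c("data-analytics", "r-project-for-statistical-computing", "python",
              "programming-in-r, machine-learning", "r-ladies"),
  stringsAsFactors = FALSE
)

# Patterns quoted in the post: first round on URL names, second round on topics.
url_patterns   <- c("r user", "r-user", "r-lab", "phillyr", "rug", "bioconductor", "r-data")
topic_patterns <- c("programming-in-r", "r-programming-", "-using-r",
                    "r-language", "r-project-for-statistical")

# Hypothetical helper: TRUE where any pattern occurs in x, ignoring case.
matches_any <- function(x, patterns) {
  grepl(paste(patterns, collapse = "|"), x, ignore.case = TRUE)
}

url_hit   <- matches_any(groups$urlname, url_patterns)
topic_hit <- matches_any(groups$topics, topic_patterns)

# Assumption: a group is kept if either round matches.
r_groups <- groups[url_hit | topic_hit, ]
print(r_groups$urlname)
```

In this toy data the R-Ladies chapter is not caught by either round, which shows why those groups had to be retrieved separately. Short patterns such as “rug” can also match unrelated names, so a manual review of the matches is probably still needed.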

Procedure

For this dashboard, the following procedure was followed:

We used the meetupr package to extract R user groups from Meetup.com
Improved the existing find_groups() and get_events() functions in meetupr to meet our requirements and switched from the defunct Meetup API keys to the OAuth 2.0 authentication system. This switch was quite complicated and will be discussed further in another article.
Transformed the data retrieved from Meetup via meetupr from data frames to JSON, GeoJSON and CSV
Stored the data by committing the JSON/GeoJSON/CSV files to the GitHub repository of the project.
Developed a static HTML dashboard interface based on an open-source Bootstrap template
Rendered the stored data via the dashboard interface
Automated the process of extracting R user groups, data transformation and storage.
Deployed the dashboard via GitHub Pages
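To make the transformation and storage steps more concrete, here is a minimal sketch of writing a data frame out as CSV, JSON and GeoJSON. It is not the project's code: the sample groups, column names and file names are made up, the GeoJSON structure is built by hand rather than with leafletR, and the JSON files are only written if the jsonlite package happens to be installed.

```r
# Made-up stand-in for the data frame returned from Meetup.
groups <- data.frame(
  name    = c("Example R User Group A", "Example R User Group B"),
  lat     = c(51.5074, 40.7128),
  lon     = c(-0.1278, -74.0060),
  members = c(120L, 45L),
  stringsAsFactors = FALSE
)

out_dir  <- tempdir()                      # stands in for a folder in the repository
csv_path <- file.path(out_dir, "groups.csv")
write.csv(groups, csv_path, row.names = FALSE)

# Build a GeoJSON FeatureCollection as nested lists: one Point feature per group.
features <- lapply(seq_len(nrow(groups)), function(i) {
  list(
    type       = "Feature",
    geometry   = list(type = "Point",
                      coordinates = c(groups$lon[i], groups$lat[i])),  # lon first, then lat
    properties = list(name = groups$name[i], members = groups$members[i])
  )
})
geojson <- list(type = "FeatureCollection", features = features)

# Serialize to JSON only when jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  jsonlite::write_json(groups, file.path(out_dir, "groups.json"), pretty = TRUE)
  jsonlite::write_json(geojson, file.path(out_dir, "groups.geojson"),
                       auto_unbox = TRUE, digits = NA, pretty = TRUE)
}
```

Files written this way can then be committed to the repository and read directly by the JavaScript front end.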

The Tools We Used

Combining R (for data analysis) and JavaScript (for data presentation) is at the heart of this project, as this combination offers great flexibility with automation and deployment.

We used a mix of these tools to develop the dashboard:

R, RStudio and the following packages: meetupr, curl, jsonlite and leafletR

JavaScript and the following libraries: jquery.js, d3.js, echarts.js, leaflet.js, leaflet-markercluster.js and lodash.js
Gentelella Admin Dashboard Bootstrap HTML template
Travis CI to automatically build the project, execute R scripts and bash commands
Bash commands to call R scripts and commit modified files to GitHub

Acknowledgments

We thank Curtis Kephart (RStudio) for contributing code that helped us with ideas on identifying R user groups on Meetup.

We also thank the authors of the meetupr package for their excellent work. Special thanks to Jenny Bryan, Erin LeDell, and Greg Sutcliffe for their help over the last month with implementing the requirements for the new Meetup OAuth 2.0 authentication system.

Sep 09

New R Consortium Blog Guidelines

By R Consortium Blog

The R Consortium is posting new blog guidelines to help facilitate posts from members, ISC grants recipients and the community at large. Please review and send in your ideas!

R Consortium Blog Overview

The R Consortium blog will serve as a channel for the members, ISC grant recipients and the community at large to broadcast to a wide audience how their work and engagement is growing opportunities for the R language for data science and statistical computing.

This may include summaries of how leading institutions, companies and developers are using, developing and advancing R.

Those involved with developing, maintaining, distributing, and using R software are encouraged to contribute to the blog.

Guest posts from the R Consortium community at large or projects funded by the ISC that enhance R and support users are welcomed. Updates about R-related conferences (including useR!), meetings (including SatRDays and RLadies), local user groups worldwide, new working groups or programs for R language certification and training are of interest. Other topics would certainly be considered, but it should be something of interest to the broader R community.

Accepted blog posts are at the sole discretion of the R Consortium.

Quality

We are looking for posts that teach and give value to our community. Blogs should include the meta-narrative that “R is a fast-growing language for statistical computing and graphics” and “the R Consortium supports the worldwide community of users, maintainers and developers of R software.”

Guest posts must be vendor neutral, though they may mention vendors involved in specific deployment or adoption paths, or their hosting of an in-person event or speaking at an event, or other indications of meaningful participation in the community. A post shouldn’t feel like an advertisement for your product, services or company. Your post must be your content, but can be published elsewhere on the Internet with permission from that website. All content should have a byline (preferably by a company engineer) and be published under a Creative Commons Attribution license, so you’re welcome to re-publish on your own blog.

The most interesting posts are those that teach or show how to do something in a way maybe others haven’t thought of. Good blog posts show hurdles that were encountered and explain how they were overcome (not that everything is rainbows and unicorns). When showing upstreaming of a patch fixing an issue for others, link back to the GitHub issue, so readers can follow along. We don’t avoid critical commentary or broad issues, but approach them with sensitivity, professionalism and tact in a way that is beneficial and positive for the community. It would be helpful to the R Consortium to discuss how to choose between different technologies and how to accommodate different legacy issues and cloud platforms.

Be interesting and inspiring!

Promotion

Your blog will be shared on R Consortium’s Twitter channel. Please feel free to retweet or share. Don’t forget to share your work on your own social channels and favorite news aggregator sites. Suggested sites: Twitter, LinkedIn, Reddit, Hacker News, DZone, TechBeacon. Plus industry sites like: https://www.r-bloggers.com/about/, rweekly.org and reddit.com/r/Rlanguage.

How to submit for consideration

Please submit the blog post or a brief summary and the topic of the post to R-marketing@lists.r-consortium.org (with the Subject line: “Proposed Blog: BLOG TITLE”) for consideration. The PR team will review your submission in a timely manner and provide the green light to draft the entire article or provide feedback on next steps. If you are submitting an article or presentation that already exists, please send it in its entirety with a note on the express permission from the owner of the content. Once your submission has been approved, it will be added to our blog publishing calendar and a publish date will be provided, so you may plan to promote accordingly through your personal and company social media channels. Blog posts should be no longer than 1,000 words and no shorter than 300 words. Diagrams, code examples or photos are strongly encouraged.

Aug 23

$50,000 in New Grants Approved

By R Consortium Blog

The R Consortium actively supports new projects to help R development both technically and organizationally. Improving R infrastructure and building for long-term stability are key goals of the R Consortium. These types of support cannot be matched by individual companies.

The newest three projects that have been awarded grants have been announced. Congratulations to R-global, R ecosystem for meta-research, and R Community Collaboratives. These ambitious projects cover two technical areas – focusing on geographical coordinates and evidence synthesis – as well as resources and support to facilitate on-the-ground organization of community R events.

In total, over $50,000 in new grants were approved.

More projects will be funded soon. Is your R project one of them? See below for more information on applying for funding.

R-global: analysing spatial data globally

Edzer Pebesma (edzer.pebesma@uni-muenster.de)

https://github.com/r-spatial/global/

Currently, a number of R spatial functions assume that coordinates are two-dimensional, taken from a “flat” space, and may or may not work for geographical (long/lat) coordinates, depicting points on a globe. This project will try to make such functions more robust and helpful for the case of geographical coordinates. It will reconsider the concept of a bounding box, and build an interface to the S2 geometry library (http://s2geometry.io/), which powers several modern systems that assume geographic coordinates.

Expanding the ‘metaverse’; an R ecosystem for meta-research

Martin Westgate (martin.westgate@anu.edu.au)

https://rmetaverse.github.io

Evidence synthesis is the process of identifying, collating and summarizing primary scientific research to provide reliable, transparent summaries such as systematic reviews and meta-analyses. Despite their importance for linking research with policy, however, evidence synthesis projects are often time-consuming, expensive, and difficult to update. Open and reproducible workflows would help address these problems, but these workflows are poorly supported by the current package environment, preventing access by new users and hindering uptake of the well-developed suite of statistical tools for meta-analysis in R. The metaverse project will integrate and expand tools to support evidence synthesis and meta-research in R; suggest flexible workflows to complete these projects in a straightforward and open manner; and provide a collector package allowing easy access to these developments for new and experienced users.

R Community Collaboratives

Angela Li (angela@angelalidata.com)

https://github.com/unconf-toolbox

Previously known as the Unconf Toolbox, R Community Collaboratives provide resources and support to facilitate on-the-ground organization of community events. These events engage individuals in the R community through in-person collaboration on open source projects. R Collabs emphasize learning and mentorship, encouraging R users to become R developers. They are inspired by the unconference organized by rOpenSci, but are designed to encourage local organizers to put on events for their own community. To do so, this project develops useful technical and logistical infrastructure for R Collab organizers. These include a website template, an organizing handbook, and a project dashboard for reporting out.

Join the Grant Program!

Strengthening the R community by improving infrastructure and building for long term stability is one of the primary focuses of the R Consortium. To achieve this, the R Consortium’s Infrastructure Steering Committee (ISC) has developed a grant program to fund development of projects that broadly help the R community.

Everyone is encouraged to apply, regardless of experience or expertise!

For a description of the types of projects that are being funded, examples of previous projects, and more, please see our information here: https://www.r-consortium.org/projects/call-for-proposals

Aug 12

R Community Explorer

By josephrickert Blog

by Ben Ubah, Claudia Vitolo and Rick Pack

Introduction

One of the most important qualities of the R Language is its thriving community. The R community has a reputation for being particularly friendly, welcoming and cohesive, which has enhanced its adoption and expansion. R user groups have accordingly flourished, especially in recent years.

In this year’s Google Summer of Code program, the proposal, “Data-Driven Exploration of the R Community” was selected. For this, the project’s developer, Ben Ubah, thanks the project’s mentors, Claudia Vitolo and Rick Pack for their contributions.

The primary motivation for this project was the need to have a consistent, data-driven, automated dashboard that provides a broad overview of global R User Groups and R-Ladies Groups.

The R Consortium and other stakeholders have invested in community expansion and sustenance initiatives like R-Ladies, the R User Group Support (RUGS) program, Event Sponsorship, RCDI-WG and SatRdays. These promote the learning and adoption of R in many under-represented regions. They have also significantly enhanced community engagement.

As the R community has progressed, there does not appear to have arisen a way to track its global user groups’ inception and activity. Is there a way to find out which regions require more representation? How do we recognize the efforts of organizers who put in a lot of effort to organize events that sustain user groups? How do we easily locate and recognize the most active groups and perhaps learn from their successes? Could we somehow ascertain the impact of the initiatives set by the R Consortium and others on a global scale? Could there be a unified platform dedicated to exploring the R community in an open-ended curiosity-driven fashion? These were the thoughts that inspired this project.

While this project is in its infancy, we have started seeing some encouraging results after the first coding phase of Google Summer of Code. We hope to share with you what we have achieved so far, and we welcome your feedback.

R-Ladies Groups

Since the R Consortium first funded the R-Ladies initiative, there has been a sporadic diffusion of their chapters and members globally. Perhaps partially as a result of having a consistent leadership composition and funding, R-Ladies groups are mostly managed on meetup.com, and share a common naming convention. This makes it quite easy to find them on meetup.com and explore their data from the meetup API.

In the first phase of Google Summer of Code, this project explored a way to track R-Ladies Groups globally from the meetup API, using the meetupr package developed by R-Ladies.

This exploration was intended to be completely data-driven and automated, but rendered via a static dashboard hosted on GitHub Pages. R-Ladies already have a Shiny dashboard, which only runs on a Shiny Server. Inspired by that dashboard, we developed one with some useful differences such as faster loading, additional aesthetic features such as thematic coloring, and additional tabular displays, charts and counts.

What Has Been Achieved

For the R-Ladies dashboard, the following were achieved:

We used the meetupr package to extract R-Ladies Chapters from Meetup.com
Improved the existing find_groups() and get_events() functions in meetupr to meet our requirements
Transformed the data from Meetup to required formats
Persisted the data on GitHub
Developed a static HTML dashboard interface based on open-source Bootstrap template.
Rendered the persisted data via the dashboard interface.
Automated the process
Deployed it via GitHub Pages

The Tools We Used

To accomplish the above, we used a mix of the tools listed below:

R, RStudio and the following packages: meetupr, curl, jsonlite and leafletR
JavaScript and the following libraries: jquery.js, d3.js, echarts.js, leaflet.js and lodash.js
Gentelella Admin Dashboard Bootstrap HTML template
Travis CI to build the project, execute R scripts and bash commands
Bash commands to call R scripts and commit modified files to GitHub

How We Achieved it

We used the meetupr package to retrieve R-Ladies Groups from meetup.com with an R script.
We further analyzed this data and computed several summaries out of it. We used the leafletR package to transform our data frame to GeoJSON. We used this GeoJSON file to create a leaflet map using leaflet.js. In this map, R-Ladies groups are separated into three groups with markers of three color categories: Active (purple), Inactive (dark-purple), and Unbegun (orange). Active groups have had an event in the past 180 days or have an upcoming event in the future. Inactive groups have not had an event in the past 180 days and do not have an upcoming event. Unbegun groups have not had an event in the past and none are planned for the future.
Persisted all data and our summaries in CSV / JSON files. After each Travis build, the data and our summaries get updated straight from the Meetup API.
We wrote bash commands to run our R scripts, and commit updated CSV / JSON files to GitHub after every Travis build.
We set up Travis cron jobs to build this project daily and update our data.
We then customized the Gentelella Admin Dashboard Bootstrap HTML template to our requirements.
Rendered our summaries via widgets on this dashboard. Used JavaScript libraries to perform other simpler summaries and produce maps, charts and tables.
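The Active / Inactive / Unbegun rule described above can be written as a small R function. This is a sketch under our own assumptions, not the code in the repository: the function name classify_status(), its arguments and the example dates are hypothetical, and each group is assumed to come with the date of its last past event and its next upcoming event (NA when there is none).

```r
# Hypothetical helper implementing the status rule from the post.
classify_status <- function(last_event, next_event,
                            today = Sys.Date(), window_days = 180) {
  # An upcoming event exists if a next event date is known and not in the past.
  has_upcoming <- !is.na(next_event) & next_event >= today
  # A recent event is one within the last `window_days` days.
  has_recent <- !is.na(last_event) &
    as.numeric(today - last_event) <= window_days
  # Unbegun: no past event at all and nothing planned.
  never_met <- is.na(last_event) & !has_upcoming
  ifelse(has_upcoming | has_recent, "Active",
         ifelse(never_met, "Unbegun", "Inactive"))
}

# Example with made-up dates, evaluated as of a fixed day.
today      <- as.Date("2019-08-12")
last_event <- as.Date(c("2019-07-01", "2018-10-01", NA, NA))
next_event <- as.Date(c(NA, NA, NA, "2019-09-01"))

status <- classify_status(last_event, next_event, today = today)
print(status)
```

Note that a group with no past events but one planned counts as Active here, following the wording "or have an upcoming event in the future".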

The Result

At the end we have an open-source dynamic dashboard for R-Ladies that is updated daily, but is built to be static and hosted via GitHub Pages. This could be seen as another approach to building information dashboards with R as a back-end technology, maintaining separation of business data-processing from data-presentation.

At the time of writing, there are 165 R-Ladies chapters composed of 50,000+ members, across 47 countries and 162 cities, with more than 1,580 past events and many upcoming. 71% of R-Ladies chapters are active, 13% are inactive, and 16% are unbegun. Unbegun groups have members but have not started organizing events yet. Our observation is that members are added to the R-Ladies community daily.

The pop-up markers in the leaflet map display important information about each R-Ladies chapter including a link to the group’s webpage, number of events, status, inactive months, and how to become an organizer for inactive/unbegun groups.

Feedback

We are just starting this project and hope to expand its reach far beyond its current state. We would love to hear from you if you have any ideas or find issues. Feel free to Follow / Star the project at its GitHub repo: https://github.com/benubah/r-community-explorer/

We have started working on general R user-groups and plan to report our progress soon with some lessons we have learned.

Jun 04

ISC Project Status

By josephrickert Blog

Refactoring and updating the SWIG R module
Richard Beare
This project is complete. See the project page for a summary.

stars: Scalable, spatiotemporal tidy arrays for R
Edzer Pebesma
The stars package is currently averaging approximately 9200 downloads per month, and user involvement through GitHub issues is rising. Look here for project status, and if you are going to useR! 2019 look for the stars tutorial.

An Earth data processing backend for testing and evaluating stars
Edzer Pebesma
Active development takes place on an AWS instance that has access to the multi-petabyte Sentinel-2 satellite image archive. See the project page for status.

histoRicalg — Preserving and Transferring Algorithmic Knowledge
John C. Nash
histoRicalg continues to try to preserve and transfer knowledge of older algorithms that are part of R and other computational software. Our work is available here.

Recent presentations on the project include:

Arpad Lukacs’ “Preserving Numerical Algorithms” at FOSDEM 19, Brussels, Feb 2, 2019. A video and slides are available.
John Nash presented “What and how is R so useful? … And annoying.” to the Sarasota Software Engineers User Group.

Some collaborations have resulted from the project:

Work with Matthew Fidler on merging two CRAN packages for L-BFGS optimization, and some preliminary work on a 40-year-old SVD method being used by NASA contractors to model Jupiter’s magnetic field.

RUGS
Joseph Rickert
So far this year the RUGS Program has awarded grants to 50 user groups and 15 small conferences totaling $31,000. We now have over 42,000 members participating in groups associated with the RUGS program.

Validation Hub (formerly called PSI application for collaboration to create online R package validation repository)
Lyn Taylor
The R Validation Hub team are now focused on designing a framework which could be used to assess package risk. The repository would host risk metrics, examples of tests, and validation documentation which together would form evidence of the quality of an R package. This documentation would be free to access and stored on a web based portal. The first version of the website went live in 2019 and we are also on GitHub. If you would like to be involved with the project please contact psi.aims.r.validation@gmail.com. Representatives of the R Validation Hub and PSI AIMS SIG will also be presenting at the 2019 PSI conference in London to give an update on this initiative and their work using R in the regulatory environment.

A unified platform for missing values methods and workflows
Imke Mayer
Our website, which houses articles, tutorials, data sets and a first set of workflows around a small set of popular R packages, went live in January. We will continue to provide more tutorials, assistance in choosing and using existing R packages, and data sets as time goes on. To help make this project as robust as possible, we encourage authors to submit their articles or works, and reviewers to review their placement on the platform/website, either by contacting us via our website or by submitting changes directly in our GitHub repository.

Our goal is to create a benchmark of existing methods for different kinds of data (both synthetic and real data), missing values mechanisms, tasks to be fulfilled, etc. An important aspect is that this work should allow other researchers and data scientists to re-use/copy our R code to compare their own method to a maximum of existing methods without having to re-implement the comparisons every time themselves.

Ongoing infrastructural development for R on Windows and MacOS
Jeroen Ooms
Rtools has been updated to GCC 8.3, and several new C/C++ libraries have been added to the rtools-packages repository. The new toolchain was presented at rstudio::conf 2019.

R Hub
Gábor Csárdi
The R Hub project is in maintenance mode. You can follow activities on GitHub.

Conference Management System for R Consortium Supported Conferences
Steph Locke
We have delivered a Hugo template for UseR! events and a template for creating new SVG logos for the UseR! events.

R Ladies
Claudia Vitolo
We are pleased to announce that R-Ladies is now a non-profit organisation incorporated in California (United States) with 501(c)(3) tax-exempt status (a blog post will announce this publicly in the next few days). Since we can now accept donations, we have decided to join CommunityBridge, a Linux Foundation platform that allows for a transparent and traceable management of incoming donations and outgoing expenses.

R-Ladies is continuing to grow. As of March 2019, there are 142 chapters on meetup.com with 38,000+ R-Ladies members (signed up on meetup.com), distributed across 45 countries on 6 continents (see our Shiny dashboard). We have also recently launched a mentorship programme to provide help and support from experienced organisers to less experienced ones. The migration to the meetup.com pro account is progressing smoothly: the last group will be migrated in 2019-Q2.

Developing Tools and Templates for Teaching Materials
François Michonneau
We developed the alpha version of the R package checker to validate links and images in static websites such as those built with R Markdown and Jekyll. In addition to ensuring that there are no broken links, this package will encourage website authors to use best practices in accessibility by adding metadata to links and images so they can be processed by screen readers and other assistive technologies. We are starting to use this package to check some of The Carpentries lessons.

Future Minimal API: Specification with Backend Conformance Test Suite
Henrik Bengtsson
The R package future.tests has made it possible to add support for relaying messages and warnings in the future framework (a frequently requested feature) and release it in a non-breaking manner.

Strengthening of R in support of spatial data infrastructures management : geometa and ows4R R packages
Emmanuel Blondel
Milestones M1 to M5 were successfully delivered. These are identified in GitHub tickets with labels for each milestone.

M1 targeted provision of an INSPIRE metadata validator embedded into the geometa package. This feature has been tested by data managers in France at the French observatory for universe science and at the French CNRS research units Dynafor and LETG.

M2 targeted the support of multi-lingual metadata encoding/decoding in geometa. All existing geometa classes subject to internationalization have been extended to support multiple languages. A battery of tests has been added in all class test files. In addition to geometa, this new feature required changes in the dependent packages ows4R and geonapi for the publication of multi-language metadata documents. Online documentation has been made available here.

M3 provides a generic metadata converter, which was planned for delivery in April 2019.

M4 provides an adapter for NetCDF-CF core metadata, and

M5 provides an adapter for EML core metadata.

Online documentation has been made available on GitHub.

These features are being used by the IRD Marbec Research Unit and DynaFor.

Work has started on milestones M6 and M7, which aim to complete the coverage of the ISO/OGC 19115-1 and 19115-2 standards in geometa by adding all missing classes (planned for completion in summer 2019).

May 31

R Consortium Announces Event Sponsorships for 2019

By R Consortium Blog

The R Consortium is committed to the R Community. We support R projects, meetups and events, via grants and sponsorships. Over the last four years, the R Consortium has given more than $125,000 in support of R events both large and small. We are excited to announce the events we are sponsoring in 2019.

This year we wanted to support a few events in large metro areas with active groups, a mix of geographies, and finally industries that are up and coming. A big thanks to all the amazing R event organizers who are all working to promote, improve, and grow the R language and community.

2019 Sponsorship funding goes to:

deRSE19, a conference for research software developers in Germany, is taking place June 4-5 at the Albert Einstein Science Park in Potsdam. #deRSE19 welcomes scientists, but also people who finance, operate, develop, or maintain research software and do not usually attend conferences.

Cascadia R Conference, now in its third year, takes place on June 8th and serves the Pacific Northwest region of Oregon, Washington, and Vancouver BC. This event is the place to come together in the Pacific Northwest to discuss how people are solving everyday problems with the R language. Stay tuned for speaker announcements and follow them on Twitter @cascadiarconf.

BioConductor is a conference focused on providing insights and tools required for the analysis and comprehension of high-throughput genomic data. The event takes place in New York City June 24-27. Speakers include Rob Patro, Jeffrey Leek, Elli Papaemmanuil, Simina Boca, Lieven Clement, Lihua Julie Zhu and Anshul Kundaje. Follow all the action on Twitter at #bioc2019.

UseR Toulouse: This global event, July 9-12, in Toulouse, is the largest meeting of the R user and developer community. The program consists of both invited and user-contributed presentations. Invited keynote lectures cover a broad spectrum of topics ranging from technical and R-related computing issues to general statistical topics of current interest. Keynote speakers include Joe Cheng, CTO, RStudio; Julien Cornebise, Director of Research at Element AI (UK); Bettina Grün, Professor, Johannes Kepler Universität Linz (Austria); and Julie Josse, Professor, École Polytechnique (France), among others. In addition, R Consortium’s own Joe Rickert will be giving a talk on high-profile meetup groups and the work they are delivering. Follow the event on Twitter @UseR2019_Conf.

EARL Conference: The Enterprise Applications of the R Language Conference (EARL) is a cross-sector conference focusing on the commercial use of the R programming language and takes place in London, on September 10-12. The conference is dedicated to the real-world usage of R with some of the world’s leading practitioners. Workshops for 2019 include Shiny for Production, Deep Learning with Keras for R, and Package Development in R among others. Check the website for updates on speakers or join the mailing list or follow them on Twitter @earlconf.

R/Medicine: The goal of the R/Medicine conference is to promote the use of the R programming environment and ecosystem in medical research and clinical practice. The event takes place September 12-14, 2019, in New Haven, CT. Topic areas for R/Medicine include clinical trial design, the analysis of clinical trial data, personalized medicine, the analysis of patient records, the analysis of genetic data, the visualization of medical data, and reproducible research. For more information follow them on Twitter @r_medicine.

satRday Chicago, a brand new event, is a community-led, regional conference to support collaboration, networking, and innovation within the R community. Tracks for the event ranged from academic and civic applications to industry applications, upskilling reproducibility, statistical methodology and more.

New York R Conference united R enthusiasts and data scientists to explore, share, and inspire ideas. This year’s event covered a wide variety of R language topics from Machine Learning in R to GIS, to tidyverse and beyond by some of the best-known data scientists in the community including Andrew Gelman, Emily Robinson, Namita Nandakumar, Max Kuhn, Wes McKinney, Soumya Kalra, David Madigan. For more about the community visit their website at nyhackr.org, follow them on Twitter at @nyhackr and @rstatsnyc.

While our funding efforts are complete for 2019, we encourage the community to continue to share feedback on Twitter @Rconsortium about R events you’d like to see supported in the future. Let us know what conferences are important to you so we can continue to improve our processes and support for the community.

May 28

Census Academy Launches with Two R Courses

By R Consortium Blog

by Ari Lamstein

Ari Lamstein is an independent consultant and organizer of the Census Working Group.

The US Census Bureau recently launched Census Academy, an online platform focused on training the public to learn about Census data. R enthusiasts will be excited to learn that Census Academy has launched with two R-specific courses:

Census Data in R by Jerzy Wieczorek
Mapping Census Bureau Data in R with Choroplethr by Ari Lamstein

If you have an interest in using R to analyze US Census Data, then, in addition to the above courses, you might also want to read A Guide to Working with Census Data in R. The Guide summarizes the most popular datasets that the Census Bureau publishes, as well as the most popular R packages for working with Census Data.

A Guide to Working with Census Data in R was created as part of the R Consortium’s Census Working Group, which you can learn more about here.

Mar 27

CII Best Practices – R Package Leaderboard

By Mark Hornick Blog

Since my last post on the Core Infrastructure Initiative CII Best Practices Badge for R Packages – responding to concerns, there have been many R language projects started – and completed – on the CII Best Practices site. In this post, we recognize the R projects that have achieved the CII Best Practices – Passing level, and note that several are well on their way to achieving silver level. In all, there are more than 50 CII projects related to R packages, with the popular ggplot2 package at the cusp of joining the group below with 97% completion as of this post.

Please congratulate these package owners for their achievement. If you’re a package developer, consider adding your package to the CII Best Practices ranks, and work your way through the levels of passing, silver, and gold.

Id	Name	Description	Owner
265	madrid.air	Parse air quality data published by http://datos.madrid.es/	Ramón Novoa
1882	DBI	A database interface (DBI) definition for communication between R and RDBMSs	Kirill Müller
2011	Delaporte	Provides the probability mass, distribution, quantile, random variate generation, and method of moments parameter estimation	Avraham Adler
2022	lamw	Calculates the real-valued branches of the Lambert-W function	Avraham Adler
2033	pade	Returns the numerator and denominator when given a vector of Taylor series coefficients of sufficient length as input	Avraham Adler
2041	fixedWidth	Save fixed width files	Jeston
2053	DataExplorer	Simplified Exploratory Data Analysis	Boxuan Cui
2054	PKNCA	Perform all noncompartmental analysis (NCA) calculations for pharmacokinetic (PK) data	Bill Denney
2055	BAS	Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling	Merlise Clyde
2083	MortalityLaws	Fit and compare the most popular human mortality laws	Marius D. Pascariu
2135	drake	A general-purpose workflow manager for data-driven tasks in R that rebuilds intermediate data objects when their dependencies	Will Landau
2136	httptest	A Test Environment for HTTP Requests in R	Neal Richardson
468	busdater	Business dates for R	Mick Mioduszewski
2527	jtools	Summarize and visualize regressions with other helpful tools	Jacob Long

Mar 25

Package licensing and enterprise use

By Mark Hornick Blog

For enterprise users of R, the licensing terms of open source software can occupy a significant share of Legal and Corporate Architecture departments’ time. In Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results, one survey topic touched on the licensing of R packages. In talking with various enterprise users of R, we heard a few suggestions about how the R community could make leveraging R packages easier within enterprises, while allowing Legal and Corporate Architecture departments to get more sleep.

Getting approvals to use packages

Some of you may be familiar with the process that enterprise users of R packages go through for approvals to use R in their products. Third-party software often needs to go through legal reviews, corporate architectural reviews, security reviews, and line-of-business approvals before it can find its way into use within an enterprise or into the products that the enterprise produces.

One area of concern is the use of GPL licenses, and the potential impact they may have on proprietary software. See Why GPL still gives enterprises the jitters for more discussion. While there are varying debates about the true impact of a certain license designation, for example, GPL-2 versus GPL-3, in many large organizations, a more conservative interpretation is often applied. (Comparing license options.)

Perhaps less well known is that it’s not just the license of the package in question that matters, but also the licenses of all of its dependent packages, recursively. For example, is a GPL-3 licensed package that uses a GPL-2 licensed package validly designated?

What can we do?

When we ask representatives of enterprises who are responsible for approving the use of third-party software what would make their work easier, a few suggestions for package authors and maintainers arise concerning licensing:

Packages should not depend on other packages that have incompatibly licensed materials
Use the most permissive license possible for your package, for example, LGPL, GPL-3 or GPL>=2, as opposed to just GPL-2
Minimize the number of dependent packages whenever possible, since each one requires its own approval process which affects adoption
Avoid using packages with more restrictive licensing terms than you intend for your package

We encourage package authors and maintainers to review their dependent packages and look for opportunities to address the suggestions above. Where possible, encourage dependent package authors and maintainers to adopt more permissive licenses as well. Where not possible, ask whether the functionality provided by the dependent package is essential.
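As a rough illustration of such a review, the sketch below walks a dependency graph recursively and tabulates the licenses it finds. The package names, licenses, and dependency lists are made-up toy data, and recursive_deps() is a hypothetical helper written for this example. For real packages, one possible starting point is tools::package_dependencies() with recursive = TRUE, combined with the License column returned by available.packages().

```r
# Toy metadata: hypothetical packages and licenses, not real CRAN entries.
pkgs <- data.frame(
  package = c("mypkg", "plotlib", "strutils", "numcore"),
  license = c("MIT", "GPL-3", "MIT", "GPL-2"),
  stringsAsFactors = FALSE
)

# Direct dependencies of each hypothetical package.
deps <- list(
  mypkg    = c("plotlib", "strutils"),
  plotlib  = c("numcore", "strutils"),
  strutils = character(0),
  numcore  = character(0)
)

# Hypothetical helper: collect every direct and indirect dependency once.
recursive_deps <- function(pkg, deps) {
  seen <- character(0)
  todo <- deps[[pkg]]
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      todo <- c(todo, deps[[current]])
    }
  }
  seen
}

all_deps <- recursive_deps("mypkg", deps)

# License of each dependency, and a count per license.
report <- pkgs[pkgs$package %in% all_deps, ]
print(report)
print(table(report$license))
```

A listing like this only surfaces which licenses are present in the dependency tree; whether a given combination is compatible remains a question for legal review.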

For enterprise users of open source software, ask your Legal departments to share their concerns with developers so more informed choices can be made in the future.

Mar 12

2019 Update One: R Consortium and ISC Announce the Newest Funded Projects for the R Community

By nmcgrory Announcement, Blog

We are excited to announce a wide and diverse group of new R Consortium funded projects. If you are interested in finding out more about these projects, connect with the project owners via links provided below each project.

New Projects include:

Strengthening of R in support of spatial data infrastructures management

Project Owner: Emmanuel Blondel

The project aims to strengthen the role of R in support of Spatial Data Infrastructures (SDI) management, through major enhancements of the geometa R package, which offers tools for reading and writing ISO/OGC geographic metadata, including ISO 19115, 19110, and 19119 through the ISO 19139 XML format. This also extends to the Geography Markup Language (GML – ISO 19136) used for describing geographic data. The use of geometa in combination with publication tools such as ows4R and geosapi fosters the use of R software to ease the management and publication of metadata documents and related datasets in web catalogues, and makes it possible to move forward with a real R implementation of spatial data management plans based on FAIR (Findable, Accessible, Interoperable and Reusable) principles.

The work plan includes several activities such as working on the completeness of the ISO 19115 (ISO 19115-1 and 19115-2) data model in geometa, functions to read/write multilingual metadata documents, and an increased metadata validation capability with a validator targeting the EU INSPIRE directive. Finally, functions will be made available to convert between geometa ISO/OGC metadata objects and other known metadata objects such as NetCDF-CF and EML (Ecological Metadata Language) to foster metadata interoperability. By providing these R tools, we seek to facilitate the work of spatial data (GIS) managers, but also data scientists, whatever the thematic domain, whose daily tasks consist of handling data, describing them with metadata and publishing datasets.

Learn more about the project here.

Catalyzing R-hub Adoption Through R Package Developer Advocacy

Project Owner: Maëlle Salmon

After the continuing technical progress of R-hub over the last two years, this project aims at catalyzing its adoption by R package developers of all levels through developer advocacy. Indeed, R-hub is currently a successful and very valuable project, but it is not documented thoroughly, which hinders its wider adoption by package developers. This project will address this concern through three main actions: improving R-hub documentation, making R-hub better known in the community, and making the R-hub web site more attractive to, and easier to use by, R developers and users via the ingestion of METACRAN services and the creation of an R-hub blog.

Learn more about the project here.

Licensing R – Guidelines and Tools

Project Owner: Colin Fay

Licensing is a vital part of Open Source. It provides guidelines for interacting with a program, and for making code accessible and reusable (or not). It lets authors share their code as open source on their own terms, governing how it will be used and reused. Licensing is also challenging and complex: there are a lot of available licenses, and the choice is influenced by how you import and interact with elements from other packages and/or programs.

With this project, we propose to explore and document the current state of open source licenses in R, and to decipher compatibility and incompatibility elements inside these licenses, to help developers choose the best-suited license for their project.

Learn more about the project here and here.

Data-Driven Discovery and Tracking of R Consortium Activities

Project Owner: Benaiah Chibuokem Ubah

This project proposes an infrastructure that provides a data-driven approach to render the yearly activities of the R Consortium, by deploying web pages for discovering and tracking ISC Funded Projects, RUGS and Marketing activities. These pages are planned to appear like dashboards summarizing activities in interactive tables and charts, presenting several views, trends and insights into what R Consortium has achieved over time. The project hopes that presenting these achievements in a data-driven manner to the R community, the data science community and prospective R Consortium members will promote greater transparency, productivity and community inclusiveness around R Consortium activities.

Learn more about the project here.

serveRless

Project Owners: Christoph Bodner, Florian Schwendinger, Thomas Laber

R is a great language for rapid prototyping and experimentation, but putting an R model in production is still more complex and time-consuming than it needs to be. With the growing popularity of serverless computing frameworks such as AWS Lambda and Azure Functions, we see a great opportunity to allow R developers to more easily deploy their code into production. We want to create an R package that provides a common API for different Function-as-a-Service providers such as Azure Functions and AWS Lambda. We will also look into integrating Docker-as-a-Service (e.g. Azure Container Services) if appropriate. Our main goal is to build a user-friendly, cloud-agnostic wrapper that can be extended to include additional cloud providers later on. We want to build on the work already done for deploying R functions to AWS Lambda by Philipp Schirmer and on the work already done by Neal Fultz and Gergely Daróczi on a gRPC client/server for R, which is necessary for Azure Functions.

If you like our idea and want to help us, feel free to reach out to us on GitHub here.

Next-Generation Text Layout in Grid and ggplot2

Project Owner: Claus Wilke

Text is a key component of any data visualization. We need to label axes and legends, we need to annotate or highlight specific data points, and we need to provide plot titles and captions. The R graphics package ggplot2 provides numerous features to customize the labeling and annotation of plots, but ultimately it is limited by the current capabilities of the underlying graphics library it uses, grid. Grid can draw simple text strings or mathematical expressions (via plotmath) in different colors, sizes, and fonts. However, it lacks functionality for changing formatting within a string (e.g., draw a single word in italics or in a different color), and it also cannot draw text boxes, where the text is enclosed in a box with defined margins, padding, or background color. This project will support the development of a new package, gridtext, that will alleviate these text formatting limitations. The project will also support efforts to make these new capabilities available from within ggplot2.

Learn more about the project here.

Symbolic Formulae for Linear Mixed Models

Project Owner: Emi Tanaka

Symbolic model formulae define the structural component of a statistical model in easier and often more accessible terms for practitioners. An early instance of symbolic model formulae for linear models was applied in Genstat, with further generalization by Wilkinson and Rogers (1973). Chambers and Hastie (1993) describe the symbolic model formulae implementation for linear models in the S language, which remains much the same in the R language (Venables et al. 2018).

Linear mixed models (LMMs) are widely used across many disciplines (e.g. ecology, psychology, agriculture, finance, etc.) due to their flexibility to model complex, correlated structures in the data. While the symbolic formulae of linear models generally have a consistent representation and evaluation rule as implemented in stats::formula, this is not the case for LMMs. The inconsistency of symbolic formulae arises mainly in the representation of random effects, with the additional need to specify the variance-covariance structure of the random effects as well as the structure of the associated model matrix that governs how the random effects are mapped to (groups of) the observational units. The differences give rise to confusion over equivalent model specifications across different R packages.
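To make the inconsistency concrete, here is a minimal sketch of how one model with a random intercept and slope is written for two widely used packages, lme4 and nlme. It only builds the formula objects in base R. The variable names (Reaction, Days, Subject) are taken from the sleepstudy data shipped with lme4, and the commented fitting calls assume both packages are installed.

```r
# lme4 style: random effects are embedded in a single formula.
f_lme4 <- Reaction ~ Days + (Days | Subject)

# nlme style: fixed and random parts are given as separate formulae.
f_nlme_fixed  <- Reaction ~ Days
f_nlme_random <- ~ Days | Subject

# Both specifications refer to the same set of variables.
vars_lme4 <- all.vars(f_lme4)
vars_nlme <- union(all.vars(f_nlme_fixed), all.vars(f_nlme_random))
print(vars_lme4)
print(vars_nlme)

# Fitting (not run here; requires lme4 and nlme to be installed):
# lme4::lmer(f_lme4, data = lme4::sleepstudy)
# nlme::lme(f_nlme_fixed, random = f_nlme_random, data = lme4::sleepstudy)
```

The two specifications describe the same fixed and random structure, yet a user moving between the packages has to learn two different ways of stating the random effects.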

The lack of consistency in symbolic formulae and model representation across mixed model software motivates the need to formulate unified symbolic model formulae for LMMs with: (1) extension of the evaluation rules described in Wilkinson and Rogers (1973); and (2) ease of comprehension of the specified model for the user. These symbolic model formulae can be a basis for creating a common API to mixed models with wrappers to popular mixed model R packages, thereby achieving a similar feat to the parsnip R package (Kuhn 2018), which implements a tidy unified interface to many predictive modeling functions (e.g. random forest, logistic regression, survival models, etc.).

We would like to hear about your experiences with fitting linear mixed models in R! Please fill out this survey to help us understand your problems.

Learn more about the project here.

Editorial Assistance for the R Journal

Project Owner: Di Cook

This project supports the operation of the R Journal. There are two aspects: the first is to fund an editorial assistant to send reminders about reviews and to assist with typesetting and copyediting issues; the second is to explore updating the technical operations of the journal production.

Learn more about the project here.