R notes
Include an if/else within a pipeline
If we are in the middle of a data pipeline and need a conditional statement, we can wrap an if/else in braces and refer to the piped data with the dot:
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
status <- "most liked"
iris %>%
  clean_names() %>%
  {
    if (status == "most liked") {
      filter(., species == "setosa")
    } else {
      filter(., species %in% c("versicolor", "virginica"))
    }
  } %>%
  group_by(species) %>%
  tally()
# A tibble: 1 × 2
species n
<fct> <int>
1 setosa 50
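An alternative that avoids the braces is to resolve the condition before the pipeline starts. The sketch below is not the approach used above; it assumes the two branches differ only in which species to keep, and `wanted_species` is a name introduced here for illustration. It uses the original `Species` column name, so janitor is not needed.

```r
library(dplyr)

status <- "most liked"

# Decide which species to keep before the pipeline starts
# (`wanted_species` is a hypothetical name, not part of the original notes)
wanted_species <- if (status == "most liked") {
  "setosa"
} else {
  c("versicolor", "virginica")
}

result <- iris %>%
  filter(Species %in% wanted_species) %>%
  count(Species)

result
```

This keeps the pipeline linear, but it only works when the condition changes a value rather than the whole step; for branches that apply different verbs, the braces approach is still the more general one.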
Build package from the terminal
Remove and reinstall a package from the command line, run from the package's root directory:
R CMD REMOVE docmaker
R CMD INSTALL .
Code functions challenges
Quick notes on short pieces of R code and functions that I rarely use and tend to forget.
Replace a specific value with NA in all columns
A data frame has -9999 values in many columns that need to be replaced with NA:
# libraries
library(dplyr)
# Test data frame
test <- tribble(~a, ~b, ~c,
"a", 2, -9999,
"b", 3, 5,
"c", -9999, 6,
"d", -9999, -9999)
# Solution
test %>%
mutate_all(~replace(., . == -9999, NA))
# A tibble: 4 × 3
a b c
<chr> <dbl> <dbl>
1 a 2 NA
2 b 3 5
3 c NA 6
4 d NA NA
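In recent dplyr versions `mutate_all()` is superseded by `across()`. As a minimal sketch, assuming dplyr 1.0 or later, the same replacement can be restricted to the numeric columns with `na_if()`; the name `cleaned` is introduced here for illustration.

```r
library(dplyr)

# Same test data frame as above
test <- tribble(
  ~a,  ~b,    ~c,
  "a", 2,     -9999,
  "b", 3,     5,
  "c", -9999, 6,
  "d", -9999, -9999
)

# Replace -9999 with NA, touching only the numeric columns
cleaned <- test %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -9999)))

cleaned
```

Limiting the operation to numeric columns avoids comparing the character column `a` against a number.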
A function to play with Google Drive
A mock-up of a documented function that connects to the Google Drive API from R, lists the files in a Drive folder, compares them with the files in a local folder, and downloads only those that are missing locally.
It could be part of a workflow in an organization that keeps its data files in Google Drive (a research lab, for example).
#' @import dplyr
NULL
#' @title Download data from Google Drive
#'
#' @author Ronny Alexander Hernández Mora
#'
#' @description This function checks the files that exist in a local folder
#' and compares them with the files in a specific Google Drive folder.
#' It downloads the files that are missing locally.
#'
#' @param drive_path The folder in Google Drive containing the files
#' @param local_directory The local folder in which we want to download the
#' files from Google Drive.
#'
#' @examples
#' \dontrun{
#' get_drive_data(drive_path = "data_workflows/data",
#'                local_directory = "datos")
#' }
#'
get_drive_data <- function(drive_path, local_directory) {
  options(gargle_oauth_email = "my_email@gmail.com")
  # Check local files (file names only, without the directory)
  archivos_existentes <- fs::path_file(fs::dir_ls(local_directory))
  # Check the files available in Drive
  archivos_drive <- googledrive::drive_ls(path = drive_path)
  # Get the names of the missing files,
  # assuming that we always have more files in Drive
  archivos_faltantes <- archivos_drive %>%
    filter(!name %in% archivos_existentes)
  # Loop to download the files that are in Drive but not in the local folder.
  # Each iteration uses the id and name of its own row.
  for (i in seq_len(nrow(archivos_faltantes))) {
    googledrive::drive_download(
      file = googledrive::as_id(archivos_faltantes$id[i]),
      path = file.path(local_directory, archivos_faltantes$name[i]),
      overwrite = TRUE
    )
  }
}
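The comparison step at the heart of the function can be tried without touching the Drive API. The following is a minimal sketch with made-up file names (`drive_files` and `local_files` are hypothetical vectors standing in for the Drive listing and the local listing):

```r
# Hypothetical file names, standing in for the Drive and local listings
drive_files <- c("principe_2019.csv", "principe_2020.csv", "principe_2021.csv")
local_files <- c("principe_2019.csv")

# Files present in Drive but not in the local folder
missing_files <- setdiff(drive_files, local_files)

missing_files
```

`setdiff()` on the names gives the same result as the `filter(!name %in% ...)` step, which is useful when only the names are needed and not the Drive ids.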
Once the function is loaded in the global environment, and if all the data files have the same variables, we can combine it with map_dfr() to read everything into a single data frame:
library(fs)
library(purrr)
library(readr)
# Check drive and download data ------------------------------
get_drive_data(drive_path = "data_workflows/data",
               local_directory = "datos")
# Read data --------------------------------------------------
# Create object with files of interest
files <- dir_ls(path = "datos", glob = "datos/principe_*")
principe <- files %>%
  map_dfr(read_csv, .id = "file_id")
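To try the reading step without Drive access, here is a self-contained sketch that writes two small CSV files to a temporary folder and reads them back into one data frame. It assumes readr 2.0 or later, where `read_csv()` accepts a vector of files and an `id` argument; the folder, file names and contents are made up for illustration.

```r
library(readr)

# Temporary folder with two hypothetical files that share the same columns
tmp <- tempfile("principe_demo_")
dir.create(tmp)
write_csv(data.frame(x = 1:2), file.path(tmp, "principe_a.csv"))
write_csv(data.frame(x = 3:4), file.path(tmp, "principe_b.csv"))

# Files of interest
files <- list.files(tmp, pattern = "^principe_.*\\.csv$", full.names = TRUE)

# Read all files at once; `file_id` records the path each row came from
combined <- read_csv(files, id = "file_id", show_col_types = FALSE)

combined
```

This is an alternative to `map_dfr()`, not a replacement for it: `map_dfr()` remains handy when each file needs its own reading function or extra processing.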
Create questions with the yesno package
If we have functions that require confirmation from the user, we can use the yesno package to ask a question and define the answer options:
library(yesno)
publicar <- function() {
  respuesta <- yesno::yesno("Do you want to publish the grades?",
                            yes = "I am sure I want to publish them",
                            no = "No, it is a mistake",
                            no = "NOOO, I do not want to publish anything")
  if (respuesta == TRUE) {
    print("The emails have been sent")
  } else {
    print("Nothing was sent")
  }
}
# publicar()
stringr tips
How many hours have I spent figuring out how to write a regex? A lot!
So here are some quick notes on things that I have solved before and keep forgetting.
How to extract numbers from a string
Sometimes, I need to extract just the numbers that I can find in a string. To achieve this, I can use the following function:
library(stringr)
string_with_numbers <- c("01 uno", "02 dos")
str_extract(string_with_numbers , "\\d+")
[1] "01" "02"
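`str_extract()` returns only the first match in each string, and it returns it as text. As a small sketch (the third element of the example vector is added here for illustration), `str_extract_all()` collects every number, and `as.integer()` converts the result when numbers are needed:

```r
library(stringr)

# The third element is a made-up case with two numbers in one string
string_with_numbers <- c("01 uno", "02 dos", "3 y 45")

# First number in each string, converted from text to integer
first_numbers <- as.integer(str_extract(string_with_numbers, "\\d+"))

# Every number in each string, returned as a list of character vectors
all_numbers <- str_extract_all(string_with_numbers, "\\d+")

first_numbers
all_numbers
```

Note that the conversion drops leading zeros, so keep the text version when "01" and "1" must stay distinct.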
Extract string between brackets
library(stringr)
library(tibble)
library(dplyr)
library(tidyr)
## The data frame with the column that I need
check <- tribble(
~geo,
"{\"geodesic\":false,\"type\":\"Point\",\"coordinates\":[-79.94739867829001,44.3105986723403]}" ,
"{\"geodesic\":false,\"type\":\"Point\",\"coordinates\":[-79.94714795170373,44.310596361431216]}",
"{\"geodesic\":false,\"type\":\"Point\",\"coordinates\":[-79.9468972251475,44.31059404997191]}" ,
"{\"geodesic\":false,\"type\":\"Point\",\"coordinates\":[-79.9466464986213,44.31059173796237]}" ,
"{\"geodesic\":false,\"type\":\"Point\",\"coordinates\":[-79.94639577212517,44.3105894254026]}" ,
"{\"geodesic\":false,\"type\":\"Point\",\"coordinates\":[-79.9461450456591,44.310587112292595]}"
)
## Solution 1 (didn't work: the pattern matches only a single bracket character)
check %>%
mutate(test = str_extract(geo, "\\[|\\]") ) %>%
select(test)
# A tibble: 6 × 1
test
<chr>
1 [
2 [
3 [
4 [
5 [
6 [
# Solution 2 (This one works!)
check %>%
mutate(test = str_extract(geo, "\\[(.*?)\\]") ) %>%
select(test) %>%
separate(col = "test", into = c("lat", "long"), sep = ",") %>%
mutate(lat = str_extract(lat, "-?[0-9.]+"),
long = str_extract(long, "-?[0-9.]+"))
# A tibble: 6 × 2
lat long
<chr> <chr>
1 -79.94739867829001 44.3105986723403
2 -79.94714795170373 44.310596361431216
3 -79.9468972251475 44.31059404997191
4 -79.9466464986213 44.31059173796237
5 -79.94639577212517 44.3105894254026
6 -79.9461450456591 44.310587112292595
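The extract, separate and clean-up steps can also be collapsed into one call with capture groups. This is a sketch of an alternative, not the solution used above: it assumes every string holds exactly one `[number,number]` pair, in the GeoJSON order of longitude first, and it uses two shortened example strings.

```r
library(stringr)

# Two shortened strings with the same structure as the `geo` column
geo <- c(
  "{\"type\":\"Point\",\"coordinates\":[-79.94739867829001,44.3105986723403]}",
  "{\"type\":\"Point\",\"coordinates\":[-79.94714795170373,44.310596361431216]}"
)

# Column 1 of the result is the full match; columns 2 and 3 are the groups
m <- str_match(geo, "\\[(-?[0-9.]+),(-?[0-9.]+)\\]")

coords <- data.frame(
  long = as.numeric(m[, 2]),
  lat  = as.numeric(m[, 3])
)

coords
```

Converting with `as.numeric()` gives numeric columns directly, whereas the `separate()` approach leaves the coordinates as text.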