Though Python is usually preferred over R for system administration tasks, R is quite useful in this regard as well. In this post, we’re going to talk about using R to create, delete, move, and get information about files.
Before working with files, it’s usually a good idea to know which directory you’re working in. The working directory is the folder R uses for any file you create or refer to without spelling out its full path. In R, you can find it with the getwd function. To change it, use the aptly named setwd function.
# get current working directory
getwd()

# set working directory
setwd("C:/Users")
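Because setwd invisibly returns the previous working directory, you can switch directories temporarily and then switch back. The sketch below wraps that pattern in a hypothetical helper, with_dir, and uses R’s session temporary folder (tempdir()) as an example target; neither the helper nor the file name comes from the original post.

```r
# Hypothetical helper: evaluate `code` with `dir` as the working directory,
# then restore the original directory (even if `code` throws an error)
with_dir <- function(dir, code) {
  old_wd <- setwd(dir)     # setwd() invisibly returns the previous directory
  on.exit(setwd(old_wd))   # restore it when the function exits
  force(code)              # `code` is evaluated lazily, i.e. here, inside `dir`
}

start <- getwd()
with_dir(tempdir(), file.create("scratch.txt"))   # file lands in tempdir()
identical(getwd(), start)                          # back where we started
```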
A new folder, or directory, can be created in R using the dir.create function, like this:
dir.create("new_folder")
Just replace “new_folder” with the name you want. If you don’t write out the full path of the new directory, it will be created in the current working directory, i.e. the value of getwd().
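If the parent folders of the new directory don’t exist yet, dir.create fails with a warning unless you pass recursive = TRUE. A small sketch, assuming a made-up nested path inside tempdir():

```r
# Example nested path inside R's session temp directory (names are arbitrary)
nested <- file.path(tempdir(), "reports", "2024", "q1")

# recursive = TRUE creates every missing level; showWarnings = FALSE
# silences the warning you'd otherwise get if the folder already exists
dir.create(nested, recursive = TRUE, showWarnings = FALSE)

dir.exists(nested)
```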
Similarly, creating a blank file can be done with file.create.
file.create("new_text_file.txt")
file.create("new_word_file.docx")
file.create("new_csv_file.csv")
This makes it easy to create many files quickly. For example, the one-liner below creates 100 empty text files:
sapply(paste0("file", 1:100, ".txt"), file.create)
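file.create is itself vectorized, so the sapply wrapper isn’t strictly necessary: passing the whole vector of names creates every file and returns one TRUE/FALSE per file. A sketch that writes into a hypothetical sub-folder of tempdir() so the working directory isn’t cluttered:

```r
# Work in a fresh sub-folder of the session temp directory
target <- file.path(tempdir(), "many_files")
dir.create(target, showWarnings = FALSE)

# Build the 100 paths and create them all in one vectorized call
paths <- file.path(target, paste0("file", 1:100, ".txt"))
created <- file.create(paths)

all(created)                 # TRUE if every file was created
length(list.files(target))   # number of files now in the folder
```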
Copying a file can be done using file.copy.
file.copy("source_file.txt", "destination_folder")
With file.copy, the first argument is the file to be copied, and the second is the destination folder (or a new file path) to copy it to. The function returns TRUE if the file is copied successfully and FALSE otherwise. By default, file.copy won’t overwrite a file that already exists at the destination; pass overwrite = TRUE to allow that.
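The overwrite behavior is easy to trip over, so here is a short sketch that copies the same file twice. The file and folder names mirror the example above but live in tempdir(), an assumption made so the code is self-contained.

```r
# Set up a source file and a destination folder in the temp directory
src  <- file.path(tempdir(), "source_file.txt")
dest <- file.path(tempdir(), "destination_folder")
writeLines("version 1", src)
dir.create(dest, showWarnings = FALSE)

first_copy <- file.copy(src, dest)   # copies to dest/source_file.txt

writeLines("version 2", src)
second_copy <- file.copy(src, dest)                    # refused: file exists
third_copy  <- file.copy(src, dest, overwrite = TRUE)  # replaces the old copy

c(first_copy, second_copy, third_copy)
readLines(file.path(dest, "source_file.txt"))
```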
The simplest way of listing all the files in a directory with R is by calling list.files.
# list all files in current directory
list.files()

# list all files in another directory
list.files("C:/path/to/somewhere/else")
By default, list.files only lists the files and folders directly within the directory; it doesn’t look inside any sub-folders unless you tell it to, like this:
list.files("C:/path/to/somewhere/else", recursive = TRUE)
Note that adding recursive = TRUE can make the function run considerably longer if the directory contains many sub-folders and files (e.g. running list.files("C:/", recursive = TRUE)).
Also, by default list.files doesn’t return the full paths of the files. Set the full.names parameter to TRUE to get them.
list.files("C:/path/to/somewhere/else", full.names = TRUE, recursive = TRUE)
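To see how recursive and full.names change the output, the sketch below builds a tiny, made-up folder tree in tempdir() and lists it three ways. Note that with recursive = TRUE, the sub-folder itself is no longer listed; only the files inside it are.

```r
# Build a small tree: listing_demo/a.txt and listing_demo/sub/b.txt
root <- file.path(tempdir(), "listing_demo")
dir.create(file.path(root, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "a.txt"), file.path(root, "sub", "b.txt"))

list.files(root)                                        # "a.txt" "sub"
list.files(root, recursive = TRUE)                      # "a.txt" "sub/b.txt"
list.files(root, recursive = TRUE, full.names = TRUE)   # paths prefixed with root
```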
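list.files can also filter the files it lists through its pattern parameter, which takes a regular expression. For instance, the R code below lists all of the CSV files in a directory (similar to “ls | grep '\.csv$'” on Linux). The pattern "\\.csv$" matches only names that end in .csv; a bare ".csv" would also match names like "my_csv_notes.txt", because in a regular expression "." matches any character.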
# list all CSV files non-recursively
list.files(pattern = "\\.csv$")

# list all CSV files recursively through each sub-folder
list.files(pattern = "\\.csv$", recursive = TRUE)
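Because pattern is a regular expression and can match anywhere in a name, a loose pattern picks up more than intended. This sketch creates three hypothetical file names in tempdir() and compares the two patterns.

```r
demo <- file.path(tempdir(), "pattern_demo")
dir.create(demo, showWarnings = FALSE)
file.create(file.path(demo, c("data.csv", "notes_csv.txt", "old.csv.bak")))

# "." matches any character and there's no end anchor, so all three match
list.files(demo, pattern = ".csv")

# escaped dot plus "$" anchor: only names ending in ".csv"
list.files(demo, pattern = "\\.csv$")

# add ignore.case = TRUE to also catch names like "DATA.CSV"
list.files(demo, pattern = "\\.csv$", ignore.case = TRUE)
```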
The above logic is really useful if you want to read in all of the CSV files in a given directory. For instance, suppose you have a folder of CSV files that all share the same layout, and you want to combine them into a single data frame. You can do this in a couple of lines of code:
# read in all the CSV files in the working directory
all_data_frames <- lapply(list.files(pattern = "\\.csv$"), read.csv)

# stack all data frames together (they must share the same columns)
single_data_frame <- do.call(rbind, all_data_frames)
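Here is a self-contained version of the same idea that you can run anywhere: it writes two small example CSV files (made-up data) into tempdir(), then reads and stacks them. Using full.names = TRUE means read.csv finds the files regardless of the current working directory.

```r
# Write two small CSV files with identical columns into a temp folder
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(csv_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(csv_dir, "part2.csv"), row.names = FALSE)

# List the CSVs with full paths, read each one, then stack them
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
combined  <- do.call(rbind, lapply(csv_files, read.csv))

combined
```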
Another way of getting the files in a directory is the fileSnapshot function from the utils package, which also gives you additional details about each file.
# get file snapshot of current directory
snapshot <- fileSnapshot()

# or file snapshot of another directory
snapshot <- fileSnapshot("C:/some/other/directory")
fileSnapshot returns a list, which here we’ve stored as “snapshot”. Its most useful component is “info”:
snapshot$info
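Here, snapshot$info is a data frame with one row per file in the input folder (the row names are the file names). Its columns include size, isdir, mode, mtime, ctime, and atime, plus platform-specific columns such as uid, gid, uname, and grname on Unix-like systems, or exe on Windows.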
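A snapshot is also handy for detecting what changed in a folder. The sketch below uses changedFiles, also from utils, which compares an earlier snapshot with the folder’s current state; the folder and file names are made up, and this comparison step goes beyond what the post itself covers.

```r
snap_dir <- file.path(tempdir(), "snapshot_demo")
dir.create(snap_dir, showWarnings = FALSE)
writeLines("hello", file.path(snap_dir, "a.txt"))

# Take a snapshot; info has one row per file, named by file
before <- fileSnapshot(snap_dir)
before$info[, c("size", "isdir", "mtime")]

# Add a file, then compare the old snapshot with the folder as it is now
writeLines("new", file.path(snap_dir, "b.txt"))
delta <- changedFiles(before)
delta$added   # files that appeared since the snapshot
```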
file.info is similar, except that it returns this information only for the files you pass to it, with one row per file. For instance, the code below returns the fields above (size, isdir, mode, mtime, etc.) for the file “some_file.csv”:
file.info("some_file.csv")
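Since file.info accepts several paths at once, you can inspect a file and a folder side by side. A short sketch, assuming an example folder and file created in tempdir():

```r
info_dir <- file.path(tempdir(), "info_demo")
dir.create(info_dir, showWarnings = FALSE)
writeLines("some text", file.path(info_dir, "notes.txt"))

# One row per path, with the paths as row names
info <- file.info(c(file.path(info_dir, "notes.txt"), info_dir))
info[, c("size", "isdir", "mtime")]
```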
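To get just the ctime time stamp of a file, take the ctime column of file.info. Note that ctime is the creation time on Windows, but the last status change time on Unix-like systems:
file.info("C:/path/to/file/some_file.txt")$ctime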
Getting the last modified time stamp is similar to above, except we use file.mtime:
file.mtime("C:/path/to/file/some_file.txt")
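A common use of file.mtime is picking the most recently modified file in a folder, such as the latest export. In this sketch the file names are hypothetical, and Sys.setFileTime is used only to give the demo files distinct modification times.

```r
mt_dir <- file.path(tempdir(), "mtime_demo")
dir.create(mt_dir, showWarnings = FALSE)
files <- file.path(mt_dir, c("export_1.csv", "export_2.csv", "export_3.csv"))
file.create(files)

# Back-date each demo file by a different number of seconds
times <- Sys.time() - c(300, 100, 200)
for (i in seq_along(files)) Sys.setFileTime(files[i], times[i])

# file.mtime() is vectorized; which.max() finds the latest timestamp
newest <- files[which.max(file.mtime(files))]
basename(newest)
```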
Files can be deleted in R using unlink. Deleting a single file is as simple as passing the file’s name to this function; file.remove works as well.
To delete a directory with unlink, you have to add the parameter recursive = TRUE.
# delete a file
unlink("some_file.csv")

# delete another file
file.remove("some_other_file.csv")

# delete a directory -- must add recursive = TRUE
unlink("some_directory", recursive = TRUE)
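The two deletion functions also differ in how they handle edge cases. The sketch below, using example paths in tempdir(), shows that unlink leaves even an empty directory alone without recursive = TRUE, and that file.remove warns about a missing file while unlink stays silent.

```r
del_dir <- file.path(tempdir(), "delete_demo")
dir.create(del_dir, showWarnings = FALSE)

# Without recursive = TRUE, unlink() does not delete directories, even empty ones
unlink(del_dir)
still_there <- dir.exists(del_dir)

# With recursive = TRUE, the directory and everything in it is removed
unlink(del_dir, recursive = TRUE)
dir.exists(del_dir)

# Missing files: file.remove() returns FALSE and warns; unlink() is silent
missing_file <- file.path(tempdir(), "no_such_file.txt")
removed <- file.remove(missing_file)
unlink(missing_file)
```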
With unlink, we can also delete the 100 text files we created above with file.create in just one line of code:
sapply(paste0("file", 1:100, ".txt"), unlink)
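Like file.create, unlink is vectorized, and it also expands wildcards such as * itself. This self-contained sketch recreates the 100 files in a hypothetical temp folder, then deletes half by passing a vector of paths and the rest with a wildcard.

```r
target <- file.path(tempdir(), "cleanup_demo")
dir.create(target, showWarnings = FALSE)
paths <- file.path(target, paste0("file", 1:100, ".txt"))
invisible(file.create(paths))

unlink(paths[1:50])                      # a vector of paths in one call
unlink(file.path(target, "file*.txt"))   # wildcard expanded by unlink()

length(list.files(target))
```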
You can check if a file exists, using the file.exists function.
# check if a file exists
file.exists("C:/path/to/file/some_file.txt")

# check if a folder exists
file.exists("C:/path/to/file/some_folder")

# alternatively, check if a folder exists with dir.exists
dir.exists("C:/path/to/file/some_folder")
file.exists returns TRUE for any existing path, whether it’s a file or a directory, whereas dir.exists returns TRUE if and only if the path exists and is a directory.
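The difference is easiest to see on a file and a folder side by side. This sketch uses example paths in tempdir() and adds file_test("-f", ...) from utils, which checks for the opposite case: the path exists and is not a directory.

```r
ex_dir  <- file.path(tempdir(), "exists_demo")
ex_file <- file.path(ex_dir, "some_file.txt")
dir.create(ex_dir, showWarnings = FALSE)
file.create(ex_file)

file.exists(c(ex_file, ex_dir))       # both paths exist
dir.exists(c(ex_file, ex_dir))        # only the folder is a directory
file_test("-f", c(ex_file, ex_dir))   # only the file is a regular file
```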
Getting the base name of a file can be done using the basename function:
basename("C:/path/to/file.txt")
The above code returns “file.txt”.
Similarly, dirname returns the directory of a file:
dirname("C:/path/to/file.txt")
This returns “C:/path/to”.
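basename and dirname are complementary: file.path can glue their results back into the original path, and both functions accept vectors. A quick sketch using the example path from above:

```r
path <- "C:/path/to/file.txt"

# Split the path into its directory and file-name parts
parts <- c(dir = dirname(path), file = basename(path))
parts

# file.path() joins the pieces back together with "/"
file.path(parts[["dir"]], parts[["file"]])

# Both functions work on whole vectors of paths
basename(c("C:/path/to/a.csv", "C:/other/b.csv"))
```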
Getting a file’s extension can be done using the file_ext function from the tools package.
library(tools)

file_ext("C:/path/to/file.txt") # returns "txt"
file_ext("C:/path/to/file.csv") # returns "csv"
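The tools package also has file_path_sans_ext, which strips the extension instead of returning it. Only the last extension counts, so a double extension like .tar.gz is split at the final dot. The sketch below also builds an output file name from an input name; the paths are illustrative.

```r
library(tools)

file_path_sans_ext("C:/path/to/file.txt")   # drops ".txt"

# Only the final extension is treated as the extension
file_ext("archive.tar.gz")
file_path_sans_ext("archive.tar.gz")

# e.g. derive an .rds output name from a .csv input name
out <- paste0(file_path_sans_ext("C:/path/to/report.csv"), ".rds")
out
```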
To open, or launch, a file in its default application on Windows, use shell.exec. file.show works on any platform, but it displays a text file’s contents in R’s own viewer rather than launching an external program:
# use shell.exec (Windows only) to open a file in its default application...
shell.exec("C:/path/to/file/some_file.txt")

# ...or file.show to display a text file within R
file.show("C:/path/to/file/some_file.txt")
This is handy when you’re working on code that overwrites the same file and you want to open it to check the results without doing so manually.
To open a file selection window, you can run file.choose():
file.choose()
Running this command opens a file selection dialog and returns the full path of the file the user selects.
As of this writing, base R doesn’t have a dedicated function for moving a file (although file.rename can move a file within the same file system), but you can use the file.move function from the filesstrings package:
library(filesstrings)

file.move("C:/path/to/file/some_file.txt", "C:/some/other/path")
Here, the first argument is the name of the file you want to move. The second argument is the destination directory.
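If you’d rather avoid an extra dependency, a move can be sketched with base R alone. The helper below, move_file, is hypothetical (it isn’t part of base R or filesstrings): it tries file.rename first, which typically only works within one file system, and falls back to copying and then deleting the original only if the copy succeeded. The folders are examples in tempdir().

```r
# Hypothetical helper: move `file` into the directory `dest_dir`
move_file <- function(file, dest_dir) {
  target <- file.path(dest_dir, basename(file))
  # Fast path: rename, which usually fails across drives or volumes
  moved <- suppressWarnings(file.rename(file, target))
  if (!moved) {
    # Fallback: copy, then remove the original only if the copy worked
    moved <- file.copy(file, target)
    if (moved) unlink(file)
  }
  moved
}

src_dir <- file.path(tempdir(), "move_from")
dst_dir <- file.path(tempdir(), "move_to")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)
file.create(file.path(src_dir, "some_file.txt"))

move_file(file.path(src_dir, "some_file.txt"), dst_dir)
```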