File Manipulation

R: How to create, delete, move, and more with files

Although Python is usually preferred over R for system administration tasks, R is actually quite useful in this regard. In this post we’re going to talk about using R to create, delete, move, and obtain information on files.

How to get and change the current working directory

Before working with files, it’s usually a good idea to first know what directory you’re working in. The working directory is the folder where R creates, or looks for, any file whose full path you don’t spell out explicitly. In R, you can find out what it is with the getwd function. To change this directory, you can use the aptly named setwd function.


# get current working directory
getwd()

# set working directory
setwd("C:/Users")
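If you only need to switch directories temporarily (inside a function, say), it is safer to switch back afterwards. The sketch below relies on setwd invisibly returning the previous directory. The helper name with_dir is hypothetical; the withr package offers a more complete function of the same name.

```r
# Hypothetical helper: evaluate `code` with `dir` as the working directory,
# then switch back to the original directory, even if `code` throws an error.
with_dir <- function(dir, code) {
  old <- setwd(dir)               # setwd() invisibly returns the previous directory
  on.exit(setwd(old), add = TRUE) # restore it when the function exits
  force(code)                     # `code` is evaluated lazily, i.e. inside `dir`
}

# Example: list the files in R's temporary directory without changing
# the working directory for the rest of the session
with_dir(tempdir(), list.files())
```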

Creating Files and Directories

A new folder, or directory, can be created in R using the dir.create function, like this:

dir.create("new_folder")

You just need to replace “new_folder” with whatever name you choose. If you don’t write out the full path of the new directory, it will be created inside the current working directory, i.e. the value of getwd().
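By default, dir.create only creates the last folder in the path. If a parent folder doesn't exist yet, it fails with a warning. Setting recursive = TRUE creates any missing parent folders as well. The example below works under tempdir() so it can be run anywhere; the folder names are arbitrary.

```r
# Build a nested path inside R's per-session temporary directory
nested <- file.path(tempdir(), "projects", "2024", "data")

# recursive = TRUE also creates "projects" and "2024" if they don't exist yet;
# showWarnings = FALSE silences the warning if the folder already exists
dir.create(nested, recursive = TRUE, showWarnings = FALSE)

dir.exists(nested)  # TRUE
```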

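Similarly, a blank file can be created with file.create. Keep in mind that if a file with the same name already exists, file.create empties it. The new file is also empty regardless of its extension, so the .docx file below is not a valid Word document until something writes proper content to it.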


file.create("new_text_file.txt")
file.create("new_word_file.docx")
file.create("new_csv_file.csv")

This makes it easy to create lots of files quickly. For example, the one-liner below will create 100 empty text files:


# file.create is vectorized, so file.create(paste0("file", 1:100, ".txt")) also works
sapply(paste0("file", 1:100, ".txt"), file.create)

Copying a file / folder

Copying a file can be done using file.copy.


file.copy("source_file.txt", "destination_folder")

With file.copy, the first parameter is the name of the file to be copied, and the second is the destination folder you want to copy it to. The function returns TRUE if the copy succeeds and FALSE otherwise. By default it will not overwrite a file that already exists at the destination unless you set overwrite = TRUE.
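file.copy can also copy a whole folder if you set recursive = TRUE. In that case the second argument should be an existing directory, and the source folder is copied inside it. The sketch below builds throwaway folders under tempdir(); the names src and dest are just examples.

```r
# Create a small source folder containing one file
src  <- file.path(tempdir(), "src")
dest <- file.path(tempdir(), "dest")
dir.create(src, showWarnings = FALSE)
dir.create(dest, showWarnings = FALSE)
writeLines("hello", file.path(src, "a.txt"))

# Copy the folder "src" (and everything in it) into "dest",
# which produces dest/src/a.txt
file.copy(src, dest, recursive = TRUE, overwrite = TRUE)

file.exists(file.path(dest, "src", "a.txt"))  # TRUE
```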

How to list all the files in a directory

The simplest way of listing all the files in a directory with R is by calling list.files.


# list all files in current directory
list.files()

# list all files in another directory
list.files("C:/path/to/somewhere/else")

By default, list.files only lists the files and folders directly within the directory; it doesn’t list the contents of any sub-folder unless you tell it to, by setting recursive = TRUE:


list.files("C:/path/to/somewhere/else", recursive = TRUE)

Note that adding recursive = TRUE can make the function take much longer if the directory contains a large number of sub-folders and files (e.g. running list.files("C:/", recursive = TRUE)).

Also, by default list.files returns only the file names, not their full paths. Set the full.names parameter to TRUE to get the full path names:


list.files("C:/path/to/somewhere/else", full.names = TRUE, recursive = TRUE)

list.files can also filter the files it lists through its pattern parameter, which takes a regular expression. For instance, the R code below will list all of the CSV files in a directory (similar to “ls | grep .csv” in Linux):


# list all CSV files non-recursively
# ("\\.csv$" is a regular expression matching names that end in .csv)
list.files(pattern = "\\.csv$")

# list all CSV files recursively through each sub-folder
list.files(pattern = "\\.csv$", recursive = TRUE)
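Because pattern is a regular expression, a pattern written as ".csv" is looser than it looks. The dot matches any character, and the match can occur anywhere in the name. The demo below (throwaway folder and file names) compares it with an anchored pattern. It also uses ignore.case = TRUE to catch upper-case extensions.

```r
demo <- file.path(tempdir(), "csv_demo")
dir.create(demo, showWarnings = FALSE)
file.create(file.path(demo, c("a.csv", "b.CSV", "notes_csv.txt")))

# Unanchored: also matches "notes_csv.txt", because "." matches the "_"
list.files(demo, pattern = ".csv")

# Anchored and case-insensitive: only the real CSV files
list.files(demo, pattern = "\\.csv$", ignore.case = TRUE)
```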

The above logic can be really useful if you want to read in all of the CSV files within a given directory. For instance, suppose you have a folder of CSV files that all share the same layout, and you want to combine them into a single data frame. You can accomplish this in a couple of lines of code:


# read in all the CSV files
all_data_frames <- lapply(list.files(pattern = "\\.csv$"), read.csv)

# stack all data frames together
single_data_frame <- Reduce(rbind, all_data_frames)
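A variation on the same idea: do.call(rbind, ...) binds all of the data frames in a single call instead of pairwise as Reduce does. It is also often handy to record which file each row came from. This is a minimal sketch assuming every file has the same columns. The sample files and the source_file column name are made up for illustration.

```r
# Write two small sample CSV files with identical columns
csv_dir <- file.path(tempdir(), "stack_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(csv_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(csv_dir, "part2.csv"), row.names = FALSE)

# Read each file and tag its rows with the file's name
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
all_data_frames <- lapply(csv_files, function(f) {
  df <- read.csv(f)
  df$source_file <- basename(f)
  df
})

# Stack everything with a single rbind call
single_data_frame <- do.call(rbind, all_data_frames)
single_data_frame
```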

How to get created / modified times and other details about files

fileSnapshot

Another way of getting the files in a directory is the fileSnapshot function (from the utils package, which R loads by default). Unlike list.files, fileSnapshot also gives you additional details about each file.


# get file snapshot of current directory
snapshot <- fileSnapshot()

# or file snapshot of another directory
snapshot <- fileSnapshot("C:/some/other/directory")

fileSnapshot returns a list, which here we will just call “snapshot”. The most useful piece of information in it is the “info” element:


snapshot$info

Here, snapshot$info is a data frame with one row per file in the folder you passed in. Its columns include:

  • size ==> size of file in bytes
  • isdir ==> is file a directory? ==> TRUE or FALSE
  • mode ==> the file permissions in octal
  • mtime ==> last modified time stamp
  • ctime ==> time stamp created on Windows; on most Unix-like systems, the time of the last status change
  • atime ==> time stamp last accessed
  • exe ==> type of executable, or “no” if not an executable (Windows only)
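Snapshots are most useful for comparison. The utils package also provides changedFiles, which compares two snapshots and reports which files were added, deleted, changed or left unchanged. The sketch below uses a throwaway folder under tempdir(); the file names are arbitrary.

```r
watch_dir <- file.path(tempdir(), "snapshot_demo")
dir.create(watch_dir, showWarnings = FALSE)
writeLines("original", file.path(watch_dir, "old.txt"))

# Take a snapshot, add a file, then take a second snapshot
before <- fileSnapshot(watch_dir)
writeLines("brand new", file.path(watch_dir, "new.txt"))
after <- fileSnapshot(watch_dir)

# Compare the two snapshots
changes <- changedFiles(before, after)
changes$added      # files present only in `after`
changes$unchanged  # files whose size, mode and mtime did not change
```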
file.info

file.info is similar to fileSnapshot, except that it returns information only for the file(s) you pass to it, one row per file. For instance, the code below will return the fields above (size, isdir, mode, mtime, etc.) for the specific file “some_file.csv”:

file.info("some_file.csv")
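Because file.info is vectorized, it pairs nicely with list.files. For example, the sketch below adds up the size column to get the total size, in bytes, of everything in a folder. The helper name folder_size is hypothetical.

```r
# Hypothetical helper: total size in bytes of all files under `path`
folder_size <- function(path) {
  # recursive = TRUE lists files in sub-folders but not the folders themselves
  files <- list.files(path, full.names = TRUE, recursive = TRUE)
  sum(file.info(files)$size)
}

# Example with two files of known size (3 and 5 bytes)
size_dir <- file.path(tempdir(), "size_demo")
dir.create(size_dir, showWarnings = FALSE)
writeBin(charToRaw("abc"), file.path(size_dir, "three.bin"))
writeBin(charToRaw("abcde"), file.path(size_dir, "five.bin"))
folder_size(size_dir)  # 8
```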

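file.ctime

To get just the ctime time stamp of a file, call file.ctime. On Windows this is the file’s creation time. Most Unix-like file systems don’t record creation times, so there it is the time of the last status change instead.

file.ctime("C:/path/to/file/some_file.txt")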

file.mtime

Getting the last modified time stamp is similar to above, except we use file.mtime:

file.mtime("C:/path/to/file/some_file.txt")
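file.mtime is vectorized too. That makes it easy to pick out, say, the most recently modified file in a folder, which is handy when a process keeps writing new output files. The helper name latest_file is hypothetical. Sys.setFileTime is used only to give the demo files distinct timestamps.

```r
# Hypothetical helper: full path of the most recently modified file in `path`
latest_file <- function(path, pattern = NULL) {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) return(NA_character_)
  files[which.max(as.numeric(file.mtime(files)))]
}

# Demo: two files with explicit, different modification times
mtime_dir <- file.path(tempdir(), "mtime_demo")
dir.create(mtime_dir, showWarnings = FALSE)
file.create(file.path(mtime_dir, c("older.txt", "newer.txt")))
Sys.setFileTime(file.path(mtime_dir, "older.txt"), as.POSIXct("2020-01-01 12:00:00"))
Sys.setFileTime(file.path(mtime_dir, "newer.txt"), as.POSIXct("2021-01-01 12:00:00"))

basename(latest_file(mtime_dir))  # "newer.txt"
```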

How to delete files

Files can be deleted with R using unlink. Deleting a single file is as simple as passing the file’s name to this function. file.remove works for deleting files too.

To delete a directory with unlink, you have to add the parameter recursive = TRUE.

# delete a file
unlink("some_file.csv")

# delete another file, this time with file.remove
file.remove("some_other_file.csv")

# delete a directory -- must add recursive = TRUE
unlink("some_directory", recursive = TRUE)

With unlink, we can delete the 100 text files we created above with file.create, also in just one line of code.

sapply(paste0("file", 1:100, ".txt"), unlink)
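Both unlink and file.remove accept a vector of paths, so pairing them with list.files makes it easy to clean out every file that matches a pattern. file.remove also returns TRUE or FALSE for each file, which you can use to check what was actually removed. The helper name delete_matching is hypothetical, and the demo runs in a throwaway folder.

```r
# Hypothetical helper: delete the files in `path` whose names match `pattern`
# and return the paths that were actually removed
delete_matching <- function(path, pattern) {
  targets <- list.files(path, pattern = pattern, full.names = TRUE)
  targets <- targets[!dir.exists(targets)]  # leave folders alone
  removed <- file.remove(targets)
  targets[removed]
}

# Demo: remove only the .tmp files
tmp_dir <- file.path(tempdir(), "delete_demo")
dir.create(tmp_dir, showWarnings = FALSE)
file.create(file.path(tmp_dir, c("a.tmp", "b.tmp", "keep.txt")))
delete_matching(tmp_dir, "\\.tmp$")
list.files(tmp_dir)  # "keep.txt"
```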

How to check if a file or directory exists

You can check if a file exists using the file.exists function.

# check if a file exists
file.exists("C:/path/to/file/some_file.txt")

# check if a folder exists
file.exists("C:/path/to/file/some_folder")

# alternatively, check if a folder exists with dir.exists
dir.exists("C:/path/to/file/some_folder")

file.exists returns TRUE for any existing path, whether it is a file or a directory, whereas dir.exists returns TRUE only if the path exists and is a directory.
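A common use of file.exists is as a guard before writing, so that an existing file is never silently overwritten. This is a minimal sketch; the helper name write_if_absent is hypothetical.

```r
# Hypothetical helper: write `text` to `path` only if the file doesn't exist yet.
# Returns TRUE (invisibly) if the file was written, FALSE otherwise.
write_if_absent <- function(text, path) {
  if (file.exists(path)) {
    message("Skipping existing file: ", path)
    return(invisible(FALSE))
  }
  writeLines(text, path)
  invisible(TRUE)
}

target <- file.path(tempdir(), "report.txt")
unlink(target)                              # start from a clean slate for the demo
write_if_absent("first version", target)    # writes the file
write_if_absent("second version", target)   # prints a message, leaves the file alone
readLines(target)  # "first version"
```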

How to get the base name of a file

Getting the base name of a file can be done using the basename function:

basename("C:/path/to/file.txt")

The above code will return “file.txt”.

How to get the directory name of a file

Tweaking the code above, we can get the directory of a file like this:

dirname("C:/path/to/file.txt")

This will return “C:/path/to”.
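basename and dirname are often combined with file.path to build new paths. The example below works out where a set of files would land if copied into a backup folder. C:/backup is only an example, and no files are touched; both functions are vectorized.

```r
files <- c("C:/path/to/file.txt", "C:/path/to/other/data.csv")

# Keep each file's name but point it at a different folder
file.path("C:/backup", basename(files))
# "C:/backup/file.txt" "C:/backup/data.csv"

# dirname works on a whole vector of paths too
dirname(files)
# "C:/path/to" "C:/path/to/other"
```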

How to get a file’s extension

Getting a file’s extension can be done using the file_ext function from the tools package.

library(tools)

file_ext("C:/path/to/file.txt") # returns "txt"

file_ext("C:/path/to/file.csv") # returns "csv"
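The tools package also has file_path_sans_ext, which strips the extension off instead. Together, the two functions make it easy to build an output file name with a different extension. The helper name change_ext is hypothetical.

```r
library(tools)

# Hypothetical helper: swap a path's extension for `new_ext`
change_ext <- function(path, new_ext) {
  paste0(file_path_sans_ext(path), ".", new_ext)
}

file_path_sans_ext("C:/path/to/file.txt")  # "C:/path/to/file"
change_ext("C:/path/to/file.txt", "csv")   # "C:/path/to/file.csv"

# Only the last extension counts as the extension
file_ext("archive.tar.gz")                 # "gz"
```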

How to physically open a file

To open, or launch, a file in its default application on Windows, use shell.exec. file.show works on all platforms but instead displays the contents of a text file from within R:

# use shell.exec (Windows only)...
shell.exec("C:/path/to/file/some_file.txt")

# ...or file.show to view a text file from within R
file.show("C:/path/to/file/some_file.txt")

This can be really handy if you’re modifying a section of code that writes over the same file, and you want to open it to check the results without having to do so manually.

How to open a file selection window

To open a file selection window, you can run file.choose():

file.choose()

Running this command returns the full path of the file selected by the user.

How to move a file

As of this writing, base R has no function dedicated to moving a file, but this can be accomplished using the filesstrings package and its function file.move:

library(filesstrings)

file.move("C:/path/to/file/some_file.txt", "C:/some/other/path")

Here, the first argument is the name of the file you want to move. The second argument is the destination directory.
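If you would rather avoid the extra dependency, base R’s file.rename can also move a file, because renaming it to a path in another folder relocates it. It can fail when moving across drives or file systems, so the sketch below falls back to copying the file and then deleting the original. The helper name move_file is hypothetical, and the copy fallback will not overwrite an existing file at the destination.

```r
# Hypothetical helper: move the file `from` into the folder `to_dir`
move_file <- function(from, to_dir) {
  dest <- file.path(to_dir, basename(from))
  # file.rename returns FALSE (with a warning) if it cannot move the file,
  # e.g. across drives; in that case copy the file, then remove the original
  moved <- suppressWarnings(file.rename(from, dest))
  if (!moved) {
    moved <- file.copy(from, dest) && file.remove(from)
  }
  moved
}

# Demo in a throwaway folder
src_file <- file.path(tempdir(), "to_move.txt")
move_dir <- file.path(tempdir(), "moved")
dir.create(move_dir, showWarnings = FALSE)
writeLines("payload", src_file)
move_file(src_file, move_dir)  # TRUE
```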

Andrew Treadway