In this post we discuss how to write an R script to solve any Sudoku puzzle. There are some R packages to handle this, but in our case, we’ll write our own solution. For our purposes, we’ll assume the input Sudoku is a 9×9 grid. At the end result, each row, column, and 3×3 box needs to contain exactly one of each integer 1 through 9.
Learn more about data science by checking out the great curriculum at 365 Data Science!
Step 0) Define a sample board
Let’s define a sample Sudoku board for testing. Empty cells will be represented as zeroes.
board <- matrix( c(0,0,0,0,0,6,0,0,0, 0,9,5,7,0,0,3,0,0, 4,0,0,0,9,2,0,0,5, 7,6,4,0,0,0,0,0,3, 0,0,0,0,0,0,0,0,0, 2,0,0,0,0,0,9,7,1, 5,0,0,2,1,0,0,0,9, 0,0,7,0,0,5,4,8,0, 0,0,0,8,0,0,0,0,0), byrow = T, ncol = 9 )
Step 1) Find the empty cells
In the first step, let’s write a function that will find all of the empty cells on the board.
find_empty_cells <- function(board) { which(board == 0, arr.ind = TRUE) }
Step 2) Make sure cell placement is valid
Next, we need a function that will check if a cell placement is valid. In other words, if we try putting a number into a particular cell, we need to ensure that the number appears only once in that row, column, and box. Otherwise, the placement would not be valid.
is_valid <- function(board, num, row, col) { # Check if any cell in the same row has value = num if(any(board[row, ] == num)) { return(FALSE) } # Check if any cell in the same column has value = num if(any(board[, col] == num)) { return(FALSE) } # Get cells in num's box box_x <- floor((row - 1) / 3) + 1 box_y <- floor((col - 1) / 3) + 1 # Get subset of matrix containing num's box box <- board[(3 * box_x - 2):(3 * box_x), (3 * box_y - 2):(3 * box_y)] # Check if the number appears elsewhere in its box if(any(box == num)) { return(FALSE) } return(TRUE) }
Step 3) Recursively solve the Sudoku
In the third step, we write our function to solve the Sudoku. This function will return TRUE is the input Sudoku is solvable. Otherwise, it will return FALSE. The final result will be stored in a separate variable.
solve_sudoku <- function(board, needed_cells = NULL, index = 1) { # Find all empty cells if(is.null(needed_cells)) needed_cells <- find_empty_cells(board) if(index > nrow(needed_cells)) { # Set result equal to current value of board # and return TRUE result <<- board return(TRUE) } else { row <- needed_cells[index, 1] col <- needed_cells[index, 2] } # Solve the Sudoku for(num in 1:9) { # Test for valid answers if(!is_valid(board, num, row, col)) {next} else{ board2 = board board2[row, col] <- num # Retest with input if(solve_sudoku(board2, needed_cells, index + 1)) { return(TRUE) } } } # If not solvable, return FALSE return(FALSE) }
Calling the Sudoku solver
Lastly, we call our Sudoku solver. The result is stored in the variable “result”, as can be seen below.
solve_sudoku(board)
Conclusion
That’s it for this post! If you enjoyed reading this and want to learn more about R or Python, check out the great data science program at 365 Data Science.
There is a small mistake in the code.
Step #3, Line 7 we have:
needed_cells <- find_empty(board)
Should be:
needed_cells <- find_empty_cells(board)
Besides that, it all works just fine!
Corrected. Thanks for the catch!
See CRAN packages ‘sudoku’ and ‘sudokuAlt’. The latter one can make sudoku games as well as solve them and works for higher order games (but to a very limited extent).
Thanks! I might write about those in a follow-up post.
Thank you for posting this. It’s especially helpful for someone like myself who is new to programming. I am curious about step 2. The function arguments include board, num, row, and col. I can see where row and col are implemented in the if statement board[row, ] and board[, col], but what I don’t get is how num works out given the == num in the if statements. How does the function know what num is?
Thanks for reading! The “==” is an equality operator. The expression any(board[row, ] == num returns TRUE if any cell in that row of the board has a value equal to num. Likewise, any(board[col, ] == num does the same check for any cell in the same column. So the function needs num as an input. In Step 3, the code will go through possible values of num and check their validity using the function in Step 2.
Nice solution! Trivial suggestion from me: I might replace the calls of the form any(x == num) with num %in% x, which looks slightly neater (fewer brackets) while doing the same thing.
Thanks for the suggestion!
thanks for this post which shows how R has almost unlimited possibilities. Could you explain the 1st line of the step 3 result <- sudoku. From where "sudoku" comes in the program ? what's its class ?
Thanks for reading! That was a leftover line from another board I was using as a test. I removed it now.
Our work develops a program to detect the Sudoku area solve it and print the solution with augmented reality on a screen. The algorithm consist in identify the Region of Interest (ROI) for crop. We transfer only the locations of numbers in the grid and identify if has a number or is empty, if a number is present no action is done but if the grid position is empty, automatically we put a zero. Afterwards we read each row one by one to know what numbers are present to solve the Sudoku. In the next stage we transform the image in numbers in order to the open source code solves the puzzle. Finally with the solution obtained, be proceed to print on the display the result, thereby obtaining a program which solves with augmented reality this popular game.
Hi! I really enjoyed this post. Thanks for sharing.
Would you please explain the logic behind the third step? Isn’t there a way to return the solved matrix as an object instead of returning the logical results? Wouldn’t it be nicer to user while instead and not write into the user’s environment?
Thanks for reading! Yes, you could write the code that way. I wrote the code to return a logical result because not every input will have a solution – but yes, you could write the function so that it returns the result directly to the user.