
3 Packages to Build a Spell Checker in Python

This post covers three packages for building a spell checker in Python: pyspellchecker, TextBlob, and autocorrect.

pyspellchecker

The pyspellchecker package allows you to perform spelling corrections, as well as see candidate spellings for a misspelled word. To install the package, you can use pip:


pip install pyspellchecker

Once installed, pyspellchecker is straightforward to use. Note that although the package is installed as “pyspellchecker” via pip, it is imported as “spellchecker”. The first step is to create a SpellChecker object, which we’ll call “spell”.


from spellchecker import SpellChecker

spell = SpellChecker()

Now, we’re ready to test this out with a few misspellings. We’ll use a few words from this list of commonly misspelled words.

To attempt a correction, you can use the correction method:


spell.correction("adress") # address


spell.correction("becuase") # because

pyspellchecker also has a method to split the words in a sentence.


spell.split_words("this sentnce has misspelled werds")

#['this', 'sentnce', 'has', 'misspelled', 'werds']

Once we have a list of the words in the sentence, we can pass each word to our SpellChecker object's correction method in a list comprehension.


words = spell.split_words("this sentnce has misspelled werds")

[spell.correction(word) for word in words]

#['this', 'sentence', 'has', 'misspelled', 'words']
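The list comprehension above returns a list of words, so punctuation and the original string are lost. A minimal sketch of one way to rebuild the corrected sentence is below. The helper name correct_text and the toy_fixes dictionary are hypothetical, and the corrector is passed in as a plain callable, so nothing here is pyspellchecker's own API. The fallback to the original word covers correctors that return None when they have no suggestion, which some pyspellchecker versions may do.

```python
import re

def correct_text(text, correct_word):
    """Return `text` with each alphabetic word passed through `correct_word`.

    `correct_word` is any callable that maps a word to a correction or None.
    """
    def fix(match):
        word = match.group(0)
        suggestion = correct_word(word)
        # Fall back to the original word when no correction is offered.
        return suggestion if suggestion else word

    # Only runs of letters are corrected; spaces and punctuation are untouched.
    return re.sub(r"[A-Za-z]+", fix, text)

# Stand-in corrector for illustration only (a hypothetical lookup table).
toy_fixes = {"sentnce": "sentence", "werds": "words"}
fixed = correct_text("this sentnce has misspelled werds!", toy_fixes.get)
print(fixed)
```

With pyspellchecker installed, spell.correction could be passed in place of toy_fixes.get.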

If you just want to flag which words in a sentence are misspelled, you can use the unknown method. This method returns a Python set of the potentially misspelled words.


spell.unknown(["dilema", "column", "aquire"])

#{'aquire', 'dilema'}

We can also see the candidate spellings for a misspelled word.


spell.candidates("conceed")

#{'concede', 'conceded'}
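To give a feel for where candidates like these can come from, here is a minimal sketch of edit-based candidate generation in the style of Peter Norvig's well-known spelling corrector. It is an illustration under stated assumptions, not pyspellchecker's actual implementation: the function names and the toy_vocabulary set are made up for this example.

```python
import string

def edits1(word):
    """All strings one simple edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known_candidates(word, vocabulary):
    """Vocabulary words one edit away from `word`, else two edits away."""
    if word in vocabulary:
        return {word}
    one_away = edits1(word) & vocabulary
    if one_away:
        return one_away
    two_away = {e2 for e1 in edits1(word) for e2 in edits1(e1)}
    return two_away & vocabulary

# Tiny made-up vocabulary; a real checker would use a full word list.
toy_vocabulary = {"concede", "conceded", "conceal", "address", "because"}
candidates = known_candidates("conceed", toy_vocabulary)
print(candidates)
```

Here “concede” is one transposition away from “conceed” and “conceded” is one insertion away, while “conceal” needs two replacements and is left out.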

TextBlob

The powerful TextBlob can also do spelling corrections. To install TextBlob we can use pip (note all lowercase):


pip install textblob

To use TextBlob’s spellchecking functionality, we just need to import the Word class. Then we can input a word and check its spelling using the spellcheck method, like below.


from textblob import Word

word = Word('percieve')

word.spellcheck()

# [('perceive', 1.0)]

As shown above, spellcheck returns a list of (word, confidence) tuples: a recommended correction for the word and a confidence score associated with that correction. In this case, we just get one word back with a confidence of 1.0, or 100%.

Let’s try another word that returns multiple possibilities. If we input the string “personell”, we get a list of possible corrections with confidence scores because this string is fairly similar in spelling to a few different words.


word = Word('personell')
word.spellcheck()

# [('personal', 0.65),
#  ('personally', 0.2642857142857143),
#  ('peroneal', 0.06428571428571428),
#  ('personnel', 0.014285714285714285),
#  ('personen', 0.007142857142857143)]

According to its documentation, TextBlob’s spelling correction feature is about 70% accurate.
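The five scores for “personell” sum to 1, and each is a whole multiple of 1/140, which is consistent with every score being a candidate's share of some total count. The sketch below shows that normalization under this assumption. The counts are inferred from the scores (each score times 140) and are not read from TextBlob's data, and confidence_scores is a hypothetical helper, not a TextBlob function.

```python
def confidence_scores(counts):
    """Turn raw candidate counts into scores that sum to 1, highest first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(word, count / total) for word, count in ranked]

# Assumed counts for illustration only, inferred from the scores in the post.
inferred_counts = {
    "personal": 91,
    "personally": 37,
    "peroneal": 9,
    "personnel": 2,
    "personen": 1,
}

scores = confidence_scores(inferred_counts)
best_word, best_score = scores[0]
print(best_word, best_score)
```

One consequence of frequency-based scoring is visible here: the intended word, “personnel”, ranks fourth because the other candidates are simply more common.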

autocorrect

The last package we’ll examine is called autocorrect. Again, we can install this package with pip:


pip install autocorrect

Once installed, we’ll import the Speller class from autocorrect. Then we’ll create an object that uses the English language (lang='en'). We’ll use this object to do spelling corrections.


from autocorrect import Speller

check = Speller(lang='en')

Next, we can input a sentence to our object, and it will attempt to correct any misspellings.


check("does this sentece have misspelled wordz?")

# 'does this sentence have misspelled words?'

A few caveats

It’s important to keep in mind that no programmatic spell checker is perfect. Python has several pre-made options, as described above, and you could also build your own using fuzzy matching. Words taken out of context are also harder to correct when the misspelled string is similar to multiple words. For example, take the string “liberry”. This is a known misspelling of “library”, but it is also just one letter off from “liberty”.
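As a minimal sketch of the fuzzy-matching route, the standard library's difflib module can rank words by similarity to a misspelled string. The word list below is a tiny made-up example; a real checker would load a full dictionary.

```python
from difflib import get_close_matches

# Tiny made-up word list for illustration only.
toy_vocabulary = ["library", "liberty", "address", "because", "column"]

# Return up to three words whose similarity ratio meets the default cutoff (0.6).
matches = get_close_matches("liberry", toy_vocabulary, n=3)
print(matches)
```

Both “library” and “liberty” come back as close matches for “liberry”, which shows the ambiguity: string similarity alone cannot tell which word the writer meant.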

If we use any of the packages above, “liberty” is returned. That is not illogical, as the string is very close in spelling, but context could help reveal which word makes the most sense. For building a contextual spell checker in Python, you might want to look into recurrent neural networks or Markov models.


spell.correction("liberry") # liberty

word = Word("liberry")
word.spellcheck() # top suggestion: liberty

check("liberry") # liberty
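To illustrate how context could break the tie, here is a minimal sketch of a first-order Markov (bigram) model that picks whichever candidate most often follows the previous word. The corpus is a few made-up sentences and pick_by_context is a hypothetical helper; a usable model would need far more text plus handling for unseen word pairs.

```python
from collections import Counter

# Tiny made-up corpus for illustration only.
corpus = (
    "she borrowed a book from the public library . "
    "the public library opens at nine . "
    "they fought for liberty and justice . "
    "liberty and justice for all ."
).split()

# Count how often each (previous word, word) pair occurs in the corpus.
bigram_counts = Counter(zip(corpus, corpus[1:]))

def pick_by_context(previous_word, candidates):
    """Choose the candidate seen most often after `previous_word`.

    Ties (including all-zero counts) go to the earliest candidate in the list.
    """
    return max(candidates, key=lambda word: bigram_counts[(previous_word, word)])

choice_after_public = pick_by_context("public", ["liberty", "library"])
choice_after_for = pick_by_context("for", ["liberty", "library"])
print(choice_after_public, choice_after_for)
```

With this toy corpus, “liberry” after “public” resolves to “library”, while after “for” it resolves to “liberty”.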

That’s all for this post!

Andrew Treadway
