Sitemap

From Frustrating Typos to Smart Suggestions: Implementing Levenshtein Distance in Go CLIs

10 min readJun 26, 2025

Last year, while building an internal CLI tool for our engineering team at work, I found myself constantly making typos where I would rush through and mistype a command by just one or two characters. It was very annoying…

The CLI had commands like auth, deploy, config, and status. But I would frequently type authz instead of auth, au instead of auth, or deplyo instead of deploy. Each time, I’d get a cold, unhelpful error message: "Error: unknown command 'authz' for 'app'". No suggestions, no hints, just a dead end that forced them to either guess the correct spelling or dig through documentation.

I realized this was a perfect opportunity to make our CLI more intelligent and user-friendly. Instead of punishing users for minor typos, why not help them by suggesting what they probably meant to type? The goal was simple: when someone makes a small mistake, the CLI should be smart enough to say “Did you mean ‘auth’?” rather than just throwing an error.

My first instinct was to reach for the Jaccard Similarity Index, a well-known algorithm for measuring how similar two sets are. I knew this as i had implemented something similar when working on my resume similarity checker application . The idea was straightforward: break each command into its component characters, calculate how much overlap exists between the mistyped command and each valid command, then suggest the one with the highest similarity score.

I implemented this approach and ran some initial tests. For commands that shared many characters, it worked reasonably well. But as I tested with real-world typos that our team was actually making, the limitations became glaringly obvious. The Jaccard index treats characters like items in a bag, it doesn’t care about order or sequence. So “htua” (auth spelled backwards) would score just as highly as “authz” when compared to “auth”. Even worse, it completely failed with single-character typos and partial matches which was the exact scenarios our users encountered most frequently.

That’s when I discovered Levenshtein Distance, an algorithm specifically designed to measure how many single-character edits are needed to transform one string into another.

I wanted to go from this

$ go run main.go authz
Error: unknown command "authz" for "app"

to this

➜  go run main.go authz

Command 'authz' not found.

Did you mean?
auth

Diving into this algorithm was like entering a rabbit hole of mathematical elegance, but it proved to be the perfect solution for finding similar strings in our CLI. Let me explain the findings.

What is Levenshtein Distance?

Levenshtein Distance measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another. It is also known as edit distance. It is mostly used to measure the similarity between two strings. For example:

  • Distance between auth and authz: 1 (insertion of z)
  • Distance between auth and au: 2 (deletion of t and h)

The Mathematical Foundation

Before diving into the implementation, let’s understand the mathematical concept behind Levenshtein Distance.

Let’s assume there are two strings s1 and s2, with lengths n and m, respectively. We can define a matrix D of size (n + 1) × (m + 1), where D[i][j] represents the minimum number of operations required to convert the first i characters of s1 to the first j characters of s2.

The Recurrence Relation

Base Case:

  • D[i][0] = i (all characters in s1 are deleted to match an empty s2)
  • D[0][j] = j (all characters in s2 are inserted to match an empty s1)

Think of the base case as setting up the edges of our table like putting numbers on the borders before we fill in the middle.

First Row (D[0][j] = j): This answers: "How do I get from nothing to part of the second word?"

  • From “” to “s” → Add 1 letter (s)
  • From “” to “si” → Add 2 letters (s, then i)
  • From “” to “sit” → Add 3 letters (s, then i, then t)

It’s like starting with a blank page and typing out letters one by one.

First Column (D[i][0] = i): This answers: "How do I get from part of the first word to nothing?"

  • From “k” to “” → Delete 1 letter (k)
  • From “ki” to “” → Delete 2 letters (k, then i)
  • From “kit” to “” → Delete 3 letters (k, then i, then t)

It’s like having some text and hitting backspace until it’s all gone.

Recursive Case ( For 1 ≤ i ≤ n and 1 ≤ j ≤ m ):

Once we have those border numbers, we fill in the rest of the table using this simple rule:

For any cell in the middle:

  1. Look at the two letters we’re comparing
  2. If they’re the same → copy the diagonal number (no work needed)
  3. If they’re different → pick the smallest of three neighbors and add 1:
  • Left cell + 1 (insert a letter)
  • Top cell + 1 (delete a letter)
  • Diagonal cell + 1 (change a letter)

Why add 1? Because we’re doing one action (insert, delete, or change), so it costs us 1 operation. Think of it like asking: “What’s the cheapest way to get here?” and checking all three possible paths, then picking the cheapest one.

The value D[n][m] is the Levenshtein Distance between s1 and s2.

Example: Transform “kitten” to “sitting”

Let’s walk through a concrete example to see how the algorithm works. We want to find the minimum number of edits needed to transform “kitten” into “sitting”.

Step 1: Initialize the Matrix

We create a matrix where:

  • Rows represent characters of “kitten” (plus empty string)
  • Columns represent characters of “sitting” (plus empty string)
  • Each cell [i][j] will contain the minimum edits needed to transform the first i characters of "kitten" into the first j characters of "sitting"

Initial Setup with Base Cases

   ""  s  i  t  t  i  n  g
"" 0 1 2 3 4 5 6 7
k 1
i 2
t 3
t 4
e 5
n 6

Why these base values?

  • Row 0 (empty → “sitting”): To transform empty string to “s” needs 1 insertion, to “si” needs 2 insertions, etc.
  • Column 0 (“kitten” → empty): To transform “k” to empty needs 1 deletion, “ki” to empty needs 2 deletions, etc.

Step 2: Fill the Matrix Cell by Cell

Let’s fill each cell using the recurrence relation. When we’re filling each cell in the matrix, we look at two characters, one from each word we’re comparing. If these two characters are exactly the same, we don’t need to do anything special, so we just copy the number from the diagonal cell above and to the left. But if the characters are different, we need to make a change, which costs us 1 extra edit. We have three ways to make this change: we can delete a character (look at the cell above), insert a character (look at the cell to the left), or replace one character with another (look at the diagonal cell). We pick whichever option gives us the smallest number, then add 1 to account for the edit we just made. This way, each cell always contains the cheapest possible way to transform one piece of text into another.

Row 1: “k” vs each character of “sitting”

    ""   s   i   t   t   i   n   g
"" 0 1 2 3 4 5 6 7
k 1 1 2 3 4 5 6 7

Cell [1][1]: “k” → “s”

  • ‘k’ ≠ ‘s’, so we need an edit
  • Options: deletion(1+1=2), insertion(1+0=1), substitution(1+0=1)
  • Minimum = 1 (substitute ‘k’ with ‘s’)

Cell [1][2]: “k” → “si”

  • ‘k’ ≠ ‘i’, so we need an edit
  • Options: deletion(2+1=3), insertion(1+1=2), substitution(1+1=2)
  • Minimum = 2

Row 2: “ki” vs each character of “sitting”

    ""   s   i   t   t   i   n   g
"" 0 1 2 3 4 5 6 7
k 1 1 2 3 4 5 6 7
i 2 2 1 2 3 4 5 6

Cell [2][2]: “ki” → “si”

  • ‘i’ = ‘i’, characters match!
  • Take diagonal value: 1 (no additional edit needed)

Cell [2][3]: “ki” → “sit”

  • ‘i’ ≠ ‘t’, so we need an edit
  • Options: deletion(2+2=4), insertion(1+2=3), substitution(1+1=2)
  • Minimum = 2

Continue filling the entire matrix

      ""  s  i  t  t  i  n  g
"" 0 1 2 3 4 5 6 7
k 1 1 2 3 4 5 6 7
i 2 2 1 2 3 4 5 6
t 3 3 2 1 2 3 4 5
t 4 4 3 2 1 2 3 4
e 5 5 4 3 2 2 3 4
n 6 6 5 4 3 3 2 3

Step 3: Interpret the Result

The bottom-right cell [6][7] = 3 means we need 3 edits to transform "kitten" into "sitting".

Step 4: Trace Back the Actual Transformations

To find the actual edits, we can trace back through the matrix:

Method 1: Optimal Transformations

  1. “kitten” → “sitten”: Substitute ‘k’ with ‘s’ (1 edit)
  2. “sitten” → “sittin”: Substitute ‘e’ with ‘i’ (2 edits total)
  3. “sittin” → “sitting”: Insert ‘g’ at the end (3 edits total)

Method 2: Alternative Path (same cost)

The beauty of Levenshtein Distance is there might be multiple ways to achieve the minimum:

  1. “kitten” → “sitten”: Substitute ‘k’ with ‘s’
  2. “sitten” → “sitten”: Delete ‘e’ and ‘n’
  3. “sitt” → “sitting”: Insert ‘i’, ’n’, ‘g’

Think of each cell as answering: “What’s the cheapest way to get from substring A to substring B?”

"" → ""     = 0  (nothing to nothing)
"k" → "" = 1 (delete k)
"k" → "s" = 1 (substitute k→s)
"ki" → "si" = 1 (substitute k→s, i matches i)
...
"kitten" → "sitting" = 3 (our final answer)

Rule of Thumb

  1. Characters that match don’t cost anything extra: We just take the diagonal value
  2. Characters that don’t match require us to choose the cheapest operation:
  • Insertion: cost from left cell + 1
  • Deletion: cost from top cell + 1
  • Substitution: cost from diagonal cell + 1

The algorithm finds the globally optimal solution, not just locally optimal choices

Why Levenshtein Distance Worked

Unlike Jaccard Similarity, which relies on the overlap of subsets, Levenshtein Distance excels at comparing sequences. It’s especially powerful for:

  • Single-character typos: e.g., authz instead of auth
  • Missing characters: e.g., au instead of auth
  • Extra characters: e.g., authorization instead of auth

This made it a natural fit for our CLI, where users often mistype commands by a small margin.

Implementation in the CLI

Here’s how I implemented Levenshtein Distance to suggest similar commands:

Core Levenshtein Function

func levenshteinDistance(s1, s2 string) int {
s1 = strings.ToLower(s1)
s2 = strings.ToLower(s2)

if len(s1) == 0 {
return len(s2)
}
if len(s2) == 0 {
return len(s1)
}

// Initialize matrix
matrix := make([][]int, len(s1)+1)
for i := range matrix {
matrix[i] = make([]int, len(s2)+1)
}

// Fill base cases
for i := 0; i <= len(s1); i++ {
matrix[i][0] = i
}
for j := 0; j <= len(s2); j++ {
matrix[0][j] = j
}

// Fill the matrix using dynamic programming
for i := 1; i <= len(s1); i++ {
for j := 1; j <= len(s2); j++ {
if s1[i-1] == s2[j-1] {
matrix[i][j] = matrix[i-1][j-1]
} else {
matrix[i][j] = min(
matrix[i-1][j]+1, // Deletion
matrix[i][j-1]+1, // Insertion
matrix[i-1][j-1]+1, // Substitution
)
}
}
}

return matrix[len(s1)][len(s2)]
}

func min(numbers ...int) int {
minValue := numbers[0]
for _, num := range numbers[1:] {
if num < minValue {
minValue = num
}
}
return minValue
}

Command Suggestion Logic

To find similar commands, I loop through the available commands and computed the Levenshtein Distance. Commands with a distance below a threshold are suggested:

func findSimilarCommands(input string, commands []*cobra.Command) []string {
const similarityThreshold = 3
var suggestions []string

for _, cmd := range commands {
if distance := levenshteinDistance(input, cmd.Name()); distance <= similarityThreshold {
suggestions = append(suggestions, cmd.Name())
}
}

return removeDuplicates(suggestions)
}

func removeDuplicates(input []string) []string {
unique := make(map[string]struct{})
var result []string

for _, item := range input {
if _, exists := unique[item]; !exists {
unique[item] = struct{}{}
result = append(result, item)
}
}

return result
}

Results

The implementation delivers exactly what we wanted — helpful suggestions for mistyped commands:

➜  go run cmd/cli/main.go authz   
Error: unknown command "authz" for "cia"

Did you mean this?
auth

exit status 1
➜ cmd/cli/main.go au
Error: unknown command "au" for "cia"

Did you mean this?
auth

exit status 1

The User Experience Impact

From the user’s perspective, the change seems trivial: an incorrect command now yields a helpful suggestion. However, implementing this feature taught me the importance of anticipating user behavior and building software that adjusts to human error.

Beyond CLI: Other Applications

Levenshtein Distance has numerous applications across different domains:

  • Spell Checking: Suggest corrections for misspelled words
  • DNA Sequence Analysis: Compare genetic sequences
  • Natural Language Processing: Determine similarity between sentences or words
  • Fuzzy String Matching: Database search with approximate matches
  • Version Control: Detect similar file names or content changesLessons Learned

Key Lessons Learned

Understand Your Problem: Switching from Jaccard Similarity to Levenshtein Distance was only possible after deeply understanding the limitations of the former. Researching and experimenting led to a better solution.

Embrace Mathematical Rigor: The elegance of Levenshtein Distance lies in its mathematical foundation. While it might seem complex initially, taking the time to understand the algorithm unlocked powerful new capabilities for the CLI.

Focus on User Experience: A small detail, like suggesting the correct command, can significantly enhance usability and reduce user frustration.

Iterate and Test: Real-world testing revealed edge cases (e.g., duplicates or overly generous thresholds). Iteration and feedback loops helped refine the experience.

Final Thoughts

Improving a CLI’s user experience isn’t just about adding more features; it’s about anticipating and resolving friction points for users. Implementing Levenshtein Distance transformed the CLI from a strict, unforgiving tool into a forgiving, user-friendly assistant.

This experience reinforced that great tools adapt to their users, not the other way around. Sometimes, the most impactful improvements come from addressing the smallest pain points like a simple typo in a command.

--

--

Prabesh
Prabesh

Written by Prabesh

Senior Site Reliability Engineer & Backend Engineer | Docker Captain 🐳 | https://www.linkedin.com/in/prabeshthapa

Responses (1)