A Deeper Look at the std::string Class

Calculating Levenshtein Distance

How can I efficiently calculate the Levenshtein distance between two std::strings?

Abstract art representing computer programming

The Levenshtein distance, also known as the edit distance, is a measure of the difference between two strings. It represents the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.

Calculating this distance efficiently using std::string is an excellent exercise in dynamic programming and string manipulation.

Let's implement an efficient Levenshtein distance calculator:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int levenshteinDistance(const std::string& s1,
                        const std::string& s2) {
  const size_t m{s1.length()};
  const size_t n{s2.length()};

  std::vector<std::vector<int>> dp(
    m + 1, std::vector<int>(n + 1)
  );

  // Initialize first row and column
  for (size_t i{0}; i <= m; ++i) dp[i][0] = i;
  for (size_t j{0}; j <= n; ++j) dp[0][j] = j;

  // Fill the dp table
  for (size_t i{1}; i <= m; ++i) {
    for (size_t j{1}; j <= n; ++j) {
      if (s1[i - 1] == s2[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1];  
      } else {
        dp[i][j] = 1 + std::min({
          dp[i - 1][j],     // deletion
          dp[i][j - 1],     // insertion
          dp[i - 1][j - 1]  // substitution
        });
      }
    }
  }

  return dp[m][n];
}

int main() {
  std::string word1{"kitten"};
  std::string word2{"sitting"};

  int distance{levenshteinDistance(word1, word2)};

  std::cout
    << "The Levenshtein distance between '"
    << word1 << "' and '" << word2 << "' is: "
    << distance << '\n';
}
The Levenshtein distance between 'kitten' and 'sitting' is: 3

Let's break down the levenshteinDistance() function:

  1. We create a 2D vector dp to store our dynamic programming table.
  2. We initialize the first row and column of dp to represent the distance from an empty string.
  3. We iterate through both strings, filling the dp table:
    • If the characters match, we copy the diagonal value (no edit needed).
    • If they don't match, we take the minimum of the three possible operations (deletion, insertion, substitution) and add 1.
  4. The final cell dp[m][n] gives us the Levenshtein distance.

This implementation has a time complexity of O(mn)O(mn) and a space complexity of O(mn)O(mn), where mm and nn are the lengths of the input strings.

We can optimize this further to use only O(min(m,n))O(min(m,n)) space:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int levenshteinDistance(
  const std::string& s1, const std::string& s2
) {
  const size_t m{s1.length()};
  const size_t n{s2.length()};

  // Ensure s1 is the shorter string
  if (m > n) return levenshteinDistance(s2, s1);

  std::vector<int> prevRow(m + 1);
  std::vector<int> currRow(m + 1);

  for (size_t i{0}; i <= m; ++i) prevRow[i] = i;

  for (size_t j{1}; j <= n; ++j) {
    currRow[0] = j;
    for (size_t i{1}; i <= m; ++i) {
      if (s1[i - 1] == s2[j - 1]) {
        currRow[i] = prevRow[i - 1];
      } else {
        currRow[i] = 1 + std::min(
          {prevRow[i],
          currRow[i - 1],
          prevRow[i - 1]
        });
      }
    }
    std::swap(prevRow, currRow);
  }

  return prevRow[m];
}

int main() {/*...*/}
The Levenshtein distance between 'kitten' and 'sitting' is: 3

This optimized version uses only two rows of the DP table at a time, significantly reducing memory usage for long strings.

The Levenshtein distance has many practical applications, including:

  • Spell checking and correction
  • DNA sequence alignment in bioinformatics
  • Fuzzy string matching in search algorithms

By implementing this algorithm efficiently with std::string, we've demonstrated advanced string manipulation techniques and the power of dynamic programming in C++.

Answers to questions are automatically generated and may not have been reviewed.

A computer programmer
Part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Free, unlimited access

This course includes:

  • 124 Lessons
  • 550+ Code Samples
  • 96% Positive Reviews
  • Regularly Updated
  • Help and FAQ
Free, Unlimited Access

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Screenshot from Warhammer: Total War
Screenshot from Tomb Raider
Screenshot from Jedi: Fallen Order
Contact|Privacy Policy|Terms of Use
Copyright © 2024 - All Rights Reserved