The Levenshtein distance, also known as the edit distance, is a measure of the difference between two strings. It represents the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.
Calculating this distance efficiently using std::string
is an excellent exercise in dynamic programming and string manipulation.
Let's implement an efficient Levenshtein distance calculator:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
int levenshteinDistance(const std::string& s1,
const std::string& s2) {
const size_t m{s1.length()};
const size_t n{s2.length()};
std::vector<std::vector<int>> dp(
m + 1, std::vector<int>(n + 1)
);
// Initialize first row and column
for (size_t i{0}; i <= m; ++i) dp[i][0] = i;
for (size_t j{0}; j <= n; ++j) dp[0][j] = j;
// Fill the dp table
for (size_t i{1}; i <= m; ++i) {
for (size_t j{1}; j <= n; ++j) {
if (s1[i - 1] == s2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1];
} else {
dp[i][j] = 1 + std::min({
dp[i - 1][j], // deletion
dp[i][j - 1], // insertion
dp[i - 1][j - 1] // substitution
});
}
}
}
return dp[m][n];
}
int main() {
std::string word1{"kitten"};
std::string word2{"sitting"};
int distance{levenshteinDistance(word1, word2)};
std::cout
<< "The Levenshtein distance between '"
<< word1 << "' and '" << word2 << "' is: "
<< distance << '\n';
}
The Levenshtein distance between 'kitten' and 'sitting' is: 3
Let's break down the levenshteinDistance()
 function:
dp
to store our dynamic programming table.dp
to represent the distance from an empty string.dp
table:
dp[m][n]
gives us the Levenshtein distance.This implementation has a time complexity of and a space complexity of , where and are the lengths of the input strings.
We can optimize this further to use only  space:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
int levenshteinDistance(
const std::string& s1, const std::string& s2
) {
const size_t m{s1.length()};
const size_t n{s2.length()};
// Ensure s1 is the shorter string
if (m > n) return levenshteinDistance(s2, s1);
std::vector<int> prevRow(m + 1);
std::vector<int> currRow(m + 1);
for (size_t i{0}; i <= m; ++i) prevRow[i] = i;
for (size_t j{1}; j <= n; ++j) {
currRow[0] = j;
for (size_t i{1}; i <= m; ++i) {
if (s1[i - 1] == s2[j - 1]) {
currRow[i] = prevRow[i - 1];
} else {
currRow[i] = 1 + std::min(
{prevRow[i],
currRow[i - 1],
prevRow[i - 1]
});
}
}
std::swap(prevRow, currRow);
}
return prevRow[m];
}
int main() {/*...*/}
The Levenshtein distance between 'kitten' and 'sitting' is: 3
This optimized version uses only two rows of the DP table at a time, significantly reducing memory usage for long strings.
The Levenshtein distance has many practical applications, including:
By implementing this algorithm efficiently with std::string
, we've demonstrated advanced string manipulation techniques and the power of dynamic programming in C++.
Answers to questions are automatically generated and may not have been reviewed.
std::string
ClassA detailed guide to std::string
, covering the most essential methods and operators