Edit-Distance wiht/without N-Gram
- tags
- #python #algorithm
- published
- reading time
- 2 minutes
This project implements the Edit Distance algorithm with and without N-Gram support, providing a versatile tool for string similarity and comparison tasks in various domains.
This project implements the Edit Distance algorithm with and without N-Gram support. The Edit Distance algorithm calculates the minimum number of operations required to transform one string into another. It is useful for tasks like spell checking, DNA sequence analysis, and natural language processing.
The implementation consists of two main parts: the Edit Distance algorithm itself and the N-Gram support.
# Code snippet for the edit_distance function
def edit_distance(string1, string2):
# Implementation of the edit distance algorithm
# ...
return distanceIn addition to the basic Edit Distance algorithm, this project also incorporates N-Gram support. N-Grams are contiguous subsequences of characters or words in a given input string. They are used to enhance the Edit Distance algorithm by considering the similarity between two strings based on their shared N-Grams.
# Code snippet for the nGramCreator function
def nGramCreator(input_string, n):
# Implementation of the N-Gram generation
# ...
return ngramsThe Jaccard function computes the Jaccard similarity between two sets of N-Grams. This similarity measure helps in determining the similarity between two strings based on their N-Grams.
The project includes test functions to evaluate the performance of the Edit Distance algorithm with and without N-Gram support. For example, the TestEditDistance function tests the algorithm on a dictionary of Italian words and finds the word with the minimum edit distance from the input word.
# Code snippet for the TestEditDistance function
def TestEditDistance(input_word, dictionary):
# Test function to find the word with minimum edit distance
# ...
return resultThe project also includes various test cases to evaluate the algorithm’s performance in different scenarios, such as testing on words with added characters, removed characters, and swapped characters.
The project generates graphs to visualize the performance of the Edit Distance algorithm with and without N-Gram support. These graphs provide insights into the efficiency of the algorithms in different scenarios.
You can find my full project on this Github Repository.