Hi folks,
I have a question that is similar but different to an earlier post (April 26 to be precise - same title). I am a relative beginner with VB so it would be fantastic if someone could lend me a hand.
I have two columns of text each with 1,000+ lines. I want a formula that generates a percentage match for the words in cells of the same row (eg cells B1 and C1).
I have a formula (courtesy of the earlier post) that counts the number of differing characters in the two cells (see below for VBA code). What I really want is a formula that counts the number of differing words.
Example:
Cell A1: the cat sat on the mat
Cell B1: the hat was squashed by the cat
The Levenshtein formula (code and link below) gives me an output of 14 ie, there are 14 different characters in cell B1 than in cell A1. The result I am looking for is that there are 4 words used in cell B1 that are not used in cell A1.
If there is a way to measure the change in the order of words I would be interested in that also, but I realise that is probably very complicated.
Any help is much appreciated!
The formula I have was posted on this forum by SpeedingLunatic in April 2010 and is a Levenshtein measure. The VBA was borrowed from here: http://en.wikibooks.org/wiki/Algorit...htein_distance and is below...
![]()
Function levenshtein(a As String, b As String) As Integer Dim i As Integer Dim j As Integer Dim cost As Integer Dim d() As Integer Dim min1 As Integer Dim min2 As Integer Dim min3 As Integer If Len(a) = 0 Then levenshtein = Len(b) Exit Function End If If Len(b) = 0 Then levenshtein = Len(a) Exit Function End If ReDim d(Len(a), Len(b)) For i = 0 To Len(a) d(i, 0) = i Next For j = 0 To Len(b) d(0, j) = j Next For i = 1 To Len(a) For j = 1 To Len(b) If Mid(a, i, 1) = Mid(b, j, 1) Then cost = 0 Else cost = 1 End If min1 = (d(i - 1, j) + 1) min2 = (d(i, j - 1) + 1) min3 = (d(i - 1, j - 1) + cost) d(i, j) = Application.WorksheetFunction.Min(min1, min2, min3) Next Next levenshtein = d(Len(a), Len(b)) End Function
Bookmarks