
Similarity of Two Text Strings

  1. #1
    Registered User | Join Date: 12-15-2006 | Posts: 64

    SUMMARY:

    I need a simple function that compares the text in two cells on the same row and determines when they are quite similar. I'd like the function to generate a score or similar statistic describing the percent or degree of similarity. Alternatively, I'd like it to let me set the level of similarity I consider acceptable and return one of two values.
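
    One common way to turn "how similar" into a number is the Levenshtein edit distance (the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other), scaled by the length of the longer string. The sketch below is in Python purely for illustration; the names `levenshtein` and `similarity_pct` are hypothetical, and dividing by the longer length is only one of several possible normalizations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (Wagner-Fischer dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the stored row short
    prev = list(range(len(b) + 1))  # row 0: distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_pct(a: str, b: str) -> float:
    """100 for identical strings, falling toward 0 as more edits are needed
    relative to the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty cells are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
print(round(similarity_pct("123 MAIN ST", "123 MAIN SR"), 1))  # 90.9
```

    Under this scaling, "within 5-10%" would roughly correspond to a score of 90-95 or higher, though the right cut-off would have to be checked against actual records.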

    BACKGROUND:

    I provide property data to the insurance industry. Much of my data comes from Central Appraisal District (CAD) listings, which are public information.

    Homeowner's insurance is written only for owner-occupied homes, whereas other policy types are written for rental, leased, renovating or foreclosed properties. Of these, homeowner's policies generate the highest commission for the agent, so owner-occupied data is what's in demand.

    CADs list both the owner's mailing address and the property street address. If these two match, we assume the owner lives on the site and include the property record in the data we provide to agents.
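
    As described, the current rule is a straight equality test. A minimal Python sketch, assuming (hypothetically) that differences in case and extra spaces should not count; the post does not say how the comparison is actually done, and `normalize` and `is_owner_occupied_exact` are illustrative names only:

```python
def normalize(address: str) -> str:
    """Hypothetical clean-up: upper-case and collapse runs of whitespace."""
    return " ".join(address.upper().split())


def is_owner_occupied_exact(mailing: str, site: str) -> bool:
    """Exact-match rule: owner occupied only if the addresses are equal."""
    return normalize(mailing) == normalize(site)


print(is_owner_occupied_exact("123 Main St", "123  MAIN ST"))  # True
print(is_owner_occupied_exact("123 Main St", "123 Mian St"))   # False
```

    A single swapped pair of letters is enough to make the second call fail.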

    Unfortunately, the CADs aren't perfect. About 10% of the time the data entry clerks don't type the same information into both fields, and the two entries are off by a character or two. As a result, about 10% of otherwise qualified properties are unnecessarily deleted from our data offerings.

    CADs take great care to make sure the mailing address is correct because they mail the tax bills to that address. They do not consider the accuracy of the property address important.

    AIM:

    So, what I'd like to do is compare two non-identical but similar addresses and calculate a percent or degree of similarity between them. If they are "sufficiently" similar (perhaps within 5-10%), I'd like the cell containing the function to return either the mailing address or the words "Owner Occupied".
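
    Python's standard library already provides a similarity ratio in `difflib.SequenceMatcher`. Note that it is not an edit distance: `ratio()` returns 2*M/T, where M is the number of characters in matching blocks and T is the combined length of both strings. The sketch below wraps it in a hypothetical `occupancy_result` function; the 0.90 threshold is an assumed reading of "within 5-10%", and returning an empty string for non-matches is also an assumption.

```python
from difflib import SequenceMatcher


def occupancy_result(mailing: str, site: str, threshold: float = 0.90,
                     return_address: bool = False) -> str:
    """Return the mailing address (or the words "Owner Occupied") when the two
    addresses are at least `threshold` similar; otherwise return ""."""
    # Compare case-insensitively; ratio() is 1.0 for identical strings.
    ratio = SequenceMatcher(None, mailing.upper(), site.upper()).ratio()
    if ratio >= threshold:
        return mailing if return_address else "Owner Occupied"
    return ""


print(occupancy_result("123 Main St", "123 MAIN SR"))  # Owner Occupied
print(occupancy_result("123 Main St", "123 MAIN SR", return_address=True))  # 123 Main St
```

    Here the longest common block is "123 MAIN S" (10 characters out of 22 in total), giving a ratio of about 0.91, just above the assumed threshold.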

    CONSTRAINTS:

    The function must be small and efficient, as we routinely work with sets of property records that approach a million rows with 25 or more columns. Resource hogs won't work in this situation.
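
    If the real question is only "are these within k edits?", the full distance is not needed. This hypothetical `within_k_edits` sketch (Python, for illustration) adds two cut-offs to the standard edit-distance table: a length check up front, and an early exit once every value in the current row exceeds k, which is safe because row minima never decrease. It is still quadratic in the worst case; a banded version that fills only cells within k of the diagonal would be faster at the cost of more code. The rule for choosing `k` in the usage lines is an assumption.

```python
def within_k_edits(a: str, b: str, k: int) -> bool:
    """True if the Levenshtein distance between a and b is at most k."""
    if abs(len(a) - len(b)) > k:
        return False  # at least |len(a) - len(b)| insertions/deletions needed
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        if min(curr) > k:
            return False  # every path through this row already costs > k
        prev = curr
    return prev[-1] <= k


mailing = "123 MAIN ST"
k = max(1, len(mailing) // 10)  # hypothetical: allow about 10% of the length
print(within_k_edits(mailing, "123 MAIN SR", k))  # True
print(within_k_edits(mailing, "123 MIAN ST", k))  # False (a swapped pair costs 2)
```

    The same loop structure could be ported to a worksheet function in whatever language the spreadsheet uses; most rows that differ badly are rejected after the length check or a few rows of the table.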

    THANKS:

    Any assistance would be greatly appreciated.

    LongFisher

  2. #2
    shg | Forum Expert | Join Date: 06-20-2007 | Location: The Great State of Texas | MS-Off Ver: 2010, 2019 | Posts: 40,689

    It's an interesting problem but a complicated one (to me, anyway). The statistic you want (I think) is the length of the shortest-path edit script required to transform one string into the other. Text diff utilities use one variation or another of the method. The attached paper is topical.
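
    The post doesn't name a specific method, and whether the attached paper describes it is not stated. One well-known algorithm for the length of a shortest edit script is Eugene Myers' greedy O(ND) algorithm ("An O(ND) Difference Algorithm and Its Variations", 1986), which underlies tools such as GNU diff and Git. It counts only insertions and deletions, so a changed character costs 2 rather than 1 as in Levenshtein distance. A compact Python sketch; the name `ses_length` is hypothetical:

```python
def ses_length(a: str, b: str) -> int:
    """Length D of a shortest edit script (insertions + deletions only)
    turning a into b, using Myers' greedy forward search."""
    n, m = len(a), len(b)
    v = {1: 0}  # v[k] = furthest x reached on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insertion) from diagonal k+1, or right (deletion)
            # from diagonal k-1, whichever got further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake" of matching characters at no cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices


print(ses_length("ABCABBA", "CBABAC"))  # 5
print(ses_length("123 MAIN ST", "123 MAIN SR"))  # 2
```

    Its running time grows with D, so near-identical addresses, the common case here, are cheap to compare.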

    I played with this for several hours a few months ago, and got through about half the coding.

    I don't know of a quick hack to approximate a solution.
