+ Reply to Thread
Results 1 to 4 of 4

Data Analysis to detect duplication

Hybrid View

  1. #1
    Registered User
    Join Date
    02-27-2010
    Location
    KL
    MS-Off Ver
    Excel 2003
    Posts
    19

    Data Analysis to detect duplication

    Dear Admin,

    I got one question here, I need to compile my company address book from 4 workbooks into 1 worksheet. I am very sure that there are some duplication within the records. I was used the Advanced Filter function to extract unique records. But, this function cannot detect some scenarios, please see example:

    A1
    Restaurant Tai Lian private owned
    Restaurant Tai Lian
    Restaurant Tai Lian S/B

    A1
    Astron Martin S/B
    Astron Martin

    A1
    Sime Darby S/B
    Company Sime Darby

    I might can detect the first 2 examples using sorting, but the 3rd example is hardly to use sorting. Furthermore, I got 10k records need to massage. I am wonder is there an easier way or faster to perform the house keep jobs?

    Thanks and Regards

  2. #2
    Forum Expert JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Data Analysis to detect duplication

    Perhaps an adjacent column where you remove all these known "strings" that make the names not match. This will remove "private owned" from all the strings...

    =TRIM(SUBSTITUTE(A1, "private owned", ""))

    You can add more like so:

    =TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1, "private owned", ""),"S/B", ""),"Company", ""))

    Copy the formula down the whole column, then do your sorting/match tests.
    _________________
    Microsoft MVP 2010 - Excel
    Visit: Jerry Beaucaire's Excel Files & Macros

    If you've been given good help, use the icon below to give reputation feedback, it is appreciated.
    Always put your code between code tags. [CODE] your code here [/CODE]

    ?None of us is as good as all of us? - Ray Kroc
    ?Actually, I *am* a rocket scientist.? - JB (little ones count!)

  3. #3
    Registered User
    Join Date
    02-27-2010
    Location
    KL
    MS-Off Ver
    Excel 2003
    Posts
    19

    Re: Data Analysis to detect duplication

    Hi Admin,

    I am appreciate your suggestion, but to me it is seem like not working well as there are 10k lines, I am hardly to do these. If there any formula has function like Vlookup but return all possible values without exact match also throw out ?

    Thanks and Regards!

  4. #4
    Forum Expert JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Data Analysis to detect duplication

    Nope...the concept of "almost matches" is a human concept. Any formula or macro will have to be explicitly taught how to logically decide what "close enough" means on a case by case basis. As you work out each exception, it should apply to multiple cases, but you'll still have to work it out.

    A dataset of 10k entries should be broad enough to help you acquire a logic for dealing with most situations, that's a lot data.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1