+ Reply to Thread
Results 1 to 17 of 17

Remove Duplicates from very large comma delimited Excel file

  1. #1
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Remove Duplicates from very large comma delimited Excel file

    Hello,

    I would like to get ideas on how to remove duplicates from an extremely large Excel file. We have updates to this file regularly, where we need to clean it for duplicates, and it is arduous.

    The are over 229,000 rows. I have tried various macros, and parsed out the data as well, but I end up with memory errors due to its size.

    Here is an example of what the data consists of:

    PLB,TOL,ZMM,DAT,LD2,FID,NC0,REC,UMT
    NC0,REC,ZMM,UMT,DAT,PLB,FID,TOL,LD2
    NC0,PLB,DAT,REC,TOL,CB5,LD2,FID
    CB5,PLB,DAT,NC0,LD2,REC,TOL,FID
    REC,PLB,TOL,NC0,FID,CB5,LD2,DAT
    REC,NC0,LD2,PLB,DAT,TOL,FID
    TOL,REC,NC0,LD2,PLB,UD2,DAT,CB5,FID
    NC0,FID,LD2,MC5,REC,TOL,PLB,TMS,DAT
    LD2,CB5,TOL,FID,REC,DAT,NC0

    Each row above is in Column A to begin with.

    Any ideas on how to reduce this would be great.

    Thanks,

    Larry

  2. #2
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Remove Duplicates from very large comma delimited Excel file

    advancedfilter:

    Please Login or Register  to view this content.
    or using a string to check for duplicates
    an indication of a unique item will be put into column B
    empty cells in column B (and their respective rows) will be deleted.
    Please Login or Register  to view this content.
    Last edited by snb; 12-05-2011 at 01:13 PM.



  3. #3
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    Hi SNB,

    I tried both of your macros and neither worked. On the first one, Excel ran for an exceptionally long period of time before I can to close the program and go back in. It seems to have hung up Excel.

    The second macro gave an error Run-time error "9". Subscript out of range.

  4. #4
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2010, 2019
    Posts
    40,689

    Re: Remove Duplicates from very large comma delimited Excel file

    Can you explain what rows would get deleted from your example, and why?

    Please Login or Register  to view this content.
    Entia non sunt multiplicanda sine necessitate

  5. #5
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Remove Duplicates from very large comma delimited Excel file

    well of course: 1 missing

    Please Login or Register  to view this content.
    Last edited by snb; 12-06-2011 at 04:41 AM.

  6. #6
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    Sorry to say that didn't work either. Here is my problem. The sample data I showed above is listed all in Column A of the spreadsheet. It's a jumble of repetitive three character codes (alphanumeric). They fill up over 229,000 rows. I am trying to just clean up this list to show only one unique code per row in Column A (can be any column, but using this as an example).

    When I have tried the Advanced Filter, Excel freezes. The same is true is I do text to columns, then try to clean up the data with the Remove Duplicates function.

    In the example I gave, there are actually only 13 unique codes out of the 74 total (if they were all lined up in column individually. In other words, I am trying to put one code per cell per row.

    When I ran your macro revision, it deleted everything when I ran it as the comma delimited data. If I were to remove that info, and put one code per row, it deleted a few rows, but not all.

    I think Excel can do this, but the volume of initial data is what is causing an issue. I thought of running a pivot table too, then extracting from that, but it still hung up.

    I am thinking that if the macro or formula could clean the data in the comma delimited format first, then that would work, but I haven't been successful at that.

    Anyway, if it helps, from the example above, the unique values after duplicates have been removed would be:
    Please Login or Register  to view this content.

  7. #7
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Remove Duplicates from very large comma delimited Excel file

    please post a sample workbook containing 100 rows.
    or use (assuming 3-letter combinations only in column A, separated by a comma):

    Please Login or Register  to view this content.
    Last edited by snb; 12-06-2011 at 06:54 AM.

  8. #8
    Forum Expert nilem's Avatar
    Join Date
    10-22-2011
    Location
    Ufa, Russia
    MS-Off Ver
    2013
    Posts
    3,377

    Re: Remove Duplicates from very large comma delimited Excel file

    as an option
    Please Login or Register  to view this content.

  9. #9
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    Here is a sample workbook of the data. Thanks in advance for your help.
    Attached Files Attached Files

  10. #10
    Forum Expert
    Join Date
    01-15-2007
    Location
    Brisbane, Australia
    MS-Off Ver
    2007
    Posts
    6,591

    Re: Remove Duplicates from very large comma delimited Excel file

    Hi

    try this
    Please Login or Register  to view this content.
    I just repeatedly copied your data until it had about 230k rows, and it took about 11 seconds to action. Output is in column AZ

    rylo

  11. #11
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    Hi Rylo,

    I tried your macro and perhaps I am doing something wrong, but I am not getting the results that you are. For example, if you were to extract this data into SQL or Access, and you parse out the data in all cells (again, the data is comma delimited), and I import the data into the database by removing duplicates, I end up with 144 unique three alphanumeric codes.

    We get this data in large bulk each month, and I was trying to see if I could do this function in Excel.

  12. #12
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    I think I know what I am not explaining as i read the thread. I have been wanting to have one three alphanumberic code per cell in the column. So, if cell A1 has "CLF, H05, ABC, ZBA, TB1" and cell A2 has "XYZ, H05, TB1, ABC, CLF", then the desired result would be in column A as follows:

    --A1--
    1 CLF
    2 H05
    3 ABC
    4 ZBA
    5 TB1
    6 XYZ

    As you can see, the initial data had duplicated in cell A2 for every code except "XYZ". So, in this example, the first initial cell parsed the data vertically (and there were no dups in the first cell), then in cell A6, only XYZ would be placed there since it was the only non-duplicate after it was compared to the cells above.

    Does this make sense?

    Thanks,

    Larry

  13. #13
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Remove Duplicates from very large comma delimited Excel file

    Did you try the suggestion I gave ?

    Implemented in the attachment.
    Attached Files Attached Files
    Last edited by snb; 12-08-2011 at 05:47 PM.

  14. #14
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    Rylo, it looks like that worked! I realized after looking more closely at the code that I have to select the column to get it to work. I am going to double check this in SQL the count, but looking at it now, it works great. And yes, it was very quick (5 to 10 seconds)!

    Do you think there is a way for macro to automatically select the data starting in cell A1, and down that column, or should I just select it myself?

    How would you suggest to delete columns A through AY, so the updated data is in Column A? I was going to use

    Range("A1:AY1").EntireColumn.Delete at the end of your code just before the MsgBox line.

    Thanks Again!

    Larry

  15. #15
    Forum Expert
    Join Date
    01-15-2007
    Location
    Brisbane, Australia
    MS-Off Ver
    2007
    Posts
    6,591

    Re: Remove Duplicates from very large comma delimited Excel file

    Larry

    Sorry, that was me being lazy and using the recorder to do the "text to columns" bit, then working from there.

    I think changing
    Please Login or Register  to view this content.
    to
    Please Login or Register  to view this content.
    should do it.

    2 Your code to delete should be OK. Another way is
    Please Login or Register  to view this content.
    rylo
    Last edited by rylo; 12-08-2011 at 06:10 PM.

  16. #16
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Remove Duplicates from very large comma delimited Excel file

    I had to reload the attachment (mysteriously vanished)

    On second thought:

    - you probably get these data in a csv or txt file (flat ASCII)
    - in that case you won't need to open those as an Excel Workbook
    - than probalby the fastest method is:

    Please Login or Register  to view this content.
    Last edited by snb; 12-09-2011 at 05:33 AM.

  17. #17
    Registered User
    Join Date
    12-02-2011
    Location
    San Diego, CA
    MS-Off Ver
    Excel 2010
    Posts
    12

    Re: Remove Duplicates from very large comma delimited Excel file

    Hi rylo & snb,

    snb -- I tried your code, and it came up with a bad file name, so I probably put in a bad location. You are correct though that the data comes in as a csv/txt file.

    rylo, I added just a few things to your code and it looks like it is running great. The average time is 8 to 9 seconds.

    Here is the code in case you or anyone else is interested.

    Please Login or Register  to view this content.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1