Removing rows based on duplicates?

**spyrule** · 10-26-2007, 11:34 AM

Hello,

I am wanting to write a macro, that can remove all rows that are duplicates, except 1.

Scenario:

I get lists of existing customers from several locations (geographically, and seperate network database's), so I cannot pre-filter the data. I simply copy these entire rows of data, onto one worksheet. I then have a central list which i filter against.

However, usually I am left with a list of 5-10,000 records which are duplicates. Now the duplicate count can be anywhere from 2 to 5 duplicates of the same phone number (that's all I filter against).

Currently I have a column with : =COUNTIF(C:C,C2)>1
I run this on ever row, and then sort the data based on the boolean value.

Now what I need help with, is vba code that could process through all the lines that column G = true, and then scan that entire "section" of numbers, and erase all rows that are duplicates, and only retain 1 copy of each number (the whole row of that number).

The idea I had would be something like, if cell(g) = true, then step through the remaining rows, copy first number to a new worksheet, then erase that row, step to the next row, and check if that number matches any number in the new worksheet, if not, copy that row to the new worksheet, if it is already there, then delete that row.

Here is a sample file of the kind of data that I am talking about :

sample.zip

ANY help, or suggestions on this would be greatly apreciated.

thank you in advance,

Spyrule.

**BigBas** · 10-26-2007, 12:41 PM

I made a revision to your formula. Now, it will only return TRUE for the 2nd (and 3rd, 4th, etc) duplicate. So, if you have ABC123 3 times, the first will return false, the second two will return TRUE.

I also added a macro that will delete all of the TRUE entries.

Let me know if it helps.

**spyrule** · 10-26-2007, 01:21 PM

Thank you SO VERY MUCH!.

You have no idea how much time you just saved me!! LOL.

Cheers,

Spyrule.

For those who are wondering, here is the change in formulae that he did, so that other can benefit :

original formula : =countif(C:C,C1)>1

new formula : =countif($C$1:C1,C1)>1

What this does, is grow the comparison list from the top down, so it will automatically scan ALL numbers by the end of the list.

Allthough I suspect this formula would only work once the list has been
sorted by the comparison column (in this case the phone numbers in column C).

Also the auto-delete macro :

Sub Delete_Duplicates()
Dim LastRow As Long

LastRow = Cells(Cells.Rows.Count, 6).End(xlUp).Row

Application.ScreenUpdating = False
For i = LastRow To 3 Step -1
If Cells(i, "F").Value = True Then
Cells(i, "F").EntireRow.Delete
End If
Next i
Application.ScreenUpdating = True
End Sub

Simply update the "F" with whatever the column it is you want to verify.

Thank you again BigBas, this is awesomely apreciated.

Spyrule.

**BigBas** · 10-26-2007, 03:20 PM

You're welcome. I am glad I can help.

One thing to note. You mentioned this:

Allthough I suspect this formula would only work once the list has been
sorted by the comparison column (in this case the phone numbers in column C).

That is not the case. The list does not have to be sorted. If you just follow the formula down the column, it becomes clear why it works. It only queries the column down to the cell itself. For example, cell A15 only does a COUNTIF from A1:A15. So, if it is the first entry, it will receive FALSE (not a duplicate). If it is not the first, then it will be TRUE.