+ Reply to Thread
Results 1 to 6 of 6

Efficient deduping without "Remove Duplicates"

Hybrid View

Jbm444 Efficient deduping without... 06-25-2010, 12:35 PM
shg Re: Efficient deduping... 06-25-2010, 12:39 PM
Jbm444 Re: Efficient deduping... 06-25-2010, 01:00 PM
shg Re: Efficient deduping... 06-25-2010, 05:02 PM
Jbm444 Re: Efficient deduping... 06-25-2010, 05:15 PM
shg Re: Efficient deduping... 06-25-2010, 05:24 PM
  1. #1
    Registered User
    Join Date
    06-11-2010
    Location
    grinnell, iowa
    MS-Off Ver
    Excel 2007
    Posts
    79

    Efficient deduping without "Remove Duplicates"

    Hey,
    I have a list of about 100 websites, and I need to dedup this list. The problem is, the duplicates I need to remove take the form of:
    http[://www].google.com/oneuniquepage/furtherstuff/evenmorestuff/andjustalittlemoreafterthat
    http:[//www].google.com/seconduniquepage/onlyoneextrathistime
    http:[//www].google.com/thirduniquepage
    (extra brackets so I'm not posting urls in this post).
    So that even though I have google.com one time and I only want it listed once, I can't just hit the remove duplicates button because of the mess after the /. The solution I came up with works, but it is extremely slow.
    Sub Macro1()
    On Error Resume Next
    Dim mylastrowe As Long
    
    With Sheet3
    mylastrowe = .Cells.Find("*", .Cells(.Rows.Count, .Columns.Count), SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    
        Sheet3.Columns("B:B").TextToColumns Destination:=Range("B1"), DataType:=xlDelimited, _
            TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
            Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _
            :="/", FieldInfo:=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1), Array(5, _
            1), Array(6, 1), Array(7, 1)), TrailingMinusNumbers:=True
        Sheet3.Columns("D:D").Replace What:="www.", Replacement:="", LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False
            Sheet3.Columns("B:B").Replace What:=":", Replacement:="://", LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False
                Sheet3.Sort.SortFields.Clear
        Sheet3.Sort.SortFields.Add Key:=Range("D1:D" & mylastrowe), _
        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
        With Sheet3.Sort
            .SetRange Range("A1:P" & mylastrowe)
            .Header = False
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With
        removeDups
        Sheet3.Range("C1").FormulaR1C1 = "=CONCATENATE("""",RC[-1],"""",RC[1],""/"",RC[2],""/"",RC[3],""/"",RC[4],""/"",RC[5],""/"",RC[6],""/"",RC[7],""/"",RC[8],""/"",RC[9],""/"",RC[10],""/"",RC[11],""/"",RC[12],""/"",RC[13],""/"",RC[14],""/"",RC[15],""/"",RC[16],"""")"
       Macro8
       Macro9
       RemoveTrailingSlash2
       Clear
       End With
    End Sub
    Sub Macro8()
    Dim mylastrow As Long
    
    With Sheet3
    mylastrow = .Cells.Find("*", .Cells(.Rows.Count, .Columns.Count), SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
        .Range("C1").AutoFill Destination:=.Range("C1:C" & mylastrow), Type:=xlFillDefault
    End With
    End Sub
    Sub RemoveTrailingSlash2()
        Dim lastrow As Long
        Dim i As Integer
        
        lastrow = Sheet3.Range("B" & Rows.Count).End(xlUp).Row
        With Sheet3
        For i = 1 To lastrow
            Do While Right(Sheet3.Range("B" & i), 1) = "/"
                Sheet3.Range("B" & i) = Left(Sheet3.Range("B" & i), Len(Sheet3.Range("B" & i)) - 1)
            Loop
        Next
        End With
    End Sub
    
    Sub Macro9()
    With Sheet3
        Sheet3.Columns("B:B").ClearContents
        End With
    With Sheet3
        Sheet3.Columns("C:C").Copy
        Sheet3.Columns("B:B").PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
            :=False, Transpose:=False
    End With
    End Sub
    Sub removeDups()
    Dim mylastrow As Long
    
    With Sheet3
    mylastrow = .Cells.Find("*", .Cells(.Rows.Count, .Columns.Count), SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    
    Sheet3.Activate
    .Cells.Select
    .Range("A1:P" & mylastrow).RemoveDuplicates Columns:=4, Header:=xlNo
    .Range("C13").Select
        End With
    End Sub
    
    Sub Clear()
    '
    ' Macro9 Macro
    '
    
    With Sheet3
        Sheet3.Columns("C:S").ClearContents
        End With
    End Sub
    Essentially, this delimits the list by slashes, uses Remove Duplicates, then concatenates it all back together. The major complication that I see is that the concatenation requires me to define it until about column Z so that I don't miss any of the original website address (there may be a lot of /s). This creates a new problem that makes me have to remove a trailing slash from all these cells which don't have a lot of /s. It seems to me that I shouldn't be writing code to defeat problems raised by my own code. As a consequence of my screwy (though working) system, my code is extremely slow to run (I have to run it on more than just one worksheet). Does anyone have an idea for improving the efficiency of this code, especially with regards to eliminating the need to solve problems with code created by code before it? Thanks for the help.
    Last edited by Jbm444; 06-25-2010 at 12:37 PM.

  2. #2
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2010, 2019
    Posts
    40,689

    Re: Efficient deduping without "Remove Duplicates"

    Maybe post an example of the data, with items to be deleted highlighted and the rationale.
    Entia non sunt multiplicanda sine necessitate

  3. #3
    Registered User
    Join Date
    06-11-2010
    Location
    grinnell, iowa
    MS-Off Ver
    Excel 2007
    Posts
    79

    Re: Efficient deduping without "Remove Duplicates"

    Here's an example. Sheet3 has some websites in column B, with duplicates, and Sheet4 shows how Sheet3 should look (approximately) after the code is done. If you run the code I posted above on it, it works (it looks slightly different than Sheet4, but how it ends up is acceptable in that there are no duplicates and that the links still work if copy/pasted into a url bar).
    The difference is, that data in my actual workbook goes on for over hundred lines in the same fashion, and the code gets exponentially slower with more data. This is compounded by the fact that I'm running the same code on multiple sheets.
    Attached Files Attached Files

  4. #4
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2010, 2019
    Posts
    40,689

    Re: Efficient deduping without "Remove Duplicates"

    Try this. Note the refernce required.
    Sub x()
        Dim wks         As Worksheet
        Dim av          As Variant
    
        Set wks = ActiveSheet
        With wks
            With Intersect(.UsedRange, .Columns("B"))
                .Replace What:="//", Replacement:="|", LookAt:=xlPart
                .TextToColumns Destination:=.Cells(1), _
                               DataType:=xlDelimited, _
                               TextQualifier:=xlNone, _
                               ConsecutiveDelimiter:=False, _
                               Tab:=False, _
                               Semicolon:=False, _
                               Comma:=False, _
                               Space:=False, _
                               Other:=True, OtherChar:="/"
                av = Unique(.Cells, 1, False)
            End With
            .Cells.ClearContents
            .Range("A1").Resize(UBound(av) + 1).Value = WorksheetFunction.Transpose(av)
            .Columns("A").Replace What:="|", Replacement:="//"
        End With
    End Sub
    
    Function Unique(r As Range, _
                    Optional iMode As Long = 0, _
                    Optional bCaseSensitive As Boolean = False) As Variant
        ' Requires a reference to Microsoft Scripting Runtime
        ' shg 2008
        '     2010 -- added option to return the dictionary
    
        ' iMode Return
        '   0   Count of unique elements of r
        '   1   1xn array of uniques of r
        '   2   2xn array of uniques of r and their counts
        '   3   Dictionary of unique of r and their counts (VBA only)
    
        Dim dic     As Scripting.Dictionary
        Dim cell        As Range
    
        Set dic = New Scripting.Dictionary
        With dic
            .CompareMode = IIf(bCaseSensitive, BinaryCompare, TextCompare)
    
            For Each cell In Intersect(r, r.Parent.UsedRange)
                If Len(cell.Text) Then
                    If .Exists(cell.Value) Then
                        If iMode >= 2 Then .Item(cell.Value) = .Item(cell.Value) + 1
                    Else
                        .Add Key:=cell.Value, Item:=1
                    End If
                End If
            Next cell
    
            Select Case iMode
                Case 0: Unique = .Count
                Case 1: Unique = .Keys
                Case 2: Unique = Array(.Keys, .Items)
                Case 3: Set Unique = dic
            End Select
        End With
    End Function

  5. #5
    Registered User
    Join Date
    06-11-2010
    Location
    grinnell, iowa
    MS-Off Ver
    Excel 2007
    Posts
    79

    Re: Efficient deduping without "Remove Duplicates"

    I tried to look up what you mean by the reference, but I couldn't find much on it for Excel 2007. This is the closest I got: http://msdn.microsoft.com/en-us/libr...ice.10%29.aspx

  6. #6
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2010, 2019
    Posts
    40,689

    Re: Efficient deduping without "Remove Duplicates"

    In the VBE, Tools > Reference, scroll down (other than 'sticky' commonly-used references, they're in alphabetical order). Tick the box.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1