+ Reply to Thread
Results 1 to 5 of 5

Combining Multiple Spreadsheets (semi-duplicates)

Hybrid View

  1. #1
    Registered User
    Join Date
    03-18-2011
    Location
    Melbourne, Australia
    MS-Off Ver
    Excel 2010
    Posts
    11

    Combining Multiple Spreadsheets (semi-duplicates)

    Hi all,

    I know it's been posted before but I can't seem to get my head around it, so I thought I'd start a new thread and show my working example file.

    I have 3 worksheets in one .xls file and they are lists of restaurants. They have been scraped from different sources but there's approx 70% overlap. Problem is, from one source the relevant data might
    a.) be slightly different from the other sources (so I need to define which is the default)
    b.) one source might have more information (extra columns) than the other sources (perhaps including the default source).

    How do I merge them?

    I've attached a sample of what the working excel file looks like, and a sample of the outcome.
    Assume the first worksheet is the default (if overlap, choose Data Capture 1 as right), then Data Capture 2, and lastly 3.

    Hope you guys can help!
    Attached Files Attached Files

  2. #2
    Forum Guru (RIP) Marcol's Avatar
    Join Date
    12-23-2009
    Location
    Fife, Scotland
    MS-Off Ver
    Excel '97 & 2003/7
    Posts
    7,216

    Re: Combining Multiple Spreadsheets (semi-duplicates)

    Try this
    Option Explicit
    
    Sub Main()
        CopyToOneSheet
        CombineLikeDataRemoveDuplicates
        DeleteOldSheets
    End Sub
    
    Private Sub CopyToOneSheet()
        Dim ws As Worksheet
        Dim LastRow As Long, NextRow As Long
    
        'Add new sheet to workbook
        Sheets.Add After:=Sheets(Sheets.Count)
        ActiveSheet.Name = "Combined"
        
        Set ws = ActiveSheet
        'Copy the first sheet with headers to "Combined"
        With Sheets("Data Capture 1")
            LastRow = .UsedRange.Rows.Count
            .Rows("1:" & LastRow).Copy ws.Range("A1")
        End With
        'Copy the next sheets without headers to "Combined"
        With Sheets("Data Capture 2")
            LastRow = .UsedRange.Rows.Count
            NextRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
            .Rows("2:" & LastRow).Copy ws.Range("A" & NextRow)
        End With
        With Sheets("Data Capture 3")
            LastRow = .UsedRange.Rows.Count
            NextRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
            .Rows("2:" & LastRow).Copy ws.Range("A" & NextRow)
        End With
        
        'Sort the new sheet data on column A
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row
        With ws.Sort
            .SortFields.Clear
            .SortFields.Add Key:=Range("A2:A" & LastRow), SortOn:=xlSortOnValues, _
                            Order:=xlAscending, DataOption:=xlSortNormal
            .SetRange Range("A1:L" & LastRow)
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .Apply
        End With
        ws.Range("A:L").EntireColumn.AutoFit
        
    End Sub
    
    Private Sub CombineLikeDataRemoveDuplicates()
        Dim ws As Worksheet
        Dim LastRow As Long, RowNo As Long, ColNo As Long
    
        Set ws = Sheets("Combined")
        With ws
            LastRow = .Range("A" & Rows.Count).End(xlUp).Row
            For RowNo = LastRow To 2 Step -1
                If .Cells(RowNo, 1) = .Cells(RowNo - 1, 1) Then
                    For ColNo = .Columns("B").Column To .Columns("L").Column
                        If .Cells(RowNo - 1, ColNo) = "" Then
                            .Cells(RowNo - 1, ColNo) = .Cells(RowNo, ColNo)
                        End If
                    Next
                    .Cells(RowNo, ColNo).EntireRow.Delete
                End If
            Next
        End With
    End Sub
    
    Private Sub DeleteOldSheets()
        On Error GoTo ResetApplication
        Application.DisplayAlerts = False
        
        Sheets("Data Capture 1").Delete
        Sheets("Data Capture 2").Delete
        Sheets("Data Capture 3").Delete
        
    ResetApplication:
        Err.Clear
        On Error GoTo 0
        Application.DisplayAlerts = True
    End Sub

    This will copy all the sheets to a new sheet.
    Duplicate entries are checked for empty cells and these are filled from the row below, if two rows have data in the same column, the first row takes precedence.
    The "duplicate" row is then deleted
    Finally the old worksheets are deleted.

    If you need to keep the original data for any reason, including checking, use Save As to save the result, if not just save the workbook.

    Try this on a copy of your real workbook first. To run the code use the macro "Main"

    The code is not meant to be clever, I have tried to keep it as basic as possible, just one step up from the macro-recorder, that should help you follqw it easily and make any changes to suit your real sheet that you might need.

    Hope this helps
    Attached Files Attached Files
    If you need any more information, please feel free to ask.

    However,If this takes care of your needs, please select Thread Tools from menu above and set this topic to SOLVED. It helps everybody! ....

    Also
    اس کی مدد کرتا ہے اگر
    شکریہ کہنے کے لئے سٹار کلک کریں
    If you are satisfied by any members response to your problem please consider using the small Star icon bottom left of their post to show your appreciation.

  3. #3
    Forum Guru (RIP) Marcol's Avatar
    Join Date
    12-23-2009
    Location
    Fife, Scotland
    MS-Off Ver
    Excel '97 & 2003/7
    Posts
    7,216

    Re: Combining Multiple Spreadsheets (semi-duplicates)

    I said in my previous Post
    The code is not meant to be clever, I have tried to keep it as basic as possible......
    This is no excuse for bad practice

    UsedRange is unreliable and should best be avoided

    Change this Line (3 Places)
            LastRow = .UsedRange.Rows.Count
    To this
            LastRow = .Cells.Find("*", .Cells(.Rows.Count, .Columns.Count), _
                                  SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row

    Thanks for the reminder Jerry

    Apologies again

  4. #4
    Registered User
    Join Date
    03-18-2011
    Location
    Melbourne, Australia
    MS-Off Ver
    Excel 2010
    Posts
    11

    Re: Combining Multiple Spreadsheets (semi-duplicates)

    Hey guys, it works well in theory but not so well in application. I'll attach the new data set to show you what I mean.

    1. Entertainment Book's address field is the most accurate than the others but doesn't separate street number for street name.
    2. Conoisseur Club and UrbanSpoon have extra fields like menu, logo, more Bio entries, meal price.
    3. Entertainment Book has an Up To value for their discounts
    4. I'd believe for a duplicate listing, Entertainment Book's phone number listing to be more right than the others and bio to be more right than the others.
    5. In the event that the listings aren't spelt the exact same way, that's fine we can manually edit those if the other fields seem to be the same.
    6. Entertainment Book's Type/Cuisine Type - Ideally these should have been merged to be the same field - either cuisine type or the others (retail, activity, travel).

    I know it's a bit of work, happy to commission someone to help me get it perfect and clean.

  5. #5
    Registered User
    Join Date
    03-18-2011
    Location
    Melbourne, Australia
    MS-Off Ver
    Excel 2010
    Posts
    11

    Re: Combining Multiple Spreadsheets (semi-duplicates)

    http://rapidshare.com/files/454409762/DataFiles.zip

    The excelforum for some reason wouldn;t let me upload it.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1