+ Reply to Thread
Results 1 to 4 of 4

Identifying complex duplicates

  1. #1
    clsnyder@gmail.com
    Guest

    Identifying complex duplicates

    Hi

    I have a spreadsheet with > 40,000 rows. An abbreviated example:
    Date RecNum Code Amt Secondary
    1/1/2005 453654 44324 550.54 no
    12/24/2004 546676 54161 65.66 no
    12/24/2004 546676 44970 900.44 no
    12/24/2004 546676 43830 no
    7/2/2003 54161 54640 773.21 no


    The problem:
    I need to identify dates where the RecNum occurs more than once. All
    but one occurrence on the same date should have the secondary column as
    'yes'. (In other words, if one RecNum (=person) has several codes on
    the same day, one of the codes is primary, and all the others are
    secondary. Ideally, I would look at amount to decide which is the
    primary code and which are the secondary (biggest amount = primary).
    Sometimes, no amount is given (ie, the data is crappy).

    I assume I need to create a new column of date + RecNum (?), then check
    for duplicates, and use an if..then conditional to compare amounts...

    Any guidance would be appreciated....

    TIA


  2. #2
    Registered User
    Join Date
    01-15-2006
    Location
    Mid-Michigan
    Posts
    5
    Without supplying an exact solution, consider the following approach as a set of macros:

    1) Sort entire table as descending amount
    2) turn on autofilters for all columns
    3) start with the first data row, moving down towards the last row on each iteration
    4) extract the date value on the current row.
    5) turn on a filter for that date only
    6) set the first result row as yes, the others as no
    7) undo the autofilter selection by date
    8) increment your selection row, go back to 4)

    Of course, add in needed error checking etc to deal with faulty data.

  3. #3
    Toppers
    Guest

    RE: Identifying complex duplicates

    Hi,

    This assumes data (columns A to E) is sorted by DATE (Ascending),
    RECNUM(Ascending) and AMT (Descending). I have done limited testing!

    HTH

    Sub FindDuplicates()

    Dim DateRng As Range
    Dim lastrow As Long, r As Long
    Dim nrec As Long, nrecnum As Long, ndate As Long

    lastrow = Cells(Rows.Count, 1).End(xlUp).Row
    Set DateRng = Range("a2:a" & lastrow)

    r = 2
    Compdate = Range("a" & r).Value
    Do
    ndate = Application.CountIf(DateRng, Compdate)
    If ndate > 1 Then
    nrec = 0
    Do
    nrecnum = Application.CountIf(Range(Cells(r, 2), Cells(ndate + r - 1,
    2)), Cells(r, 2))
    Cells(r, 5) = "Yes"
    If nrecnum > 1 Then
    Cells(r + 1, 5).Resize(nrecnum - 1, 1) = "No"
    End If
    r = r + nrecnum
    nrec = nrec + nrecnum
    Loop While nrec < ndate
    Else
    Cells(r, 5) = "Yes"
    r = r + 1
    End If


    Compdate = Range("a" & r).Value

    Loop While r <= lastrow
    End Sub

    "clsnyder@gmail.com" wrote:

    > Hi
    >
    > I have a spreadsheet with > 40,000 rows. An abbreviated example:
    > Date RecNum Code Amt Secondary
    > 1/1/2005 453654 44324 550.54 no
    > 12/24/2004 546676 54161 65.66 no
    > 12/24/2004 546676 44970 900.44 no
    > 12/24/2004 546676 43830 no
    > 7/2/2003 54161 54640 773.21 no
    >
    >
    > The problem:
    > I need to identify dates where the RecNum occurs more than once. All
    > but one occurrence on the same date should have the secondary column as
    > 'yes'. (In other words, if one RecNum (=person) has several codes on
    > the same day, one of the codes is primary, and all the others are
    > secondary. Ideally, I would look at amount to decide which is the
    > primary code and which are the secondary (biggest amount = primary).
    > Sometimes, no amount is given (ie, the data is crappy).
    >
    > I assume I need to create a new column of date + RecNum (?), then check
    > for duplicates, and use an if..then conditional to compare amounts...
    >
    > Any guidance would be appreciated....
    >
    > TIA
    >
    >


  4. #4
    Bob Phillips
    Guest

    Re: Identifying complex duplicates

    Does this do what you want?

    In row 2

    =IF(SUMPRODUCT(--($A$2:A2=A2),--($B$2:B2=B2))>1,"no","yes")

    and copy down

    --

    HTH

    RP
    (remove nothere from the email address if mailing direct)


    <clsnyder@gmail.com> wrote in message
    news:1137349505.800459.156610@g47g2000cwa.googlegroups.com...
    > Hi
    >
    > I have a spreadsheet with > 40,000 rows. An abbreviated example:
    > Date RecNum Code Amt Secondary
    > 1/1/2005 453654 44324 550.54 no
    > 12/24/2004 546676 54161 65.66 no
    > 12/24/2004 546676 44970 900.44 no
    > 12/24/2004 546676 43830 no
    > 7/2/2003 54161 54640 773.21 no
    >
    >
    > The problem:
    > I need to identify dates where the RecNum occurs more than once. All
    > but one occurrence on the same date should have the secondary column as
    > 'yes'. (In other words, if one RecNum (=person) has several codes on
    > the same day, one of the codes is primary, and all the others are
    > secondary. Ideally, I would look at amount to decide which is the
    > primary code and which are the secondary (biggest amount = primary).
    > Sometimes, no amount is given (ie, the data is crappy).
    >
    > I assume I need to create a new column of date + RecNum (?), then check
    > for duplicates, and use an if..then conditional to compare amounts...
    >
    > Any guidance would be appreciated....
    >
    > TIA
    >




+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1