
Complicated data deduplication criteria using power query

  1. #1
    Registered User
    Join Date
    01-30-2022
    Location
    Texas
    MS-Off Ver
    365
    Posts
    62


    Hi,

    I'm cleaning data for a project, and the deduplication criteria are quite complicated. The data include disease (5 values), ID, name, accession number, specimen date, specimen source, result date, and result status (3 values), all of which are in the attached sample, plus other variables that are not included. The criteria for removing observations are as follows:

    For observations with the same "name", "specimen_date", and "Result_Status" but a different "disease", keep all, as in observations 2 & 3.
    - I removed duplicate rows using the "disease", "Name", "specimen_date", and "result_status" columns.

    For observations with the same "disease", "name", and "result status", keep the one with the earliest "specimen date" (or "result_date"), as in observations 4 & 5 and 16 & 17.

    - I sorted the data set by "specimen_date" and "result_date" before removing duplicate rows.
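
    The sort-then-deduplicate step above can be sketched outside Power Query. This is a minimal Python illustration, not the poster's query: the dictionary keys and sample rows are made up, and it assumes "result_date" is only a tie-breaker when two rows share a "specimen_date".

```python
from datetime import date

def keep_earliest(rows):
    """Keep one row per (disease, name, result_status): the earliest by
    specimen_date, with result_date as the tie-breaker (an assumption)."""
    earliest = {}
    for row in sorted(rows, key=lambda r: (r["specimen_date"], r["result_date"])):
        key = (row["disease"], row["name"], row["result_status"])
        earliest.setdefault(key, row)  # the first row seen per key is the earliest
    return list(earliest.values())

# Made-up sample rows, not taken from the attachment
rows = [
    {"disease": "Flu", "name": "A", "result_status": "Final Result",
     "specimen_date": date(2022, 3, 1), "result_date": date(2022, 3, 3)},
    {"disease": "Flu", "name": "A", "result_status": "Final Result",
     "specimen_date": date(2022, 2, 1), "result_date": date(2022, 2, 4)},
    {"disease": "RSV", "name": "A", "result_status": "Final Result",
     "specimen_date": date(2022, 3, 1), "result_date": date(2022, 3, 3)},
]
earliest_rows = keep_earliest(rows)
```

    In Power Query itself, sorting and then removing duplicates is commonly reported not to keep the first sorted row reliably unless the sorted table is buffered first (for example with Table.Buffer), so it is worth checking the result against known rows.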


    For observations with the same "disease", "name", and "Specimen_Date" but a different "result status", keep the one with 'Final Result' and delete the 'preliminary' one, as in observations 21 & 22, 18 & 19, and 6 & 8.

    - I removed duplicate rows using "disease", "Name", and "specimen date".
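
    Removing duplicates on those three columns keeps whichever row happens to come first, which is not necessarily the final one. A hedged Python sketch of the rule as stated follows; the status labels "Final Result" and "Preliminary", the dictionary keys, and the sample rows are assumptions, and the third status value (not named in the post) is left untouched.

```python
from collections import defaultdict

FINAL = "Final Result"       # assumed label for the final status
PRELIMINARY = "Preliminary"  # assumed label for the preliminary status

def drop_superseded_preliminary(rows):
    """Within each (disease, name, specimen_date) group, drop preliminary
    rows only when the same group also contains a final row."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["disease"], row["name"], row["specimen_date"])].append(row)
    kept = []
    for group in groups.values():
        has_final = any(r["result_status"] == FINAL for r in group)
        kept.extend(
            r for r in group
            if not (has_final and r["result_status"] == PRELIMINARY)
        )
    return kept

# Made-up sample rows, not taken from the attachment
rows = [
    {"disease": "Flu", "name": "A", "specimen_date": "2022-05-01", "result_status": PRELIMINARY},
    {"disease": "Flu", "name": "A", "specimen_date": "2022-05-01", "result_status": FINAL},
    {"disease": "Flu", "name": "B", "specimen_date": "2022-05-02", "result_status": PRELIMINARY},
]
final_preferred = drop_superseded_preliminary(rows)
```

    A preliminary row with no matching final row (person B here) is kept, since the rule only describes what to do when both statuses exist.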

    For observations with the same name, disease, and result status but a difference in specimen date of more than one year, keep both as in observations 23 & 24.

    - I have no clue how to work on this.
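
    One possible way to think about this rule, sketched in Python rather than Power Query: within each name/disease/status group, walk the rows in date order and keep a row only if it is more than a year after the last row that was kept. The 365-day definition of "one year", the comparison against the last kept row (rather than the previous row), the dictionary keys, and the sample rows are all assumptions.

```python
from datetime import date, timedelta
from itertools import groupby

ONE_YEAR = timedelta(days=365)  # assumption: "one year" means 365 days

def group_key(row):
    return (row["name"], row["disease"], row["result_status"])

def dedupe_with_year_gap(rows):
    """Keep the earliest row per group; keep a later row only if it is more
    than one year after the last row that was kept in that group."""
    ordered = sorted(rows, key=lambda r: (group_key(r), r["specimen_date"]))
    kept = []
    for _, group in groupby(ordered, key=group_key):
        last_kept_date = None
        for row in group:
            if last_kept_date is None or row["specimen_date"] - last_kept_date > ONE_YEAR:
                kept.append(row)
                last_kept_date = row["specimen_date"]
    return kept

# Made-up sample rows, not taken from the attachment
rows = [
    {"name": "A", "disease": "Flu", "result_status": "Final Result", "specimen_date": date(2021, 1, 10)},
    {"name": "A", "disease": "Flu", "result_status": "Final Result", "specimen_date": date(2021, 3, 5)},
    {"name": "A", "disease": "Flu", "result_status": "Final Result", "specimen_date": date(2022, 6, 1)},
]
year_gap_rows = dedupe_with_year_gap(rows)
```

    Here the March 2021 row is dropped as a duplicate of the January 2021 row, while the June 2022 row is kept because it falls more than a year later. This also covers the earliest-date rule, since the first row in each group is always kept.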

    Any help on this is greatly appreciated.

    Thank you
    Attached Files
    Last edited by Mayasak; 01-04-2023 at 02:14 AM.

  2. #2
    Forum Expert
    Join Date
    12-24-2007
    Location
    Alsace - France
    MS-Off Ver
    MS 365 Office Suite
    Posts
    5,088

    Re: Complicated data deduplication criteria using power query

    Where is "specimen_number"?
    - Battle without fear gives no glory - Just try

  3. #3
    Registered User
    Join Date
    01-30-2022
    Location
    Texas
    MS-Off Ver
    365
    Posts
    62

    Re: Complicated data deduplication criteria using power query

    My mistake, it's "specimen_date". I've edited the post to reflect this.
    Thank you
