Closed Thread
Results 1 to 13 of 13

Returning all data from a list above the nth percentile

  1. #1
    Registered User
    Join Date
    02-15-2006
    Posts
    15

    Returning all data from a list above the nth percentile

    Hello,

    Is there any function in Excel that returns all values from a list above or below a certain percentage rank/percentile?

    For instance, if you want the average of the 2% highest numbers in a list? Or the sum of the 25% most expensive items in a shopping catalogue?

    It has to be a "dynamic" function, where you can just drop in a set of figures, sort the list, add a percentile, and perform a calculation on all the figures from the cutoff point signalled by the percentile...

    Hope anyone can help me out here.

    Thanks
    Christian

  2. #2
    vezerid
    Guest

    Re: Returning all data from a list above the nth percentile

    Chlor

    you can base your aggregate functions on virtual arrays, which have the
    same length as your data and include either the original numbers or
    FALSE in their corresponding positions, for numbers meeting certain
    criteria. Logical values are ignored in such functions. Example:

    =AVERAGE(IF(A1:A10>PERCENTILE(A1:A10),A1:A10))

    These are *array* formulas, thus you enter them with Shift+Ctrl+Enter.

    Does this help?

    Kostis Vezerides


  3. #3
    Registered User
    Join Date
    02-15-2006
    Posts
    15
    Yes!

    It works, thanks a lot!

    brgds
    Christian

  4. #4
    Registered User
    Join Date
    02-15-2006
    Posts
    15
    I do have a follow up question:

    If you want to apply the above function to categories, for instance:

    95-100%
    60-95%
    30-60%
    10-30%
    0-10%

    How do you write these functions?


    regards
    Christian

  5. #5
    vezerid
    Guest

    Re: Returning all data from a list above the nth percentile

    Christian,

    I just saw your message.

    So you mean you want the average of those in the top 5%, then the next
    35% etc? I think it is the inverse function of PERCENTILE, i.e.
    PERCENTRANK that you must use. Given a data set in A1:A10 and a value
    in the same scale as the dataset in B1, then

    =PERCENTRANK(A1:A10,B1)

    will give you the percentage of data with a value below. Thus, to get
    the average of those in the 60-95% you would say something like:

    =AVERAGE(IF(AND(PERCENTRANK(A1:A10,A1:10)>0.6,PERCENTRANK(A1:A10,A1:A10)<0.95),A1:A10))

    Does this help?

    Kostis


  6. #6
    Registered User
    Join Date
    02-15-2006
    Posts
    15
    Hello again,

    I think we are getting closer, but the function returns #N/A.

    This is the function I have used:

    {=AVERAGE(IF(AND(PERCENTRANK(G2:G150; G2:G150)>0,6; PERCENTRANK(G2:G150; G2:G150)<0,95); G2:G150))}

    Any further suggestions?

  7. #7
    vezerid
    Guest

    Re: Returning all data from a list above the nth percentile

    Christian,

    As you see I do not immediately follow up because of my time zone... I
    must admist I had not tested the formula I gave you. Now I tested it
    with a dataset in A1:A10. When the dataset was empty it returned #N/A
    (ok, this would make sense). WHen I entered data it returned 0. This
    did not make sense. I broke the formula as follows, using intermediate
    results:

    In column B:B:
    =PERCENTRANK($A$1:$A$10,A1)
    In column C:C
    =IF(AND(PERCENTRANK($A$1:$A$10,A1)>0.6,PERCENTRANK($A$1:$A$10,A1)<0.95),A1)
    In a separate cell:
    =AVERAGE(C1:C10)

    This worked. I am still baffled about this and I will probably post a
    question myself. Nothing in the documentation says anything about such
    behavior. At any rate, if you can afford the luxury of auxiliary
    columns you can adopt the approach above.

    HTH
    Kostis


  8. #8
    vezerid
    Guest

    Re: Returning all data from a list above the nth percentile

    OK, just clarified it though my own post. Try this:

    =AVERAGE(IF((PERCENTRANK(A1:A10,A1:A10)>0.6)*(PERCENTRANK(A1:A10,A1:A10)<0.95),A1:A10))

    Again array entered.

    Regards

    Kostis


  9. #9
    Registered User
    Join Date
    02-15-2006
    Posts
    15
    Great Kostis, it works just fine, thanks a lot.

    To make this function completely dynamic, I need the cell references to be open-ended, as the function returns #N/A if the cell references do not match exactly the number of cells with numbers in them.

    This means that I can not define the function to a general cell reference (fex, having sets with various amounts of numbers that needs to be calculated, I could generalise and set every cell reference in all the functions to fex A1:A5000, knowing that my sets of numbers never exceed 4000) in order to be able to just drop in columns of numbers afterwards, and I have to adjust every function to match exactly the number of columns/rows that I have filled with numbers each time.

    I have searched for solutions with regards to open-ended cell references, but can not find anything.

    Do you have any clue as to how to solve this? I know I am asking a lot here, this have really been of great help to me.

  10. #10
    vezerid
    Guest

    Re: Returning all data from a list above the nth percentile

    Christian,

    You lucky dog <g>. Yes, it can be done the way you ask. The following
    formula will do this, based on the indirect function. As you see,
    essentially we are replacing the expression A1:A10 with the expression:

    INDIRECT("A1:A"&COUNT(A:A))

    This assumes that your data start from 1 and subsequently you have all
    numbers. If for example your data start from A2, it should be:

    INDIRECT("A2:A"&COUNT(A:A)+1)

    An alternative is to write the range, say, A2:A3562 in a cell E1 (enter
    it as simple text) and use instead

    INDIRECT(E1)

    The full formula for cells starting at A1 is:

    =AVERAGE(IF((PERCENTRANK(INDIRECT("A1:A"&COUNT(A:A)),INDIRECT("A1:A"&COUNT(A:A)))>0.6)*(PERCENTRANK(INDIRECT("A1:A"&COUNT(A:A)),INDIRECT("A1:A"&COUNT(A:A)))<0.95),INDIRECT("A1:A"&COUNT(A:A))))

    Again to be array entered.

    Regards,
    Kostis


  11. #11
    Registered User
    Join Date
    02-15-2006
    Posts
    15
    Works like a dream!

    Again, thank you.

    Best regards
    Christian

  12. #12
    Registered User
    Join Date
    04-30-2010
    Location
    Pittsburgh, PA
    MS-Off Ver
    Excel 2008 MAC
    Posts
    2

    Re: Returning all data from a list above the nth percentile

    Quote Originally Posted by vezerid View Post
    OK, just clarified it though my own post. Try this:

    =AVERAGE(IF((PERCENTRANK(A1:A10,A1:A10)>0.6)*(PERCENTRANK(A1:A10,A1:A10)<0.95),A1:A10))

    Again array entered.

    Regards

    Kostis
    How does the asterisk work in this formula? I like the elegance, but I am scared to use it as I am unsure of the function.

    Thanks

  13. #13
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2010, 2019
    Posts
    40,689

    Re: Returning all data from a list above the nth percentile

    Welcome to the forum, ddc.

    Please take a few minutes to read the forum rules, and then start your own thread.

    Thanks.
    Entia non sunt multiplicanda sine necessitate

Closed Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1