Hello everyone,
Kindly help with the formula that could search with the given query id (column F) for the highest %Gene identity (column D), which could fetch the values for %gene identity (column I), the gene (column H), and relevant accession (column G). However, Query ID should match the highest % Gene identity only once.
For example, the query id: 27504.m000612, once matched with Raf29 having the overall highest %Gene identity (70.5%) will not be searched again even though it has a higher %Gene identity for Raf21 (69.9%) than that of 27471.m000401 (Raf21: 68.9%). Now, the query ID: 27471.m000401, could be allotted with the following/successive highest (which is 2nd in this case) Raf21 (68.9%) [This is to Avoid alloting duplicate genes to the following query IDs]
Let me explain this in another way, For example, the query id: 27504.m000612 matches with the gene Raf29 having the overall highest %Gene identity (70.5%). We don't want Raf29 to be recognized again for 27471.m000401 (which has a lower %Gene identity than 27504.m000612), so it will be matched to the following highest %Gene identity which is Raf21 (69.9%), this will exclude Raf29 in order to avoid duplication.
However, if the highest/following highest %Gene identity is exactly similar between two query IDs then it could indicate a conflict.
I have similar 300 query ids with such data. I would appreciate it if you could provide the formulas to fetch values for columns G, H, and I from the given query as in column F, thanks in advance.
I have also uploaded the following sample test data image as an excel attachment.
Screenshot_2021-04-22.png
Bookmarks