Dear all,
I have this matrix of non-continuous, independent data (let's call them "a", "b", "c")
Variable 1 Variable 2 Variable 3
Sample 1 a a a
Sample 2 b b e
Sample 3 c a a
Values are not continous numbers, or measures, but more of a label for each "Variable". Additionally, "a" in Variable 1 does not relate to "a" in Variable 2.
I would like a way to assess similarity (=shared values) between "Samples". I don't care to know which "Variable" is similar or different between the two "Samples", just the number of shared values is fine.
For the example above, we see that:
- Sample 1 has 2 shared value with Sample 3 (for Variable 2)
- Sample 2 has 0 shared value with Sample 1 for any variable
- Sample 3 has 0 shared values with Sample 2
- Each sample has 3 shared values with itself
In that example, Sample 1 and 3 are more similar to each other than Sample 2 (if we exclude self-similarity).
I guess a good way of outputing this is to create a "similarity" matrix:
Shared values:
Sample 1 Sample 2 Sample 3
Sample 1 3 0 2
Sample 2 0 3 0
Sample 3 2 0 3
Does it sound like something possible to do in Excel?
Thanks a lot for any help. I hope I was clear enough!
All the best,
G.
Bookmarks