Statistical Analysis of a series?

**Alabaster** · 04-30-2010, 03:35 AM

I have several sequences of data and I'd like to do statistical analysis of trends within these sequences.

For simplicity's sake I'll illustrate this using letters:

Sequence 1: A B L A G F C
Sequence 2: X B D K W L H K
Sequence 3: A B L H K W E N
Sequence 4: X B L A G C H K
Sequence 5: J L H B K W E N
etc...

The actual sequences are non-random and show several very clear patterns and tendencies.

I'd like to do some number crunching to determine things like:

1.) What are the most common subsequences of letters (pairs, trios, etc.)? (For example, in the sequences above, "LAG" occurs in both sequence 1 and sequence 4.)
2.) What are the most common opening patterns (as in, the first 2-4 letters)? (For example, sequences 1 and 3 both begin with ABL, and sequences 2 and 4 both begin with XB.)
3.) What are the most common closing patterns (the last 2-4 letters)? (For example, WEN is the most common ending and HK is the second most common.)
4.) When the letter L occurs, what are the most common letters preceding / succeeding it? (E.g., L is usually followed by H, and sometimes by A; W is always preceded by K.)
5.) When the letters D and F occur in succession, what are the most common letters preceding / succeeding this pair?
6.) Across all the sequences, what are the most and least common letters (ranked in order)?

...and many other similar questions.
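
As a rough illustration of the counting involved, here is a minimal Python sketch using only the five example sequences above. The function names (`ngram_counts`, `opening_counts`, `closing_counts`, `neighbor_counts`) are made up for this example, and the choice of Python is an assumption; nothing here is tied to a particular statistics package.

```python
from collections import Counter

# The five example sequences from the post (the real data set is larger).
sequences = [
    "A B L A G F C".split(),
    "X B D K W L H K".split(),
    "A B L H K W E N".split(),
    "X B L A G C H K".split(),
    "J L H B K W E N".split(),
]

def ngram_counts(seqs, n):
    """Question 1: count every run of n consecutive letters."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def opening_counts(seqs, n):
    """Question 2: count the first n letters of each sequence."""
    return Counter(tuple(seq[:n]) for seq in seqs if len(seq) >= n)

def closing_counts(seqs, n):
    """Question 3: count the last n letters of each sequence."""
    return Counter(tuple(seq[-n:]) for seq in seqs if len(seq) >= n)

def neighbor_counts(seqs, pattern):
    """Questions 4 and 5: count the letters directly before and after
    each occurrence of a letter or run of letters."""
    pattern = tuple(pattern)
    n = len(pattern)
    before, after = Counter(), Counter()
    for seq in seqs:
        for i in range(len(seq) - n + 1):
            if tuple(seq[i:i + n]) == pattern:
                if i > 0:
                    before[seq[i - 1]] += 1
                if i + n < len(seq):
                    after[seq[i + n]] += 1
    return before, after

# Question 6: overall letter frequencies.
letter_counts = Counter(letter for seq in sequences for letter in seq)

print(ngram_counts(sequences, 3).most_common(3))
print(opening_counts(sequences, 2).most_common(2))
print(closing_counts(sequences, 3).most_common(2))
print(neighbor_counts(sequences, ["L"]))
print(letter_counts.most_common())
```

Calling `neighbor_counts(sequences, ["D", "F"])` handles the two-letter case in question 5 the same way as the single-letter case in question 4.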

The ultimate goal: given a new sequence with blanks in it, I would like to make an intelligent guess as to the most likely letters to fill those blanks.
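
One simple way to approach the blank-filling goal is to score each candidate letter by how often it follows the letter before the blank and precedes the letter after it. The sketch below is only one possible approach (a bigram count with add-one smoothing), not an established method for this data; `guess_blank` and the `"_"` blank marker are hypothetical names chosen for the example, and it again uses just the five sample sequences.

```python
from collections import Counter

# The five example sequences from the post (the real data set is larger).
sequences = [
    "A B L A G F C".split(),
    "X B D K W L H K".split(),
    "A B L H K W E N".split(),
    "X B L A G C H K".split(),
    "J L H B K W E N".split(),
]

# Count every adjacent pair of letters across all sequences.
bigrams = Counter()
for seq in sequences:
    bigrams.update(zip(seq, seq[1:]))

# Candidate letters: everything seen in the data.
alphabet = sorted({letter for seq in sequences for letter in seq})

def guess_blank(seq, i, blank="_"):
    """Rank candidate letters for the blank at index i, best first.

    The score is (count of prev->candidate + 1) * (count of candidate->next + 1).
    The +1 is add-one smoothing so unseen pairs do not zero out the score.
    A neighbour that is missing or itself blank is ignored.
    """
    prev = seq[i - 1] if i > 0 else None
    nxt = seq[i + 1] if i + 1 < len(seq) else None
    scores = {}
    for c in alphabet:
        score = 1
        if prev is not None and prev != blank:
            score *= bigrams[(prev, c)] + 1
        if nxt is not None and nxt != blank:
            score *= bigrams[(c, nxt)] + 1
        scores[c] = score
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))

print(guess_blank("A B _ H K".split(), 2)[:3])
```

With more data, the same idea extends to longer contexts (trios instead of pairs) or to position-specific counts for openings and closings.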

Thanks
