"Will this work with balexander@email.gov.za - returning Alexander?"
No. Both of my algorithms state: "Assume the first letter of the first name is the very first letter in the e-mail address", so it would look for the strings:
"b", "ba", "bal", "bale", "balex", "balexa", "balexan", "balexand", "balexande", "balexander"
and, from amongst those which match against your list, choose the longest.
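To make that concrete, here is a rough sketch of the prefix approach in Python. The function name `longest_prefix_name` and the `known_names` list are placeholders of my own for illustration, not anything from your actual setup, and I'm assuming the part before the "@" is all we look at.

```python
def longest_prefix_name(email, known_names):
    """Return the longest known name that starts at the first letter
    of the e-mail address, or None if no prefix matches."""
    # Assumption: only the part before the "@" is searched.
    local = email.split("@", 1)[0].lower()
    names = {n.lower() for n in known_names}

    best = None
    # Try "b", "ba", "bal", ... up to the whole local part.
    for end in range(1, len(local) + 1):
        candidate = local[:end]
        if candidate in names:
            best = candidate  # later matches are longer, so keep the last
    return best.capitalize() if best else None


# "Alexander" does not start at the first letter, so nothing matches.
print(longest_prefix_name("balexander@email.gov.za", ["Alex", "Alexander"]))  # None
```

With a list containing only "Alex" and "Alexander", balexander@email.gov.za gives no match at all, which is exactly the behaviour described above.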
We could have the algorithm go through all strings of all lengths from all starting points, if you want, i.e. as well as the above, it would also then search for:
"a", "al", "ale", "alex", ... ,"l", "le", "lex", "lexa", ..., "e", "ex", "exa", "exan", ... , "r"
and then take the longest of those, but this makes things much more complicated, and it could then pick a surname over a first name (since we would no longer be restricting our search strings to those beginning with the first letter of the e-mail address).
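Again purely as an illustration, the "all starting points" variant might look something like the sketch below. `longest_substring_name` and `known_names` are hypothetical names of my own, and the tie-breaking rule (keep the earliest match of a given length) is an assumption, since we never discussed one.

```python
def longest_substring_name(email, known_names):
    """Return the longest known name found anywhere in the part
    before the "@", or None if nothing matches."""
    local = email.split("@", 1)[0].lower()
    names = {n.lower() for n in known_names}

    best = None
    # Every starting point, every length from that starting point.
    for start in range(len(local)):
        for end in range(start + 1, len(local) + 1):
            candidate = local[start:end]
            # Assumption: on a tie in length, the earliest match wins.
            if candidate in names and (best is None or len(candidate) > len(best)):
                best = candidate
    return best.capitalize() if best else None


print(longest_substring_name("balexander@email.gov.za", ["Alex", "Alexander"]))  # Alexander
print(longest_substring_name("johnsmithson@x.com", ["John", "Smithson"]))        # Smithson
```

The second call shows the drawback: if surnames happen to be in the list, the longer surname beats the first name.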
I'm afraid you need to choose one of the two algorithms I proposed (or offer an alternative that I haven't yet mentioned, of course) and accept that, like I said earlier, it's simply never going to work in all cases. You still don't seem to grasp how near-impossible it is to know with any reasonable certainty which part of a given e-mail address is the first name. Not to mention the fact that no part of it may be the first name...
Regards