I am trying to clean up an e-mail newsletter list with nearly a million rows compiled over 10+ years. I've gotten the list down to fewer than 100k entries, and now only the tricky ones are left (there is an unbelievable number of "bot" entries). I'm having a tough time separating these from genuine addresses. I'd like to:
- find any address with more than 5 digits in the first half of the address (the part before the @)
- find any address with more than 2 transitions between digits and letters in the first half of the address (the part before the @)
Can you help? Here's a sample of some addresses:
506ecc94.8070804@gardenoffrancis.com
20a12.2f2bceae.3d7fe7f5@aol.com
500fd301.5090605@hazelst.com
50d4fa18.8010200@gmail.com
50b4090c.2060102@gmail.com
50adc5a2.3010904@gmail.com
54824.71a65b4a.3d415bc3@aol.com
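
For reference, here is a rough Python sketch of the two checks described above. It is only one possible reading of the rules: it assumes "first half" means everything before the last @, it counts only ASCII digits and letters, and it ignores dots and other punctuation when counting transitions. The function names (`local_part`, `count_digits`, `count_transitions`, `looks_suspicious`) are made up for illustration, and the thresholds of 5 and 2 are the ones from the list above.

```python
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters)


def local_part(address):
    """Return everything before the last '@' (assumed meaning of 'first half')."""
    return address.rsplit("@", 1)[0]


def count_digits(local):
    """Count ASCII digits in the local part."""
    return sum(ch in DIGITS for ch in local)


def count_transitions(local):
    """Count switches between digits and letters, skipping punctuation.

    Assumption: characters such as '.' or '_' do not count as a transition
    by themselves; only the digit/letter sequence is compared.
    """
    classes = []
    for ch in local:
        if ch in DIGITS:
            classes.append("d")
        elif ch in LETTERS:
            classes.append("l")
    return sum(a != b for a, b in zip(classes, classes[1:]))


def looks_suspicious(address, max_digits=5, max_transitions=2):
    """Flag an address if either rule from the question is exceeded."""
    local = local_part(address)
    return (count_digits(local) > max_digits
            or count_transitions(local) > max_transitions)


if __name__ == "__main__":
    samples = [
        "506ecc94.8070804@gardenoffrancis.com",
        "20a12.2f2bceae.3d7fe7f5@aol.com",
        "500fd301.5090605@hazelst.com",
    ]
    for addr in samples:
        local = local_part(addr)
        print(addr, count_digits(local), count_transitions(local),
              looks_suspicious(addr))
```

Each of the seven sample addresses above has at least ten digits before the @, so the digit rule alone would flag all of them. The thresholds are keyword arguments so they can be loosened if genuine addresses such as a name followed by a birth year start getting caught.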