Hi everyone,

First post here!

For work at the moment I need to sort through a bunch of URLs, which may contain duplicates of the same address written in different formats.

A given URL may:

start with http
start with https
have www
not have www
have a trailing slash /
have a trailing slash with a # at the end, e.g. /#

(Sorry not to be more specific, can't post links yet)

These could sometimes be very long, and URLs in general are all different lengths, but the match should always ignore everything before the root domain (http, https, www, etc.) and the final 1 or 2 characters of the URL when they are a trailing / or /#.

I'd like to look for and delete duplicates, while still retaining one instance of each URL - which one doesn't matter so much, as all the different variations point to the same content anyway.
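
To pin down the matching rule, here is a rough Python sketch of the logic I have in mind (not Excel, just to show what should count as a duplicate). The names normalize_url and dedupe_urls are made up for illustration, and it assumes the only differences between duplicates are the prefix and trailing-character variations I described.

```python
import re


def normalize_url(url: str) -> str:
    """Reduce a URL to the part used for matching (hypothetical helper)."""
    key = url.strip()
    # Drop the protocol, whether http or https.
    key = re.sub(r"^https?://", "", key, flags=re.IGNORECASE)
    # Drop a leading www. if present.
    key = re.sub(r"^www\.", "", key, flags=re.IGNORECASE)
    # Drop a trailing / or /# (assumption: these are the only endings to ignore).
    key = re.sub(r"/#?$", "", key)
    return key


def dedupe_urls(urls):
    """Keep the first URL seen for each normalized key, in original order."""
    seen = set()
    kept = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

Passing the whole column of URLs to dedupe_urls would give back one entry per page, keeping whichever variation appears first.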

Is there a way, or are there any functions in Excel I can use, to make this as easy and hands-free as possible?

Thanks!