I spent about 3-4 hours copying and pasting data from Excel into Notepad, and then when I tried to remove duplicate entries using an external program, it ERASED everything. Needless to say, I am a bit upset.
I have about 10 GB of backlink data from various websites that I want to explore. The CSV files are from Majestic, OSE and Ahrefs. What I would like to do is import all of the spreadsheets, remove duplicate URLs and export only the external backlinks for further research. I've done some research into using macros, VBA, etc., and I don't understand any of that stuff yet.
I have access to a MySQL database, Microsoft Access and a lot of hosting plans if needed. I'm not sure which is the best option, since I was only able to import one file at a time using Microsoft Access and it would take forever to do it that way. If there is software I can buy to do this, that would be ideal since I'm not a programmer, although free is always good too. I'm sure there is some type of macro, command, etc. that I could use to get the job done.
Each of the aforementioned providers has a different format for its files (listed below). My ballpark guess at the number of rows/lines is 10-20 million, but don't quote me on that because I'm not entirely sure.
-Ahrefs Format-
Index,AhrefsRank,UrlFrom,IpFrom,UrlTo,LinksInternal,LinksExternal,Size,Title,Visited,FirstSeen,PrevVisited,Anchor,Type,HttpCode,NoFollow,Image,Site-wide,Alt,TextPre,TextPost
NOTE: Wraps fields in " only every now and then, with no rhyme or reason
-Majestic Format-
Target URL,Target ACRank,Source URL,Source ACRank,Anchor Text,Source Crawl Date,Source First Found Date,FlagNoFollow,FlagImageLink,FlagRedirect,FlagFrame,FlagOldCrawl,FlagAltText,FlagMention,SourceCitationFlow,SourceTrustFlow,TargetCitationFlow,TargetTrustFlow
NOTE: Almost every row uses " around the columns.
-OSE Format-
URL,Title,Anchor Text,Page Authority,Domain Authority,Number of Domains Linking to this Page,Number of Domains Linking to Domain,Origin,Target URL,Link Equity,No Link Equity,Only rel=nofollow,Only follow,301
NOTE: Wraps fields in " only every now and then, with no rhyme or reason
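For what it's worth, the inconsistent " quoting in all three formats is ordinary CSV quoting, so a standard CSV parser should cope with it without any per-provider cleanup. Below is a minimal Python sketch, not a finished tool, that recognizes the provider from the header line and pulls out just the linking and linked URLs. The names URL_COLUMNS, detect_provider and read_links are made up for illustration, and the choice of OSE's "URL" column as the linking page is an assumption.

```python
import csv

# Assumed mapping: provider -> (column with the linking page, column with the linked page).
# The column names come from the headers listed above; treating OSE's "URL"
# as the linking page is a guess.
URL_COLUMNS = {
    "ahrefs": ("UrlFrom", "UrlTo"),
    "majestic": ("Source URL", "Target URL"),
    "ose": ("URL", "Target URL"),
}


def detect_provider(header):
    """Guess which provider produced a file from its header row."""
    names = set(header or [])
    if "UrlFrom" in names:
        return "ahrefs"
    if "Source URL" in names:
        return "majestic"
    if "URL" in names and "Target URL" in names:
        return "ose"
    raise ValueError("unrecognized header")


def read_links(text_stream):
    """Yield (source_url, target_url) pairs from an open CSV text stream.

    csv.DictReader accepts quoted and unquoted fields mixed in one file,
    so rows that only sometimes use " need no special handling.
    """
    reader = csv.DictReader(text_stream)
    source_col, target_col = URL_COLUMNS[detect_provider(reader.fieldnames)]
    for row in reader:
        yield row[source_col], row[target_col]
```

A file would be passed in as open(path, newline="", encoding="utf-8-sig"); the encoding is also an assumption about how the exports were saved.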
My guess at this point is that I will have to create a macro for each provider for future use. Does anybody know the most cost-effective and efficient way to bulk import a few hundred CSV files? It would be really nice if there were a SELECT ALL option when importing CSV files, but I couldn't find one.
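One free way to get the "select all" effect is a short script that loops over every CSV file in a folder. The sketch below uses only Python's standard library and a SQLite file on disk, so 10-20 million rows do not have to fit in memory. It rests on several assumptions: "duplicate" means the same linking URL appearing more than once, "external" means the linking page is on a different host than the target, and the files are UTF-8. The function names (host, pick, import_folder, export) are hypothetical.

```python
import csv
import glob
import os
import sqlite3
from urllib.parse import urlsplit

# Candidate column names, in the order Ahrefs, Majestic, OSE (OSE's "URL" is a guess).
SOURCE_COLUMNS = ("UrlFrom", "Source URL", "URL")
TARGET_COLUMNS = ("UrlTo", "Target URL")


def host(url):
    """Return the lower-cased host name of a URL, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    name = (urlsplit(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def pick(fieldnames, candidates):
    """Return the first candidate column that exists in this file's header."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise ValueError(f"none of {candidates} found in header")


def import_folder(folder, db_path):
    """Load every *.csv in folder into SQLite, skipping duplicates and internal links."""
    db = sqlite3.connect(db_path)
    # PRIMARY KEY on source plus INSERT OR IGNORE drops repeated linking URLs.
    db.execute("CREATE TABLE IF NOT EXISTS links (source TEXT PRIMARY KEY, target TEXT)")
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8-sig", errors="replace") as f:
            reader = csv.DictReader(f)
            src = pick(reader.fieldnames or [], SOURCE_COLUMNS)
            dst = pick(reader.fieldnames or [], TARGET_COLUMNS)
            rows = (
                (r[src], r[dst])
                for r in reader
                # Keep only complete rows whose two URLs are on different hosts.
                if r[src] and r[dst] and host(r[src]) != host(r[dst])
            )
            db.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)", rows)
        db.commit()
    return db


def export(db, out_path):
    """Write the de-duplicated external links to one CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])
        writer.writerows(db.execute("SELECT source, target FROM links ORDER BY source"))
```

Usage would be something like db = import_folder("exports", "links.sqlite") followed by export(db, "external_links.csv"), where the folder and file names are placeholders. The same de-duplication idea (a unique key on the URL column) should carry over to MySQL or Access if one of those is preferred.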
Thanks.