I've got a huge file of data that I need consolidated. I've attached a file of dummy data showing how it's organized when I receive it and what it needs to look like. Basically, there are 6 or 7 identifying columns for each item, followed by about 30 value columns in the original files. Each row holds only one value; the next row repeats the identifying cells with the next value in place. A few things going on make this difficult:
- some (many) items have duplicate values. These cannot be combined or averaged in any way; instead, a new line needs to be made containing only the 'extra' value.
- some values are below a detection threshold - i.e. they show '<number'. Excel treats these as text and throws them out of pivot tables.
- Did I mention these files can be upwards of 40 000 lines long?
My current workaround: I add another column and put a mark on each row with a duplicate data point, identifying duplicates with a pivot table of item counts. Once all counts are 1 or 0, I replace the '<' symbol with a number I know isn't in the original data sheet (e.g. 88888888888), create a pivot table of all max values, copy and paste that pivot table into a new worksheet, and reinsert the '<' symbols.
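To show concretely what I'm after, here's a minimal sketch in Python/pandas of the transformation (the column names `Item`, `Site`, `Date`, `Lead`, `Zinc`, `Copper` are made up for illustration - my real files have 6-7 ID columns and ~30 value columns): melt the one-value-per-row data to long form, number duplicate values so each 'extra' one lands on its own output row, then pivot back to wide. The '<' entries survive as text because nothing is forced to numeric.

```python
import pandas as pd

# Hypothetical column names, standing in for the real 6-7 ID columns
# and ~30 value columns
id_cols = ["Item", "Site", "Date"]
val_cols = ["Lead", "Zinc", "Copper"]

# A tiny frame shaped like the raw file: one value per row, identifiers
# repeated, below-threshold results stored as '<' text
raw = pd.DataFrame({
    "Item":   ["A", "A", "A", "A"],
    "Site":   ["S1", "S1", "S1", "S1"],
    "Date":   ["2024-01-01"] * 4,
    "Lead":   [5.0, None, None, 7.0],   # duplicate Lead value -> extra row
    "Zinc":   [None, "<0.5", None, None],
    "Copper": [None, None, 3.2, None],
})

# Melt to long form and drop the empty cells, then number duplicates so a
# second value for the same item/column gets its own output row
long = raw.melt(id_vars=id_cols, value_vars=val_cols).dropna(subset=["value"])
long["dup"] = long.groupby(id_cols + ["variable"]).cumcount()

# Pivot back to one row per item, plus one extra row per duplicate set
wide = (long.pivot(index=id_cols + ["dup"], columns="variable", values="value")
            .reset_index())
print(wide)
```

Running this gives two output rows for item A: one with Lead 5.0, Zinc '<0.5', and Copper 3.2 filled in, and a second holding only the duplicate Lead value 7.0.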
This is STILL crazy painful and tedious (although not as painful as doing it by hand, which was how it was suggested I should do it...). Please tell me there is a better way?