Hello all,
I have used this site many times and always found brilliant solutions! I've never made an account though, as I've never been stumped like this before.
I have a very large data set, currently at 289781 rows. It is complaint data spanning many years. There are 56 columns of information per row, not always filled. There is also a sales data set, which relates to the large data set, but not line for line. I can't share the details of the information, but I'll try to make a comparable example, using the most relevant columns:
Large Data:
Complaint Reference Date of Complaint City Product Range Product Family Issue Category Issue Sub Category 12345 01/01/2018 London Fruit Apple Delivery Wrong Product Delivered 12346 05/04/2018 Manchester Vegetable Carrot Health Product Made Customer Turn Green
Sales:
Date City Product Range Product Family Sales Jan 2018 London Fruit Apple 145 Jan 2018 Manchester Vegetable Carrot 5373
The sales table does not contain Issue Category or Issue Sub Category information. So, you can see how many Carrots were sold for a particular market, or how much Fruit was sold overall etc.
My problem is I am trying to calculate CPM (Complaints per Million). I do this by dividing the number of complaints by the sales and then multiplying by 1000000.
This is fine if I only want to see down to the Product Family level, i.e what the CPM for Carrots is for London over a certain time etc. However, I would like to be able to see the CPM of Issue Categories, and furthermore see the Top 3 occurring Issue Sub Categories. The sales data will not line up with an Issue Category or Sub Category, as sales is shown down to the product family level.
For example, there may have been 100 Apples sold, and 50 complaints. Of those 50 complaints, 40 were 'I hate your apples' and 10 were 'I got oranges instead'. The CPM for apples overall is easy, but how do I calculate the CPM against the individual complaints to enable me to see the CPM for 'I hate your apples'?
I apologise if this is confusing! I have already tried adding the Sales value to each line of complaint data, matching it to the right country, date, product and product family. However, a pivot table will then do the SUM of sales, so when I look at a certain issue sub category, it will add up each lines' sales data. I've tried setting it to MIN instead of SUM, but I don't think this works when looking over multiple filters (Apples and Carrots for example)
I've also tried multiple calculated fields, GETPIVOTDATA in non pivot tables, adding slicers to pivots, I just cannot get my head around it!
The CPM has to be calculated first before finding the top 3 issues. I was previously finding the top 3 occurring complaints for a category and then calculating the CPM which was much easier. But just because there are more of a certain complaint, if it also has a billion sales, it's CPM will actually be quite low. So I need to somehow calculate the CPM for each category, then sort it to find the top 3 issues.
Any help would be greatly appreciated. Again, I apologise for being long winded and probably confusing! Please do not hesitate to ask if you would like further clarification on anything.
Bookmarks