Formatting, matching, and aggregating different datasets

**u1659** · 02-05-2019, 05:35 AM

I have a number of datasets by year that I would like to aggregate into a defined time-series format. The defined format is below (table 2).

However, I have two problems.

(1) The source datasets are formatted like the source data table below (table 1). Is there any way to compile and format these data without doing so manually? There are over 800 rows in five datasets.

(2) Across the years/source datasets, the number of organisations are different (for eg, there are 956 organisations in 2014, 892 in 2015, 923 in 2016, etc.). The majority of the organisations are the same, but the datasets add or remove different ones based on whether the organisations are active. How do I construct an Index-Match formula to account for this?

Specifically, I want to have all the organisations that appear across the years/datasets included in the aggregated data, with the correct matching data for the years they are present and NA for when they are not.

Thanks in advance for any help on this.

Table 1 - Individual source data for specific year

Area name	Area code	Organisation name	Organisation code	Female 0-4	Female 5-14	Female 15-24	...	Male 0-4	...
Area 1	001	Organisation 1	AAA	X1	X2	X3	...	Y1	...
...Area N	...00N	...Organisation M	...MMM	X1	X2	X3	...	Y1	...

Table 2 - Intended-format data for all years

Area name	Area code	Organisation name	Organisation code	Gender	Age group
Area 1	001	Organisation 1	AAA	Female	0-4
Area 1	001	Organisation 1	AAA	Female	5-14
Area 1	001	Organisation 1	AAA	Female	15-24
Area 1	001	Organisation 1	AAA	Female	...
Area 1	001	Organisation 1	AAA	Male	0-4
Area 1	001	Organisation 1	AAA	Male	...
Area 2	002	Organisation 2	BBB	Female	0-4
Area 2	002	Organisation 2	BBB	Female	5-14
...Area N	...00N	...Organisation M	...MMM	Female	...