I have a number of datasets by year that I would like to aggregate into a defined time-series format. The defined format is below (table 2).

However, I have two problems.

(1) The source datasets are formatted like the source data table below (table 1). Is there any way to compile and format these data without doing so manually? There are over 800 rows in five datasets.

(2) Across the years/source datasets, the number of organisations are different (for eg, there are 956 organisations in 2014, 892 in 2015, 923 in 2016, etc.). The majority of the organisations are the same, but the datasets add or remove different ones based on whether the organisations are active. How do I construct an Index-Match formula to account for this?

Specifically, I want to have all the organisations that appear across the years/datasets included in the aggregated data, with the correct matching data for the years they are present and NA for when they are not.

Thanks in advance for any help on this.

Table 1 - Individual source data for specific year

Area name Area code Organisation name Organisation code Female 0-4 Female 5-14 Female 15-24 ... Male 0-4 ...
Area 1 001 Organisation 1 AAA X1 X2 X3 ... Y1 ...
...Area N ...00N ...Organisation M ...MMM X1 X2 X3 ... Y1 ...


Table 2 - Intended-format data for all years

Area name Area code Organisation name Organisation code Gender Age group 2014 2015 2016 ...
Area 1 001 Organisation 1 AAA Female 0-4
Area 1 001 Organisation 1 AAA Female 5-14
Area 1 001 Organisation 1 AAA Female 15-24
Area 1 001 Organisation 1 AAA Female ...
Area 1 001 Organisation 1 AAA Male 0-4
Area 1 001 Organisation 1 AAA Male ...
Area 2 002 Organisation 2 BBB Female 0-4
Area 2 002 Organisation 2 BBB Female 5-14
...Area N ...00N ...Organisation M ...MMM Female ...