Hello!
I have built the logic that I think gets the results that I want, using four x three helper columns and one x results columns.
However, the formulas required are very memory intensive and will not be practical for real applications.
The external database exports I am planning to use this logic on consists of between 100.000 and 500.000 rows of various data, of which approximately 500 to 2.000 rows contains the relevant data as reflected in the attached workbook.
I would therefore be very grateful if someone would be able to transform my example to something more efficient and robust, presumably using VBA, Power Query and/or improved functions.
Below are the formulas along with some explanation:
Section 1 (splitting two columns with data from three hierarchical levels into three columns):
A2:
B2:
C2:
Section 2 (filling in gaps in parent levels to connect as many of the three levels as possible by going backwards from higher to lower levels):
E2:
F2:
G2:
Section 3 (removing duplicate rows):
I2:
J2:
K2:
Section 4 (removing incomplete rows where more complete rows exist and inserting zeros for blanks):
M2:
N2:
O2:
Section 5 (sorting the resulting rows in ascending order by level 1, then by level 2 and lastly by level 3):
Q2:
R2:
S2:
I am looking forward to your replies. I am sure this solution can be improved significantly. 
Best regards,
Marbleking
Bookmarks