+ Reply to Thread
Results 1 to 19 of 19

What is the optimal way for Excel to lookup a large amouts of data?

  1. #1
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    What is the optimal way for Excel to lookup a large amouts of data?

    Hi,

    What is the most optimal way (spreadsheet size vs speed) for Excel to access a lot of data?
    I have 1.75 million records each containing: Postcode, District, County, Region.

    For speed, I split the postcode data into 22 worksheets, each containing the postcode records starting with a particular letter, A,B,C,D, etc.
    The formula below determines the first letter of the postcode, then looks up the corresponding data in column 2 (District) in the correct worksheet (A,B,C, etc.)

    Then I put the formula into two additional columns to return the corresponding County and Region data. (Columns 3 & 4) and down to just 50 rows

    The result is fast but the worksheet has grown from 2mb to 32mb. Which takes forever to load. No idea why.

    So, I need help! What is the most optimal way (spreadsheet size vs speed) to for Excel to access a lot of data?
    As I mentioned, I have 1.75 million records each containing: Postcode, District, County, Region

    The formula I used which as quick but made the worksheet huge was:
    IF(LEFT(E2,1)="A",VLOOKUP(E2,[Lookup.xlsx]A!$A$1:$E$24521,2,FALSE),
    IF(LEFT(E2,1)="B",VLOOKUP(E2,Lookup.xlsx]B!$A$1:$E$169530,2,FALSE),
    IF(LEFT(E2,1)="C",VLOOKUP(E2,Lookup.xlsx]C!$A$1:$E$148162,2,FALSE),
    etc. down to
    IF(LEFT(E2,1)="Z",VLOOKUP(E2,[Lookup.xlsx]Z!$A$1:$E$631,2,FALSE)

  2. #2
    Forum Expert mrice's Avatar
    Join Date
    06-22-2004
    Location
    Surrey, England
    MS-Off Ver
    Excel 2013
    Posts
    4,967

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    It looks like your data volume is moving beyond the point at which Excel is the best way of storing it. I would suggest moving to a database solution instead. You might try MS Access as one way forward as you may already have this as part of an MS Office installation.
    Martin

  3. #3
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    It's not clear from your file if the 22 sheets and summary sheet are within the same file or not ?

    Is the data in Column A on each of the 22 sheets sorted by Col A in ascending order ?

    In which columns on your summary sheet are you placing your VLOOKUP formulae ?

  4. #4
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Thanks for your reply.

    I have no experience with Access. Would I store the postcode data in MS Access and then pull it into Excel from there?
    If yes, what part of Access/Excel do I need to learn about to move data between the two applications.?

    I am always up to learn new things.

  5. #5
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Don Hi,

    In one workbook are the VLOOKUP formula. In another separate workbook is the postcode data [Lookup.xlsx] which just contains contains 22 text only worksheets, called A, B, C etc. The vlookup's are looking up the first data column which is sorted in ascending order. It works fast.

    The only problem is that the main workbook containing the vlookups has grown for 2mb to 32mb just by adding these the lookup formula into 150 cells.
    Last edited by holmwood; 05-08-2010 at 05:30 AM.

  6. #6
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Is [Lookup.xlsx] always open when you're conducting the VLOOKUPs ?

  7. #7
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Yes [Lookup.xlsx] is always open.

    Speed is not the issue, getting the data is very fast. The problem is why the main spreadsheet grows for 2mb to 32 mb when I add the above conditional Vookup formula into just 150 cells. This means the main sheet takes ages to load and to save.

    Should I have 2 sheets each with 750,000 rows and a simpler VLOOKUP formula, or access the data in a different way, or use a different formula?

    Why does adding 150 new formula add 30mb to the sheet size??

  8. #8
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    You could condense the formula courtesy of an INDIRECT call, however, this would in turn make the calculations Volatile.

    If you opt to use INDIRECT then I would suggest you take other steps to utilise the fact that your data is stored in Ascending order (in short, using Binary Search based lookups).

    This will improve efficiency which though as you say is already good - once Volatile you will want to improve further.
    (Volatiles will calculate more often so it's a good idea to make them as fast as possible).

    I would first suggest that in a single column on your Summary sheet you establish the row position of the matching item (on the appropriate sheet), this reduces no. of repetitive calcs in remaining columns, eg:

    Please Login or Register  to view this content.
    where the value in E2 is not found in the appropriate sheet in Lookup.xlsx the result will be 0.

    In your other calcs you can then use the value in Zn to retrieve the data via an INDEX/MATCH call (the MATCH having been conducted in Col Z), ie this:

    Please Login or Register  to view this content.
    becomes

    Please Login or Register  to view this content.
    Let us know if the above approach helps at all.
    Last edited by DonkeyOte; 05-09-2010 at 04:13 AM. Reason: revised wording

  9. #9
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Don, I am very grateful for your time and wisdom.
    I will be out for the rest of today, but will implement your suggestions tomorrow and let you know how I get on.
    Many thanks.

  10. #10
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Don HI,
    I have used the formula you gave me which is much shorter and more elegant. Thank you.
    Also, I have changed the postcode data so it looks up just one field comprising district,county,region. Once retrieved I split it out with: Morefunc WMID(G4,1,1,",") and WMID(G4,2,1,",") and WMID(G4,3,1,",")
    However, the huge size of the workbook remains although now only 21mb.

    How can I find out what is making the sheet so large, Is there way to analyse which cells are making the workbook so large?

    Next question, how do I search/find my previous thread? I am sure to told me once how to put code into it's own window.

    Thanks

  11. #11
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Have you tried re-setting the Used Range ?

    Go to the last row of data - then go to the row below and highlight all subsequent rows and run Clear -> All (repeat for Columns)

    Save the file & close - review File Size.

    If the above does not work - and if you have other sheets within the "large" file...

    - create a copy of your current file for working purposes
    - use the copy file and proceed to delete the sheets one by one each time you delete a sheet save the file & review the file size

    the above will help identify the offending sheet(s) at which point you can investigate further (ie repeat above process for starters).

  12. #12
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    I have now found the culprits that are making my sheet huge.
    If I add the following formula to just ONE cell, (no copying down) the sheet size grows by nearly 1mb:

    =VLOOKUP($I2&"¬"&$S2,[Lookup.xlsx]CountyPlaceNames!$A$1:$B$50868,2,FALSE)

    Why? Can I stop this happening?

  13. #13
    Forum Contributor
    Join Date
    12-03-2009
    Location
    Providence RI
    MS-Off Ver
    Excel 2007, Excel 2010
    Posts
    260

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    VLOOKUP tends to add a lot to file size and is a somewhat slower formula. Excel stores extra information we never see nor truly know about. I would recommend looking into INDEX/MATCH over VLOOKUP if/when possible if it is adding that much to your file size.

  14. #14
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    I will check this out.

    Thanks

  15. #15
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Unfortunately, when looking up over 50,000 rows, whether I use VLOOKUP or the equivalent INDEX/MATCH formula, the size is the same.

    Which makes a new empty sheet with either of the following formula in just ONE cell, a staggering 883KB (nearly 1 Mb)
    Please Login or Register  to view this content.

    Interestingly, if I put both the above formulas (each referring to the same data) in the same sheet, the size says the same at 883KB.

    It's seems like Excel is caching the lookup data in the summary sheet.

    Can I stop this happening? What can I do to keep my sheet sizes small, as I have many different lookups to add?

  16. #16
    Valued Forum Contributor squiggler47's Avatar
    Join Date
    02-17-2009
    Location
    Littleborough, UK
    MS-Off Ver
    Excel 3.0 to 2007+2010! (couldnt get 2.1 working)
    Posts
    1,013

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    I would assume that Excel is caching the information from Lookup.xlsx incase it isnt open for some reason! Which would explain why using the same range doesnt increase the file size!

    It should be easy to verify by unzipping the XLSX file and opening with notepad, you should be able to see if the data is stored in the file!
    Regards
    Darren

    Update 12-Nov-2010 Still job hunting!

    If you are happy with the results, please add to our reputation by clicking the blue scales icon in the blue bar of the post.

    Learn something new each day, Embrace change do not fear it, evolve and do not become extinct!


  17. #17
    Valued Forum Contributor squiggler47's Avatar
    Join Date
    02-17-2009
    Location
    Littleborough, UK
    MS-Off Ver
    Excel 3.0 to 2007+2010! (couldnt get 2.1 working)
    Posts
    1,013

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    on inspection the XLXS file contains a section called external links, which contains a copy of the external data referenced in the sheet, including all values!

    Storing the data in Access/MSSQL/MySQL and using a UDF to query the database would be a more efficient method! If you have access to Access I'm sure someone here would help with the database side, and the excel query side!
    Last edited by squiggler47; 05-12-2010 at 06:05 AM.

  18. #18
    Forum Contributor
    Join Date
    12-03-2009
    Location
    Providence RI
    MS-Off Ver
    Excel 2007, Excel 2010
    Posts
    260

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Could you use the formula in VBA instead to do it and paste values so that the formulas don't stay? That could cut down your size as it wouldn't need to cache the data. This is just a quick example of a piece of code I do that with to save calculation time when using the file. Don't know if the lookup data is variable or not. In my case it is, but only once a month so I run the macro monthly.

    Please Login or Register  to view this content.

  19. #19
    Registered User
    Join Date
    11-13-2009
    Location
    France
    MS-Off Ver
    Excel 2007
    Posts
    32

    Re: What is the optimal way for Excel to lookup a large amouts of data?

    Summary:

    Excel caches the data it accesses via VLOOKUP or MATCH/INDEX in the summary sheet. This is why you can access data without the source data sheet being open.
    The downside is that it make your summary sheet large in Mb, if you have a large data array. Which in turn, means it takes forever to open or save your summary sheet.

    It would be nice if Excel gave the user the option to cache or not to cache.

    I learnt a lot from this thread. Thanks to everyone for their help and advice.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1