+ Reply to Thread
Results 1 to 3 of 3

Import Various Tables with Different Formats from PDF into Excel and arranged.

  1. #1
    Registered User
    Join Date
    03-12-2014
    Location
    Houston
    MS-Off Ver
    Excel 2010
    Posts
    52

    Import Various Tables with Different Formats from PDF into Excel and arranged.

    I am trying to import multiple PDF tables into Excel but the tables appear on different pages with 3 different formats of the table. A solution on how to import PDF tables with the same format has been provided in the link below and it worked great for me for the folders that have the same format.

    https://www.excelforum.com/excel-pro...ml#post5858137

    This time I plan to import from multiple PDF tables with various formats and the values I need to extract are shown in the attached pdf in red. How can I modify the solution shown in the link above to be able to solve this problem?

    The results that I need are shown in the excel file called output2.xlsx. The results in blue are from the first table format, the results in yellow are from the second table format and the results in green are from the third table format.

    Overall, I plan to read through a folder that has about 700 pdf files with these tables but these tables do not appear on specific pages.

    **Please also note that the PDF may have other tables that are not relevant. I want the Query to be able to skip all those irrelevant tables and pick the table formats that I specified

    Thanks in advance
    Attached Files Attached Files
    Last edited by chi05; 08-10-2023 at 01:36 PM.

  2. #2
    Registered User
    Join Date
    08-10-2023
    Location
    new york
    MS-Off Ver
    12.0
    Posts
    1

    Re: Import Various Tables with Different Formats from PDF into Excel and arranged.

    Based on placing two copies of your PDF in the same directory with different names, this seems to process all the files and extract the tables. That assumes your real data is going to resemble the sample data

    let

    group= (filename)=>
    //function to open each file and pull out all the tables in them for processing
    let Source = Pdf.Tables(File.Contents(filename), [Implementation="1.3"]),
    List = List.Union(List.Transform(Source[Data], each Table.ColumnNames(_))),
    #"Expanded Data" = Table.ExpandTableColumn(Source, "Data", List,List),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Data", each Text.StartsWith([Name], "Table")),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Name"}, {{"data", each _, type table}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each split([data]))
    in #"Added Custom",

    //function to process each table individually
    split= (variable)=> let
    #"Filtered Rows1" = Table.SelectRows(variable, each ([Column1] <> "" and [Column1]<>null and [Column1]<>"Final" and [Column2] <> "" and [Column2]<>null)),
    #"Promoted Headers" = Table.PromoteHeaders(#"Filtered Rows1", [PromoteAllScalars=true]),
    #"Removed Columns" = Table.RemoveColumns(#"Promoted Headers",List.FirstN(Table.ColumnNames(#"Promoted Headers"),3)),
    Names=List.RemoveNulls(List.Transform(Table.ColumnNames(#"Removed Columns"), each if Text.Contains(_,"Volume") or Text.Contains(_,"Unit") then _ else null)),
    #"Removed Other Columns" = Table.SelectColumns(#"Removed Columns",Names),
    FirstName=List.First(Table.ColumnNames(#"Removed Other Columns")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Removed Other Columns",{FirstName} , "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{FirstName, "Category"}}),
    Transform = Table.TransformColumns(#"Renamed Columns",{{"Attribute",each Text.BeforeDelimiter(_,"_"), type text}}),
    //Dates
    RepeatCount=Table.RowCount(#"Removed Columns"),
    Dates = Record.ToList(variable{1}),
    b = List.RemoveNulls(List.Transform(Dates, each try if Text.Contains (_,"(") and not Text.Contains (_,"Table") then Text.BeforeDelimiter(_, "(") else null otherwise null)),
    a={List.Repeat(b,RepeatCount + 2)},
    //Distance
    #"Promoted Headers5" = Table.PromoteHeaders(variable, [PromoteAllScalars=true]),
    Distance = List.Repeat(List.RemoveNulls(List.Transform(Table.ColumnNames(#"Promoted Headers5"), each try if Text.Contains(_,"Distance") then Number.From(Text.Select(_,{"0".."9"})) else null otherwise null)),List.Count(Names)-1),DistanceTable= Table.FromList(Distance, Splitter.SplitByNothing(), {"Value"}, null, ExtraValues.Error),
    #"Added Custom3" = Table.AddColumn(DistanceTable, "Category", each "Distance"),
    //Total
    #"Filtered Rows5" = Table.SelectRows(variable, each ([Column1] = "Final")){0},
    r=Record.ToList(#"Filtered Rows5"),
    s=List.PositionOf(r,"Final"),
    rr=List.Transform(r, each if _="" then null else _),
    t=List.RemoveNulls(List.RemoveFirstN(rr,s+1)),
    TotalTable= Table.FromList(t, Splitter.SplitByNothing(), {"Value"}, null, ExtraValues.Error),
    #"Added Custom4" = Table.AddColumn(TotalTable, "Category", each "Final"),
    //combine it all
    h = Table.FromColumns(Table.ToColumns(Transform&#"Added Custom3"& #"Added Custom4")&a),
    #"Removed Columns1" = Table.RemoveColumns(h,{"Column2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns1",{{"Column1", type text}, {"Column3", type number}, {"Column4", type date}}),
    #"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[Column1]), "Column1", "Column3", List.Sum)
    in #"Pivoted Column",

    Source2 = Folder.Files("c:\temp5"),
    // main code, retrieve all filenames, process them with the functions, then combine and expand them
    #"Filtered Rows" = Table.SelectRows(Source2, each ([Extension] = ".pdf")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Data", each group([Folder Path]&[Name])),
    #"Renamed Columns" = Table.RenameColumns(#"Added Custom",{{"Name", "zName"}}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Renamed Columns", "Data",{"Name","Custom"},{"Name","Custom"}),
    List = List.Union(List.Transform(#"Expanded Data"[Custom], each Table.ColumnNames(_))),
    #"Expanded Data2" = Table.ExpandTableColumn(#"Expanded Data", "Custom", List,List),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Data2",{{"Column4", type date}})
    in #"Changed Type"

  3. #3
    Registered User
    Join Date
    03-12-2014
    Location
    Houston
    MS-Off Ver
    Excel 2010
    Posts
    52

    Re: Import Various Tables with Different Formats from PDF into Excel and arranged.

    Quote Originally Posted by popcornlove View Post
    based on placing two copies of your pdf in the same directory with different names, this seems to process all the files and extract the tables. That assumes your real data is going to resemble the sample data

    let

    group= (filename)=>
    //function to open each file and pull out all the tables in them for processing
    let source = pdf.tables(file.contents(filename), [implementation="1.3"]),
    list = list.union(list.transform(source[data], each table.columnnames(_))),
    #"expanded data" = table.expandtablecolumn(source, "data", list,list),
    #"filtered rows" = table.selectrows(#"expanded data", each text.startswith([name], "table")),
    #"grouped rows" = table.group(#"filtered rows", {"name"}, {{"data", each _, type table}}),
    #"added custom" = table.addcolumn(#"grouped rows", "custom", each split([data]))
    in #"added custom",

    //function to process each table individually
    split= (variable)=> let
    #"filtered rows1" = table.selectrows(variable, each ([column1] <> "" and [column1]<>null and [column1]<>"final" and [column2] <> "" and [column2]<>null)),
    #"promoted headers" = table.promoteheaders(#"filtered rows1", [promoteallscalars=true]),
    #"removed columns" = table.removecolumns(#"promoted headers",list.firstn(table.columnnames(#"promoted headers"),3)),
    names=list.removenulls(list.transform(table.columnnames(#"removed columns"), each if text.contains(_,"volume") or text.contains(_,"unit") then _ else null)),
    #"removed other columns" = table.selectcolumns(#"removed columns",names),
    firstname=list.first(table.columnnames(#"removed other columns")),
    #"unpivoted other columns" = table.unpivotothercolumns(#"removed other columns",{firstname} , "attribute", "value"),
    #"renamed columns" = table.renamecolumns(#"unpivoted other columns",{{firstname, "category"}}),
    transform = table.transformcolumns(#"renamed columns",{{"attribute",each text.beforedelimiter(_,"_"), type text}}),
    //dates
    repeatcount=table.rowcount(#"removed columns"),
    dates = record.tolist(variable{1}),
    b = list.removenulls(list.transform(dates, each try if text.contains (_,"(") and not text.contains (_,"table") then text.beforedelimiter(_, "(") else null otherwise null)),
    a={list.repeat(b,repeatcount + 2)},
    //distance
    #"promoted headers5" = table.promoteheaders(variable, [promoteallscalars=true]),
    distance = list.repeat(list.removenulls(list.transform(table.columnnames(#"promoted headers5"), each try if text.contains(_,"distance") then number.from(text.select(_,{"0".."9"})) else null otherwise null)),list.count(names)-1),distancetable= table.fromlist(distance, splitter.splitbynothing(), {"value"}, null, extravalues.error),
    #"added custom3" = table.addcolumn(distancetable, "category", each "distance"),
    //total
    #"filtered rows5" = table.selectrows(variable, each ([column1] = "final")){0},
    r=record.tolist(#"filtered rows5"),
    s=list.positionof(r,"final"),
    rr=list.transform(r, each if _="" then null else _),
    t=list.removenulls(list.removefirstn(rr,s+1)),
    totaltable= table.fromlist(t, splitter.splitbynothing(), {"value"}, null, extravalues.error),
    #"added custom4" = table.addcolumn(totaltable, "category", each "final"),
    //combine it all
    h = table.fromcolumns(table.tocolumns(transform&#"added custom3"& #"added custom4")&a),
    #"removed columns1" = table.removecolumns(h,{"column2"}),
    #"changed type" = table.transformcolumntypes(#"removed columns1",{{"column1", type text}, {"column3", type number}, {"column4", type date}}),
    #"pivoted column" = table.pivot(#"changed type", list.distinct(#"changed type"[column1]), "column1", "column3", list.sum)
    in #"pivoted column",

    source2 = folder.files("c:\temp5"),
    // main code, retrieve all filenames, process them with the functions, then combine and expand them
    #"filtered rows" = table.selectrows(source2, each ([extension] = ".pdf")),
    #"added custom" = table.addcolumn(#"filtered rows", "data", each group([folder path]&[name])),
    #"renamed columns" = table.renamecolumns(#"added custom",{{"name", "zname"}}),
    #"expanded data" = table.expandtablecolumn(#"renamed columns", "data",{"name","custom"},{"name","custom"}),
    list = list.union(list.transform(#"expanded data"[custom], each table.columnnames(_))),
    #"expanded data2" = table.expandtablecolumn(#"expanded data", "custom", list,list),
    #"changed type" = table.transformcolumntypes(#"expanded data2",{{"column4", type date}})
    in #"changed type"
    wow !!!!!!!!!!!!!!!!!!!!!!!! Thank you so much!!!!

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Import Tables from PDF into Excel
    By chi05 in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 08-09-2023, 07:34 PM
  2. Import data to Word from Excel tables
    By 2pado in forum Excel General
    Replies: 0
    Last Post: 04-07-2023, 02:39 PM
  3. Import multiple excel tables at once.
    By kris01 in forum Word Formatting & General
    Replies: 2
    Last Post: 10-15-2020, 12:49 AM
  4. [SOLVED] Data from improperly arranged rows from different tables
    By LesliePrabakar in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 08-27-2016, 04:44 AM
  5. How to Import Data and Tables from Excel into a Word Doc
    By todd18us in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 07-23-2015, 10:00 AM
  6. Excel Bug? Pivot tables not reading certain formats
    By ISDgirl in forum Excel General
    Replies: 4
    Last Post: 04-10-2014, 07:04 PM
  7. Replies: 3
    Last Post: 07-05-2013, 04:04 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1