ok thanks.
since i will have more than 1 pdf file with same tables, can i just change the name of the file in the power query?
ok thanks.
since i will have more than 1 pdf file with same tables, can i just change the name of the file in the power query?
If you have several PDF files that you want to import from (basically run the same process as the first file), then the best way to do it is to use a FOLDER source. You'll get a list of all the files in the folder, and then you can filter that down by extension (so filter down to something like "ends with ".pdf"), and then you can filter further if you wish (until you get just the files you're trying to import). The basic difference is that PowerQuery uses the transformations done on the sample file, and applies them to all the subsequent files and appends all the results to one table. Works great if the column names in the tables all have the same names.
I could post an example. (I guess I could just make a copy of the original PDF and change the name of one.) Would that help?
I made copies of your original PDF and imported the contents of a folder... Here's the PowerQuery code that PowerBI wrote:
PowerBI basically writes the whole thing for you... The trick is to start with a Folder source, and then you can filter the folder's contents down to whatever you want (in this case, the PDFs matching a specific naming convention). You can rename the columns. I just didn't do that part. There are a bunch of really good resources on PowerQuery around the Internet etc. One I really like is Chandeep Chhabra's YouTube channel: Goodly - YouTube He's great at explaining how to do anything related to PowerBI. And then there's Ken Puls & Miguel Escobar's book: Amazon.com: Master Your Data with Power Query in Excel and Power BI: Leveraging Power Query to Get & Transform Your Task Flow: 9781615470587: Escobar, Miguel, Puls, Ken: BooksCode:let Source = Folder.Files("C:\Users\User\Downloads"), #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".pdf")), #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.StartsWith([Name], "Dati")), #"Filtered Hidden Files1" = Table.SelectRows(#"Filtered Rows1", each [Attributes]?[Hidden]? <> true), #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])), #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}), #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}), #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))), #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", Percentage.Type}, {"Column4", type text}, {"Column5", Percentage.Type}, {"Column6", type text}, {"Column7", Percentage.Type}}), #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Source.Name"}), #"Changed Type with Locale" = Table.TransformColumnTypes(#"Removed Columns", {{"Column4", Currency.Type}}, "it-IT"), #"Changed Type with Locale1" = Table.TransformColumnTypes(#"Changed Type with Locale", {{"Column6", Currency.Type}}, "it-IT") in #"Changed Type with Locale1"
There's also stuff on the Microsoft site, but those are two of my favorite resources.
If you get stuck, post back and I'll see if I can get you sorted out.
Oh right... "inconsistent column names" problem... Chandeep has you covered
really interesting.
i'll try to figure out the best option for this messy files and then i come back
If you post a few files so I can see what the column names are, I think I can fix it. (The column names or column order has to match... if they don't, you can use Goodly's example of how he matches columns.) Then when you have that, you can append all the different tables from the PDF tables to one table in PowerBI or Excel
Just found a better YouTube video for it... not sure his initial instructions are entirely politically correct!! "In this case you can ask the person who gave you the data to [censored] off, and... hey hey hey, just kidding." https://youtu.be/-hIa5g2oA70?si=T0bfjx9erp8om2FB
hi, thanks very much and sorry for the late, but i had some urgencies and i coudn't find time to study that stuff.
This vid is very good, will be very useful
Unluckily not for this case, those pdf tables are so messed up that even inside every table there is a column problem, so i'm making a procedus just copying and using html character to divide values.
Sorry, my answer was so dumb, I had to delete it. If these are Reporting Services (SSRS) reports, then normally they're based on a single stored procedure (basically a canned query that accepts parameter values). So maybe the most sensible way forward is to see if someone can grant you execute rights on that/those stored procedures. Then the result will be returned as a table, and you can use the ADO object library to connect to SQL Server and execute the stored procedure and return the results back to your database. Then the result would be a regular database table and not some HTML mess that you have to parse and deal with. If that's possible, that's the route I would take.hi, thanks very much and sorry for the late, but i had some urgencies and i coudn't find time to study that stuff.
This vid is very good, will be very useful
Unluckily not for this case, those pdf tables are so messed up that even inside every table there is a column problem, so i'm making a procedus just copying and using html character to divide values.
Last edited by madpiet; 07-12-2024 at 07:29 PM. Reason: dumb answer... had to fix it.
These look like Reporting Services reports. If that's the case, can you get permission to execute the query they're based on? If you can do that, you can likely avoid a whole lot of this mess. No need to parse anything, or get any specific tables. Infinitely easier. All someone would have to do is add you to the group that has permission to execute the query.