  1. #1
    kd2017 is offline Well, I tried at least.
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    1,142

    Decode PDF Metadata

    Is anyone aware of a good resource that explains exactly how to interpret PDF metadata?

    I'm using pdftk.exe to get a 'dump_data' report of a PDF file's metadata, then parsing the bookmarks and page labels. I'm making some assumptions along the way, but I'd like to be able to confirm them against some kind of documentation.

    Specifically, what *exactly* is PageLabelNewIndex? I assume it's an 'absolute' page number or index, but I want to be sure.
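
    For illustration, the parsing step described above might look something like the sketch below. It assumes the report has already been written to text (for example with pdftk's dump_data operation) and that each page label record opens with a PageLabelBegin line, as recent pdftk output does. ParsePageLabels and DemoPageLabels are made-up names, and the sample text is invented. The sketch treats PageLabelNewIndex as an opaque value; the comment saying it is the 1-based document page where a label range begins is an assumption to check against the pdftk source, not documented fact.

```vba
Option Explicit

' Collect the PageLabel records from pdftk dump_data text.
' Returns a Collection of Scripting.Dictionary objects (late bound),
' one per PageLabelBegin block, keyed by field name.
Public Function ParsePageLabels(ByVal dumpText As String) As Collection
    Dim result As New Collection
    Dim rec As Object        ' current record; Nothing outside a PageLabel block
    Dim lines() As String
    Dim s As String
    Dim pos As Long
    Dim i As Long

    ' Normalise line endings so both CRLF and LF files split cleanly.
    lines = Split(Replace(dumpText, vbCrLf, vbLf), vbLf)

    For i = LBound(lines) To UBound(lines)
        s = Trim$(lines(i))
        If s = "PageLabelBegin" Then
            Set rec = CreateObject("Scripting.Dictionary")
            result.Add rec
        ElseIf Left$(s, 9) = "PageLabel" And Not rec Is Nothing Then
            ' "Key: Value" line inside the current record
            pos = InStr(s, ":")
            If pos > 0 Then rec(Left$(s, pos - 1)) = Trim$(Mid$(s, pos + 1))
        ElseIf Len(s) > 0 Then
            ' Any other key (bookmarks, info fields, ...) ends the record.
            Set rec = Nothing
        End If
    Next i

    Set ParsePageLabels = result
End Function

Public Sub DemoPageLabels()
    Dim sample As String
    Dim lbl As Variant

    ' Invented sample, shaped like dump_data output.
    sample = "NumberOfPages: 8" & vbLf & _
             "PageLabelBegin" & vbLf & _
             "PageLabelNewIndex: 1" & vbLf & _
             "PageLabelStart: 1" & vbLf & _
             "PageLabelBegin" & vbLf & _
             "PageLabelNewIndex: 5" & vbLf & _
             "PageLabelStart: 1"

    For Each lbl In ParsePageLabels(sample)
        ' ASSUMPTION: PageLabelNewIndex = 1-based page where this label range starts.
        Debug.Print lbl("PageLabelNewIndex"), lbl("PageLabelStart")
    Next lbl
End Sub
```

    Keeping every field as a string in a dictionary means nothing is interpreted until the meaning of each key has been confirmed.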

  2. #2
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,397
    I would have thought Adobe would have it?
    I have an app that extracts the text from a PDF into a form textbox and then uses formulae to extract specific values. It's used for PDF invoices, for posting into an accounting system.
    [Attached image: image_2022-11-10_172853982.jpg]

  3. #3
    kd2017
    The only thing I found on Adobe's website was information on metadata of the kind a standard user would find in the Document Properties dialog box. I didn't see any documentation for developers regarding metadata. My Google-fu is probably just lacking.

    Side note: That's a cool looking app you've got there! For some reason little utilities like that just make me happy.
    What software are you using to extract the text from the PDF?

  4. #4
    kd2017
    I suppose I can study the source code from pdftk... https://gitlab.com/pdftk-java/pdftk/...va/report.java

  5. #5
    CJ_London
    It's an .exe called pdftotext.

    As a test, open a PDF and press Ctrl+A: whatever is highlighted is what this tool will extract. (You could of course just copy and paste it into a textbox instead.)

    You should be able to download it from here: https://download.cnet.com/PDF-to-Tex...-75415960.html

    To execute it, the VBA code is:

    Shell """" & CurrentProject.Path & "\pdftotext.exe"" -layout """ & PDFName & """"

    where PDFName is the file path and name of the PDF. (The path to the .exe is quoted so that it still works when the project folder contains spaces.)

    It creates a text file with the same path and name (but with .txt rather than .pdf).

    From there, it's easy enough to read the text file into a string, which can then be assigned to a textbox control.
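
    As a rough sketch of those two steps together (run the converter, then read the .txt into a string), something like the following could work. PdfToString is a made-up name, and the code assumes pdftotext.exe sits in the project folder as above and that PDFName ends in an extension such as .pdf. One thing to watch: Shell returns immediately, so the sketch uses WScript.Shell's Run method with its wait argument set to True, so the text file is complete before it is read.

```vba
Option Explicit

' Run pdftotext, wait for it to finish, and return the extracted text.
' ASSUMPTIONS: pdftotext.exe is in the project folder; PDFName has an
' extension; the output lands next to the PDF with a .txt extension.
Public Function PdfToString(ByVal PDFName As String) As String
    Dim cmd As String
    Dim txtName As String
    Dim f As Integer
    Dim buf As String

    cmd = """" & CurrentProject.Path & "\pdftotext.exe"" -layout """ & PDFName & """"

    ' 0 = hidden window, True = block until the program exits.
    CreateObject("WScript.Shell").Run cmd, 0, True

    ' Same path and name, with .txt in place of the original extension.
    txtName = Left$(PDFName, InStrRev(PDFName, ".") - 1) & ".txt"

    ' Nothing to read if the converter did not produce a file.
    If Len(Dir$(txtName)) = 0 Then Exit Function

    ' Binary read pulls the whole file in one go.
    f = FreeFile
    Open txtName For Binary Access Read As #f
    buf = Space$(LOF(f))
    Get #f, , buf
    Close #f

    PdfToString = buf
End Function
```

    Usage would be along the lines of Me.txtPdfText = PdfToString(PDFName), where txtPdfText is a hypothetical textbox name. The bytes are read as-is, so non-ASCII characters may need converting depending on the encoding the converter writes.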

    What is not so good is reading 'table data' such as

    item...qty...price...value

    I'm still working on that. The main issue is the item and/or description columns, as there are so many variations. It works fine if the construction is consistent, but not so well if the text file looks like this:

    [Attached image: image_2022-11-10_231808230.png]
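
    One possible heuristic for rows like item...qty...price...value, offered only as a sketch and not necessarily what the app above does: with -layout output, columns tend to be separated by runs of spaces, so a row can be split wherever two or more spaces occur. This assumes cell values themselves never contain double spaces, which is exactly where inconsistent item descriptions will break it. SplitLayoutRow is a made-up name and the sample row is invented.

```vba
Option Explicit

' Split one line of -layout text into columns on runs of 2+ spaces.
' ASSUMPTION: single spaces occur only inside a cell, never between columns.
Public Function SplitLayoutRow(ByVal rowText As String) As String()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late bound, no reference needed
    re.Global = True
    re.Pattern = " {2,}"

    ' Turn each column gap into a single tab, then split on tabs.
    SplitLayoutRow = Split(re.Replace(Trim$(rowText), vbTab), vbTab)
End Function

Public Sub DemoSplitLayoutRow()
    Dim cols() As String
    Dim i As Long

    ' Invented sample row: item, qty, price, value
    cols = SplitLayoutRow("Blue widget, large     3      4.50     13.50")

    For i = LBound(cols) To UBound(cols)
        Debug.Print i, cols(i)
    Next i
End Sub
```

    Rows that come back with an unexpected column count can then be flagged for manual review rather than posted automatically.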
    Last edited by CJ_London; 11-11-2022 at 06:05 AM.

  6. #6
    Join Date
    Jan 2017
    Location
    Swansea,South Wales,UK
    Posts
    4,858
    Originally posted by kd2017:
    I suppose I can study the source code from pdftk... https://gitlab.com/pdftk-java/pdftk/...va/report.java
    Even ask them?
