Page 1 of 2 12 LastLast
Results 1 to 15 of 23
  1. #1
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13

    How to import text files (non-delimited) from filepath into a field so it's searchable

    Hi,

    I'd like to import text files into a field in Access 2016. I have a database in which I have the path name that refers to folder and a text file within the folder in each record. Each text file is extracted, raw OCR text, not delimited text. Each text file is associated in each record with an image or with a multi-page PDF from which the OCR text file was obtained. I have on each record a browser control that displays the image or PDFs. At present, I can view the text files using the text path field (see image) to display the text (also using browser control), but the viewable text is not searchable. I want to use the filepath field in each record to import the text--not to just refer to the separate text files, because I need to be able to word-search across the entire population of OCR records to find records related to searches.

    How do I accomplish this? Is this a transfer text command? Or something else? Any help would be great!



    Thanks,

    Doug

    Click image for larger version. 

Name:	TextPath-formSnip.PNG 
Views:	31 
Size:	21.4 KB 
ID:	29671

  2. #2
    orange's Avatar
    orange is offline Moderator
    Windows 10 Access 2010 32bit
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    More info please. Are all texts same format? Can you give us some representative data (as a txt file)?
    Do the colon (":") delimited fields/names represent your field names and when "empty/blank" signifies no data for that field in this record?
    Can you give us an example of a search based on the data you have shown?

  3. #3
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13
    Thanks for your reply.

    The files are not delimited, not in any way I care about preserving or working around. I do see carets around page numbers, but I don't care about any differentiation related to that. If need be, I can view the text, see the page, and then follow the page reference to the related image or PDF. Thus I don't really care about fields within texts, even if they were present (as they might be for extracted OCR from emails as an example), because I will only use the text to search for specific words, so that I can then read the related image or PDF.

    As an example, I might want all documents related to the word "Henry." I want to search the text field (yet to be created) to find all "Henry" hits. I can then view the related images and PDFs. I don't need to know delimited fields that might exist with the text, such as "to," "from," "subject," or "message body" that might appear in OCR text if it came from email.

    Here are two examples:

    << VWD0038260 >> November 7, 2013 9:15:10 PM PST 1 / 57 Work Order Details // 071040 : 206675 O/M SSO Overflow San Marino Dr San Marino Dr O/M Labor GL-0170010 Deposit control 0122705 Asset: Location: CI: Sched Start: Site: VWD Job Plan: Sched Finish: Priority: Supervisor: BMCCRORY Target Start: Work Type: CM Lead: EGARCIA Target Finish: Status: CLOSE Vendor: Actual Start: 3/14/07 Parent: Owner: Actual Finish: 5/2/07 Failure Class: Owner Group: Report Date: 3/8/07 Problem Code: Service: Reported By: GGALINDEZ Project #: Service Group: GL Account: 200-3010-000-5401 Classification: << VWD0038261 >> November 7, 2013 9:15:10 PM PST 2 / 57 Work Order Details // 071041 : 206696 O/M Star Stone Dr - SSO RSF/Linda Vista O/M Labor GL-BLANK (Data retrieved from OLD Data Stream W/O) Deposit control 0122705 Asset: Location: CI: Sched Start: Site: VWD Job Plan: Sched Finish: Priority: Supervisor: CHARRELL Target Start: Work Type: CM Lead: Target Finish: Status: CLOSE Vendor: Actual Start: 3/14/07 Parent: Owner: Actual Finish: 12/19/08 Failure Class: Owner Group: Report Date: 3/8/07 Problem Code: Service: Reported By: GGALINDEZ Project #: 071041 Service Group: GL Account: 200-3010-000-5401 Classification: << VWD0038262 >>


    Another example:

    I!SM N!WCOM! M1XE! &!INC FLING MUSIC & ENTERTAINMENT BY TOM & MAURA SIT DOWN DINNER CATERED BY THE QUAIL' S INN Date: Friday, April 24, 1998 Place: LSM Recreation Lodge Time: 5:30p.m., Social; 6:30p.m., Dinner Cost: $1 7.50 per mnembe,; $19.50 per guest Trish and Glenn Henry and their committee are hosts for this GALA EVENT, The fantastic team of Tom & Maura are back by popular demand to provide the entertainment and music. A true mixer, table seating will be by draw. The Quail's Inn will cater a sit down dinner featuring the chef's specialty Stuffed Apple Walnut Chicken. Reservations are limited and must be received by April 16, 1998.

    The second example contains the word "Henry," and I'd expect a search to retrieve that record, so that I can then view the related image.

    Thanks!

    Doug

  4. #4
    aytee111 is offline Competent At Times
    Windows 10 Access 2013 64bit
    Join Date
    Nov 2011
    Location
    Nomad
    Posts
    3,936
    Create a loop in VBA to read each of the textpaths and import the file associated. You will need to create a specification for the import which you will do using a manual import. You can set this up however you choose, such as having it all in one field.

  5. #5
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13
    Thanks aytee111--I'll try to learn VBA, loops, and specifications for this.

    In the meantime if you have more insights on this, I'd sure appreciate it.

    I want the text pulled into one field per record, based on the field "textpath," so that the 70,000 or so records, each of which has one of 70,000 PDFs (including multi-page PDFs) has its unique text in the textpath field, pulled from the related *.txt file. Thus I can view with browser control the pdf, pulled into (or viewed via) browser control, e.g. from path "E:\001-LSM-Documents-and-Databases\AllPDFs\HOL2 000019.pdf." That same record has an associate text file "E:\001-LSM-Documents-and-Databases\AllTexts\HOL2 000019.txt," which contains the text that is drawn from the PDF via OCR. I can view that text via browser control (not searchable across the document population), but I want the entire text from each PDF to be pulled into the record, so that I can search the entire 70,000-record population at once.

    The table name is "writings."

    Thanks for your input! I'll go to work on it.

    Doug

  6. #6
    aytee111 is offline Competent At Times
    Windows 10 Access 2013 64bit
    Join Date
    Nov 2011
    Location
    Nomad
    Posts
    3,936
    I'm not sure what you are trying to do with pdf's, Access can't import them. I would suggest that you manually import one of the files, create a specification for it (on the Advanced button), then import others to see what happens. You will know then more about how to go about what you are trying to accomplish.

  7. #7
    orange's Avatar
    orange is offline Moderator
    Windows 10 Access 2010 32bit
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,716
    Here is a version:

    A simple table

    Click image for larger version. 

Name:	DougTableDesign.jpg 
Views:	28 
Size:	10.6 KB 
ID:	29701

    populated with your data
    Click image for larger version. 

Name:	DougTableWithData.jpg 
Views:	28 
Size:	225.1 KB 
ID:	29702

    and a query

    SELECT tblData.id, tblData.data
    FROM tblData
    WHERE (((tblData.data) Like "*Henry*"));

    That returns record Id 2.

  8. #8
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13
    I am doing a document review, looking for nuggets of information. For example, assume I want to know meetings that Henry attended. I need to find Henry-related records. To do this, the work flow often is generally one of two methods.

    (1) Labor-intensive, more-precise method: I use the browser control in a form to view every record, read every single page of every browser-control-displayed PDF, and then I can manually record or identify the Henry-related documents. For example, I can read the 70,000 PDFs, identify "Henry" as a meeting participant in some of the PDFs, then record that data in a field created to hold that data.

    (2) Less-labor intensive, less-precise method (what I an trying to do here): Relying on OCR extracts (that sometimes are accurate, and sometimes not) that have been imported from 70000 text files into the the text-data field, I simply filter the 70,000 records based on a search term, "Henry," in the example. Then then I have found all Henry-related records, which I can then read to identify the meetings Henry attended.

    Below is a screenshot of a form (reformatted to fit the screenshot). I have a bunch of fields of data on the left. In the middle is a PDF viewed in the Access record using browser control, drawn from the FilePath field.

    At the right is the raw OCR text extracted from that same PDF. The text is drawn from the related text file, via the TextPath field. Currently this is viewable only, not searchable--I am using browser control.

    What I want to do is make the OCR searchable by pulling the text in from the TextPath field into a field that is searchable.

    Thanks for your help folks.

    Doug

    Click image for larger version. 

Name:	ScreenshotAccessForm.PNG 
Views:	27 
Size:	155.7 KB 
ID:	29704
    Attached Thumbnails Attached Thumbnails ScreenshotAccessForm.PNG  
    Last edited by Doug Simpson; 07-28-2017 at 10:16 AM. Reason: Replace image with better one.

  9. #9
    ssanfu is offline Master of Nothing
    Windows 7 32bit Access 2010 32bit
    Join Date
    Sep 2010
    Location
    Anchorage, Alaska, USA
    Posts
    9,664
    You could use the FileSystemObject to read the entire text file, then update the appropriate record. But depending on each of the text file sizes, with 70,701 files, you might bump up against the 2GB max file size.

    See
    http://www.4guysfromrolla.com/webtec...ect/faq1.shtml
    http://www.4guysfromrolla.com/webtec...ect/faq2.shtml


    Are you wanting to push a button to add each text file (70701 button clicks) or one click to loop through and add the text files??



    You could make a COPY of your dB, delete all but around 10 records and post it here.

    After deleting the records (leaving 10), do a "Compact and Repair", then Zip it.
    Add the appropriate text files for the records in the dB to the zip.
    Post the zip file.

  10. #10
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13
    Hi Steve. Thanks for your thoughts on this. I have gone through the posts you referenced, but I don't understand where to begin. It's like reading a foreign language to me--I guess I have reached the limits of my understanding. But maybe I'll learn more soon.

    I have zipped ten or so records along with the associated PDFs and text files in folders. This structure mimics what I have--the database and two folders of relevance with the files in those fodlers. But the path names in the database will need to be changed of course.

    I don't want a button to import 70000 times. That would not import the text at once, and until I have clicked and imported 70,000 times, I cannot search the OCR text field.

    The table in issue is "writings." I created form "Copy of Writings-2," which allows viewing of the PDF next to its OCR text, both using browser control for display. My objective is to not just look at the text pulled into the database as with browser control, but to import the relevant text field using the TextPath. Note that there are some instances where there is no text file--so there are about 70,000 records, but only about 69500 text files.

    Any insights on this would be great. I am surprised this is so challenging. Thanks for your help.

    Doug

    BTW--I am not having luck with the zip file uploader. I try to attach and upload, but it doesn't seem to happen. File size is not too bit--12.4 MB.
    Last edited by Doug Simpson; 07-29-2017 at 09:20 AM. Reason: Trying to add a zip file

  11. #11
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13

    Why can't I upload the ZIP?

    Click image for larger version. 

Name:	LSM-Zip problem.PNG 
Views:	25 
Size:	72.0 KB 
ID:	29716

    Maybe the issue with the zip not uploading is the browser detecting an insecure connection?

    I downsized the number of PDF to make the zip file 5.7 MB, but still cannot get it to function.

    Any preferred alternate method for me to get this uploaded or to you?

    Doug
    Last edited by Doug Simpson; 07-29-2017 at 10:31 AM. Reason: Revised after exploring upload options and changing file size.

  12. #12
    June7's Avatar
    June7 is offline VIP
    Windows 7 64bit Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,815
    Can upload to a fileshare site such as Box.com and post link.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  13. #13
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13

  14. #14
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13
    Not sure whether this sheds any light.

    On the page linked below, I found the script quoted below "Import all text files in a folder." But the script seeks delimited text files to import; mine are not delimited--they are just raw OCR text files. Also, there is no way I can see to associate the FilePath field with the relevant record, so that the text related to a PDF is paired with the correct text file--the documents woudl be imported wholesale, unrealted to the specific records already in the database.

    So I'd like to have the FilePath field that contains the proper file name to control the import, and thus have the imported text in the right record--times 70000.

    http://access-excel.tips/access-vba-...ertext-method/

    Public Sub import()
    Dim FileName, FilePathName, Path, FileNameList() As String
    Dim FileCount As Integer

    DoCmd.SetWarnings False
    Path = "C:\test"
    FileName = Dir(Path & "")

    While FileName <> "" And Right(FileName, 3) = "txt"
    FileCount = FileCount + 1
    ReDim Preserve FileNameList(1 To FileCount)
    FileNameList(FileCount) = FileName
    FileName = Dir()
    Wend

    If FileCount > 0 Then
    For FileCount = 1 To UBound(FileNameList)
    FilePathName = Path & FileNameList(FileCount)
    DoCmd.TransferText transferType:=acImportDelim, SpecificationName:="Query1 Export Specification", TableName:="import_table", FileName:=FilePathName, hasfieldnames:=True
    Next
    End If
    DoCmd.SetWarnings True
    End Sub

  15. #15
    Doug Simpson is offline Novice
    Windows 10 Access 2016
    Join Date
    Jul 2017
    Posts
    13
    Maybe another option is to use Excel, if easier, and open the textpath file into an associated Excel column created to hold that imported text, then import the Excel files (already containing the text data) into Access?

    https://social.msdn.microsoft.com/Forums/en-US/801d2a85-de63-43e1-95f8-945bb69decbf/multiple-text-file-import?forum=exceldev



Page 1 of 2 12 LastLast
Please reply to this thread with any new information or opinions.

Similar Threads

  1. Import Specs and Text Files
    By Gloverb06 in forum Access
    Replies: 3
    Last Post: 03-15-2016, 01:43 PM
  2. Replies: 7
    Last Post: 11-20-2012, 05:08 PM
  3. Import Text Files without access
    By 95DSM in forum Import/Export Data
    Replies: 1
    Last Post: 09-10-2010, 04:36 PM
  4. Field returns error when I import Text Files to Access
    By geng in forum Import/Export Data
    Replies: 3
    Last Post: 06-01-2010, 02:20 PM
  5. Cannot import excel or text files
    By donald_s in forum Access
    Replies: 2
    Last Post: 04-13-2010, 11:48 PM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums