Results 1 to 12 of 12
  1. #1
    mercapto is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Jan 2012
    Location
    Spain
    Posts
    91

    Extract text from each word document in a file

    I've got hundreds of word documents, all writen with the same structure. File names are just numbers.

    I want to make this loop:

    1. navigate through the file where all my docs are, open the doc
    2. extract the title text, which is writen in the page header in a certain position of a table... get the string
    3. Store that string in my access db. (i can figure this part)
    4. use that string to change the name of the file



    Anyone has any idea?

  2. #2
    Rod is offline Expert
    Windows 7 32bit Access 2007
    Join Date
    Jun 2011
    Location
    Metro Manila, Philippines
    Posts
    679
    As I suspect you realise, you need to use functions beyond the standard 'mainstream' Access objects.

    Scripting Runtime may be sufficient for you to identify the files in a Windows directory and to iterate through them.

    You then need to use properties and methods from the Word object model to open and read the document contents.

    How you then save the document is up to you. Without checking I'm not sure whether Scripting Runtime allows you to change a file name. I would tend to use the Word method corresponding to 'Save As' and then delete the original file.

    OK, those are the ideas. I guess the next step is that you want help. Post back with specific questions.

    PS When I write Windows directory above I really mean File Folder.

  3. #3
    mercapto is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Jan 2012
    Location
    Spain
    Posts
    91
    Thanks Rod.

    I don't know what is scripting runtime, could be functions like Dir?? Should I use this?

    About the "word object model", I'm still an ignorant. There are many examples in the web of how to search for text, replace, etc. But I couldn't find anything about how to grab text from word and import it to access. If I knew that I could start doing something.

  4. #4
    Rod is offline Expert
    Windows 7 32bit Access 2007
    Join Date
    Jun 2011
    Location
    Metro Manila, Philippines
    Posts
    679
    Let me say here that if this is a one-time exercise then it may be quicker for you to rename your files in the crude, blunt way of using Windows.

    If however it's a job that needs to be repeated or if you wish to use this as a learning exercise then let's proceed.

    Your project will need to reference the Scripting Runtime and Word object models. Here are the instructions on how to do that.

    1. Navigate to the VBA coding window ([Alt]+F11)
    2. Click on Tools on the menu bar
    3. Click on References in the drop-down list
    4. Scroll down the list and locate Microsoft Scripting Runtime
    5. Click the checkbox
    6. Scroll further down and find Microsoft nn.n Word Object Library (nn.n will probably be 14.0)
    7. Click the checkbox
    8. Click OK

    The references list should look something like this:

    Click image for larger version. 

Name:	1.jpg 
Views:	39 
Size:	36.4 KB 
ID:	11040


    Scripting Runtime gives you various properties and methods for managing files, folders and the directory structure. It also give you a reference to the Dictionary object, a very, very useful object.

    The Word object library references all the properties and methods of the Word application.

    To fulfil your objective you will need to become comfortable with the concepts of OOP (Object Orien[ta]ted Programming) - no, it's not that difficult, you will be using objects not writing them

  5. #5
    mercapto is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Jan 2012
    Location
    Spain
    Posts
    91
    Wow.

    Thanks for your incredible help.

    I'm not going to lie you, this is much more difficult that I thought. I think I'd rather pay my small brother to do this.
    But anyway, I want to learn this, so I'll give it out a try in the future. *bookmark*

    Thanks again!!

  6. #6
    Rod is offline Expert
    Windows 7 32bit Access 2007
    Join Date
    Jun 2011
    Location
    Metro Manila, Philippines
    Posts
    679
    Wait, we've only just started. I need to tell you a little about Objects. You're already using them without realising it; just about everything in Access is an object. Here is something I wrote on the subject.

    Objects

    I have never encountered a satisfactory definition of an object, so I won’t try. Rather let me describe an object by its characteristics. Objects have methods and properties. (Microsoft Office objects also have Events and user written objects have a provision for User Events.) The concept is that an object ‘owns’ its properties and methods and it is the only thing allowed to operate its methods or manipulate its properties. Other routines outside the object may pass data to the object, retrieve data from the object or ask the object to perform a task. However those outside routines should treat the object like a ‘black box.’ The benefits are consistency and reusability. (Actually there are many more benefits but they are beyond this cursory introduction.) An object may be a list (collection) of things. If while browsing through Access objects, you encounter one with the properties and methods of add, count, item, remove, you may be sure the object is a collection.

    An object’s property (or a collection’s item) may itself be an object; you may think of it as one object ‘pointing’ to another. Thus hierarchies of objects may be constructed. For example an Access form has, amongst other things, a collection of controls, a control may have a form as one of its properties (subform!), that subform has a collection of controls, and so on.

    Every object is a copy of the object’s template. The object template (boilerplate if you like) is called a class while the objects generated from it are called instances. Thus every object instanced from the same class has identical methods and properties, but probably different property values. Suppose there is a class called Customer; instances of this class may be CustomerA, CustomerB, etc. Each instance exists once and once only; multiple references to an instance may be created for use in different places in the code but there should only one copy of the instance itself.

    The ‘Set’ Keyword

    When assigning an object reference to a variable you must use the ‘Set’ keyword.
    Example VBA code Comment
    Set ctlCustomers = Me.Controls.Item(“lstCompanies”) ctlCustomers now contains a reference to this form’s control named lstCompanies. Actually you would more concisely code this as Me.lstCompanies.
    Set ctlCust = ctlCustomers A second reference to the control is now contained in ctlCust.
    Set xlsApp = GetObject(, “Excel.Application”) The function GetObject returns an object so the ‘Set’ keyword must be used.

    Hopefully this helps with that mysterious situation: “Why do I sometimes have to use ‘Set’ and other times not?” When assigning an object reference, use the ‘Set’ keyword.

    Object Addressing

    An object’s property or method is accessed by including its name after a period. Using the first entry in the table above as an example, there is a property of Me named Controls. This is a collection of all the control objects in Me. A single item from a collection may be accessed by specifying its key, in this case its name. Often, if the retrieved item is itself an object, the address can get quite long. Here is an example of retrieving a value from a textbox of a subform that is on another form, not this form.
    Forms(“frmMain”).Controls.Item(“sfrDetails”).Form. Controls.Item(“txtInvoiceTotal”).Value
    Actually, because of defaults, this would most likely be coded as:
    Forms(“frmMain”).sfrDetails.Form.txtInvoiceTotal

    The Life of an Object Instance

    An object instance remains extant as long as there is at least one reference ‘pointing’ to it. It is thus necessary to know the life span of VBA variables. Variables declared inside a procedure are valid only while that procedure is executing. If one of the variables is an object reference then when the procedure finishes, the object reference is destroyed. If that reference is the only (or last) reference to the object instance then the object instance is also destroyed.

    Variables declared at the module level (in the declarations section) are valid while the module is valid. For standard VBA modules this is no problem; Access creates one copy of the module at start up and this remains extant until the application is closed. (For this reason the declaration section of a standard VBA module is also a good place to put global variables.) Forms and reports also have modules; however they ‘come and go,’ so beware!

    Module level variables may be declared in class modules (object templates); in fact there is no other place to ‘store’ property values and references.

    I need to say at this point that it is possible to destroy the object instance and leave existing references to it, hanging. For example if Access creates and references a Word application, it is possible for a user to close independently the Word application (destroy the object instance) and leave Access’ references hanging and no longer valid. This shouldn’t be the case but it does seem to happen.

    ---

    OK, the best way to learn is to do. Why not design a small prototype Access project that opens a Word document. To do this:

    1. Create a new Access db called OffAuto.accdb
    2. In this db, create a new form called frmWord
    3. In this form create a textbox, txtDocPath, and a command button, cmdOpenDoc
    4. Now add library reference to Word (and Sripting Runtime if you wish) as I described previously.


    (I assume that if I told you how to do this I'd be guilty of teaching Grannie how to suck eggs.)

    In frmWord's design view, make sure the Property Sheet is displayed. Select the cmdOpenDoc control and then double-click against 'On Click' under the 'Event' tab in the Property Sheet. The system will respond by displaying '[Event Procedure]' Now click on the elipses and the system will take you to the VBA coding window with the event procedure 'top and tail' already coded. (This is one - the simplest - of many ways of doing this.)

    Now complete the module for frmWord as follows.

    Code:
    Option Compare Database
    Option Explicit
    Private mdocDoc As Word.Document
    
    Private Sub cmdOpenDoc_Click()
        If IsNull(Me.txtDocPath) Then Exit Sub
        Set mdocDoc = GetObject(Me.txtDocPath)
        mdocDoc.Application.Visible = True
    End Sub
    As long as a valid Word document path is entered in the textbox you should see the Word logo appear on your taskbar and when you switch to it the document should be opened for you. Note that I have declared the variable for the document, mdocDoc (blue text) as a module level variable in the declaration section of the module.

    OK, many things can go wrong but if you haven't tinkered with MS defaults and the document path is valid then all should work. I know this is very crude and far from automated but its purpose is to give you confidence.

    Let me know how you get on. If necessary I'll send you my db. Remember to make anything you send me readable by v2007.

  7. #7
    mercapto is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Jan 2012
    Location
    Spain
    Posts
    91
    Hello again, Rod.
    Finally I could find time for this. The exercise was quite easy, I am already familiar with VBA. You can talk to me as an "sophomore".
    So now I can open a word document, I guess that the next step is to grab text from it. This is the most difficult point for me.

    The next next step is to list all the docs in the dir and create the loop. I think I could manage this...

  8. #8
    Rod is offline Expert
    Windows 7 32bit Access 2007
    Join Date
    Jun 2011
    Location
    Metro Manila, Philippines
    Posts
    679
    Hi. It's always difficult to pitch a response at an appropriate technical level. So if you are now the world's expert at sucking eggs, I apologise.

    The 'trick' used by many of us is to use the 'Record Macro' facility in the target application to produce an example of what code should be placed in the Access VBA module. Luckily Word has this facility so I suggest you switch to Word and use it to record exactly what you want to do from Access. You can rarely copy and paste exact code but it's usually only a matter of changing variable names, etc. Try it. See how you get on.

    Dir will certainly give you each document in turn but is limited in other respects. I don't think you can rename the document via Dir - not even sure whether Scripting Runtime gives you this. Have you looked for and/or found any suitable APIs?

  9. #9
    mercapto is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Jan 2012
    Location
    Spain
    Posts
    91
    In Spain we say that "not everything is black or white, there are greys". Some times very faint greys, but greys.

    I am not an expert in sucking eggs (damn, that sounds vey bad in my language, I thought in english was so... :-P), but I do know how to place a button and use an event. You don't need to apologize!! I don't feel offended at all. I just feel "guilty" because you used 200 words to explain me something that I already knew. But if you want to be so clear in your explanations you are absolutely welcome.

    OK... back to the business... Changing the name of the file is not absolutely necesary, but for sure I will need to link a filename to a title.

    Continuing with my exercise, I took this code from word:
    Code:
    Sub Macro1()
    
        Selection.MoveDown Unit:=wdLine, Count:=4
        Selection.MoveUp Unit:=wdLine, Count:=1, Extend:=wdExtend
        Selection.Copy
        Debug.Print Selection
    End Sub
    Let's see if I can do something with ir (now I must give a break). I'll give news back. Meanwhile you can give me any other tip or guidance.

    Thanks a lot again.

  10. #10
    Rod is offline Expert
    Windows 7 32bit Access 2007
    Join Date
    Jun 2011
    Location
    Metro Manila, Philippines
    Posts
    679
    I didn't realise you are Spanish; your English is very good. In case you don't know, the English expression, "Teaching your grandmother to suck eggs" means saying something that the listener already knows very well. (The eggs in question are bird's eggs!) Don't feel too bad about my extended explanation of objects; it was already written!

    You will need to prefix the Selection object with the name of your variable for the Word application reference. Just looking at some of my projects I found it necessary to do something like this:

    Code:
        mdocDoc.Activate
        mdocApp.Selection.Goto What:=wdGoToBookmark, Name:=str
    Ignore the fact I'm using the GoTo method; the point is I needed to activate the particular document (mdocDoc) before using the Selection object explicitly referenced by the application variable (mdocApp). Try it without first and if it does not work then see if my suggestion helps.

  11. #11
    mercapto is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Jan 2012
    Location
    Spain
    Posts
    91
    Great success!!

    I did it in two steps:

    - First, I added the filenames to my database. I used a textbox to enter the sucesive subdirectories and the program checked them to extract and record the paths (I could probably made something to recognize the subdirectories but it didn't worth it).
    Code:
    Private Sub cmdDir_Click()
    Dim strDirRaiz As String
    Dim strFF As String
    Dim strDirectorio As String
    Dim strFichero As String
    
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim strSQL As String
    
    Set db = CurrentDb()
    strSQL = "SELECT tblDocumentos.Titulo, tblDocumentos.NombreArchivo, tblDocumentos.Referencia, tblDocumentos.Enlace FROM tblDocumentos"
    Set rs = db.OpenRecordset(strSQL, dbOpenDynaset)
    
    strDirRaiz = "C:\mydir\"
    strFF = Me.txtDir 'Name of the secondary dir.
    strDirectorio = strDirRaiz & "\" & strFF
    strFichero = Dir(strDirectorio & "\*.doc")
    
    If strFichero = "" Then
        Exit Sub
    Else
        rs.AddNew
    rs!NombreArchivo = strDirectorio & "\" & strFichero
    rs.Update
        Do Until strFichero = ""
            strFichero = Dir()
               rs.AddNew
       rs!NombreArchivo = strDirectorio & "\" & strFichero
       rs.Update
    'I believe there is a way to avoid repeating this lines, but I don't have time now.
        Loop
        
    End If
    
    Set rs = Nothing
    Set db = Nothing
    
    End Sub
    - Second, with all the paths in my database, open the docs and grab the text.
    Code:
    Dim strTitulo As String
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim strSQL As String
    
    Set db = CurrentDb()
    strSQL = "SELECT tblDocumentos.Titulo, tblDocumentos.NombreArchivo,..."
    Set rs = db.OpenRecordset(strSQL, dbOpenDynaset)
    
    Do While Not rs.EOF
    
        Set mdocDoc = GetObject(rs!NombreArchivo)
        mdocDoc.Application.Visible = True
        mdocDoc.Activate
    'This part is the action I wanted to perform in word.
        Selection.MoveDown Unit:=wdLine, Count:=4
        Selection.MoveUp Unit:=wdLine, Count:=1, Extend:=wdExtend
    'end
        Selection.Copy
    
        strTitulo = Left(Selection, Len(Selection) - 1)
        rs.Edit
        rs!Titulo = strTitulo
        rs.Update
        mdocDoc.Close
        
        Debug.Print strTitulo
    rs.MoveNext
    Loop
    
    Set rs = Nothing
    Set db = Nothing
    And it worked. In a couple of minutes I recorded 336 docs, with no mistakes.

    I know it can be improved, but I believe this is a very powerful tool, especially for those who deal with lot's of standard documents (ISO, i. eg.).

    And I am very thankful to you Rod!

    P.S.: I am looking forward to do it with pdf.

  12. #12
    Rod is offline Expert
    Windows 7 32bit Access 2007
    Join Date
    Jun 2011
    Location
    Metro Manila, Philippines
    Posts
    679
    Welcome to the shadow world of Office Automation. Glad I could help!

    PS If things ever go 'strange' on you, reboot and then look for the creation of multiple copies of Office applications.

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Form to Word or another email able document
    By Rynorem23 in forum Forms
    Replies: 2
    Last Post: 01-28-2013, 04:37 PM
  2. Poor Word Wrapping When Exporting Report to Rich Text File
    By caki2112 in forum Import/Export Data
    Replies: 11
    Last Post: 11-01-2012, 03:00 PM
  3. Create new Word document.
    By Bill H in forum Programming
    Replies: 3
    Last Post: 06-12-2012, 06:40 AM
  4. Import Word document using VBA
    By degras in forum Import/Export Data
    Replies: 4
    Last Post: 04-12-2011, 02:40 AM
  5. Replies: 5
    Last Post: 07-16-2010, 10:12 AM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums