Page 1 of 2 12 LastLast
Results 1 to 15 of 19
  1. #1
    Manish_05 is offline Novice
    Windows XP Access 2010 32bit
    Join Date
    Apr 2014
    Posts
    12

    Import Pubmed XML (Child Nodes) to Access

    Hi Friends,

    I have an XML file which is downloaded from http://www.ncbi.nlm.nih.gov/pubmed.

    I am trying to import this XML file to Access by using VBA. I know Excel VBA but I have never worked with Access VBA, so I am finding it difficult to get the solution. There are few problems which I have facing while I am trying to do it manually.

    - When I am trying to Import it, Access pops up a message saying DTD is prohibited. This error can be following these steps. Open XML file in notepad -> delete 2nd line from the XML file -> save it
    - Then if the previous error is eliminated by deleting the 2nd line from XML file, then once the import starts, it pops up a message saying “Not all of your data has been successfully imported”. I don’t have any idea about this.
    - Now once the data is imported with the error message, on left hand side many tables are being created (where as if I try to open this file in Excel it creates only one file), Don’t have any idea about this. Also no 2 tables have common column.

    What I am trying to do.
    a) Import XML file downloaded from Pubmed site to Access, in a single table.
    b) There are more than 100 child nodes in the XML file (which varies from file to file); I don’t need to import all the fields to Access.
    c) Only few files are required to be imported such as (Abstract, Article day, Article Month, Article year, Affiliation, Journal title, PMID)

    Hope I have not confused with my words, please reply if you need any clarifications.

    XML file to be imported pubmed_result(251).zip



    Thanks & Regards,
    Manish

  2. #2
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    A strongly typed XML file should successfully import to your DB without any error. If you do not want all of the data, I would suggest importing the entire file and then parse the data once it is in a temp table, storing the good data in your permanent table.

    If you can not get the XML file to import without error....

    Check for a new table created by Access "Paste Errors" to see if you can understand the issues.

    Edit the XML file so it is strongly typed and fits your data structure or closely fits your data structure. The XML may require multiple tables and it would be as simple as splitting (editing) the file into two or three import processes.


    Or….

    Third option would be to use Internet Explorer and lots of VBA to parse the XML syntax by searching elements, bringing the elements into string variables, parsing the strings and saving to your table.

  3. #3
    Manish_05 is offline Novice
    Windows XP Access 2010 32bit
    Join Date
    Apr 2014
    Posts
    12
    Quote Originally Posted by ItsMe View Post
    A strongly typed XML file should successfully import to your DB without any error. If you do not want all of the data, I would suggest importing the entire file and then parse the data once it is in a temp table, storing the good data in your permanent table.

    If you can not get the XML file to import without error....

    Check for a new table created by Access "Paste Errors" to see if you can understand the issues.

    Edit the XML file so it is strongly typed and fits your data structure or closely fits your data structure. The XML may require multiple tables and it would be as simple as splitting (editing) the file into two or three import processes.


    Or….

    Third option would be to use Internet Explorer and lots of VBA to parse the XML syntax by searching elements, bringing the elements into string variables, parsing the strings and saving to your table.
    ItsMe, thanks for your reply...

    If you don't mind I need some more help from your side. I am extremely sorry because I am not able to understand what exactly i need to do in order to resolve my problem. As i said that i am new to access maybe because of that I am not able to hold your helping hands.

    Friend, the size of XML files which I download from Pubmed site goes upto 1 GB, so I think it is impossible to edit the XML file as suggested by you.

    I think the 3rd option is better, so can you please help me with the code, I know i am asking you too much but i don't have any other alternative (I have searched entire web but didn't get any help).

  4. #4
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,870
    Manish,

    What exactly do you need? What have you tried so far?
    Do you know how these things are related:
    Abstract,
    Article day,
    Article Month,
    Article year,
    Affiliation,
    Journal title,
    PMID

    I tried to create a simple stylesheet, but got an XPath Parse error. I also tried importing the xml and got an error regarding the DTD.

    I've removed some of the xml at the start and renamed the file ".txt"
    I can read it into an access table, each line has a unique number.

    How do you define abstract?
    I'm finding lines like

    <AbstractText Label="MATERIALS AND METHODS" NlmCategory="METHODS">Using a case-control study design, the investigators identified and adju....
    There are 751 lines identified as <PMID Version="1">24227163</PMID>
    There are 229 Abstracts

  5. #5
    Manish_05 is offline Novice
    Windows XP Access 2010 32bit
    Join Date
    Apr 2014
    Posts
    12
    Quote Originally Posted by orange View Post
    Manish,

    What exactly do you need? What have you tried so far?
    Do you know how these things are related:
    Abstract,
    Article day,
    Article Month,
    Article year,
    Affiliation,
    Journal title,
    PMID
    Thanks Orange for your reply,

    There are no common fields in the child nodes mentioned above by you. I mean to say they are not related to each other, they are all independent fields.

  6. #6
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    I was able to import the XML file after deleting the line you mentioned and editing two fields in two tables from Text 255 to Memo.

    This Sample DB is the result. When you import your file, use the option to "Append Data to Existing Tables".
    XML_Tables.zip

    I am not an expert with XML. Using the wizard creates an issue. The problem is that you have so many different tables of data. There are no relationships between the tables and the data is basically useless. There are no key values for the data. You would have to include the node index with the import process and I am not sure how to do this using the Wizard.

    I will take another look at it later. Maybe someone has a way of including Key values in the import process.

  7. #7
    rpeare is offline VIP
    Windows XP Access 2003
    Join Date
    Jul 2011
    Posts
    5,441
    I am not super knowledgeable about XML files but from my understanding they are basically text files with loops within loops so if that's the case have you attempted to import building your own parser rather than relying on the XML import native to MS Access?

    You can cycle through lines of a text based file using filesystemobject, then trigger a record to be added to your database anytime you reach a 'new' record

    this would be the basic structure:

    Code:
    Dim f
    Dim fs
    Dim sSource
    
    Set fs = CreateObject("scripting.filesystemobject")
    sSource = "C:\test\pubmed_result(251).xml"
    Set f = fs.opentextfile(sSource)
    
    sline = f.readline
    Do While f.atendofstream <> True
        Debug.Print sline
        sline = f.readline
    Loop
    
    f.Close
    Set fs = Nothing
    then you'd have to build in recognition for what type of line you're processing to extract the proper value.

  8. #8
    Manish_05 is offline Novice
    Windows XP Access 2010 32bit
    Join Date
    Apr 2014
    Posts
    12
    Quote Originally Posted by ItsMe View Post
    I was able to import the XML file after deleting the line you mentioned and editing two fields in two tables from Text 255 to Memo.

    This Sample DB is the result. When you import your file, use the option to "Append Data to Existing Tables".
    XML_Tables.zip

    I am not an expert with XML. Using the wizard creates an issue. The problem is that you have so many different tables of data. There are no relationships between the tables and the data is basically useless. There are no key values for the data. You would have to include the node index with the import process and I am not sure how to do this using the Wizard.

    I will take another look at it later. Maybe someone has a way of including Key values in the import process.

    Thanks bro, for your help. I am trying to automate to this process, because user will download any number of XML file from Pubmed site and will import the data in access for further use so i can't use the Import Wizard.

    I will not do any data analysis with the imported data, it will be used as a database so relationship is not required.

    Thanks ItsMe.

  9. #9
    Manish_05 is offline Novice
    Windows XP Access 2010 32bit
    Join Date
    Apr 2014
    Posts
    12
    Quote Originally Posted by rpeare View Post
    I am not super knowledgeable about XML files but from my understanding they are basically text files with loops within loops so if that's the case have you attempted to import building your own parser rather than relying on the XML import native to MS Access?

    You can cycle through lines of a text based file using filesystemobject, then trigger a record to be added to your database anytime you reach a 'new' record

    this would be the basic structure:

    Code:
    Dim f
    Dim fs
    Dim sSource
    
    Set fs = CreateObject("scripting.filesystemobject")
    sSource = "C:\test\pubmed_result(251).xml"
    Set f = fs.opentextfile(sSource)
    
    sline = f.readline
    Do While f.atendofstream <> True
        Debug.Print sline
        sline = f.readline
    Loop
    
    f.Close
    Set fs = Nothing
    then you'd have to build in recognition for what type of line you're processing to extract the proper value.
    rpeare, i tried your code it run without any bug, but i am not able to see any table in access with the data. I think I am asking you some absurd question, but I dont know how to parse data. I worked on Excel VBA, but I am totally new to Access. Sorry but I am asking too much from you.

  10. #10
    orange's Avatar
    orange is offline Moderator
    Windows XP Access 2003
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    16,870
    rpeare, ItsMe
    I agree with rpeare approach. I was trying similar, since I haven't used XML in 8 years and that was minimal.
    I read the xml into a table with 1 field FieldX.
    I was looking for the tags that represented the lines to be kept.
    If I could do an Instr (FieldX, "Tag") > 0 then keep the line -- where xxx is a line of input and Tag was one of the appropriate tags.
    But since there is apparently no relationship??? I'll leave the OP to use ItsMe's database.
    I'm not sure what the prohibited DTD represents, but my simple xslt was dead from the get-go.

    Glad to see everyone is busy....

  11. #11
    rpeare is offline VIP
    Windows XP Access 2003
    Join Date
    Jul 2011
    Posts
    5,441
    you don't see a table you'd have to pick out the records (or individual pieces of data) you wanted for various tables then append them as you went through each record. Basically you'd have to build a SQL statement then execute it within the loops.

  12. #12
    Manish_05 is offline Novice
    Windows XP Access 2010 32bit
    Join Date
    Apr 2014
    Posts
    12
    Quote Originally Posted by orange View Post
    rpeare, ItsMe
    I agree with rpeare approach. I was trying similar, since I haven't used XML in 8 years and that was minimal.
    I read the xml into a table with 1 field FieldX.
    I was looking for the tags that represented the lines to be kept.
    If I could do an Instr (FieldX, "Tag") > 0 then keep the line -- where xxx is a line of input and Tag was one of the appropriate tags.
    But since there is apparently no relationship??? I'll leave the OP to use ItsMe's database.
    I'm not sure what the prohibited DTD represents, but my simple xslt was dead from the get-go.

    Glad to see everyone is busy....

    Friends, if "prohibited DTD" is the one which is creating the problem then manually we can remove it by deleting the second line from XML file by opening it in Notepad

  13. #13
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    Quote Originally Posted by rpeare View Post
    you don't see a table you'd have to pick out the records (or individual pieces of data) you wanted for various tables then append them as you went through each record. Basically you'd have to build a SQL statement then execute it within the loops.
    I am all for the approach of parsing data and appending records in tables.

    Post #6 has a DB that is able to use an import process to existing tables. The only thing that is missing are the Key values. Since I am not a pro with XML I did not want to be emphatic that this XML example is not "strongly typed". There may be a way to supplement the Import Wizard and include the needed key values.

    In order to properly import the data I see a need to enumerate the Nodes in the XML file. You could do this reading the entire thing as text. My preference is to use Internet Explorer and go after Elements. It is much easier to go after ID and Name elements than to read each line, to then interpret what a given line's Element ID or Element Name is.

    I use an approach similar to this but it gets much more involved.
    https://www.accessforums.net/program...tml#post202672

  14. #14
    rpeare is offline VIP
    Windows XP Access 2003
    Join Date
    Jul 2011
    Posts
    5,441
    Personally, I do not like to alter original files, period. If there is a problem at any point and you've altered your original information you're screwed in tracking it down/auditing it. That's why I suggested the filesystemobject method, then you're treating it strictly as a text file and because it's in xml format it's got a reliable layout that is easy to parse. It may be a more heavy handed approach but at least you would be able to import your data without any special modification to your source files.

  15. #15
    ItsMe's Avatar
    ItsMe is offline Sometimes Helpful
    Windows 7 64bit Access 2010 32bit
    Join Date
    Aug 2013
    Posts
    7,862
    Quote Originally Posted by rpeare View Post
    Personally, I do not like to alter original files, period...
    I believe the DTD line is there for the benefit of Web Browsers and other apps reading the XML. I do not know because I do not write XML but I believe the DTD data will provide information on required elements and where to place a given element, relative to its parent Node.

Page 1 of 2 12 LastLast
Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 10
    Last Post: 06-16-2014, 08:37 AM
  2. Replies: 7
    Last Post: 01-21-2014, 11:32 AM
  3. Child Record Created, but won't display in child form
    By CementCarver in forum Programming
    Replies: 1
    Last Post: 04-10-2013, 12:42 PM
  4. Replies: 3
    Last Post: 12-14-2012, 11:26 AM
  5. Responding To Clicks On Treeview Nodes
    By Hack in forum Programming
    Replies: 3
    Last Post: 04-12-2012, 06:46 AM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums