Results 1 to 9 of 9
  1. #1
    Stopwatch is offline Novice
    Windows 7 64bit Access 2013
    Join Date
    Sep 2014
    Posts
    5

    Parsing HTML Tables for Storage in Access

    The Problem:




    I have a bunch of HTML with multiple tables, two of which are of interest (posted above). The first table describes a person, their address, phone number, and so on. The second table describes information about their account. Both tables have headers (text in `<h3></h3>` tags) through which they can be referenced.
    I am relatively new to manipulating HTML and so my first question is how someone goes about taking the first table and importing it into an Access database or into Excel so that if they had 50 of these documents the columns would
    be, in this example, Name, Address, Phone, Age, with each row being a different person.


    My second question is slightly more complicated. The second table has another table embedded in it. Essentially this means that there are lots of smaller split up tables that need to be put into one complete table. That is, I am hoping to find a way to treat it as an additional column in the main table (the one actually given its own header)
    housing this smaller table. The idea would be to then manipulate this table in the same way as the first table. The final problem is that these tables don't have a common field that I can use to relate them, so there would need to be a way to take
    a field from the first table and append it to the second table. Considering this, I think the steps needed can be summarized as follows:


    >1.


    > For a single HTML document take the second table and perform the
    > actions necessary to append the smaller table's <td></td> tag and
    > the main table's `<th colspan="7"></th>` to the section formatted
    > as

    `<tr> <td style="text-align: center;">1</td> <td
    style="text-align: center;">10/2013</td> <td style="text-align:
    center;">12345678</td> <td style="text-align: right;">0167</td> <td
    style="text-align: center;">10/2014</td> <td style="text-align:
    right;">0036</td> <td style="text-align: center;">1111</td> </tr>`

    > so that all 9 pieces of data get their own column upon import.
    >
    > 2. Take the column whose <th></th> tags contain the string "Name" and add it as the first column in the second table.
    >
    > 3. Import the first table's <td></td> tags to Access.
    >
    > 4. Import the second table's now revised <td></td> tags to Access.
    >
    > 5. Repeat for all HTML documents so that there are two complete tables.



    The HTML:

    Code:
     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
            <html>
              <head>
                <title>Customer File</title>
                <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
                <style type="text/css">
                  body { width: 800px; margin: 40px auto; font-family: Verdana; }
                  h1 { font-size: 18pt; margin-bottom: 20px; }
                  h2 { font-size: 14pt; margin-bottom: 20px; }
                  h3 { font-size: 11pt; text-decoration: underline; margin-bottom: 20px; }
                  th { text-align: left; }
                  hr { height: 1px; border: none; margin: 20px 0; color: #ccc; background-color: #ccc; }
                  .text { font-size: 10pt; }
                </style>
              </head>
              <body>
    
    
    
    
              <!--FIRST TABLE--> 
    
    
            <h3>Customer</h3>
                    <table class="text" cellspacing="0" cellpadding="0" border="0">
                      <tr>
                        <th style="width: 12em;">Name</th>
                        <td>Tom Thompson</td>
                      </tr>
                      <tr>
                        <th>Address</th>
                        <td>123 St., Vancouver, BC, V6Z 2M7</td>
                      </tr>
                      <tr>
                        <th>Phone</th>
                        <td>555-555-1233</td>
                      </tr>
                      <tr>
                        <th>Date of Birth, Age</th>
                        <td>03/04/1985, 29</td>
                      </tr>
                    </table>
                
               <!-- SECOND TABLE--> 
                
                    <h3>Collection</h3>
                    <table class="text" cellspacing="0" cellpadding="0" border="0" style="width: 640px;">
                      <tr>
                        <th style="text-align: center;">#</th>
                        <th style="text-align: center;">Date Purchased</th>
                        <th style="text-align: center;">Product Name</th>
                        <th style="text-align: right;">$</th>
                        <th style="text-align: center;">Date of Product Procurement</th>
                        <th style="text-align: right;">Item Number</th>
                        <th style="text-align: center;">Order Number</th>
                      </tr>
                      <tr>
                        <td colspan="7">
                          <br />
                        </td>
                      </tr>
                      <tr>
                        <th colspan="7">VALUE MENU</th>
                      </tr>
                      <tr>
                        <td style="text-align: center;">1</td>
                        <td style="text-align: center;">05/10/2013</td>
                        <td style="text-align: center;">Hamburger</td>
                        <td style="text-align: right;">4.50</td>
                        <td style="text-align: center;">01/10/2013</td>
                        <td style="text-align: right;">0005</td>
                        <td style="text-align: center;">5678345</td>
                      </tr>
                      <tr>
                        <td colspan="7">
                          <table class="text" cellspacing="0" cellpadding="0" border="0">
                            <tr>
                              <th style="width: 10em;">Cashier Name</th>
                              <td>Sam Johnson</td>
                            </tr>
                          </table>
                        </td>
                      </tr>
                    </table>
               </body>
               </html>

  2. #2
    ranman256's Avatar
    ranman256 is offline VIP
    Windows Vista Access 2010 32bit
    Join Date
    Apr 2014
    Location
    Kentucky
    Posts
    9,525
    For the 1st Q...
    in a Macro , use transfertext,"Import HTML"
    that should import the data.

    Q2, I dont think you can import a table in a table. It must be reconfigured or reformated, or copied /pasted.

  3. #3
    Stopwatch is offline Novice
    Windows 7 64bit Access 2013
    Join Date
    Sep 2014
    Posts
    5
    Sorry ranman, could you expand on what you mean? The issue at the moment is that I am not sure how to reconfigure or reformat the table so that all sub-table fields are appended as new columns.

  4. #4
    ssanfu is offline Master of Nothing
    Windows XP Access 2000
    Join Date
    Sep 2010
    Location
    Anchorage, Alaska, USA
    Posts
    9,664
    I would have to see more data, but it *might* be possible to write a UDF to capture the data. It would involve a lot of code.

    Since "#" and "&" are not valid field names you would have to change the field names.
    Having multiple data in one field is not normalized (DOB, Age) but could be dealt with.

    I would have (add) an autonumber (PK) field it table "Customer". Read the customer data, check if already in the table, append if not, grab the PK field number and put it in the FK field in table "Collection".

    Wouldn't be difficult but could be challenging.

  5. #5
    Stopwatch is offline Novice
    Windows 7 64bit Access 2013
    Join Date
    Sep 2014
    Posts
    5
    Thanks ssanfu. Is there any easier way you can see to get html of this format into a database? Would MySQL just complicate things? I'm not even sure what a UDF is, which is probably related to the reason this is causing me so much trouble.

  6. #6
    ssanfu is offline Master of Nothing
    Windows XP Access 2000
    Join Date
    Sep 2010
    Location
    Anchorage, Alaska, USA
    Posts
    9,664
    Is there any easier way you can see to get html of this format into a database?
    Don't have a clue. I haven't had to import html data.

    Would MySQL just complicate things?
    MySQL is just a database engine like SQL Server, Jet/ACE, Oracle,etc.

    UDF is User Defined Function. Basically, code you write (function or subroutine) to accomplish what you want to happen; importing data to tables, special calculations, things you need/want to happen that aren't built in to Access.

    I Googled "import html table to access vba" and first site found (with code example - might not be all you need):
    http://stackoverflow.com/questions/1...ccess-database

    Second site (also with code):http://bytes.com/topic/access/answer...site-using-vba


    Also try "import html table to access". Lots of results...

  7. #7
    Stopwatch is offline Novice
    Windows 7 64bit Access 2013
    Join Date
    Sep 2014
    Posts
    5
    Quote Originally Posted by ssanfu View Post
    Don't have a clue. I haven't had to import html data.


    MySQL is just a database engine like SQL Server, Jet/ACE, Oracle,etc.

    UDF is User Defined Function. Basically, code you write (function or subroutine) to accomplish what you want to happen; importing data to tables, special calculations, things you need/want to happen that aren't built in to Access.

    I Googled "import html table to access vba" and first site found (with code example - might not be all you need):
    http://stackoverflow.com/questions/1...ccess-database

    Second site (also with code):http://bytes.com/topic/access/answer...site-using-vba


    Also try "import html table to access". Lots of results...
    Thanks, but the biggest issue is modifying the structure of the HTML. Are there any procedures in an already pre-constructed HTML-parsing libraries that would really do a good job with this?

  8. #8
    ssanfu is offline Master of Nothing
    Windows XP Access 2000
    Join Date
    Sep 2010
    Location
    Anchorage, Alaska, USA
    Posts
    9,664
    but the biggest issue is modifying the structure of the HTML
    I thought you were trying to import data from html tables?????


    Are there any procedures in an already pre-constructed HTML-parsing libraries that would really do a good job with this?
    Maybe this will help
    http://stackoverflow.com/questions/5...ccess-database (See the answer)

    I Googled "parse html in access 2010" .... lots of results

  9. #9
    Stopwatch is offline Novice
    Windows 7 64bit Access 2013
    Join Date
    Sep 2014
    Posts
    5
    Thanks ssanfu! That's correct, but I'm trying to import data from HTML tables that contain sub-tables. These can't be readily imported and so need to be parsed.

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Replies: 1
    Last Post: 02-14-2014, 04:50 PM
  2. Replies: 0
    Last Post: 10-14-2013, 09:24 AM
  3. Replies: 6
    Last Post: 04-30-2013, 02:42 PM
  4. Access front end for parsing xml string
    By raghu_nandan1 in forum Programming
    Replies: 0
    Last Post: 04-21-2011, 07:58 PM
  5. parsing data in access (coding?)
    By banker247 in forum Programming
    Replies: 0
    Last Post: 01-13-2009, 12:05 PM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums