Results 1 to 15 of 15
  1. #1
    Alioud is offline Novice
    Windows 7 64bit Access 2010 32bit
    Join Date
    Apr 2012
    Posts
    4

    Question Help needed to convert a massive text file into an access database

    Hi guys,



    This might be simple to a lot of you but it is very difficult to me.
    I have a large (150 megabyte) database in a text file.
    with records separated by *RECORD* and fields names are also between asterisks. This is an example of one record:



    *RECORD*
    *FIELD* NO
    100050
    *FIELD* TI
    100050 AARSKOG SYNDROME, AUTOSOMAL DOMINANT
    *FIELD* TX
    Grier et al. (1983) reported father and 2 sons with typical Aarskog
    syndrome, including short stature, hypertelorism, and shawl scrotum.
    They tabulated the findings in 82 previous cases. X-linked recessive
    inheritance has repeatedly been suggested (see 305400). The family
    reported by Welch (1974) had affected males in 3 consecutive
    generations. Thus, there is either genetic heterogeneity or this is an
    autosomal dominant with strong sex-influence and possibly ascertainment
    bias resulting from use of the shawl scrotum as a main criterion.
    Stretchable skin was present in the cases of Grier et al. (1983).

    Teebi et al. (1993) reported the case of an affected mother and 4 sons
    (including a pair of monozygotic twins) by 2 different husbands. They
    suggested that the manifestations were as severe in the mother as in the
    sons and that this suggested autosomal dominant inheritance. Actually,
    the mother seemed less severely affected, compatible with X-linked
    inheritance.

    *FIELD* RF
    1. Grier, R. E.
    dominant inheritance of the Aarskog syndrome. Am. J. Med. Genet. 15:
    39-46, 1983.

    2. Teebi, A. S.
    of a family with review and discussion of nosology. Am. J. Med. Genet. 46:
    501-509, 1993.

    3. Welch, J. P.: Elucidation of a 'new' pleiotropic connective tissue
    disorder. Birth Defects Orig. Art. Ser. X(10): 138-146, 1974.

    *FIELD* CS

    Growth:
    Mild to moderate short stature

    Head:
    Normocephaly

    Hair:
    Widow's peak

    Facies:
    Maxillary hypoplasia
    Broad nasal bridge
    Anteverted nostrils
    Long philtrum
    Broad upper lip
    Curved linear dimple below the lower lip

    Eyes:
    Hypertelorism
    Ptosis
    Down-slanted palpebral fissures
    Ophthalmoplegia
    Strabismus
    Hyperopic astigmatism
    Large cornea

    Ears:
    Floppy ears
    Lop-ears

    Mouth:
    Cleft lip/palate

    GU:
    Shawl scrotum
    Saddle-bag scrotum
    Cryptorchidism

    Limbs:
    Brachydactyly
    Digital contractures
    Clinodactyly
    Mild syndactyly
    Transverse palmar crease
    Lymphedema of the feet

    Joints:
    Ligamentous laxity
    Osteochondritis dissecans
    Proximal finger joint hyperextensibility
    Flexed distal finger joints
    Genu recurvatum
    Flat feet

    Skin:
    Stretchable skin

    Spine:
    Cervical spine hypermobility
    Odontoid anomaly

    Heme:
    Macrocytic anemia
    Hemochromatosis

    GI:
    Hepatomegaly
    Portal cirrhosis
    Imperforate anus
    Rectoperineal fistula

    Pulmonary:
    Interstitial pulmonary disease

    Thorax:
    Sternal deformity

    Inheritance:
    Sex-influenced autosomal dominant form
    also X-linked form

    *FIELD* CD
    Victor A. McKusick: 6/4/1986

    *FIELD* ED
    carol: 02/16/2011
    alopez: 6/3/1997
    mimadm: 3/11/1994
    carol: 7/7/1993
    supermim: 3/16/1992
    supermim: 3/20/1990
    ddp: 10/26/1989
    marie: 3/25/1988


    The original text file has a semicolon after each line, that's why Access converted it to a single row list like you see above.


    How do I make access Seperate each record since every new record starts with *Record* and then make access create and populate the fields like they appear above?

    Thanks!
    Ali

  2. #2
    June7's Avatar
    June7 is online now VIP
    Windows XP Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,898
    I don't understand description 'single row list'. You used the import wizard? There is no field header row and no field delimiter and each line is not a single complete record so I expect the wizard failed. Will need VBA. The asterisks are an additional complication because that is wildcard character.

    Google: access vb import text file
    There are many examples of coding for a variety of text file structures.

    I did a quick test of some code to import a couple of records based on the given sample.
    Code:
    Sub ImportTextFile()
    Dim rs As ADODB.Recordset
    Dim strLineData As String
    Dim strField As String ' Holder for field name
    Set cncurrent = CurrentProject.Connection
    Set rs = New ADODB.Recordset
    ' Open the text file
    Open "D:\Forums\Alioud.txt" For Input As #1
    ' Open the table to insert the text file into
    rs.Open "SELECT * FROM Table1", cncurrent, adOpenDynamic, adLockOptimistic
    Do While Not EOF(1)
        ' Read a line of data.
        Line Input #1, strLineData
        If strLineData Like "*[*]RECORD[*]*" Then
            rs.AddNew
        ElseIf strLineData Like "*[*]FIELD[*]*" Then
            strField = Right(strLineData, 2)
        Else
            rs.Fields(strField) = rs.Fields(strField) & strLineData
        End If
    Loop
    ' Close the data file.
    Close #1
    rs.Update
    rs.Close
    End Sub
    But first I set up a table with fields:
    ID Auto
    NO Text
    TI Text
    TX Memo
    RF Memo
    CS Memo
    CD Text
    ED Memo
    Last edited by June7; 04-07-2012 at 07:48 PM.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  3. #3
    Alioud is offline Novice
    Windows 7 64bit Access 2010 32bit
    Join Date
    Apr 2012
    Posts
    4

    I can not thank you enough

    I can not thank you enough for your help.. but I am still no where. I tried and tried and searched many coding examples on google but to no avail. I still get errors on VBA with your code and the codes I copied from online sources.

    The file is a freely available database of all known human malformations from John Hopkins University called OMIM (Online Mendelian Inheritance in Man). The problem is, it is huge. The compressed file is around 70 megabytes and original size is 152 megabytes. I am trying to convert the text to access format so physicians can query the database offline.

    This is the compressed file.. (Ignore the small files, the file I am trying to convert is OMIM.TXT)

    http://www.4shared.com/file/IX6vCCcZ/omimtxt.html


    Thanks for your help in advance.

    Ali

  4. #4
    June7's Avatar
    June7 is online now VIP
    Windows XP Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,898
    You tried my code - what are the error messages? Just re-read your OP, I forgot about the semi-colon. Is it on the *FIELD* lines? If so, change code to:
    strField = Left(Right(strLineData, 3), 2)

    Be sure to set VBA reference to: Microsoft ActiveX Data Objects 2.8 Library
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  5. #5
    Alioud is offline Novice
    Windows 7 64bit Access 2010 32bit
    Join Date
    Apr 2012
    Posts
    4
    It would probably take me 3 centuries to learn how to set VBA reference to ActiveX Data Objects!!! Please don't laugh at me.. At least I try to be a good physician

    This is the file on my Google Documents
    https://docs.google.com/open?id=0B5Y...aGRYRGNwQ2V1UQ

    Thank you very much.

    Ali

  6. #6
    June7's Avatar
    June7 is online now VIP
    Windows XP Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,898
    Setting reference is easy. From the VBA editor > Tools > References, find item and check it.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  7. #7
    Alioud is offline Novice
    Windows 7 64bit Access 2010 32bit
    Join Date
    Apr 2012
    Posts
    4
    I've never used VBA in my life.. I am trying on a friend's PC.. I will keep trying.

    If you convert the file to MS Access.. I think I should list your name on the project at my department, it is only fair.

    Ali

  8. #8
    June7's Avatar
    June7 is online now VIP
    Windows XP Access 2010 32bit
    Join Date
    May 2011
    Location
    The Great Land
    Posts
    52,898
    I downloaded the compressed file but cannot extract it. I don't have WinZip. I can open Windows Compression zip only.

    Here is an article about programming in Access http://office.microsoft.com/en-us/ac...010341717.aspx

    Also, review the article on debugging at the link at bottom of my post.
    Last edited by June7; 04-08-2012 at 05:39 AM.
    How to attach file: http://www.accessforums.net/showthread.php?t=70301 To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  9. #9
    ssanfu is offline Master of Nothing
    Windows 2K Access 2000
    Join Date
    Sep 2010
    Location
    Anchorage, Alaska, USA
    Posts
    9,664
    Quote Originally Posted by Alioud View Post
    I can not thank you enough for your help.. but I am still no where. I tried and tried and searched many coding examples on google but to no avail. I still get errors on VBA with your code and the codes I copied from online sources.

    The file is a freely available database of all known human malformations from John Hopkins University called OMIM (Online Mendelian Inheritance in Man). The problem is, it is huge. The compressed file is around 70 megabytes and original size is 152 megabytes. I am trying to convert the text to access format so physicians can query the database offline.


    Ali
    I also had errors when I tried to execute June7's code. I had to add a line

    Code:
    Dim cncurrent as Connection
    No more errors, but there was no data in the table. I don't use ADO, so maybe I did something wrong.

    So I tried DAO. It took a while, with many false starts. The file doesn't seem to be divided into records - just one long (long) string. I finally got the code right to separate the fields.
    So there were two problems: how to separate the records and how to separate the fields per record. There are a varying number of fields per record with some field names occurring twice in a record.

    Attached is a mdb (I have A2K) with the code, table and forms. One form is to do the import and the other is to view the data. I had to use a tab control to get all of the controls on the form.

    There are approx 22,229 records
    There are 10 fields, but not all fields are in every text file record and in some records there are duplicate fields. Where a record has duplicate fields, I concatenated them. (If there were two "CN" fields, the resulting record has both 'CN' data fields from the text file, in one database field.

    I tried to find out about the data structure by going to the locations you cited, but I was lost. So maybe this will give you a start.....

    You will have to import the text file. When I imported the text file, the mdb file ended up approx 360 MB. Open the "Importform" and click the button. Select the text file and sit back.... time.... A message box will appear when finished.

  10. #10
    RuralGuy's Avatar
    RuralGuy is offline Administrator
    Windows 7 64bit Access 2010 32bit
    Join Date
    Mar 2007
    Location
    8300' in the Colorado Rocky Mountains
    Posts
    12,922
    Way to go Steve. I was just starting to play with it because it looked like an interesting exercise. A little too slow it seems. I ran it in ac2010 and it stopped for some reason so I'll dig in and see what is going on. Nice job.

  11. #11
    RuralGuy's Avatar
    RuralGuy is offline Administrator
    Windows 7 64bit Access 2010 32bit
    Join Date
    Mar 2007
    Location
    8300' in the Colorado Rocky Mountains
    Posts
    12,922
    Steve,
    I replace the frm.Repaint in the ImportTxtFile sub with DoEvents and even though is slows down on occasion, it keeps on chunking along until done. Again, nice job.

  12. #12
    RuralGuy's Avatar
    RuralGuy is offline Administrator
    Windows 7 64bit Access 2010 32bit
    Join Date
    Mar 2007
    Location
    8300' in the Colorado Rocky Mountains
    Posts
    12,922
    Hmm...did you notice the maximum field sizes?
    AV = [261691]
    CD = [78]
    CN = [7634]
    CS = [4171]
    ED = [8143]
    NO = [6]
    RF = [261577]
    SA = [2902]
    TI = [1022]
    TX = [215265]
    Makes me want to examine those larger fields a little closer.

  13. #13
    ssanfu is offline Master of Nothing
    Windows 2K Access 2000
    Join Date
    Sep 2010
    Location
    Anchorage, Alaska, USA
    Posts
    9,664
    Thanks Allan.

    I couldn't find the structure on the OMIM web site, so I set the field sizes of the unknown fields to memo. Some of the fields has a *lot* of data... Worked on it intermittently over two days. Corrupted the mdb twice.. code did what I told it to do, not what I wanted it to do.

    What is the possibility of you posting the A2010 version??

  14. #14
    RuralGuy's Avatar
    RuralGuy is offline Administrator
    Windows 7 64bit Access 2010 32bit
    Join Date
    Mar 2007
    Location
    8300' in the Colorado Rocky Mountains
    Posts
    12,922
    So far I'm running your db with the one change I made. I'm still going to zero in on those huge fields. It could be that some marker was missing.

  15. #15
    RuralGuy's Avatar
    RuralGuy is offline Administrator
    Windows 7 64bit Access 2010 32bit
    Join Date
    Mar 2007
    Location
    8300' in the Colorado Rocky Mountains
    Posts
    12,922
    Nope! They are just *huge* fields with lots of comments. Wow! Let's hope the OP appreciates your work.

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Convert access file to exe file
    By Mina Garas in forum Access
    Replies: 2
    Last Post: 12-30-2011, 10:11 AM
  2. Replies: 5
    Last Post: 10-28-2011, 12:12 PM
  3. Massive check-list database
    By JFo in forum Access
    Replies: 103
    Last Post: 09-19-2011, 07:06 PM
  4. Can I convert a access file to exe file?
    By zhshqzyc in forum Access
    Replies: 3
    Last Post: 01-25-2011, 09:49 AM
  5. Replies: 5
    Last Post: 10-24-2009, 01:16 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums