Results 1 to 8 of 8
  1. #1
    Stan Denman is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Mar 2012
    Posts
    50

    Count word frequency in a table

    I have a table of hospital names. I want to determine what single and tuples in each hospital's name is unique (single occurence in the table) identification purposes. Ideally I want to exclude "fluff" words like "of" from the query where those words have a frequency that exceeds "X" appearances. I would like a query that gives me this result from the table:

    RH 1
    Dedman 1
    RH Dedman 1
    Mother Francis 1
    Francis 1
    Death 1
    Vincennes 1
    Regional 1
    Mercy 1
    Tulsa 1


    JPS 1

    Where the table has these records: "Mother Francis Memorial Hospital and Clinic of Tulsa" and "Angel of Death Health and Medical Center" and "Mother Angel of Mercy memorial Hospital", "Vincennes Medical Center" and "JPS Health Clinic", "RH Dedman Memorial Hospital".

    In this example, let's say "X" is 3, therefore "Hospital", medical", "medical center" , "of" , "and", memorial" are not a part of determining what is a one-apperance tuple (obviously would not be unique one word).

  2. #2
    ranman256's Avatar
    ranman256 is offline VIP
    Windows Vista Access 2010 32bit
    Join Date
    Apr 2014
    Location
    Kentucky
    Posts
    9,523
    I would loop thru the recordset,
    then loop thru 1 field, pulling 1 word at a time, then append it to the 'count' table.
    when done, run delete query to remove the list of fluff words,(via the tFluff table)
    then run the count.

  3. #3
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,411
    be clear about what you are looking for - individual words? or phrases?

    "medical", "medical center" - if there are two 'medical's and one 'medical center' then actually you have three medical's

    and 'mother francis' is a phrase

    on that basis there is an argument that says '
    Vincennes Medical Center' is unique



  4. #4
    Stan Denman is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Mar 2012
    Posts
    50
    Thank you Ajax. I do see your point. Certainly an entire name is unique and identifying. A little background of what I am trying to do: I have an html table with hospital names entered by a government worker. And I wanting to match what the operator entered with the correct hospital. Obviously, if the operator entered the correct name (that being what is found in a database of hospitals) I would have no need to try to create this code. But the operators are sloppy. The thing is, though, they are not indiscriminate about the data they enter. They seems to actually follow a rough "that's good enough" methods. For example, here in Dallas Fort Worth most people know "parkland" to be the county hospital. So I see operators entering in "Parkland Memorial Hospital", " Parkland Medical Center" or "Parkland Hospital". I am thinking if I can identify words that are unique to a hospital name that this will likely be a correct match.

    So I have those two extremes for identifying the hospital: Complete name match, so its got to be right, and at the other extreme just one word matches but that work in not in the name of any other hospital. When it comes to phrases I am less sure. When it comes to hospital names, there are a lot of "fluff" words like "Hospital, Medical Center" etc. I certainly could find unique phrase made up of fluff words, but they question is whether that will be meaningful, or simply a coincidence. For, example, I might find a unique, "Hospital and Medical Center" but is that really reflecting meaning?

    Anyway, so what I am saying is I am out to find unique single words, and a unique phrase of two words UNLESS those two words are "Fluff" words which I am going to define as appearing in the list of hospital names more than X times. Thank you so much for your interest and response.

  5. #5
    Stan Denman is offline Advanced Beginner
    Windows XP Access 2010 32bit
    Join Date
    Mar 2012
    Posts
    50
    Thank you very much for your reply!

  6. #6
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,411
    You have added a bit of vital information with the comment
    And I wanting to match what the operator entered with the correct hospital. Obviously, if the operator entered the correct name (that being what is found in a database of hospitals)
    basically you are trying to match to something, not just find unique words

    name matching is a bit of an art and trial and error - every situation is different. I often see this issue where two separate systems each want to be the 'master' for particular data - for example a billing system and a crm system. CRM wants to be master of the customer data because it is often the 'first introduction' of a new customer to the business, billing wants to be master because for an invoice to be legally valid you must have the right legal name in the event you need to chase them for non-payment.

    depends on the volume of data but if you already have a list of 'valid' names, then using a vba function parse the words into an array by splitting on the space character. then do the same with each record from your 'good enough' table compare each element of the valid array with each element of the good enough array. Then create a score based on the number of matches divided by the number of words in the valid name - so never more than 1. Or perhaps do it the other way round.

    Order may matter - without order 'medical center' would score 1 when compared with 'center of medical excellence' - or if done the other way round 0.5.

    with order e.g. first word 'medical' (1) so look to other array for medical - and that is 3. second word 'center'(2), but you only look for matches after 3 - so no score. So now instead of scoring 1, you score 0.5 (or 0.25 for the other way round). The higher the score, the closer the match

    but for all this effort, you still wont match Parkland to Parklands or Parkland to Park Land.

    Another basis (which can also be combined with the above) is to use the fluff table of words you want to ignore - such as of, hospital, center, etc.Again have a vba function which loops through the fluff table and either removes those words from the name Or regularises them in some way, so you are left with a rump of non fluff words.

    Anyway just some suggestions of techniques you might employ - and don't forget in most cases the identifier name (e.g. Parklands) will be the the first word - so you only need to compare in the other table with names that start with the same word (or perhaps same letter). i.e. instead of looking at 1000 records, you only need to look at 20. - and don't forget to eliminate exact matches before you start.

  7. #7
    Missinglinq's Avatar
    Missinglinq is offline VIP
    Windows 7 64bit Access 2007
    Join Date
    May 2012
    Location
    Richmond (Virginia, not North Yorkshire!)
    Posts
    3,018
    Parsing names...people...streets...hospitals...is usually one of the biggest hairballs you'll encounter in coding in VBA! The possibilities are simply endless!

    You'll also have to taken into account names with commas in it...i.e. Saint Jude's...and periods...V.C.U. Medical Center.

    My guess would be that your best approach, going forward, would be to have a Combobox with hospital names. You could have the variations as the first Field...V.C.U. Medical Center...VCU Med Center...MCV (yes...the original name) then the correct name as the second field...Virginia Commonwealth University Medical Center...and have it populate the hospital name field.

    Linq ;0)>
    The problem with making anything foolproof...is that fools are so darn ingenious!

    All posts/responses based on Access 2003/2007

  8. #8
    CJ_London is online now VIP
    Windows 10 Access 2010 32bit
    Join Date
    Mar 2015
    Posts
    11,411
    @linq

    reminds of a solution I provided a few years ago. A table with a 'master name' and aliases

    AliasPK...AliasName...AliasFK
    1...........Frederick.....1
    2...........Fred............1
    3...........Freddy ........1
    4...........Janet ..........4
    5...........Jan..............4

    the master name was where PK=FK

    query would be something like

    Code:
    SELECT B.AliasName
    FROM tblAliases A inner JOIN tblAliases B ON A.AliasFK=B.AliasPK
    WHERE A.AliasName LIKE "*" & [Enter a name] & "*"

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Count times a word appears in a month
    By omegads in forum Queries
    Replies: 3
    Last Post: 09-12-2016, 07:14 PM
  2. Replies: 5
    Last Post: 10-29-2014, 12:12 PM
  3. Report Based on Field Frequency
    By thegnome in forum Reports
    Replies: 1
    Last Post: 03-12-2013, 12:28 PM
  4. Generate reports by frequency
    By MFS in forum Programming
    Replies: 2
    Last Post: 11-18-2010, 08:09 AM
  5. how do i do a word count
    By clueless in forum Queries
    Replies: 0
    Last Post: 06-03-2009, 09:01 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Other Forums: Microsoft Office Forums