Results 1 to 9 of 9

Randomizing Data for Posted db's

  1. #1
    Micron is online now Virtually Inert Person
    Windows 10 Access 2016
    Join Date
    Jun 2014
    Location
    Ontario, Canada
    Posts
    6,319

    Randomizing Data for Posted db's

    We often ask poster to upload a copy of their db and the issue is often that it contains sensitive data. Here's a little function that we could point them to whereby they could,
    ON A COPY OF THEIR DB, randomize string field data. Sometimes complete data removal will not allow us to troubleshoot OP's issue because queries, forms, etc. need records in order to observe or trouble shoot the problem. The following function could solve the issue of revealing sensitive data by randomizing all or part of a string field by calling the function in an update query. The function call goes into the UPDATE TO row of the query design grid. This sql worked properly on a table that I tested it on (I have aliased the table and field names):

    Code:
    UPDATE tbl1 SET tbl1.field1 = randomizedata(tbl1.field1,3);
    The supposition is that it may be desirable to retain n characters at the beginning of the string and randomize the rest, hence the numeric parameter. If not, use 0 (untested). The function call would go into each text field that you want to randomize, using the proper field reference of course. The function being called is

    Code:
    Function RandomizeData(strField As String, i As Integer) As String
    Dim str1 As String, str2 As String
    
    'i= count of characters from beginning of string that are to be retained. Pass to Left function and assign to str1
    str1 = Left(strField, i)
    
    'make loop counter start at position number of NEXT character which is to be replaced. End value is string length.
    For i = i + 1 To Len(strField)
    
    'make str2 equal current value + a random lower case character from a to z and repeat to end of counter
    'to ignore spaces, comment out the If, Else, End If lines
    
    If Mid(strField, i, 1) = " " Then
        str2 = str2 + Mid(strField, i, 1)
    Else
        str2 = str2 & Chr((121 - 97 + 1) * Rnd + 97)
        'Debug.Print str2
    End If
    Next
    
    'piece left and right parts together and return to query
    RandomizeData = str1 + str2
    
    'e.g input string = Donald Duck, output string with 1st 2 characters retained might be Dosnoh itat
    
    End Function
    If there is any interest in taking this further I can write an additional functions that retain OR randomizes numeric data, retains numeric data in strings (e.g. addresses) or separators (e.g. "-" in phone numbers). I'll put that on hold until I see if there is even any interest in this so far.

    ***NOTE***
    Randomizing data is only practical IF the values are NOT PK (Primary Key) values, or if they are, Cascade Updates has been set in relationships between a field to be randomized and any other table where the values exist as Foreign Keys. If related foreign values are autonumber PK values, then there should be no issue, otherwise randomizing one side of a relationship and not the other would be pointless.
    - "doesn't work" is no help. Post err msgs and where.
    - Use code tags for code/sql. Implement changes in copies of your database.

  2. #2
    orange's Avatar
    orange is online now Moderator
    Windows 10 Access 2010 32bit
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    14,305
    Micron,
    An interesting approach, but will the OP, who is hesitant to provide the database, accept a function and modify his/her table field values.
    Another option is to direct the OP to a Test Data Generator ( I use this one from time to time) and get them to populate a copy of their table(s) accordingly.
    I guess my concern is that many posters who feel their data is confidential/priceless/must remain secret don't believe we are not here to "steal" such data. In fact, when asked to provide their structures (jpg of tables and relationships) even without the actual tables and data are very hesitant and say things like company policy prevents me from showing anything. They can't see the value in giving a mock up of their situation with anonymized names (people, places, things...). Bottom line is would such posters actually use such a facility?

    I often ask for sample data --Porky Pig, Paige Turner, General Purpose, Central City and anonymized names etc.
    Rarely do I see a poster provide such data back.

    I have some functions for RandomNumbers, RandomAlphas and RandomDates but only used to assist a poster or test a routine.

    How practical would this be? An interesting project for sure.

  3. #3
    Micron is online now Virtually Inert Person
    Windows 10 Access 2016
    Join Date
    Jun 2014
    Location
    Ontario, Canada
    Posts
    6,319
    An interesting approach, but will the OP, who is hesitant to provide the database, accept a function and modify his/her table field values.
    Regarding that and the rest of your questions - I don't know and/or remains to be seen. As noted, running this on a copy of a db (don't think I could have made that clearer to them, correct?) should have no impact on their data. As for Porky, etc. that is not much different that what I provided except that they don't have to update all Albert to Donald, all Samuel to Mickey... and so on, one by one. Just call the function on any/every text field and randomize the value partially or completely. I have a situation where this is required for a business case so that I don't see sensitive data and figured that I might as well share it. If no takers, then no problem, but I know it will be useful for my situation.
    Rarely do I see a poster provide such data back.
    Maybe if we point them to this function they will. I guess I won't know because it's not downloadable.

    P.S. I bookmarked your link in the hopes it will prove useful some day. Thanks.
    Last edited by Micron; 07-04-2019 at 07:15 PM. Reason: added comment
    - "doesn't work" is no help. Post err msgs and where.
    - Use code tags for code/sql. Implement changes in copies of your database.

  4. #4
    orange's Avatar
    orange is online now Moderator
    Windows 10 Access 2010 32bit
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    14,305

  5. #5
    Uncle Gizmo is offline Novice
    Windows 10 Access 2013 64bit
    Join Date
    Oct 2016
    Posts
    23
    Several companies gave me their confidential data without a thought. Only one got me to sign a non-disclosure agreement, but only after I pointed it out to them. They were a private clinic helping people with conditions including drug addiction and nymphomania amongst others.

    Sent from my Pixel 3a using Tapatalk

  6. #6
    orange's Avatar
    orange is online now Moderator
    Windows 10 Access 2010 32bit
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    14,305

  7. #7
    Micron is online now Virtually Inert Person
    Windows 10 Access 2016
    Join Date
    Jun 2014
    Location
    Ontario, Canada
    Posts
    6,319
    Orange - not sure what you mean. As you can see, no one else commented on the usefulness (or not). I expect only time will tell, and that it will be driven by how many times anyone directs a poster here when they have concerns about data privacy. Even then, I'm not sure I'll know as there's nothing to download (which at best would provide a counter of downloads but not be an indicator of use). Perhaps if the community implores users to leave a comment after using (and they do) I will have something to go on.

    I appreciate your feedback.

  8. #8
    orange's Avatar
    orange is online now Moderator
    Windows 10 Access 2010 32bit
    Join Date
    Sep 2009
    Location
    Ottawa, Ontario, Canada; West Palm Beach FL
    Posts
    14,305
    My understanding is that it is something you need for some current/impending project, and you were going to build something. I think it's a worth while approach to anonymize tables in an existing database, rather than getting a poster to drain all tables, then fill same with data from a test data generator. If you're building something along the lines you posted, then I'm interested in testing once the logic of PK and FK and/or unique indexes or whatever you are contemplating is determined. I think it would be a very useful exercise just from a learning perspective.

    I did test the function against a variety of tables and randomized text and memo type fields and all seemed fine. As with most of my tables and modules it's a lot of independent things created to assist posters, so not a "real data base " as such.

    Anyway, if you have something you want to share regarding testing or additional functionality or PK/FK/unique indexes or other "constraints", I am interested.
    Good luck.

  9. #9
    Micron is online now Virtually Inert Person
    Windows 10 Access 2016
    Join Date
    Jun 2014
    Location
    Ontario, Canada
    Posts
    6,319
    OK, I see. Yes it's for an impending job that contains data that I'm not allowed to see. My plan is to just give the code with instructions because they know enough to be able to stick it in a standard module and use it in an update query against a copy of the db. What they can't do on their own is fix certain issues they are having, but I need related tables with data to have any hope of solving their issue. I know that the tables in question have relationships - I just need to remember to ensure referential integrity is being employed.

    Nice to hear that it worked for you as much as you were able to test it. I'm not sure I see the need to put the function in a db though. As you have stated, it's usable as is - just make sure it's correctly used! Putting it in a db would make it harder for the poster to get at, I think.

    if you have something you want to share regarding testing or additional functionality or PK/FK/unique indexes
    Interesting comment. Wondering what the likelihood is that it would ever generate 2 random but exact modifications across thousands of records? Checking the table for the randomized value first might slow things down a lot, especially if DLookup was used. Hmmm, if it was a paid solution, I certainly would have to cover that off?

Please reply to this thread with any new information or opinions.

Similar Threads

  1. Posted new post but now I cannot find it...
    By shephardfamilyenterprise in forum General Chat
    Replies: 15
    Last Post: 12-18-2018, 01:53 PM
  2. randomizing statments on report
    By BatmanMR287 in forum Reports
    Replies: 2
    Last Post: 07-12-2015, 01:34 PM
  3. Replies: 0
    Last Post: 01-15-2015, 11:03 AM
  4. Randomizing based on criteria
    By webisti in forum Access
    Replies: 8
    Last Post: 12-18-2013, 11:29 AM
  5. Replies: 1
    Last Post: 01-19-2012, 03:31 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Tech Forums: Microsoft Office Forums