Attachment 52705

I would welcome any comments and suggestions on my DB tables and schema. Let me tell you what I am trying to accomplish. The DB stores information about an evidence file in PDF form. The file is a combination of structured, semi-structured and unstructured data. I am first addressing extraction of the structured and semi-structured data.

A key strategy is to direct the search based on the type of document named in the ParentBookmark text. For example, if I have a DDE type doc, I know that I need facts A to G. This information is stored in an ontology, and the ontology has specific instructions for how to find a given text C. I have found that if the PDF is converted to text, I can reliably know that text C always follows the text "this is the time" and stops at the text "for action".

If I look at the data I want to extract, I can put all of it in one of three buckets: Binary, FreeText and TextLimited. So "Can the claimant do unskilled work" is a yes or no answer. A fact like "Stair Climbing Ability" takes one of Unlimited, Never, Occasionally, Frequently. And we can have free text answers with lots of text. Also, some CaseFacts can precipitate lots of other needed information. A case fact of "past job", for example, needs lots of detail for that job.

So I have organized things as attached. The key table is CaseFacts, but it handles only "one off" type questions, for example "UnskilledWorkAbility" with its boolean answer. I have subclassed for those types of inquiries that trigger a lot more needed information, like TreatingSources, PRW and OtherWork. ChatGPT has also suggested that I subclass based on the three types (Boolean, FixedList and FreeText), because for every record in CaseFacts two of the three fields FactValueBoolean, FactValueFreeText and FactValueLimitedList will always be null. But I am not really sure how to do that, or whether it would create unneeded complication.
I guess we could have tables PRW, TreatingSources and OtherWork as subtables if I have a table CaseFact_FreeText?
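To make the question concrete, here is a rough sketch of one alternative to splitting CaseFacts into three subtables: keep a single table, add a type column, and let a CHECK constraint guarantee that exactly the matching value field is filled. This is only an illustration using SQLite from Python, not my actual schema. The columns FactName and FactType, and everything in the PRW table other than its name, are assumptions I made up for the example.

```python
import sqlite3

# Hypothetical schema: FactName, FactType and the PRW columns are
# illustrative assumptions, not the columns in the attached diagram.
SCHEMA = """
CREATE TABLE CaseFacts (
    CaseFactID           INTEGER PRIMARY KEY,
    FactName             TEXT NOT NULL,
    FactType             TEXT NOT NULL
        CHECK (FactType IN ('Boolean', 'LimitedList', 'FreeText')),
    FactValueBoolean     INTEGER CHECK (FactValueBoolean IN (0, 1)),
    FactValueLimitedList TEXT,
    FactValueFreeText    TEXT,
    -- Exactly one value column may be filled, and it must match FactType.
    CHECK (
        (FactType = 'Boolean' AND FactValueBoolean IS NOT NULL
            AND FactValueLimitedList IS NULL AND FactValueFreeText IS NULL)
     OR (FactType = 'LimitedList' AND FactValueLimitedList IS NOT NULL
            AND FactValueBoolean IS NULL AND FactValueFreeText IS NULL)
     OR (FactType = 'FreeText' AND FactValueFreeText IS NOT NULL
            AND FactValueBoolean IS NULL AND FactValueLimitedList IS NULL)
    )
);

-- A detail table hangs off the fact that triggered it.
CREATE TABLE PRW (
    PRWID      INTEGER PRIMARY KEY,
    CaseFactID INTEGER NOT NULL REFERENCES CaseFacts(CaseFactID),
    JobTitle   TEXT NOT NULL
);
"""

VALUE_COLUMNS = {
    "Boolean": "FactValueBoolean",
    "LimitedList": "FactValueLimitedList",
    "FreeText": "FactValueFreeText",
}


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_fact(conn, name, fact_type, value):
    """Insert one fact, writing the value into the column for its type."""
    column = VALUE_COLUMNS[fact_type]  # fixed whitelist, safe to interpolate
    cur = conn.execute(
        f"INSERT INTO CaseFacts (FactName, FactType, {column}) VALUES (?, ?, ?)",
        (name, fact_type, value),
    )
    return cur.lastrowid
```

With this layout the two unused fields are still null on every row, but the database rejects a row whose filled field does not match its type, so the nulls cannot drift into inconsistent data. Whether that is simpler than three subtables is exactly what I am unsure about.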
So what I want the workflow for my app to do is accept an upload of a PDF, then create the initial tables, including the PDFBookmark and PDFParentBookmark tables that give me the structure. Next I process the pages based on the identity of the ParentBookmark. That is, I have an ontology that has rules for what data values to extract, and how to get them, for any given document. With these extracted data points we create a record in CaseFacts, etc.
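As a minimal sketch of the rule-driven step, assuming the PDF page has already been converted to text: the ontology is reduced here to a plain dictionary keyed by document type, and each rule names a fact plus the text that precedes and follows it. The dictionary layout and the function names are hypothetical, just to show the idea; only the DDE type, fact C and the two marker strings come from my description above.

```python
import re

# Hypothetical stand-in for the ontology: document type -> extraction rules.
ONTOLOGY = {
    "DDE": [
        {"fact": "C", "start": "this is the time", "end": "for action"},
    ],
}


def extract_between(text, start, end):
    """Return the text between the first `start` marker and the next `end`
    marker, or None if the pair is not found."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def extract_facts(doc_type, page_text):
    """Apply every rule registered for this document type to the page text."""
    facts = {}
    for rule in ONTOLOGY.get(doc_type, []):
        value = extract_between(page_text, rule["start"], rule["end"])
        if value is not None:
            facts[rule["fact"]] = value
    return facts
```

Each key/value pair returned by `extract_facts` would then become one record in CaseFacts. A document type with no rules simply yields an empty result, so unknown bookmarks are skipped rather than causing an error.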