This may be a long post so bear with me.
I am parsing a file that, at times, can be quite large and there is one section of my code that really takes a lot of time to run and I'm just not seeing a way to make it any more efficient so if anyone wants to take a crack at how I can do this more efficiently please feel free to post ideas.
1. The external data is what amounts to insurance eligibility information, and the local dataset must be updated to reflect the external data (text file) eligibility information
2. Some of the eligibility information may have overlapping dates (see below)
3. I must be able to detect voids in coverage as well as covered days (these are paid differently than covered days, see below)
4. The eligibility period is typically 1 year
This is an example of my starting data:
ID SpanStart SpanEnd Note 169103 4/10/2016 7/31/2016 Example of a client with a void in coverage 169103 9/1/2016 4/30/2017 169316 4/10/2016 7/31/2016 Example of a client with overlapping coverage 169316 7/1/2016 4/30/2017
This is an example of where I need to end up (this would then be compared against our local data to find conflicts, that part is relatively quick so I am not concerned with it at this point)
ID Ind SD ED 169103 -1 4/10/2016 7/31/2016 169103 0 8/1/2016 8/31/2016 169103 -1 9/1/2016 4/10/2017 169316 -1 4/10/2016 4/11/2016
This is how I'm currently approaching this:
1. From my initial dataset, determine the start and ending eligibility range. (basically the min value of SpanStart and the max value of SpanEnd in the table above)
2. Populate a temporary table with the discrete dates within the eligibility range.
3. Update the temporary table using the eligibility information
4. Update the temporary table with a sequential episode number (this is strictly to facilitate the final step)
PeriodDate Ind Episode 4/10/2016 -1 1 ... 7/31/2016 -1 1 8/1/2016 0 2 ... 8/31/2016 0 2 9/1/2016 -1 3 ... 4/10/2017 -1 3
5. Create an aggregate append query to my 'final' calculated span table and append the results to my table holding the 'correct' spans as indicated in the eligibility file
ID Ind SD ED 169103 -1 4/10/2016 7/31/2016 169103 0 8/1/2016 8/31/2016 169103 -1 9/1/2016 4/10/2017 169316 -1 4/10/2016 4/11/2016
The part that's chewing up the most time is step 4. I just don't see a good way to make this faster because I need to be able to detect voids in coverage and be able to compare those to existing data.
I'm working on an example database but it may take a while to get a clean version to do just the section I'm trying to economize, but until then, if you have a suggestion for this type of operation I am all ears.