I had posted previously about the 14-day Covid-19 case rate calculated by San Diego County and how I couldn’t get numbers anywhere close to theirs.
It turns out the discrepancy between the values I calculated and the county’s comes from the fact that the county isn’t using the number of positive cases reported each day for their calculation. Instead, they count each case on its date of illness onset.
And until recently, they only provided that data in a PDF chart that made it impossible to know the exact numbers they were using in their calculation. But last week, they added the underlying dataset to the SANDAG Open Data Portal.
I downloaded the dataset and created some visualizations from the data.
For reference, here is the chart of new cases by date reported:
And here is the chart of cases by date of illness onset:
You can see that the charts are fairly different. In the chart of cases by illness onset, it is clear that cases are declining. However, because of delays in test results, positive cases are reported days, maybe longer, after the tests were taken, so the chart of cases by reported date does not yet show the decline in new cases.
The county provides a good explanation of the cases-by-illness-onset dataset and how they calculate the 14-day case rate. Basically, they try to determine the date that someone first developed symptoms and assign the positive test result to that date, updating the data as they get new information.
For example, on one day, a person’s episode date may be the report date; the next day, when additional records are received, the specimen collection date; the next day, once an interview has been completed, the illness onset date. Because of this and because of the time it takes for people to become ill, be tested, get results, be reported, and be interviewed, the number of cases on any given episode date will shift over time, particularly for the most recent two weeks.
Source: Metadata for COVID19_Cases_by_Episode_Date dataset
The delay in collecting the information for each positive case, so that the onset date can be updated as accurately as possible, is the reason the county uses a 3-day lag in calculating the 14-day case rate. But the case numbers for a specific “episode date” may keep changing more than 3 days after the fact. And because the incubation period can be up to 14 days, this 14-day case rate is still only an approximation of what is currently going on with the case rate in the county.
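To make the calculation concrete, here is a minimal Python sketch of how a lagged 14-day case rate per 100,000 could be computed from cases by episode date. This is my own reading of the description, not the county's actual code: the function name, the exact window boundaries (14 days ending 3 days before the calculation date), and the population figure you pass in are all assumptions, and the numbers in the example are made up.

```python
from datetime import date, timedelta


def case_rate_14_day(cases_by_episode_date, as_of, population,
                     lag_days=3, window_days=14):
    """Cases per 100,000 residents over a lagged 14-day window.

    Assumption: the window is the 14 days ending `lag_days` before `as_of`.
    The county's exact window definition may differ.
    """
    window_end = as_of - timedelta(days=lag_days)
    total = sum(
        # Dates missing from the data are counted as zero cases.
        cases_by_episode_date.get(window_end - timedelta(days=offset), 0)
        for offset in range(window_days)
    )
    return total * 100_000 / population


# Made-up example: 100 cases on every episode date, population of 1,000,000.
as_of = date(2020, 8, 17)
example_cases = {as_of - timedelta(days=n): 100 for n in range(30)}
print(case_rate_14_day(example_cases, as_of, population=1_000_000))
```

Because the window stops 3 days before the calculation date, the most recent (and least complete) episode dates never enter the sum, which is the point of the lag.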
Using the county data, I have charted the 14-day case rate and the number of cases by illness onset. To stay off the state’s watchlist, the county needs a 14-day case rate under 100 per 100,000. (Governor Newsom just announced at his briefing today that they expect San Diego County to come off the list tomorrow.) You can see how the 14-day case rate basically smooths out the cases by date of illness onset, which can vary quite a bit from day to day.