I like to break down new cases by age group to spot trends. For example, in mid- to late June, cases among 20-somethings started to skyrocket. But cases also rose substantially for kids and other age groups.
But I was still shocked to see the current new case data for the 10-19 age group.
Since the start of September, cases for this age group have risen substantially. Because this age group goes up to age 19, I wonder whether the rise is driven by college students who have returned to campus. We know that SDSU has had several hundred new cases since late August (although these were driven by behavior outside of classes, since most classes resumed online).
To see new cases for all age groups since April 1, 2020, see http://cv19.shulok.com/2020/09/09/case-rate/.
In late August, California announced a new statewide tier system called Blueprint for a Safer Economy to standardize how counties decide which businesses can reopen and when. Previously, we had been using a 14-day case rate with a goal of under 100 per 100k to get off the state’s monitoring list, which required the closure of certain indoor operations.
This binary system (on or off the state monitoring list) has been replaced with 4 tiers based on the average daily case rate (per 100k) over a 7-day period and the testing positivity rate. Exceeding either threshold places a county in the more restrictive tier. Counties can only move to a less restrictive tier one step at a time and must wait 3 weeks before moving to a new tier, even if their case rate and testing positivity rate would qualify them for a less restrictive tier.
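Here is a minimal Python sketch of the tier logic described above. This post doesn't list the threshold values. The ones in the code are my assumption, based on the original Blueprint:

- purple: above 7 cases per 100k or above 8% positivity
- red: 4-7 or 5-8%
- orange: 1-3.9 or 2-4.9%
- yellow: below 1 or below 2%

Check them against the CDPH page before relying on them. All function names are hypothetical.

```python
# Hypothetical sketch of Blueprint tier logic.
# ASSUMPTION: thresholds are the original (August 2020) Blueprint values as
# recalled here; verify them on the CDPH page.
# Tiers: 1 = purple (widespread), 2 = red, 3 = orange, 4 = yellow.


def tier_for_case_rate(rate_per_100k: float) -> int:
    """Tier implied by the 7-day average daily case rate per 100k."""
    if rate_per_100k > 7:
        return 1
    if rate_per_100k >= 4:
        return 2
    if rate_per_100k >= 1:
        return 3
    return 4


def tier_for_positivity(percent: float) -> int:
    """Tier implied by the testing positivity rate (in percent)."""
    if percent > 8:
        return 1
    if percent >= 5:
        return 2
    if percent >= 2:
        return 3
    return 4


def metric_tier(rate_per_100k: float, percent: float) -> int:
    # Exceeding either threshold puts the county in the more restrictive tier,
    # i.e. the lower tier number of the two.
    return min(tier_for_case_rate(rate_per_100k), tier_for_positivity(percent))


def next_tier(current: int, target: int, weeks_in_current: int) -> int:
    """Loosen by at most one tier, and only after 3 weeks in the current tier.

    Simplification: moves to a more restrictive tier are applied immediately
    here; the state's actual rules for that direction are not modeled.
    """
    if target <= current:
        return target
    if weeks_in_current >= 3:
        return current + 1
    return current
```

For example, `metric_tier(6.9, 3.0)` returns 2 (red), while `metric_tier(7.29, 3.0)` returns 1 (purple). The 3.0% positivity here is a made-up placeholder.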
You can find more details on these new guidelines on the California Department of Public Health website: https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID19CountyMonitoringOverview.aspx
Most counties in the state are in the most restrictive tier, which indicates widespread community transmission. The new metric for moving from Tier 1 (purple) to Tier 2 (red) is essentially the same as the previous one. A daily average case rate under 7 per 100k is comparable to under 100 per 100k over a 2-week period: 7 × 14 = 98 per 100k over 14 days. So when this new system was introduced, shortly after San Diego County came off the state monitoring list, we landed in the red tier. (Our positive test rates are well under the threshold for that tier.)
The state provides specific guidelines for businesses on whether and how they can operate, depending on which tier the county falls within.
How it works
Like the previous 14-day metric, the new 7-day daily average case rate is calculated using the date of illness onset data, not the test reporting date.
Detailed instructions on how the number is calculated are available on the metadata page for the relevant dataset. Here is the relevant excerpt:
To calculate San Diego County’s current case rate following the method described by the California Department of Public Health, do the following:
1. Download the data: https://www.arcgis.com/home/item.html?id=d8268d78e12346ceaf4b7df9f9b69d78
2. Filter for the most recent “update date.”
3. Ignore the seven most recent episode dates.
4. Sum the case counts excluding prison inmate cases of the seven most recent episode dates prior to that and divide by 7.
5. Divide by 3,370,418 (State of California Department of Finance 2020 Population Projection for San Diego County: http://www.dof.ca.gov/Forecasting/Demographics/Projections/).
6. Multiply by 100,000.
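Steps 3 through 6 can be sketched in Python as follows. This is a minimal sketch with two assumptions:

- You have already filtered the download to the most recent update date and built a mapping from each episode date to its case count excluding prison inmate cases. The dataset's column names aren't given here, so that step is left out.
- The episode dates are consecutive, with no gaps.

The function name is hypothetical.

```python
from datetime import date

# State of California Department of Finance 2020 projection for San Diego County
POPULATION = 3_370_418


def case_rate(cases_by_episode_date: dict[date, int]) -> float:
    """7-day average daily case rate per 100k, following steps 3-6.

    ASSUMPTION: keys are consecutive episode dates from the most recent
    update date; values exclude prison inmate cases.
    """
    dates = sorted(cases_by_episode_date)  # oldest to newest
    if len(dates) < 14:
        raise ValueError("need at least 14 episode dates")
    # Step 3: ignore the 7 most recent episode dates.
    # Step 4: sum the 7 episode dates before those and divide by 7.
    window = dates[-14:-7]
    daily_average = sum(cases_by_episode_date[d] for d in window) / 7
    # Steps 5 and 6: divide by the population, multiply by 100,000.
    return daily_average / POPULATION * 100_000
```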
When the county first announced this new metric, I set up my Tableau Public file to calculate and display this metric. The county only updates the value once per week, although I’m able to calculate and display it for each day.
When they posted the updated case rate yesterday, I noticed it was 6.9 per 100k, which just barely keeps us in Tier 2 (the red tier). My calculations showed we hadn’t been at 6.9 since August 30. That’s when I noticed that the instructions say “Sum the case counts excluding prison inmate cases.” Sure enough, when I looked at the dataset, I saw they had added a column for the total number of cases by illness onset date excluding prisoners. So I adjusted my calculations accordingly.
In Tableau, I created a calculated field that sums 7 days of cases, starting with the date 7 days ago and counting back.
Then I take that value, divide by 7, divide by 3,370,418, then multiply by 100,000, per the instructions from the county.
And then I chart it over time.
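Charting the rate over time amounts to repeating the same calculation with each day treated as the most recent episode date. Below is a minimal Python sketch of that idea. It is an illustration, not my actual Tableau calculation. It assumes a list of daily counts for consecutive episode dates, all taken from the latest update date. The function name is hypothetical.

```python
from typing import List, Optional

POPULATION = 3_370_418


def case_rate_series(counts: List[int]) -> List[Optional[float]]:
    """Case rate per 100k for each day, treating that day as the most recent.

    counts: cases (excluding prison inmate cases) for consecutive episode
    dates, oldest first, all from the latest update date.
    """
    rates: List[Optional[float]] = []
    for i in range(len(counts)):
        if i < 13:
            rates.append(None)  # not enough history to skip 7 and sum 7
            continue
        window = counts[i - 13:i - 6]  # days i-13 through i-7
        rates.append(sum(window) / 7 / POPULATION * 100_000)
    return rates
```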
You can see that I am getting 7.29 using the data through September 7, while the county reported 6.9.
To be fair, I am charting past dates based on the most recent Update Date. The county, by contrast, does not go back and recalculate previous case rates when the illness onset data is updated for past dates. The 7-day lag in calculating the case rate is designed to account for the delay in illness onset data reporting. However, the most recent case rate I am calculating, using the September 8 update date values, should match the county’s value for September 8.
I looked again at the state’s page on their tier framework and noticed that they use a case rate adjustment factor based on testing rates. Since San Diego County is reporting 133.2 tests per 100k people, and the state’s average testing rate is 217.9 per 100k, we have been assigned an adjustment factor of 1.155.
To explore the data county by county, the page linked above includes a link to an Excel spreadsheet tracking data for the tier framework, which you can also access here: https://www.cdph.ca.gov/Programs/CID/DCDC/CDPH%20Document%20Library/COVID-19/Blueprint_Data_Chart_090820.xlsx
Their spreadsheet shows a rate of 6.9 for San Diego, the same as the county reported. With the adjustment factor, it becomes 7.9. Even so, we are still listed in Tier 2.
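The adjustment itself is just a multiplication. This hypothetical helper only applies a factor that CDPH has already assigned. How the state derives the factor from testing rates isn't covered here, so it isn't modeled.

```python
def adjusted_case_rate(unadjusted_rate: float, adjustment_factor: float) -> float:
    """Apply a CDPH-assigned testing adjustment factor to a case rate per 100k."""
    return unadjusted_rate * adjustment_factor


# San Diego's reported rate and assigned factor, from the text above
print(round(adjusted_case_rate(6.9, 1.155), 2))
```

Multiplying the rounded 6.9 by 1.155 gives about 7.97, close to the 7.9 in the state’s spreadsheet. The small difference may come from rounding of the inputs, but I haven’t confirmed how CDPH rounds.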
So I’m still unclear where the county’s case rate of 6.9 per 100k comes from. If you see an error in my calculation, please let me know.
For my current charts of cases by illness onset and the case rate, see http://cv19.shulok.com/2020/09/09/case-rate/.
For more information on how the 7-day daily average case rate is generated, see https://www.arcgis.com/home/item.html?id=d8268d78e12346ceaf4b7df9f9b69d78.
I had posted previously about the 14-day Covid-19 case rate calculated by San Diego County and how I couldn’t get numbers anywhere close to theirs.
It turns out the discrepancy between the values I calculated and the county’s exists because the county isn’t using the number of positive cases reported each day in its calculation. It uses the date of illness onset.
And until recently, they only provided that data in a PDF chart that made it impossible to know the exact numbers they were using in their calculation. But last week, they added the underlying dataset to the SANDAG Open Data Portal.
I downloaded the dataset and created some visualizations from the data.
For reference, here is the chart of new cases by date reported:
And here is the chart of cases by date of illness onset:
You can see that the charts are quite different. In the chart of cases by illness onset, it is clear that cases are declining. However, because of delays in test results, positive cases are reported days (sometimes longer) after the tests were taken. As a result, the chart of cases by reported date does not show the decline in new cases.
The county provides a good explanation of the cases-by-illness-onset dataset and of how they calculate the 14-day case rate. Basically, they try to determine the date that someone first developed symptoms and assign the positive test result to that date, updating the data as they get new information.
For example, on one day, a person’s episode date may be the report date; the next day, when additional records are received, the specimen collection date; the next day, once an interview has been completed, the illness onset date. Because of this and because of the time it takes for people to become ill, be tested, get results, be reported, and be interviewed, the number of cases on any given episode date will shift over time, particularly for the most recent two weeks.
(Metadata for COVID19_Cases_by_Episode_Date dataset)
The county uses a 3-day lag in calculating the 14-day case rate because it takes time to collect the information needed to set each positive case’s onset date as accurately as possible. But the case numbers for a specific “episode date” may change even more than 3 days after the fact. And because the incubation period can be up to 14 days, the 14-day case rate is still only an approximation of what is currently happening in the county.
Using the county data, I have charted the 14-day case rate and the number of cases by illness onset. To stay off the state’s watchlist, the goal is a rate under 100 per 100,000. (Governor Newsom just announced at his briefing today that they expect San Diego County to come off the list tomorrow.) You can see how the 14-day case rate smooths out the illness onset data, which can vary from day to day.
I was watching the San Diego County media briefing yesterday (video) and when Dr. Wooten showed the latest slide related to the county’s triggers (full slide deck), I was surprised to see that our 14-day case rate was 105.7 (per 100,000). This is the metric that the state used to place the county on the County Monitoring List. We need to get that under 100 per 100,000 to get off the list, which will allow businesses and schools to reopen.
In late July, that 14-day case rate was in the 150s. And Dr. Wooten said at several media briefings that in order for San Diego County to get off the monitoring list (i.e. to get the number under 100 per 100k), we would need to have fewer than 240 new cases per day for 14 days.
But we haven’t had fewer than 240 cases since June 22. Our daily case numbers have been trending down, but for the last 2 weeks they have generally bounced between the high 200s and low 600s. So it didn’t make sense that we were already down to 105.7. Still, I wasn’t sure how it was calculated, so I filed that thought away for the rest of the briefing.
At the end of the county’s briefing, Tarryn Mento from KPBS asked a question directly about this calculation and pointed out that it didn’t match her own calculations or what the state had posted for San Diego County on their website (her coverage of the briefing is here, but only briefly mentions the glaring discrepancy in case rates).
Dr. Wooten explained how they calculate the number. She said (and I’ll include a transcript from the video at the end of this blog post) that they use a 3 day lag time, meaning they add up the number of cases for the 14-day period starting 3 days ago. Then they divide that by the population and multiply by 100,000. She said they use 3.37M for the calculation. (The state lists the population of San Diego County as 3,370,418.)
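Here is a minimal Python sketch of the calculation as Dr. Wooten described it. I’m assuming “go back 3 days” means skipping the 3 most recent days of data (with the last entry being today) and summing the 14 days before that. The function name and the `lag_days` parameter are my own.

```python
from typing import List

POPULATION = 3_370_418  # state's population figure for San Diego County


def fourteen_day_case_rate(daily_cases: List[int], lag_days: int = 3) -> float:
    """14-day case rate per 100k.

    daily_cases: counts for consecutive days, oldest first; last entry is today.
    lag_days: how many of the most recent days to skip before the 14-day window.
    """
    end = len(daily_cases) - lag_days
    if end < 14:
        raise ValueError("not enough days of data for this lag")
    window = daily_cases[end - 14:end]  # the 14 days ending lag_days ago
    return sum(window) / POPULATION * 100_000
```

Setting `lag_days=0` gives the simpler version that just sums the most recent 14 days.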
Curious, I went to the state’s website after Ms. Mento’s question. At that time, the state showed a 14-day case rate of about 177. That is quite a difference from 105.7.
To make sure I understood the calculation, I used the 240 number that Dr. Wooten has repeated as the number we’d need to be under for 14 days to get off the monitoring list.
240 cases × 14 days = 3,360 cases
3,360 cases ÷ 3,370,418 people in the county = 0.0009969
0.0009969 × 100,000 people = 99.69
So 240 cases per day for 14 days would put us just under 100 cases per 100,000 people. Ok, this makes sense so far.
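The same arithmetic, run in reverse, shows where the 240 figure comes from:

1. Divide the target of 100 per 100k by 100,000.
2. Multiply by the population to get the largest allowed 14-day total.
3. Divide by 14 to get the largest allowed daily average.

Here is a quick Python check using the population figure from above:

```python
POPULATION = 3_370_418
TARGET_RATE = 100  # cases per 100,000 people over 14 days

max_14_day_total = TARGET_RATE * POPULATION / 100_000  # about 3,370 cases
max_daily_average = max_14_day_total / 14              # about 240.7 cases per day

print(f"{max_14_day_total:.1f} cases over 14 days, {max_daily_average:.1f} per day")
```

The daily limit works out to about 240.7, so “fewer than 240 per day” is the rounded-down version.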
Today, I took the county’s COVID 19 stats from the open data portal, which at the time I downloaded it only went through August 3. I manually added the total reported new cases (348) for August 4. And I also noticed that for August 1, they listed 256 new cases, but the Percent Positive chart provided by the county and the state data (which I confirmed in the state’s dataset) show 306 cases reported that day. So I updated that in my downloaded dataset.
Then I calculated the 14-day case rate in my own spreadsheet, first by simply adding up the new cases from the most recent 14 days (i.e., without the 3-day lag). That gave me 5,757 cases, which, divided by the population and multiplied by 100,000, gave me 170.8 for August 4. (I used spreadsheet formulas so I could calculate the values back to July 6, when the county officially changed the first trigger on our Triggers dashboard to use the 14-day case rate.)
The screenshot below, which I took today (Aug 6, 2020), shows that my numbers match the state’s calculation exactly.
So what is going on? Dr. Wooten mentioned a 3-day lag, but since the last few days have been lower than before, I wouldn’t expect a lower case rate if I started 3 days ago. I calculated it anyway, and as expected, the rate is higher: 183.98 per 100,000 for August 4.
I also saw that my calculations for previous days were over 200. Yet I never saw a 14-day case rate on the Triggers Dashboard higher than 158.5.
Although the dashboard only displays the current metrics, the SANDAG Open Data Portal has a dataset of the COVID 19 Triggers, so I downloaded it and copied the values for Trigger 1 into my spreadsheet to compare. The Triggers Dashboard numbers are far off from the state’s values. They are even further off from what the calculation Dr. Wooten described at yesterday’s press briefing would produce. I’m at a complete loss to explain this discrepancy.
So I added two more columns to my spreadsheet. One shows what the 14-day case total would need to be to produce the county’s 14-day case rate. The other shows the difference between that number and the actual 14-day total (using the 3-day lag calculation). The numbers of cases are off by thousands.
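Both columns come from the rate formula run in reverse. Here is a sketch in Python; the function names are hypothetical.

```python
POPULATION = 3_370_418


def implied_14_day_total(rate_per_100k: float, population: int = POPULATION) -> float:
    """Number of cases over 14 days that would produce a given case rate."""
    return rate_per_100k * population / 100_000


def case_gap(actual_14_day_total: int, reported_rate: float,
             population: int = POPULATION) -> float:
    """How many more cases the actual total has than the reported rate implies."""
    return actual_14_day_total - implied_14_day_total(reported_rate, population)
```

For example, a reported rate of 105.7 implies a 14-day total of about 3,563 cases. Tarryn Mento mentioned roughly 5,700 cases at the briefing.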
I’ve posted my spreadsheet here.
I went to the Triggers Dashboard this evening, hoping it would be cleared up, and this is now what it shows:
I still don’t know where the 110.1 rate comes from and I don’t know how they arrived at the 112.4 case rate for the state.
The only possible clue I see is that it says in the trigger explanation “measured using the date of illness onset with a 3-day lag”.
Cases by illness onset may differ from the cases reported each day. Test results have been repeatedly delayed; nearly every column in the COVID 19 Percent Positive chart seems to have an asterisk to indicate this. The county does provide another chart showing Covid cases by date of illness onset. However, it is very difficult to determine the exact numbers for each day from it, and as far as I can see, these values are not provided on the county’s open data portal.
The county really needs to clear up this discrepancy. People are frustrated, both those who think the state is going too far in trying to slow the spread and those who think we still aren’t doing enough. Transparency is essential. Gavin Newsom also has not adequately addressed the issue of the missing test results from commercial labs in the state. Between that and this discrepancy, it is becoming difficult to have confidence that we are getting an accurate count of new cases in the county.
The following is the transcript from the relevant portion of the media briefing video.
Tarryn Mento, KPBS:
Dr. Wooten, I apologize, but I’m trying to understand a little bit more about how you get to the case rate of 105 that you are reporting today. Can you clarify the total amount of cases over the 14 day period that you’re looking at? Because if you look back, it’s about 5700 cases and dividing that by the population and then using the formula you used, I’m not getting 105.7.
Dr. Wilma Wooten:
So let me explain again. First of all, you have to go back 3 days. The 3 day lag. And then from that point, back 14 days, it’s the number of cases for each of those days. So you have to be very clear on the dates that you’re using. And you add that, divided by the population. And you have to be very clear on what population you’re using. The state is using 3.37 [million]. And then you multiply that by 100,000. And that’s the way it’s calculated.
Can you provide the total number of cases that you are including in that 14 day period? Because also the number you reported today and on the state’s website, for case rate, are also different. I checked it earlier today, it was 134.2 I believe that the state posted.
And the state’s going to be different. The Governor’s website is different. Again, it’s very clear on what days you use. So exactly what I just described to you. I don’t have the number of cases for each day for 14 days past the 3 day lag. We can certainly send it to you via email to show how we calculate that.
I guess then my question is if our calculation and the state’s is different, but the case rate matters according to the state, why are we following our calculation and not the state? Because it’s the state that will allow us to get off the monitoring list.
Well for the case rate, our case rate, and if you saw the case rate today, is that for today’s, with our 398 or whatever the number. It wouldn’t be with those numbers. It would be a 3 day lag. But again, I can send you the information and an explanation of why our numbers, uh. Our numbers are exactly how I have just described them. Again, I don’t know what dates the state has used.
I guess then, my final question is just going to be why are we not aligning with the state’s calculation if the state is setting the metric that allows us to get off the monitoring list.
We are aligning with the state’s, but it depends on what point in time. Again, I have not looked at the state’s website today. But again, we can show you and provide information as to how we’ve calculated the numbers and how it aligns with the state. We are using – and the state has a webpage where you go and calculate the numbers and it’s right there on the website – we are using the same calculations.
According to the CDC report, an overnight camp in Georgia was associated with an outbreak of SARS-CoV-2 (the name of the virus that causes COVID-19).
On the evening of June 22, 2020, a teenage staff member of the camp developed chills and left the camp on June 23. After a positive test result from the staff member was reported to the camp on June 24, the camp began sending campers home and the camp was closed June 27, which was the scheduled last day of camp.
Staff and trainees arrived at the camp June 17 and campers arrived June 21.
The camp adhered to all measures in Georgia’s Executive Order relating to overnight camps which were allowed to open after May 31, and many, but not all, of the CDC’s Suggestions for Youth and Summer Camps to reduce risk for SARS-CoV-2 transmission. This included:
- All trainees, staff and campers had to provide documentation of a negative SARS-CoV-2 test within 12 days prior to arriving at camp
- Cloth masks were required for staff
Policies not in line with CDC’s guidelines:
- No cloth masks for campers
- Windows and doors were not left open for increased ventilation.
Camp activities included daily singing and cheering and although campers were cohorted by cabin, it is unknown if individuals maintained physical distance from members outside their cohort.
The CDC looked at SARS-CoV-2 test results for Georgia residents who had been at the camp (27 campers were from out of state). They counted any positive tests reported to the Georgia Department of Public Health between when an attendee arrived at the camp and 14 days after they left.
Of 597 Georgia attendees, 260 (44%) tested positive for SARS-CoV-2, including 49% of campers.
Breakdown by age group:
- Ages 6-10: 51 out of 100 (51% positive)
- Ages 11-17: 180 out of 409 (44% positive)
- Ages 18-21: 27 out of 81 (33% positive)
- Ages 22-59: 2 out of 7 (29% positive)
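As a quick sanity check, the age-group counts add up to the overall figures, and each rate is simply positives divided by attendees. Here is a short Python snippet that recomputes them:

```python
# Positives and Georgia-resident attendees by age group, from the list above
age_groups = {
    "6-10": (51, 100),
    "11-17": (180, 409),
    "18-21": (27, 81),
    "22-59": (2, 7),
}

rates = {group: pos / n for group, (pos, n) in age_groups.items()}
total_pos = sum(pos for pos, _ in age_groups.values())
total_n = sum(n for _, n in age_groups.values())

for group, rate in rates.items():
    print(f"Ages {group}: {rate:.0%}")
print(f"All Georgia attendees: {total_pos}/{total_n} = {total_pos / total_n:.0%}")
```

The last line prints 260/597 = 44%, matching the overall figure above.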
For 136 of those positive cases, the CDC had symptom information. Of those, 36, or 26%, had no symptoms. The other 100 (74%) most commonly reported fever, headache and sore throat.
Also of note, trainees, who spent the fewest days at the camp, had a positivity rate of only 19%, while staff members, who were there the longest, had a positivity rate of 56%.
Because negative test results were not reliably reported and no results were available for some attendees, positivity rates were calculated by dividing the number of positive tests by the number of Georgia resident attendees. This likely resulted in an undercount of positive cases.
Since COVID-19 was highly prevalent in Georgia in June and July, we don’t know for certain that transmission happened at camp.
Case numbers in San Diego County are on the rise, but so are testing numbers.
Testing is useful in tracking the spread of Covid-19 in San Diego County, but case numbers alone give us an incomplete picture.
How do we know if Covid-19 is spreading faster, or if we are just detecting more cases?
It’s easy to understand that more testing will find more cases. At the start of the outbreaks in San Diego, with tests in short supply, testing was limited to people with certain risk factors, including travel to areas with widespread outbreaks or exposure to someone who had tested positive. As you would expect, a relatively high percentage of those early tests came back positive.
As tests became more widely available, the county tested more people. And more people with Covid-19 were identified. But as testing rates went up, the percentage of positive cases dropped.
Evidence suggests that many people who acquire Covid-19 have no symptoms, or symptoms so mild, they do not realize they have Covid-19. So maybe we are just catching those folks now that we are testing more people.
So if case numbers go up when testing numbers go up, can we really know if cases are rising more quickly?
YES! We can.
This is why the county keeps track of how many tests come back positive. The percentage of positive tests is a key indicator in how fast the virus is spreading.
There are 3 charts below. The top one shows the number of Covid-19 tests reported each day. The middle chart shows the number of new cases reported each day. Note that each chart’s axis uses a different scale, suited to the data it displays, which makes the trend in each set of data easier to see. The two lines on the bottom chart use the same scale.
See here for the latest charts: Positive Case Rate
On average, testing has been rising steadily since mid-April. But new cases generally hovered between 100 and 200 per day until mid-June.
On the bottom chart, the blue line plots the percentage of daily reported tests that were positive. The orange line plots the county’s calculated 14-day rolling average of the percentage of positive tests. The daily percent positive jumps up and down, so the rolling average, which smooths the results over a 2-week period, makes the general trend easier to see.
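Here is a minimal Python sketch of the two lines on the bottom chart. The daily percent positive is positives divided by tests. For the 14-day rolling average, I’m assuming a pooled ratio (total positives divided by total tests over the window). That may differ from how the county actually computes its line, for example if it averages the daily percentages. The function names are hypothetical.

```python
from typing import List, Optional


def percent_positive(positives: List[int], tests: List[int]) -> List[Optional[float]]:
    """Daily percent of reported tests that were positive (None if no tests)."""
    return [100 * p / t if t else None for p, t in zip(positives, tests)]


def rolling_percent_positive(positives: List[int], tests: List[int],
                             window: int = 14) -> List[Optional[float]]:
    """Rolling percent positive over `window` days.

    ASSUMPTION: pooled ratio (total positives / total tests in the window);
    the county may average the daily percentages instead.
    """
    out: List[Optional[float]] = []
    for i in range(len(tests)):
        if i + 1 < window:
            out.append(None)  # not enough days yet for a full window
            continue
        p = sum(positives[i + 1 - window:i + 1])
        t = sum(tests[i + 1 - window:i + 1])
        out.append(100 * p / t if t else None)
    return out
```

The pooled ratio gives busy testing days more weight than quiet ones. A plain average of daily percentages treats every day equally, so the two methods can diverge when daily test volumes vary a lot.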
As testing levels rose, the percentage of tests that came back positive came down. For a period in May and early June, the percent positive hovered around 2-3%. The percent positive stayed relatively stable even as positive cases increased, which implies that the rise in cases was a result of more testing.
However, since June, although testing levels remain higher, the percentage of tests that come back positive has been rising. The daily reported percent positive has not been below 3% since June 22. At first, that could have been just day-to-day variation in testing, which is why the orange line is important. The rolling average of the percentage of tests that come back positive is trending up, even when 2 weeks of testing results are taken into account.
Case numbers rise when testing levels rise. But when the percentage of tests that are positive is rising, we know that the virus is spreading faster than before.
In debates on reopening schools, I have seen the repeated claim that although cases are up, hospitalizations and deaths are down.
Hospitalizations and deaths are called lagging indicators because they do not rise, or fall, simultaneously with cases. People who require hospitalization may not be admitted to the hospital for days or weeks after developing symptoms. And deaths may come weeks after the onset of symptoms.
So when cases began to rise markedly starting mid-June, hospitalizations were not obviously trending upwards and deaths were at a low. Some thought that since the rise in cases was being driven by younger folks who have had lower hospitalization and death rates, we wouldn’t see much, if any, uptick in the number of hospitalizations and deaths.
But graphs of new cases, new hospitalizations and new deaths, viewed side by side, show new peaks in hospitalizations and deaths. Thankfully, these peaks are not proportional to the rise in the number of cases, likely because the majority of cases are among younger people.
But hospitalizations are rising for all age groups. The number of kids ages 0-9 that are hospitalized is only 28 as of July 29, but that is double the number from June 26. And in the last 10 days, people ages 20-29 have been averaging nearly 3 new hospitalizations per day.
We are also seeing in hot spots around the country that younger people are passing Covid-19 to older, more vulnerable populations in their households. We need to stay vigilant and practice public health precautions.
These links are available on other pages, but here is a quick list.
- Coronavirus in San Diego County
- COVID-19 Dashboard (mobile version)
- Triggers Dashboard (mobile version)
- Covid-19 datasets on the SANDAG Open Data Portal
Historical county data
- Summaries of County of San Diego COVID-19 Hospitalizations by Demographics
- Summaries of County of San Diego COVID-19 Deaths by Demographics
- Summaries of County of San Diego COVID-19 Cases by Race/Ethnicity