Natalie E. Dean, PhD, Assistant Professor of Biostatistics at @UF, specializing in emerging infectious diseases and vaccine study design. Apr. 19, 2020

1. It's easier to poke holes in a study than to run a study yourself. We should expect many more SARS-CoV-2 serosurveys in our future. So in the spirit of promoting good science, here are my thoughts on best practices for the design of serosurveys.

2. The following is informed by WHO guidance on seroepidemiology, discussions with colleagues, observation of what is out there so far, and personal experience with dengue serosurveys. I will focus on broad scientific aims, populations, and recruitment. 

3. First, it is critical to remember that serosurveys are population-level surveys. They are intended to inform our broader understanding of the disease, not to tell individuals whether or not they have been infected. The test is still too unreliable for the latter.

4. Primary scientific aims:
Determine the proportion of the population that has been infected. This can be used to address questions about "herd immunity." It can also be used to determine the proportion of infections detected (is it 1 out of 50 or 1 out of 10?).

5. The above is also critical for inferring the infection fatality ratio (IFR). Serosurveys let us estimate the full denominator, rather than just PCR-confirmed cases. Best estimates of IFR seem to be hovering around 0.5% or 0.6%, although more data are emerging.
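
To make the arithmetic concrete, here is a minimal sketch of how a seroprevalence estimate feeds into the detection fraction and the IFR. Every number below is hypothetical and chosen only for illustration, and the function names are my own. The sketch ignores test sensitivity/specificity and the delay between infection and death.

```python
def estimate_infections(population, seroprevalence):
    """Estimated number of people ever infected, scaled up from the survey."""
    return population * seroprevalence

def detection_fraction(confirmed_cases, infections):
    """Share of all infections that were picked up as confirmed cases."""
    return confirmed_cases / infections

def infection_fatality_ratio(deaths, infections):
    """Deaths divided by ALL infections, not just confirmed cases."""
    return deaths / infections

# Hypothetical inputs (not from any real survey).
population = 1_000_000
seroprevalence = 0.05      # 5% of the surveyed sample tested positive
confirmed_cases = 5_000    # PCR-confirmed cases in the same population
deaths = 250

infections = estimate_infections(population, seroprevalence)
detected = detection_fraction(confirmed_cases, infections)
ifr = infection_fatality_ratio(deaths, infections)

print(f"Estimated infections: {infections:,.0f}")
print(f"Detected: 1 out of {1 / detected:.0f} infections")
print(f"IFR: {ifr:.2%}")
```

With these made-up inputs, the case fatality ratio based on confirmed cases alone would be 250 / 5,000 = 5%, ten times the IFR. That gap is why the full denominator matters.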

6. Another major scientific goal is to determine age-specific infection probabilities and, thus, age-specific IFR. This can be achieved if there is good coverage across age groups. Finally, we are interested in the fraction of infections that are asymptomatic.
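
As a rough illustration of why age coverage matters, the sketch below computes seroprevalence and IFR separately per age group. The age bands, counts, and field names are all hypothetical. It also assumes each group's survey sample is representative of that group.

```python
# Hypothetical survey and surveillance numbers by age group.
age_groups = {
    "0-19":  {"population": 250_000, "tested": 500,  "positive": 20, "deaths": 0},
    "20-59": {"population": 550_000, "tested": 1000, "positive": 60, "deaths": 33},
    "60+":   {"population": 200_000, "tested": 400,  "positive": 16, "deaths": 160},
}

def age_specific_estimates(group):
    """Return (seroprevalence, IFR) for a single age group."""
    seroprevalence = group["positive"] / group["tested"]
    infections = group["population"] * seroprevalence
    return seroprevalence, group["deaths"] / infections

results = {label: age_specific_estimates(g) for label, g in age_groups.items()}

for label, (prev, ifr) in results.items():
    print(f"{label}: seroprevalence {prev:.1%}, IFR {ifr:.2%}")
```

If one age group had very few people tested, its seroprevalence (and therefore its IFR) would be too noisy to interpret, which is the practical reason to aim for good coverage across ages.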

7. When targeting a geographic region, it may make logistical sense to select a few smaller sub-areas to study. For example, we might select 4-5 areas ranging from most hard-hit (assessed by cases and deaths) to least hard-hit, to establish a range.

8. The best designs are household-based with random selection of households. All persons in the household are invited to participate to get a broad range of ages. Including entire households also allows us to assess transmissibility within households.
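
A minimal sketch of the selection step, assuming you already have a sampling frame listing every household in the study area. The household IDs, ages, and function names here are hypothetical. Real studies often use multi-stage or stratified sampling rather than a single simple random draw.

```python
import random

# Hypothetical sampling frame: household ID -> ages of the people living there.
households = {
    "H001": [34, 36, 4],
    "H002": [71],
    "H003": [45, 44, 16, 12],
    "H004": [29, 31],
    "H005": [58, 60, 25],
    "H006": [83, 80],
}

def select_households(frame, n, seed=None):
    """Draw n households uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(frame), n)

def invite_everyone(frame, selected):
    """Invite every member of each selected household, whatever their age."""
    return [(hh, age) for hh in selected for age in frame[hh]]

selected = select_households(households, n=3, seed=2020)
invited = invite_everyone(households, selected)

print("Selected households:", selected)
print("Invited (household, age):", invited)
```

Randomizing at the household level and then inviting all members is what brings in children and older adults who would rarely sign up on their own, and it keeps household members linked for within-household transmission analysis.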

9. Household designs are less prone to bias, though not immune, since not everyone will consent to participate or be home. For example, essential workers may be harder to recruit than those who are sheltering in place. These studies also take more time to set up and run.

10. In Miami, they are using random digit dialing to recruit a representative sample of participants. Participants then agree to visit a drive-thru testing site. Bias can still occur if people do not consent or do not have a car, but it's a creative approach.

11. Volunteer surveys are a type of convenience sample. In Santa Clara, they used targeted Facebook ads to recruit participants to visit drive-thru test sites. Quotas were established per ZIP code to limit over-representation.

12. NIH's survey in Bethesda is also volunteer-based. Questionnaire data are collected over the phone. Participants who are NIH employees are tested on site, and others are provided with a kit for home-based blood draw.

13. An obvious concern with volunteer surveys is that people will preferentially enroll because they think they had COVID and want confirmation. Notably, NIH lists prior COVID or current symptoms as an exclusion criterion. But WHO guidance advises against excluding known cases.

14. For volunteer surveys, consent bias is less of a concern if you can achieve high coverage. In San Miguel County, CO, they have processed 2500 tests for roughly 8000 residents. 

15. Another convenience sample is blood donors. Very convenient, though you will still need to collect questionnaire data. Also, people are probably less likely to donate when they are ill. But WHO notes that blood donors are a very eager, easy-to-follow population.

16. So quick-and-dirty versus slow-and-rigorous? I think both have value. The rapid though potentially biased volunteer surveys establish an order of magnitude for seroprevalence (1%, 10%, 50%?) while we wait for more reliable household data to emerge.

17. Studies collect questionnaire data. In addition to basic demographic data, occupation, travel history, known exposure to a case, and history of clinical symptoms in the X months since transmission started are all important.

18. We can use these data to identify risk factors for infection by comparing exposures of infected and non-infected individuals. A nested case-control study could also be conducted to evaluate risk factors.
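
One simple way to compare exposures between infected and non-infected participants is an odds ratio from a 2x2 table. This is only a sketch with hypothetical counts, and the function name is my own. A real analysis would add confidence intervals and adjust for confounders such as age and occupation.

```python
def odds_ratio(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg):
    """Odds of seropositivity among the exposed divided by the odds among the unexposed."""
    return (exposed_pos * unexposed_neg) / (exposed_neg * unexposed_pos)

# Hypothetical 2x2 table; the exposure is "known contact with a case".
#                 seropositive   seronegative
# exposed              40            100
# unexposed            60            800
estimate = odds_ratio(exposed_pos=40, exposed_neg=100,
                      unexposed_pos=60, unexposed_neg=800)

print(f"Odds ratio: {estimate:.2f}")
```

An odds ratio above 1 suggests the exposure is associated with higher odds of prior infection, and a value near 1 suggests no association in these data.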

19. Finally, surveys are often one-time cross-sectional studies. But in Miami, they are recruiting 750 new participants each week (repeated cross-sectional studies). Longitudinal studies, where the same people are sampled every 3+ weeks, are especially valuable.

20. These are some of the key design features for serosurveys. I have not gone into sensitivity/specificity of the test or data analysis, as the thread is long. But I welcome any suggestions of other considerations, and I will append my favorites to the end of this thread. END.



