The following sections explain methodological details of the data collection, data cleaning and weighting for the study “Increased Digitalisation as a Result of the Coronavirus? Home office in Bavaria in February 2021”.
The Bavarian Research Institute for Digital Transformation (bidt) collected the primary data using Google Surveys. The third survey wave took place from 5 to 7 February 2021 in Bavaria and from 4 to 8 February 2021 throughout Germany, shortly after a new occupational health and safety regulation came into force. The questionnaires for Bavaria and Germany were identical; each included seven questions on working from home and three on socio-demographic characteristics (job position, age and gender).
Google Surveys maintains a network of websites on which selected visitors are presented with the questionnaire. The questions appear as a so-called survey wall: website visitors answer the questionnaire to gain access to additional content. The sample was already stratified during the field phase: as the survey progressed, the questionnaire was shown with higher probability to population groups that were underrepresented with regard to region, age and gender, and with lower probability to overrepresented groups. For more details, see Google (2018). The vast majority of the websites on which the questionnaire was displayed fall into the category “News”. The categories “Arts & Entertainment” and “Other” play only a minor role.
This type of sampling (river sampling) can be classified neither as pure random sampling nor as a pre-recruited online panel. Unlike for a purely random sample, no exact population can be defined, so the selection probability of an element of the sample cannot be determined. Nevertheless, comparisons of demographic characteristics, among others, show substantial similarities with internet surveys conducted elsewhere (cf. Pew Research Center 2012).
Originally, Google Surveys collected 1,526 complete responses in Bavaria and 2,500 in Germany. The surveys were conducted independently, so the data set collected across Germany also contains observations from Bavaria. Data cleaning, weighting and analysis were likewise carried out separately for the two data sets, partly because of the sampling procedure described above.
Online surveys are usually so-called self-administered surveys. In contrast to oral surveys conducted in person or by telephone, the interview situation is not subject to any control. More intensive data checking and cleaning is therefore necessary, e.g. to exclude answers from non-serious respondents and “quick fillers”. In the first step, respondents who had completed the online questionnaire in an extremely short time were identified. The lowest percentile of the response time was defined as the exclusion criterion in both surveys. It can be assumed that attentive reading and answering of the questions is hardly possible below this threshold. Analyses of unusual partial results among these “quick fillers” also indicate that this data cleaning procedure is appropriate.
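The speed-based exclusion can be illustrated with a minimal Python sketch. It assumes that the “lowest percentile” means the 1st percentile of completion times, that each response carries a hypothetical duration_s field, and that responses at or below the cutoff are dropped. None of these details are specified in the study, so this is an illustration rather than the procedure actually used.

```python
import statistics


def drop_quick_fillers(responses, time_key="duration_s", pct=1):
    """Keep only responses slower than the pct-th percentile of completion time.

    `time_key` and `pct` are hypothetical names chosen for this sketch.
    """
    times = [r[time_key] for r in responses]
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    cutoff = statistics.quantiles(times, n=100, method="inclusive")[pct - 1]
    return [r for r in responses if r[time_key] > cutoff]


# Illustrative data only: 200 responses with completion times of 1..200 seconds.
responses = [{"id": i, "duration_s": float(i)} for i in range(1, 201)]
kept = drop_quick_fillers(responses)
```

Applying the cutoff separately to each data set mirrors the separate treatment of the Bavarian and the all-German survey described above.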
In the second step, respondents who had given contradictory information on their occupation or home office use were identified. Such inconsistencies could not be ruled out in advance, as questionnaires in Google Surveys do not allow for more complex filtering. Corresponding question filters could therefore only be applied ex post to exclude inconsistent answers. For some respondents, several exclusion criteria applied simultaneously. In the end, 1,237 cases for Bavaria and 1,935 cases for Germany were included in the weighting described below.
In most social science samples, non-response is not randomly distributed, e.g. because some groups of people are easier or harder to reach with a certain type of survey. As a result, subpopulations are not represented in the sample in proportion to their share of the population. Despite the sample stratification during the field phase, systematic deviations also occur in the surveys presented here. To ensure that the observations nevertheless reflect the structure of adult Internet users in Bavaria and Germany, respectively, as closely as possible, weights were applied to the data sets. For the Germany-wide data set, a redressment weighting was carried out with regard to (1) the combined age and gender structure of the online population and (2) the regional distribution of the total population. The target structures were taken from the official statistics (Destatis 2020; Destatis 2021). For the Bavarian data set, a redressment weighting was carried out exclusively according to the combined age and gender structure of the online population in Bavaria.
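The basic idea of a redressment weighting can be sketched in Python as follows: each cell (for example an age-by-gender group) receives the ratio of its target share in the population to its observed share in the sample. The cell labels, shares and function name below are invented for illustration and are not taken from the study or from Destatis; the exact procedure used by the authors may differ.

```python
from collections import Counter


def redressment_weights(sample_cells, target_shares):
    """Return one weight per cell: target share divided by sample share.

    `sample_cells` lists the cell of every respondent; `target_shares`
    maps each cell to its share in the target population.
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: target_shares[cell] / (count / n) for cell, count in counts.items()}


# Hypothetical example: women are underrepresented in a sample of 10.
sample = ["f_18-39"] * 2 + ["m_18-39"] * 8
targets = {"f_18-39": 0.5, "m_18-39": 0.5}
weights = redressment_weights(sample, targets)

# After weighting, the weighted sample size equals the unweighted one.
weighted_n = sum(weights[cell] for cell in sample)
```

Underrepresented cells receive weights above 1 and overrepresented cells weights below 1, so that the weighted cell shares match the target structure.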
Grouped age and gender were, on the one hand, “estimated” by Google Surveys based on the browsing behaviour of the participants (cf. Google 2018) and, on the other hand, collected directly as part of the survey, so both pieces of information could be combined. The weighting was therefore primarily based on the self-report and, where this was missing, on the Google estimate, if available. When specifying gender in the self-report, the category “diverse” could be selected in addition to female and male. Since official data on internet use are currently only available for male and female persons, the presumed gender determined by Google was used for weighting in these cases. As a result, only a few cases remained in both data sets for which no age and gender information relevant to weighting was available.
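The fallback rule described here can be written as a short Python sketch. All field names (age_self, age_google, gender_self, gender_google) and category labels are hypothetical; the study does not document its variable names.

```python
def weighting_gender(self_report, google_estimate):
    """Use the self-report if it is female or male, otherwise the Google estimate."""
    if self_report in ("female", "male"):
        return self_report
    # Covers missing self-reports and the category "diverse"; may still be None.
    return google_estimate


def weighting_age(self_report, google_estimate):
    """Use the self-reported age group, falling back to the Google estimate."""
    return self_report if self_report is not None else google_estimate


def weighting_cell(resp):
    """Return the (age group, gender) cell, or None if either part is unknown."""
    age = weighting_age(resp.get("age_self"), resp.get("age_google"))
    gender = weighting_gender(resp.get("gender_self"), resp.get("gender_google"))
    return None if age is None or gender is None else (age, gender)


# Hypothetical respondent who selected "diverse" and has a Google gender estimate.
cell = weighting_cell({
    "age_self": "25-34", "age_google": None,
    "gender_self": "diverse", "gender_google": "male",
})
```

Respondents for whom the function returns None correspond to the few cases without weighting-relevant age and gender information.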
Google carried out the regional allocation using the IP address, mostly at the federal state level and, for some large cities, also at the municipality level. Due to the low number of cases in some cells, the federal states were combined into seven Nielsen areas for the regional weighting of the all-German data set. In very few cases, no regional information could be recorded.
All respondents to whom no values could be assigned for the weighting-relevant variables received a factor of 1.0 in the corresponding weighting step.
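A neutral factor of this kind can be sketched in Python as follows. The sketch assumes that the factors of the individual weighting steps are multiplied to obtain the final weight, which the text does not state explicitly; the field names and factor values are likewise invented for illustration.

```python
def step_factor(cell, factors):
    """Return the factor for one weighting step, or 1.0 if no cell is known."""
    if cell is None:
        return 1.0
    # Cells without a computed factor are also treated as neutral in this sketch.
    return factors.get(cell, 1.0)


def combined_weight(resp, age_gender_factors, region_factors):
    """Multiply the step factors (assumption: steps combine multiplicatively)."""
    return (step_factor(resp.get("age_gender"), age_gender_factors)
            * step_factor(resp.get("region"), region_factors))


# Hypothetical factors for one age-gender cell and one region.
age_gender_factors = {("18-39", "female"): 1.2}
region_factors = {"area_4": 0.9}

# A respondent without regional information keeps only the age-gender factor.
w_missing_region = combined_weight(
    {"age_gender": ("18-39", "female"), "region": None},
    age_gender_factors, region_factors,
)
```

A factor of 1.0 leaves the respondent's contribution unchanged in that step, so cases with missing information are neither up-weighted nor down-weighted.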
Basis for analysis
Only employed respondents were considered for the analyses described here. According to the self-report, the adjusted Bavarian data set contained 1,058 adult working Internet users and the adjusted all-German data set 1,564.