Methodology of the Home Office Studies

The following sections describe methodological details on data collection, necessary data cleaning and weighting of the study series “Prevalence and acceptance of home office in Germany”.

The survey data was collected in eight waves in March and June 2020, February, May and September/October 2021, and March, June and October 2022. An overview of all eight waves can be found here.

Survey wave 9

The ninth survey wave on the topic of working from home by the Bavarian Research Institute for Digital Transformation (bidt) was conducted by the market research institute DCore between September 12 and 22, 2023. As part of this study, DCore surveyed German employed internet users aged 18 and over online within the “opt-in” Talk Online Panel. The data collection took into account population-representative quotas for gender, age, and federal state. After cleaning the data set of “quick responders” and respondents with inconsistent or unrealistic response behavior, 994 observations remained in the data set. This sample was then weighted for employed internet users according to gender, age, and federal state (b4p 2022).

Compared to the eight previous waves, there was a fundamental change in data collection in the ninth survey. The previous surveys were conducted using Google Surveys, but this service was discontinued in November 2022.

Survey waves 1-8

Data collection

The primary data used here were collected in eight cross-sectional surveys by the Bavarian Research Institute for Digital Transformation (bidt) using Google Surveys. The first wave took place at the beginning of the first lockdown, from 27 to 29 March 2020. The second wave followed a longer phase of gradual relaxations, from 12 to 15 June 2020, and the third took place shortly after a new occupational health and safety regulation came into force, from 4 to 8 February 2021. The fourth survey began shortly after the strictest home office regulations came into force and lasted from 6 to 28 May 2021. The fifth wave took place from 20 September to 10 October 2021. The sixth took place from 3 to 20 March 2022 and thus shortly before the expiry of the home office obligation in Germany. The seventh survey was conducted from 22 to 29 June 2022. The eighth wave took place from 14 to 28 October 2022 against the backdrop of the energy crisis.

The field time required to reach the target number of cases varied between waves. Google Surveys could not give any concrete reasons for this. However, one presumed effect of a longer survey duration is that in the two surveys concerned, the “online population” was better represented by the sample regionally and in terms of age distribution. Therefore, the weighting factors (see below) are considerably smaller in the corresponding survey waves.

All eight questionnaires include seven questions on the topic of home office, with different emphases per wave, and three questions on socio-demographic characteristics (job position, age and gender).

Google Surveys has a network of websites on which selected visitors are presented with the questionnaire. The questions appear as a so-called survey wall, where website visitors answer the questionnaire to gain access to additional content. During the field phase, the sample was stratified dynamically: population groups that were underrepresented with regard to the distribution by region, age and gender were shown the questionnaire with a higher probability, and overrepresented groups with a lower probability. For a detailed account, see Google (2018).

This type of sampling (“river sampling”) can be classified neither as pure random sampling nor as a pre-recruited online panel. Unlike a purely random sample in the classic sense, no exact population can be defined. It follows that no selection probability can be determined for any element of this sample. Nevertheless, comparisons with regard to demographic characteristics, among others, show substantial similarities with internet surveys conducted elsewhere (cf. Pew Research Center 2012).

Data cleansing

Around 2,500 (waves 1-3 and 5-8) and 3,000 (wave 4) complete responses were originally collected using Google Surveys. Online surveys are usually so-called self-administered surveys. Here, the interview situation is not subject to any control, in contrast to oral surveys conducted in person or by telephone. This means that more intensive data checking and cleaning is necessary, e.g., excluding answers from non-serious respondents and “quick fillers”. In the first step, respondents who had completed the online questionnaire in an extremely short time were identified. The lowest percentile of response time was set as the exclusion criterion in all surveys. It can be assumed that attentive reading and answering of the questions can hardly occur below this threshold. Analyses of unusual partial results among these “quick fillers” also support the view that this data cleaning procedure is appropriate.
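The first cleaning step can be illustrated with a minimal Python sketch. It is not the code used for the study: the field name `seconds`, the reading of “lowest percentile” as the lowest 1 percent, and the nearest-rank rule for the threshold are all assumptions made for illustration.

```python
import math


def drop_quick_fillers(responses, pct=1.0, key="seconds"):
    """Remove responses in the lowest `pct` percent of completion times.

    `responses` is a list of dicts; `key` names the (hypothetical) field
    holding the completion time. Returns (kept_responses, threshold).
    """
    if not responses:
        return [], None
    times = sorted(r[key] for r in responses)
    # Nearest-rank percentile: the rank-th smallest time is the threshold.
    rank = max(1, math.ceil(len(times) * pct / 100))
    threshold = times[rank - 1]
    # Keep only respondents who took strictly longer than the threshold.
    kept = [r for r in responses if r[key] > threshold]
    return kept, threshold


# Illustrative data: 200 made-up respondents with completion times 1..200 s.
sample = [{"id": i, "seconds": i} for i in range(1, 201)]
kept, threshold = drop_quick_fillers(sample, pct=1.0)
print(len(kept), threshold)
```

Because the comparison is strict, respondents whose time ties with the threshold are excluded as well. Whether ties should be kept or dropped is a design choice that the text above does not specify.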

In the second step, cases with contradictory information on their occupation or home office use during the survey were also identified. Such inconsistencies could not be ruled out in advance, as the questionnaires in Google Surveys do not allow for more complex filtering. Thus, corresponding question filters could only be applied “ex-post” to inconsistent answers. For some respondents, several reasons for exclusion applied simultaneously. Ultimately, around 2,000 cases in each of the first three waves, 2,350 cases in the fourth wave, 1,755 cases in the fifth wave, around 1,950 cases in the sixth wave, around 1,710 cases in the seventh wave and around 1,700 cases in the eighth wave were included in the weighting of the data described below.

Weighting

It is true for most samples in the social sciences that non-response is not randomly distributed, e.g. because some groups of people are easier or harder to reach with a certain type of survey. As a result, subpopulations are not represented in the sample in proportion to their share of the basic population. Systematic deviations also occur in the present survey despite the sample stratification during the field phase. To ensure that the observations obtained nevertheless reflect the structure of adult internet users in Germany as closely as possible, a redressement weighting was carried out with regard to the combined age and gender structure of the online population as well as the regional distribution of the total population. The target structures used were taken from the respective official statistics (Destatis 2019; Destatis 2020a; Destatis 2020b; Destatis 2021). The Iterative Proportional Fitting procedure was applied using IPFWEIGHT (Bergmann 2011) in Stata 16. The weighting factors lie between 0.5 and 5.2 across all survey waves and are thus in a range generally regarded as uncritical (cf. DeBell et al. 2009, 31, quoted from Bergmann 2011).

Grouped age and gender were “estimated” in Google Surveys based on the browsing behaviour of the participants (cf. Google 2018) and were also collected directly as part of the survey, so both pieces of information could be suitably combined. The weighting was, therefore, primarily based on the self-report and, in the case of missing data, on the Google estimate, if available. When specifying gender, the category “diverse” could be selected in addition to female and male in the self-report. Since official data on internet use is currently only available for male and female persons, the presumed gender determined by Google was used for weighting in these cases. This left only a few cases for which no age and gender information relevant to weighting was available.
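The fallback rule described in this paragraph can be sketched in a few lines of Python. The function name, the argument names and the category labels are hypothetical and serve only to illustrate the logic.

```python
def weighting_value(self_report, google_estimate, allowed):
    """Pick the value used for weighting.

    Prefers the self-report; falls back to the Google estimate when the
    self-report is missing or not among the categories for which target
    figures exist. Returns None if neither source is usable.
    """
    if self_report in allowed:
        return self_report
    if google_estimate in allowed:
        return google_estimate
    return None


# Hypothetical category labels: targets exist only for female and male.
GENDERS = {"female", "male"}

print(weighting_value("male", "female", GENDERS))     # self-report wins
print(weighting_value("diverse", "female", GENDERS))  # falls back to estimate
print(weighting_value(None, None, GENDERS))           # nothing usable
```

The same helper could be applied to the grouped age variable by passing the set of age groups as `allowed`.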

The regional allocation was carried out by Google using the IP address, mostly at the federal state level and, in the case of some large cities, also at the municipality level. Due to the low number of cases in some cells, the federal states were combined into seven Nielsen areas for the regional weighting. In very few cases, no regional information could be recorded. All respondents to whom no values could be assigned for the weighting-relevant variables received a factor of 1.0 in the corresponding weighting step.
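To make the weighting logic concrete, the following is a minimal Python sketch of iterative proportional fitting (raking) over two margins. It is not the IPFWEIGHT implementation used in Stata. The variable names, categories and target shares are made up for illustration, and the convergence rule is an assumption. Cases with a missing value for a variable are left unchanged in that step, which corresponds to a factor of 1.0.

```python
def rake(rows, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting over several categorical margins.

    rows:    list of dicts, one per respondent (None marks a missing value)
    targets: {variable: {category: target share}}, shares summing to 1
    Returns one weight per row, starting from 1.0.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted totals per category (known values only).
            totals = {}
            for row, w in zip(rows, weights):
                cat = row.get(var)
                if cat in shares:
                    totals[cat] = totals.get(cat, 0.0) + w
            known = sum(totals.values())
            for i, row in enumerate(rows):
                cat = row.get(var)
                if cat not in shares:
                    continue  # missing value: factor 1.0 in this step
                factor = shares[cat] * known / totals[cat]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break  # all margins already match their targets
    return weights


# Made-up sample: two age-gender cells and two regions.
rows = (
    [{"age_gender": "m_18_39", "region": "north"}] * 3
    + [{"age_gender": "m_18_39", "region": "south"}]
    + [{"age_gender": "f_18_39", "region": "north"}]
    + [{"age_gender": "f_18_39", "region": "south"}]
)
targets = {
    "age_gender": {"m_18_39": 0.5, "f_18_39": 0.5},
    "region": {"north": 0.5, "south": 0.5},
}
weights = rake(rows, targets)
print([round(w, 3) for w in weights])
```

In this sketch, the sum of the weights of all cases with known values stays equal to their number, so the weights can be read as adjustment factors around 1.0.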

Basis for analysis

Only employed respondents were considered for the analyses described here. According to the self-report, the adjusted data sets contained 1,579 employed persons in the first wave, 1,478 in the second and 1,564 in the third. In the fourth wave, despite a higher starting point, 1,559 employed persons were left in the sample, similar to the number in the other survey waves. The reason for this is the better representation of older population groups in the sample (see above), most of whom are no longer employed. In the fifth wave, with a lower starting point but similar sampling effects as in the fourth wave, there were 1,126 employed persons in the adjusted data set. In the sixth wave, the number rose again to 1,307 employed persons, and in the seventh wave there were 1,121. In the eighth wave, there were slightly fewer, with 970 employed persons.