Definition and delimitation
Buzzwords such as digitalisation, big data and algorithms are on everyone’s lips today. All of these topics involve data, their different contexts of origin and the methods used to process them. One would therefore expect the term “data”, fundamental as it is, to be clearly defined. This is not the case: most definitions are general and remain somewhat vague. In addition, many different (scientific) disciplines understand the term quite differently and draw its boundaries in different places. A useful working explanation is to think of data as symbols that contain information. The symbols carry a form of action-relevant knowledge that the recipients of the data have to interpret. Information that is packaged in symbols and transmitted in the form of data is therefore always a representation of reality.
Application and examples
Measured values generated during an experiment are a good example of these relationships. Measurement data are usually represented by numerical symbols and can therefore be processed in many ways: they can be saved or deleted, copied and summarised. Methods of data analysis offer multiple ways of extracting and evaluating the information carried by the measurement data. In addition, new information can be created by combining the data with other data.
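The operations named above can be illustrated with a minimal sketch. The readings, the time stamps and all variable names are invented for illustration and do not come from a real experiment; only the Python standard library is assumed.

```python
import statistics

# Hypothetical measurement series (temperature in degrees Celsius).
readings_celsius = [20.1, 20.4, 19.8, 20.0, 20.7]

# Copy: the duplicate is independent of the original list.
backup = list(readings_celsius)

# Summarise: condense the series into a few descriptive numbers.
summary = {
    "n": len(readings_celsius),
    "mean": statistics.mean(readings_celsius),
    "stdev": statistics.stdev(readings_celsius),
}

# Delete: drop the last reading from the working series only.
del readings_celsius[-1]

# Combine: pair each remaining reading with a second data source
# (hypothetical time stamps in minutes since the start of the experiment).
minutes = [0, 10, 20, 30]
combined = list(zip(minutes, readings_celsius))

# New information that neither series contains on its own:
# the average temperature change per minute between first and last reading.
first_minute, first_value = combined[0]
last_minute, last_value = combined[-1]
change_per_minute = (last_value - first_value) / (last_minute - first_minute)

print(summary)
print(combined)
print(change_per_minute)
```

The rate of change in the last step exists only once the two series are joined, which is what is meant by creating new information through combination.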
However, the symbols and the information they contain are not always as uniformly structured as the measurement series of an experiment. Every written (and even every spoken) word, as well as every picture, can be a symbol that contains information. Digitisation has caused the amount of available data to “explode”, and availability continues to grow rapidly: digital data can be copied at will and distributed in large quantities to an unmanageable number of recipients. The “demand” for data, for the information they contain and for the explanations derived from them in the most diverse contexts is constantly increasing.
Research
In this context, data protection is gaining in importance: How can we ensure that data do not fall into the wrong hands? Who is allowed to do what with the information we disclose about ourselves, especially online? Differential privacy promises a possible solution. Fundamental to this method of data processing is the assumption that the personal information contained in data is protected if the result of an analysis reveals no (confidential) information about any individual. To make a data analysis “differentially private”, one can, for example, mathematically modify the analysis algorithm or the data source, typically by adding carefully calibrated random noise. The method promises evaluation results that remain comprehensible and statistically accurate while preventing characteristics and information from being traced back to individuals: although personal data are evaluated for analysis purposes, the analysis results cannot subsequently be assigned to an individual. A data set made “differentially private” preserves the privacy of individuals when it is published and at the same time offers a comprehensible data basis.
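One common way to modify an analysis algorithm in this sense is the Laplace mechanism, which adds random noise to the result of a query. The following is a minimal sketch under stated assumptions, not the method of any particular project: the function names `laplace_noise` and `private_count`, the age values and the chosen privacy parameter `epsilon` are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent exponential variables with
    # mean `scale` follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1: adding or removing one
    # person changes the true count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical personal data: ages of eight survey participants.
ages = [23, 35, 41, 29, 52, 67, 38, 45]

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller `epsilon` means more noise and stronger privacy, but a less precise answer. A single noisy result therefore deviates from the true count, while the average over many releases stays close to it; this is the trade-off between privacy and accuracy that any such analysis has to balance.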
The project “Differential Privacy: A New Approach to Social Big Data” investigates how valid statistical conclusions can be drawn without violating individual privacy and develops a software environment for implementing differential privacy.
Criticism and problems
The collection, storage and analysis of digital data in many areas of social life (so-called “datafication”) is associated with the hope of being able to better grasp and understand the whole world. The idea that one can understand and explain all processes, from the global climate to individual human behaviour, if only enough data about them have been collected and analysed inherently contains the desire to make complex relationships predictable and ultimately controllable. For many, this is a frightening idea: personal data and the private information associated with them (addresses, telephone numbers, political views or medical records) are not something one would like to see copied and analysed at will.
The aim of explaining the whole world through data must itself be viewed critically. If we understand data as symbols that carry information, there is always a gap between reality and data that cannot be closed. Philosophically, this can be expressed as the difference between signifier and signified [1,2]: no symbol contains the original information, but always “only” a reference to reality. Moreover, reality is an evolving, dynamic process, whereas the image of reality interpreted from data is a fixed “snapshot”. Together with the gap described above, this means that the information content of data constantly decays: in principle, all data are already outdated at the moment they are collected. Data therefore cannot be considered objective in themselves; they are always subject to interpretation by their recipients and must always be considered in the context of their (temporal) origin. In general, one must ask how the data came into being, through which process information was translated into symbols and how the data were further processed. Nor can the information read from the data simply be taken as objective: the actions or knowledge derived from it arise through the subjective interpretation of the recipients, and this must be taken into account. In addition to data processing and data analysis, a third scientific discipline is therefore increasingly developing that deals with this gap and its consequences: critical data science [3].
Sources
[1] Barthes, Roland (1979). Elemente der Semiologie. Syndikat Verlag, Frankfurt am Main.
[2] Saussure, Ferdinand de (1967). Grundfragen der allgemeinen Sprachwissenschaft. 2nd ed., Walter de Gruyter Verlag, Berlin.
[3] Iliadis, Andrew; Russo, Federica (2016). Critical Data Studies: An Introduction. In: Big Data & Society. Volume 3, Issue 2. SAGE Publishing, Los Angeles/CA.