
Machine learning

Definition and delimitation

Machine learning is one of the most important subfields of artificial intelligence. A computer program learns from experience in relation to a certain class of tasks if its performance in relation to the task class improves with increasing experience [1]. The result of a learning process is a model (also called a hypothesis) that is created by generalising over properties in the data.

Experience is presented in the form of training data, for example as a list of features (such as age, gender, education level, previous illnesses) or as images or texts. Typical tasks are:

  • the prediction of events, for example, getting a certain disease, or the failure of a machine,
  • the recognition of certain objects or classes, for example, recognising a cat in a picture, or a defect on a workpiece
  • the selection of actions, such as a move in Go or a particular action of a robot.

Performance is determined as an estimate of how many or how large errors the learned model will make when applied to new data (of the same task class).

Three broad classes of machine learning approaches are: supervised learning, unsupervised learning and reinforcement learning. By far the largest number of algorithms and applications are in the area of supervised learning. Here, a function f: X → C is approximated from a set of training data D, which originate from a (usually infinite) set X and for which the correct answer from C is known and specified; the learned function then outputs an answer for unknown data from X. If C consists of only two values, it is concept learning (yes, this is a cat/disease K, or no, this is not a cat/disease K). If the set C comprises several possibilities (such as different animal species), it is referred to as classification learning. If C is a numerical value (such as size or amount of money), we speak of regression learning. In unsupervised learning, data are grouped according to their similarity. For example, animal species or road signs could be grouped based on characteristics or their visual appearance, or customers could be grouped according to certain attributes. While the two approaches mentioned above require all data to be available before a model is trained, reinforcement learning is based on a continuous learning process in which a model is adapted in response to feedback on the success of its responses. In Go, for example, the strategy for selecting moves is corrected after a failure.
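The supervised setting described above can be illustrated with a minimal sketch: approximating f: X → C from labelled examples. The sketch below uses a hand-rolled 1-nearest-neighbour rule on invented toy features (weight, ear length); the data, feature names and threshold behaviour are purely illustrative, not taken from any real system.

```python
# Minimal sketch of supervised concept learning: approximate f: X -> C
# from labelled training data D. Here C has only two values (concept
# learning), and the learner is a hand-rolled 1-nearest-neighbour rule.

def train_1nn(training_data):
    """'Training' for 1-NN simply stores the labelled examples."""
    return list(training_data)

def predict(model, x):
    """Answer for an unseen x in X: label of the closest training example."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda ex: dist(ex[0], x))[1]

# Toy features: (weight in kg, ear length in cm) -- invented for illustration.
D = [((4.0, 6.0), "cat"), ((30.0, 12.0), "not_cat"),
     ((3.5, 5.5), "cat"), ((25.0, 10.0), "not_cat")]
model = train_1nn(D)
print(predict(model, (4.2, 6.1)))  # a new, previously unseen input from X
```

With several labels in C instead of two, the same code would perform classification learning; replacing the label by a number and the minimum by an average would move towards regression.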

Machine learning can be implemented by various types of algorithms. Among the best-known and most widely used are decision tree algorithms, random forests, Bayesian learning, support vector machines and artificial neural networks [2]. The algorithms differ mainly in how the learned models are represented. Decision trees, random forests and Bayesian models belong to the symbolic or interpretable approaches to machine learning. This means that the learned models are in principle readable by humans, similar to conventional software. Support vector machines and neural networks, on the other hand, are non-symbolic, statistical approaches whose learned models are black boxes: the input information is processed in such a complex way that even the model developers themselves cannot understand how exactly a decision is reached.
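The contrast between interpretable and black-box models can be made concrete: a learned decision tree can be written down as nested rules that a person can read and check, whereas a neural network is a collection of weight matrices. The tree below is a hypothetical example with invented thresholds, not the output of a real training run.

```python
# Illustrative sketch: a decision tree, as a symbolic model, reads like
# ordinary branching code. Thresholds and feature names are invented.

def risk_tree(age, blood_pressure):
    """Each branch of this (hypothetical) learned tree is human-readable."""
    if age > 60:
        if blood_pressure > 140:
            return "high risk"
        return "medium risk"
    return "low risk"

print(risk_tree(age=65, blood_pressure=150))  # -> high risk
```

A neural network trained on the same task would represent its decision boundary only implicitly, as numerical weights, which is why such models are called black boxes.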

Currently, deep neural networks are attracting the most attention. Deep learning is the umbrella term for various approaches to artificial neural networks that are considerably more complex than the classical neural networks developed since the 1990s [3]. Classical neural networks usually consist of three layers of artificial neurons that are connected forward from the input layer to the output layer. Many deep learning architectures enable so-called end-to-end learning, i.e. learning directly from raw data. In contrast, most classical learning methods, including classical neural networks, require features as input. In many areas, data is available directly in the form of features – as a table. If one wants to learn from images, texts or time series data, however, classical methods first require features to be extracted from the raw data. This requirement was the bottleneck for the application of machine learning, and it could be overcome by deep learning approaches such as Convolutional Neural Networks.
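The core operation that lets Convolutional Neural Networks work directly on raw pixels can be sketched in a few lines: a small filter slides over the image, and during training the filter values themselves are learned rather than hand-engineered. The pure-Python sketch below (no framework, toy-sized data) applies one fixed edge-detecting filter to illustrate the mechanism.

```python
# Sketch of the convolution operation at the heart of a CNN: a small
# kernel slides over raw pixel values. In a real network the kernel
# entries are learned from data; here a fixed edge filter is used.

def convolve2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with a filter."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A tiny image with a vertical edge in the middle, and a filter that
# responds strongly exactly where neighbouring columns differ.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_filter = [[1, -1],
               [1, -1]]
print(convolve2d(image, edge_filter))  # non-zero only at the edge
```

Stacking many such learned filters, interleaved with non-linearities and pooling, is what allows deep architectures to build features from raw data end to end.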


History

Machine learning has been one of the central areas of artificial intelligence research since its inception. The first documented learning programme, a programme for the strategy game of checkers, was realised by Arthur Samuel (an employee at IBM) in 1952. In 1960, Donald Michie – the father of artificial intelligence in the UK – implemented a first approach to reinforcement learning, the Machine Educable Noughts And Crosses Engine (MENACE). In the 1980s, symbolic approaches such as decision tree methods dominated. Explanation-based learning approaches were developed in which background theories could be incorporated into the learning process. AI pioneer Patrick Winston provided foundational work on relational learning to identify structural relationships, such as visual concepts (an archway consists of at least two pillars supporting a roof with a minimum distance between them, see his 2010 lecture video) or chemical structures. In the late 1980s, research began on multilayer artificial neural networks trained with backpropagation. Backpropagation as a method for minimising errors was introduced by David Rumelhart and Geoffrey Hinton (who both studied psychology). Already in the 1950s, the simulation of a single neuron had been realised with the perceptron. However, since only linear functions could be learned with this approach, research on neuron-inspired approaches was initially abandoned in favour of symbolic learning approaches. As a result of the fascination generated, among other things, by the work published in the anthology Parallel Distributed Processing (PDP) [4], the representatives of symbolic artificial intelligence, including the symbolic approaches to machine learning, came under pressure to justify their methods.
The main arguments against neural networks as a model for human intelligence were [5]: (a) that (the then-current) neural networks were only applicable to very simple tasks, such as forming the past tense of English verbs, but not to complex cognitive tasks such as playing chess, and (b) that neural networks process input uninterpreted. A neural network trained to recognise traffic signs will compute exactly such an output when given an animal picture as input; it will not recognise that the input belongs to a different semantic category.

A first book publication with the term machine learning in its title appeared in 1983, edited by Ryszard Michalski, Jaime Carbonell and Tom Mitchell [6]. The most important approaches of the early symbolic learning methods can be found there. Many of the approaches explicitly aimed to emulate properties of human learning; decision tree methods, for instance, were inspired by psychological work on concept learning [7]. The first textbook on machine learning was written by Tom Mitchell in 1997 [1]. It explains in detail decision trees, neural networks, Bayesian learning, inductive logic programming, genetic algorithms and reinforcement learning. Not yet included are support vector machines, one of the most successful approaches of the early 2000s, and random forests, an efficient generalisation of decision tree methods, which emerged around the same time.

At the end of the 1990s, symbolic approaches were increasingly displaced by statistical machine learning approaches. In addition to machine learning research rooted in AI, a second research community emerged that was more oriented towards pattern recognition approaches originating from signal processing, with a focus on statistical and neural approaches. The online journal Journal of Machine Learning Research (JMLR), founded in 2000, accordingly set itself apart from the journal Machine Learning, which had been established back in 1986. In addition to the International Conference on Machine Learning (ICML, since 1980, initially as a workshop) and the European Conference on Machine Learning (ECML, since 1986, initially as a working session), the Neural Information Processing Systems conference (NIPS, since 2018 NeurIPS) became increasingly important. The most widely used textbook became Pattern Recognition and Machine Learning, published in 2006 [8], with its exclusive focus on statistical learning.

In the context of research on neural networks, architectures have been developed since around 2000 that are subsumed under the term deep learning [3]. Outside of academia, interest in deep learning grew around 2012. In the ImageNet Challenge, in which images are classified into around 1000 concepts, the winner in 2012 was for the first time not an approach from image processing but a machine learning approach. The Convolutional Neural Network (CNN) AlexNet – named after Alex Krizhevsky, a PhD student of Geoffrey Hinton – improved on the result of the previous competition by more than 10 percentage points and did not rely on features extracted from the images. Other high-profile deep learning successes include AlphaGo's victory over the world's best Go professional, machine translation systems such as DeepL, and large language models such as GPT-3 (Generative Pre-trained Transformer). In 2018, Yann LeCun, Yoshua Bengio and Geoffrey Hinton were awarded the most prestigious science prize in computer science, the Turing Award, for their work on deep networks.

Application and examples

Machine learning methods have long been used in a wide variety of applications. Established areas of application include the assessment of creditworthiness, spam filtering and product recommendations, as well as the field of data analysis in general, in which machine learning approaches stand on an equal footing with statistical methods. In areas that have likewise been worked on for a long time, such as image recognition, speech and text processing, intelligent assistants and intelligent robot systems, promising results are being achieved through the use of deep learning methods. The Learning Systems Platform also lists agriculture, mobility and logistics, energy and the environment, finance and education (e-learning) as application markets. The concrete areas supported and the methods used are diverse: in medicine, for example, methods for classification and segmentation of image data are used to support medical diagnosis. Information extraction from medical texts can be realised with machine learning approaches for language processing. Disease prediction models can be learned from large collections of patient data. In industrial production, digital transformation enables more targeted planning of maintenance activities (predictive maintenance) as well as an interplay between machine control processes and quality control.

In general, in all areas in which information is available in digital form (digitalisation, digital transformation), machine learning can be used alongside standard algorithms whenever the task is to identify correlations and patterns in large amounts of data and complex data structures. In addition, machine learning is useful whenever the problem is either too complex to process with standard algorithms or the knowledge needed to solve the problem cannot be provided explicitly.

Criticism and problems

With the increasing use of machine learning in many application areas, especially sensitive and safety-critical ones, it is becoming ever clearer what requirements must be met for practical use. There is no guarantee of correctness for models learned from data. To evaluate performance in new situations, a subset of the training data is usually withheld from learning and used as test data after the learning process is completed to assess the quality of the learned model. If a learned model achieves an estimated accuracy of 99 per cent, for example, it will on average make an error on one in every hundred inputs – missing a critical event or falsely signalling one (a false alarm). This is unproblematic in areas where decisions are not time- and safety-critical and where humans can correct the system: if, when searching for pictures showing cats, a picture of a sofa is also selected, this is not critical. If a tumour were overlooked in a tissue section, it would be dramatic. Especially with image data, learned models are often not very robust: a change in a few pixels that is completely irrelevant for humans can lead to the model no longer recognising an object [9]. Current research addresses this problem by including manipulated images in the training set. In order to estimate robustness, performance is determined for several test data sets and their variance is considered.
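The evaluation scheme of withholding test data can be sketched in a few lines of pure Python. The "model" below is a deliberately imperfect threshold rule standing in for any trained classifier, and the data are a toy construction; both are illustrative assumptions, not part of any real evaluation protocol.

```python
# Sketch of accuracy estimation on held-out test data. Repeating the
# split with different seeds (several test sets) would additionally
# give the variance used to judge robustness.

import random

def holdout_split(data, test_fraction=0.3, seed=0):
    """Randomly divide labelled data into training and test portions."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(model, test_data):
    """Fraction of held-out inputs the model answers correctly."""
    correct = sum(1 for x, label in test_data if model(x) == label)
    return correct / len(test_data)

# Toy task: positive numbers belong to the concept. The "learned" model
# uses a slightly wrong threshold, so it makes occasional errors.
data = [(x, x > 0) for x in range(-50, 50)]
model = lambda x: x > 3

train, test = holdout_split(data)
print(f"estimated accuracy on held-out data: {accuracy(model, test):.2f}")
```

The accuracy printed here is only an estimate of future performance; averaging it over several different splits, and looking at the spread of the results, is the simple variance-based robustness check mentioned above.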

Increasingly, it is recognised that the quality of the data used to train a model is critical to the quality of the learned model. In particular, data can contain undesirable correlations – so-called sampling biases. The result can be unfair models that, for example, disadvantage certain groups of people. Supervised approaches, which include many deep learning architectures, require the training data to be annotated (labelled) with the desired outputs. For everyday domains, such as traffic signs or species of animals, this can be done through crowdsourcing. In specialised domains, such as medical diagnostics or quality control in industrial production, it can only be done by domain experts. Often, even then, there is no absolute certainty (ground truth); in medicine, one speaks of a gold standard as the best approximation. Labelling can thus easily become very costly. If this problem is not solved, the data engineering bottleneck could trigger a new AI winter, just as the knowledge engineering bottleneck did in the era of expert systems (see Artificial Intelligence).

In many application areas, it is necessary for legal (especially liability), ethical or even practical reasons that it is comprehensible how a machine-learned model came to a certain decision. The model developers themselves must be able to assess properties of the learned models – especially possible biases due to overfitting to the data used for learning. For safety-critical applications, aspects of certification and testing are also becoming increasingly relevant. Domain experts – for example in medical diagnostics or in quality control in industrial production – must be able to comprehend, verify and, if necessary, also correct system decisions. Consumers should understand why a system – a smart home control, a driving assistance – behaves in a certain way and why they are recommended certain products, offered certain tariffs or withheld certain offers. These requirements are addressed in research on explainable artificial intelligence (XAI) [10]. In particular, methods are being developed to make the black box models learned with neural networks comprehensible and transparent. However, it is also increasingly being recognised that it makes sense for many application areas to use directly interpretable machine learning approaches [11,12].

Training deep neural networks can produce very good results in various application areas for which classical, significantly less data-intensive machine learning approaches are less suitable. However, this comes with enormous energy consumption. Studies show, for example, that training a single deep network in the field of language processing can cause as much CO2 as five cars over their entire lifetimes [13].


The impressive success of deep learning approaches has led to a new and unprecedented interest in artificial intelligence research. Increasingly, however, there is recognition of the limitations and problems of such approaches [14]. Current research is increasingly working on hybrid approaches combining knowledge-based methods and machine learning. This reconciliation of two historically predominantly separate perspectives on AI can lead to a new quality of approaches that advance the development of human-centred AI systems.

Several PhD projects are investigating how machine learning, deep learning and neural networks can serve as a basis for the development of novel algorithms and interventions. These projects focus on the health sector.

The junior research group “Synth2Real: Training Neural Networks with Virtual Data” works specifically with learning methods for neural networks.

Further links and literature

A generally understandable introduction is provided by: Kristian Kersting, Christoph Lampert, Constantin Rothkopf (eds.). Wie Maschinen lernen: Künstliche Intelligenz verständlich erklärt. Springer Sachbuch, 2019. There is an accompanying introductory video.

A university-level textbook is [2].

An overview of important topics and application areas is provided by the Learning Systems platform.


[1] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[2] Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, 2012.

[3] Ian J. Goodfellow, Yoshua Bengio, Aaron C. Courville. Deep Learning. Adaptive computation and machine learning. MIT Press, 2016.

[4] David E. Rumelhart, James L. McClelland. Parallel Distributed Processing. Explorations in the Microstructure of Cognition. 2 vols. MIT Press, 1986.

[5] Jerry A. Fodor, Zenon W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2), 3-71, 1988.

[6] Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (eds.). Machine Learning. An Artificial Intelligence Approach. 3 volumes, Springer, 1983.

[7] Earl B. Hunt. Concept learning: An information processing problem. Wiley, 1962.

[8] Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[9] Alexey Kurakin, Ian J. Goodfellow, Samy Bengio. Adversarial examples in the physical world. 5th International Conference on Learning Representations (ICLR), Workshop Track Proceedings, 2017.

[10] Tim Miller. Explanation in artificial intelligence: insights from the social sciences. Artificial Intelligence, 267: 1-38, 2019.

[11] Stephen H. Muggleton, Ute Schmid, Christina Zeller, Alireza Tamaddoni-Nezhad, Tarek R. Besold. Ultra-strong machine learning: comprehensibility of programs learned with ILP. Machine Learning 107(7): 1119-1140, 2018.

[12] Cynthia Rudin. Please stop explaining black box models for high stakes decisions and use interpretable models instead, 2018, http://arxiv.org/abs/1811.10154.

[13] Emma Strubell, Ananya Ganesh, Andrew McCallum. Energy and Policy Considerations for Modern Deep Learning Research. AAAI 2020: 13693-13696, 2020.

[14] Gary Marcus. Deep Learning: A Critical Appraisal, 2018, https://arxiv.org/abs/1801.00631.