
When is generative AI “good” — and when is it “good enough”?

Generative AI such as ChatGPT has the potential to influence many areas of life, but it cannot solve all problems. Realistic expectation management is crucial to ensuring that its use has a positive impact. An article by bidt Director Professor Alexander Pretschner.

The potential of tools such as ChatGPT or Midjourney seems immeasurable, given the astonishing quality of the results! Indeed, we can assume that (generative) artificial intelligence (AI) will have a major impact on many areas of our lives. At the same time, however, we must remember that even this technology cannot solve all the world's problems. That AI itself cannot reliably recognise AI-generated content is just one example of many; nor are there (yet) any apps that can predict share prices.

A matter of expectation management

Realistic expectation management starts with recognising that AI-based applications cannot simply be labelled "correct" or "incorrect", as classic algorithmic software can. One reason is that we use AI, or machine learning, for exactly those problems we cannot describe precisely: how would you explicitly specify what constitutes a pedestrian for a pedestrian recognition system? If we could describe problems of this kind precisely, machine learning would generally not be the first choice for solving them. Instead of making categorical statements about correctness (does it work or does it not?), we can only make gradual ones, simplified: it probably works in 80 per cent of cases. Hallucinations of large language models are examples of the remaining 20 per cent that does not work. But when exactly is something a hallucination? Are there gradations of problematic and unproblematic hallucinations, and does that depend on the prompt and the context of use?
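To make the contrast concrete, here is a minimal sketch in Python. The sorting check stands in for classic software with a precise specification; the pedestrian detector and its labelled test set are hypothetical placeholders for illustration. The first verdict is categorical, the second can only ever be a statistical estimate:

```python
# Classic software: a categorical check against a precise specification.
def sort_is_correct(sort_fn, cases) -> bool:
    # Either every case matches the specification, or the implementation is wrong.
    return all(sort_fn(xs) == sorted(xs) for xs in cases)

# Learned software: no precise specification exists, so quality can only be
# estimated statistically on a labelled test set.
def estimated_accuracy(detect_pedestrian, test_set) -> float:
    # test_set: a list of (image, label) pairs; both the detector and the
    # data set are hypothetical stand-ins, not a real API.
    hits = sum(detect_pedestrian(image) == label for image, label in test_set)
    return hits / len(test_set)
```

A statement such as "it works in about 80 per cent of cases" is the honest verdict the second function can deliver; the remaining 20 per cent are the failures, for language models for instance the hallucinations.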

What is the point of reference?

If we as a society want to utilise the opportunities of (generative) AI, we need to understand what it means for an AI to be "good" in a qualitative, not an ethical, sense. When is it "good enough", and what is the point of reference? Is the benchmark an average human or an outstanding one in their field of expertise? As a society, we will have to agree on what cost-benefit ratio is acceptable to us in which context, given the expectedly imperfect quality. To put it bluntly: is it acceptable for an AI-based learning assistant to deliver factually incorrect information in perhaps 15 per cent of cases? How often do teachers at school or university make mistakes? And if learners have no other way of discussing questions about a subject with a human being, should we deny them this opportunity because of the occasionally incorrect content? There are difficult ethical discussions to be had here that go beyond purely utilitarian thinking.

Generative AI as an assistance tool

As a society, we are gradually gaining a better understanding of where we can use generative AI profitably. It can be assumed that these tools will remain assistants for the foreseeable future and will not replace us in most cases. Interaction with generative AI can offer added value in itself, for example in the case of learning assistants: here the process is more important than the product. In other cases, the interaction results in a product that represents added value as such, say a newsletter, documentation, or code. The quality of the prompts, and of the interaction as a whole, determines the quality of the results. The question is then whether prompting and checking (interim) results takes less time and/or skill than a purely manual approach.

Creating and validating intertwined

As Kleist observed in his essay on the gradual formation of thoughts while speaking, some people need to develop thoughts, or products, in small steps for creativity to unfold. Accordingly, it does not seem easy to give a tool like ChatGPT one comprehensive prompt from which the desired artefact emerges in a single step at the touch of a button. Instead, the prompt is created while thinking, or we think while prompting. As humans, we create part of the result in a small step, the machine creates another part, we check it, and then we take the next step. Only in this interlocking of creating and checking steps will we see whether creating the product directly ourselves is sometimes the fastest way after all. The answer depends heavily on the context in which generative AI is used. In any case, judgement is becoming an increasingly critical skill.
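What this interlocking might look like as a process can be sketched in a few lines of Python. The `generate` and `review` callables are assumptions for illustration: the first stands for a call to a language model, the second for the human checking step; neither refers to a real API:

```python
def co_create(task: str, generate, review, max_rounds: int = 5) -> str:
    """Iteratively build an artefact with a generative model.

    generate(prompt) and review(draft) are hypothetical placeholders:
    the first calls a language model, the second returns human feedback,
    or None once the draft is judged good enough.
    """
    prompt = task
    draft = ""
    for _ in range(max_rounds):
        draft = generate(prompt)       # the machine creates a part
        feedback = review(draft)       # the human checks the result
        if feedback is None:           # good enough: stop iterating
            return draft
        # Thinking while prompting: the feedback shapes the next prompt.
        prompt = f"{task}\n\nCurrent draft:\n{draft}\n\nRevise: {feedback}"
    return draft
```

Whether such a loop actually beats writing the artefact directly depends, as argued above, on how costly the human checking steps are in the given context.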

We will learn what all this means for job profiles, for education and training, for the creation and perception of media content, and for the nature of democracy. Shaping this development lies at the heart of digitalisation, the interplay between technology and society, and thus at the heart of bidt's activities. We look forward to understanding the digital transformation in this area and to shaping the future through dialogue!

Prof. Dr. Alexander Pretschner

Chairman of bidt's Board of Directors and the Executive Committee | Chair of Software & Systems Engineering, Technical University of Munich | Scientific Director, fortiss