Definition and delimitation

The term “bot” is often used to describe software that carries out tasks on its own. This covers a number of very different technologies. On the one hand, the term appears in connection with malicious software and describes, for example, viruses that independently execute actions on an infected computer. On the other hand, “bot” refers to software that executes actions in computer systems or other software, either as deliberate automation or in place of actions that are expected to be performed by humans. The latter is the case, for example, with bots in computer games or bots used to overcome access restrictions, which is why many registration forms on the internet ask users to confirm “I am not a robot”.

In this article, however, the focus is on bots as interactive systems whose interaction concept casts the bot in a person-like role [1]. This includes, in particular, the ability of the system to understand natural language, whether written or spoken. Prominent examples are chatbots and voice assistants (e.g. Apple Siri, Google Assistant, Amazon Alexa, Microsoft Cortana).

Application and examples

A frequently cited historical example is ELIZA, a computer programme developed by Joseph Weizenbaum at MIT in the 1960s [2]. ELIZA simulated conversations according to predefined rules and was intended as a kind of parody of the (superficial) conversations one might have with a psychotherapist. Surprisingly, however, a number of early users found the system realistic or attributed human characteristics to it.
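How a conversation "according to predefined rules" can work is easiest to see in a small sketch. The following Python example is only an illustration of the general idea of pattern-and-template rules. The rules, the default reply and the function name `reply` are assumptions made for this example and are not taken from Weizenbaum's original programme.

```python
import re

# Hypothetical rules for illustration; these are not Weizenbaum's original script.
# Each rule pairs a pattern with a response template that reuses the matched text.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Used when no rule matches, so the bot never has to "understand" the input.
DEFAULT_REPLY = "Please go on."


def reply(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY
```

With these assumed rules, `reply("I am tired.")` returns "Why do you say you are tired?", while an input that matches no rule simply receives the default reply. The sketch reflects fragments of the input back to the user without any model of their meaning.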

Today’s applications can be found in modern voice assistants, e.g. in smartphones and vehicles, as well as in chatbots that handle (initial) customer contact in many places, e.g. on company websites. A recent example is a WHO chatbot that provides information on the COVID-19 pandemic [3]. Also worth mentioning is Kuki, a chatbot that can be tried out online: contrary to the current trend towards ever larger language models from the field of machine learning, it communicates by means of a manually maintained rule system, and it has won many prizes [4].

Criticism and problems

Two major points of criticism should be addressed here: firstly, problems can arise from bots’ limited understanding of language, and secondly from a discrepancy between the capabilities a bot conveys and those it actually has.

A number of challenges arise from the complexity and properties of natural language: bots are supposed to understand it, but they often fail to do so because of aspects such as ambiguity and context. A short dialogue in a café serves as an example:

  • Customer: “Do you have takeaway?”
  • Employee: “Milk and sugar?”
  • Customer: “Black, please.”

The customer’s question is not grammatically complete, the employee does not answer it directly, and “black” as a colour is not an appropriate answer to “Milk and sugar?”, unless one understands the context. What works smoothly for the humans in the example could present a difficult challenge for a “barista bot”. Social and cultural aspects, dialects, slang, prosody (speech melody) and other factors add further difficulty.
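The role of context in the café dialogue can be made concrete with a small sketch. The Python example below is a toy illustration under stated assumptions: the vocabulary, the `pending_question` parameter and the intent names are hypothetical and only show that the same short answer can be resolved, or not, depending on what was asked before.

```python
from typing import Optional

# Hypothetical vocabulary for a toy "barista bot"; all names are illustrative.
COFFEE_ADDITIONS = {"black": [], "milk": ["milk"], "sugar": ["sugar"]}


def interpret(utterance: str, pending_question: Optional[str]) -> dict:
    """Interpret a short answer in light of the question the bot asked last."""
    # Normalise the answer: lower-case, drop the politeness word and punctuation.
    word = utterance.lower().replace("please", "").strip(" ,.!?")
    if pending_question == "additions" and word in COFFEE_ADDITIONS:
        return {"intent": "set_additions", "additions": COFFEE_ADDITIONS[word]}
    # Without a matching context, the bare word cannot be resolved.
    return {"intent": "unknown", "text": utterance}
```

If the bot has just asked about additions, `interpret("Black, please.", "additions")` yields an order without milk or sugar. Without that context, `interpret("Black, please.", None)` yields an unknown intent, because "black" on its own is just a colour.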

Another problem with bots lies in the expectations that presenting a system as a (personified) autonomous actor creates in users. People use language in everyday life to communicate with other people. Language as an interaction concept therefore quickly suggests skills and intelligence that a chatbot or voice assistant does not possess, at least not (yet) [5].

Overall, this leads to the criticism that bots, as “conversation-based” interactive systems, do not really engage in conversation today: they do not take part in a conversation in the human sense, but are at most operated from within one (e.g. when someone gives the instruction “Alexa, turn on the light” as a secondary activity while talking to others).

Research

Bots combine aspects from different research areas, since different sub-problems have to be solved: for example, a voice assistant must understand spoken language as input, but also master speech synthesis in order to respond. In between, the input is converted into text, which then has to be analysed, as is also the case with chatbots. Many further subtasks of language understanding and generation arise here, such as incorporating context (e.g. what has been said so far) or recognising and understanding named entities (e.g. people, places, company names), and of course generating an appropriate and meaningful response or reaction. Research on these questions is taking place, for example, in the fields of signal processing, computational linguistics and natural language processing.
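The sequence of sub-problems described above can be sketched as a simple processing pipeline. The following Python example is a minimal illustration, not a real voice assistant: speech recognition and speech synthesis are replaced by placeholders that merely decode and encode text, named entity recognition is a lookup in a fixed list, and all function and class names are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Keeps what has been said so far, so later turns can use it as context."""
    history: list = field(default_factory=list)


def recognise_speech(audio: bytes) -> str:
    # Placeholder: a real system would run a speech recognition model here.
    return audio.decode("utf-8")


def find_entities(text: str, known_places: set) -> list:
    # Toy named entity recognition: look up each word in a fixed list of places.
    words = [w.strip(".,?!") for w in text.split()]
    return [w for w in words if w in known_places]


def generate_response(entities: list, state: DialogueState) -> str:
    # Prefer recognised entities, then fall back to the dialogue context.
    if entities:
        return f"You mentioned {entities[0]}."
    if state.history:
        return f"Earlier you said: {state.history[-1]}"
    return "Could you tell me more?"


def synthesise_speech(text: str) -> bytes:
    # Placeholder: a real system would produce audio here.
    return text.encode("utf-8")


def handle_turn(audio: bytes, state: DialogueState, known_places: set) -> bytes:
    """Run one turn: speech to text, analysis, response, text to speech."""
    text = recognise_speech(audio)
    entities = find_entities(text, known_places)
    answer = generate_response(entities, state)
    state.history.append(text)  # remember this turn as context for the next one
    return synthesise_speech(answer)
```

Each stage corresponds to one research problem named in the text. In a real system, every placeholder would be replaced by a considerably more complex component.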

Beyond these technical aspects, the bot as an interface between humans and computer systems also poses numerous challenges for research. These include questions about how a dialogue can be designed in a targeted manner for a specific purpose of the bot, which variants of a dialogue should be offered or must be understood, and how the bot should deal with user input that it does not understand [6]. In addition, there are questions regarding the design and presentation of the bot, such as whether it should have a name or present a certain kind of personality. Research on theoretical and practical aspects of such questions can be found in the field of human-computer interaction.
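One of these design questions, how a bot should deal with input it does not understand, can be illustrated with a small sketch. The Python example below shows one possible strategy among many: ask the user to rephrase first, and list the supported topics after repeated failures. The intents, keywords, threshold and wording are assumptions chosen for illustration and are not recommendations from the literature cited above.

```python
from typing import Optional

# Hypothetical intents and keywords, chosen only for illustration.
INTENT_KEYWORDS = {
    "opening_hours": {"open", "hours", "closing"},
    "order_status": {"order", "delivery", "parcel"},
}
MAX_FAILED_TURNS = 2  # assumed threshold before the bot changes strategy


def detect_intent(text: str) -> Optional[str]:
    """Return the first intent that shares a keyword with the input, if any."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


def respond(text: str, failed_turns: int) -> tuple:
    """Return the reply and the updated count of consecutive failed turns."""
    intent = detect_intent(text)
    if intent is not None:
        return f"Handling request: {intent}", 0  # success resets the counter
    failed_turns += 1
    if failed_turns >= MAX_FAILED_TURNS:
        # Repeated failure: tell the user what the bot can actually do.
        return "I can help with opening hours or the status of an order.", failed_turns
    return "Sorry, I did not understand that. Could you rephrase?", failed_turns
```

The caller passes the current failure count in and stores the returned count for the next turn, so the fallback behaviour escalates only when the user is misunderstood several times in a row.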