When it comes to voice assistants, much is still a thing of the future

The project team of “MOTIV – Digital Interaction Competence: Monitor, Training and Visibility” is researching how people interact with intelligent voice assistants. The researchers aim to promote more conscious use of these devices and to develop training modules for users.

Can people build a personal relationship with their voice assistant?

Carolin Wienrich: Studies have indeed shown that users often develop an emotional relationship with their voice assistants and even entrust them with secrets. The device thus fulfils a social function. Even the names chosen by the manufacturers, such as Alexa or Siri, give technical devices human traits. Interacting through speech is in itself fundamentally human. Incidentally, most voice assistants have female names, which is an interesting gender aspect.

Andreas Hotho: Voice assistants are also very close to their users – take the smartphone in your jacket pocket or the smart speaker in your living room, for example. They have become part of the private sphere and are therefore practically omnipresent. And above all: they are easily accessible and always have an “open ear”.

Why do we actually trust machines with secrets about us?

Wienrich: On the one hand, this trust can be traced back to the emotional attachment to the device mentioned earlier. On the other hand, there is an interesting effect: the more technical and thus less human a device is, the more users trust it. One explanation is that a machine does not judge what is said, whereas a human often does. Furthermore, the data appears to stay anonymous: you talk to the device and seemingly nothing happens to your personal data. Of course, this is not the case.

What happens to all the information – whether confidential or not?

Hotho: The provider stores the information as a voice recording on a central server, which means that employees can, in principle, access the data. Wherever data is processed and stored centrally, there is a risk that the systems will be hacked, even though the companies naturally do everything they can to prevent this. In addition, users of voice assistants should be aware that, depending on their business model, providers have a commercial interest in the stored data.

Do I even have the chance to decide what I reveal about myself?

Hotho: To a certain extent, that is possible. This is where our research in the MOTIV project comes in: we want to make users more aware of how they use voice assistants. We have focused on Amazon’s voice assistant Alexa. In addition to analysing data from interactions with smart speakers in laboratory and long-term studies, we used data science methods to automatically analyse large amounts of data available online, for example from forums, and thus gained a picture of the users. This data was then interpreted and classified using our psychological model. The monitoring and evaluation phase has already been completed.

We are currently in the final phase of the three-year project and are developing concrete training modules to raise awareness among users.

Prof. Dr. Andreas Hotho

Why do users need extra training for voice assistants at all?

Wienrich: Very few people have specific knowledge about their voice assistant, and many have misconceptions. A classic one is: I can’t influence anything anyway. We want to counteract this and promote a confident approach to the technology. We explain which settings to select in order to disclose as little data as possible. Above all, we want to explain which design aspects influence our interaction and what we expect from it. The conscious choice of the activation word also plays an important role. Voice assistants are usually close to us and, as soon as they are activated, can potentially “listen in” on everything that is said.

Hotho: Take the following example: someone uses the activation word “Alexa” for their voice assistant. At the same time, a general report about the company Amazon is on the radio and the word “Alexa” is mentioned. That alone can be enough to activate the voice assistant at home, which then starts recording without being asked.

How exactly will the training sessions on voice assistants be designed?

Wienrich: In designing our training units, we use media-didactic methods and approaches from instructional psychology. We do not assume any special prior knowledge, for example of topics such as machine learning, so that our training reaches as many people as possible. Our exercises are based on question-and-answer formats, fill-in-the-gap and study texts and, in the spirit of gamification, quizzes or entertaining video games. Exchange and dialogue with users are particularly important to us. That is why we are at the bidt stand at this year’s AI.BAY in Munich, presenting the first training courses.

Prof. Dr. Carolin Wienrich

Shouldn’t the manufacturers also raise awareness in dealing with their devices?

Hotho: Of course, it would be very desirable if manufacturers offered training units themselves in the future. Ideally, the device would provide this interactively right after purchase. You buy a voice assistant, turn it on, and the first thing you get during initial setup is a short, entertaining training session on how to configure and use it.

What potential do you see for voice assistants with new developments like ChatGPT?

Hotho: Currently, the use of voice assistants is still rather limited. Systems like ChatGPT, which enter into a “natural” dialogue with users, could boost the development of the next generation of voice assistants. It is conceivable that in future we will hold spoken dialogues with our voice assistants. Currently, ChatGPT still requires tasks to be typed into an input field. Connecting the voice assistant to the chat system opens up completely new possibilities.

However, this makes the voice assistant seem more and more human ...

Wienrich: Exactly. Interaction with the device takes on a much more human quality. In addition to aspects such as how the dialogue is conducted, physical cues such as gestures or facial expressions also play a major role in communication. These could be incorporated into the development of voice assistants. For example, the voice assistant could be part of a robot with strongly human features. That may still sound like a dream of the future, but there is a lot of development potential in it. Whether we will then discuss secrets with our voice assistants, and which ones, remains to be seen.

Thank you very much for the interview!

The interview was conducted by Nadine Hildebrandt, scientific officer in the bidt dialogue team.