
Fit for voice-based AI systems

Voice-based AI systems are widely used. However, their functionality is often opaque and they harbour risks for self-determined interaction. Based on the Digital Interaction Literacy model, an online training platform has been developed to promote confident interaction and help shape the digital future.

Voice assistants such as Alexa, Siri and others are based on artificial intelligence (AI) and utilise advanced techniques such as natural language processing. However, many users do not know how these systems work, which can lead to misunderstandings. The dialogue capability of the systems tempts many to regard them as friends or family members. This social familiarity can lead users to disclose more private information to these systems than they realise and to accept the information provided without hesitation. Misconceptions and gaps in knowledge about how voice-based AI systems work can also give rise to unrealistic expectations. These can make interaction with the systems frustrating or prevent users from exploiting the systems' full potential. There are also risks to privacy and security, as well as the possibility of unconscious influence, all of which jeopardise self-determined interaction with these systems.

Against this backdrop, there are both opportunities and challenges for voice-based AI systems in the digital world. Measures are needed that enable all people to interact with voice-based AI systems in a self-determined way and to navigate competently through the technology-based landscape.

The digital interaction literacy model

The MOTIV project team at the Julius-Maximilians-Universität Würzburg has developed the Digital Interaction Literacy Model (DIL model for short) to support autonomous and efficient use of voice assistants. The model is a competence framework that bundles the skills needed for self-determined interactions with voice-based AI systems. It is based on interviews with experts from various disciplines, including research (e.g. psychology, education), development (e.g. programming), design (e.g. user experience) and media regulation (e.g. data protection law). The results of the interviews were compared with the scientific literature and analysed in greater depth. This resulted in a holistic competence framework with three central competence dimensions, each subdivided into further sub-dimensions and together covering a broad spectrum of knowledge and skills: 1) understanding the functional principles, 2) mindful handling and 3) target group-specific competences.

Digital Interaction Literacy

Understanding the functional principles includes knowledge of the internal processes involved in the processing of voice commands by voice-based AI systems. It also includes an understanding of the efficient handling and functioning of integrated AI technologies and an awareness of the underlying algorithms.

Mindful handling includes the ability to plan and critically scrutinise interactions with voice-based AI systems. It also includes understanding data protection risks and the potential for social influence, and being able to apply suitable data protection measures. Finally, it includes the reflective and emotionally constructive use of the systems, which helps to correct misconceptions and shape future interactions.

Target group-specific competences refer to skills that are relevant for certain groups of users depending on their preferences and responsibilities. They include, for example, development skills such as programming skills, which are important for people who want to customise device functions to individual needs. Communication and teaching skills are relevant for guardians, teachers and educators in order to convey knowledge about these systems in an understandable way.

Strengthening digital interaction skills in a targeted manner

The online training platform offers a total of six training modules on the topics of understanding functions, AI learning, persuasion literacy, privacy literacy, operating methods and algorithms in the context of voice assistants. To enable a holistic understanding in the respective area, each module consists of three training units that build on each other.

The training modules consist of learning texts and learning videos that are optimised in terms of media didactics and instructional psychology in order to offer a positive learning experience. Each training module begins with an overview of the topic, which is expanded upon as the training progresses. At the end of each training module, learners can consolidate and test their acquired knowledge using interactive tasks. Various types of tasks such as multiple choice, cloze texts, matching tasks and learning games are available for this purpose. Integrated gamification elements such as feedback and praise promote motivation during the learning process. There are also plans to develop an educational chatbot that answers questions on the training topic and arouses interest in learning. After completing a training module, learners receive performance feedback and the opportunity to repeat tasks they have solved incorrectly.

The training courses take place online and are accessible free of charge to enable broad participation. The training modules are also scientifically evaluated to ensure their effectiveness. Through these measures, the platform is intended to enable independent learning and strengthen the digital interaction skills of all learners.

Personalised learning experiences

Particular attention is paid to the design of personalised learning experiences. Learners determine their own learning pace and can take tests before starting the training, on the basis of which a recommendation system suggests suitable training modules. The system uses a 36-question interest test to identify preferred training modules. Optionally, training participants can share the usage data of their voice assistant. On this basis, the recommendation system analyses keywords and voice commands to identify training needs and generate personalised training suggestions. Data transfer is voluntary and all users are informed about the potential risks of uploading data. Even if no usage data is provided, a high level of training quality is still guaranteed.
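To make the idea of an interest-based recommendation more concrete, here is a minimal Python sketch. It is not the MOTIV platform's actual implementation. It assumes, purely for illustration, that the 36 questions are split evenly across the six modules (six per module, in module order), that answers are given on a 1 to 5 scale, and that modules are ranked by their mean answer. The names `MODULES`, `QUESTIONS_PER_MODULE` and `rank_modules` are hypothetical.

```python
from statistics import mean

# Module topics as named in the article.
MODULES = [
    "understanding functions",
    "AI learning",
    "persuasion literacy",
    "privacy literacy",
    "operating methods",
    "algorithms",
]

# Assumption: the 36 questions are split evenly, six per module.
QUESTIONS_PER_MODULE = 6


def rank_modules(answers):
    """Rank modules by mean interest score (assumed 1-5 scale), highest first."""
    expected = len(MODULES) * QUESTIONS_PER_MODULE
    if len(answers) != expected:
        raise ValueError(f"expected {expected} answers, got {len(answers)}")
    scores = {}
    for i, module in enumerate(MODULES):
        start = i * QUESTIONS_PER_MODULE
        scores[module] = mean(answers[start:start + QUESTIONS_PER_MODULE])
    # sorted() is stable, so tied modules keep their original order.
    return sorted(scores, key=scores.get, reverse=True)


# Example: a learner who is mainly interested in privacy topics.
answers = [1] * 36
answers[18:24] = [5] * 6  # the six questions assumed to belong to "privacy literacy"
recommended = rank_modules(answers)
```

In this sketch the first entry of `recommended` would be the module suggested first. A real system could weight questions differently or combine the test with the optional usage data.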

As part of the analysis, learners receive an individual usage profile with insightful information such as the number and type of voice commands used. The privacy of the users is fully protected. All usage information from the voice assistant is processed exclusively locally on the learner’s PC and is not stored by the training platform.
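A usage profile of this kind can be illustrated with a short Python sketch that runs entirely on the learner's own machine. It is only an assumption about how such an analysis might look, not the platform's code. The command categories and keyword lists in `COMMAND_TYPES`, as well as the function names `classify` and `usage_profile`, are hypothetical.

```python
from collections import Counter

# Hypothetical categories and keywords, chosen only for illustration.
COMMAND_TYPES = {
    "music": ("play", "song", "volume"),
    "smart home": ("light", "thermostat", "turn on", "turn off"),
    "information": ("weather", "what", "who", "how"),
    "shopping": ("buy", "order", "cart"),
}


def classify(command):
    """Return the first category whose keyword occurs in the command."""
    text = command.lower()
    for command_type, keywords in COMMAND_TYPES.items():
        if any(keyword in text for keyword in keywords):
            return command_type
    return "other"


def usage_profile(commands):
    """Count voice commands in total and per category."""
    counts = Counter(classify(command) for command in commands)
    return {"total": len(commands), "by_type": dict(counts)}


# Example with a handful of made-up voice commands.
commands = [
    "Play some jazz",
    "Turn on the light",
    "What is the weather",
    "Order paper towels",
    "Tell me a joke",
]
profile = usage_profile(commands)
```

Simple substring matching like this is crude and would misclassify many real commands, but it shows how the number and type of commands can be summarised without sending anything to a server.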


In a world full of voice-based AI systems, it is crucial to strengthen our digital interaction skills. The DIL model provides a framework for developing basic skills. It prepares us for self-determined interactions with current voice-based AI systems and with the next generation of such systems, for example those made possible by ChatGPT. The MOTIV training platform uses customised training modules to improve our use of the technology, so that we can make the most of the opportunities and meet the challenges of this new era.

The blog posts published by bidt reflect the views of the authors; they do not reflect the position of the institute as a whole.