In a world increasingly permeated by intelligent voice assistants, we live in fascinating and challenging times. Voice-based AI systems such as smart speakers (e.g. Alexa) or chatbots (e.g. ChatGPT) offer a completely new form of seamless interaction in our everyday lives, characterised by convenience and efficiency. With a simple voice command, we can play music, create a shopping list or call up information while we concentrate on other activities such as cooking or housework. Despite their great benefits, voice-based AI systems also harbour risks. Studies show that users of voice assistants have fewer privacy concerns and often unknowingly disclose more private information than intended in order to reap the benefits of these technologies. [1], [2]

The social influence potential of voice assistants is particularly high. According to the “Computers Are Social Actors” paradigm [3], voice assistants unconsciously trigger social reactions in users due to their human-like characteristics (e.g. the ability to engage in dialogue, human names) and are even perceived as friends or family members. [4] This social effect can tempt users into ill-considered interactions, such as making more impulsive purchasing decisions or rating the information provided as more trustworthy. [5], [6], [7] The social impact of voice assistants sets them apart from other technologies; the associated risks are aggravated by the fact that users often do not fully understand how voice assistants work. [8], [9] An inadequate understanding of how these systems work can lead to fears and misconceptions that manifest themselves, for example, in unrealistic data protection concerns or overly high expectations of the systems’ dialogue capabilities. [10], [11] As a result, many people find it difficult to formulate voice commands that work, become frustrated when speech recognition fails, or do not fully exploit the performance potential of voice assistants. [8], [12], [13]
Usage risks and the consequences of misconceptions about voice assistants jeopardise efficient, safe and self-determined interaction with them. [10], [14] This underlines the high demand for digital competences in the context of artificial intelligence. Carolus et al. (2023) developed the Digital Interaction Literacy (DIL) model, which summarises the competences required for self-determined interaction with voice-based AI systems. The model comprises three dimensions with a total of ten sub-dimensions: 1) understanding functional principles, 2) mindful handling of voice assistants and 3) target group-specific competences. These dimensions are briefly described below:
- Dimension 1: Understanding functional principles
This dimension comprises competences that enable a comprehensive understanding of voice-based AI systems. General knowledge of how the system works relates to how voice assistants process users’ voice commands step by step (a simplified sketch of this pipeline follows the list below). Awareness of algorithms enables users to understand that voice assistants rely on algorithms and how this can bias the results of search queries. Handling the device relates to how voice commands need to be formulated to ensure efficient interaction. AI learning involves understanding that voice assistants require training data to learn, and which factors influence their quality and learning performance; this involves processing and generating large amounts of data.
- Dimension 2: Mindful handling
This dimension is about critically scrutinising the use of voice assistants, regulating risks and ensuring use in line with one’s needs. Privacy literacy involves understanding potential data protection risks and implementing protective measures that allow users to (partially) conceal their identity. Persuasion literacy enables users to recognise the influence exerted by voice assistants and to apply defence strategies. Emotional-affective competence enables users to deal constructively with frustration and fear when using voice assistants, counteracting misconceptions and facilitating learning progress. Reflective ability supports weighing up privacy, usage benefits and ethical aspects for future interactions.

- Dimension 3: Target group-specific competences
This dimension includes competences that are relevant depending on individual needs and responsibilities. Development competences such as programming skills are an advantage for people who want to adapt voice assistants technically to their needs. Communication and teaching competences are particularly important for parents and teachers: they should be able to convey knowledge about voice assistants in an understandable way and evaluate children’s usage behaviour. This enables children, for example, to better protect their own privacy or to develop an interest in the technology.
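To make the step-by-step processing described in Dimension 1 more concrete, the following minimal Python sketch outlines the pipeline commonly attributed to voice assistants: wake word detection, automatic speech recognition (ASR), natural language understanding (NLU), skill execution and speech output (TTS). All function names are hypothetical placeholders, and audio is represented as plain text for simplicity; real systems distribute these stages between the device and cloud servers.

```python
# Conceptual sketch of how a voice assistant processes a command.
# All components are simplified stand-ins, not a real assistant's API.

def detect_wake_word(audio: str) -> bool:
    # Real systems continuously scan the audio stream locally
    # for a wake word such as "Alexa" or "Hey Siri".
    return audio.lower().startswith("alexa")

def speech_to_text(audio: str) -> str:
    # Automatic speech recognition (ASR): audio -> machine-readable text.
    return audio  # placeholder: this sketch already uses text

def understand(text: str) -> dict:
    # Natural language understanding (NLU): text -> intent + parameters.
    if "play" in text:
        return {"intent": "play_music", "query": text.split("play", 1)[1].strip()}
    return {"intent": "unknown"}

def execute(intent: dict) -> str:
    # A skill/action handler produces a textual response.
    if intent["intent"] == "play_music":
        return f"Playing {intent['query']}."
    return "Sorry, I did not understand that."

def text_to_speech(response: str) -> None:
    # Speech synthesis (TTS) would read the response aloud.
    print(f"[assistant says] {response}")

# Example interaction
utterance = "Alexa, play some jazz"
if detect_wake_word(utterance):
    text = speech_to_text(utterance)
    text_to_speech(execute(understand(text)))
```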
Comparability with other digital phenomena
Although chatbots and voice assistants appear different on the surface, they share the ability to understand human language and respond appropriately. Both embody language-based AI and use natural language understanding (NLU) to interpret requests and find suitable answers. The main difference lies in input and output: voice assistants must first transcribe spoken language into machine-readable text (and synthesise their answers back into speech), whereas chatbots interact with text directly.
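A minimal sketch of this shared core, again with hypothetical placeholder functions: both channels feed into the same NLU step, and the voice path merely adds a transcription stage up front.

```python
# Chatbots and voice assistants funnel input into the same NLU step;
# only the front end differs. All functions are illustrative stand-ins.

def interpret(text: str) -> str:
    # Shared NLU core: map text to an answer (heavily simplified).
    return "Here is your weather forecast." if "weather" in text else "Sorry?"

def chatbot_turn(typed_text: str) -> str:
    return interpret(typed_text)   # text goes straight to NLU

def voice_assistant_turn(audio: bytes) -> str:
    text = transcribe(audio)       # extra ASR step: speech -> text
    return interpret(text)

def transcribe(audio: bytes) -> str:
    # Stand-in for automatic speech recognition.
    return audio.decode("utf-8")

print(chatbot_turn("What's the weather today?"))
print(voice_assistant_turn(b"What's the weather today?"))
```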
Chatbots have a social impact on users similar to that of voice assistants. Chatbots with human-like characteristics (e.g. a name) can increase users’ trust and make them more likely to follow their recommendations. [15], [16] In addition, users tend to reveal more about themselves after a chatbot has disclosed personal information about itself. This phenomenon, known as reciprocal self-disclosure, has also been observed in the context of voice assistants. [17], [18]
Chatbots like ChatGPT take the technology to a new level: they can respond flexibly and generate original answers. Their ability to remember previous conversation turns enables continuous, personalised interaction, which sets them apart from current voice assistants such as Alexa and Siri. [19] It stands to reason that the DIL model is transferable to chatbots, as they share a common technological and social foundation with voice assistants. Future research should investigate this transferability in more detail and, if necessary, define additional competences for the self-determined use of chatbots. It also remains to be seen to what extent future developments in voice-based AI systems will place new demands on users.
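The conversation memory mentioned above can be pictured as a growing message history that is resent to the model with every turn. The sketch below is an illustrative simplification: generate() is a hypothetical stand-in for a language model, and the role/content message format merely mimics the structure common in chat APIs.

```python
# Minimal sketch of conversation memory: every new request includes the
# previous turns, which is what makes replies context-aware.

def generate(messages: list[dict]) -> str:
    # Hypothetical stand-in: a real model would condition its answer
    # on the full message history it receives.
    last = messages[-1]["content"]
    return f"(model answer conditioned on {len(messages)} messages: '{last}')"

history: list[dict] = []

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    reply = generate(history)  # the full history is sent, not just the last turn
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Ada."))
print(chat("What is my name?"))  # answerable only because the history is resent
```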
Comparability with analogue phenomena
People are also supported by personal assistants outside the digital world. Some hotels employ concierges who, for example, book dinner reservations or run errands for guests. Both voice assistants and concierges take over tasks such as shopping so that the person can attend to something else, such as a meeting, in the meantime.
Like voice assistants, concierges initially need personal information about the customer’s interests in order to complete tasks as appropriately as possible. Unlike voice assistants, concierges do not depend on digital interfaces: they can book a table even at a restaurant that has no website. On the other hand, they cannot process several requests at the same time, although the digitalised world does help them complete tasks more quickly.
Voice assistants have a high persuasive potential: they can appear more trustworthy to users, elicit personal information from them or influence them to follow their advice. [7], [20] The same applies to analogue phenomena such as the concierge. Guests may be recommended an unsuitable restaurant because that restaurant pays the concierge a commission for the recommendation, and they may criticise the unsuitable restaurant less if they have known the concierge for a long time. Effects of this kind become more pronounced the deeper the bond, sympathy and trust between guests and concierge. [21], [22], [23] Voice assistants can amplify such effects, as these voice-based AI systems are directly integrated into users’ personal environments and can therefore quickly establish a personal relationship. [24]
Even if a voice assistant holds a lot of information about its user, users can (partially) protect their identity and data: legal regulations entitle them to have their data disclosed or deleted. Since hotels also store data about their guests, similar regulations apply there. However, it is harder for hotel guests to remain anonymous, as they often interact with staff face to face.
Social relevance
The digital world presents both challenges and opportunities, and improving DIL can be crucial to meeting them. Targeted support can enable everyone to interact with voice-based AI systems such as smart speakers or chatbots in a self-determined way; it also helps people protect themselves against persuasion and navigate safely in a technology-influenced landscape. Training plays an increasingly important role in enabling users to interact effectively and reflectively with AI technologies [25], [26], which is precisely where the online training platform of the MOTIV project at the University of Würzburg comes in. On the platform, competences from the DIL model are trained in several modules: learners expand their knowledge of language-based AI systems through instructionally optimised learning texts and videos, while interactive tasks and gamification elements support their progress. All training courses take place online and are free of charge, making them accessible to a heterogeneous target group and encouraging broad participation. In this way, the platform offers an opportunity for independent learning and contributes to overcoming the digital divide.
Sources
- Kang, H./Oh, J. (2023). Communication privacy management for smart speaker use: Integrating the role of privacy self-efficacy and the multidimensional view. In: New Media & Society 25(5), 1153–1175. https://doi.org/10.1177/14614448211026611
- Lutz, C./Newlands, G. (2021). Privacy and smart speakers: A multi-dimensional approach. In: The Information Society 37(3), 147–162. https://doi.org/10.1080/01972243.2021.1897914
- Nass, C./Steuer, J./Tauber, E. R. (1994). Computers are social actors. Proceedings of the SIGCHI conference on Human factors in computing systems. Boston, Massachusetts (USA).
- Purington, A. et al. (2017). "Alexa is my new BFF": Social roles, user satisfaction, and personification of the Amazon Echo. Proceedings of the 2017 CHI conference extended abstracts on human factors in computing systems. Denver, Colorado (USA).
- Gaiser, F./Utz, S. (2023). Is hearing really believing? The importance of modality for perceived message credibility during information search with smart speakers. In: Journal of Media Psychology: Theories, Methods, and Applications. https://doi.org/10.1027/1864-1105/a000384
- Rzepka, C./Berger, B./Hess, T. (2020). Why another customer channel? Consumers’ perceived benefits and costs of voice commerce. Proceedings of the 53rd Hawaii International Conference on System Sciences, Honolulu, Hawaii (USA).
- Wienrich, C./Reitelbach, C./Carolus, A. (2021). The trustworthiness of voice assistants in the context of healthcare: investigating the effect of perceived expertise on the trustworthiness of voice assistants, providers, data receivers, and automatic speech recognition. In: Frontiers in Computer Science 3, 685250. https://doi.org/10.3389/fcomp.2021.685250
- Luger, E./Sellen, A. (2016). "Like Having a Really Bad PA": The Gulf between User Expectation and Experience of Conversational Agents. Proceedings of the 2016 CHI conference on human factors in computing systems, San Jose, California (USA).
- Pasquale, F. (2015). The black box society: The secret algorithms that control money and information. Harvard University Press. https://doi.org/10.4159/harvard.9780674736061.c8
- Carolus, A. et al. (2023). Digital interaction literacy model – Conceptualizing competencies for literate interactions with voice-based AI systems. In: Computers and Education: Artificial Intelligence 4, 100114. https://doi.org/10.1016/j.caeai.2022.100114
- Görnemann, E. (2019). Sprachassistenten – Funktion, Markt und Datenschutz. Digitale Woche Kiel, Kiel.
- Goetsu, S./Sakai, T. (2020). Different types of voice user interface failures may cause different degrees of frustration. arXiv preprint arXiv:2002.03582. https://doi.org/10.48550/arXiv.2002.03582
- Kim, S./Choudhury, A. (2021). Exploring older adults’ perception and use of smart speaker-based voice assistants: A longitudinal study. In: Computers in Human Behavior, 124, 106914. https://doi.org/10.1016/j.chb.2021.106914
- Chetty, K. et al. (2018). Bridging the digital divide: measuring digital literacy. In: Economics 12(1), 20180023. https://doi.org/10.5018/economics-ejournal.ja.2018-23
- Bălan, C. (2023). Chatbots and voice assistants: digital transformers of the company–customer interface—a systematic review of the business research literature. In: Journal of Theoretical and Applied Electronic Commerce Research 18(2), 995–1019. https://doi.org/10.3390/jtaer18020051
- Konya-Baumbach, E./Biller, M./von Janda, S. (2023). Someone out there? A study on the social presence of anthropomorphized chatbots. In: Computers in Human Behavior 139, 107513. https://doi.org/10.1016/j.chb.2022.107513
- Lee, Y.-C. et al. (2020). "I hear you, I feel you": Encouraging deep self-disclosure through a chatbot. Proceedings of the 2020 CHI conference on human factors in computing systems, New York (USA).
- Moon, Y. (2000). Intimate exchanges: Using computers to elicit self-disclosure from consumers. In: Journal of Consumer Research 26(4), 323–339. https://doi.org/10.1086/209566
- Chaturvedi, R. et al. (2023). Social companionship with artificial intelligence: Recent trends and future avenues. In: Technological Forecasting and Social Change, 193, 122634. https://doi.org/10.1016/j.techfore.2023.122634
- Voorveld, H. A./Araujo, T. (2020). How social cues in virtual assistants influence concerns and persuasion: The role of voice and a human name. In: Cyberpsychology, Behavior, and Social Networking 23(10), 689–696. https://doi.org/10.1089/cyber.2019.0205
- Bower, A. B./Landreth, S. (2001). Is beauty best? Highly versus normally attractive models in advertising. In: Journal of Advertising 30(1), 1–12. https://doi.org/10.1080/00913367.2001.10673627
- Ladhari, R./Massa, E./Skandrani, H. (2020). YouTube vloggers’ popularity and influence: The roles of homophily, emotional attachment, and expertise. In: Journal of Retailing and Consumer Services 54, 102027. https://doi.org/10.1016/j.jretconser.2019.102027
- De Meza, D./Irlenbusch, B./Reyniers, D. J. (2010). Disclosure, trust and persuasion in insurance markets. IZA Discussion Paper No. 5060. http://dx.doi.org/10.2139/ssrn.1648345
- Wang, J. et al. (2020). Alexa as coach: Leveraging smart speakers to build social agents that reduce public speaking anxiety. Proceedings of the 2020 CHI conference on human factors in computing systems (pp. 1–13), New York (USA). https://doi.org/10.1145/3313831.3376561
- Long, D./Magerko, B. (2020). What is AI literacy? Competencies and design considerations. Proceedings of the 2020 CHI conference on human factors in computing systems, Honolulu (Hawaii).
- Ng, D. T. K. et al. (2021). Conceptualizing AI literacy: An exploratory review. In: Computers and Education: Artificial Intelligence 2, 100041. https://doi.org/10.1016/j.caeai.2021.100041