Human Cues in Self-help Lifestyle Interventions: an Experimental Field Study

Background: Self-help eHealth interventions are generally less effective than human-supported ones, as they suffer from a low level of adherence. Nevertheless, self-help interventions are useful in the prevention of non-communicable diseases, as they are easier and cheaper to implement widely. Adding humanness in the form of a text-based conversational agent (TCA) could provide a solution to non-adherence. In this study we investigate whether adding human cues to a TCA facilitates relationship building with the agent, and makes interventions more attractive for people to adhere to. We investigate the effects of two types of human cues: visual cues (eg, human avatar) and relational cues (eg, showing empathy).

Objective: We aim to investigate if adding human cues to a TCA can help increase adherence to a self-help eHealth lifestyle intervention and explore the role of working alliance as a possible mediator of this relationship.

Methods: Participants (N=121) followed a 3-week app-based physical activity intervention delivered by a TCA. Two types of human cues used by the TCA were manipulated, resulting in four experimental groups: (1) visual cues group, (2) relational cues group, (3) both visual and relational cues group, and (4) no cues group. Participants filled out the Working Alliance Inventory Short Revised form after the final day of the intervention. Adherence was measured as the number of days participants responded to the messages of the TCA.

Results: One-way ANOVA revealed a significant difference in adherence between conditions. Against our expectations, the groups with visual cues showed lower adherence compared to those with relational cues only or no cues (t(117) = -3.415, P = .001). No significant difference was found between the relational cues and no cues groups. Working alliance was not affected by cue-type, but had a significant positive relationship with adherence (t(75) = 4.136, P < .001).
Conclusions: We hypothesize that the negative effect of visual cues is due to a lack of transparency about the true nature of the coach. Visual resemblance of a human coach could have led to high expectations that could not be met by our digital coach. Furthermore, the inability of TCAs to use non-verbal communication could provide an explanation for the lack of effect of relational cues or the effect of cue-type on working alliance. We give suggestions for future studies to test these potential mechanisms.

Clinical Trial: Pre-registration: OSF Registries, https://osf.io/mgw2s

Preprint: Cohen Rodrigues et al (JMIR Preprints 29/04/2021:30057) [unpublished, non-peer-reviewed preprint], https://preprints.jmir.org/preprint/30057, DOI: https://doi.org/10.2196/preprints.30057


Keywords: agent; chatbot; adherence; working alliance.
Non-communicable diseases (eg, cardiovascular diseases, type-2 diabetes) are the leading cause of death globally [1]. Engaging in a healthy lifestyle can help in the treatment and prevention of many of those diseases [2,3]. This could be facilitated by eHealth: digital tools that can be used by a healthcare professional to provide remote support, or that can provide automated support [4].
Studies show that eHealth is effective in improving lifestyle behaviors, and in the prevention and treatment of non-communicable diseases [5,6]. Automated self-help interventions are easier and cheaper to implement widely as they require no involvement of healthcare professionals, who report barriers to providing lifestyle support such as a lack of time or insufficient experience [7][8][9]. However, meta-analyses show that human-supported digital lifestyle interventions are more effective than self-help ones [5,10,11]. Adherence, or the extent to which a person uses the eHealth intervention as intended, is problematic within self-help interventions [12][13][14][15]. As intervention adherence is related to more positive health outcomes [16], finding ways to make people adhere to self-help interventions is necessary to reach optimal effectiveness.
Although human-supported interventions generally outperform self-help ones, this does not necessarily imply the support of a healthcare professional. Meta-analyses revealed that contact with a nonprofessional or administrative support by a human being is enough to both ensure intervention effectiveness and prevent people from dropping out of the intervention [17][18][19][20]. It seems that some level of "humanness" rather than professional guidance is the key ingredient within human-supported interventions. The underlying reason why people want a level of humanness in the intervention could be the need for a personal relationship with their care provider [21]. This so-called working alliance can be defined as the level of agreement on goals that are set for treatment, on tasks that must be performed to reach these goals, and the relational bond between healthcare professional and patient [22,23]. Working alliance with a human care provider is a predictor of intervention adherence and effectiveness both in regular face-to-face [24,25] and in digital therapy or treatment [26,27]. However, people are also able to form relationships with computers. People interact with computers as they would with human beings, and apply similar social rules and heuristics [28,29]. Studies show that people can also develop a working alliance within fully automated digital interventions, and that this leads to more positive treatment outcomes [30][31][32][33][34].
In self-help interventions, humanness can be added by the use of a so-called conversational agent (CA). These computer-based agents can mimic human-like conversational behavior (eg, respond to input, generate output, apply turn-taking) [35], and be used to provide automated support.
An embodied conversational agent (ECA) is visually present on screen and can provide non-verbal cues (eg, hand gestures), while a text-based conversational agent (TCA) is able to communicate with text only [36]. A TCA has the advantage of being easier to develop, being easier to apply in a mobile app, and therefore being more suitable for widespread implementation. Studies show that people show more relational behaviors or feel more social presence when they believe that their interaction partner is a human being rather than a computer [37,38]. To enhance these perceptions while interacting with CAs, human cues could be applied, such as an avatar of a human being, a human tone-of-voice [39], or lower speed of feedback [40]. Furthermore, non-verbal communication can be replaced by adding emoticons [41]. Besides the appearance of the messages and CA, human cues could be applied to the content of its messages and behavior. It is possible to add human conversation rules in computer-generated conversations, such as humor, empathy, and small talk, which are often used by humans to establish a relationship [42,43]. Studies with ECAs show that applying such human cues to the interaction increases the working alliance users experience with the ECA [42] and their intention to use the ECA [44].
Although studies about digital interventions with TCAs have been conducted before, only a small number have focused on their application in lifestyle change interventions [45]. Additionally, the effects of human cues are predominantly tested with ECAs. Therefore, there is little knowledge about how human cues affect people's relationship with TCAs, or their adherence to TCA-supported interventions. Furthermore, the majority of studies test the effects of either using human cues or not, while we are interested in the effects of two different types of human cues and how these interact. In this study we focus on both the appearance and behavior, or in other words, the visual and relational cues, that can be integrated into a TCA to increase the level of humanness. We predict that these will improve the working alliance people experience with the TCA, and in turn their adherence to the intervention. To test this, we conducted a field experiment in which people followed a 3-week app-based physical activity intervention with automated support from a TCA. We manipulated the cues used by the TCA, and measured the working alliance that people experienced with the TCA, and the number of days they adhered to the intervention. This allows us to answer the following research questions:
RQ1: Is there an effect of human cue type (visual and relational cues) on adherence?
RQ2: Does working alliance mediate the effect of human cue type (visual and relational cues) on adherence?

Study Design
The three-week field experiment was conducted in March and April 2020. We employed a 2 (visual cues: yes, no) x 2 (relational cues: yes, no) between-subjects design. There were four experimental groups, in which the TCA used (1) visual and relational cues, (2) visual cues, (3) relational cues, or (4) no human cues.

Participants
We recruited healthy participants (N = 269) between 18 and 30 years old with flyers on the university campus and via social media. Participants had to be willing to work on their level of physical activity, have access to a smartphone running iOS or Android, and be sufficiently proficient in English. Using the Physical Activity Readiness Questionnaire (PAR-Q) [46], we excluded participants who were not able to engage in a normal physical activity pattern. Students from Leiden University received credits (required to complete their first year) for their participation, and all participants who completed the study were entered into a lottery (with the chance of winning one of three Fitbit devices, or one of 100 vouchers worth €10). Power calculations (G*Power) [47] identified a minimum sample size of 128 to detect a medium between-group effect (f = .25) of cue-type with an alpha of .05 (ANOVA with 4 groups). Given the high attrition rates in similar studies (eg, [48]), we aimed to recruit about double the required number of participants. All participants provided their consent before the start of the experiment, and the study was approved by the Psychology Research Ethics Committee of Leiden University.

Intervention
The aim of the intervention was to enhance participants' physical activity levels by increasing daily step counts. The intervention consisted of daily exercises, such as a quiz about the health consequences of physical activity and a decisional balance worksheet (see Appendix 1 for an overview of daily exercises). These exercises took about 5 to 10 minutes each day to complete. The exercises were based on behavior change techniques (BCTs), such as prompts, information about health consequences, review of goals, and social reward [49]. BCTs are intervention components designed to regulate behavior by reinforcing factors that facilitate behavior change, and mitigating factors that hinder behavior change. They were incorporated following the Transtheoretical Model of health behavior change [50], which views behavioral change as an upward spiral process involving progress through five stages (ie, pre-contemplation, contemplation, preparation, action, maintenance). The model has been used to target a wide range of health behaviors [51].

Technical Implementation of the Benefit StepCoach for Android and Apple's iOS Platforms
The Benefit StepCoach app was implemented with MobileCoach (www.mobile-coach.eu) [52,53,54], an open-source software platform for smartphone-based and chatbot-delivered behavioral interventions (eg, [55]) and ecological momentary assessments (eg, [56]). Appropriate interactions were implemented, ie, asking participants for their permission, to allow the apps to access the step data. Moreover, each experimental group was assigned a dedicated TCA.

Text-based Conversational Agent
Participants interacted daily with a TCA, the virtual coach who delivered the intervention and offered various conversational turns. Via the chat feature, the TCA delivered daily exercises in line with the intervention and responded to messages of the participants via conversational turns (see Figure 1). All conversational turns were scripted. Each day consisted of two to four conversational turns. The first message was sent in the morning (9:00 am), and each following message was sent after a reply from the participant. If the participant had not yet replied, the TCA sent a reminder in the afternoon (3:00 pm).
Across the experimental groups, the intervention (eg, tasks and feedback) was identical, but the conversational turns differed in the type of cues the TCA used. We manipulated two types of human cues: (1) visual cues, which were related to the humanness of the communication style, and the design and appearance of the messages (human avatar, use of emoticons, human tone-of-voice, and response delay), and (2) relational cues, which were related to the content of the messages, and to what extent these followed social scripts and human conversation rules (eg, showing empathy, self-disclosure, humor, small talk, and meta-relational communication) (see Figure 2).
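To make the 2 × 2 manipulation concrete, the following is a minimal sketch of how the four experimental conditions could be represented. The cue names are taken from the text above; the data structure and the function name `build_conditions` are illustrative assumptions and do not describe the actual MobileCoach configuration.

```python
from itertools import product

# Cue lists as named in the text; how each cue was rendered in the app
# is not specified here, so these are labels only.
VISUAL_CUES = ("human avatar", "emoticons", "human tone-of-voice", "response delay")
RELATIONAL_CUES = ("empathy", "self-disclosure", "humor", "small talk",
                   "meta-relational communication")


def build_conditions():
    """Return the four cells of the 2 x 2 design as {(visual, relational): cues}."""
    conditions = {}
    for visual, relational in product((True, False), repeat=2):
        cues = ()
        if visual:
            cues += VISUAL_CUES
        if relational:
            cues += RELATIONAL_CUES
        conditions[(visual, relational)] = cues
    return conditions
```

In this representation the intervention content is shared by all cells, and only the tuple of active cues differs between them.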

Adherence
Adherence was measured as the number of days on which participants finished the session of conversational turns. Participants were marked as "adherent" for a particular day if they had replied to the final message of the TCA before the end of that day (midnight). Given the duration of the intervention, the level of adherence over the whole study could range between 1 and 21 days.
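The adherence rule described above can be sketched as follows. This is an illustration under stated assumptions: the input format (a mapping from intervention day to the timestamp of the reply to that day's final message) and the function name `count_adherent_days` are hypothetical and not taken from the study's software.

```python
from datetime import date, datetime, timedelta


def count_adherent_days(replies, start_date, n_days=21):
    """Count days on which the final TCA message was answered before midnight.

    replies maps the intervention day number (1..n_days) to the datetime at
    which the participant replied to that day's final message, or None.
    """
    adherent = 0
    for day in range(1, n_days + 1):
        reply = replies.get(day)
        day_date = start_date + timedelta(days=day - 1)
        # A reply counts only if it arrived on the same calendar day.
        if reply is not None and reply.date() == day_date:
            adherent += 1
    return adherent
```

For example, a reply sent at 00:05 on the following day does not count toward the previous day, whereas a reply at 23:59 on the same day does.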

Physical Activity
We measured physical activity through objective step count data retrieved from Apple Health or Google Fit (depending on the smartphone of the participant). Effectiveness was based on the baseline average step count in the week before the intervention, as well as on the step counts retrieved during the intervention itself.
To assess baseline levels of physical activity, the International Physical Activity Questionnaire Short Form (IPAQ-SF) [57] was used. The questionnaire consists of seven items asking the participants about their time spent on vigorous and moderate physical activities, walking, and sitting during the previous week. The output is a MET score, representing the amount of energy used to carry out the reported physical activities. The IPAQ-SF has been shown to have a high reliability, but minimal validity [57,58]. Therefore we decided to additionally use objective step count as baseline measurement.
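As an illustration of how a MET score is typically derived from IPAQ-SF responses, the sketch below uses the MET weights commonly given in the IPAQ scoring protocol (3.3 for walking, 4.0 for moderate, and 8.0 for vigorous activity). The text does not state which scoring rules were applied in this study, so the weights, the input format, and the function name `ipaq_met_minutes` are assumptions.

```python
# MET weights commonly used for IPAQ-SF scoring (assumed, not stated in the text).
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


def ipaq_met_minutes(activity):
    """Return total MET-minutes per week.

    activity maps each intensity to (days per week, minutes per day).
    """
    return sum(
        MET_WEIGHTS[intensity] * days * minutes
        for intensity, (days, minutes) in activity.items()
    )
```

The sitting item of the questionnaire is not part of this sum and is therefore left out of the sketch.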

Working Alliance
Working alliance with the TCA was measured with an adjusted version of the Working Alliance Inventory Short Revised form (WAI-SR) [59]. The WAI-SR consists of 12 items measured on a 5-point Likert-type scale ranging from 1 (seldom) to 5 (always), subdivided into 3 subscales: agreement on tasks, agreement on goals, and bond. Questions were adjusted to fit the context of the study by using the words "coach", "lifestyle" and "intervention" (eg, "The coach and I collaborate on setting lifestyle goals."). The WAI-SR has been shown to have sufficient reliability and validity [59], and our adjusted version showed high internal consistency (Cronbach's α = .945).
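For readers unfamiliar with the internal consistency measure reported above, Cronbach's α can be computed from item-level responses as in the following sketch. The function name and input layout are illustrative assumptions; the value reported in the text was obtained by the authors in SPSS, not with this code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score lists.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to per-item columns
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

With 12 items, a value such as .945 indicates that the items vary together strongly relative to their individual variances.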

Procedure
A week before the start of the intervention, 282 participants provided digital informed consent and filled in a screening survey assessing the inclusion and exclusion criteria. In total, 226 eligible participants received a link to the iOS or Android app store to download the Benefit StepCoach app.
Once the app was downloaded, participants were asked to go through the onboarding procedure to correctly configure the app (eg, allowing push messages and access to step count data via Apple Health or Google Fit), and to complete the baseline survey. Participants were reminded through emails and text messages to complete the onboarding and baseline survey (measuring demographics and baseline characteristics) after 3, 4 and 5 days, and excluded if they did not do so before the start of the intervention. Participants were allocated to one of the four conditions by an automated mechanism within the app. All participants started simultaneously with the three-week (21 days) intervention. Each day, the TCA would send the participants one or several short exercises to complete that day (eg, quiz or worksheet, see Appendix 1 for overview of daily exercises) via a push notification. After completing the final survey on day 22 (measuring Working Alliance), participants would receive the debriefing.

Data Analysis
The analyses were preregistered via the Center for Open Science [60]. Intervention effectiveness independent of cue-type was tested using a one-tailed dependent samples t test in which we compared average step count during the baseline week with the final week of the intervention.
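The dependent samples t test mentioned above can be illustrated with the sketch below, which computes the t statistic and its degrees of freedom from paired step counts. The function name and data layout are assumptions; the one-tailed P value, which requires the t distribution, is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, final):
    """Return (t, df) for paired observations, testing final minus baseline."""
    diffs = [f - b for b, f in zip(baseline, final)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

A positive t indicates a higher average step count in the final week than in the baseline week.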
For the first research question (Is there an effect of human cue type (visual and relational cues) on adherence?), we predicted that the condition with both visual and relational cues would lead to the highest adherence, followed by the conditions with either visual or relational cues, and the condition with no human cues. To test this, a one-way between-subjects ANOVA with planned contrasts was conducted. Because the pattern of mean adherence across groups differed from what we expected, the contrasts we ran were different from those pre-registered. In the first alternative post-hoc analysis we compared the visual and relational cues condition and the visual cues condition with the relational cues condition and the no cues condition. In the second, we compared the relational cues condition with the no human cues condition.
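The one-way ANOVA and contrast tests described above can be sketched as follows. The contrast uses the pooled error term from the ANOVA (equal variances assumed). The group order and the weights in the usage note are illustrative assumptions, since the exact contrast coding used in SPSS is not given in the text.

```python
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, mean_square_error)."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    mse = ss_within / df_w
    return (ss_between / df_b) / mse, df_b, df_w, mse


def contrast_t(groups, weights, mse):
    """t statistic for a linear contrast of group means (df = df_within)."""
    estimate = sum(w * mean(g) for w, g in zip(weights, groups))
    se = sqrt(mse * sum(w ** 2 / len(g) for w, g in zip(weights, groups)))
    return estimate / se
```

With groups ordered as (visual and relational, visual, relational, no cues), weights such as (1, 1, -1, -1) would compare the two conditions with visual cues against the two without, and (0, 0, 1, -1) would compare relational cues with no cues.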
For the second research question (Does working alliance mediate the effect of human cue type on adherence?), we predicted that working alliance with the TCA would be highest in the condition with both visual and relational cues, followed by the conditions with either visual or relational cues, and then the condition with no human cues. In turn, we expected that a higher working alliance would lead to a higher adherence. A one-way between-subjects ANOVA was conducted to compare the working alliance with the TCA between the four different conditions. As we did not find significant differences, we did not conduct our preregistered mediation analysis. We did run an additional regression analysis with working alliance as independent and adherence as dependent variable. Statistical analyses were conducted with SPSS (version 26; IBM Corp) using a p-value of .05 as the level of significance.  Table 1).

Intervention Effectiveness
In our preregistration, we proposed to test hypotheses with effectiveness as outcome variable.
Our power calculations identified a minimum sample size of 128 to detect the expected effects of experimental group on effectiveness. However, as we had insufficient cases with both a valid baseline step count and a minimum of 5 days of step count registered in the final week, we did not have enough power to detect this effect. We therefore decided to focus in this paper on adherence as outcome variable, and report the analyses with effectiveness as outcome variable in Appendix 2.
To test whether the intervention was effective (independently of the experimental condition),

Adherence
We found a significant difference in adherence between the conditions, F(3, 117) = 3.901, P = .011 (see Table 2 for mean and SD per group). By visually inspecting the means, we saw that the differences between groups were not as expected (see Figure 3). The contrast analyses showed that in the relational cues and no cues conditions there was a significantly higher adherence than in the other two conditions, t(117) = -3.415, P = .001. However, adherence in the relational cues condition was not higher than in the no human cues condition, t(117) = .458, P = .65. So contrary to what was expected, participants were less adherent to the intervention in the groups in which the TCA used visual cues compared to the groups without visual cues. Furthermore, when the TCA used relational cues, participants were not more adherent than when the TCA used no human cues at all.

Figure 3. Adherence per experimental condition, with 95% confidence intervals, including the post-hoc contrast between groups with and groups without visual cues.

Working Alliance
There was no significant difference in working alliance between the conditions, F(3, 73) = 1.110, P = .35 (see Table 2 for mean and SD per group). However, we did find a positive relationship between working alliance and adherence, b = .202, t(75) = 4.136, P < .001. These outcomes indicate that adding human cues did not lead to a better (or worse) reported working alliance with the TCA, but that at the same time, participants who reported a better working alliance were more adherent to the intervention.
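The slope and t statistic reported above are of the kind produced by a simple linear regression with one predictor. The sketch below shows how such values are obtained; the function name and inputs are illustrative assumptions, and the reported figures come from the authors' SPSS analysis. Note that the reported degrees of freedom (75 = n - 2) imply 77 participants with working alliance data, which matches the ANOVA above (73 = 77 - 4).

```python
from math import sqrt
from statistics import mean


def simple_regression(x, y):
    """Return (slope b, t statistic of b, degrees of freedom) for y = a + b * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares around the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2) / sxx)
    return b, b / se_b, n - 2
```

Here x would hold the working alliance scores and y the number of adherent days per participant.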

DISCUSSION
We investigated if and how a TCA could help increase adherence to a self-help eHealth lifestyle intervention. Regarding our first research question (Is there an effect of human cue type (visual and relational cues) on adherence?), the results of our field experiment showed that, contrary to our expectations, the use of human cues by the TCA did not lead to a higher adherence. In contrast, visual cues even led to a lower adherence. To answer the second research question (Does working alliance mediate the effect of human cue type (visual and relational cues) on adherence?), we found that human cues did not lead to a higher working alliance with the TCA. We did, however, find that a better reported working alliance was related to a better adherence to the intervention.
Our results did not show the positive effects of human cues and visual elements that have been reported in previous studies [30,39,42,43]. On the contrary, we even found a negative effect on intervention adherence for visual cues. One reason for this could be that, before the start of the intervention, we did not tell participants whether they would be coached by a human being or a computer. This lack of transparency might have led to expectations that could not be met by the TCA, which might have led to frustration among users [61]. Many studies, however, show a positive effect on user perceptions and user behavior of not disclosing the nature of an automated chatbot, or of suggesting that users are interacting with a human being while they are not [62][63][64]. Yet Mozafari and colleagues [65] show that the effects of disclosure depend on whether there are errors in the conversation with a chatbot. In their study with a customer-service bot, they found that when the chatbot was not able to solve a customer's issue, the negative responses to these errors could be prevented by disclosing the chatbot's true nature beforehand. Although our study concerned a lifestyle intervention, similar effects could have occurred. As visual cues might have wrongly suggested communication with a human being and our CA was not always able to respond fully correctly (as the messages were preprogrammed), correctly informing participants about the nature of the agent could have prevented the negative effects of the errors within the conversations. Furthermore, the avatar we used in the visual cues conditions might have played a role. We intentionally chose a younger and healthy-looking female agent to resemble the psychology student population, and because a young female peer agent is generally preferred in health coaching tasks [66,67].
However, some literature suggests that male agents are preferred as athletic trainers, which might have influenced the results if our participants perceived the TCA to be an athletic coach rather than a health coach [68]. Furthermore, another study shows that non-ideal overweight agents are seen as more trustworthy and lead to higher use intentions [69], which suggests our TCA might have been too slender and healthy-looking. Additionally, similarity between CA and target population can have a downside when the agent is perceived as unhelpful [70]. The user's perception of the helpfulness of the agent depends on the goal of the user [71], and as our participants might have participated in the intervention with goals other than increasing physical activity (eg, gaining participant credits), the similar-looking agent might have been unhelpful to them, leading to a lower adherence. All in all, the negative effect of the visual cues on adherence could therefore have been due to a lack of transparency about the true nature of the CA, and the type of visual cues we applied to our CA.
The absence of an effect for relational cues in our study contradicts previously mentioned studies in which a positive effect was found [42,43]. However, what is important to note is that these studies concern ECAs, while we used a TCA. ECAs generally outperform text-based ones [44,72], which can be explained by an additional range of design characteristics an ECA can make use of [73]. In one study though, there was no difference found between a TCA and an ECA, which the authors argued was due to the lack of incorporating non-verbal communication in the latter one [74]. Finally, we found that people who reported a better working alliance with the CA were more adherent to the intervention. This result is in line with studies about regular face-to-face interventions [24,25], digital therapy or treatment [26,27], and automated digital interventions [30,31,32,33,34]. Nonetheless, we did not find an effect of human cues on the reported working alliance with the CA. This lack might also be due to the fact that TCAs are unable to use non-verbal communication.
Building a relationship is an ambiguous process, which is more difficult to establish in a less rich text-based communication environment [75]. Moreover, in studies that did find an improved working alliance with a CA, either the interactions with the agent or the intervention itself were longer compared to those in our study [30,32]. In other studies, although a high working alliance was reported within shorter periods of time, the interactions with the CA followed an introduction by a human healthcare professional [33,34]. It is therefore unclear whether a TCA is less able to build a relationship with the user, or whether it requires more time or a face-to-face introduction to do so. So even though our findings do support that working alliance is an important mechanism within interventions with TCAs, how to foster a relationship with a TCA still remains a question.

Limitations and Suggestions for Future Work
We aimed to test the effects of visual and relational cues in general, and whether a combination of visual and relational cues would lead to a higher adherence than visual or relational cues alone. However, this led to a combination of various visual and relational cues that were simultaneously manipulated. Future studies could dismantle these, and compare the effects of individual visual and relational cues with one another to test which have the biggest influence on intervention adherence.
Secondly, to mimic human behavior, we intentionally chose to apply subtle human cues to our CA. However, participants might not have processed the messages of the agent thoroughly enough to notice these subtle cues, resulting in a lack of effects. We suggest that future studies investigate whether stronger cues are needed in TCAs compared to ECAs to have similar effects. Additionally, these studies could investigate whether longer interactions do lead to an improved working alliance, and thus adherence, for interventions with TCAs that use human cues.
Finally, we did not inform our participants beforehand whether they were interacting with a computer or a human being. Therefore, the expectations of people might have varied, which might have affected our results. Future studies could keep these expectations constant by being transparent about the true nature of the automated agent. Another option would be to manipulate the description of the CA to more closely represent a human being or a computer, and ask participants about their expectations towards support by a human being or a computer, to test how these influence the effects of human cues within automated interventions.
Our study was the first to test the effect of human cues within TCA-supported lifestyle interventions. Future studies could investigate the differences in applying human cues between TCAs and ECAs. Our results suggest that the lack of using non-verbal communication limited the capability to successfully apply relational cues. It would be interesting to test this hypothesis, and how to overcome the lack of non-verbal communication within TCAs.

Conclusion
We found that human cues do not improve adherence to TCA-supported interventions, and that visual cues even lead to lower levels of adherence. This is in contrast to the positive results of human cues found in studies with ECAs. In line with previous studies, we did find a positive relationship between working alliance with the TCA and adherence. The results suggest that being transparent about the computer-based nature of a CA and thereby setting the right expectations might be key. Besides, we found that factors that work for ECAs, in this case human cues, possibly cannot be copied directly to TCAs. This knowledge could help us design better automated interventions in the future, which lead to better working alliances, higher levels of adherence, and in turn a healthier lifestyle for us all.
University) for her assistance in the practical implementation of the study procedure, and during the development of the intervention.

Conflicts of Interest
None declared.

Analyses with effectiveness as outcome variable
Effectiveness was measured through objective step count data retrieved from Apple Health or Google Fit (depending on the smartphone of the participant). We calculated the mean difference between the average baseline step count (measured in the week before the intervention) and the average step count in the final week of the intervention. Participants were included in the analyses if both a valid baseline step count and a minimum of 5 days of step count in the final week were registered.
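The inclusion rule and effectiveness score described above can be sketched as follows. The text does not define what counts as a "valid" baseline step count, so the sketch assumes that at least one registered baseline day is sufficient; this assumption, the input format, and the function name `step_change` are hypothetical.

```python
from statistics import mean

MIN_FINAL_WEEK_DAYS = 5  # inclusion threshold stated in the text


def step_change(baseline_steps, final_week_steps):
    """Return final-week mean minus baseline mean, or None if excluded.

    Each argument is a list of daily step counts; days without a
    registered count are given as None.
    """
    base = [s for s in baseline_steps if s is not None]
    final = [s for s in final_week_steps if s is not None]
    # Assumption: a baseline is "valid" if at least one day was registered.
    if not base or len(final) < MIN_FINAL_WEEK_DAYS:
        return None
    return mean(final) - mean(base)
```

Participants for whom the function returns None correspond to those left out of the effectiveness analyses.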
We conducted a one-way between-subjects ANOVA to compare intervention effectiveness between the four different conditions. There was no significant difference in effectiveness between the conditions, F(3, 43) = .726, P = .54 (see Table 3 for mean and SD per group). The post-hoc tests also showed no significant differences, both when comparing the three human cues conditions with the no human cues condition (t(39) = -1.021, P = .31) and when comparing the condition with both visual and relational cues with the visual cues only and relational cues only groups (t(39) = .171, P = .87).