I worked with Capital One’s Director of Content Strategy and a product manager to design the home and auto loan voice experience on Alexa. We launched these two new feature sets in July 2016.
As a content designer on the project, I wrote the conversations that Capital One customers could have with Alexa about their home and auto loans. I also wrote copy for Capital One’s product landing page and the Skill card on Amazon Alexa mobile app before the launch.
Home loan voice features:
- “When is my next mortgage payment?”
- “How much do I have to pay for my next mortgage payment?”
- “Pay this month’s mortgage.”
- “What’s my principal balance for home loan?”
Auto loan voice features:
- “How much until my car is paid off?”
- “When’s my next car payment due?”
- “How much is my next car payment?”
- “Make my car payment for [month].”
- “What’s the principal balance on my car loan?”
One goal was to reach feature parity with Capital One’s other lines of business (consumer bank and credit card).
The features were part of the overall effort to help customers manage their money on their terms – anytime and anywhere. We were tasked with creating an MVP experience for helping customers access their home and auto loan accounts.
Some of the content had already been written by the Director of Content Strategy when I started the project, so my role was to flesh out the experience and get it ready for launch.
Timeline: 8 weeks
Designing for the right expectations
To continue building trust with people in a new channel, we needed to design with the system’s capabilities and limits in mind. We wanted our Skill to appear smart without exceeding the system’s ability to be smart.
Reducing cognitive load
People brought their own preconceived ideas about how to interact with our Skill. Using words they would use was key to creating good experiences, along with paying attention to voice inflection and cadence.
Building hypotheses into responses
We wanted to make smart assumptions but not the wrong ones. We treated content as hypotheses to answer the question, “Do people even want to do/hear this?” Since Amazon didn’t share platform data with Capital One, the product team could only make inferences about users’ questions and language based on which intents were invoked.
Personality differentiates an experience, but it shouldn’t get in the way. For the MVP, we kept the personality to a minimum, with plans to explore appropriate places to inject it.
Creating shared expectations between product and design teams
Every content strategy project sits on a foundation called Content Pillars, which guide the design of every piece of product content we put out, including voice UI. Both teams used them to evaluate the quality of a design.
- Use case specific
- Contextually relevant
- Natural language
Both teams agreed on a set of design principles, but they were mostly for us.
- Design to answer their questions first; ask clarifying questions second.
- Push the limits of what’s possible by talking like a normal person. Don’t lead with traditional bank syntax.
- Design for specificity, not for scale. Scale what works; we can’t get there by being generic.
- Alexa isn’t sarcastic! If statements could end with “…, asshole” – rewrite it.
Getting to know the technology and its constraints
To communicate more effectively with my product and engineering partners, I did desk research to understand how Alexa worked as well as the technology stack that we would be using.
Customers don’t care about the complexities behind their conversation with Alexa, but we do. I mapped out a customer’s end-to-end interaction with our voice features as a blueprint to consider the entire product experience.
Satisfying user intents: design around use cases that bring differentiated value
Align with user motivations to achieve business goals
Conversation design uses informed imagination: imagining all the likely reasons someone might interact with your product about a topic in the first place. What brings users here? What need are we solving for?
The Jobs To Be Done framework distilled our users’ situations, motivations, and outcomes into job stories–backed by call center log analyses.
Architect conversations by asking “what if” questions
With the MVP scope set, I mind-mapped possible conversation paths and stress cases to include in our voice UX. Job stories helped me ask the questions our users would expect Alexa to answer.
I worked with the Director of Content Strategy and my product manager to decide on which conversation paths were worth designing for.
Find natural points of divergence during conversational UI prototyping
Every conversation path is a hypothesis. The If-Then conversation structure lets users explore where they want to go next while encouraging us to consider the trade-offs between value and effort (to personalize).
To do this, I simulated real conversations with another designer. The Director of Content Strategy and I used pair writing to explore possible flows. Writing and editing together helped us keep the interaction honest and useful. Pair writing also enabled us to create our list of hypotheses to test later.
Meeting users where they are: design conversations that recognize user context
Test a design’s “fitness” in a specific context
Each context changes the expectations that people bring to an interaction. Expectation shapes experience. To get it right the first time, we needed to consider the different factors that could influence people’s experience with the skill.
Mindmapping helped me visualize the pressures our customers might be under when they interact with our Skill. (See a pattern?)
Break conversational UI into nouns and verbs
How does context shape voice UX? One way is tone of voice. But more fundamentally, context shapes usability. Sequencing nouns and verbs into logical order helps reduce cognitive load.
Take the above conversation. Consider the noun order in Alexa’s response when a user asks:
[Customer] Alexa, what’s my car payment for this month?
The data that the customer is interested in is $1,231. The cognitive load would be lessened if it were put at the end of a sentence rather than at the beginning.
Putting the $ amount at the end also makes the yes/no prompt “Would you like to pay it now?” less ambiguous as to what “it” refers to.
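The two orderings can be sketched side by side. This is an illustrative sketch only; the amount and wording are stand-ins, not the shipped copy.

```python
# Illustrative sketch: the same answer with the key data point placed
# first versus last. The amount and copy are invented for comparison.
amount = "$1,231"

# Data-first: the amount arrives before the listener has context for it,
# and "it" in the follow-up prompt is ambiguous.
data_first = (
    f"{amount} is due for your car payment this month. "
    "Would you like to pay it now?"
)

# Data-last: context comes first, the amount lands where attention peaks,
# and "it" clearly refers to the payment just stated.
data_last = (
    f"Your car payment for this month is {amount}. "
    "Would you like to pay it now?"
)
```

Reading both aloud makes the difference in cognitive load audible, not just theoretical.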
Rearrange multiple sets of nouns and verbs to help users remember
More complicated voice UX benefits from this object-oriented view. I switched the order of the $ amount with the explanation of principal balance for customers with multiple auto and home loans.
Balance concision and clarity
Although nouns and verbs clarified user inputs (slots), we needed to account for what users’ words could mean in a specific context. For example, the word balance could mean your:
- credit card balance
- checking/savings account balance
- home/auto loan balance
I clarified balance to mean home loan principal balance. The trade-off is less concision; the payoff is the clarity gained.
We created controlled vocabularies to define preferred and variant terms so Alexa would be able to recognize synonyms customers used.
Talking the way customers talk: refine voice UI to sound like an in-person conversation
Play test conversations to engage design partners (and build relationships!)
I role played conversations with my collaborators to iterate my design in real-time. Together, we:
- identified points of anxiety
- replaced bank jargon
- gut checked for “would customers really say this in real life?”
Use play testing to embed hypotheses into the conversational UI itself
Since we didn’t know which conversation paths customers would take, or which ones were worth spending more time designing for, we wanted customers to tell us what they want to do next. This lowered the risk of designing something that wouldn’t make an impact.
For example, if customers missed their payments for more than 1 month, Alexa says:
[Alexa] Unfortunately, your amount is past due. I’m unable to share the due dates at this time. Please go online to get the information.
This response embeds two hypotheses:
- People won’t need any more information beyond “Please go online to get the information.”
- This scenario doesn’t happen often enough to warrant a new use case.
We can use quantitative data after launch to figure out how many people actually need to go through that conversation path. The trick is to keep customers talking – if they stop talking, we stop learning what they want to do next.
Use play testing to identify new scenarios
For example, I identified missed payments for home loan as a potential edge case to design for. But how many customers actually face that scenario? Or even use Alexa for that scenario?
I designed a prompt to act as a signal for our team to prioritize that conversation if enough users activate that path. Embedding hypotheses into the conversational UI prevented me from over-designing features that create more work for developers.
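One way to embed a hypothesis into the UI is to instrument the edge-case path and count how often real users reach it. This is a minimal hypothetical sketch, not the production handler; the function name, counter key, and copy are assumptions for illustration.

```python
from collections import Counter

# Hypothetical instrumentation: count how often each conversation path
# is reached, so the team can prioritize building out paths users hit.
path_counts = Counter()

def handle_payment_due(months_past_due):
    if months_past_due > 1:
        # Hypothesis prompt: point users online and record that the edge
        # case occurred, instead of building the full flow up front.
        path_counts["missed_payment_over_1_month"] += 1
        return ("Unfortunately, your amount is past due. I'm unable to "
                "share the due dates at this time. Please go online to "
                "get the information.")
    # Placeholder for the fully designed happy path.
    return "Your next payment is coming up."

response = handle_payment_due(2)
```

If the counter stays near zero after launch, the edge case doesn’t earn a deeper design; if it climbs, the data makes the prioritization case for you.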
Test voice UI in-situ to adjust inflection and pacing
What sounds good on paper doesn’t always translate well when Alexa verbalizes your script. I ran every piece of the conversation through the voice simulator in the Alexa Skills Kit to identify awkward voice inflections and jumbled words that could affect interpretation and parsing.
To add natural pauses, I used the Speech Synthesis Markup Language (SSML) to test where breaks and commas should fall. I also swapped in words that produced the right inflection. (Does this sound like a statement or a question?)
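SSML expresses pauses explicitly with `<break>` tags, which the Alexa Skills Kit supports. A minimal sketch, assuming invented copy and a hypothetical helper for generating variants to audition in the simulator:

```python
# Hedged sketch: build SSML variants with different pause lengths between
# two clauses, then play each through the simulator to compare pacing.
# The helper and the sentence are illustrative, not the shipped script.
def with_pause(first, second, pause="500ms"):
    """Join two clauses with an explicit SSML pause between them."""
    return f'<speak>{first}<break time="{pause}"/>{second}</speak>'

variants = [
    with_pause("Your car payment for this month is $1,231.",
               "Would you like to pay it now?",
               pause=p)
    for p in ("200ms", "500ms", "1s")
]
```

Hearing the same line at several pause lengths makes it much easier to pick the cadence that sounds like a person rather than a script.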
Stress testing voice UI: gather feedback from potential users
Conduct usability testing for voice design
We tested the usability of voice UI with both Capital One and non-Capital One customers. We used testing to:
- Assess meaning of words
- Identify conversation flows that didn’t make sense
- Discover new utterances (new ways of asking Alexa for something)
- Empathize with how people felt while using our Skill
For example, at the end of the Pay Bill use case, Alexa says,
[Alexa] And keep in mind, you can cancel it online later.
This phrase exists in the UI to address people’s fear of making a mistake. Participants reacted positively and we validated that the language eased people’s anxiety around paying a bill through Alexa.
Meet Amazon’s voice service standards
I worked with our AI engineering team to implement recommendations from Amazon’s Skill Certification Team. For example, I made the confirmation language more specific for the home loan Pay Bill use case.
Ensure new voice UI features play nice with the existing design
I adjusted the parts of the voice UI (intents) serving other lines of business so the auto and home loan features didn’t negatively impact those experiences.
For example, in the Recent Transactions use case, listing additional accounts (i.e. auto and/or home loan) could overwhelm users. We added a selection step so people would only hear about relevant accounts.
[Alexa] Ok, you have the following accounts. Home loan, auto loan, checking, and credit card. Please say the account type you’d like the transactions for.
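The selection step amounts to listing account types, then scoping the response to the one the user names. A hypothetical sketch with invented account data (the real skill pulls accounts from Capital One’s systems):

```python
# Illustrative sketch of the selection step. Account names mirror the
# prompt above; the transaction data is invented for the example.
accounts = {
    "home loan": ["Payment received, $1,500"],
    "auto loan": ["Payment received, $400"],
    "checking": ["Coffee shop, $4"],
    "credit card": ["Grocery store, $82"],
}

def list_accounts_prompt():
    """Name the accounts, then ask the user to pick one."""
    names = ", ".join(accounts)
    return (f"Ok, you have the following accounts. {names}. "
            "Please say the account type you'd like the transactions for.")

def recent_transactions(account_type):
    # Only the selected account is read back, keeping the response short.
    return accounts.get(account_type, [])
```

Scoping the read-back to one account keeps the voice response inside what a listener can hold in working memory.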
Design for learning
Instead of making a big assumption, let people reveal the next steps and the new use cases. People satisfice in different ways, and what “nailing it” means is different for everyone. Ultimately we are modeling people, not systems; we use the system to refine our understanding about the people we’re serving.
What natural language means will differ with different kinds of people
Design at an atomic level so we don’t use generic, meaningless (or wrong) language that will work for a system but not for the person. Every single sentence or word should be a hypothesis we could test to see what “natural” means to people.
Don’t be limited by the technology or the system in place
Challenge convention: in moments of truth, ask “why shouldn’t we be able to do this?” We’re here to serve people, so technology should work around people, not limit them. You’re not annoying the engineering team (maybe a little bit); you’re being an advocate for the customers you’re designing for.