Designing and launching a voice UI for Capital One

Illustration of a Capital One customer asking a question to Alexa about their account.

I worked with Capital One’s Director of Content Strategy and a product manager to design the home and auto loan voice experience on Alexa. We launched these 2 new features in July 2016.

As a content designer on the project, I wrote the conversations that Capital One customers could have with Alexa about their home and auto loans. I also wrote copy for Capital One’s product landing page and the Skill card in the Amazon Alexa mobile app before launch.

Home loan voice features:

  • “When is my next mortgage payment?”
  • “How much do I have to pay for my next mortgage payment?”
  • “Pay this month’s mortgage.”
  • “What’s my principal balance for home loan?”

Auto loan voice features:

  • “How much until my car is paid off?”
  • “When’s my next car payment due?”
  • “How much is my next car payment?”
  • “Make my car payment for [month].”
  • “What’s the principal balance on my car loan?”

Read more about the launch on Digital Trends, Bloomberg, CNET, Business Insider, and PYMNTS.

PROJECT GOAL

Reach feature parity with other lines of business (consumer bank & credit card)

The features were part of the overall effort to help customers manage their money on their terms: anytime, anywhere. We were tasked with creating an MVP experience that helped customers access their home and auto loan accounts.

The Director of Content Strategy had already written some of the content when I joined the project, so my role was to flesh out the experience and get it ready for launch.

Timeline: 8 weeks

Key Considerations

Designing for the right expectations

To continue building trust with people in a new channel, we needed to design with the system’s capabilities and limits in mind. We wanted our Skill to appear smart without promising more intelligence than the system could deliver.

Reducing cognitive load

People brought their own preconceived ideas about how to interact with our Skill. Using words they would use was key to creating good experiences, along with paying attention to voice inflection and cadence.

Building hypotheses into responses

We wanted to make smart assumptions but not the wrong ones. We treated content as hypotheses to answer the question, “Do people even want to do/hear this?” Since Amazon didn’t share platform data with Capital One, the product team could only make inferences about users’ questions and language based on which intents were invoked.

Limiting personality

Personality differentiates an experience, but it shouldn’t get in the way. For the MVP, we kept the personality to a minimum, with plans to explore appropriate places to inject it.

APPROACH

Creating shared expectations between product and design teams

Every content strategy project sits on a foundation called Content Pillars, which guide the design of every piece of product content we put out, including voice UI. Both teams used them to evaluate the quality of a design.

  1. Use case specific
  2. Contextually relevant
  3. Natural language

Both teams agreed on a set of design principles, though they were mostly for us, the designers.

  • Design to answer their questions first; ask clarifying questions second.
  • Push the limits of what’s possible by talking like a normal person. Don’t lead with traditional bank syntax.
  • Design for specificity, not for scale. Scale what works; we can’t get there by being generic.
  • Alexa isn’t sarcastic! If a statement could end with “…, asshole,” rewrite it.

Getting to know the technology and its constraints

To communicate more effectively with my product and engineering partners, I did desk research to understand how Alexa worked as well as the technology stack that we would be using.

An illustration of high-level system architecture to show how Amazon's Alexa responds to a customer's question about their account.

Customers don’t care about the complexities behind their conversation with Alexa, but we do. I mapped out a customer’s end-to-end interaction with our voice features as a blueprint to consider the entire product experience.

An interaction model of a generic conversational structure for Capital One's Alexa skill.
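
To make that request/response loop concrete for my partners: below is a minimal sketch of an intent handler, written with today’s ASK SDK for Python. The shipped 2016 skill predates this SDK, and the intent and handler names here are hypothetical, not Capital One’s actual schema.

```python
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name

class NextCarPaymentHandler(AbstractRequestHandler):
    """Answers "When's my next car payment due?" (hypothetical intent)."""

    def can_handle(self, handler_input):
        # Alexa has already matched the utterance to an intent for us.
        return is_intent_name("NextCarPaymentIntent")(handler_input)

    def handle(self, handler_input):
        due_date = "July 15th"  # in production: an authenticated API call
        return (handler_input.response_builder
                .speak(f"Your next car payment is due on {due_date}.")
                .ask("Is there anything else I can help you with?")
                .response)

sb = SkillBuilder()
sb.add_request_handler(NextCarPaymentHandler())
lambda_handler = sb.lambda_handler()  # AWS Lambda entry point
```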

Satisfying user intents: design around use cases that bring differentiated value

Align with user motivations to achieve business goals

Conversation design uses informed imagination: imagining all the likely reasons someone might interact with your product about a topic in the first place. What brings users here? What need are we solving for?

The Jobs To Be Done framework distilled our users’ situations, motivations, and outcomes into job stories, backed by call center log analyses.

Jobs-to-be-done statements to categorize different customer cognitive states.
The conversation between Alexa and a customer would change based on the person’s awareness of their own finances and the likelihood of filling all required slots.
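
To ground the slot-filling idea: a sketch, assuming a hypothetical pay-bill intent with a required "month" slot, of how a conversation branches depending on whether the user has already supplied it (ASK SDK for Python; not the shipped implementation).

```python
from ask_sdk_model.dialog import ElicitSlotDirective

def handle_pay_bill(handler_input):
    """Sketch: branch on whether the required slot is filled."""
    slots = handler_input.request_envelope.request.intent.slots
    if slots["month"].value is None:
        # Slot unfilled: elicit it and keep the conversation going.
        return (handler_input.response_builder
                .speak("Sure. Which month's payment would you like to make?")
                .add_directive(ElicitSlotDirective(slot_to_elicit="month"))
                .response)
    # Slot filled: move straight to the confirmation step.
    return (handler_input.response_builder
            .speak(f"Ok, scheduling your {slots['month'].value} payment. "
                   "Keep in mind, you can cancel it online later.")
            .response)
```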

Architect conversations by asking “what if” questions

With the MVP scope set, I mindmapped possible conversation paths and stress cases to include in our voice UX. Job stories helped me anticipate the questions users would expect Alexa to answer.

Mind map to visualize potential use cases for auto loans, in the form of "what if" questions.
Example of one use case I mindmapped.

I worked with the Director of Content Strategy and my product manager to decide on which conversation paths were worth designing for.

Home loans and auto loans conversation architecture, including the use cases our team was designing for.
Since asking about the payment due date and amount signaled higher intent to pay a bill, the conversations for those use cases flowed into the pay bill use case, whereas the other use cases were one-off conversations.

Find natural points of divergence during conversational UI prototyping

Every conversation path is a hypothesis. The If-Then conversation structure lets users explore where they want to go next while encouraging us to consider the trade-offs between value and effort (to personalize).

Voice UX design process starts with writing conversations that lets customers explore different conversation paths. Conversation is written out in an if-then format.

To do this, I simulated real conversations with another designer. The Director of Content Strategy and I used pair writing to explore possible flows. Writing and editing together helped us keep the interaction honest and useful. Pair writing also enabled us to create our list of hypotheses to test later.
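
The if-then scripts translate naturally into data. Here is a minimal sketch (the wording is illustrative, not the shipped copy) of how one pair-written flow could be captured so we could walk the tree aloud:

```python
# Each node pairs Alexa's prompt with the user replies we expected.
conversation = {
    "prompt": "Your car payment of $300 is due on July 15th. "
              "Would you like to pay it now?",
    "if": {
        "yes": {"prompt": "Ok, it's scheduled. Keep in mind, "
                          "you can cancel it online later."},
        "no": {"prompt": "Ok. Anything else I can help you with?"},
        "what's my balance?": {"prompt": "Your principal balance is "
                                         "$14,500. Would you like to "
                                         "pay this month's bill now?"},
    },
}
```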

Meeting users where they are: design conversations that recognize user context

Test a design’s “fitness” in a specific context

Each context changes the expectations that people bring to an interaction. Expectation shapes experience. To get it right the first time, we needed to consider the different factors that could influence people’s experience with the skill.

Mindmapping helped me visualize the pressures our customers might be under when they interact with our Skill. (See a pattern?)

Mind map of different categories of contextual factors that might affect customer interactions with Capital One's Alexa skill.
I explored factors that could influence our designs within these contexts: environmental, personal, technological, social and cultural, temporal, and business. The contexts that our team had the most control over were personal and temporal.

Break conversational UI into nouns and verbs

How does context shape voice UX? One way is tone of voice. But more fundamentally, context shapes usability. Sequencing nouns and verbs into a logical order helps reduce cognitive load.

Example voice UI for asking about the amount and due date for this month's auto loan payment.

Take the above conversation. Consider the noun order in Alexa’s response when a user asks:

[Customer] Alexa, what’s my car payment for this month?

Example voice UI revision for making car payments. The change reduced the cognitive load of Alexa's response.
The first sentence (top) is more usable than the second sentence (bottom).

The figure the customer actually cares about is $1,231. Cognitive load is lower when that figure comes at the end of the sentence rather than the beginning, because listeners best remember the last thing they hear.

Putting the $ amount at the end also makes the yes/no prompt “Would you like to pay it now?” less ambiguous as to what “it” refers to.
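
As a sketch of the same principle in response templates (figures and wording illustrative):

```python
amount, due_date = "$1,231", "July 15th"

# Amount last: the key figure lands right before the yes/no prompt,
# so "it" unambiguously refers to the payment the user just heard.
more_usable = (f"Your car payment for this month, due on {due_date}, "
               f"is {amount}. Would you like to pay it now?")

# Amount first: the user has to hold $1,231 in memory for the rest
# of the sentence before the prompt arrives.
less_usable = (f"{amount} is your car payment for this month, due on "
               f"{due_date}. Would you like to pay it now?")
```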

Rearrange multiple sets of nouns and verbs to help users remember

More complicated voice UX benefits from this object-oriented view. I switched the order of the $ amount with the explanation of principal balance for customers with multiple auto and home loans.

Differentiating voice UI for separate use cases for customers who have both home and auto loans through Capital One.
The third conversation (bottom) flips the order of the balance amount and the explanation of principal balance to keep the amount toward the end of the sentence as much as possible. This helps users remember the amount, the most important piece of info in the UI.

Balance concision and clarity

Although nouns and verbs clarified user inputs (slots), we needed to account for what users’ language could mean in a specific context. For example, the word balance could mean your:

  • credit card balance
  • checking/savings account balance
  • home/auto loan balance

I clarified balance to mean home loan principal balance. The trade-off is less concision; the payoff is clarity.

We created controlled vocabularies to define preferred and variant terms so Alexa would be able to recognize synonyms customers used.
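
This is how the Alexa interaction model represents a controlled vocabulary: a custom slot type whose canonical value carries the variants customers actually say. The terms below are illustrative, not our full vocabulary.

```python
# A custom slot type as it would appear in the interaction model
# (shown as a Python dict; the real artifact is JSON).
loan_balance_slot_type = {
    "name": "LOAN_BALANCE_TERMS",
    "values": [
        {
            "name": {
                "value": "home loan principal balance",  # preferred term
                "synonyms": [
                    "mortgage balance",
                    "home loan balance",
                    "principal balance",
                    "what I owe on my house",
                ],
            }
        }
    ],
}
```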

Talking the way customers talk: refine voice UI to sound like an in-person conversation

Play test conversations to engage design partners (and build relationships!)

I role-played conversations with my collaborators to iterate on the design in real time. Together, we:

  • identified points of anxiety
  • replaced bank jargon
  • gut checked for “would customers really say this in real life?”

Use play testing to embed hypotheses into the conversational UI itself

Since we couldn’t know which conversation paths customers would take, or which ones were worth spending more design time on, we wanted customers to tell us what they wanted to do next. This lowered the risk of designing something that wouldn’t make an impact.

For example, if customers missed their payments for more than one month, Alexa says:

[Alexa] Unfortunately, your amount is past due. I’m unable to share the due dates at this time. Please go online to get the information. 

Hypotheses:

  1. People won’t need any more information beyond “Please go online to get the information.”
  2. This scenario doesn’t happen often enough to warrant a new use case.

We can use quantitative data after launch to figure out how many people actually go down that conversation path. The trick is to keep customers talking: if they stop talking, we stop learning what they want to do next.
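
One way to wire this up, sketched here with hypothetical event and intent names (not Capital One’s actual telemetry): tag each terminal prompt with the hypothesis it tests, so post-launch logs count how many customers actually reach the path.

```python
import json
import logging

logger = logging.getLogger(__name__)

def log_path(intent_name, hypothesis_id):
    # Emit one structured event per conversation path reached.
    logger.info(json.dumps({
        "event": "conversation_path_reached",
        "intent": intent_name,
        "hypothesis": hypothesis_id,
    }))

# Fires whenever Alexa delivers the past-due dead end above.
log_path("PaymentDueDateIntent", "past_due_go_online_is_enough")
```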

Use play testing to identify new scenarios

For example, I identified missed payments for home loan as a potential edge case to design for. But how many customers actually face that scenario? Or even use Alexa for that scenario?

Example voice UI for the mortgage due date use case.

I designed a prompt to act as a signal for our team to prioritize that conversation if enough users activated that path. Embedding hypotheses into the conversational UI prevented me from over-designing features that would create more work for developers.

Test voice UI in-situ to adjust inflection and pacing

What sounds good on paper doesn’t always translate well once Alexa verbalizes your script. I ran every piece of the conversation through the Voice Simulator in the Alexa Skills Kit to identify awkward inflections and jumbled words that could affect interpretation and parsing.

Voice Simulator on Amazon's Alexa Skills Kit used for functional testing.

To add natural pauses, I used Speech Synthesis Markup Language (SSML) and tested where commas and breaks should fall. I also replaced words that didn’t produce the right inflection. (Does this sound like a statement or a question?)
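
For example, an SSML response might look like the sketch below (wording illustrative): a break tag forces a pause, and the closing punctuation steers the inflection, since Alexa reads a question mark with a rising contour.

```python
speech = (
    "<speak>"
    "Your payment of 1,231 dollars is due on July 15th. "
    '<break time="300ms"/>'          # natural pause before the prompt
    "Would you like to pay it now?"  # "?" yields question inflection
    "</speak>"
)
```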

Stress testing voice UI: gather feedback from potential users

Conduct usability testing for voice design

We tested the usability of voice UI with both Capital One and non-Capital One customers. We used testing to:

  • Assess meaning of words
  • Identify conversation flows that didn’t make sense 
  • Discover new utterances (new ways of asking Alexa for something)
  • Empathize with how people felt while using our Skill

For example, at the end of the Pay Bill use case, Alexa says,

[Alexa] And keep in mind, you can cancel it online later.

This phrase exists in the UI to address people’s fear of making a mistake. Participants reacted positively and we validated that the language eased people’s anxiety around paying a bill through Alexa.

Meet Amazon’s voice service standards

I worked with our AI engineering team to implement recommendations from Amazon’s Skill Certification Team. For example, I made the confirmation language more specific for the home loan Pay Bill use case.

Revised voice UI after reviewing the design with Amazon's Skill Certification Team.

Ensure new voice UI features play nice with the existing design

I adjusted the parts of the voice UI (intents) serving other lines of business so the auto and home loan features didn’t negatively impact those experiences.

For example, in the Recent Transactions use case, listing additional accounts (i.e. auto and/or home loan) could overwhelm users. We added a selection step so people would only hear about relevant accounts.

[Alexa] Ok, you have the following accounts. Home loan, auto loan, checking, and credit card. Please say the account type you’d like the transactions for.

LESSONS LEARNED

Design for learning

Instead of making a big assumption, let people reveal the next steps and the new use cases. People satisfice in different ways, and what “nailing it” means is different for everyone. Ultimately we are modeling people, not systems; we use the system to refine our understanding of the people we’re serving.

What natural language means will differ with different kinds of people

Design at an atomic level so we don’t use generic, meaningless (or wrong) language that will work for a system but not for the person. Every single sentence or word should be a hypothesis we could test to see what “natural” means to people.

Don’t be limited by the technology or the system in place

Challenge convention: in moments of truth, ask “why shouldn’t we be able to do this?” We’re here to serve people, so technology should work around people, not limit them. You’re not annoying the engineering team (ok, maybe a little bit); you’re advocating for the customers you’re designing for.
