Designing and launching a voice UI for Capital One

Illustration of a Capital One customer asking a question to Alexa about their account.

I worked with the Director of Content Strategy and Senior Product Manager to design the home and auto loan voice experiences on Amazon’s Alexa. We launched these two new features in July 2016.

As a content designer, I wrote the conversations that Capital One customers could have with Alexa about their home and auto loans. Before the launch, I also wrote the copy for Capital One’s Alexa Skill landing page and the Skill card on Amazon’s Alexa mobile app. 

Home loan voice features:

  • “When is my next mortgage payment?”
  • “How much do I have to pay for my next mortgage payment?”
  • “Pay this month’s mortgage.”
  • “What’s my principal balance for home loan?”

Auto loan voice features:

  • “How much until my car is paid off?”
  • “When’s my next car payment due?”
  • “How much is my next car payment?”
  • “Make my car payment for [month].”
  • “What’s the principal balance on my car loan?”

Read more about the launch on Digital Trends, Bloomberg, CNET, Business Insider, and PYMNTS.

Business Case

To provide a consistent end-to-end digital experience, our business partners wanted to reach feature parity with other lines of business (consumer bank and credit card). This was part of an overall company initiative to help customers manage their money on their own terms, anytime and anywhere.

Project Goal

Our team was tasked with creating an MVP experience for our virtual assistant on Amazon’s Alexa-enabled devices. The goal was to help Capital One’s customers access their home and auto loan accounts and make voice-activated payments.

The Director of Content Strategy had already assembled content patterns from voice experiences for credit and debit card accounts. My role was to build on that content, refine it, and prepare it for rollout to all customers.

Timeline: 8 weeks


Establish design principles

To go from zero to one, our team set expectations for how to evaluate experience quality. The Director of Content Strategy led these meetings to articulate a set of design principles that both the product and design teams could follow.

  • Design to answer their questions first; ask clarifying questions second.
  • Push the limits of what’s possible by talking like a normal person. Don’t lead with traditional bank syntax.
  • Design for specificity, not for scale. Scale what works; we can’t get there by being generic.
  • Alexa isn’t sarcastic! If statements could end with “…, [expletive]” – rewrite it.

Design with technology constraints in mind

Customers don’t care about how technology works behind the scenes, but we do. Since I hadn’t designed a voice UI before, I got familiar with the interaction framework so I could design end-to-end experiences and spot potential friction points.

An interaction model of a generic conversational structure for Capital One's Alexa skill.

One major challenge was managing the cognitive load of using voice-activated technology itself. Customers didn’t necessarily remember the right syntax to use with Alexa, or even what they could ask about their Capital One account. As a result, some customers might not use the Alexa Skill at all, and so would never experience the full value of our features.

Articulate the central tension between customer and business problem

Customers might ask a question in many different ways and expect Alexa to understand their intent. When Alexa doesn’t, the Skill becomes less useful, which could lead to lower engagement. On the other hand, we wanted Alexa’s response to be as useful as possible without overwhelming the customer, so that they would keep asking follow-up questions.

Customer Problem:

  • Don’t know what they can ask Alexa about their account
  • Hard to remember and interpret information output from Alexa

Business Problem:

  • Encourage timely payments
  • Build trust and engagement with customers through a new channel

Key Considerations:

  • Appear smart without exceeding the system’s ability to be smart
  • Pay attention to word choice as well as voice cadence and inflection
  • Ask: “Do people even want to do/hear this?”

Empathize with customers by asking “what if” questions

Before writing a single word, I needed to understand our customers’ various intents so that Alexa could invoke the right conversational experience to solve their problem. To “nail” the conversation, Alexa also needed to recognize the varying syntax of customer queries. I devised 4 categories to capture customers’ awareness of what they could ask.

  • Unaware: customer is clueless about what to ask
  • Aware complete: customer provides all necessary information for Alexa to fetch a correct response (perfect syntax/happy path)
  • Aware incomplete: customer doesn’t remember parts of the necessary syntax or command (e.g. forgot they needed to provide the last 4 digits of their account number)
  • Unaware incomplete: customer is unaware that the information they provided is wrong (e.g. the last 4 digits of their account number were wrong)
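
The four categories can be read as a small decision procedure. The Python sketch below is only an illustration of that reading, not the Skill's actual logic: the function name, the `intent` label, and the idea of checking the spoken digits against the accounts on file are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional, Set


class Awareness(Enum):
    UNAWARE = "unaware"
    AWARE_COMPLETE = "aware complete"
    AWARE_INCOMPLETE = "aware incomplete"
    UNAWARE_INCOMPLETE = "unaware incomplete"


def classify_query(
    intent: Optional[str],
    last_four: Optional[str],
    accounts_on_file: Set[str],
) -> Awareness:
    """Map one customer query to an awareness category (hypothetical helper).

    intent: label for what the customer asked, or None if nothing
        recognizable was asked.
    last_four: the account digits the customer said, or None if omitted.
    accounts_on_file: last 4 digits of the accounts the customer really has.
    """
    if intent is None:
        return Awareness.UNAWARE  # customer doesn't know what to ask
    if last_four is None:
        return Awareness.AWARE_INCOMPLETE  # forgot a required piece
    if last_four not in accounts_on_file:
        return Awareness.UNAWARE_INCOMPLETE  # gave wrong digits unknowingly
    return Awareness.AWARE_COMPLETE  # happy path
```

Each category other than "aware complete" points to a different recovery conversation, which is why it helps to tell them apart before writing any responses.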

I asked a series of “what if” questions to mindmap conversation topics that customers might expect Alexa to engage in. While not all of them were feasible, technology should work around people, not limit them. By asking “why shouldn’t we be able to do (this)?”, I challenged my own perceived limits of technology and remained customer-centered.

Mind map to visualize potential use cases for auto loans, in the form of "what if" questions.

While asking “what if” questions, I also imagined scenarios where Alexa could make smart assumptions. It could use what it knows about the customer’s context to fill in required blank spaces in a customer query–called “slots” in Amazon’s terminology–to reduce cognitive load.

For example, the conversational experience should be different for customers with only 1 account. Alexa could default to that single account instead of needing the customer to remember the last 4 digits of their account number during the interaction.
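
A minimal sketch of this smart default in Python, assuming accounts are identified by their last 4 digits. The function name and the follow-up prompt wording are placeholders invented for the example, not the Skill's actual code or copy.

```python
from typing import List, Optional, Tuple


def resolve_account(
    spoken_last_four: Optional[str],
    accounts: List[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Fill the account slot, or say what to ask next.

    Returns (account, follow_up_prompt); exactly one of the two is None.
    """
    # The customer named an account that exists: use it.
    if spoken_last_four in accounts:
        return spoken_last_four, None
    # Smart default: nothing was said, but there is only one account.
    if spoken_last_four is None and len(accounts) == 1:
        return accounts[0], None
    # Otherwise the slot is still empty, so Alexa has to ask.
    # Placeholder wording, not the production copy.
    return None, "Which account? Please say the last 4 digits."
```

With a single account, `resolve_account(None, ["4321"])` fills the slot without a follow-up question, so the customer never has to recall the digits.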

The mindmap revealed opportunities to make smart assumptions and functioned as a tool to group related customer concerns. I worked with the Director of Content Strategy and my product manager to draw boundaries around the main use cases and evaluate the feasibility of smart defaults.

Home loans and auto loans conversation architecture, including the use cases our team was designing for.

One main design goal was helping customers pay their bill on time. Since asking about payment due date and amount signaled a higher intent for paying a bill, we made the decision to fold these use cases into the pay bill use case. Mapping the conversation flow before designing helped ensure we were solving the right customer problem while fulfilling business goals.

Prototype fast and cheaply with pair writing

Words are the cheapest material to prototype with. I designed the voice UI with various stakeholders so we could iterate on it in real time. We role-played conversations that customers might have with Alexa (aka playtesting) to see if the interaction was both natural and valuable.

Pair writing helped me build better relationships with my stakeholders, and it opened up the design process for collaboration. We used several techniques to refine the design.

Technique 1: Treat every conversation path as a hypothesis

Since we didn’t have any analytics data to tell us which use cases customers would activate, we treated each conversation path as a hypothesis. The trick is to keep customers talking: if they stop talking, we stop learning what they want to do next.

I started designing the conversation using an If-Then structure to isolate the hypothesis at each node in the flow. This method reduced the risk of over-designing an interaction that creates more work for developers but makes little impact on achieving customer and business goals.

The voice UX design process starts with writing conversations that let customers explore different conversation paths. The conversation is written out in an if-then format.

The If-Then structure clearly laid out the information architecture of the voice experience that we could later test (similar to tree testing). For example, if a customer missed their payment for more than 1 month, the experience is:

[Alexa] Unfortunately, your amount is past due. I’m unable to share the due dates at this time. Please go online to get the information. 

We formed 2 hypotheses:

  1. People won’t need any more information beyond “Please go online to get the information.”
  2. This scenario doesn’t happen often enough to warrant a new use case.

Each hypothesis acted as a signal for validating a user need. If enough customers activated a conversation path after launch, we could build out the experience for the next release.
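
The If-Then structure and the "count activations, then decide" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the Skill's code: the path names, the counter, and the threshold are hypothetical, and only the past-due response is taken from the script above.

```python
from collections import Counter

# Response text from the past-due branch of the script.
PAST_DUE_RESPONSE = (
    "Unfortunately, your amount is past due. I'm unable to share the due "
    "dates at this time. Please go online to get the information."
)

# How often each conversation path was activated (hypothetical tracking).
path_activations = Counter()


def due_date_response(months_past_due: int, due_date_text: str) -> str:
    """One If-Then node: each branch is a hypothesis we can count."""
    if months_past_due > 1:
        path_activations["past_due"] += 1
        return PAST_DUE_RESPONSE
    path_activations["due_date"] += 1
    return f"Your payment is due {due_date_text}. Would you like to pay it now?"


def paths_worth_building(threshold: int) -> list:
    """Paths activated at least `threshold` times (threshold is an assumption)."""
    return sorted(p for p, n in path_activations.items() if n >= threshold)
```

If the "past_due" path rarely shows up in `paths_worth_building`, that supports the second hypothesis; if it shows up often, the dead-end response is a candidate for a fuller use case.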

Technique 2: Break conversational UI into nouns and verbs

To design each UI string more intentionally, I separated it into its constituent parts: nouns and verbs. We didn’t want to use generic, meaningless, or “unnatural” language that would work for the system but not for a human. To optimize for voice content usability, I explored word choices to test later, since what feels “natural” is different for everyone.

For example, I created 3 different ways to express the date value so we could test the language more precisely with customers to optimize for understandability.

[Customer] What’s the due date on my mortgage?

[Alexa] Your payment is due February 4th | Tomorrow at 5PM Eastern | Tomorrow, May 20th. Would you like to pay it now?
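
The three date phrasings above can be generated from the same due date, which makes them easy to swap in a test. A minimal Python sketch, assuming the relative forms only apply when the payment is due the next day; the function names are hypothetical and the "5PM Eastern" cutoff is copied from the sample dialogue, not a confirmed business rule.

```python
from datetime import date, timedelta


def ordinal(day: int) -> str:
    """Spoken-style day of month: 1 -> '1st', 4 -> '4th', 22 -> '22nd'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def due_date_variants(due: date, today: date) -> list:
    """Candidate phrasings for the date value, in the order shown above."""
    # strftime('%B') gives the English month name under the default locale.
    month_day = f"{due.strftime('%B')} {ordinal(due.day)}"
    variants = [month_day]
    if due - today == timedelta(days=1):
        # Cutoff time taken from the sample dialogue; treat as a placeholder.
        variants.append("Tomorrow at 5PM Eastern")
        variants.append(f"Tomorrow, {month_day}")
    return variants
```

Each variant drops into the same sentence frame ("Your payment is due ..."), so only the date value changes between test conditions.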

I also explored sequencing nouns and verbs into different orders to ease strain on human memory. If customers have to ask Alexa to repeat information all the time, it would not be a usable experience.

[Customer] Alexa, what’s my car payment for this month?

Option 1: [Alexa] Your total amount due on Thursday, March 16th, is $1,231. Would you like to pay it now?

Option 2: [Alexa] Your total amount due is $1,231, and it’s due on Thursday, March 16th. Would you like to pay it now?

[Customer] Yes

[Alexa] Great, I’m ready to transfer your payment of $1,231 from [paymentAccount]. Is this the account you want me to use?

Option 1 is more usable than Option 2. The data value that the customer is interested in is $1,231. Putting the dollar amount at the end of the sentence helps customers remember the amount–the most important piece of information in the voice UI. This also makes it clear that the question prompt is referring to the total amount due.

I applied this strategy to personalize conversational experiences for customers in different scenarios.

Differentiating voice UI for separate use cases for customers who have both home and auto loans through Capital One.

The third use case flips the order of the balance amount with the explanation of principal balance. This kept the balance amount towards the end of the sentence as much as possible.

Technique 3: Test voice inflection and pacing

I “wireframed” the voice UI by running the conversation script through the voice simulator in the Alexa Skills Kit. I identified awkward voice inflections and jumbled words that could affect voice usability and understandability.

Voice Simulator on Amazon's Alexa Skills Kit used for functional testing.

For example, to slow down Alexa’s speech, I used Speech Synthesis Markup Language (SSML), which Alexa supports, to test where to add commas in the script so Alexa’s enunciation was clearer. To ensure Alexa’s voice inflection didn’t confuse customers, I replaced words that read well on paper but didn’t translate well when Alexa spoke them out loud (for example, does this sound like a statement or a question?).
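
To make the pacing experiments concrete, here is a small Python sketch that wraps a line of script in SSML. It uses the standard `<speak>`, `<prosody rate>`, and `<break time>` elements; the helper names, the 300 ms pause, and the "slow" rate are illustrative assumptions, not the values used in the Skill.

```python
from xml.sax.saxutils import escape


def with_pause(before: str, after: str, pause_ms: int = 300) -> str:
    """Join two phrases with an explicit SSML pause between them."""
    # escape() protects characters such as '&' that are special in XML.
    return f'{escape(before)} <break time="{pause_ms}ms"/> {escape(after)}'


def speak(body: str, rate: str = "medium") -> str:
    """Wrap already-escaped SSML content in a speak/prosody envelope."""
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


# Example: slow the sentence down and pause before the question prompt.
ssml = speak(
    with_pause(
        "Your total amount due on Thursday, March 16th, is $1,231.",
        "Would you like to pay it now?",
    ),
    rate="slow",
)
```

Pasting the resulting string into a voice simulator is a quick way to compare a comma, a timed break, and a slower rate on the same sentence.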

Conduct usability testing with users

I contributed to the research plan that the product manager created. We tested the usability of the voice UI with both Capital One and non-Capital One customers. We used testing to:

  • Assess meaning of words
  • Identify conversation flows that didn’t make sense 
  • Discover new utterances (new ways of asking Alexa for something)
  • Empathize with how people felt while interacting with our Skill

For example, at the end of the Pay Bill use case, Alexa says,

[Alexa] And keep in mind, you can cancel it online later.

Participants reacted positively, and we validated that the language eased people’s anxiety around paying a bill through a new channel.

I also worked with our AI engineering team to implement recommendations from Amazon’s Skill Certification Team. The Skill Certification Team ensured all Skills met Amazon’s voice service standards. For example, I made the confirmation language more specific for the home loan Pay Bill use case.

Before:

[Alexa] Ok, all set. I’ve made that payment for you, and you can see your confirmation code by signing in to your Capital One account online.

After:

[Alexa] Ok, all set. I’ve made that mortgage payment for you, and you can see your confirmation code by signing in to your Capital One account online.

I ensured that the new voice UI features didn’t negatively impact other lines of business. For example, in the Recent Transactions use case, listing additional accounts (i.e. auto and/or home loan) could overwhelm customers. I added a selection step so people would only hear about relevant accounts.

[Alexa] Ok, you have the following accounts. Home loan, auto loan, checking, and credit card. Please say the account type you’d like the transactions for.
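
A selection step like this one can be sketched as a prompt builder that only asks when there is a real choice to make. The Python below is a hypothetical illustration: the function names and the rule of skipping the prompt for a single account type are assumptions, and only the prompt wording comes from the script above.

```python
from typing import List, Optional


def join_spoken(items: List[str]) -> str:
    """Join items the way they would be read aloud: 'a, b, and c'."""
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def transactions_prompt(account_types: List[str]) -> Optional[str]:
    """Build the selection prompt, or None when no selection is needed."""
    if len(account_types) <= 1:
        return None  # assumed: nothing to choose between, so skip the step
    listing = join_spoken(account_types)
    listing = listing[0].upper() + listing[1:]
    return (
        f"Ok, you have the following accounts. {listing}. "
        "Please say the account type you'd like the transactions for."
    )
```

Customers with a single account type would skip the extra turn entirely, which keeps the added loan accounts from lengthening the conversation for everyone else.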

Lessons Learned

Design for learning: People satisfice in different ways. “Nailing it” means something different for everyone. Ultimately we are modeling people, not systems. We use the system to refine our understanding about the people we’re serving.

Practice object-oriented thinking: In voice UI, literally every word is data. Treat every sentence or word as a hypothesis to test.

Challenge convention: Don’t be limited by the technology or systems in place. Technology should work around people, not limit them.
