Ep. 92 | Amazon Polly Overview & Exam Prep | ML | SAA-C03 | AWS Solutions Architect Associate

Chris 0:00
Hey everybody, welcome back for another deep dive.

Kelly 0:02
Yeah, welcome back.

Chris 0:03
Today we're gonna be focusing on a service that's pretty fascinating, and I know a lot of cloud engineers are interested in it: Amazon Polly.

Kelly 0:12
Absolutely. It's one of those services that has a ton of potential, especially as we move into a world where everything is becoming more voice-activated.

Chris 0:19
Right, and people are using their voices more and more. So let's jump right into it. What is Amazon Polly in a nutshell?

Kelly 0:29
Amazon Polly is basically a text-to-speech service. It takes text as input and produces spoken audio as output.
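To make the text-in, audio-out idea concrete, here is a minimal Python sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and AWS credentials are configured. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the voice `Joanna` (a US English voice) is just an example choice; only `synthesize_speech` is a real boto3 call.

```python
from pathlib import Path


# Hypothetical helper: builds the keyword arguments for Polly's SynthesizeSpeech API.
def build_speech_request(text: str, voice_id: str = "Joanna",
                         output_format: str = "mp3") -> dict:
    """Return keyword arguments for the Polly SynthesizeSpeech call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


# Hypothetical helper: calls Polly through a boto3 client and saves the audio.
def synthesize_to_file(polly_client, text: str, path: str, **options) -> int:
    """Send text to Polly and write the returned audio to `path`; returns bytes written."""
    response = polly_client.synthesize_speech(**build_speech_request(text, **options))
    audio = response["AudioStream"].read()  # the audio arrives as a streaming body
    Path(path).write_bytes(audio)
    return len(audio)
```

With credentials in place, `synthesize_to_file(boto3.client("polly"), "Hello from Amazon Polly.", "hello.mp3")` should write an MP3 file. Keeping the request builder separate means the parameter logic can be checked without calling AWS.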

Chris 0:36
So I could give it, like, a sentence or a paragraph?

Kelly 0:39
Exactly. A sentence, a paragraph, even an entire document, and it will read it back to you in a voice. You can choose different voices, different languages, different accents.

Chris 0:51
So this isn't like that robotic voice we all remember from the early days of text-to-speech.

Kelly 0:57
No, no. It's come a long way. The voices are much more natural-sounding these days.
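Feeding Polly "an entire document" usually takes one extra step. A synchronous `synthesize_speech` request has an input-size limit (around 3,000 billed characters per the Polly quotas page at the time of writing; treat that number as an assumption to verify), so long text is typically split into chunks first. This sketch splits on sentence boundaries; `chunk_text` and `MAX_CHARS` are hypothetical names.

```python
import re

# Assumed per-request limit; verify against the current Amazon Polly quotas.
MAX_CHARS = 3000


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single over-long sentence is hard-wrapped as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go through its own `synthesize_speech` call, with the audio segments played back in order. For very long documents, Polly's asynchronous `start_speech_synthesis_task` API is the other route worth considering.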

Chris 1:01
So what would a cloud engineer actually do with this? How does it fit into building applications in the cloud?

Kelly 1:10
That's a great question. There are tons of use cases, but let me give you a couple of examples. Imagine you're building an e-learning platform. You've got all these course materials: text documents, presentations. With Polly you can turn those into audio,

Chris 1:28
so somebody could listen to the content instead of reading it.

Kelly 1:30
Exactly. You could have Polly read the course materials aloud. That makes the content accessible to people who are visually impaired, or to people who just prefer to learn by listening; I actually do a lot of audio learning myself. Another really common use case is interactive voice response (IVR) systems, those automated phone menus that everybody hates. Instead of that robotic voice, you can use Polly to make the prompts sound a lot more natural.

Chris 2:00
Well, that's interesting. So it actually improves the user experience.

Kelly 2:04
Exactly. It makes the experience less jarring for the person who's calling in. And of course, with the rise of voice assistants and smart devices, Polly is really powerful for giving those devices a voice,

Chris 2:16
so they can respond to you.

Kelly 2:19
Yeah, they can interact with you in a way that feels more natural, more human. And these are just a few examples; there are many more ways you can use this.

Chris 2:28
Now that we have a good idea of what Polly is and why it's useful for cloud engineers, let's dig a little deeper. What are some of the key features that really make this service stand out?

Kelly 2:40
Well, one of the things I really love about Polly is the variety of voices. You have this huge library of voices to choose from.

Chris 2:48
So I'm not stuck with just one voice. And I can pick different languages, so I could build an app that speaks Spanish, French, or Japanese?

Kelly 2:57
Or whatever you need.

Chris 2:59
That's pretty cool.

Kelly 3:00
It's all about providing that personalized and localized experience.

Chris 3:04
Yeah, that's super important for applications these days. So we can choose different languages.
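Picking a voice per language is mostly a lookup. Here is a sketch that pages through boto3's `describe_voices` (a real Polly call that accepts `LanguageCode` and returns `Voices` plus an optional `NextToken`) and then filters the results. The helper names are hypothetical, and filtering on `SupportedEngines` and `Gender` is one possible selection rule, not an AWS-prescribed one.

```python
from typing import Iterable, Optional


# Hypothetical helper: collects every voice Polly lists for one language.
def list_voices(polly_client, language_code: str) -> list:
    """Fetch all voices for a language code such as 'es-US', following NextToken pages."""
    kwargs = {"LanguageCode": language_code}
    voices = []
    while True:
        page = polly_client.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        if not page.get("NextToken"):
            return voices
        kwargs["NextToken"] = page["NextToken"]


# Hypothetical helper: narrows the list down by engine and gender.
def pick_voice_ids(voices: Iterable[dict], engine: Optional[str] = None,
                   gender: Optional[str] = None) -> list:
    """Return sorted voice IDs that support `engine` and match `gender`, when given."""
    matches = []
    for voice in voices:
        if engine and engine not in voice.get("SupportedEngines", []):
            continue
        if gender and voice.get("Gender") != gender:
            continue
        matches.append(voice["Id"])
    return sorted(matches)
```

For example, `pick_voice_ids(list_voices(polly, "es-US"), engine="neural")` would list the US Spanish voices that the neural engine supports.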

Kelly 3:09
And it goes beyond just language selection. You can also fine-tune the pronunciation using SSML tags. SSML stands for Speech Synthesis Markup Language, and it lets you control things like emphasis, pauses, speech rate, and volume.
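A small sketch of what those SSML controls look like in practice. It wraps plain sentences in `<speak>`, adds `<break>` pauses between them, sets an overall `<prosody>` rate and volume, and wraps chosen sentences in `<emphasis>`. The helper `build_ssml` is hypothetical. Tag support varies by voice engine (for example, not every tag is honored by neural voices), so check the Polly SSML reference for the voice you use.

```python
from xml.sax.saxutils import escape


# Hypothetical helper that assembles an SSML document from plain sentences.
def build_ssml(sentences, pause_ms: int = 500, rate: str = "medium",
               volume: str = "medium", emphasize=()) -> str:
    """Join sentences with pauses, apply one rate/volume, and emphasize chosen indexes."""
    parts = []
    for index, sentence in enumerate(sentences):
        text = escape(sentence)  # a bare &, < or > would make the SSML invalid
        if index in emphasize:
            text = f'<emphasis level="strong">{text}</emphasis>'
        parts.append(text)
    body = f'<break time="{pause_ms}ms"/>'.join(parts)
    return f'<speak><prosody rate="{rate}" volume="{volume}">{body}</prosody></speak>'
```

The result is sent with `TextType="ssml"`, for example `polly.synthesize_speech(Text=ssml, TextType="ssml", VoiceId="Joanna", OutputFormat="mp3")`.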

Chris 3:29
So I can really customize how the speech sounds and make it sound exactly the way I want. The level of control is pretty impressive. So we've got a huge variety of voices and fine-grained control over pronunciation. Are there any other features that stand out to you?

Kelly 3:45
Yeah, definitely. One thing that gives you a lot of flexibility is the range of audio formats Polly supports. It can generate speech as MP3, which is a very common one, as Ogg Vorbis, and even as PCM.

Chris 4:00
So why does that matter? Why would I care about the format?

Kelly 4:04
Well, it really depends on your use case. Some formats are better for storage efficiency, others are better suited to streaming, and some might be required for compatibility with certain devices.
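One way to encode "it depends on your use case" is a small preset table mapped to Polly's real `OutputFormat` values (`mp3`, `ogg_vorbis`, `pcm`) and `SampleRate` strings. The use-case names and the pairings below are design assumptions for illustration, not AWS recommendations. Polly's PCM output is mono, signed 16-bit samples, which makes playback length simple arithmetic: $\text{seconds} = \text{bytes} / (\text{sample rate} \times 2)$.

```python
# Illustrative presets only: the use-case names and pairings are assumptions.
FORMAT_PRESETS = {
    "storage": {"OutputFormat": "mp3", "SampleRate": "22050"},           # compressed, plays almost anywhere
    "streaming": {"OutputFormat": "ogg_vorbis", "SampleRate": "22050"},  # compressed, open codec
    "telephony": {"OutputFormat": "pcm", "SampleRate": "8000"},          # raw samples for phone systems
}


def output_settings(use_case: str) -> dict:
    """Return a copy of the assumed Polly format settings for a use case."""
    if use_case not in FORMAT_PRESETS:
        raise ValueError(f"unknown use case {use_case!r}; choose from {sorted(FORMAT_PRESETS)}")
    return dict(FORMAT_PRESETS[use_case])


def pcm_duration_seconds(num_bytes: int, sample_rate: int) -> float:
    """Playback length of Polly PCM audio: mono, 16-bit, so 2 bytes per sample."""
    return num_bytes / (sample_rate * 2)
```

For example, 16,000 bytes of 8 kHz PCM is exactly one second of audio. The presets can be merged into a request with `polly.synthesize_speech(Text=..., VoiceId=..., **output_settings("telephony"))`.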

Chris 4:15
So having those different options is really helpful. All right, we've talked about the features. What about the benefits of using Polly from a cost and scalability perspective?

Kelly 4:26
Like most AWS services, Polly is designed to be both cost-effective and highly scalable. It's a pay-as-you-go service, billed by the number of characters you convert, so you only pay for what you use. If you have a month where you're not using Polly much, your bill is going to be very small. That's a lot more attractive than hiring voice actors, which could get really expensive, especially if you need multiple voices or different languages.

Chris 4:53
So Polly is a good way to save money.
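Because billing is per character, a rough monthly estimate is simple arithmetic. In this sketch, the price is a caller-supplied placeholder, not AWS's published rate. Stripping tags before counting reflects our understanding that SSML markup is not billed; verify that on the Polly pricing page. For exact figures, each `synthesize_speech` response reports billed characters in its `RequestCharacters` field. The helper names are hypothetical.

```python
import re

SSML_TAG = re.compile(r"<[^>]+>")


# Hypothetical helper; assumes SSML tags are not billed (verify with AWS pricing docs).
def approx_billed_characters(text: str, is_ssml: bool = False) -> int:
    """Approximate Polly billed characters for one request."""
    return len(SSML_TAG.sub("", text)) if is_ssml else len(text)


# Hypothetical helper; price_per_million_chars is a placeholder you supply.
def estimate_monthly_cost(chars_per_month: int, price_per_million_chars: float) -> float:
    """Pay-as-you-go estimate: characters converted times the per-character rate."""
    return chars_per_month / 1_000_000 * price_per_million_chars
```

With a placeholder rate of 4.0 per million characters, 250,000 characters in a month would come to 1.0 in the same currency. A quiet month scales the bill down proportionally.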

Kelly 4:54
Definitely. And in terms of scalability, it's built on AWS infrastructure, so it can handle a lot.

Chris 5:01
So I don't have to worry about it breaking if my application gets really popular.

Kelly 5:05
No, Polly can scale to meet your needs, although it does have per-account request-rate quotas that are worth checking for high-traffic workloads.

Chris 5:09
That's reassuring. So we've covered the definition, the features, and the benefits. Now I'm curious: how does Polly fit into the big picture of AWS? How does it work with other services?

Kelly 5:22
That's a great question. It's important to understand that Polly isn't just a standalone service; it integrates really well with a lot of other AWS services. For example, say you're using AWS Lambda for serverless computing. You can call Polly directly from a Lambda function to generate speech on the fly,

Chris 5:42
so based on something that happens in my application?

Kelly 5:45
Yes, in response to an event, a user interaction, or something like that. It's very powerful. You can also combine Polly with Amazon S3 to store and retrieve the generated speech files.

Chris 5:58
So I could have, like, a library of pre-recorded audio.
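A library of generated audio in S3 works best when identical requests map to the same object, so nothing gets synthesized twice. This sketch derives a deterministic key from a SHA-256 hash of voice, format, and text, then uploads with boto3's real `put_object` call. The key layout, the `polly-audio/` prefix, and the helper names are hypothetical choices. (Polly's `start_speech_synthesis_task` can also write output straight to a bucket via `OutputS3BucketName`, though it chooses its own object names.)

```python
import hashlib

EXTENSIONS = {"mp3": "mp3", "ogg_vorbis": "ogg", "pcm": "pcm"}
CONTENT_TYPES = {"mp3": "audio/mpeg", "ogg_vorbis": "audio/ogg", "pcm": "audio/pcm"}


# Hypothetical key scheme: same text, voice and format always map to one object.
def audio_key(text: str, voice_id: str, output_format: str = "mp3",
              prefix: str = "polly-audio/") -> str:
    """Build a deterministic S3 key for a piece of synthesized speech."""
    digest = hashlib.sha256(f"{voice_id}|{output_format}|{text}".encode("utf-8")).hexdigest()
    return f"{prefix}{voice_id}/{digest}.{EXTENSIONS[output_format]}"


# Hypothetical helper wrapping the real boto3 S3 put_object call.
def store_audio(s3_client, bucket: str, key: str, audio: bytes,
                output_format: str = "mp3") -> None:
    """Upload synthesized audio to S3 with a matching Content-Type."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=audio,
                         ContentType=CONTENT_TYPES[output_format])
```

Before calling Polly, an application could check whether `audio_key(...)` already exists in the bucket and serve the stored file instead.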

Kelly 6:01
Exactly, and you can manage it all in S3. And then there's AWS IoT for Internet of Things devices; you can use Polly to give those devices a voice.

Chris 6:13
So my smart refrigerator could actually talk to me and tell me when I'm out of milk? That's awesome. So it sounds like Polly can be used in a lot of different ways.

Kelly 6:19
Yeah, the possibilities are really endless.

Chris 6:24
I'm really starting to see the potential here. But before we move on, I think it's important to acknowledge that no technology is perfect. Are there any limitations of Amazon Polly that we should be aware of?

Kelly 6:38
Yeah. Like any text-to-speech engine, Polly does have some limitations. One thing to keep in mind is that while the voices keep getting more realistic, they can sometimes struggle with complex text or unusual language patterns,

Chris 6:55
like technical jargon or slang? So I might need to review the output carefully.

Kelly 7:01
Especially if you're working with content that's a little outside of Polly's comfort zone. Another limitation is that there might be a slight delay in the speech generation.

Chris 7:09
Okay, so it's not truly real time.

Kelly 7:13
It's near real time, but there could be a slight lag.

Chris 7:17
So if I'm building an application where speed is critical, I need to be aware of that.
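One common way to reduce perceived lag is to start playback as soon as the first bytes arrive, rather than waiting for the whole response. This sketch forwards an audio stream to a callback in fixed-size chunks. It relies only on `read(n)`, which botocore's `StreamingBody` (the type of Polly's `AudioStream`) supports, as does any binary file object. The `pump` name and the player you would pass as `sink` are hypothetical.

```python
from typing import BinaryIO, Callable


# Hypothetical helper: forwards audio chunk by chunk so playback can begin early.
def pump(stream: BinaryIO, sink: Callable[[bytes], None], chunk_size: int = 4096) -> int:
    """Copy `stream` into `sink` in chunks; returns the total bytes forwarded.

    `stream` can be response["AudioStream"] from synthesize_speech or any
    binary file-like object that supports read(n).
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        sink(chunk)
        total += len(chunk)
```

For example, `pump(polly.synthesize_speech(Text=text, VoiceId="Joanna", OutputFormat="pcm", SampleRate="16000")["AudioStream"], player.write)`, where `player` stands in for whatever audio output the application uses.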

Kelly 7:21
Exactly. And while Polly's voices are constantly evolving, they still might not have the nuance and expressiveness of a human voice actor. So you need to choose the right voice and carefully craft your text to get the desired effect.

Chris 7:39
These are all great points to keep in mind. I think we've laid a really solid foundation for understanding Amazon Polly, so let's shift gears a little and talk about the exam. What kind of questions might you see about Amazon Polly on a cloud certification exam?

Kelly 7:54
All right, let's get into exam mode. Say you're taking the exam and you come across a question like this: a company is building a mobile app and wants to use Polly to generate audio feedback for users. Which of the following factors should they consider when choosing the right Polly voice?

Chris 8:14
So this is about selecting the best voice for the app.

Kelly 8:17
Exactly. They need to think about who their target audience is. Are they targeting a specific age group or demographic? And what's the overall tone and style of the app?

Chris 8:29
Like, is it a serious app, or is it more playful?

Kelly 8:32
Right. Is it formal or informal?

Chris 8:35
And how would those factors influence the choice of voice?

Kelly 8:39
Well, you wouldn't want to use a very formal-sounding voice for a casual gaming app;

Chris 8:44
that would sound weird.

Kelly 8:46
Yeah, and you probably wouldn't want a voice that sounds like a child if you're building an app for business professionals.

Chris 8:51
So it's about matching the voice to the brand and the audience. That makes sense. What other types of exam questions might we see?

Kelly 8:59
You might get a question about Polly's integration with other services, something like this: a developer wants to create a system that automatically generates audio versions of news articles as they're published. Which AWS services could be used to achieve this?

Chris 9:17
So we need to think about how to automate this process,

Kelly 9:20
right, and how to connect different services together.

Chris 9:23
Well, I know we talked about how Polly works with Lambda, so maybe we could use Lambda to trigger the speech generation.

Kelly 9:30
That's a good start. But what would trigger the Lambda function?

Chris 9:33
Oh, right. We need a way to know when a new article is published. So maybe we could use SNS, Amazon Simple Notification Service.

Kelly 9:41
I see. So the news website could publish a message to SNS

Chris 9:46
whenever a new article is ready, and that message would trigger the Lambda function, which would then call the Polly API to generate the audio version of the article.

Kelly 9:55
Exactly. It's a great example of how you can use different AWS services together to build a powerful solution, and it shows that you understand how Polly fits into the bigger picture.
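The SNS to Lambda to Polly pipeline from this exam scenario could look roughly like the handler below. The SNS event shape (`Records[].Sns.Message`) is what Lambda delivers for SNS triggers. The JSON message format (`article_id`, `text`), the `AUDIO_BUCKET` environment variable, the S3 key layout, and the choice to store results in S3 are all hypothetical. Articles longer than the synchronous request limit would need chunking or `start_speech_synthesis_task` instead.

```python
import json
import os


# Hypothetical message format: each SNS body is JSON like {"article_id": "...", "text": "..."}.
def articles_from_sns_event(event: dict) -> list:
    """Pull article payloads out of an SNS-triggered Lambda event."""
    articles = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        articles.append({"article_id": message["article_id"], "text": message["text"]})
    return articles


def handler(event, context, polly=None, s3=None):
    """Lambda entry point: synthesize each published article and store the MP3 in S3."""
    if polly is None or s3 is None:
        import boto3  # imported lazily so the parsing helper can be tested offline
        polly = polly or boto3.client("polly")
        s3 = s3 or boto3.client("s3")
    bucket = os.environ["AUDIO_BUCKET"]  # hypothetical environment variable
    written = []
    for article in articles_from_sns_event(event):
        response = polly.synthesize_speech(
            Text=article["text"], VoiceId="Joanna", OutputFormat="mp3")
        key = f"articles/{article['article_id']}.mp3"  # hypothetical key layout
        s3.put_object(Bucket=bucket, Key=key,
                      Body=response["AudioStream"].read(), ContentType="audio/mpeg")
        written.append(key)
    return {"written": written}
```

Lambda calls `handler(event, context)`; the optional `polly` and `s3` parameters only exist so the function can be exercised with stand-in clients.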

Chris 10:06
Okay, I like it. What else might they ask us about Polly?

Kelly 10:10
They might give you a scenario where a company is having trouble with the quality of the speech output and ask what steps the company can take to improve it.

Chris 10:21
So this is about troubleshooting. Well, one thing that comes to mind is checking the SSML tags. Maybe they're using the wrong tags or not using them effectively.

Kelly 10:31
Exactly. SSML gives you a lot of control over pronunciation and prosody,

Chris 10:37
so if the speech sounds unnatural, it could be an SSML issue.

Kelly 10:41
They could also try experimenting with different voices. Like we talked about earlier, some voices are better suited for certain types of content, and sometimes just switching to a different voice can make a big difference.
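When troubleshooting SSML, a cheap first step is a local pre-flight check before anything is sent. Polly rejects malformed SSML with an error, but catching obvious mistakes early is faster. This hypothetical `ssml_problems` helper is not a full SSML validator. It only checks that the markup is well-formed XML, that the root element is `<speak>`, and that `<break time="...">` values look like `300ms` or `1s`.

```python
import re
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}speak'."""
    return tag.rsplit("}", 1)[-1]


# Hypothetical pre-flight check; a few structural rules only, not full SSML validation.
def ssml_problems(ssml: str) -> list:
    """Return obvious structural problems in an SSML string (empty list if none)."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if _local_name(root.tag) != "speak":
        problems.append(f"root element is <{_local_name(root.tag)}>, expected <speak>")
    for element in root.iter():
        if _local_name(element.tag) == "break":
            time = element.get("time")
            if time is not None and not re.fullmatch(r"\d+(\.\d+)?m?s", time):
                problems.append(f"break time {time!r} should look like '300ms' or '1s'")
    return problems
```

A frequent culprit is an unescaped `&` in the text, which makes the whole document invalid XML.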

Chris 10:53
And what if the problem is with the text itself?

Kelly 10:56
That's another possibility. Maybe the text is too complex, or it contains a lot of jargon or abbreviations that Polly has trouble interpreting. In those cases, they might need to simplify the text

Chris 11:08
or provide Polly with some additional guidance, like a custom pronunciation lexicon?

Kelly 11:12
Exactly, so Polly knows how to pronounce those unusual words.
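Custom lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format and uploaded with Polly's `put_lexicon` API. This sketch builds a PLS document that maps written terms to spoken aliases; the helper name and example terms are hypothetical. The lexicon's `xml:lang` should match the language of the voice that will use it.

```python
from xml.sax.saxutils import escape

PLS_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<lexicon version="1.0" '
    'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
    'alphabet="ipa" xml:lang="{lang}">\n'
)


# Hypothetical helper: each key is how a term is written, each value how it should be said.
def build_pls_lexicon(aliases: dict, lang: str = "en-US") -> str:
    """Build a PLS lexicon that tells Polly to say each written term as its alias."""
    lexemes = "".join(
        f"  <lexeme><grapheme>{escape(term)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>\n"
        for term, spoken in aliases.items()
    )
    return PLS_HEADER.format(lang=lang) + lexemes + "</lexicon>\n"
```

Upload it with `polly.put_lexicon(Name="techterms", Content=build_pls_lexicon({"W3C": "World Wide Web Consortium"}))`, then pass `LexiconNames=["techterms"]` to `synthesize_speech`. The name `techterms` is only an example.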

Chris 11:17
It's all about making sure the input is something Polly can handle.

Kelly 11:21
That makes sense. Are there any other general exam tips you can share?

Chris 11:26
Yeah, I think one of the most important things is to focus on understanding the use cases. When would you use Polly versus some other service?

Kelly 11:36
The right tool for the job, exactly.

Chris 11:38
And don't get bogged down trying to memorize every single detail,

Kelly 11:42
like all the different voices and formats. You don't need to know all of that. Focus on the big picture: the concepts, the why and the how. I'm glad this has been helpful, and hopefully you feel much more prepared to tackle those exam questions now; that's the goal. All right, so we've spent a good amount of time really digging into Amazon Polly,

Chris 12:03
and we covered a lot, from the basics all the way to those tricky exam questions.

Kelly 12:07
Hopefully you're feeling a lot more confident about Polly now. But before we let you go, I always like to end these deep dives with something to think about. Let's get philosophical. We've talked a lot about the technical side of Polly, all the nuts and bolts. But as this technology keeps getting better,

Chris 12:26
and the voices become so realistic, it really makes you wonder about the bigger picture.

Kelly 12:30
What are the implications of this?

Chris 12:32
Yeah. Are we headed toward a world where we can't even tell the difference

Kelly 12:34
between a real voice and a synthetic voice? It's a little mind-blowing, and I think it raises some important ethical questions too.

Chris 12:43
How do we use this technology responsibly? What are the potential downsides? It's something we need to think carefully about.

Kelly 12:49
Absolutely. So to everyone listening out there,

Chris 12:53
keep those questions in mind

Kelly 12:55
as you explore Amazon Polly. Experiment with it. Think about all the possibilities and the challenges, because this technology is gonna keep evolving, and it's gonna have a huge impact on the future. Well, that's all the time we have for today. Thanks for joining us.

Chris 13:09
We hope you learned a lot about Amazon Polly,

Kelly 13:12
and we'll see you next time for another deep dive.
