Ep. 131 Bonus Ep. 2 | VPC Masterclass: Networking Demystified for the Solutions Architect Exam
Kelly 0:00
If you're building anything serious in the cloud, you know this challenge: keeping up with AWS services, especially networking. It moves so fast. It's like this invisible backbone holding everything together. But honestly, it's often where things get tricky, where you can get tripped up. So today, we're gonna try and cut through some of that complexity. Think of this as a focused deep dive, just for you, the mid-level cloud engineer listening. We're hitting Amazon Virtual Private Cloud (VPC) networking essentials, but not just definitions. We want to see how these things actually work in the real world.
Chris 0:35
Exactly. The goal here isn't just to rehash the docs. It's about giving you a shortcut, maybe some surprising facts, those crucial distinctions you need. We want you to feel confident designing, troubleshooting, you know, optimising these cloud networks. It's about really understanding how it all fits together.
Kelly 0:49
Okay, so what's the plan? How are we tackling this? We'll start with the core VPC bits, the building blocks. Then we'll get more hands-on, looking at security, hybrid networking, architectures, critical stuff. And finally, we'll wrap up with some exam-focused points, because understanding this deeply, that's key for certs and for actually building solid systems. All right, let's dive in. AWS networking: it all starts with your VPC, your virtual private cloud, this idea of your own virtual data centre. What does that actually mean day to day?
Chris 1:25
Well, the fundamental thing, and it sounds basic, but it's powerful, is that resources inside your VPC are private by default, completely isolated. It's not about making them private. It's about you controlling how, or even if, they talk to the outside world. You have total network control from the start.
Kelly 1:39
That inherent privacy, yeah, that's a huge starting point. So building on that foundation, let's get into the nuts and bolts, the building blocks. CIDR blocks first. We know this is the main IP range for the whole VPC, but the strategic part, that's where people sometimes miss the mark early on.
Chris 1:52
Absolutely. Choosing your VPC's main CIDR block isn't just about how many IPs you think you need now. It's a long-term decision. Think about future scaling, maybe merging networks if your company acquires another, connecting to on-prem. Choosing too small a range, like a /24 for a whole VPC, can cause massive headaches later.
Kelly 2:12
Oh yeah. Re-IPing projects are not fun. Nobody wants to explain that. And we're usually using those private ranges, right? Like 10.x or 172.16?
Chris 2:20
Exactly. 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, those RFC 1918 ranges are what you should be using internally. Plan carefully upfront.
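To make the sizing point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The candidate blocks (`10.0.0.0/24`, `10.0.0.0/16`) and the helper names `is_rfc1918` and `overlaps` are illustrative assumptions, not anything from the episode. It compares how many addresses a /24 and a /16 hold, carves the /16 into /20 subnets, and checks whether two ranges overlap, which is the thing that bites when networks get merged later.

```python
import ipaddress

# The three RFC 1918 private ranges named in the discussion.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(cidr: str) -> bool:
    """Return True if the whole block sits inside one RFC 1918 range."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(private) for private in RFC1918)


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """Return True if two blocks share any address (a problem when merging)."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical candidate VPC ranges, chosen only for illustration.
small_vpc = ipaddress.ip_network("10.0.0.0/24")
large_vpc = ipaddress.ip_network("10.0.0.0/16")

print(small_vpc.num_addresses)  # 256 addresses for the entire VPC
print(large_vpc.num_addresses)  # 65536 addresses

# Carve the /16 into /20 subnets, for example one per tier per AZ.
subnets = list(large_vpc.subnets(new_prefix=20))
print(len(subnets), subnets[0], subnets[1])
```

Running an overlap check like this against on-prem ranges and any other VPCs before committing to a block is a cheap way to avoid a re-IP project later.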
Kelly 2:34
And here, the key distinction is public versus private subnets. It sounds obvious, but the implications are huge for security. Public subnets have a route out to the internet, usually via an Internet Gateway. They're for things that need direct internet access, like web servers. Private subnets, though, they're isolated by default, perfect for back-end services, databases, stuff you definitely don't want exposed directly. This is how you build tiered architectures. Makes sense. And to get that internet connection for the public subnets, you need that bridge. For IPv4, that's the Internet Gateway, the IGW. Pretty straightforward. But what about IPv6? That's becoming more common, right?
Chris 3:09
For IPv6, if you need instances to reach the internet, but you don't want the internet reaching into them unsolicited, you use an egress-only Internet Gateway. It's specifically for outbound-only IPv6 traffic. Think updates, patching, fetching data, but no inbound connections allowed. It's a security feature for IPv6.
Kelly 3:28
Got it. One-way street out for IPv6. So we have IPs, subnets, gateways. How does traffic know where to go? That's route tables, right? The VPC's GPS.
Chris 3:37
Exactly. Every subnet has to be associated with a route table, no exceptions. And these tables contain the rules. Does traffic stay local? Does it go to the IGW for internet access? Does it head towards a VPN gateway or maybe a transit gateway? The route table decides.
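As a rough illustration of how a route table "decides", the sketch below models a route table as a plain dictionary and picks the most specific matching route (longest prefix), which is how VPC route tables resolve overlapping entries. The function name `pick_route`, the target label `igw-example` and the CIDRs are all hypothetical; this is a toy model, not an AWS API.

```python
import ipaddress


def pick_route(route_table: dict, destination_ip: str):
    """Return the target of the most specific route matching the destination.

    route_table maps destination CIDR -> target label. Returns None when
    no route matches, meaning the traffic has nowhere to go.
    """
    dest = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # Longest prefix wins: a /16 "local" route beats the /0 default route.
    return max(matches, key=lambda match: match[0])[1]


# Hypothetical route tables for a VPC with CIDR 10.0.0.0/16.
public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
private_rt = {"10.0.0.0/16": "local"}

print(pick_route(public_rt, "10.0.3.7"))       # stays inside the VPC
print(pick_route(public_rt, "203.0.113.10"))   # heads to the internet gateway
print(pick_route(private_rt, "203.0.113.10"))  # no route, so no internet path
```

The only difference between the "public" and "private" subnet here is the presence of the default route, which matches the distinction drawn in the conversation.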
Kelly 3:50
Okay, so CIDR blocks for the overall IP space, subnets to segment public and private, gateways to connect out, and route tables to direct the traffic. That's the core. When you put it all together, you see the power. But where do people underutilise this beyond just basic isolation?
Chris 4:07
Well, it's the foundation for really secure and scalable designs. Micro-segmentation is a big one: separating your web tier, app tier, data tier into different subnets with specific rules. And it's absolutely essential for multi-account strategies. In AWS Organizations, you use VPCs to keep workloads separate, enforce governance, limit blast radius. It's strategic network design.
Kelly 4:27
Okay, foundation laid. Let's get more practical now, our hands-on deep dive, starting with security at the network edge: security groups versus network ACLs. Huge topic. Let's start with security groups, SGs. I always think of these as the firewall right on the instance, the virtual bouncer for your EC2 or RDS instance.
Chris 4:44
That's a good analogy. And the key feature: they are stateful. This is critical. If you allow an outbound connection, say port 80 to the web, the return traffic is automatically allowed back in. You don't need a specific inbound rule for the response. They also only support allow rules. Anything you don't explicitly allow is denied by default.
Kelly 5:05
And you can use other security group IDs as sources or destinations, right? That's pretty neat for tier-to-tier communication.
Chris 5:11
Exactly. Instead of managing lists of IP addresses for your app servers, your web server's SG can just allow traffic to the app server's SG. Very elegant, very dynamic.
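The stateful, allow-only behaviour and the group-to-group referencing can be sketched with a toy model. Everything here is hypothetical (the `SecurityGroup` class, the IDs `sg-web` and `sg-app`, port 8080); it is a simplified illustration of the idea rather than how AWS implements connection tracking.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Toy model of a security group: allow rules only, stateful."""
    group_id: str
    inbound_allow: set = field(default_factory=set)   # {(port, source_group_id)}
    outbound_allow: set = field(default_factory=set)  # {port}
    tracked: set = field(default_factory=set)         # flows this side initiated

    def allows_inbound(self, port: int, source_group_id: str) -> bool:
        # Anything not explicitly allowed is denied; there is no deny rule.
        return (port, source_group_id) in self.inbound_allow

    def send(self, port: int, peer: str) -> bool:
        """Initiate an outbound connection and remember it if allowed."""
        if port in self.outbound_allow:
            self.tracked.add((port, peer))
            return True
        return False

    def allows_reply(self, port: int, peer: str) -> bool:
        """Return traffic for a tracked flow is allowed with no inbound rule."""
        return (port, peer) in self.tracked


# Tier-to-tier rule: the app tier trusts the web tier's group, not IP lists.
web_sg = SecurityGroup("sg-web", outbound_allow={80})
app_sg = SecurityGroup("sg-app", inbound_allow={(8080, "sg-web")})

print(app_sg.allows_inbound(8080, "sg-web"))    # allowed by group reference
print(app_sg.allows_inbound(8080, "sg-other"))  # denied by default
print(web_sg.send(80, "198.51.100.5"))          # outbound allowed and tracked
print(web_sg.allows_reply(80, "198.51.100.5"))  # reply allowed, no inbound rule
```

Because the rule references a group ID, instances can join or leave the web tier without anyone editing the app tier's rules.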
Kelly 5:22
So: stateful, allow-only, instance-level, like a personal bouncer. Okay. So then what about network ACLs, or NACLs? How do they compare?
Chris 5:29
If SGs are the bouncers at the instance's door, NACLs are like the main gate security for the entire neighbourhood, or, in this case, the entire subnet. They operate at the subnet boundary.
Kelly 5:38
Okay, so a broader scope. What's the fundamental difference in how they work compared to SGs?
Chris 5:43
The biggest difference: NACLs are stateless. They don't track connections. If you allow inbound traffic on port 80, you also need an explicit outbound rule to allow the return traffic on the ephemeral ports. Both directions need rules. And unlike SGs, NACLs support both allow and deny rules, so you can explicitly block certain IP addresses or ranges at the subnet level. And the rule order matters for NACLs, doesn't it? Critically. NACLs process rules in numerical order, starting with the lowest number. The first rule that matches the traffic is applied, and that's it. So if a deny rule matches before an allow rule, the traffic is blocked. Order is everything.
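The first-match, lowest-number-first evaluation and the need for rules in both directions can be sketched like this. The rule tuples, the function name `evaluate_nacl` and the ephemeral range 1024-65535 are assumptions made for the example; traffic that matches no rule falls through to an implicit deny.

```python
import ipaddress


def evaluate_nacl(rules: list, peer_ip: str, port: int) -> str:
    """Evaluate one direction of a stateless network ACL.

    rules: list of (rule_number, cidr, (low_port, high_port), action).
    The lowest-numbered matching rule wins and evaluation stops there.
    """
    peer = ipaddress.ip_address(peer_ip)
    for _number, cidr, (low, high), action in sorted(rules, key=lambda r: r[0]):
        if peer in ipaddress.ip_network(cidr) and low <= port <= high:
            return action
    return "deny"  # nothing matched: implicit deny


# Hypothetical rules: block one range first, then allow web traffic.
inbound = [
    (100, "0.0.0.0/0", (80, 80), "allow"),
    (90, "203.0.113.0/24", (0, 65535), "deny"),
]
# Stateless: replies to clients need their own outbound rule.
outbound = [(100, "0.0.0.0/0", (1024, 65535), "allow")]

print(evaluate_nacl(inbound, "203.0.113.9", 80))         # rule 90 hits first
print(evaluate_nacl(inbound, "198.51.100.7", 80))        # rule 100 allows
print(evaluate_nacl(outbound, "198.51.100.7", 50000))    # reply can leave
print(evaluate_nacl([], "198.51.100.7", 50000))          # no outbound rule
```

Swapping the numbers on the two inbound rules would let the blocked range through on port 80, which is the ordering trap described in the conversation.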
Kelly 6:20
Stateful versus stateless, allow-only versus allow/deny, instance versus subnet. Got it, crucial distinctions. Okay, let's move to connecting outside AWS: hybrid networking. Many places still have on-prem data centres. First up, VPN connections. These are the encrypted tunnels, usually over the public internet, linking your office or data centre to your VPC.
Chris 6:39
Yep, typically using IPsec for encryption. You've got choices for routing too. BGP, Border Gateway Protocol, is dynamic and scales better for complex setups. Or you can use static routes, which are simpler for basic connections but harder to manage at scale. These VPNs connect to what's called a virtual private gateway, or VPG, attached to your VPC. It's reliable, secure, but performance can fluctuate because, well, it's the internet.
Kelly 7:05
Right. For more demanding needs, there's AWS Direct Connect, or DX. This is a big step up. It's a dedicated, private fibre connection from your facility straight into an AWS network location.
Chris 7:16
And the benefits are significant, precisely because it avoids the public internet. You generally get much higher, more consistent bandwidth, lower latency, predictable performance. It's ideal for heavy data transfer, real-time applications, things like that.
Kelly 7:29
What about connecting it up? You mentioned VIFs, right? Virtual interfaces.
Chris 7:33
You'll typically use a private VIF to connect directly to your VPC via that virtual private gateway we mentioned. There's also a public VIF for accessing AWS public services like S3 or Glacier directly without going through your VPC. And a transit VIF connects to a transit gateway.
Kelly 7:51
Can you encrypt over Direct Connect? I mean, it's private, but still.
Chris 7:55
Good question. Yes, you absolutely can. Even though it's a dedicated circuit, you can layer IPsec VPNs over it, or use MACsec, which encrypts at layer 2, for added security if needed.
Kelly 8:04
Okay, and you mentioned transit gateway, TGW. This sounds like the solution for when you have lots of VPCs, maybe multiple accounts, maybe VPNs and Direct Connect, all needing to talk. The central hub.
Chris 8:15
It really is a game changer for complex networks. Transit gateway acts as a cloud router. You connect your VPCs, your VPNs, your Direct Connects all to the TGW, and the magic is transitive routing. Everything connected to the TGW can potentially talk to everything else connected to it, through the TGW, based on the routing rules you set up there. No more messy VPC peering meshes. It simplifies management massively as you scale.
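A quick back-of-the-envelope calculation shows why the hub model scales better. VPC peering is not transitive, so full connectivity between n VPCs needs a peering connection for every pair, while a hub needs one attachment per VPC. The function names below are made up for the sketch.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed so every VPC can reach every other one."""
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count: int) -> int:
    """Attachments needed when every VPC connects to one central hub."""
    return vpc_count


for n in (5, 10, 20):
    print(n, full_mesh_peerings(n), transit_gateway_attachments(n))
```

The mesh grows quadratically and the hub grows linearly, before even counting the route table entries each peering adds.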
Kelly 8:38
Yeah, managing dozens of peering connections sounds like a nightmare. TGW makes sense. Okay, shifting slightly. Let's cover some common architectural patterns within a VPC. First, the NAT gateway. This is for when instances in your private subnets need to get out to the internet, maybe for updates, but you don't want the internet getting into them.
Chris 8:56
Exactly. And a key point: the NAT gateway itself must live in a public subnet. It needs a route to the Internet Gateway to do its job. Also be mindful of cost. There's an hourly charge plus data processing fees. And crucially, for high availability...
Kelly 9:11
Ah, yes, HA. You need one per AZ, right?
Chris 9:15
You absolutely do. For true fault tolerance, you need to deploy a NAT gateway in each Availability Zone where you have private instances needing outbound access, and configure your private subnet route tables in that AZ to point to the NAT gateway in the same AZ. They don't automatically fail over across AZs. That catches people out.
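The per-AZ pairing can be expressed as a small planning check. This sketch assumes hypothetical subnet IDs, AZ names and NAT gateway IDs, and a made-up helper `plan_default_routes`; it builds the default route for each private subnet from the NAT gateway in the same AZ and reports any AZ that has private subnets but no NAT gateway.

```python
def plan_default_routes(private_subnets: dict, nat_by_az: dict):
    """Pair each private subnet with the NAT gateway in its own AZ.

    private_subnets maps subnet ID -> AZ; nat_by_az maps AZ -> NAT gateway ID.
    Returns (routes, missing_azs): routes maps subnet ID -> route entries,
    missing_azs lists AZs that have private subnets but no NAT gateway.
    """
    routes = {}
    missing = set()
    for subnet_id, az in private_subnets.items():
        nat_id = nat_by_az.get(az)
        if nat_id is None:
            missing.add(az)  # no same-AZ NAT gateway: an HA gap
        else:
            routes[subnet_id] = {"0.0.0.0/0": nat_id}
    return routes, sorted(missing)


# Hypothetical layout: two AZs with private subnets, one NAT gateway so far.
subnets = {"subnet-app-a": "us-east-1a", "subnet-app-b": "us-east-1b"}
nats = {"us-east-1a": "nat-a"}

routes, missing_azs = plan_default_routes(subnets, nats)
print(routes)       # only the us-east-1a subnet gets a default route
print(missing_azs)  # AZs still needing their own NAT gateway
```

Pointing the second subnet at `nat-a` would work day to day, but an outage in the first AZ would then take out internet access for both.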
Kelly 9:31
Good tip. Next pattern: the bastion host, or jump box, the secure entry point.
Chris 9:36
Yep, it's typically an EC2 instance sitting in a public subnet. Its only job is to provide a hardened, controlled point for administrators to SSH or RDP into, and from there, jump into instances in the private subnets. The absolutely critical thing here is locking down the bastion's security group: only allow SSH/RDP from specific trusted corporate IP addresses. Minimise its attack surface.
Kelly 10:00
Makes sense. Tighten that security. Okay, another important one: VPC endpoints. These let you talk to AWS services privately.
Chris 10:07
That's right. Privately connecting your VPC to services like S3, DynamoDB, Kinesis, EC2 APIs, et cetera, without your traffic ever having to go out over the public internet or even through a NAT gateway. It stays on the AWS private network.
Kelly 10:20
And there are two types, I remember: gateway and interface.
Chris 10:24
Correct. Gateway endpoints are the older type, specifically for S3 and DynamoDB. They work by adding a route, with the service's prefix list as the destination, directly to your subnet's route table. The big advantage: there's no extra charge for using gateway endpoints. Interface endpoints, powered by AWS PrivateLink, are for most other services. They actually place an elastic network interface, an ENI, with a private IP address from your subnet directly inside your VPC. You then communicate with the service via that private IP.
Kelly 10:53
So better security, stays off the internet, and potentially saves on data transfer costs, especially with gateway endpoints for S3 and DynamoDB in the same region.
Chris 11:01
Exactly. Big benefits, especially for security-conscious workloads.
Kelly 11:05
All right. Last pattern for this section: Elastic Load Balancing, ELB, within the VPC. Distributing traffic.
Chris 11:10
Essential for availability and scale. ELBs spread incoming traffic across multiple targets, EC2 instances, containers, IPs, often across multiple Availability Zones.
Kelly 11:21
And the main types we usually deal with are ALB and NLB. Those are the workhorses.
Chris 11:24
Application Load Balancer, ALB, is layer 7, perfect for HTTP, HTTPS. It's smart about routing based on URL paths, host names, etc. Think web apps. Network Load Balancer, NLB, is layer 4, built for raw speed and ultra-low latency with TCP, UDP and TLS traffic. Handles millions of requests per second.
Kelly 11:43
And NLBs have that cool ability to target IPs outside the VPC too, right? Like on-prem via DX?
Chris 11:49
Exactly. NLBs can target IPs anywhere reachable from the VPC, including on premises via Direct Connect or VPN, which is powerful for hybrid setups. And both ALB and NLB use health checks constantly to make sure they only send traffic to healthy targets.
Kelly 12:03
Okay, fantastic overview of those patterns. Now let's pivot to exam prep focus, sharpening up on key distinctions and tricky areas. First, let's hammer this home again: security groups versus network ACLs. Instance level, stateful, allow-only for SGs. Subnet level, stateless, allow and deny, order matters for NACLs. You have to know this cold.
Chris 12:20
Absolutely. And the next common confusion point: when to use a NAT gateway versus a VPC endpoint. Remember, NAT gateway is for general outbound internet access from private subnets. VPC endpoints are for private access, specifically to AWS services like S3, DynamoDB, APIs. Think security, performance and maybe cost, especially gateway endpoints being free for S3 and DynamoDB.
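The decision rule described here is small enough to write down. This is a simplified sketch that follows the episode's framing (gateway endpoints for S3 and DynamoDB, interface endpoints for other AWS services, NAT gateway for everything else on the internet); the function name, the service labels and the return strings are illustrative assumptions.

```python
# Services the discussion associates with gateway endpoints.
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}


def choose_egress_path(destination: str, is_aws_service: bool) -> str:
    """Pick how a private-subnet instance should reach a destination."""
    if not is_aws_service:
        # General internet traffic (OS updates, third-party APIs).
        return "nat-gateway"
    if destination.lower() in GATEWAY_ENDPOINT_SERVICES:
        # Route-table based, no extra endpoint charge.
        return "gateway-endpoint"
    # Other AWS services: an ENI with a private IP in your subnet.
    return "interface-endpoint"


print(choose_egress_path("s3", True))
print(choose_egress_path("kinesis", True))
print(choose_egress_path("example.com", False))
```

A workload that only talks to AWS services through endpoints may not need a NAT gateway at all.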
Kelly 12:44
Right. And for hybrid, the quick trade-off: VPN is cost-effective and uses the internet. Direct Connect is high-performance, dedicated, private. Choose based on bandwidth, latency and budget needs.
Chris 12:54
Now some architectural gotchas that often appear. High availability for NAT gateway: we said it before, but it's critical. Deploy one per AZ where you need it. No automatic cross-AZ failover. Also, slightly related, but important for Docker: EBS volumes are AZ-locked. To move data, you snapshot the EBS volume, then restore that snapshot into a new volume in your target AZ.
Kelly 13:14
That's the pattern. And how does DNS play into this, like Route 53?
Chris 13:18
Oh, good point. Route 53 health checks are powerful. They can monitor instances or endpoints inside your VPC. Then you can configure Route 53 routing policies, like failover routing or latency-based routing, to use those health checks. So if an instance in one AZ fails, Route 53 automatically stops sending traffic there and directs users to healthy instances, maybe in another AZ or region. It ties networking and DNS together for resilience.
Unknown Speaker 13:44
That makes a lot of sense.
Chris 13:45
Okay, final exam prep bit: troubleshooting. This is real world too. Let's say you deploy an EC2 instance, maybe in a public subnet, and you just can't reach it from the internet. Or maybe an instance in a private subnet can't reach out for updates. What's the checklist? Okay, systematic approach. First, if it's supposed to be reachable from the internet, does it actually have a public IP address or an Elastic IP associated with it? Easy to forget. Second, check the route table for the instance's subnet. Is there a 0.0.0.0/0 route pointing to the internet gateway for public, or to the NAT gateway for private? Outbound traffic needs a path. Third, inspect the security group attached to the instance. Remember, stateful. Did you allow the inbound traffic, for example port 80 for web? And sometimes people forget to check outbound rules too. Does it need to initiate connections? Fourth, check the network ACLs associated with the subnet. Remember, stateless. Do you have both an inbound rule allowing the traffic and an outbound rule allowing the return traffic? And crucially, check the rule numbers. Is an earlier deny rule blocking it?
Kelly 14:44
Public IP, route table, security group, network ACL. Hit those four systematically. That covers most common connectivity issues.
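The four-step checklist can be sketched as an ordered series of checks that stops at the first failure. The dictionary keys and the function name `diagnose` are hypothetical; in practice each flag would come from inspecting the instance, its subnet's route table, its security group and the subnet's network ACL.

```python
def diagnose(instance: dict) -> str:
    """Walk the checklist in order and report the first failing check."""
    checks = [
        ("has_public_ip", "no public or Elastic IP"),
        ("has_default_route", "no 0.0.0.0/0 route in the subnet's route table"),
        ("sg_allows_inbound", "security group does not allow the inbound port"),
        ("nacl_allows_inbound", "network ACL blocks the inbound traffic"),
        ("nacl_allows_outbound", "network ACL blocks the return traffic"),
    ]
    for key, problem in checks:
        # A missing key is treated as a failed check, to stay on the safe side.
        if not instance.get(key, False):
            return problem
    return "all checks pass"


# Hypothetical web server: everything is fine except the stateless
# network ACL, which has no outbound rule for return traffic.
web_server = {
    "has_public_ip": True,
    "has_default_route": True,
    "sg_allows_inbound": True,
    "nacl_allows_inbound": True,
    "nacl_allows_outbound": False,
}
print(diagnose(web_server))
```

Keeping the order fixed mirrors the path a packet takes, so the first failure reported is usually the first thing worth fixing.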
Chris 14:51
Wow. Okay, we've covered a lot of ground, but it really drives home how understanding VPC networking is just foundational. You can't build robust, secure, cost-effective cloud solutions without getting this right. Absolutely. We hope this deep dive has helped build a stronger mental model for you, moving beyond just knowing service names to really grasping how they interoperate, why certain choices matter. That understanding is gold for certifications and definitely for real-world builds.
Kelly 15:17
So here's something to think about. As cloud engineers, we are literally designing these invisible highways and structures within the cloud. How does truly mastering VPCs, understanding the traffic flow, the security layers, the connectivity options, empower you to build beyond just the basic requirements? How can you use this knowledge to design systems that are not only resilient and secure, but maybe even innovative in ways only possible when you really command the network layer? We definitely encourage you to apply these concepts. Get hands-on in your own AWS account. Maybe try breaking and fixing things. See how mastering VPC really does open up more possibilities for complex and interesting cloud projects.
