Welcome everyone! Today I'm going to walk you through Azure Network Watcher, which is Microsoft's comprehensive network monitoring and diagnostic service.
Think of Network Watcher as your network detective - it helps you understand what's happening in your Azure network infrastructure, diagnose connectivity issues, and monitor performance in real-time.
By the end of this presentation, you'll know how to set up monitoring, configure diagnostics, troubleshoot network problems, and optimize your network performance using Network Watcher's powerful tools.
Let me start by showing you how Network Watcher fits into your Azure environment. This architecture diagram illustrates the complete ecosystem.
Network Watcher sits at the center, connecting to various diagnostic tools like Connection Troubleshoot, Next Hop analysis, and Packet Capture. Notice how it integrates with Log Analytics for data analysis and Storage Accounts for raw data storage.
The beauty of this architecture is that it provides both real-time diagnostics and historical analysis. Your VMs and network security groups feed data into the system, while tools like Traffic Analytics provide insights back to you.
This integrated approach means you get a complete picture of your network health from multiple perspectives.
Before we dive into the cool features, let's get Network Watcher properly set up. The first step is enabling Network Watcher in your region.
Here's something important to know - Network Watcher is automatically enabled when you create your first virtual network in a region, but I always recommend explicitly enabling it to ensure it's in the right resource group.
Notice we're specifying the location as 'eastus' - you'll need to do this for each region where you have resources. The great thing is that once it's enabled, it works across all your virtual networks in that region.
This command ensures Network Watcher is properly configured and ready to monitor your network infrastructure.
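For those following along at home, here's a minimal sketch of the command I'm describing - the resource group name is a placeholder I'm using for illustration:

# Explicitly enable Network Watcher in a region
az network watcher configure \
  --resource-group NetworkWatcherRG \
  --locations eastus \
  --enabled true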
Now we need a place to store all our diagnostic data. This storage account will hold packet captures, flow logs, and other network diagnostic information.
I'm using Standard_LRS here because it's cost-effective for most scenarios. If you need higher availability, you could use Standard_GRS for geo-redundancy, but that comes with additional costs.
The StorageV2 kind gives us access to all the latest features including blob storage tiers, which is perfect for managing costs on older diagnostic data.
Remember, this storage account name needs to be globally unique across all of Azure, so choose something descriptive but unique to your organization.
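A sketch of the storage account command with placeholder names - you'll need to pick your own globally unique account name:

# Diagnostics storage account - name must be globally unique
az storage account create \
  --name mynetdiagstore001 \
  --resource-group NetworkWatcherRG \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2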
The Log Analytics workspace is where the magic happens for data analysis. This is where Network Watcher will send processed flow logs and where Traffic Analytics will generate insights.
I'm using the PerGB2018 pricing tier, which offers the best value for most organizations. It gives you 31 days of retention and pay-per-GB pricing. If you have predictable data volumes, you might consider commitment tiers for cost savings.
This workspace will become your central hub for querying network data, creating alerts, and building dashboards. You'll spend a lot of time here once everything is configured!
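A minimal version of the workspace command, again with placeholder names:

# Log Analytics workspace for flow logs and Traffic Analytics
az monitor log-analytics workspace create \
  --resource-group NetworkWatcherRG \
  --workspace-name net-analytics-ws \
  --location eastus \
  --sku PerGB2018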
This sequence diagram shows exactly what happens when network traffic flows through your infrastructure. Let me walk you through each step.
When a user makes a network request, it first hits your Network Security Group. The NSG evaluates its security rules and makes a decision - allow or deny.
Here's where it gets interesting - Flow Logs capture this decision and simultaneously do two things: store the raw logs in your storage account and send processed data to Log Analytics.
Traffic Analytics then takes this processed data and generates insights like geo-mapping, security threat detection, and performance analytics. It's like having a network analyst working 24/7!
Let's start with basic flow log configuration. This command creates a flow log for a specific Network Security Group.
Notice we need the full resource ID of the NSG - this ensures we're targeting exactly the right security group. The storage account parameter tells Network Watcher where to store the raw flow data.
This gives us basic logging, but we're going to enhance this in the next step with Traffic Analytics and version 2 logging for much richer data.
The location parameter should match where your NSG is deployed - this is important for data residency and performance.
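In sketch form, with a placeholder subscription ID and resource names, the command looks like this:

# Basic NSG flow log - raw data lands in the storage account
az network watcher flow-log create \
  --location eastus \
  --name myNsgFlowLog \
  --nsg /subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Network/networkSecurityGroups/myNSG \
  --storage-account mynetdiagstore001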
Now let's supercharge our flow logging! This enhanced configuration adds several powerful features.
Version 2 logs include additional fields like flow state, throughput information, and enhanced security details. The 30-day retention means we keep logs in storage for a month before automatic cleanup.
The real game-changer here is Traffic Analytics. When enabled, it provides geo-mapping showing where your traffic originates, identifies top talkers, and even detects potential security threats.
The workspace parameter connects this to our Log Analytics workspace, enabling powerful KQL queries and alerting capabilities.
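Here's roughly how those options come together - the workspace can be referenced by name or by full resource ID:

# Enhanced flow log: v2 records, 30-day retention, Traffic Analytics
az network watcher flow-log create \
  --location eastus \
  --name myNsgFlowLog \
  --nsg <nsg-resource-id> \
  --storage-account mynetdiagstore001 \
  --log-version 2 \
  --retention 30 \
  --traffic-analytics true \
  --workspace net-analytics-ws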
This diagram shows the complete packet capture workflow. Unlike flow logs which show decisions, packet captures give you the actual network packets for deep analysis.
You can trigger captures manually when troubleshooting specific issues, schedule them for regular monitoring, or even set up alert-based triggers for automatic capture when problems occur.
The captured data gets stored in your storage account, and then you can analyze it using tools like Wireshark, Microsoft Network Monitor, or custom scripts.
The key is using filters to capture only what you need - unfiltered captures can generate massive amounts of data very quickly!
Here's how to create a basic packet capture. This is your starting point for deep network analysis.
The command targets a specific VM - keep in mind that packet capture relies on the Network Watcher Agent VM extension being present on that VM (the portal installs it for you automatically; from the CLI you may need to add it first). The capture will include all traffic flowing through that VM's network interfaces.
This basic capture runs with the service defaults rather than limits you choose - up to five hours and about 1 GB per session. That's close to complete data, but it can quickly consume storage space, so we'll add explicit controls in the next example.
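A sketch of the basic capture command with placeholder names:

# Basic packet capture - service default limits apply
az network watcher packet-capture create \
  --resource-group myRG \
  --vm myVM \
  --name basicCapture \
  --storage-account mynetdiagstore001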
This is where packet capture becomes really powerful. Look at all these optimization settings!
The 5-minute time limit prevents runaway captures, while the 128 bytes per packet limit captures just the headers - this reduces file size by up to 90% while preserving the information you need for most troubleshooting.
The 100MB session limit provides an additional safety net. These limits work together to give you meaningful data without breaking your storage budget.
The storage path parameter lets you organize captures into folders - very helpful when you're running multiple captures for different issues.
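Putting those limits together, the command looks something like this - the time limit is in seconds and the size limits are in bytes, and the storage path is a placeholder blob URI:

# Capture with safety limits:
#   --time-limit is in seconds (300 = 5 minutes)
#   --capture-size is bytes retained per packet (128 keeps headers)
#   --capture-limit is total bytes per session (104857600 = 100 MB)
az network watcher packet-capture create \
  --resource-group myRG \
  --vm myVM \
  --name limitedCapture \
  --storage-account mynetdiagstore001 \
  --time-limit 300 \
  --capture-size 128 \
  --capture-limit 104857600 \
  --storage-path https://mynetdiagstore001.blob.core.windows.net/captures/web-issue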
This is where packet capture becomes surgical. Instead of capturing everything, we're using filters to target exactly what we need.
This filter captures only TCP traffic on port 80 from a specific local IP address. The asterisks for remote IP and port mean we'll capture traffic to any destination.
You can create multiple filters in the JSON array to capture different types of traffic. For example, you might capture both HTTP and HTTPS traffic, or focus on traffic between specific subnets.
Filtering is crucial for reducing noise and focusing on the actual problem you're trying to solve.
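Here's a sketch of the filtered version. I'm leaving the remote IP and port out of the JSON, which has the same "any" effect as the asterisks you see on the slide:

# Capture only TCP traffic on local port 80 from one local IP;
# unspecified remote fields match any remote IP and port
az network watcher packet-capture create \
  --resource-group myRG \
  --vm myVM \
  --name filteredCapture \
  --storage-account mynetdiagstore001 \
  --filters '[{"protocol":"TCP","localIPAddress":"10.0.0.4","localPort":"80"}]'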
This flowchart shows my systematic approach to connection troubleshooting. Network Watcher's connectivity tools make this process much more efficient.
We start by defining the source and destination, then run a connectivity check. Based on whether the connection succeeds or fails, we follow different diagnostic paths.
For successful connections, we analyze performance metrics. For failures, we systematically check NSG rules, route tables, and firewall configurations.
The key is being methodical - this approach helps you quickly identify the root cause instead of guessing at solutions.
Let's test basic connectivity between two points in your network. This command checks if traffic can flow from a source VM to a destination IP on a specific port.
The test happens at the network layer and shows you exactly where traffic might be getting blocked. You'll get back connectivity status, latency information, and a hop-by-hop analysis of the path.
Port 443 here means we're testing HTTPS connectivity. This is super useful for validating application connectivity or testing after configuration changes.
The diagnostic returns detailed information about each hop in the path, making it easy to identify where problems occur.
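In sketch form, with placeholder names:

# Test TCP connectivity from a VM to a destination IP on port 443
az network watcher test-connectivity \
  --resource-group myRG \
  --source-resource myVM \
  --dest-address 10.0.1.10 \
  --dest-port 443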
This advanced version shows testing between two VMs using resource IDs instead of IP addresses. This is particularly useful because it automatically resolves to current IP addresses.
We're testing SQL Server connectivity on port 1433 here. The protocol specification ensures we're testing the exact type of traffic your application uses.
The IPv4 preference is useful in dual-stack environments where you want to force testing over a specific IP version.
Resource-based targeting is great for dynamic environments where IP addresses might change due to scaling or redeployment.
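Roughly, with placeholder resource IDs - as far as I know, the IP-version preference is set through the underlying API rather than a CLI flag, so I'm omitting it here:

# VM-to-VM test by resource ID - resolves current IPs automatically
az network watcher test-connectivity \
  --source-resource /subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Compute/virtualMachines/appVM \
  --dest-resource /subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Compute/virtualMachines/sqlVM \
  --dest-port 1433 \
  --protocol Tcp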
This diagram illustrates how Azure determines the next hop for network traffic. Understanding this is crucial for troubleshooting routing issues.
Azure picks the route with the longest prefix match first; when multiple routes match the same address prefix, User Defined Routes take priority over BGP routes, which in turn take priority over system routes. The winning route determines the next hop.
Next hop types include Internet for external traffic, Virtual Appliance for traffic going through firewalls or routers, VnetLocal for traffic staying within the virtual network, and None when traffic should be dropped.
This systematic evaluation helps you predict and troubleshoot how traffic will flow through your network.
Network topology gives you a comprehensive view of your network architecture. This command discovers all the network resources and their relationships.
You'll see virtual networks, subnets, virtual machines, network security groups, and how they're all connected. This is invaluable for understanding your network layout and planning changes.
The topology view helps identify potential single points of failure, validates security boundaries, and ensures your network design matches your intended architecture.
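The command itself is short - here's a sketch against a placeholder resource group:

# Discover resources and relationships in a resource group
az network watcher show-topology --resource-group myRG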
Next hop analysis tells you exactly where traffic will go from a specific source to a destination. This is incredibly useful for troubleshooting routing problems.
The command shows the next hop type, the route table that made the decision, and the specific IP address traffic will be sent to. This helps validate that your custom routes are working as expected.
If you're having connectivity issues, this tool quickly shows whether it's a routing problem or something else like NSG rules or firewall configurations.
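A sketch with placeholder IPs:

# Where will traffic from 10.0.0.4 to 10.0.1.10 go next?
az network watcher show-next-hop \
  --resource-group myRG \
  --vm myVM \
  --source-ip 10.0.0.4 \
  --dest-ip 10.0.1.10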
This diagram shows how Network Security Group rules are evaluated. Understanding this process is essential for security troubleshooting.
Inbound traffic first encounters subnet-level NSG rules, then network interface-level rules - for outbound traffic the order reverses. Within each NSG, rules are processed by priority number: lower numbers win, with 100 the highest priority you can assign and 4096 the lowest.
The first matching rule determines the action - allow or deny. This means rule order and priority are crucial for getting the security behavior you want.
Many connectivity issues are actually security rule problems, so understanding this evaluation process helps you troubleshoot faster.
This command shows you the effective security rules for a specific VM, including both subnet and network interface level rules.
The output shows rule priorities, actions, and which NSG each rule comes from. This is invaluable for understanding the complete security posture of a VM.
Instead of manually checking multiple NSGs, this tool gives you a consolidated view of all rules that apply to your VM, making security auditing much easier.
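This one targets the VM's network interface - the NIC name below is a placeholder:

# Consolidated view of all NSG rules applied to a NIC
az network nic list-effective-nsg \
  --resource-group myRG \
  --name myVM-nic

One caveat: the VM has to be running for effective rules to be computed.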
This diagram shows how all the monitoring pieces fit together in a comprehensive alerting and reporting system.
Network events flow into Azure Monitor, get processed in Log Analytics, and trigger alert rules that notify your team through multiple channels.
The system also supports custom dashboards, Power BI integration, and scheduled reports for proactive monitoring.
Action groups define who gets notified and how - email, SMS, webhooks, or even ITSM integrations for enterprise environments.
Action groups are the foundation of your alerting system. They define who gets notified when alerts fire.
The short name is limited to 12 characters and appears in SMS and email notifications, so make it descriptive but concise.
This basic action group will be enhanced in the next steps with specific notification methods like email, SMS, and webhook integrations.
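A sketch of the basic group - I'm keeping the short name under the 12-character limit:

# Action group shell - notification methods get added next
az monitor action-group create \
  --resource-group myRG \
  --name NetworkAlerts \
  --short-name NetAlerts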
Here we're adding email notifications to our action group. You can add multiple email addresses by repeating this command with different parameters.
The display name appears in the alert emails, so use descriptive names like "Network Admin" or "NOC Team" to make it clear who should respond.
You can also add SMS notifications using the same pattern: --add-action sms "Name" "Country Code" "Phone Number".
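Adding an email receiver might look like this - the receiver name and address are placeholders:

# Append an email action to the existing group
az monitor action-group update \
  --resource-group myRG \
  --name NetworkAlerts \
  --add-action email NetworkAdmin netadmin@contoso.com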
This creates an alert for high network traffic. Notice we're monitoring Network In Total - this tracks incoming traffic to the VM.
The threshold is set to 1GB, but you'll want to adjust this based on your normal traffic patterns. The 15-minute window with 5-minute evaluation frequency provides good responsiveness without too many false positives.
Severity level 2 indicates a warning level alert. Use severity 0 for critical issues that need immediate response.
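Here's a sketch of that alert rule - the threshold is 1 GB expressed in bytes, and the scope is a placeholder VM resource ID:

# Alert when a VM receives more than ~1 GB in a 15-minute window
az monitor metrics alert create \
  --name HighNetworkIn \
  --resource-group myRG \
  --scopes /subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Compute/virtualMachines/myVM \
  --condition "total Network In Total > 1000000000" \
  --window-size 15m \
  --evaluation-frequency 5m \
  --severity 2 \
  --action NetworkAlerts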
Here's where Log Analytics becomes really powerful. This KQL query analyzes external traffic patterns to identify potential security threats.
We're looking at the last hour of external public traffic and summarizing total bytes by source IP. This helps identify potential data exfiltration or denial-of-service attacks.
The top 10 results give you the highest traffic sources, which you should investigate if they seem unusual for your environment.
Run queries like this regularly to establish baseline traffic patterns and quickly spot anomalies.
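Here's the shape of that query - the table and field names follow the Traffic Analytics schema, but treat this as a sketch to adapt to your workspace:

// Top 10 external sources by total bytes over the last hour
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(1h)
| where FlowType_s == "ExternalPublic"
| summarize TotalBytes = sum(InboundBytes_d + OutboundBytes_d) by SrcIP_s
| top 10 by TotalBytes desc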
This query focuses on blocked traffic analysis - essentially showing you what your NSG rules are protecting you from.
We're filtering for denied flows (FlowStatus_s == "D") and grouping by the NSG rule that blocked the traffic, destination port, and protocol.
This tells you which rules are most active and reveals common attack patterns by showing frequently blocked ports and protocols.
Use this information to validate that your security rules are working correctly and to identify potential threats.
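And a sketch of the blocked-traffic version, under the same schema assumptions:

// Which NSG rules are denying traffic, on which ports and protocols?
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(24h)
| where FlowStatus_s == "D"
| summarize BlockedFlows = count() by NSGRule_s, DestPort_d, L4Protocol_s
| order by BlockedFlows desc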
This decision matrix provides a structured approach to network troubleshooting. Different types of issues require different diagnostic tools.
Connectivity issues use the connection troubleshoot tool, performance problems need packet capture and metrics analysis, security issues require flow logs and NSG analysis, and routing problems need next hop and topology tools.
The key is matching the right tool to the type of problem you're investigating. This systematic approach saves time and leads to faster resolution.
Sometimes you need to test connectivity to external services. This command tests HTTPS connectivity to Microsoft's website.
This validates outbound internet access, DNS resolution, and HTTPS connectivity all in one test. It's perfect for troubleshooting application connectivity issues.
The test shows you the complete path from your VM to the external destination, helping identify where connectivity might be failing.
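In sketch form:

# Test HTTPS reachability to an external endpoint from a VM
az network watcher test-connectivity \
  --resource-group myRG \
  --source-resource myVM \
  --dest-address www.microsoft.com \
  --dest-port 443 \
  --protocol Https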
For performance analysis, we need longer capture windows and focused filtering. This 10-minute capture focuses specifically on HTTPS traffic.
Longer capture windows help identify intermittent performance issues that might not show up in shorter captures.
By filtering for port 443, we're eliminating noise and focusing on potentially problematic HTTPS traffic patterns.
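Something like this - ten minutes is 600 seconds, and the filter pins the capture to TCP traffic on remote port 443:

# Longer capture focused on HTTPS traffic
az network watcher packet-capture create \
  --resource-group myRG \
  --vm myVM \
  --name httpsPerfCapture \
  --storage-account mynetdiagstore001 \
  --time-limit 600 \
  --filters '[{"protocol":"TCP","remotePort":"443"}]'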
Let me wrap up with some key best practices I've learned from implementing Network Watcher in production environments.
Cost optimization is crucial - enable flow logs only on critical NSGs, use packet capture limits to prevent excessive storage usage, and implement data retention policies.
For security, remember that packet captures contain sensitive data, so secure your storage accounts properly and limit access to authorized personnel.
Consider automation using Logic Apps or Azure Functions to automatically respond to alerts and trigger diagnostics when issues occur.
These practices will help you get maximum value from Network Watcher while controlling costs and maintaining security.