3 Tips for Better Network Route Management

Picture this: You’re a humble cloud engineer. You make a seemingly innocuous change to a route on a table in AWS. Five minutes later, a critical application is down because it can’t reach its microservices in another VPC and everyone is freaking out. What went wrong? It was just supposed to be a simple route update!

To the untrained eye, route management is extremely daunting and difficult to understand – much less to manage – but it certainly doesn’t have to be that way. Here are 3 tips from my own experience, in no particular order, to help mature your approach to route management and set yourself up for success:

*Note – I am a cloud specialist, so I’m focusing on cloud network routing. But most, if not all, of this should in theory apply to on-premises networking too!

1) Know the high-level structure of your network environment. This is a big one when you’re coming into an existing network. When I was a child, I had a hard time grasping how local addresses worked. You could’ve named a street only a block or two from my house and I would hardly have known where to find it. Once I began driving and built a firsthand picture of how the local streets were laid out, it quickly came together and I could envision the layout of the entire city in my head.

So it is with your network – the more you understand the high-level architecture, the easier it will be to make sense of individual routes and route tables in context. If diagrams are available, review them; if they don’t exist, take some initiative and create them. The chances of making a bad route change will be drastically lower if you can picture how that change will affect the big picture.
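
One way to get a feel for how individual routes fit together is to ask, for a given destination, which route in a table would actually win. The sketch below is a minimal illustration of most-specific-match (longest prefix) lookup, which is how cloud route tables generally choose between overlapping routes. The route table contents and target names are made up for the example, and real platforms layer extra priority rules on top of this.

```python
import ipaddress

# Hypothetical route table: destination CIDR -> target name (illustrative only).
ROUTES = {
    "10.0.0.0/16": "local",
    "10.1.0.0/16": "peering-to-shared-services",
    "10.1.2.0/24": "transit-gateway",
    "0.0.0.0/0": "nat-gateway",
}


def lookup(routes, destination):
    """Return (cidr, target) for the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (largest prefix length) is the most specific route.
    network, target = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), target


if __name__ == "__main__":
    for dest in ("10.1.2.5", "10.1.3.5", "8.8.8.8"):
        print(dest, "->", lookup(ROUTES, dest))
```

Running a handful of known destinations through a lookup like this before and after a proposed change is a cheap way to see which traffic the change would redirect.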

2) Have an intelligent plan for automated route importation. I went through a phase in my tech-life where I hated importing routes – I wanted every route table to be completely static so that I felt like I had complete control of the environment with no possibility of “ghost in the machine” problems.

This worked fine, for a while – but I later ran into two problems:

a. As my environment grew, so did my route management. I soon had scores of routes I was managing manually (through infra-as-code, but it still required considering each route on an individual basis). Before long this became a hassle, with a lot to keep straight. With a decent automated import/export strategy wherein you know for sure that the necessary (and ONLY the necessary) routes are being imported, you don’t have to spend much time at all on route management – for the most part, everything will fall into place as you go.

b. I was tempted to take “shortcuts” – like using 0.0.0.0/0 routes in places where they seemed efficient in theory but caused problems in practice. In one example, I used a 0.0.0.0/0 route for traffic returning from a cross-cloud connection because all return traffic was going to go to the same firewall subnet regardless of final destination – but this caused 0.0.0.0/0 to be exported across the connection to the other cloud, where it interfered with routes on the other side and caused routing mismatches. I had to get the IP range of the actual destination and use that instead. With strategic route importation in use, I could have had the cross-cloud route table import the correct route to begin with and cut down the room for making such a mistake.

Done correctly, automated route export/import will save a lot of hassle and shrink the surface area for human error.
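
To make “the necessary (and ONLY the necessary) routes” concrete, here is a minimal sketch of an import filter. It assumes you keep an allow-list of prefixes you expect to learn from a peer and treats anything outside it, plus any default route, as rejected. The function name and the example prefixes are hypothetical. In practice this logic would usually live in your cloud provider’s or router’s own route filtering features rather than in a script.

```python
import ipaddress


def filter_imports(candidates, allowed):
    """Split candidate CIDRs into (accepted, rejected).

    A candidate is accepted only if it sits entirely inside one of the
    allowed prefixes. Default routes (prefix length 0) are always rejected.
    """
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    accepted, rejected = [], []
    for cidr in candidates:
        net = ipaddress.ip_network(cidr)
        ok = net.prefixlen > 0 and any(
            # subnet_of() raises TypeError across IP versions, so check first.
            net.version == a.version and net.subnet_of(a)
            for a in allowed_nets
        )
        (accepted if ok else rejected).append(cidr)
    return accepted, rejected


if __name__ == "__main__":
    # Hypothetical routes advertised by the other side of a connection.
    advertised = ["10.20.1.0/24", "0.0.0.0/0", "192.168.5.0/24"]
    accepted, rejected = filter_imports(advertised, allowed=["10.20.0.0/16"])
    print("import:", accepted)
    print("reject:", rejected)
```

A check like this could also run in a CI pipeline against the routes declared in infra-as-code, so a stray 0.0.0.0/0 is flagged before it is ever exported.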

3) Make sure there is clear demarcation between Prod and Dev environments. This should go without saying, but you might be surprised how often I’ve found critical prod infra running on the same routing environment as dev infra – for example, a shared Transit Gateway in AWS. I think it is easy for engineers and architects to view something like a Transit Gateway as “agnostic” to environment, but that is a mistake – the health and configuration of the Transit Gateway impacts the environment as much as anything else. Don’t allow production infra running critical business applications to be influenced by changes in dev. That shouldn’t even be possible – put them on their own routers with their own tables.
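
A quick audit can surface this kind of sharing. The sketch below assumes you can export an inventory of router attachments along with an environment tag for each attached network. The tuple layout, router IDs, and tag values are all invented for illustration and do not come from any real API.

```python
from collections import defaultdict

# Hypothetical inventory: (router_id, attached_network, environment_tag).
ATTACHMENTS = [
    ("tgw-shared", "vpc-orders", "prod"),
    ("tgw-shared", "vpc-sandbox", "dev"),
    ("tgw-prod", "vpc-payments", "prod"),
]


def mixed_environment_routers(attachments):
    """Return {router_id: sorted environments} for routers serving more than one."""
    envs = defaultdict(set)
    for router, _network, env in attachments:
        envs[router].add(env)
    return {router: sorted(e) for router, e in envs.items() if len(e) > 1}


if __name__ == "__main__":
    for router, found in mixed_environment_routers(ATTACHMENTS).items():
        print(f"{router} carries traffic for: {', '.join(found)}")
```

Any router that shows up in the result is a place where a dev change could reach prod traffic.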

These are just three tips from top-of-mind – I’m sure a seasoned network engineer could share countless tips. Is that you? Have some in mind? Share them in the comments!



Categories: Cloud, Tips
