Considerations for a Hub and Spoke Model When Deploying Infrastructure in the Cloud

Don’t revolve around a legacy design; choose a network design that evolves with the organization.

Analyst Perspective

Cloud adoption among organizations increases gradually across both the number of services used and the amount those services are used. However, network builders tend to overlook the vulnerabilities of network topologies, which leads to complications down the road, especially since the structures of cloud network topologies are not all of the same quality. A network design that suits current needs may not be the best solution for the future state of the organization.

Even if on-prem network strategies were retained for ease of migration, it is important to evaluate and identify the cloud network topology that can not only elevate the performance of your infrastructure in the cloud but also make it easier to manage and provision resources.

An “as the need arises” strategy will not work efficiently since changing network designs will change the way data travels within your network, and existing application architectures will then need to be adapted to the new design. This becomes more complicated as the number of services hosted in the cloud grows.

Keep a network strategy in place early on and start designing your infrastructure accordingly. This gives you more control over your networks and eliminates the need for huge changes to your infrastructure down the road.

Nitin Mukesh
Senior Research Analyst, Infrastructure and Operations
Info-Tech Research Group

Executive Summary

Your Challenge

The organization is planning to move resources to the cloud or devise a networking strategy for their existing cloud infrastructure to harness value from the cloud.

The right topology needs to be selected to deploy network level isolation, design the cloud for management efficiencies, and provide access to shared services in the cloud.

A perennial challenge for infrastructure in the cloud is planning for governance vs. flexibility, which is often overlooked.

Common Obstacles

The choice of migration method may result in retaining existing networking patterns and only making changes when the need arises.

Networking in the cloud is still new, and organizations new to the cloud may not be aware of the cloud network designs they can consider for their business needs.

Info-Tech’s Approach

Define organizational needs and understand the pros and cons of cloud network topologies to strategize for the networking design.

Consider the layered complexities of addressing the governance vs. flexibility spectrum for your domains when designing your networks.

Insight Summary

Don’t wait until the necessity arises to evaluate your networking in the cloud. Get ahead of the curve and choose the topology that optimizes benefits and supports organizational needs in the present and future.

Your challenge

Selecting the right topology: Many organizations migrate to the cloud retaining a mesh networking topology from their on-prem design, or they choose to implement the mesh design leveraging peering technologies in the cloud without a strategy in place for when business needs change. While there may be many network topologies for on-prem infrastructure, the network design team may not be aware of the best approach in cloud platforms for their requirements, or a cloud networking strategy may even go overlooked during the migration.

Finding the right cloud networking infrastructure for:

Management efficiencies
Network-level isolation of resources
Access to shared services

Deciding between governance and flexibility in networking design: In the hub and spoke model, placing a domain in the hub gives the organization greater governance over it, while placing it in a spoke gives the domain more flexibility. Having a strategy for the most important domains is key. For example, some security belongs in the hub and some security belongs in the spoke. The tradeoff is that a domain sitting completely in the spoke has a lot of freedom but becomes harder to standardize across the organization.

Mesh network topology

A mesh is a design where virtual private clouds (VPCs) are connected to each other individually creating a mesh network. The network traffic is fast and can be redirected since the nodes in the network are interconnected. There is no hierarchical relationship between the networks, and any two networks can connect with each other directly.

In the cloud, this design can be implemented by setting up peering connections between any two VPCs. These VPCs can also be set up to communicate with each other internally through the cloud service provider’s network without having to route the traffic via the internet.

While this topology offers high redundancy, the number of connections grows tremendously as more networks are added, making it harder to scale a network using a mesh topology.

Mesh Network on AWS

This is an image of a Mesh Network on AWS

Source: AWS, 2018

Constraints

The disadvantages of peering VPCs into a mesh quickly arise with:

Transitive connections: Transitive connections are not supported over peering links in the cloud, unlike with on-prem networking. This means that if there are two networks that need to communicate, a single peering link can be set up between them. However, if there are more than two networks and they all need to communicate, every pair must be connected with its own individual peering link.
Cost of operation: The lack of transitive routing requires many connections to be set up, which adds up to a more expensive topology to operate as the number of networks grows. Cloud providers also usually limit the number of peering networks that can be set up, and this limit can be hit with as few as 100 networks.
Management: Mesh tends to be very complicated to set up, owing to the large number of different peering links that need to be established. While this may be manageable for small organizations with small operations, for larger organizations with robust cybersecurity practices that require multiple VPCs to be deployed and interconnected for communications, mesh opens you up to multiple points of failure.
Redundancy: With multiple points of failure already being a major drawback of this design, you also cannot have more than one peered connection between any two networks at the same time. This makes designing your networking systems for redundancy that much more challenging.
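
The lack of transitive connections can be illustrated with a minimal sketch. It assumes a simplified model in which two networks can communicate only if a direct peering link exists between them; the network names and helper functions are hypothetical and do not correspond to any cloud provider's API.

```python
from itertools import combinations


def can_reach(peerings, src, dst):
    """Return True only if src and dst share a direct peering link.

    Peering is modeled as non-transitive: traffic is never forwarded
    through an intermediate peer.
    """
    return frozenset((src, dst)) in peerings


def missing_peerings(networks, peerings):
    """List the network pairs that still need their own peering link."""
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if frozenset((a, b)) not in peerings
    ]


# Hypothetical networks: vpc-a is peered with vpc-b, and vpc-b with vpc-c.
networks = ["vpc-a", "vpc-b", "vpc-c"]
peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}

print(can_reach(peerings, "vpc-a", "vpc-b"))  # direct link exists
print(can_reach(peerings, "vpc-a", "vpc-c"))  # no link, although both peer with vpc-b
print(missing_peerings(networks, peerings))   # pairs that need a separate link
```

In this model, vpc-a and vpc-c cannot communicate until a third link is added, which is why a full mesh needs a link for every pair of networks.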

Number of virtual networks (n) | Peering links required, [(n-1)*n]/2
20 | 190
50 | 1225
100 | 4950

Relationship of virtual networks to required peering links in a mesh topology: the number of links grows with the square of the number of networks.
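
As a quick check of the [(n-1)*n]/2 formula, the following sketch computes the number of peering links for a full mesh. The function name is illustrative, and the network counts of 20, 50, and 100 are the values of n that yield the link counts 190, 1225, and 4950 listed above.

```python
def mesh_peering_links(n: int) -> int:
    """Number of peering links for a full mesh of n networks: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


for n in (20, 50, 100):
    print(n, mesh_peering_links(n))
```

Doubling the number of networks roughly quadruples the number of links, which is why the mesh becomes hard to scale.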

Case study

INDUSTRY: Blockchain
SOURCE: Microsoft

An organization with four members wants to deploy a blockchain in the cloud, with each member running their own virtual network. With only four members on the team, a mesh network can be created in the cloud with each of their networks connected to every other, adding up to 12 peering links when each direction is counted separately (four members with three links each), or six network pairs by the [(n-1)*n]/2 formula. While the members may all be using different cloud accounts, setting up connections between them will still be possible.

The organization wants to expand to 15 members within the next year, with each new member being connected with their separate virtual networks. Once grown, the organization will have a total of 210 peering links counted per direction (105 network pairs), since each of the virtual networks will then need 14 peering links. While this may still be possible to deploy, the number of connections makes it harder to manage and would be that much more difficult to deploy if the organization grows to even 30 or 40 members. The new scale of virtual connections calls for an alternative networking strategy that cloud providers offer – the hub and spoke topology.

This is an image of the connections involved in a mesh network with four participants.

Source: Microsoft, 2017

Hub and spoke network topology

In hub and spoke network design, each network is connected to a central network that facilitates intercommunication between the networks. The central network, also called the hub, can be used by multiple workloads/servers/services for hosting services and for managing external connectivity. Other networks connected to the hub through network peering are called spokes and host workloads.

Communications between the workloads/servers/services on spokes pass in or out of the hub where they are inspected and routed. The spokes can also be centrally managed from the hub with IT rules and processes.

A hub and spoke design enables a larger number of virtual networks to be interconnected, as each network needs only one peered connection (to the hub) to be able to communicate with any other network in the system.
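
To make the difference in scale concrete, the sketch below compares the number of peered connections in the two designs. It assumes the hub is a separate network in addition to the n workload networks, and it counts each pair of networks once rather than once per direction; the function names are illustrative.

```python
def mesh_links(n: int) -> int:
    """Full mesh: every pair of the n networks needs its own link."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Hub and spoke: each of the n spoke networks peers once with the hub."""
    return n


for n in (4, 15, 30, 40):
    print(f"{n:>3} networks: mesh={mesh_links(n):>4}, hub and spoke={hub_and_spoke_links(n):>3}")
```

Under these assumptions, 40 networks need 780 links in a mesh but only 40 in a hub and spoke design.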

Hub and Spoke Network on AWS

This is an image of the Hub and Spoke Network on AWS

What hub and spoke networks do better

Ease of connectivity: Hub and spoke decreases the liabilities of scale that come from a growing business by providing a consistent connection that can be scaled easily. As more networks are added to an organization, each will only need to be connected once – to the hub. The number of connections is considerably lower than in a mesh topology and makes it easier to maintain and manage.
Business agility and scalability: It is easier to increase the number of networks than in mesh, making it easier to grow your business into new channels with less time, investment, and risk.
Data collection: With a hub and spoke design, all data flows through the hub – depending on the design, this includes all ingress and egress to and from the system. This makes it an excellent central network to collect all business data.
Network-level isolation: Hub and spoke enables separation of workloads and tiers into different networks. This is particularly useful to ensure an issue affecting a network or a workload does not affect the rest.
Network changes: Changes to a separated network are much easier to carry out knowing the changes made will not affect all the other connected networks. This reduces work-hours significantly when systems or applications need to be altered.
Compliance: Compliance requirements such as SOC 1 and SOC 2 require separate environments for production, development, and testing, which can be done in a hub and spoke model without having to re-create security controls for all networks.

Hub and spoke constraints

While there are plenty of benefits to using this topology, there are still a few notable disadvantages with the design.

Point-to-point peering

The total number of peered connections required might be lower than in a mesh, but independent projects are cheaper to run on mesh because point-to-point data transfers cost less.

Global access speeds with a monolithic design

With global organizations, implementing a single monolithic hub network for network ingress and egress will slow down access to cloud services that users will require. A distributed network will ramp up the speeds for its users to access these services.

Costs for a resilient design

Connectivity between the spokes can fail if the hub site dies or faces major disruptions. While there are redundancy plans for cloud networks, it will be an additional cost to plan and build an environment for it.
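
The dependence on the hub can be sketched as a small reachability check. This is a simplified model that assumes the hub forwards traffic between spokes; the names hub-a and hub-b, the spoke names, and the `reachable` helper are hypothetical.

```python
from collections import deque


def reachable(links, start, failed=frozenset()):
    """Breadth-first search over the links, skipping failed networks."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b in links:
            if node not in (a, b):
                continue
            neighbor = b if node == a else a
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


spokes = ["spoke-1", "spoke-2", "spoke-3"]
single_hub = [("hub-a", s) for s in spokes]
# Redundant design (assumption): every spoke also peers with a second hub.
dual_hub = single_hub + [("hub-b", s) for s in spokes]

# With one hub, losing it isolates every spoke.
print(reachable(single_hub, "spoke-1", failed={"hub-a"}))
# With a second hub, spokes stay connected, at the cost of twice as many links.
print(sorted(reachable(dual_hub, "spoke-1", failed={"hub-a"})))
print(len(single_hub), len(dual_hub))
```

The second hub keeps the spokes connected, but it doubles the number of links to build and operate, which is the additional cost described above.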

Leverage the hub and spoke strategy for:

Providing access to shared services: Hub and spoke can be used to give workloads that are deployed on different networks access to shared services by placing the shared service in the hub. For example, DNS servers can be placed in the hub network, and production or host networks can be connected to the hub to access it, or if the central network is set up to host Active Directory services, then servers in other networks can act as spokes and have full access to the central VPC to send requests. This is also a great way to separate workloads that do not need to communicate with each other but all need access to the same services.

Adding new locations: An expanding organization that needs to add additional global or domestic locations can leverage hub and spoke to connect new network locations to the main system without the need for multiple connections.

Cost savings: Apart from having fewer connections than mesh that can save costs in the cloud, hub and spoke can also be used to centralize services such as DNS and NAT to be managed in one location rather than having to individually deploy in each network. This can bring down management efforts and costs considerably.

Centralized security: Enterprises can deploy a center of excellence on the hub for security, and the spokes connected to it can leverage a higher level of security and increase resilience. It will also be easier to control and manage network policies and networking resources from the hub.

Network management: Since each spoke is peered only once to the hub, detecting connectivity problems or other network issues is simpler in hub and spoke than on mesh. A network manager deployed in the cloud can surface network problems faster than on other topologies.

Hub and spoke – mesh hybrid

The advantages of using a hub and spoke model far exceed those of using a mesh topology in the cloud and go to show why most organizations ultimately end up using the hub and spoke as their networking strategy.

However, organizations, especially large ones, are complex entities, and choosing only one model may not serve all business needs. In such cases, a hybrid approach may be the best strategy. The following slides will demonstrate the advantages and use cases for mesh, however limited they might be.

Where it can be useful:

An organization can have multiple network topologies where system X is a mesh and system Y is a hub and spoke. A shared system Z can be a part of both systems depending on the needs.

An organization can have multiple networks interconnected in a mesh and some of the networks in the mesh can be a hub for a hub-spoke network. For example, a business unit that works on data analysis can deploy their services in a spoke that is connected to a central hub that can host shared services such as Active Directory or NAT. The central hub can then be connected to a regional on-prem network where data and other shared services can be hosted.

Hub and spoke – mesh hybrid network on AWS

This is an image of the Hub and spoke – mesh hybrid network on AWS

Why mesh can still be useful

Benefits of mesh

Security: Setting up a peering connection between two VPCs comes with the benefit of improving security since the connection can be private between the networks and can isolate public traffic from the internet. The traffic between the networks never has to leave the cloud provider’s network, which helps reduce a class of risks.

Reduced network costs: Since the peered networks communicate internally through the cloud’s internal networks, the data transfer costs are typically cheaper than over the public internet.

Communication speed: Improved network latency is a key benefit from using mesh because the peered traffic does not have to go over the public internet but rather the internal network. The network traffic between the connections can also be quickly redirected as needed.

Higher flexibility for backend services: Mesh networks can be desirable for back-end services if egress traffic needs to be blocked to the public internet from the deployed services/servers. This also helps avoid having to set up public IP or network address translation (NAT) configurations.

Use cases for mesh

Connecting two or more networks for full access to resources: For example, consider an organization that has separate networks for each department, which don’t all need to communicate with each other. Here, a peering network can be set up only between the networks that need to communicate with full or partial access to each other such as finance to HR or accounting to IT.

Specific security or compliance need: Mesh or VPC peering can also come in handy to serve specific security needs or logging needs that require using a network to connect to other networks directly and in private. For example, global organizations may face regulatory requirements to store or transfer data domestically over private connections.

Systems with very few networks that do not need internet access: Workloads deployed in networks that need to communicate with each other but do not require internet access or network address translation (NAT) can be connected using mesh especially when there are security reasons to keep them from being connected to the main system, e.g. backend services such as testing environments, labs, or sandboxes can leverage this design.

Designing for governance vs. flexibility in hub and spoke

Governance and flexibility in managing resources in the cloud are inversely proportional: The higher the governance, the less freedom you have to innovate.

The complexities of designing an organization’s networks grow with the organization as it becomes global and takes on more services and lines of business. Organizations that choose to deploy the hub and spoke model face a dilemma in choosing between governance and flexibility for their networks. They need to find the right balance between how much they want to govern their systems, mainly for security and cost monitoring, and how much flexibility they want to provide for innovation and other operations, since the two usually have an inverse relationship.

This decision in hub and spoke usually means that the domains chosen for higher governance must be placed in the hub network, and the domains that need more flexibility in a spoke. The key variables in the following slide will help determine the placement of the domain and will depend entirely on the organization’s context.

The two networking patterns in the cloud have layered complexities that need to be systematically addressed.

If a network has more flexibility in all or most of these domains, it may be a good candidate for a spoke-heavy design; otherwise, it may be better designed in a hub-centric pattern.

Function: The function the domain network is assigned to and the autonomy the function needs to be successful. For example, software R&D usually requires high flexibility to be successful.
Regulations: The extent of independence from both internal and external regulatory constraints the domain has. For example, a treasury reporting domain typically has high internal and external regulations to adhere to.
Human resources: The freedom a domain has to hire and manage its resources to perform its function. For example, production facilities in a huge organization have the freedom to manage their own resources.
Operations: The freedom a domain has to control its operations and manage its own spending to perform its functions. For example, governments usually have different departments and agencies, each with its own budget to perform its functions.
Technology: The independence and the ability a domain has to manage its selection and implementation of technology resources in the cloud. For example, you may not want a software testing team to have complete autonomy to deploy resources.
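
The five variables above can be turned into a rough placement check. This is only a sketch: the True/False ratings, the example domains' values, and the interpretation of "all or most" as more than half of the variables are assumptions made for illustration, not part of the original guidance.

```python
VARIABLES = ("function", "regulations", "human_resources", "operations", "technology")


def suggest_placement(flexibility: dict) -> str:
    """Suggest a design from per-variable flexibility ratings.

    flexibility maps each variable to True (the domain has or needs
    flexibility there) or False. "Most" is assumed to mean more than half.
    """
    missing = [v for v in VARIABLES if v not in flexibility]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    flexible = sum(1 for v in VARIABLES if flexibility[v])
    return "spoke-heavy" if flexible > len(VARIABLES) / 2 else "hub-centric"


# Hypothetical ratings for two example domains.
software_rnd = {
    "function": True, "regulations": True, "human_resources": False,
    "operations": True, "technology": True,
}
treasury_reporting = {
    "function": False, "regulations": False, "human_resources": True,
    "operations": False, "technology": False,
}

print(suggest_placement(software_rnd))        # four of five variables are flexible
print(suggest_placement(treasury_reporting))  # one of five variables is flexible
```

In practice the ratings depend entirely on the organization's context, so a result like this is a starting point for discussion rather than a decision.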

Optimal placement of services between the hub and spoke

Shared services and vendor management

Resources that are shared between multiple projects or departments or even by the entire organization should be hosted on the hub network to simplify sharing these services. For example, e-learning applications that may be used by multiple business units to train their teams, Active Directory accessed by most teams, or even SaaS platforms such as O365 and Salesforce can leverage buying power and drive down the costs for the organization. Shared services should also be standardized across the organization, and for that they need high governance.

Services that are an individual need for a network and have no preexisting relationship with other networks or buying power and scale can be hosted in a spoke network. For example, specialized accounting software used exclusively by the accounting team or design software used by a single team. Although the services are still a part of the wider network, it helps separate duties from the shared services network and provides flexibility to the teams to customize and manage their services to suit their individual needs.

Network egress and interaction

Network connections, be they in the cloud or hybrid-cloud, are used by everyone to either connect to the internet, access cloud services, or access the organization’s data center. Since this is a shared service, a centralized networking account must be placed in the hub for greater governance. Interactions between the spokes in a hub and spoke model happen through the hub, and providing internet access to the spokes through the hub can help leverage cost benefits in the cloud. The network account will perform routing duties between the spokes, on-prem assets, and egress out to the internet.

For example, NAT gateways in the cloud that are managed services are usually charged by the hour, and deploying NAT on each spoke can be harder to manage and expensive to maintain. A NAT gateway deployed in a central networking hub can be accessed by all spokes, so centralizing it is a great option.

Note that, in some cases, when using edge locations for data transfers, it may be cost effective to deploy a NAT in the spoke, but such cases usually do not apply to most organizational units.
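
The hourly-charge argument for centralizing NAT can be sketched with simple arithmetic. The hourly rate, the number of spokes, and the 730-hour month are hypothetical values chosen for illustration, and the sketch deliberately ignores data processing and inter-network transfer charges, which can offset part of the saving.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_nat_hourly_cost_cents(gateways: int, hourly_rate_cents: int) -> int:
    """Hourly charges only, in cents, for a number of always-on NAT gateways."""
    return gateways * hourly_rate_cents * HOURS_PER_MONTH


spokes = 10            # hypothetical number of spoke networks
hourly_rate_cents = 5  # hypothetical price of 0.05 per gateway-hour

per_spoke = monthly_nat_hourly_cost_cents(spokes, hourly_rate_cents)
central = monthly_nat_hourly_cost_cents(1, hourly_rate_cents)

print(f"NAT in every spoke: {per_spoke / 100:.2f} per month")
print(f"NAT in the hub:     {central / 100:.2f} per month")
```

Under these assumptions the hourly charges scale with the number of gateways, so one shared gateway in the hub costs a tenth of ten per-spoke gateways before traffic charges are considered.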

A centralized network hub can also be useful to configure network policies and network resources while organizational departments can configure non-network resources, which helps separate responsibilities for all the spokes in the system. For example, subnets and routes can be controlled from the central network hub to ensure standardized network policies across the network.

Security

While there needs to be security in the hub and the spokes individually, finding the balance of operation can make the systems more robust. Hub and spoke design can be an effective tool for security when a principal security hub is hosted in the hub network. The central security hub can collect data from the spokes as well as non-spoke sources such as regulatory bodies and threat intelligence providers, and then share the information with the spokes.

Threat information sharing is a major benefit of using this design, and the hub can take actions to analyze and enrich the data before sharing it with spokes. Shared services such as threat intelligence platforms (TIP) can also benefit from being centralized when stationed in the hub. A collective defense approach between the hub and spoke can be very successful in addressing sophisticated threats.

Controls for compliance and regulatory requirements such as HIPAA can also be placed in the hub, and the spokes connected to it can make use of them instead of having to deploy them in each spoke individually.

Cloud metering

The governance vs. flexibility paradigm usually decides the placement of cloud metering, i.e. if the organization wants higher control over cloud costs, it should be in the central hub, whereas if it prioritizes innovation, the spokes should be allowed to control it. Regardless of the placement of the domain, the costs can be monitored from the central hub using cloud-native monitoring tools such as Azure Monitor or any third-party software deployed in the hub.

For ease of governance and since resources are usually shared at a project level, most cloud service providers suggest that an individual metering service be placed in the spokes. The centralized billing system of the organization, however, can make use of scale and reserved instances to drive down the costs that the spokes can take advantage of. For example, billing and access control resources are placed in the lower levels in GCP to enable users to set up projects and perform their tasks. These billing systems in the lower levels are then controlled by a centralized billing system to decide who pays for the resources provisioned.

Don’t get stuck with your on-prem network design. Design for the cloud.

Peering VPCs into a mesh design can be an easy way to get onto the cloud, but it should not be your networking strategy for the long run.
Hub and spoke network design offers too many benefits to be adopted only when the need arises. Plan for the design early on and keep a strategy in place to deploy it as early as possible.
A hybrid of mesh and hub and spoke can be very useful in connecting multiple large networks, especially when they need to access the same resources without having to route the traffic over the internet.
Governance vs. flexibility should be a key consideration when designing for hub and spoke to leverage the best out of your infrastructure.
Distribute domains across the hub or spokes to leverage costs, security, data collection, and economies of scale, and to foster secure interactions between networks.

Cloud network design strategy

Bibliography

Borschel, Brett. “Azure Hub Spoke Virtual Network Design Best Practices.” Acendri Solutions, 13 Jan. 2022. Web.
Singh, Garvit. “Amazon Virtual Private Cloud Connectivity Options.” AWS, Jan. 2018. Web.
“What Is the Hub and Spoke Information Sharing Model?” Cyware, 16 Aug. 2021. Web.
Youseff, Lamia. “Mesh and Hub-and-Spoke Networks on Azure.” Microsoft, Dec. 2017. Web.