3.1. Troubleshooting overlay networks
- We want to run tools like ab or httping on the internal network
- Ah, if only we had created our overlay network with the --attachable flag …
- Oh well, let’s use this as an excuse to introduce New Ways To Do Things
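(For reference, a rough sketch of what the --attachable approach would have looked like; the network name matches the one used below, but creating it this way is hypothetical since our stack already created the network without the flag:)

  # Create an attachable overlay network (this has to be done up front)
  docker network create --driver overlay --attachable dockercoins_default

  # Any node can then attach a one-off container directly to it
  docker run --rm -ti --network dockercoins_default alpine sh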
Breaking into an overlay network
- We will create a dummy placeholder service on our network
- Then we will use docker exec to run more processes in this container
- Start a “do nothing” container using our favorite Swiss-Army distro:
  docker service create --network dockercoins_default --name debug \
         --constraint node.hostname==$HOSTNAME alpine sleep 1000000000

The constraint makes sure that the container will be created on the local node.
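(If you want to double-check where the task landed, docker service ps shows the node for each task of the service:)

  # Confirm that the "debug" task is running on the expected node
  docker service ps debug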
Entering the debug container
- Once our container is started (which should be really fast because the alpine image is small), we can enter it (from any node)
- Locate the container:
  docker ps
- Enter it:
  docker exec -ti containerID sh
Labels
- We can also be fancy and find the ID of the container automatically
- SwarmKit places labels on containers
- Get the ID of the container:
  CID=$(docker ps -q --filter label=com.docker.swarm.service.name=debug)
- And enter the container:
  docker exec -ti $CID sh
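(To see which labels SwarmKit actually sets, and therefore which ones you can filter on, you can inspect the container; the exact set of labels may vary with your Docker version:)

  # Show all labels placed on the debug container
  docker inspect --format '{{ json .Config.Labels }}' $CID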
- Ideally, you would author your own image, with all your favorite tools, and use it instead of the base alpine image
- But we can also dynamically install whatever we need
- Install a few tools:
  apk add --update curl apache2-utils drill
Investigating the rng service
- First, let’s check what rng resolves to
- Use drill or nslookup to resolve rng:
  drill rng

This gives us one IP address. It is not the IP address of a container. It is a virtual IP address (VIP) for the rng service.
Investigating the VIP
- Try to ping the VIP:
  ping -c 3 rng

It should ping. (But this might change in the future.)

With Engine 1.12: VIPs respond to ping if a backend is available on the same machine.

With Engine 1.13: VIPs respond to ping if a backend is available anywhere.

(Again: this might change in the future.)
What if I don’t like VIPs?
- Services can be published using two modes: VIP and DNSRR.
- With VIP, you get a virtual IP for the service, and a load balancer based on IPVS.
  (By the way, IPVS is totally awesome, and if you want to learn more about it in the context of containers, I highly recommend this talk by @kobolog at DC15EU!)
- With DNSRR, you get the former behavior (from Engine 1.11), where resolving the service yields the IP addresses of all the containers for this service.
- You choose the mode with docker service create --endpoint-mode vip (the default) or --endpoint-mode dnsrr
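(As a sketch, here is what a DNSRR placeholder service on the same network could look like; the service name is arbitrary, and note that DNSRR mode cannot be combined with ports published through the routing mesh:)

  # Create a service in DNS round-robin mode:
  # resolving its name returns the individual container IPs, not a VIP
  docker service create --endpoint-mode dnsrr \
         --network dockercoins_default --name debug-dnsrr \
         alpine sleep 1000000000

  # From a container on the same network, "drill debug-dnsrr" would then
  # list one A record per task instead of a single VIP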
Looking up VIP backends
- You can also resolve a special name: tasks.<name>
- It will give you the IP addresses of the containers for a given service
- Obtain the IP addresses of the containers for the rng service:
  drill tasks.rng

This should list 5 IP addresses.
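(If you want to hit each backend individually, bypassing the VIP load balancer, you can combine this lookup with curl. The awk parsing below assumes drill prints one A record per task in its answer section; adjust it if your version formats the output differently:)

  # Query each rng backend directly (run inside the debug container)
  for ip in $(drill tasks.rng | awk '$4 == "A" { print $5 }'); do
    echo "Backend $ip:"
    curl -s "http://$ip/1" >/dev/null && echo OK
  done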
Testing and benchmarking our service
- We will check that the service is up with
rng
, then
benchmark it withab
-
Make a test request to the service:
curl rng
-
Open another window, and stop the workers, to test in isolation:
docker service update dockercoins_worker --replicas 0
Wait until the workers are stopped (check with docker service ls
)
before continuing.
Benchmarking rng
We will send 50 requests, but with various levels of concurrency.
- Send 50 requests, with a single sequential client:
  ab -c 1 -n 50 http://rng/10
- Send 50 requests, with fifty parallel clients:
  ab -c 50 -n 50 http://rng/10
Benchmark results for rng
- When serving requests sequentially, they each take 100ms
- In the parallel scenario, latency increases dramatically
- What about hasher?
Benchmarking hasher
We will do the same tests for hasher.

The command is slightly more complex, since we need to post random data.

First, we need to put the POST payload in a temporary file.
- Generate 10 bytes of random data:
  curl http://rng/10 >/tmp/random
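(Before benchmarking, a single manual request can confirm that the payload and content type are accepted; this curl invocation mirrors what ab will do below:)

  # POST the random payload to hasher once, as a sanity check
  curl -s -H "Content-Type: application/octet-stream" \
       --data-binary @/tmp/random http://hasher/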
Benchmarking hasher
Once again, we will send 50 requests, with different levels of concurrency.
- Send 50 requests with a sequential client:
  ab -c 1 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
- Send 50 requests with 50 parallel clients:
  ab -c 50 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
Benchmark results for hasher
- The sequential benchmark takes ~5 seconds to complete
- The parallel benchmark takes less than 1 second to complete
- In both cases, each request takes a bit more than 100ms to complete
- Requests are a bit slower in the parallel benchmark
- It looks like hasher is better equipped to deal with concurrency than rng
Why?
Why does everything take (at least) 100ms?
rng code:

Figure 89: rng code screenshot

hasher code:

Figure 90: hasher code screenshot

But… WHY?!?
Why did we sprinkle the code with sleeps?
- Deterministic performance (regardless of instance speed, CPUs, I/O…)
- Actual code sleeps all the time anyway
- When your code makes a remote API call:
  - it sends a request;
  - it sleeps until it gets the response;
  - it processes the response.
- Why do rng and hasher behave differently?

Figure 91: Equations on a blackboard

(Synchronous vs. asynchronous event processing)
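(One way to make this visible without ab, using the tools already installed in the debug container: fire a handful of parallel requests against each service and compare the wall-clock time. The expected timings below are rough extrapolations from the ab results, not guarantees; if the busybox time applet is missing from your image, compare date timestamps instead:)

  # hasher handles requests concurrently: 5 parallel requests ~ 100-200 ms total
  time sh -c 'for i in 1 2 3 4 5; do curl -s -H "Content-Type: application/octet-stream" --data-binary @/tmp/random http://hasher/ >/dev/null & done; wait'

  # rng handles requests one at a time: 5 parallel requests ~ 5 x 100 ms total
  time sh -c 'for i in 1 2 3 4 5; do curl -s http://rng/10 >/dev/null & done; wait'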
Global scheduling → global debugging
- Traditional approach:
  - log into a node
  - install our Swiss Army Knife (if necessary)
  - troubleshoot things
- Proposed alternative:
  - put our Swiss Army Knife in a container (e.g. nicolaka/netshoot)
  - run tests from multiple locations at the same time

(This becomes very practical with the docker service logs command, available since 17.05.)
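(A sketch of that alternative, assuming the nicolaka/netshoot image mentioned above; the service name is arbitrary. A global service runs one task per node, so the same test runs everywhere at once, and docker service logs gathers the results:)

  # Run the same test from every node, on the application network
  docker service create --name nettest --mode global \
         --restart-condition none \
         --network dockercoins_default \
         nicolaka/netshoot ping -c 3 rng

  # Collect the output from all nodes (requires 17.05 or later)
  docker service logs nettest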
More about overlay networks
- DC17US: Deep Dive in Docker Overlay Networks (video)
- DC17EU: Deeper Dive in Docker Overlay Networks (video)