How much traffic from your machine goes to the cloud?

Paulo Almeida
6 min read · Jan 30, 2020

Irrespective of your opinion on the cloud vs on-premise debate, this is a question all of us have wondered about at least once, whether out of sheer curiosity or out of disbelief (for whatever reason).

Some solutions have been proposed, in the form of Chrome plugins, to show who might be hosting the website you are accessing. Unquestionably interesting, but it still felt to me like something was missing…

I remember like it was yesterday when S3 suffered a big blow in the us-east-1 region due to an operational error and several other services were impacted. That alone created a cascading failure which affected several websites worldwide, many of which I used (and probably still use). Some people even drew analogies like “Amazon S3 is the storage of the Internet” — several Twitter accounts.

Having said that, it’s clear to me that the question in the title can’t be answered solely by checking the hosting provider of a website you accessed, but rather by looking at the traffic coming from those cloud providers. With that in mind (and also to put into practice things I am currently studying), I created a Linux kernel module (a.k.a. LKM) to help me get these stats.

The kernel-mod-cloud-packet-stats solution

This module basically inspects the IP header of each packet and compares its address against a list of CIDR blocks from each cloud provider. For the time being it supports AWS, Azure and GCP… but you never know, I would love to see someone get interested in adding another provider to it.

The compilation, installation and clean-up procedures can be found in the GitHub repository above, so I will jump straight into what matters most.

Once installed, this LKM will start aggregating the number of packets sent to each cloud provider and expose them via sysfs as shown below:

My computer stats while writing this article alone
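Just to help picture the sysfs side of it, below is a minimal sketch of exposing one counter through a kobject attribute. The directory name, the attribute and the sum_aws_packets() helper (sketched in the per-CPU section further down) are all made up for illustration; the actual layout in the repository may well be different.

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

static struct kobject *stats_kobj;

/* hypothetical helper that sums the per-CPU counters (sketched later) */
static unsigned long sum_aws_packets(void);

static ssize_t aws_show(struct kobject *kobj,
                        struct kobj_attribute *attr, char *buf)
{
        return sprintf(buf, "%lu\n", sum_aws_packets());
}

static struct kobj_attribute aws_attr = __ATTR_RO(aws);

static int __init stats_sysfs_init(void)
{
        /* would show up as /sys/kernel/cloud_packet_stats/aws */
        stats_kobj = kobject_create_and_add("cloud_packet_stats", kernel_kobj);
        if (!stats_kobj)
                return -ENOMEM;
        return sysfs_create_file(stats_kobj, &aws_attr.attr);
}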

Technical gotchas

While I have no aspirations of developing this LKM further, I think it would be cool to list some of the technologies I used and the train of thought I followed when developing this module. BTW, in case you find things that can be done better, please open a PR and I will gladly review it while I learn from what you’ve done :)

Kernel modules and files

The CIDR blocks from the cloud providers have to be fetched online and parsed from a JSON file before they can be used. As you might have expected, this poses a real problem when we are talking about LKMs.

This isn’t new and has been explained in a reasonable amount of detail by none other than Greg K.H.: https://www.linuxjournal.com/article/8110

The approach taken was to download the lists during the build preparation phase and build them into the source code. A similar approach (although undeniably more sophisticated than mine) is used for syscalls in the Linux kernel source code.
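To give an idea of what “building it into the source” can look like, here is a purely hypothetical example of what the generated file could contain; the names, the struct layout and the provider constants are made up for illustration, and the real generated file in the repository will differ. Each block is already stored as the pair of integers discussed further down.

/* cloud_ranges_gen.h -- hypothetical output of the build preparation step */
#include <linux/types.h>

enum cloud_provider { PROVIDER_AWS, PROVIDER_AZURE, PROVIDER_GCP };

struct cidr_range {
        u32 from;     /* first address of the block, host byte order */
        u32 to;       /* last address of the block, host byte order  */
        u8  provider; /* PROVIDER_AWS, PROVIDER_AZURE or PROVIDER_GCP */
};

static const struct cidr_range cloud_ranges[] = {
        /* e.g. 52.94.76.0/22 becomes the pair below */
        { 0x345E4C00, 0x345E4FFF, PROVIDER_AWS },
        /* ... thousands of entries generated from the providers' JSON files ... */
};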

Comparing the IP from the packet header with a list of block ranges

When it comes to listening for TCP packets, you have to register a callback for one or more of the hooks made available by the kernel networking core via the Netfilter framework.

A graphical representation of the Netfilter components from Wikipedia
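For reference, registering such a hook typically looks like the sketch below. The hook point (NF_INET_LOCAL_IN) and priority are assumptions on my part here; the module in the repository may register differently.

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <net/net_namespace.h>

static struct nf_hook_ops cloud_packet_ops = {
        .hook     = cloud_packet_tcp_packet_filter, /* callback shown below */
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_IN,   /* inbound packets destined to this host */
        .priority = NF_IP_PRI_FIRST,
};

static int __init cloud_packet_init(void)
{
        /* register against the initial network namespace */
        return nf_register_net_hook(&init_net, &cloud_packet_ops);
}

static void __exit cloud_packet_exit(void)
{
        nf_unregister_net_hook(&init_net, &cloud_packet_ops);
}

module_init(cloud_packet_init);
module_exit(cloud_packet_exit);
MODULE_LICENSE("GPL");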

As you may have imagined, matching the packet’s IP against one of the CIDR blocks must be blazingly fast. The more cloud providers we add, the more CIDR ranges have to be compared against each packet, so an O(n) search algorithm has no place here. Then again, in kernel land speed matters, right?

The approach I took was to transform the CIDR blocks from their string representation “x.x.x.x/y” into a pair of integers (from and to, respectively) in host byte order.

The trick is really just to avoid these transformations at runtime if we can. The host byte order is more a matter of convenience than anything else in our case ;)
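The conversion itself is just bit arithmetic. A minimal userspace-style sketch of turning “x.x.x.x/y” into that (from, to) pair could look like this; the actual build step in the repository may of course do it differently (e.g. in a script):

#include <stdint.h>
#include <stdio.h>

/* Convert "x.x.x.x/y" into the [from, to] pair in host byte order. */
static int cidr_to_range(const char *cidr, uint32_t *from, uint32_t *to)
{
        unsigned int a, b, c, d, prefix;
        uint32_t base, mask;

        if (sscanf(cidr, "%u.%u.%u.%u/%u", &a, &b, &c, &d, &prefix) != 5)
                return -1;
        if (a > 255 || b > 255 || c > 255 || d > 255 || prefix > 32)
                return -1;

        base = (a << 24) | (b << 16) | (c << 8) | d;
        mask = prefix ? 0xFFFFFFFFu << (32 - prefix) : 0;

        *from = base & mask;  /* first address in the block */
        *to   = base | ~mask; /* last address in the block  */
        return 0;
}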

You can find below a code snippet showing how I am doing this.

#include <linux/ip.h>
#include <linux/netfilter.h>
#include <linux/skbuff.h>

/* nf_hook_ops callback method */
static unsigned int cloud_packet_tcp_packet_filter(void *priv,
                struct sk_buff *skb, const struct nf_hook_state *state)
{
        struct iphdr *iph; /* IPv4 header */
        u32 saddr;         /* Source address */

        if (!skb)          /* packet is empty */
                return NF_ACCEPT;

        iph = ip_hdr(skb);         /* get IP header */
        saddr = ntohl(iph->saddr); /* host byte order */

        process_ip(saddr);         /* Let's find it now */
        return NF_ACCEPT;
}

The second approach taken here was to sort the list of CIDR integer pairs so they can be searched with a binary search, O(log n), instead. Fortunately, someone had already proposed a generic implementation which was merged into the kernel, so I took advantage of it :)
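Assuming the list is sorted by the “from” address and the ranges don’t overlap, the lookup via the kernel’s generic bsearch() could look roughly like the sketch below. cmp_ip_to_range and lookup_ip are illustrative names, and cloud_ranges/struct cidr_range refer to the hypothetical generated table shown earlier.

#include <linux/bsearch.h>
#include <linux/kernel.h>

/* key is the source IP in host byte order, elt is one sorted range */
static int cmp_ip_to_range(const void *key, const void *elt)
{
        const u32 ip = *(const u32 *)key;
        const struct cidr_range *range = elt;

        if (ip < range->from)
                return -1;
        if (ip > range->to)
                return 1;
        return 0; /* ip falls inside [from, to] */
}

static const struct cidr_range *lookup_ip(u32 ip)
{
        return bsearch(&ip, cloud_ranges, ARRAY_SIZE(cloud_ranges),
                       sizeof(cloud_ranges[0]), cmp_ip_to_range);
}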

I couldn’t think of an O(1) solution for this problem, but if you come up with something, let me know in the comments. Seems like a fun thing to play with, right?

The packet counters

The per-cloud-provider packet counters were probably the part that took me the longest to decide how to approach. Here is the problem:

  • The user can have multiple physical network interfaces, each of them routing hardware interrupts via a different IRQ number. This significantly increases the chances of packets being processed by different CPUs. In my case, my wireless card has its packets processed only by CPU3, with interrupts being routed through IRQ 18 as shown below.
  • Even if the user has a single interface, there are tunings that can cause IRQs to be balanced across multiple CPUs.

Having a single counter variable shared across multiple CPUs and incremented concurrently is not ideal. I was shying away from locks and atomic operations as much as I could, for different reasons.

The approach I took here was to use the DEFINE_PER_CPU macro, which is used extensively in the kernel internals. There is an old-but-gold post about it that I believe you will enjoy reading: https://lwn.net/Articles/22911/

The TL;DR of that article is that per-CPU variables have fewer locking requirements, and in our case we can benefit from that. On the other hand, when using certain functions we still need to disable/enable scheduler preemption to ensure chaos won’t pay us a visit anytime soon :)

So, basically speaking, to increment the counter we increment the CPU-specific variable, and to read it we sum the variables from all CPUs and return the result. Interesting, right? You can find the implementation below:
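In a minimal, illustrative form the idea looks like this; the counter name and helpers are made up, and the real module keeps one counter per provider:

#include <linux/cpumask.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, aws_packet_count);

/* Increment on the CPU that processed the packet; get_cpu_var()
 * disables preemption so we don't migrate to another CPU mid-update. */
static void count_aws_packet(void)
{
        get_cpu_var(aws_packet_count)++;
        put_cpu_var(aws_packet_count);
}

/* Reading sums the per-CPU values; the result is a best-effort snapshot. */
static unsigned long sum_aws_packets(void)
{
        unsigned long total = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                total += per_cpu(aws_packet_count, cpu);

        return total;
}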

In case you are curious about how the DEFINE_PER_CPU macro works and want to know more, I found this material, which was an eye-opener: https://0xax.gitbooks.io/linux-insides/content/Concepts/linux-cpu-1.html

New network devices becoming available/unavailable

The dynamic nature of the kernel can present challenges like:

  • What will happen if a user plugs in a new NIC and traffic from cloud providers starts coming through it?
  • What will happen if network namespaces are created/destroyed while the LKM is already listening for TCP packets? (This was a tip from a person on Reddit who reviewed my code. Pretty smart person for sure… I hadn’t even thought about it.)

Again, fortunately, some crazily smart person foresaw this and created notifiers that you can register with to receive those events from the networking core of the kernel. Cool, right?

https://elixir.bootlin.com/linux/v5.5/source/include/net/net_namespace.h#L407
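As an illustration, the two registration paths could look like the sketch below: a netdevice notifier for NICs coming and going, and pernet operations so the Netfilter hook follows namespaces being created/destroyed (instead of registering only against init_net as in the earlier sketch). The callback bodies here are placeholders.

#include <linux/netdevice.h>
#include <linux/netfilter.h>
#include <net/net_namespace.h>

/* Called on NETDEV_REGISTER, NETDEV_UNREGISTER, NETDEV_UP, ... */
static int cloud_packet_netdev_event(struct notifier_block *nb,
                                     unsigned long event, void *ptr)
{
        /* react to new/removed interfaces here if needed */
        return NOTIFY_DONE;
}

static struct notifier_block cloud_packet_netdev_nb = {
        .notifier_call = cloud_packet_netdev_event,
};

/* Per-namespace init/exit: register the hook in every network namespace */
static int __net_init cloud_packet_net_init(struct net *net)
{
        return nf_register_net_hook(net, &cloud_packet_ops);
}

static void __net_exit cloud_packet_net_exit(struct net *net)
{
        nf_unregister_net_hook(net, &cloud_packet_ops);
}

static struct pernet_operations cloud_packet_net_ops = {
        .init = cloud_packet_net_init,
        .exit = cloud_packet_net_exit,
};

/* In the module init:
 *   register_netdevice_notifier(&cloud_packet_netdev_nb);
 *   register_pernet_subsys(&cloud_packet_net_ops);
 */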

Things I either haven’t implemented or that can be improved

Conclusion

While this isn’t a definitive answer to the question asked, I believe it takes another step in a more accurate direction. I had a lot of fun playing with it, and I hope you liked not only the project but also the blog post.
