Home - Teleport Blog - How We Built SELinux Support for Kubernetes in Gravity 7.0 - May 20, 2020
How We Built SELinux Support for Kubernetes in Gravity 7.0
As one of the engineers on the Gravity team here at Teleport, I was tasked with adding SELinux support to Gravity 7.0, released back in March. The result of this work is a base Kubernetes cluster policy that confines the services (both Gravity-specific and Kubernetes) and user workloads.
In this post, I will explain how I built it, which issues I ran into, and some useful tips I’d like to share. Specifically, we will look at the use of attributes for the common aspects of the policy. We will see how to search for and select appropriate builtin interfaces to increase the readability of the policy and how file equivalence rules should be taken into account when working with file labels.
After that, I’ll show you how we are confining Kubernetes services and workloads in our clusters.
A Bit of History
The problem we had was that Gravity was never designed to run on SELinux-enabled hosts and our customers kept randomly breaking their installations when trying to run Gravity with SELinux enabled.
This left us with a only a handful of options to fix the issue:
- Prevent the installation on SELinux-enabled hosts
- Detect whether SELinux was in enforcing mode and disabling it before the installation
- Add proper support for SELinux to our software
We decided to invest in proper SELinux support.
Before embarking on the journey to adopt SELinux, I knew next to nothing about the technology. After spending three months working on it, I still cannot claim to be an expert. However, I’ll go through some of what I did learn and some of the open problems with adapting a complex application to SELinux.
All processes and files have an associated SELinux security context. SELinux provides a whitelisting approach to limit a running application’s access to kernel-provided resources. This implicit denial of all access creates a huge challenge for application developers in providing an exhaustive list of access permissions for complex applications under every possible scenario.
Having experience working with SELinux and the ability to fully exercise the application is essential to developing an SELinux policy to ship with it.
To set the scene, let’s start with the basic information about our software. It is a CNCF-certified Kubernetes distribution with a lot of batteries included. From the SELinux point of view, the software is split in two groups:
-
Monolithic binary for installing, upgrading, and managing a Kubernetes cluster
-
A container to provide consistent runtime environment across different Linux distributions
The first thing that is universally recommended when starting with a new technology is exploring the existing tooling. In the case of SELinux, I quickly stumbled on sepolgen that helps generate the initial stubs for the application. Dan Walsch wrote a post about the tool and it is definitely a good start.
To get started, I generated the stubs with:
$ sepolicy generate --init /path/to/gravity
Loaded plugins: fastestmirror
Created the following files:
/home/centos/selinux/gravity.te # Type Enforcement file
/home/centos/selinux/gravity.if # Interface file
/home/centos/selinux/gravity.fc # File Contexts file
/home/centos/selinux/gravity_selinux.spec # Spec file
/home/centos/selinux/gravity.sh # Setup Script
The command generated 5 files in the current directory using gravity
as the prefix for the new confined domain (gravity_t
):
I’m only going to discuss three of those files - specifically the Type Enforcement file (gravity.te
), the Interface file (gravity.if
), and the contexts file (gravity.fc
).
The gravity.te
file is the main source code with Type Enforcement
rules. This is where I created process and file types and explicitly enabled permissions. Interface files are similar to C header files and have the purpose of providing extension points when someone wants to use your policy. Instead of using the types directly, the users use interfaces to hook into the policy. The gravity.fc
file manages the SELinux security contexts for file resources (regular files, sockets, directories, etc.).
Although there are two syntax flavors (or rather two languages) to develop the policy modules, I developed using a Kernel Policy Language,
which is then compiled to a Common Intermediate Language
(CIL
). You can read more about CIL
here.
Next, I will focus on the things I found helpful while working on the policy.
Use Attributes
An attribute is similar to a base type in other programming languages - it helps create abstractions for common aspects of the policy.
Let’s demonstrate this with an example.
One of the important operational facets of the Gravity binary is, without a doubt, the ability to install a Kubernetes cluster. But once installed, the cluster requires maintenance, and the same binary is used for maintenance tasks afterwards. While the installer facet needs access to the user's home directory (from where the installer tarball is usually extracted) and expects to be able to load policy modules to implement automatic policy configuration, the maintenance facet does not need those permissions.
In order to capture the described semantics in the policy module, we introduce an attribute that will be used to represent the binary in its general case and will use specific types when we need to introduce type-specific semantics (i.e. when we need to differentiate between installation and maintenance).
We’re calling this attribute gravity_domain
, and it applies to all operational aspects of the binary:
attribute gravity_domain;
Now, we introduce the two concrete domains for the two use cases I talked about above:
gravity_domain_template(gravity)
gravity_domain_template(gravity_installer)
We define common permissions by referring directly to the gravity_domain
attribute within the policy file. No matter which use-case is running - the binary will be able to do name resolution, connect to a domain socket, and create outbound connections:
sysnet_dns_name_resolve(gravity_domain)
gravity_container_stream_connect(gravity_domain)
allow gravity_domain gravity_port_type:tcp_socket { connect name_connect };
And now we can use the concrete type, such as our installer type gravity_installer_t
to grant access to home directories on the system, where sysadmins generally copy assets for the installation:
# allow read access to user home directories
userdom_read_admin_home_files(gravity_installer_t)
userdom_read_user_home_content_files(gravity_installer_t)
There's one caveat to attributes though - if a macro uses a construct that operates on types you cannot use an attribute as input. For example typeattribute
would not work when used on an attribute, as is illustrated in the following example.
Given this rule:
kernel_read_messages(gravity_domain)
Compilation results in an error:
gravity.te:352:ERROR 'unknown type gravity_domain' at token ';' on line 22608:
typeattribute gravity_domain can_receive_kernel_messages;
It can be corrected by using concrete types instead:
kernel_read_messages(gravity_t)
kernel_read_messages(gravity_installer_t)
Use interface files for general-purpose functions
Interface files are not restricted to external use only - they can be used to capture common patterns in your own code.
Using the two process domains from before, we use a template we’ll call gravity_domain_template
to help us create the actual domain types and add a set of common permissions to them:
template(`gravity_domain_template',`
gen_require(`
attribute gravity_domain;
attribute gravity_executable_domain;
')
type $1_t, gravity_domain;
type $1_exec_t, gravity_executable_domain;
role system_r types $1_t;
application_domain($1_t, $1_exec_t)
// ...
')
To use this, we invoke it in the Type Enforcement
file:
# create gravity domains
gravity_domain_template(gravity)
gravity_domain_template(gravity_installer)
Let’s break down what we have defined:
Introduce the attributes into the template:
`gen_require(``
attribute gravity_domain;
attribute gravity_executable_domain;
')
Create a new process domain and a new executable type:
type $1_t, gravity_domain;
type $1_exec_t, gravity_executable_domain;
This will create a type using the provided prefix - for gravity
it will be the gravity_t
process domain with the gravity_domain
attribute assigned to it. Additionally, gravity_exec_t
will be the executable type with the gravity_executable_domain
attribute assigned to it.
Allow system role to use the new type:
role system_r types $1_t;
This is important because we have a handful of systemd services, and since system_r
is the default domain for all services, it needs access to the type in order to be able to run it.
Add common permissions:
application_domain($1_t, $1_exec_t)
Here, we list common permissions (i.e. we declare gravity_t
an application domain and specify that the gravity_exec_t
label is the entrypoint to the gravity_t
process domain).
The same will be done for the gravity_installer
prefix, resulting in gravity_installer_t
and gravity_installer_exec_t
types respectively.
Use builtin Interfaces
Making good use of the base policy and its interfaces is a sign of experience when it comes to writing a policy, as interfaces tend to render the policy more readable than lists of raw allow
and dontaudit
rules. Sometimes, interfaces ”know” better - they provide a more complete and correct solution for a set of required permissions. Navigating the existing interfaces can be a challenge, though. This tip comes from a book called SELinux Cookbook by Sven Vermeulen in the form of a set of bash functions to search and display contents of interfaces and defines. You can grab a copy of the functions from here.
Using the functions is trivial - you can simply add them to your bashrc
file.
Here is how they can be used:
$ sefindif 'kernel.*message'
…
kernel/kernel.if: ## Allow caller to read kernel messages
kernel/kernel.if: interface(`kernel_read_messages',`
kernel/kernel.if: attribute can_receive_kernel_messages;
kernel/kernel.if: typeattribute $1 can_receive_kernel_messages;
kernel/kernel.if: ## Allow caller to mounton the kernel messages file
kernel/kernel.if: interface(`kernel_mounton_messages',`
…
And then display the contents of an interface to learn how it’s been implemented:
$ seshowif kernel_read_messages
interface(`kernel_read_messages',`
gen_require(`
attribute can_receive_kernel_messages;
type proc_kmsg_t, proc_t;
')
read_files_pattern($1, proc_t, proc_kmsg_t)
typeattribute $1 can_receive_kernel_messages;
')
The same applies to searching for and displaying of defines (macros):
$ sefinddef 'domtrans*'
define(`spec_domtrans_pattern',`
define(`domtrans_pattern',`
$ seshowdef domtrans_pattern
define(`domtrans_pattern',`
domain_auto_transition_pattern($1,$2,$3)
allow $3 $1:fd use;
allow $3 $1:fifo_file rw_inherited_fifo_file_perms;
allow $3 $1:process sigchld;
')
I follow an algorithm similar to this when looking for the most appropriate interface to use:
Determine the cause of permission failure
For example, the previously introduced gravity_t
domain does not have the permission to run ping
...
$ runcon -t gravity_t /usr/bin/ping
runcon: /usr/bin/ping: Permission denied
$ ausearch --start recent --success no | audit2allow
#============= gravity_t ==============
allow gravity_t ping_exec_t:file entrypoint;
...which indicates exactly that - the domain is not allowed to execute ping.
Search for interfaces to allow ping execution
$ sefindif 'ping_exec'
dmin/netutils.if: interface(`netutils_domtrans_ping',`
admin/netutils.if: type ping_t, ping_exec_t;
admin/netutils.if: domtrans_pattern($1, ping_exec_t, ping_t)
admin/netutils.if: allow $1 ping_exec_t:file map;
admin/netutils.if: interface(`netutils_exec_ping',`
admin/netutils.if: type ping_exec_t;
admin/netutils.if: can_exec($1, ping_exec_t)
…
netutils_exec_ping
seems promising. Then in our policy, instead of directly writing this...
gen_require(`
type ping_exec_t;
')
can_exec(mydomain, ping_exec_t)
...which is also wrong since it does not allow mydomain
to search the host’s bin
directory,
we use the interface instead:
netutils_exec_ping(mydomain)
To me, this comes as a much more readable (and correct!) alternative.
Understand File Context Equivalence Rules
One interesting bit of the SELinux file context configuration is the file context equivalency.
Instead of specifying labels for every path on the system, file context equivalency allows us to apply filesystem labels to a directory based on the label from another directory. As an example, we can define that /run
and /var/run
are equivalent and as such, should apply the same labels.
The effective equivalence rules can be viewed with semanage
(they will be all the way at the bottom):
$ semanage fcontext -l
...
SELinux Distribution fcontext Equivalence
/usr/local/lib64 = /usr/lib
/run = /var/run
/var/home = /home
...
Fcontext equivalency can bite if not taken into account properly.
As another example, let’s say we have a socket file (conventionally stored as /var/run/planet.sock
). Note also, that /var/run
is usually a symlink to /run
. We write the following in our fcontexts
file:
/run/planet\.sock -s gen_context(system_u:object_r:container_var_run_t,s0)
And attempt to match this path to a type:
$ matchpathcon /run/planet.socket
/run/planet.socket system_u:object_r:var_run_t:s0
$ matchpathcon /var/run/planet.socket
/var/run/planet.socket system_u:object_r:var_run_t:s0
For some reason, the lookup returns the base type for the file (i.e. var_run_t
instead of the expected container_var_run_t
). Now, if we used /var/run/planet.sock
instead (the equivalency target), we would get the expected results:
/var/run/planet\.sock -s gen_context(system_u:object_r:container_var_run_t,s0)
$ matchpathcon /run/planet.socket
/run/planet.socket system_u:object_r:container_var_run_t:s0
$ matchpathcon /var/run/planet.socket
/var/run/planet.socket system_u:object_r:container_var_run_t:s0
Teleport cybersecurity blog posts and tech news
Every other week we'll send a newsletter with the latest cybersecurity news and Teleport updates.
SELinux for the Container
As I've mentioned previously, on a Gravity cluster, Kubernetes and system-specific workloads are encapsulated inside a Linux container that we use for consistent Linux distribution regardless of the host distribution.
Because the container is effectively a Linux distribution, my initial thought was to try and reuse SELinux configuration from the host for the container environment. I reasoned that by doing this, I would be reusing the existing process domains and the file labels - all that good stuff - at no cost of additional configuration. This turned out to be problematic for several reasons:
- Host and container can be a different distribution, and hence the file system layouts are not identical.
- Process domain rules might differ with regards to configuration on host and inside the container leading to inconsistent behavior.
- The gravity binary is responsible for extracting and editing the rootfs directory of the system container. By using the same file system labels inside the container rootfs, the binary would gain permissions to edit all system directories on the host file system, defeating the purpose of SELinux isolations.
Next, I looked at the container SELinux policy. The policy is limited to a single SELinux file context and a single process domain out of the box. Since we are using container technology and are logically shipping an entire Linux distribution like a VM, confining to a single type does not facilitate this deployment model and needs extension. Fortunately, extending the policy was easy.
SELinux container configuration resides in the file found at /etc/selinux/targeted/contexts/lxc_contexts
(where targeted
is the name of the base SELinux policy) on the host.
For example on CentOS 7
, the contents of this file are:
process = "system_u:system_r:svirt_lxc_net_t:s0"
content = "system_u:object_r:virt_var_lib_t:s0"
file = "system_u:object_r:svirt_sandbox_file_t:s0"
Why svirt_sandbox_file_t
? Simple - it's an alias for container_file_t
:
$ seinfo -tsvirt_sandbox_file_t
TypeName container_file_t
Aliases
svirt_sandbox_file_t
svirt_lxc_file_t
The process label is an alias for container_t
:
$ seinfo -tsvirt_lxc_net_t
TypeName container_t
Aliases
svirt_lxc_net_t
With standard container runtime implementations such as Docker, the entire container's rootfs is relabeled with the file context from this file. Consequently, every process is executed with the configured process context - with the exception of super-privileged container processes, which run under a different process label or when explicitly overridden on the command line.
Instead, I wanted the container runtime to be able to execute a process in its own domain as specified by the policy. So, for instance, it would run a kubernetes service confined as gravity_kubernetes_t
, but run a Gravity-specific service as gravity_service_t
.
To achieve this, we embed a set of type transitions from an arbitrary process domain and ask SELinux to compute a type transition when required. I opted to use the domain of the systemd init process running inside the Gravity container as the source type for computing the transitions and ended with the following set of rules:
domtrans_pattern(gravity_container_init_t, gravity_kubernetes_exec_t, gravity_kubernetes_t)
domtrans_pattern(gravity_container_init_t, gravity_service_exec_t, gravity_service_t)
domtrans_pattern(gravity_container_init_t, gravity_exec_t, gravity_t)
domtrans_pattern(gravity_container_init_t, gravity_container_systemctl_exec_t, gravity_container_systemctl_t)
domtrans_pattern(gravity_container_init_t, gravity_container_file_t, gravity_container_t)
domtrans_pattern(gravity_container_init_t, container_file_t, gravity_container_t)
Each of them describes a type transition from the init process (running as gravity_container_init_t
) to the fixed set of other process domains we are interested in.
We can ask SELinux to compute a transition to any domain specifying the file path and the source domain of the init process with ComputeCreateContext:
import "github.com/opencontainers/selinux/go-selinux"
// ...
targetDomain, err:= selinux.ComputeCreateContext(
"system_u:system_r:gravity_container_init_t:s0",
"/path/to/binary",
"process")
and spawn a new process inside the Gravity container using the computed domain targetDomain
.
To demonstrate how this would work, let’s say we wanted to start kubelet
confined in its own domain (gravity_kubernetes_t
)
We can inspect the SELinux file label by adding Z
option to ls
:
$ ls -lZ /usr/bin/kubelet
-rwxr-xr-x. 1 planet planet system_u:object_r:gravity_kubernetes_exec_t:s0 111630104 Apr 15 19:33 /usr/bin/kubelet
We can see that kubelet
has gravity_kubernetes_exec_t
as its entrypoint which means the process is supposed to run in the gravity_kubernetes_t
domain according to our policy.
To compute the correct domain we invoke selinuxexeccon which uses the same API as described above:
$ selinuxexeccon /usr/bin/kubelet system_u:system_r:gravity_container_init_t:s0
system_u:system_r:gravity_kubernetes_t:s0
and it correctly reports the domain to use for the kubelet
binary as gravity_kubernetes_t
.
Conclusion
Writing a policy is non-trivial but an extremely rewarding and valuable experience. But we are not done yet - a lot of work still needs to be done.
One of the open questions is the support for user-specific SELinux configuration and policies. Currently, user workloads are all treated the same and use the same generic container process domain. Same is with persistent storage - the file system resources are all labeled with the same container file label.
udica is interesting in this regard as a way to simplify the implementation of custom SELinux support for containers.
SELinux-operator is another interesting project for managing SELinux policies in a Kubernetes cluster as custom resources.
Troubleshooting problems rooted in SELinux requires a lot of experience. We are working on improving our tools to simplify this.
Keeping policy up-to-date especially with the rapidly evolving Kubernetes ecosystem requires that we invest into automation and additional tooling to keep pace with its development.
Tags
Teleport Newsletter
Stay up-to-date with the newest Teleport releases by subscribing to our monthly updates.