Teleport Blog - Using Datalog to Test for Access - Aug 26, 2021

Using Datalog to Test for Access

by Rui Li

Introduction

This summer, I was fortunate enough to get an internship at Teleport. As part of the co-op program at the University of Waterloo, I have worked at several different companies before, and this internship is my fourth placement, coming as I finish the first term of my third year. The project that I was assigned to was an interesting one.

For some background, Teleport is an Access Platform that unifies access to SSH servers, Kubernetes clusters, web applications and databases across multiple environments. Teleport provides a Role-Based Access Control (RBAC) system that allows cluster administrators to allow or deny access to specific cluster resources such as SSH nodes, applications, databases, etc., based on the assigned labels. This system is similar to other cloud-based providers like AWS and Microsoft Azure and their RBAC systems.

However, even though this system is powerful, role configurations can be pretty complex. Teleport currently does not give cluster admins good tools for troubleshooting access-related issues. Questions such as "Can Alice SSH into Node x as root?", "Which roles prevent Alice from accessing Node x as root?" and "Which nodes can Alice access?" are hard to answer without understanding the entire access configuration.

The proposed project's purpose is to implement an access tester for Teleport that allows admins to answer these questions, using Datalog to help answer these queries.

What is Teleport?

As mentioned previously, Teleport is an Access Platform that unifies access to SSH servers, Kubernetes clusters, web applications, and databases across multiple environments. Essentially, Teleport provides a convenient and secure way for users to access everything within an infrastructure. Teleport also has many different features that provide insight into all layers of the tech stack, such as the Unified Resource Catalog, Session Recording, Audit Log, and Access Controls. In particular, we will need to dive deeper into the Role-Based Access Control (RBAC) system within Teleport.

For starters, there are users and roles. Users can be assigned roles that control access to specific resources. In Teleport, we define this access in the allow and deny sections of a role. Each section is a list of resource/verb combinations, such as node labels and logins. It is important to remember that the deny section always overrides the allow section, and the default behavior of the allow section is to allow nothing. There are many more configuration options for Teleport found in the documentation. In addition, there are role templates that can help avoid manual configuration when people join, leave, or form new teams. These templates can dynamically define access using template variables and user traits. For example, we can list logins as {{internal.logins}} in the role and then define the actual values per user in the traits configuration. For more information about role templates, there is an official guide.
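The allow/deny behavior described above can be sketched in a few lines of Rust. This is only an illustration under simplifying assumptions: the `Role` struct and its `allow_logins` and `deny_logins` fields are hypothetical names, and the sketch covers logins only, not labels or templates. It is not Teleport's actual role type.

```rust
use std::collections::HashSet;

/// Hypothetical, pared-down role: only SSH logins, no labels or templates.
pub struct Role {
    pub allow_logins: HashSet<String>,
    pub deny_logins: HashSet<String>,
}

impl Role {
    /// Builds a role from plain string slices (helper for this example only).
    pub fn new(allow: &[&str], deny: &[&str]) -> Self {
        Role {
            allow_logins: allow.iter().map(|s| s.to_string()).collect(),
            deny_logins: deny.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Deny always wins, and anything not explicitly allowed is refused.
    pub fn permits_login(&self, login: &str) -> bool {
        !self.deny_logins.contains(login) && self.allow_logins.contains(login)
    }
}
```

With `Role::new(&["root", "ubuntu"], &["root"])`, the login `ubuntu` is permitted, `root` is refused because the deny entry overrides the allow entry, and an unlisted login such as `admin` is refused because nothing allows it.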

What is Datalog?

So what is Datalog, and why is it helpful in answering our questions? Well, Datalog is a logic programming language that is syntactically a subset of Prolog, and it is most commonly used as a query language for deductive databases. We thought that a query language would be a good fit for answering complex access-related questions in a coherent manner.

Datalog programs are written as a collection of logical constraints which the compiler uses to compute the solution. These constraints are sentences in logical form and are a finite set of facts (which are assertions of the world in which we operate) and rules (which are sentences that allow us to infer new facts from existing ones).

Here is an example of how Datalog works:

route(toronto, ottawa).

route(ottawa, montreal).

Here we have defined two facts: there is a route from Toronto to Ottawa and one from Ottawa to Montreal. On their own, the facts attach no meaning to the order of the constants; the order only starts to matter once we write rules that read the first argument as the origin and the second as the destination. In Datalog, variables usually start with an uppercase letter, whereas constants begin with a lowercase letter, as above.

path(X, Y) :- route(X, Y).

path(X, Y) :- path(X, Z), route(Z, Y).

We have defined two rules, where we determine that there is a path from X to Y if X has a route to Y and that X has a path to Y if there is a path from X to Z, and Z has a route to Y. Next, breaking down the anatomy of these rules, the path(X, Y) part is called an atom. Each atom consists of the predicate, path in this case, and the variables/constants, which are X and Y. Different sources might use different terminology; for example, rules can be called clauses, and atoms might be called literals. Now, notice the ordering of the variables within the rule makes the routes we defined earlier as one-way routes, meaning there is a path from Toronto to Ottawa, but not from Ottawa to Toronto.

?- path(toronto, X).

?- path(ottawa, Y).

Now we can perform a query on Toronto, where we are essentially asking: "What cities, X, are connected to Toronto?" This query will return Montreal and Ottawa as specified by the rules previously. Similarly, the second query will only return Montreal.
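To make the evaluation concrete, here is a minimal Rust sketch that derives the path facts from the route facts by applying the two rules repeatedly until nothing new appears, and then answers the two queries. The function names `derive_paths` and `query_from` are made up for this illustration; a real Datalog engine is far more general than this hand-written loop.

```rust
use std::collections::BTreeSet;

type Pair = (&'static str, &'static str);

/// Derives every path(X, Y) fact from the given route(X, Y) facts.
pub fn derive_paths(routes: &[Pair]) -> BTreeSet<Pair> {
    // Rule 1: path(X, Y) :- route(X, Y).
    let mut paths: BTreeSet<Pair> = routes.iter().copied().collect();
    loop {
        let mut new_facts = Vec::new();
        // Rule 2: path(X, Y) :- path(X, Z), route(Z, Y).
        for &(x, z) in &paths {
            for &(z2, y) in routes {
                if z == z2 && !paths.contains(&(x, y)) {
                    new_facts.push((x, y));
                }
            }
        }
        // Stop once a full pass derives nothing new (a fixpoint).
        if new_facts.is_empty() {
            return paths;
        }
        paths.extend(new_facts);
    }
}

/// Answers the query ?- path(from, X), with results in sorted order.
pub fn query_from(paths: &BTreeSet<Pair>, from: &str) -> Vec<&'static str> {
    paths
        .iter()
        .filter(|(x, _)| *x == from)
        .map(|&(_, y)| y)
        .collect()
}
```

Calling `derive_paths(&[("toronto", "ottawa"), ("ottawa", "montreal")])` and then `query_from(&paths, "toronto")` yields montreal and ottawa, while querying from ottawa yields only montreal, matching the results described above.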

It is essential to remember that Datalog distinguishes between extensional predicate symbols (defined by facts) and intensional predicate symbols (defined by rules). Thus, in our simple example program above, route is an extensional predicate, and path is an intensional predicate. In our implementation for the access tester, the extensional predicates will represent what we already know: the data from the Teleport API that defines the details of the RBAC configuration. The intensional predicates will represent things that we are trying to infer from the facts, such as the access of users and which roles deny/allow access. If this was confusing, it could help to read through this article that describes Datalog in more detail before moving on.

Next, we will describe a model representing Teleport's RBAC system which we can then query to answer our desired questions.

Research and design

First, we need to determine what extensional predicates we have. In other words, which facts will we need in order to figure out the accesses later on? For the initial scope of this project, we decided to focus on SSH nodes. However, in the future, we can expect to cover Kubernetes, application, and database access.

Predicate                                          Example                                            Meaning
-------------------------------------------------  -------------------------------------------------  ----------------------------------------------------------------------------
HasRole(user, role)                                HasRole(jean, dev)                                 User 'jean' has role 'dev'
HasTrait(user, trait_key, trait_value)             HasTrait(jean, login, dev)                         User 'jean' has the login trait 'dev'
NodeHasLabel(node, label_key, label_value)         NodeHasLabel(node-1, environment, staging)         SSH node 'node-1' has the label 'environment:staging'
RoleAllowsNodeLabel(role, label_key, label_value)  RoleAllowsNodeLabel(dev, environment, staging)     Role 'dev' is allowed access to SSH nodes with label 'environment:staging'
RoleDeniesNodeLabel(role, label_key, label_value)  RoleDeniesNodeLabel(bad, environment, production)  Role 'bad' is denied access to SSH nodes with label 'environment:production'
RoleAllowsLogin(role, login)                       RoleAllowsLogin(admin, root)                       Role 'admin' can log in as OS user 'root' to SSH nodes
RoleDeniesLogin(role, login)                       RoleDeniesLogin(dev, root)                         Role 'dev' cannot log in as OS user 'root' to SSH nodes

We capture most of the RBAC configuration for SSH nodes with these facts. From here, we can develop the rules:

HasAllowNodeLabel(Role, Node, Key, Value) <- RoleAllowsNodeLabel(Role, Key, Value),
NodeHasLabel(Node, Key, Value);

HasDenyNodeLabel(Role, Node, Key, Value) <- RoleDeniesNodeLabel(Role, Key, Value),
NodeHasLabel(Node, Key, Value);

These rules will determine which labels are denied and allowed for a node.

HasAllowRole(User, Login, Node, Role) <- HasRole(User, Role),
HasAllowNodeLabel(Role, Node, Key, Value), RoleAllowsLogin(Role, Login),
!RoleDeniesLogin(Role, Login);

HasAllowRole(User, Login, Node, Role) <- HasRole(User, Role),
HasAllowNodeLabel(Role, Node, Key, Value), HasTrait(User, login_trait, Login),
!RoleDeniesLogin(Role, Login), !RoleDeniesLogin(Role, login_trait);

HasDenyRole(User, Node, Role) <- HasRole(User, Role),
HasDenyNodeLabel(Role, Node, Key, Value);

HasDeniedLogin(User, Login, Role) <- HasRole(User, Role),
RoleDeniesLogin(Role, Login);

HasDeniedLogin(User, Login, Role) <- HasRole(User, Role),
HasTrait(User, login_trait, Login), RoleDeniesLogin(Role, login_trait);

These rules determine whether a user has a role that allows/denies them access to a specified node. Note that we have separate rules for role configurations involving login traits since we can consider those login traits separately from explicitly defined logins.

The next few rules are the ones we query in the end. They determine whether a user has access to a specified node with a given login, whether a user is denied access to a specific node or denied a specific login, and which roles are denying the user access.

HasAccess(User, Login, Node, Role) <- HasAllowRole(User, Login, Node, Role),
!HasDenyRole(User, Node, Role), !HasDeniedLogin(User, Login, Role);

DenyAccess(User, Login, Node, Role) <- HasDenyRole(User, Node, Role),
HasTrait(User, login_trait, Login);

DenyAccess(User, Login, Node, Role) <- HasDenyRole(User, Node, Role),
HasAllowRole(User, Login, Node, Role);

DenyLogins(User, Login, Role) <- HasDeniedLogin(User, Login, Role);

There’s a lot to take in, but if we walk through each rule carefully, we can see that we’ll be able to infer which nodes a user has access to, which logins a user can use, and if the user is denied access, which nodes/logins are denied and which roles are denying the user access. We will be able to answer the motivating questions from before, and all that’s left is the implementation.
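As a rough illustration of how these rules combine, the following Rust sketch evaluates HasAccess directly over hand-written fact sets. It is a simplification and not the actual implementation: the `Facts` struct and its methods are hypothetical, the login-trait variants of the rules are left out, and a real Datalog engine would derive all matching tuples rather than check one tuple at a time.

```rust
use std::collections::HashSet;

type S = &'static str;

/// Extensional predicates (facts), one set per predicate from the table above.
#[derive(Default)]
pub struct Facts {
    pub has_role: HashSet<(S, S)>,                  // (user, role)
    pub node_has_label: HashSet<(S, S, S)>,         // (node, key, value)
    pub role_allows_node_label: HashSet<(S, S, S)>, // (role, key, value)
    pub role_denies_node_label: HashSet<(S, S, S)>, // (role, key, value)
    pub role_allows_login: HashSet<(S, S)>,         // (role, login)
    pub role_denies_login: HashSet<(S, S)>,         // (role, login)
}

impl Facts {
    /// HasAllowNodeLabel / HasDenyNodeLabel: some label rule of `role`
    /// matches a label that `node` actually carries.
    fn label_match(&self, rules: &HashSet<(S, S, S)>, role: S, node: S) -> bool {
        rules
            .iter()
            .any(|&(r, k, v)| r == role && self.node_has_label.contains(&(node, k, v)))
    }

    /// HasAccess(User, Login, Node, Role), without the login-trait rules.
    pub fn has_access(&self, user: S, login: S, node: S, role: S) -> bool {
        let has_role = self.has_role.contains(&(user, role));
        let has_allow_role = has_role
            && self.label_match(&self.role_allows_node_label, role, node)
            && self.role_allows_login.contains(&(role, login))
            && !self.role_denies_login.contains(&(role, login));
        let has_deny_role =
            has_role && self.label_match(&self.role_denies_node_label, role, node);
        let has_denied_login = has_role && self.role_denies_login.contains(&(role, login));
        has_allow_role && !has_deny_role && !has_denied_login
    }
}
```

For example, if jean has role dev, dev allows nodes labeled environment:staging and the logins ubuntu and root, and dev also denies root, then jean can reach a staging node as ubuntu but not as root, and cannot reach a node that only carries environment:production.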

Picking the approach

With Datalog being very esoteric, there were not many libraries available, especially in Go. The lack of usable libraries made implementation tricky, as we could not just plug in a Go Datalog library into Teleport. There are a few options that we could potentially implement, with each having its pros and cons.

One option would be to use a real-time deductive database such as Datomic. However, that means the end-user will have to set this up either on-premises or within the cloud. This option would also be overkill for our use case and create unnecessary overhead, so we decided to keep looking.

A second option we considered was extending an existing Go Datalog library or writing our own. Unfortunately, there were not many Go Datalog libraries, and the decent ones did not implement negation. Datalog with negation involves stratification, which essentially means disallowing recursion that passes through a negation. Evaluating stratified Datalog is an entire topic of its own and would require a good understanding of how the Go Datalog libraries evaluate Datalog under the surface. This option is not horrible, but with my basic knowledge of Datalog and Go, it could quickly have gone out of scope.

Finally, we could use an existing Rust library, Crepe, and call Rust from Go to integrate directly with tctl. This option was very appealing since, surprisingly, there were many well-implemented Rust Datalog libraries. Crepe is one such library; it uses the semi-naive method of Datalog evaluation while also providing stratification for negation. It had everything we needed while also being highly performant.
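For readers unfamiliar with the term, the idea behind semi-naive evaluation can be sketched in Rust using the earlier route/path rules. This is a hand-written illustration, not how Crepe is implemented internally; the function name `semi_naive_paths` and the use of integer IDs for cities are assumptions made for the example. The key point is that each round joins only the facts discovered in the previous round (the delta) instead of re-joining the whole relation.

```rust
use std::collections::HashSet;

type Pair = (u32, u32);

/// Semi-naive evaluation of:
///   path(X, Y) :- route(X, Y).
///   path(X, Y) :- path(X, Z), route(Z, Y).
pub fn semi_naive_paths(routes: &[Pair]) -> HashSet<Pair> {
    // Every route is a path (first rule).
    let mut all: HashSet<Pair> = routes.iter().copied().collect();
    // `delta` holds only the facts discovered in the previous round.
    let mut delta: HashSet<Pair> = all.clone();

    while !delta.is_empty() {
        let mut next = HashSet::new();
        // Join only the new path facts against route (second rule).
        for &(x, z) in &delta {
            for &(z2, y) in routes {
                if z == z2 && !all.contains(&(x, y)) {
                    next.insert((x, y));
                }
            }
        }
        all.extend(next.iter().copied());
        delta = next;
    }
    all
}
```

For the chain of routes (1, 2), (2, 3), (3, 4), this derives six path facts in total, including (1, 4), and the loop stops as soon as a round produces no new facts.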

Now, to figure out how to build the Rust and Go interop.

Rust & Go

Calling Rust from Go involves using Rust FFI to expose Rust functions through a C-compatible interface and cgo to call those functions from Go. We would compile the Rust program as a static library and link it to the Go program, with a small C header file declaring the exported functions. With primitives such as numbers, this is straightforward. However, once we start allocating memory on either the Rust or Go side and passing more complex structs, we need a better way of managing memory and accessing struct members.

One such method is called the opaque pointer method, where we pass a pointer to the struct type, but the struct is not a concrete structure on the Go end. To access this result struct, we call functions from Rust that return the actual struct members. At the end of this process, we must remember to free the memory on both sides: the C pointers on the Go side, and the heap-allocated structs and byte buffers on the Rust side. An example of this is a status struct that is returned to Go land and then subsequently freed on the Rust end:

Rust

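#[repr(C)]
pub struct Status {
    num_field: i32,
    error: i32,
}

// Allocates a Status on the heap and hands ownership to the caller as a raw pointer.
#[no_mangle]
pub extern "C" fn some_function(...) -> *mut Status {
    ...
    Box::into_raw(Box::new(Status {
        num_field: 1,
        error: 0,
    }))
}

// Reclaims a Status previously returned by some_function.
#[no_mangle]
pub extern "C" fn drop_status_struct(status: *mut Status) {
    if status.is_null() {
        return;
    }
    // Rebuilding the Box lets Rust drop and deallocate the struct.
    unsafe {
        drop(Box::from_raw(status));
    }
}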

Go

status := C.some_function(...)
defer C.drop_status_struct(status)
// ... do some stuff with status

And the opaque pointer method would look something like this with the Status struct defined above (Note that we don’t need to know what fields the Status struct holds; instead, we call getter methods that will return the primitives to us directly):

Rust

#[no_mangle]
pub extern "C" fn status_num_field(
    status: *mut Status
) -> i32 {
    if status.is_null() {
        return 0
    }

    let st = unsafe {
        &*status
    };
    st.num_field
}

#[no_mangle]
pub extern "C" fn status_error(
    status: *mut Status
) -> i32 {
    if status.is_null() {
        return 0
    }

    let st = unsafe {
        &*status
    };
    st.error
}

Go

status := C.some_function(...)
defer C.drop_status_struct(status)
numField := C.status_num_field(status)
err := C.status_error(status)
// ... do stuff with the struct fields

The actual meat of the Rust end is the Datalog evaluation, which needs to receive data from the Teleport API, which is on the Go end. We can solve this by using protocol buffers since its binary serialization is much faster than JSON. It also provides a schema for both the Rust and Go programs, so deserialization and parsing of the data is much more structured and consistent. By passing a byte buffer of our protobufs from Go to Rust, we can quickly get all the required data to evaluate accesses for Teleport.
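The Rust side of such a byte-buffer handoff might look roughly like the sketch below. The names `evaluate_access` and `decode_placeholder` are hypothetical, and the actual protobuf decoding is deliberately left out because it depends on the generated schema types; the sketch only shows how a pointer and length from Go can be turned into bytes that Rust owns.

```rust
use std::slice;

/// Stand-in for protobuf decoding; here it just reports the payload size.
fn decode_placeholder(bytes: &[u8]) -> i64 {
    bytes.len() as i64
}

/// Hypothetical FFI entry point (add #[no_mangle] to export it, as in the
/// examples above). Returns -1 when given a null pointer.
///
/// The caller must guarantee that `input` points to `input_len` readable
/// bytes for the duration of the call.
pub extern "C" fn evaluate_access(input: *const u8, input_len: usize) -> i64 {
    if input.is_null() {
        return -1;
    }
    // Copy the bytes so Rust never keeps a reference into Go-managed memory
    // after this function returns.
    let bytes: Vec<u8> = unsafe { slice::from_raw_parts(input, input_len) }.to_vec();
    decode_placeholder(&bytes)
}
```

Copying the buffer costs one allocation, but it keeps ownership simple: Go remains responsible for its buffer, and Rust is responsible only for the copy it made.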

One caveat with this implementation: passing strings between Rust and Go is expensive, and it is also unnecessary. Notice that Datalog does not care what the literals actually are, only that each one is uniquely represented. We can therefore hash all the string values and pass primitive integer hashes to Datalog to evaluate. This behaves the same while being more performant. In the end, we can easily map the hashes back to their original string values for output.
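The hashing idea can be sketched as follows. This is an illustration in Rust using the standard library hasher, not the hashing scheme the access tester actually uses; the `HashTable` type and its methods are hypothetical names. The table remembers which string produced each hash so results can be translated back for display.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Maps strings to integer hashes and remembers the reverse mapping.
#[derive(Default)]
pub struct HashTable {
    names: HashMap<u64, String>,
}

impl HashTable {
    /// Hashes `value` and records it so the hash can be resolved later.
    /// Note: collisions are not detected in this sketch.
    pub fn hash(&mut self, value: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        self.names.entry(h).or_insert_with(|| value.to_string());
        h
    }

    /// Resolves a hash back to the original string, if it was recorded.
    pub fn lookup(&self, h: u64) -> Option<&str> {
        self.names.get(&h).map(String::as_str)
    }
}
```

Hashing the same string twice yields the same integer, so facts such as HasRole(bob, admin) can be passed to the evaluator as pairs of integers and turned back into names with `lookup` when printing the results.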

Result

The goal of the upcoming tctl access tool is to condense the ability to execute complex access queries into an easy-to-use CLI tool. It will function similarly to the existing tsh ls command that shows cluster nodes, except it displays information about access users have within the cluster.

The results that show up in the first table will indicate that a particular Teleport user has access to a particular node and the role names granting the access. The second table will show denied access for a specified user. There will be three filters that can be used, --user, --login, and --node. Some example usage and the output are shown below.

List all accesses

Command: tctl access ls

User  Login   Node               Allowing Roles
----- ------- ------------------ --------------
bob   bob     prod.example.com   admin, dev
bob   bob     secret.example.com admin
bob   bob     test.example.com   admin, dev
bob   dev     prod.example.com   dev
bob   dev     test.example.com   dev
bob   root    prod.example.com   admin
bob   root    secret.example.com admin
bob   root    test.example.com   admin
bob   ubuntu  prod.example.com   admin, dev
bob   ubuntu  secret.example.com admin
bob   ubuntu  test.example.com   admin, dev
joe   joe     secret.example.com lister
joe   lister  secret.example.com lister
julia auditor secret.example.com auditor
julia auditor test.example.com   auditor

User  Logins           Node             Denying Role
----- ---------------- ---------------- ------------
bob   admin            *                dev
joe   admin            *                dev
joe   dev, joe, lister prod.example.com lister
joe   dev, joe, lister test.example.com lister
julia julia            *                auditor
julia auditor, julia   prod.example.com auditor
rui   rui              *                intern
rui   rui              prod.example.com intern

List all nodes that bob can SSH into

Command: tctl access ls --user bob

User  Login   Node               Allowing Roles
----- ------- ------------------ --------------
bob   bob     prod.example.com   admin, dev
bob   bob     secret.example.com admin
bob   bob     test.example.com   admin, dev
bob   dev     prod.example.com   dev
bob   dev     test.example.com   dev
bob   root    prod.example.com   admin
bob   root    secret.example.com admin
bob   root    test.example.com   admin
bob   ubuntu  prod.example.com   admin, dev
bob   ubuntu  secret.example.com admin
bob   ubuntu  test.example.com   admin, dev

User  Logins           Node             Denying Role
----- ---------------- ---------------- ------------
bob   admin            *                dev

List all nodes that bob can SSH into as the Linux user dev

Command: tctl access ls --user bob --login dev

User  Login   Node               Allowing Roles
----- ------- ------------------ --------------
bob   dev     prod.example.com   dev
bob   dev     test.example.com   dev

No denied access found.

Determine if bob can SSH into the node with hostname prod.example.com as the Linux user dev

Command: tctl access ls --user bob --login dev --node prod.example.com

User  Login   Node               Allowing Roles
----- ------- ------------------ --------------
bob   dev     prod.example.com   dev

No denied access found.

Conclusion

RBAC systems provide a powerful way to define user access for organizations of all sizes. However, troubleshooting the configuration can be a pain, especially as the access system grows. This blog post introduces an upcoming Teleport feature that will alleviate the difficulties of troubleshooting RBAC configurations while providing a simple-to-use interface.

The use of Datalog provides an elegant solution to this problem that is scalable and robust. Without careful consideration and research of the different options, this project could quickly have gone out of scope. Choosing the right solution within the given constraints made the implementation easier to manage and produced a feature that is easy to use yet powerful.

The access testing feature will be released in an upcoming version of Teleport. Join our community Slack and GitHub discussions pages to stay up-to-date with releases.
