How to properly manage ssh keys for server access

This article was on the Hacker News front page. You can find the related discussion here.

Every developer needs access to some servers, for example to check the application logs.

Usually, this is done using public-key authentication, where each developer generates their own public-private key pair. Each developer’s public key is added to the authorized_keys file on every server they should have access to.

Painful manual changes

So far so good. However, what happens when one developer leaves the company?

In that case, the public keys of that developer should be removed from all servers. Depending on how many servers they had access to, this can be quite a bit of work. Even worse, if it’s done manually, there is a real risk that the key is overlooked on some server, so the access remains open.

Alternative solutions

There are some commercial and open source solutions out there that aim to help with this problem. The basic idea is that you add and maintain the keys and the access lists on that service, and when you remove a key, the service removes it from all your servers.

Sounds good, but it has one very big disadvantage: it’s a potential single point of failure. If someone gains access to that service, they can gain access to all your servers. And if you lose access to that service, in the worst case you also lose access to all your servers.

The solution: signing keys

When I was facing this problem, I asked on Hacker News how others are tackling it.

There were some great suggestions and insights from the community. The best solution seems to be the signing of keys, which I will present here in detail.

The rough idea

The rough idea is this: You still generate a public-private key pair for each developer. However, you don’t upload the public keys to your servers.

Instead, you sign the public keys with a so-called certificate authority (CA) key that you generated beforehand. Signing produces a third file, the certificate, which you give back to the developer; they put it inside their .ssh/ folder next to the private and public key.

On each server, you configure the public key of your CA. The server can then check whether a user presents a properly signed certificate and only grants access to developers who do.

The advantages

When you sign a certificate, you can determine how long it is valid. So if you sign it with a validity of 3 months and the developer leaves the company, then after 3 months they won’t have access to any of the servers for sure.

Now you might say: well, but I don’t want to sign everyone’s keys every 3 months. That is a fair point.

One possibility is to automate the process, for example by building a service where a user can automatically get a signed certificate when they authenticate with their company e-mail and password, but that is beyond the scope of this article.

The simple alternative is to issue certificates that are valid longer and, if someone leaves the company, revoke the certificate, i.e. invalidate it. You can put a list of revoked certificates on your servers and they will no longer accept those certificates. This could for example be done by keeping this list on AWS S3 or some other storage and running a cron job on each server that regularly pulls it.
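The installation step of such a cron job could look roughly like the sketch below. It replaces the list in a single atomic move and refuses missing or empty downloads, so the server never reads a half-written file. The function name install_krl, the bucket name and the paths are assumptions for illustration only.

```shell
# Install a freshly downloaded revocation list atomically.
# Usage: install_krl DOWNLOADED_FILE TARGET_FILE
# (install_krl is a hypothetical helper, not part of OpenSSH.)
install_krl() {
    src=$1
    dest=$2
    # Refuse a missing or empty download and keep the current list.
    if [ ! -s "$src" ]; then
        echo "install_krl: $src is missing or empty, keeping $dest" >&2
        return 1
    fi
    # Write next to the target so the final mv stays on one filesystem.
    tmp=$(mktemp "$dest.XXXXXX") || return 1
    if cp "$src" "$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Example use from a cron script (the bucket name is made up):
#   aws s3 cp s3://my-company-ssh/revoked-keys /tmp/revoked-keys.new \
#     && install_krl /tmp/revoked-keys.new /etc/ssh/revoked-keys
```

Keeping the old list when a download fails is a deliberate choice here: a stale list is usually less harmful than a missing one.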

Show me how to do this

Glad that you asked!

It’s actually super simple once you know the drill.

First, you generate a public-private key pair for the certificate authority. Keep the private key very secure:

umask 77                        # you want it to be private
mkdir ~/my-ca && cd ~/my-ca
ssh-keygen -t rsa -b 4096 -C CA -f ca  # be sure to use a passphrase and store it securely

Then on your server you specify that users with a certificate signed by your CA are allowed to access the server:

Upload the public key of your CA to your server, e.g. at /etc/ssh/ca.pub
Tell the server to allow access to users signed by it by adding a line to /etc/ssh/sshd_config:

# Trust user certificates signed by the CA key in ca.pub
TrustedUserCAKeys /etc/ssh/ca.pub

To make the changes effective, you should reload the ssh service: sudo service ssh reload.
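If you configure many servers, adding the line by hand gets tedious. Here is a minimal sketch of a helper that appends a line only when it is not already present, so it can be run repeatedly. The function name ensure_line is an assumption, not an existing tool.

```shell
# Append LINE to FILE unless that exact line is already in it.
# Usage: ensure_line FILE LINE
# (ensure_line is a hypothetical helper name.)
ensure_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use (as root):
#   ensure_line /etc/ssh/sshd_config 'TrustedUserCAKeys /etc/ssh/ca.pub'
#   sshd -t && service ssh reload   # sshd -t checks the config before reloading
```

Note that this only detects an exact match. A commented-out line or the same directive with a different path is not noticed, so it is still worth looking at the file once.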

Once a developer has generated their public-private key pair (e.g. with ssh-keygen -t ecdsa -b 521), they simply send you their public key (note that no private key ever needs to be sent around!). Then you sign their public key to generate their certificate:

# Inside your ~/my-ca folder, sign their public key (here: id_ecdsa.pub)
ssh-keygen -s ca -I USER_ID -V +12w -z 1 id_ecdsa.pub

Quick explanation for the different parts:

-s ca - the CA key you want to sign with
-I USER_ID - the key ID of the certificate, e.g. the username; the server logs it whenever the certificate is used
-V +12w - the validity period; here the certificate is valid for 12 weeks from now
-z 1 - the serial number of this certificate; it should be unique and can be used to make this particular certificate invalid later
id_ecdsa.pub - the public key of the developer which you want to sign

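It will generate the certificate id_ecdsa-cert.pub, which you can send to the developer; they put it into their ~/.ssh folder next to their public-private key pair.

Note that sshd does not accept certificates without a list of principals for authentication via TrustedUserCAKeys. For actual logins you also need the -n flag described in the next section; without an AuthorizedPrincipalsFile, a principal has to match the name of the server user.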
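To get a feeling for the signing step without touching your real CA, you can run through it once with throwaway keys. The sketch below wraps the commands in a function; the name demo_sign_cert, the ID demo.user and the empty passphrases are assumptions made only for this demo. At the end, ssh-keygen -L prints the contents of the certificate.

```shell
# Throwaway demo: pass an EMPTY scratch directory, never your real ~/my-ca.
# Usage: demo_sign_cert SCRATCH_DIR
# (demo_sign_cert and the ID demo.user are made up for illustration.)
demo_sign_cert() (
    cd "$1" || exit 1
    # Demo CA and developer key with empty passphrases (-N ''); this is
    # acceptable only because these keys are discarded afterwards.
    ssh-keygen -q -t rsa -b 4096 -N '' -C CA -f ca &&
    ssh-keygen -q -t ecdsa -b 521 -N '' -f id_ecdsa &&
    # Same signing command as above; writes id_ecdsa-cert.pub.
    ssh-keygen -q -s ca -I demo.user -V +12w -z 1 id_ecdsa.pub &&
    # Show key ID, serial, validity period and principals of the certificate.
    ssh-keygen -L -f id_ecdsa-cert.pub
)

# Example use:
#   demo_sign_cert "$(mktemp -d)"
```

Looking at the output of ssh-keygen -L is also a quick way to check an existing certificate before sending it to a developer.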

It gets even better

Sounds cool, right? But you can do even better!

You probably have developers with different levels of experience, on different teams and in different roles, and not everyone needs access to the same servers.

So let’s add roles into the signing process!

That way, on the server you specify which roles are allowed to access the server and during the signing process you specify the roles of the developer you are signing.

Then, that developer can access all servers matching their roles.

When you onboard a new developer, you only need to generate that one certificate and boom, they have access to all relevant servers without adding anything on those servers.

Here is what this looks like schematically:

This is how you configure roles on a server:

First, create the folder to configure access: sudo mkdir /etc/ssh/auth_principals

Inside that folder, you can create files named after the server user that someone could log in as. For example, to grant root access to some roles, add the file /etc/ssh/auth_principals/root.

Inside /etc/ssh/auth_principals/root you simply list all roles which should be able to log in as root, with one role per line:

admin
senior-developer

Finally, configure the server to use roles by again adding a line to /etc/ssh/sshd_config:

AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u

To make the changes effective, you should reload the ssh service: sudo service ssh reload.

This is how you sign a key with roles (they are added to the certificate):

ssh-keygen -s ca -I USER_ID -n ROLE1,ROLE2 -V +12w -z 2 id_ecdsa.pub

It’s the same as before, but with the -n ROLE1,ROLE2 flag. Important: there must be no spaces around the commas that separate the roles!

Now that developer can log in to any server where ROLE1 or ROLE2 is listed in the auth_principals file of the user name they try to log in as.
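The matching rule can be illustrated with a few lines of shell. This is only a simplified model of the check, not how sshd implements it: it ignores comments and per-line options that a real principals file may contain. The function name principal_allowed is an assumption.

```shell
# Simplified model of the role check, for illustration only.
# Usage: principal_allowed ROLE1,ROLE2 PRINCIPALS_FILE
# Returns 0 if at least one role of the certificate is listed in the file.
principal_allowed() {
    cert_principals=$1
    principals_file=$2
    old_ifs=$IFS
    IFS=,                       # split the -n style list at the commas
    for p in $cert_principals; do
        # -x: the whole line must match, -F: no pattern characters
        if grep -qxF -- "$p" "$principals_file"; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example use with a local copy of the file for root:
#   printf 'admin\nsenior-developer\n' > root
#   principal_allowed junior-developer,admin root && echo "may log in as root"
```

One matching role is enough, which is why a developer with several roles can reach every server that lists any of them.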

Revoking keys

Finally, if you want to invalidate a certificate, you can do that by its ID (-I flag) or by its serial number (-z flag). It’s recommended to keep a list of generated certificates in an Excel spreadsheet or a database, depending on the number of your peeps.

ssh-keygen -k -f revoked-keys -u -s ca list-to-revoke

This command is for when you already have a revoked-keys list and want to update it (-u flag). For the initial generation, run it without the -u flag.

The list-to-revoke file needs to consist of IDs (-I flag during signing) or serial numbers (-z flag during signing) like this:

serial: 1
id: test.user

This would revoke the certificate with serial 1 and all certificates with id test.user.
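The two variants of the command (with and without -u) and a check of the result can be combined in a small wrapper. This is a sketch under assumptions: the function name revoke_and_check is made up, and it passes the CA public key (ca.pub) to -s. The check uses ssh-keygen -Q, which exits with status 0 only if the tested certificate is not revoked.

```shell
# Create or update a revocation list, then verify that a certificate is
# really revoked by it.
# Usage: revoke_and_check CA_PUBLIC_KEY KRL_FILE LIST_TO_REVOKE CERTIFICATE
# (revoke_and_check is a hypothetical helper name.)
revoke_and_check() {
    ca_pub=$1 krl=$2 list=$3 cert=$4
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
    if [ -f "$krl" ]; then
        # Existing list: add the new entries (-u).
        ssh-keygen -k -u -f "$krl" -s "$ca_pub" "$list" || return 2
    else
        # First run: create the list.
        ssh-keygen -k -f "$krl" -s "$ca_pub" "$list" || return 2
    fi
    # -Q also exits non-zero on errors, hence the readability check above.
    if ssh-keygen -Q -f "$krl" "$cert" >/dev/null; then
        echo "$cert is NOT revoked" >&2
        return 1
    fi
    echo "$cert is revoked"
}

# Example use inside ~/my-ca:
#   revoke_and_check ca.pub revoked-keys list-to-revoke id_ecdsa-cert.pub
```

Running the check before distributing the list catches typos in list-to-revoke, such as a wrong serial number, while they are still cheap to fix.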

To make the server respect revoked keys, you need to copy the generated / updated revoked keys file to /etc/ssh/revoked-keys and configure it in /etc/ssh/sshd_config:

RevokedKeys /etc/ssh/revoked-keys

Warning: make sure that the revoked-keys file exists and is readable, otherwise you might lose access to your server.

Summary: good ssh key management

In my opinion, this solution is as good as it gets. You can manage SSH access to your servers based on roles. You only need to configure each server once (which roles are allowed to access it). For each new developer, you only need to generate a signed certificate and they immediately have access to all relevant machines matching their role / experience. And when they leave the company, you can revoke their access just as easily.

And even if a mishap occurs and a developer leaves without having their access revoked, their certificate will expire after some time, so they also lose access automatically.

For small teams, you can do these steps manually as they are very fast to do; then as you grow, you can automate the certificate signing with a login service based on company authentication details.

Happy ssh-ing!