Introduction

SSH provides secure access to Unix-like servers in most enterprises/cloud vendors. It is used to connect to a remote machine and perform activities on that remote machine. To provide SSH access to someone, either a password for the remote host needs to be shared, or the SSH public key of a user needs to be added to the remote host. This becomes a problem when there are too many users and/or remote machines to manage.

In this tutorial, we will explore how to manage SSH access with HashiCorp Vault, starting with the problems of the traditional approach and then showing how SSH certificate authentication addresses them.

Problems with the traditional approach

The traditional approach to managing SSH access has several problems:

  • Creating secure passwords for all the machines and storing/sharing them securely is tedious, and rotating those passwords adds to the burden.
  • Each user's SSH key must be added to all the remote machines. That's too many keys to manage! What if a user leaves the team? Their keys should be removed from all the remote machines. To add a new user to the team, their key needs to be added to all the remote machines they are meant to have access to, which means keeping tabs on access requirements.
  • Managing the keys across multiple systems and cloud environments consistently is complex.

A better solution to the above problems is to combine HashiCorp Vault with SSH certificate authentication. This architecture addresses those problems and also provides the following:

  1. Enables identity-based access, where users should authenticate before being granted the ability to SSH into the remote machine.
  2. Role-based access control (RBAC) for SSH access where users can be assigned different permissions on the host machine.
  3. Shorter-lived SSH credentials.
  4. Scales well with a large number of users as well as with multiple machines distributed across different enterprises.

SSH certificate authentication

If you know how SSL (Secure Sockets Layer) certificates work, SSH certificates work in a similar way. An SSH certificate is just a public key signed by a trusted entity called a certificate authority (CA) using its private key. Like SSL certificates, SSH certificates are valid only for a certain period; once expired, they can no longer be used.
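
To make the idea concrete, the following sketch creates a throwaway CA and a user key pair locally with ssh-keygen and signs the user's public key. It only illustrates what a certificate is; in the rest of this tutorial Vault plays the CA role. The function name, the file names, the demo-user identity, and the sre principal are placeholders chosen for illustration.

```shell
# Sketch: sign a user public key with a local, throwaway CA.
# All file names and the "sre" principal are illustrative placeholders.
demo_sign_key() {
  local dir="$1"
  # CA key pair: the private half signs, the public half is what hosts trust.
  ssh-keygen -q -t ed25519 -N "" -C "demo-ca" -f "$dir/ca" || return 1
  # User key pair whose public key will be signed.
  ssh-keygen -q -t ed25519 -N "" -C "demo-user" -f "$dir/user" || return 1
  # Sign user.pub: -I sets the key identity, -n the principals,
  # -V the validity window. The result is written to user-cert.pub.
  ssh-keygen -q -s "$dir/ca" -I demo-user -n sre -V +30m "$dir/user.pub" || return 1
  # Print the certificate fields (signing CA, principals, validity).
  ssh-keygen -L -f "$dir/user-cert.pub"
}

# Usage: demo_sign_key "$(mktemp -d)"
```

The printed certificate lists the signing CA, the principals, and the validity window, which are the fields a host checks when a client presents the certificate.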

SSH implementations come with many configuration options, several of which are rarely used. One such option, which we use here, is sshd's AuthorizedPrincipalsFile.

HashiCorp Vault

HashiCorp Vault is an identity-based secrets and encryption management system. A secret is anything that you want to tightly control access to, such as API encryption keys, passwords, and certificates. Vault provides encryption services that are gated by authentication and authorization methods.

Vault offers different kinds of secrets engines, such as generic key-value pairs, PKI certificates, and cloud IAM. This architecture uses the SSH secrets engine, which lets Vault function as our trusted SSH CA.

The architecture

Architecture diagram

The numbers in the diagram above represent the following steps:

  1. User creates a personal SSH key pair.
  2. User authenticates to Vault with OIDC credentials (Vault provides other authentication methods as well).
  3. Once authenticated, the user sends their SSH public key to Vault for signing.
  4. Vault signs the SSH key and returns the SSH certificate to the user.
  5. User initiates SSH connection using the SSH certificate.
  6. Remote Host verifies the client SSH certificate is signed by the trusted SSH CA and allows the connection.

There can be multiple Vault roles with different levels of permissions, so an SSH certificate can be signed with different parameters and principals depending on the Vault role. Once a user successfully authenticates to Vault, a Vault token is dynamically generated with an attached policy that dictates which services and secrets the user can access.

Prerequisites

  • A Kubernetes cluster with Helm installed.
  • To enable OIDC authentication with Google on your Vault server, you will need the following:
    • We will use Google as an OAuth2 provider. Follow this documentation to create OAuth2 credentials.
    • A G Suite account with the super-admin role to perform domain-wide delegation of authority in Google Workspace.
    • The service account obtained from the previous step.
  • If you want to enable Vault auto unsealing using GCP, create a key-ring and crypto key in GCP. For simplicity, you can create a new service account to read key_ring and crypto_key from GCP. I have added permissions() for the service account created while enabling domain-wide delegation of authority.
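
As a rough sketch, the key ring and crypto key for auto-unsealing could be created with the gcloud CLI as follows. The resource names and the global location are taken from the values.yaml shown later in this tutorial; the function name and the project ID argument are placeholders, and an installed, authenticated gcloud is assumed.

```shell
# Sketch: create the Cloud KMS key ring and key used for Vault auto-unseal.
# Assumes the gcloud CLI is installed and authenticated.
create_unseal_key() {
  local project="$1"   # your GCP project ID (placeholder)
  gcloud kms keyrings create vault-auto-unseal-key-ring \
    --project "$project" --location global || return 1
  gcloud kms keys create vault-auto-unseal-key \
    --project "$project" --location global \
    --keyring vault-auto-unseal-key-ring \
    --purpose encryption
}

# Usage: create_unseal_key "<GCP-project-id>"
```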

Vault installation

We will use Helm to install Vault in high-availability (HA) mode on our cluster. To use the Helm chart, add the HashiCorp Helm repository. Before installing Vault, create a Kubernetes secret named vault-serviceaccount from the service account key created above. We will mount this secret inside Vault so that it can use the service account for auto-unsealing and OIDC authentication:

helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault --namespace vault -f values.yaml
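
The secret mentioned above has to exist before the helm install command runs. A minimal sketch of creating it with kubectl follows. The function name and the local key file path are placeholders; the secret name and the vault-serviceaccount.json key are chosen to match the credentials path used in the values.yaml below.

```shell
# Sketch: create the namespace and the secret that Vault mounts under
# /vault/userconfig/vault-serviceaccount/. Assumes kubectl points at your cluster.
create_vault_secret() {
  local keyfile="$1"   # local path to the service account JSON key (placeholder)
  # Prints an error if the namespace already exists; the secret is still attempted.
  kubectl create namespace vault
  kubectl create secret generic vault-serviceaccount \
    --namespace vault \
    --from-file=vault-serviceaccount.json="$keyfile"
}

# Usage: create_vault_secret ./serviceaccount.json
```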

The values.yaml file referenced in the command is shown below:

global:
  enabled: true
  tlsDisable: true

injector:
  enabled: false

server:
  extraVolumes:
    - type: secret
      name: vault-serviceaccount

  auditStorage:
    enabled: true
    storageClass: civo-volume

  dataStorage:
    storageClass: civo-volume

  standalone:
    enabled: false

  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        ui = true
        listener "tcp" {
          address = "0.0.0.0:8200"
          tls_disable = 1
          cluster_address = "0.0.0.0:8201"
        }
        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
          }

          autopilot {
            cleanup_dead_servers = "true"
            last_contact_threshold = "200ms"
            last_contact_failure_threshold = "10m"
            max_trailing_logs = 250000
            min_quorum = 2
            server_stabilization_time = "10s"
          }
        }

        seal "gcpckms" {
          credentials = "/vault/userconfig/vault-serviceaccount/vault-serviceaccount.json"
          project     = "<GCP-project-id>"
          region      = "global"
          key_ring    = "vault-auto-unseal-key-ring"
          crypto_key  = "vault-auto-unseal-key"
        }
        service_registration "kubernetes" {}
    replicas: 3
ui:
  enabled: true
  serviceType: LoadBalancer

Once Vault is installed on your cluster, it will create a Kubernetes service called vault-ui of type LoadBalancer with a public IP address. Now exec into one of the pods (vault-0, vault-1, vault-2) and initialize the Vault server:

vault operator init

This initializes Vault. Because auto-unseal is configured, the Vault pods then unseal automatically. The output of this command is important: it contains the root token and the recovery keys. Store this information somewhere secure.
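
For reference, the initialization can also be run without opening an interactive shell in the pod. The sketch below assumes the vault namespace and the vault-0 pod name from the installation above; the function names are illustrative.

```shell
# Sketch: initialize Vault and check seal status from outside the pod.
vault_init() {
  # Prints the recovery keys and the initial root token once; store them safely.
  kubectl exec --namespace vault vault-0 -- vault operator init
}

vault_pod_status() {
  local pod="${1:-vault-0}"
  # After a successful init, the output should report the pod as
  # initialized and unsealed.
  kubectl exec --namespace vault "$pod" -- vault status
}

# Usage: vault_init && vault_pod_status vault-1
```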

Once Vault is initialized and unsealed, we can configure Vault from our local machine (instead of running the vault command inside pods).

Set this environment variable to configure the Vault cluster from your local machine:

export VAULT_ADDR=http://<public-ip-of-vault-UI-service>:8200

Now log in from your local machine:

vault login

The command prompts for a token. Provide the root_token obtained when we initialized Vault (note that the root token has access to everything inside Vault, so store it somewhere safe). Once logged in, you can run the commands below to configure Vault.

  1. Enable OIDC authentication:

    vault auth enable oidc
    
  2. Create a vault policy for your team:

    vault policy write sre-policy -<< EOF
    # List available SSH roles
    path "ssh-client-signer/roles/*" {
     capabilities = ["create", "read", "update", "delete", "list"]
    }
    # Allow access to SSH role
    path "ssh-client-signer/sign/*" {
     capabilities = ["create", "read", "update", "delete", "list"]
    }
    EOF
    
  3. Configure OIDC authentication. You will need the client ID, client secret, and super admin email obtained in the prerequisite steps:

    vault write auth/oidc/config -<<EOF
    {
    "oidc_discovery_url": "https://accounts.google.com",
    "oidc_client_id": "<oidc-client-id>",
    "oidc_client_secret": "<oidc-client-secret>",
    "default_role": "sre",
    "provider_config": {
        "provider": "gsuite",
        "fetch_groups": true,
        "gsuite_service_account": "/path/for/serviceaccount.json",
        "gsuite_admin_impersonate": "<SUPER_ADMIN_EMAIL>",
        "groups_recurse_max_depth": 5
    }
    }
    EOF
    
  4. Create an OIDC role:

    vault write auth/oidc/role/sre -<<EOF
    {
        "user_claim": "sub",
        "oidc_scopes": "https://www.googleapis.com/auth/admin.directory.group.readonly",
        "bound_audiences": "<oidc-client-id>",
        "allowed_redirect_uris": "http://VAULT_EXTERNAL_ADDR/ui/vault/auth/oidc/oidc/callback,http://localhost:8250/oidc/callback",
        "policies": "sre-policy",
        "groups_claim": "groups",
        "ttl": "1h",
        "bound_claims": {
            "groups": ["sre@domain.com"]
        }
    }
    EOF
    

VAULT_EXTERNAL_ADDR is the externally reachable address of the Vault service, for example its domain name.

  5. Create a Vault group and group alias:

    vault write identity/group name="sre@domain.com" type="external" \
            policies="sre-policy" \
            metadata=responsibility="SRE Group"

    # Use the id value printed by the command above
    export GROUP_ID="<id-from-the-output-above>"

    vault auth list -format=json \
            | jq -r '."oidc/".accessor' > accessor.txt

    vault write identity/group-alias name="sre@domain.com" \
            mount_accessor=$(cat accessor.txt) \
            canonical_id="$GROUP_ID"
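
Instead of copying the group ID by hand, it can be captured from the command's JSON output. This is a sketch that assumes jq is installed and that the response carries the ID under data.id; the function name is illustrative.

```shell
# Sketch: create the external group and print only its ID.
# Assumes jq is available and the ID is returned under .data.id.
create_sre_group() {
  vault write -format=json identity/group \
      name="sre@domain.com" type="external" \
      policies="sre-policy" \
      metadata=responsibility="SRE Group" \
    | jq -r '.data.id'
}

# Usage: GROUP_ID=$(create_sre_group)
```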

Now you can test your OIDC setup.

Let's assume there is a user called john@domain.com, and they are a part of the GSuite group sre@domain.com. When the above configuration is followed, a Vault login via CLI should look like the following:

$ vault login -method=oidc
...
Key                  Value
---                  -----
token                <TOKEN>
token_accessor       <TOKEN_ACCESSOR>
token_duration       768h
token_renewable      true
token_policies       ["default" "reader"]
identity_policies    ["sre"]
policies             ["default", "sre"]
token_meta_role      sre

Now let's set up the SSH secrets engine on Vault.

  1. Mount the Vault SSH secrets engine and generate an SSH CA key pair. This is the trusted key that we will distribute to all remote machines. Store the SSH public key returned by the second command, as it will be used later:

    vault secrets enable -path=ssh-client-signer ssh
    vault write ssh-client-signer/config/ca generate_signing_key=true
    
  2. Create a Vault role for signing client SSH keys. The role determines how Vault signs and issues SSH certificates, with settings based on the user's functional role. You can create multiple roles and map Vault policies to each of them, but we will create just one.

    allowed_users is the list of users this CA role will sign for. Signing fails if the requester asks for a user that is not in the allowed_users list of that role.

    ttl sets the certificate expiry when signing an SSH key. In this example, it is set to 30 minutes. Once the certificate expires, a user must authenticate to Vault and request another signed SSH certificate.

    vault write ssh-client-signer/roles/sre -<<EOH
    {
      "allow_user_certificates": true,
      "allowed_users": "sre",
      "allowed_extensions": "",
      "default_extensions": [
        {
          "permit-pty": ""
        }
      ],
      "key_type": "ca",
      "default_user": "sre",
      "ttl": "30m0s"
    }
    EOH
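
If the CA public key was not saved when the signing key was generated, it can be read back from Vault at any time. The sketch below assumes the ssh-client-signer mount path used above; the function name and the output file argument are illustrative.

```shell
# Sketch: write the SSH CA public key to a file for distribution to hosts.
fetch_ssh_ca_pubkey() {
  local out="${1:-trusted-CA.pem}"
  vault read -field=public_key ssh-client-signer/config/ca > "$out"
}

# Usage: fetch_ssh_ca_pubkey ./trusted-CA.pem
```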
    

Remote machine (host) configurations

These configurations should be made on each remote machine (VM) that other users in the team need SSH access to.

  1. Create local users on the server. These are the users that clients will use to SSH into the server:

    sudo useradd -m sre

  2. Update the trusted SSH CA public key: add the public key obtained while mounting the SSH secrets engine to /etc/ssh/trusted-CA.pem. The command below creates the file if it does not exist:

    echo "public-key" | sudo tee -a /etc/ssh/trusted-CA.pem

  3. Configure the AuthorizedPrincipalsFile. This setting further controls which SSH principals are accepted for certificate authentication. For client authentication to succeed, a principal in the signed SSH certificate must appear in the AuthorizedPrincipalsFile of the target user:

    cd /etc/ssh
    
    sudo mkdir -p auth_principals/
    echo "sre" | sudo tee auth_principals/sre
    
  4. Update sshd_config and restart sshd service.

    Add these settings to /etc/ssh/sshd_config:

    AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
    ChallengeResponseAuthentication no
    TrustedUserCAKeys /etc/ssh/trusted-CA.pem
    

    AuthorizedPrincipalsFile sets the path of the file that lists the accepted principal names. The %u token expands to the username of the Linux user being logged in to, so each user gets their own file.

    Restart the sshd service:

    sudo service sshd restart
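
As a sanity check around the restart, it can help to confirm that both certificate-related directives are actually present. The helper below is a hypothetical convenience function, not part of OpenSSH; it only greps the config file and does not replace the syntax check that sshd -t performs.

```shell
# Sketch: report whether the certificate-related directives are set.
check_sshd_cert_config() {
  local cfg="${1:-/etc/ssh/sshd_config}"
  local directive missing=0
  for directive in TrustedUserCAKeys AuthorizedPrincipalsFile; do
    # sshd keywords are case-insensitive; commented-out lines do not match.
    if ! grep -Eiq "^[[:space:]]*${directive}[[:space:]]+[^[:space:]]" "$cfg"; then
      echo "missing: ${directive}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_sshd_cert_config && sudo sshd -t
```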

Client configurations

Now we can try to SSH into the remote machine we configured earlier.

  1. Create an SSH key pair. Vault's SSH CA will sign the public key and return a certificate, which is then used to connect to the target host (remote machine):

    ssh-keygen -b 2048 -t rsa -f ~/.ssh/vault-test
    ssh-add ~/.ssh/vault-test
    
  2. Log in to Vault.

    You can use either the Vault UI or the Vault command line for the remaining steps. I will use the command line:

    vault login -method=oidc role=sre
    
  3. Request SSH key signing:

    vault write -field=signed_key ssh-client-signer/sign/sre \
    public_key=@$HOME/.ssh/vault-test.pub valid_principals=sre > ~/.ssh/sre-signed-key.pub
    

    Take note of the valid_principals requested: sre. If a principal that is not in the allowed_users list of the Vault SSH CA role is requested, signing fails. This ensures that only authorized SSH principals can be signed, preventing users from requesting principals used by other teams. The signed key is stored in ~/.ssh/sre-signed-key.pub.

  4. Log in to the remote machine. The key signed above is valid for only 30 minutes; you can change this through the ttl of the Vault SSH role. To inspect the signed certificate and check its validity period:

    ssh-keygen -Lf ~/.ssh/sre-signed-key.pub
    

    To log in to the remote machine as the sre user using the signed certificate:

    ssh -i ~/.ssh/sre-signed-key.pub -i ~/.ssh/vault-test sre@<ip-of-remote-machine>
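
Because certificates expire after 30 minutes, the sign-and-connect steps are repeated often. A small wrapper like the following sketch can combine them. The function name vault_ssh is hypothetical; the key paths, the sre role, and the mount path are the ones used in this tutorial, and an existing Vault login is assumed.

```shell
# Sketch: request a fresh certificate from Vault, then connect with it.
# Assumes VAULT_ADDR is set and "vault login" has already succeeded.
vault_ssh() {
  local host="$1"
  local key="$HOME/.ssh/vault-test"
  local cert="$HOME/.ssh/sre-signed-key.pub"
  # Ask Vault to sign the public key for the "sre" principal.
  vault write -field=signed_key ssh-client-signer/sign/sre \
      public_key=@"${key}.pub" valid_principals=sre > "$cert" || return 1
  # Present the certificate together with the matching private key.
  ssh -i "$cert" -i "$key" "sre@${host}"
}

# Usage: vault_ssh <ip-of-remote-machine>
```

If signing fails, for example because the Vault token has expired, the function stops before attempting the SSH connection.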