It is interesting to see how quickly efficient engineers can deliver their work. A closer look at how they do it shows that it is not so simple. Good engineers usually rely on rich professional experience and on frameworks with a high level of automation. But there is another side to fast delivery: some products are delivered while ignoring good security practice, for example by disabling strict SSH host checking.
On the Pan-Net cloud, all services are built on top of the OpenStack platform. OpenStack is open source and API based, which enables a high level of automation in product delivery: a series of pipelines provisions virtual machines (VMs), then delivers and sets up the applications that fit our needs.
So what seems to be the problem? In the cloud, virtual resources (compute, storage, network) are delivered via the cloud API using provisioning tools (Heat, Terraform), and that part is clear. But then the operating system needs to be configured, and the application has to be delivered and configured as well. Ansible can help here, but in that case a Linux-based system is managed over SSH.
Ansible usually discovers all provisioned instances dynamically, connects to many of the provisioned virtual machines in parallel, and identifies itself with an SSH private/public key pair. That is not all: by default the SSH client also tries to authenticate the SSH server. For that purpose the SSH server generates its own key pair during service initialization and usually stores it inside the /etc/ssh/ folder. While the connection is being established, the server delivers its public key to the client (visible to the client as a FINGERPRINT), and the client needs to trust that key to have a valid connection.
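To make the fingerprint less abstract, here is a minimal sketch of how an OpenSSH-style SHA256 fingerprint can be derived from a public key line. The function name and the example line are illustrative assumptions; the key blob is a placeholder, not a real host key.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    The line is expected in the usual "<keytype> <base64 blob> [comment]" form.
    """
    fields = public_key_line.split()
    key_blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH shows the digest base64-encoded with the "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Placeholder key material, used only so that the example runs on its own.
EXAMPLE_LINE = (
    "ssh-ed25519 "
    + base64.b64encode(b"placeholder host key blob").decode("ascii")
    + " root@example-vm"
)

if __name__ == "__main__":
    print(sha256_fingerprint(EXAMPLE_LINE))
```

The client compares a value like this with what it has already recorded for the host, which is why a brand-new VM triggers the accept/reject question.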
But wait: these are new Linux machines, and the SSH client has no established trust with the server (recorded inside the known_hosts file), so the client needs to accept or reject the fingerprint of the SSH server. That blocks deployment automation. To avoid this behavior, an engineer typically disables strict SSH host checking in the Ansible configuration and treats all SSH servers as trusted.
Basically, this configuration turns off the server authentication that the SSH protocol supports, in which the server proves its identity by providing its public SSH key to the client.
Let’s assume a malicious actor can pull off some tricks as a prerequisite, such as a standard L2 attack (like ARP poisoning), exploiting a vulnerability in the SDN (Software Defined Network), or DNS spoofing, and redirect traffic to him- or herself. So let’s analyze what can go wrong:
- The malicious actor can execute a MITM attack, stand in the channel between the SSH client and the server, proxy the SSH call, and steal the private keys.
Actually, this is a wrong assumption, because the SSH private key never leaves the client. The client only needs to prove to the server that it is in possession of the private key (authentication), using a combination of techniques that includes Diffie-Hellman.
- Another threat is destination forging. After hijacking the traffic, the malicious actor can pretend to be the “real destination” and simulate the authentication handshake with the client, while in reality accepting any SSH connection and fake-authenticating any SSH user.
In this case the malicious actor cannot steal private keys, but gains access to all content that was meant to be delivered to the VM. This content can include, but is not limited to, user credentials, secrets, configurations, and other sensitive data. To summarize, it lets the malicious actor collect sensitive content that can later be used to exploit the operating system or for other harmful purposes such as blackmail.
So, what can we do about it?
There are two options available to mitigate the possibility that a threat actor exploits this vulnerability:
1. Use cloud-init, and the vendor data part of it, to deliver to the SSH server host private/public keys that are generated outside of the VM. This potentially opens other challenges, such as the proper handling of a freshly generated key pair and, in the end, how this kind of solution will scale.
2. A more convenient approach is to pick up the host public key out of band, for example through the OpenStack API. After some research, we found out that cloud-init, a basic utility that is triggered on the first boot of the VM, prints the public keys to the console, so they show up in the console logs.
Proof of concept
For scenario 2 we have all the elements we need: the public keys and an out-of-band way of collecting them. What is still missing is a way to preserve the public keys. After some time the logs are rotated and the public keys are no longer available, but we may want to customize the VMs further later on and establish trust with them again. For that reason we can introduce a protected GitLab branch where the public keys are preserved.
known_hosts file format
The man sshd command gives us some insight into how the client constructs the known_hosts file. It simply says:
Each line in these files contains the following fields: markers (optional), hostnames, keytype, base64-encoded key, comment. The fields are separated by spaces.
Later on we can see that entries can contain the hostname in two different formats: as hashed values that hide the names, or in plain text form.
Alternately, hostnames may be stored in a hashed form which hides host names and addresses should the file’s contents be disclosed.
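The two hostname forms can be sketched in a few lines of Python. This is a minimal illustration, assuming the OpenSSH hashed format `|1|<salt>|<hash>`, where the hash is an HMAC-SHA1 of the hostname keyed with the salt and both parts are base64-encoded. The function names and the example host and key strings are hypothetical.

```python
import base64
import hashlib
import hmac
import os


def plain_entry(hostname, keytype, key_b64):
    """Build a known_hosts line with the hostname in plain text."""
    return "{} {} {}".format(hostname, keytype, key_b64)


def hashed_hostname(hostname, salt=None):
    """Return the hashed hostname field: |1|base64(salt)|base64(HMAC-SHA1)."""
    if salt is None:
        salt = os.urandom(20)  # random salt, same length as a SHA1 digest
    mac = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(mac).decode("ascii"),
    )


def hashed_matches(hashed, hostname):
    """Check whether a hashed hostname field was derived from `hostname`."""
    _, version, salt_b64, mac_b64 = hashed.split("|")
    if version != "1":
        return False
    salt = base64.b64decode(salt_b64)
    expected = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(expected, base64.b64decode(mac_b64))


if __name__ == "__main__":
    # Placeholder address and key, for illustration only.
    print(plain_entry("10.0.0.5", "ssh-ed25519", "AAAAplaceholder"))
    print(hashed_hostname("10.0.0.5") + " ssh-ed25519 AAAAplaceholder")
```

The plain form is easier to review in a repository, while the hashed form avoids disclosing host names and addresses if the file leaks.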
Now we have enough material to implement a PoC and prove that the concept really works. The PoC is implemented in Python, so for that reason we needed to include the gitlab Python modules.
The prerequisite is to understand how to use the OpenStack API, the OpenStackSDK, and the GitLab API in Python code.
The core of the PoC is to collect the console logs from the selected virtual server; in the SDK this is documented as the get_server_console_output(server, length=None) method. The next step is then to extract the keys.
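The extraction step could look roughly like the sketch below. It is not the original PoC code. It assumes that cloud-init wraps the host public keys in `-----BEGIN SSH HOST KEY KEYS-----` / `-----END SSH HOST KEY KEYS-----` marker lines, and that the console text has already been fetched (with the SDK, presumably from the `output` field of the value returned by `get_server_console_output`). The function names, the sample console text, and the keys in it are placeholders.

```python
import re

BEGIN = "-----BEGIN SSH HOST KEY KEYS-----"
END = "-----END SSH HOST KEY KEYS-----"
# Key type followed by the base64 key; any console prefix before it is ignored.
KEY_RE = re.compile(r"(ssh-rsa|ssh-ed25519|ecdsa-sha2-\S+)\s+([A-Za-z0-9+/=]+)")


def extract_host_keys(console_output):
    """Return (keytype, base64 key) pairs found between the marker lines.

    An empty list is returned when there is no console text at all,
    for example because the logs have already been rotated.
    """
    if not console_output:
        return []
    keys = []
    inside = False
    for line in console_output.splitlines():
        if BEGIN in line:
            inside = True
        elif END in line:
            inside = False
        elif inside:
            match = KEY_RE.search(line)
            if match:
                keys.append((match.group(1), match.group(2)))
    return keys


def known_hosts_lines(host, keys):
    """Turn extracted keys into plain-text known_hosts lines for `host`."""
    return ["{} {} {}".format(host, keytype, key) for keytype, key in keys]


# Made-up console excerpt with placeholder keys, for illustration only.
SAMPLE_CONSOLE = """\
cloud-init[1234]: -----BEGIN SSH HOST KEY KEYS-----
cloud-init[1234]: ecdsa-sha2-nistp256 AAAAE2placeholder root@vm
cloud-init[1234]: ssh-ed25519 AAAAC3placeholder root@vm
cloud-init[1234]: -----END SSH HOST KEY KEYS-----
login:
"""

if __name__ == "__main__":
    for entry in known_hosts_lines("10.0.0.5", extract_host_keys(SAMPLE_CONSOLE)):
        print(entry)
```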
There can be a case when the method does not return anything; this happens when the logs have already been rotated and are no longer available to our script.
python fingerutility.py -i xxxxx gitlab --b know_hosts -k xxxx -u https://gitlab.example.com -p xxx
File known_hosts created inside of branch know_hosts
In the example above, the utility collects the keys for the instance (-i) and stores them in GitLab as the file known_hosts, using the API token (-p).
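The storing step might be sketched as follows. This is an assumption about how such a utility could work, not the actual fingerutility code. It relies on the python-gitlab convention that a project object exposes `files.create(...)` with `file_path`, `branch`, `content`, and `commit_message` keys. The `store_known_hosts` function and the in-memory stand-in classes are hypothetical and exist only so that the example runs without a GitLab server.

```python
def store_known_hosts(project, branch, content, file_path="known_hosts"):
    """Create the known_hosts file on `branch` of a GitLab project.

    `project` is expected to behave like a python-gitlab project object,
    that is, to expose `files.create(...)`.
    """
    return project.files.create(
        {
            "file_path": file_path,
            "branch": branch,
            "content": content,
            "commit_message": "Add SSH host keys collected from the console log",
        }
    )


class FakeFiles:
    """In-memory stand-in for the project's file manager (illustration only)."""

    def __init__(self):
        self.created = []

    def create(self, data):
        self.created.append(data)
        return data


class FakeProject:
    """In-memory stand-in for a GitLab project (illustration only)."""

    def __init__(self):
        self.files = FakeFiles()


if __name__ == "__main__":
    project = FakeProject()
    store_known_hosts(project, "know_hosts", "10.0.0.5 ssh-ed25519 AAAAC3placeholder\n")
    print(project.files.created[0]["file_path"])
```

With the real library, the project object would presumably come from something like `gitlab.Gitlab(url, private_token=token).projects.get(project_id)`; the project ID and token are deployment-specific and are not shown here.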
Now let’s test it:
That is OK: we still don’t have the new VM as trusted, but let us pick up the known_hosts file that the fingerutility utility created and stored on GitLab.
From the output we can see that our PoC works, and it is up to us to adapt it and integrate it into our automation process.