Scanning images running in k8s? It's easy if you do it smart.

Mar 16, 2022 · 1396 words · 7 minute read Vulnerability Security AppSec SecOps SafeSCARF k8s

Our first article was about eliminating the vulnerability mess through proper testing; you can find it here. It contains a lot of useful information about testing and about two different approaches, traditional and “shift left”, illustrated with real-world examples. We would like to continue in that spirit, and because we know that you are not always able to scan images before they are used or running, we will focus on that case now. Over time we also realized that “something” was missing from our previous article. We identified what that “something” is and decided to add it with this article, which can be considered a loose continuation of the previous one.

Let’s start by setting the scene: we will use Kubernetes (k8s) as the environment where our service(s) will be deployed and run. Every service or application that runs on top of k8s consists of multiple containers, which are grouped into objects called pods. To eliminate vulnerabilities within the services we provide, we would like to discover vulnerabilities not only at the “supply chain” level but also in our containers, or more precisely in the images from which the containers of our currently running services are instantiated. This is possible with the appropriate scans, and that is exactly where SafeSCARF can help: it can periodically scan k8s clusters and put the results into a single place for further notification and analysis.

From the beginning we should automate things as much as possible, so that gathering information about services (there are plenty of good reasons for this), scanning, and importing the results into SafeSCARF require minimal human intervention. For that reason we should use one of the CI tools, for example GitLab (GL) CI, Jenkins, or GitHub Actions. But first, let’s think about which steps we need in order to achieve the highest level of automation for image scanning, especially for running images. It is not as difficult as it looks. In fact, it can be as simple as the three steps below (a minimal sketch follows the list):

  1. List all images in the scope (e.g. a k8s namespace)
  2. Scan them with one of the image scanners (e.g. Anchore, Trivy, etc.)
  3. Send or import the results to SafeSCARF
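
A minimal sketch of those three steps in plain shell, assuming kubectl, Trivy, and the SafeSCARF ci-connector are available in the environment; the digest handling and parallelization that make this robust are covered in the rest of the article:

# 1. list all images used in one k8s namespace
kubectl get pods -n $NAMESPACE -o jsonpath='{range .items[*]}{.spec.containers[*].image}{" "}{end}' | tr " " "\n" | sort -u |
while read -r IMAGE; do
  [ -z "$IMAGE" ] && continue          # skip the empty line left by the trailing separator
  # 2. scan the image with Trivy and write a JSON report
  trivy image --format json --output report.json "$IMAGE"
  # 3. import the report into SafeSCARF (engagement ID as used later in this article)
  ci-connector upload-scan --scanner 'Trivy Scan' -e $SAFESCARF_ENG_ID -f report.json
done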

As the first precondition, we need an account with enough rights to authenticate against k8s and execute kubectl get pods.

NOTE: The best practice is to create a scoped read-only account that will be consumed by CI scripts.
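
As an illustration, such an account can be created as a ServiceAccount bound to a namespaced, read-only Role; the names below are ours and purely illustrative:

# read-only ServiceAccount for the CI scripts (illustrative names)
kubectl create serviceaccount image-scanner -n $NAMESPACE
kubectl create role pod-reader -n $NAMESPACE --verb=get,list --resource=pods
kubectl create rolebinding image-scanner-pods -n $NAMESPACE \
  --role=pod-reader --serviceaccount=$NAMESPACE:image-scanner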

Kubernetes has a nice feature that allows us to extract fields from the pod definitions (their JSON representation) using jsonpath, which is exactly what we need to define the scope, here an example namespace:

kubectl get pods -n $NAMESPACE -o jsonpath='{range .items[*]}{.spec.containers[*].image}{" "}{end}' | tr " " "\n" | sort -u

The code snippet above produces the list of images in use in the namespace defined by the $NAMESPACE variable, formats the output, and removes duplicates.

quay.io/prometheus-operator/prometheus-config-reloader:v0.50.0
quay.io/prometheus/prometheus
quay.io/prometheuscommunity/postgres-exporter:v0.10.0

This approach works well when versions are pinned (fixed Docker tags). We can then scan the particular tag with the selected scanner and push the scanning results to SafeSCARF. We still have to create a new engagement for each version of the service we are running (app version) in order to track vulnerabilities per service. As you can already guess, there is a BUT pointing to a bigger problem: the latest tag. In our example, the Prometheus image resolves to the latest tag (even though it is not explicitly defined). So if we send that reference for scanning, we will always get results for the newest available image version, not the one currently running inside our k8s cluster (whenever a newer version of the image exists in the repository).
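
To make the problem concrete, both of the following invocations (using Trivy purely as an example here) would pull and scan whatever :latest currently points to in the registry, not necessarily the image running in the cluster:

trivy image quay.io/prometheus/prometheus          # no tag given, so :latest is assumed
trivy image quay.io/prometheus/prometheus:latest   # the same thing, stated explicitly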

There is a way to pin the version regardless of whether the latest tag is in use or not. Docker images can be identified not only by tag but also by the SHA-256 digest of the image, so the imageID field can help us here.

  kubectl get pods -n $NAMESPACE  -o jsonpath="{..imageID}" | tr " " "\n" | sort -u

This nice, short, and useful command gives us all images that are currently in use, together with their SHA-256 digests.

quay.io/prometheus-operator/prometheus-config-reloader@sha256:8b42df399f6d8085d9c7377e4f5508c10791d19cd1df00ab41de856741c65d28
quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c
quay.io/prometheuscommunity/postgres-exporter@sha256:edc929fd808c7ee5d0a50d4199ff067cb23f8c7b9551053acdee78ac46ba0305

So now we can pull a particular image by referencing its digest, like this:

docker pull quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c
quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c: Pulling from prometheus/prometheus
009932687766: Pull complete
ff9264fbb6f4: Pull complete
1ad6d9643fdd: Pull complete
e6f7fea04459: Pull complete
63fc05a36a59: Pull complete
604ad6adddc4: Pull complete
54b552d4bfbc: Pull complete
a23b2328402c: Pull complete
9ec38b0764ed: Pull complete
cf8aa72409ac: Pull complete
7104c0fa6750: Pull complete
b782297611dd: Pull complete
Digest: sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c
Status: Downloaded newer image for quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c
quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c

We can verify the hypothesis that ‘latest’ and the digest-pinned reference are the same image:

docker image inspect  quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c | jq ".[].Id"
"sha256:514e6a882f6e74806a5856468489eeff8d7106095557578da96935e4d0ba4d9d"
 docker image inspect  quay.io/prometheus/prometheus:latest | jq ".[].Id"
"sha256:514e6a882f6e74806a5856468489eeff8d7106095557578da96935e4d0ba4d9d"

Based on the result above, both image IDs match, which is nice and helpful. When GitLab CI is used and we want to be on the safe side, it is a good idea to put the image names in quotes (to be sure we do not break anything). So now we have:

# enumerate the running images with their digests
kubectl get pods -n $NAMESPACE  -o jsonpath="{..imageID}" | tr " " "\n" | sort -u > list.txt
# wrap every non-empty line in quotes and join them with commas
for i in $(cat list.txt |  sed '/^$/d;s/^/"/;s/$/"/g' ); do
  LINES=$LINES$i,
done
# drop the trailing comma
LINES=${LINES%?}
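
At this point $LINES holds a single comma-separated string of quoted, digest-pinned references, which is exactly the shape the matrix in the YAML template below expects. Illustrated with the images from above (digests shortened here for readability):

echo "$LINES"
"quay.io/prometheus-operator/prometheus-config-reloader@sha256:8b42...","quay.io/prometheus/prometheus@sha256:fb7e...","quay.io/prometheuscommunity/postgres-exporter@sha256:edc9..."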

From now on we can simply iterate over the image list, scan each image, for example with Trivy (one of the scanners currently supported by SafeSCARF), and send the results to the SVMP (SafeSCARF Vulnerability Management Portal). At this point you may think that sequential scanning is time-consuming, and we can only agree. A better approach is therefore to use parallel jobs, that is, to dispatch the work across multiple processes using a GitLab matrix (Jenkins and GitHub Actions support matrices as well). As it goes in life, there are not only advantages but also downsides, and for the matrix we have to consider this one: matrix items cannot be loaded dynamically and have to be predefined. Fortunately, there is a workaround, which lies in using a trigger that spins up child jobs in the next stage of the pipeline. We are aware that the previous sentence does not make it clear what this means and how it works, so let’s explain what a child job actually is. It is simply another job definition loaded from a YAML file, where we can use a trick to inject the matrix items dynamically, based on the enumerated images.

The template YAML file is shown below.

scan-docker-image-trivy-child:
  stage: test
  image:
    name: registry.safescarf.in.pan-net.eu/trivy
    entrypoint: [""]
  parallel:
    matrix:
      # {{images}} gets replaced with the comma-separated, digest-pinned image list
      - IMAGE: [{{images}}]
  variables:
    SAFESCARF_HOST: safescarf.example.com
    SAFESCARF_ENG_ID: "ID"
  script: |-
          # the test name is the image name without the @sha256:... suffix
          export TEST_NAME=$(echo $IMAGE | sed s/@.*$//g)
          ci-connector --version
          echo "Doing $IMAGE"
          time trivy image --format json \
            --output "$CI_PROJECT_DIR/trivy-container-scanning-report.json" "$IMAGE"
          ci-connector upload-scan --scanner 'Trivy Scan' --test-name $TEST_NAME -e $SAFESCARF_ENG_ID -f "$CI_PROJECT_DIR/trivy-container-scanning-report.json"

The next step replaces {{images}} with the list of images we want to scan and then stores the result as a GL artifact.

cat .gitlab-ci/sec-test-child | sed "s|{{images}}|$LINES|g" > cat-sec-child.yaml
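
For completeness, the enumeration and the substitution typically live in a single job, the one the trigger below refers to as collect-images, which publishes cat-sec-child.yaml as its artifact. The sketch below is only our assumption of how such a job could be wired together; the stage name and the bitnami/kubectl image are illustrative, while the script lines come from the snippets above:

collect-images:
  stage: test                        # any stage that runs before the sec-test trigger stage
  image:
    name: bitnami/kubectl:latest     # illustrative; any image providing kubectl will do
    entrypoint: [""]
  script:
    - kubectl get pods -n $NAMESPACE -o jsonpath="{..imageID}" | tr " " "\n" | sort -u > list.txt
    - for i in $(cat list.txt | sed '/^$/d;s/^/"/;s/$/"/g' ); do LINES=$LINES$i,; done
    - LINES=${LINES%?}
    - cat .gitlab-ci/sec-test-child | sed "s|{{images}}|$LINES|g" > cat-sec-child.yaml
  artifacts:
    paths:
      - cat-sec-child.yaml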

At this point everything is defined and prepared. In the next stage we can simply run a job containing a trigger that will execute the jobs based on the defined matrix.

scan-docker-image-trivy:
  stage: sec-test
  trigger:
    include:
      - artifact: cat-sec-child.yaml
        job: collect-images

One of the last steps is defined inside the YAML template: the TEST_NAME values, which are the image names with the part starting at @ stripped off. We do this because we do not want to produce a new test name for each update; instead, we want to automatically update the existing test for the image scan. Based on that, the script executes a scan for each image in the matrix and creates a JSON result report.
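
As a quick illustration, this is what the sed expression from the template turns a digest-pinned reference into:

echo "quay.io/prometheus/prometheus@sha256:fb7e3a27469dd7e15f3b181cf510954db04b855722be94448dc23494005c433c" | sed s/@.*$//g
quay.io/prometheus/prometheus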

Produced pipeline

The final step defined inside the template is pushing the results into SafeSCARF, and frankly speaking, thanks to the SafeSCARF product design this is very easy, hassle-free, and straightforward.

ci-connector upload-scan --scanner 'Trivy Scan' --test-name $TEST_NAME -e $SAFESCARF_ENG_ID -f "$CI_PROJECT_DIR/trivy-container-scanning-report.json"

Since CI-connector version 0.6.2, a test can be referenced by its name. So why shouldn’t we take advantage of that and name the test after the image?

Produced Trivy results

Now that we have all the results inside the SVMP, we can analyze them. Based on the analysis we will be able to apply a proper “Vulnerability Management” strategy and save resources (people, time, and money).

The complete example of the pipeline definition can be downloaded from here.

Dubravko Sever
Production Factory Security Senior Specialist
Martin Zatko
Senior Security Specialist & Product Manager