Why Collecting the Right Metadata is Crucial for Scaling a Security Program

Author: Gurneet Kaur, Adobe Operational Security Team
Date Published: 23 February 2022

Editor’s note: The following is a sponsored blog post from Adobe:

At Adobe, security is a critical priority, and we believe in defense-in-depth, which begins with monitoring: from collecting event logs and configuration data made available by public cloud providers to collecting logs from EDR (endpoint detection and response) systems and vulnerability scanning pipelines. These logs are centrally collected and analyzed by the Adobe security organization using SIEM tools to proactively identify potential vulnerabilities or misconfigurations and generate action items (in the form of conveniently trackable tickets) for product teams.

But to do this effectively, it is important to identify who owns a particular set of resources. In an organization with thousands of services and developers, this is not a simple task. Unassigned or misassigned tickets can delay the resolution of security issues, increasing exposure to malicious attacks.

So how do we bridge this gap between visibility (using monitoring tools to detect potential risks at the infrastructure layer) and accountability (assigning and remediating these issues in a timely manner)? By collecting the requisite ownership information in multiple places.

Tracking owners at multiple levels to accurately route tickets
At the highest level, teams provision cloud accounts from a centralized cloud account management portal, where they provide details on how they intend to use the requested accounts. Security-relevant pieces of metadata collected include environment (e.g., dev/stage/prod), point(s) of contact, a Jira project to log security tickets, and the name of the team or service requesting the account. This last item is chosen from a service registry, which is essentially a catalog containing all Adobe services and products. Combined with data from our in-house and commercial tooling, this metadata gives us a bird's-eye view of all resources and activity within Adobe-owned public cloud accounts, and it helps ensure that we have an owner for every single VM, S3 bucket, EKS cluster, etc., that supports Adobe applications on the public cloud.
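
For illustration, a record captured at provisioning time might look something like the following sketch. The field names here are hypothetical, not Adobe's actual schema:

```python
from dataclasses import dataclass

# Illustrative sketch only -- field names are hypothetical,
# not Adobe's actual account metadata schema.
@dataclass
class CloudAccountMetadata:
    account_id: str               # public cloud account identifier
    environment: str              # e.g., "dev", "stage", or "prod"
    points_of_contact: list[str]  # owner email addresses
    jira_project: str             # project key where security tickets are logged
    service_name: str             # chosen from the central service registry

# Example record captured when a team requests an account
record = CloudAccountMetadata(
    account_id="123456789012",
    environment="prod",
    points_of_contact=["owner@example.com"],
    jira_project="SVC",
    service_name="example-service",
)
```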

In addition to global account-level metadata, we encourage teams to tag their low-level resources (including VMs and buckets) with any other relevant information, such as labels they would like on their security tickets, for better tracking and prioritization. Adobe has created a resource tagging standard to ensure tagging is carried out uniformly across diverse teams. While best practices encourage teams to provision separate accounts for their microservices, we do come across cases where services need to share underlying infrastructure and, hence, run in a single cloud account. Here again, resource-level tags help the Adobe security team identify and route potential issues to the right microservice owners.
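
As a rough sketch of what applying such a tagging standard could look like in practice, the snippet below tags an EC2 instance using boto3. The tag keys and values are illustrative placeholders, not Adobe's actual standard:

```python
import boto3

# Hypothetical tag keys following a tagging standard; the actual
# standard's key names are not public, so these are illustrative.
STANDARD_TAGS = [
    {"Key": "ServiceName", "Value": "example-service"},
    {"Key": "Environment", "Value": "prod"},
    {"Key": "JiraProject", "Value": "SVC"},
]

ec2 = boto3.client("ec2")
# Apply the standard tags to a VM (EC2 instance), at creation or after the fact.
ec2.create_tags(Resources=["i-0123456789abcdef0"], Tags=STANDARD_TAGS)
```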

Addressing challenges in a shared multi-tenant environment
While this accountability model worked well for us in the world of VMs, things got a little tricky with applications migrating to containers. We currently use an in-house Kubernetes-based platform we call “Ethos” to provide a containerized environment with a managed CI/CD pipeline, which allows product teams to focus on their applications and accelerates their go-to-market (GTM) timelines.

To help drive adoption, we kept the onboarding process simple: Teams only need to provide their Git repository (containing application code) and a billing cost center for provisioning namespaces on this shared platform, which comprises a large number of Kubernetes (k8s) clusters running on AWS and Azure accounts owned and managed by the Ethos team. In parallel to migrating teams to this shared multi-tenant platform, we began to develop tooling that could provide visibility into the inherently opaque Kubernetes infrastructure. One of these tools, Faros, provides snapshots of all running k8s objects and their complete configuration. In addition, we set up an image scanning pipeline to scan container images for CVEs, achieving good security coverage for Ethos.
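
Faros itself is Adobe-internal, but a minimal approximation of the kind of snapshot it produces can be sketched with the official Kubernetes Python client. The specific fields collected here are just an example:

```python
from kubernetes import client, config

# A rough approximation of what a snapshot tool like Faros might collect;
# Faros is Adobe-internal, so this only illustrates the idea.
config.load_kube_config()  # or config.load_incluster_config() inside a cluster
v1 = client.CoreV1Api()

snapshot = []
for pod in v1.list_pod_for_all_namespaces().items:
    snapshot.append({
        "namespace": pod.metadata.namespace,
        "name": pod.metadata.name,
        "images": [c.image for c in pod.spec.containers],
        "annotations": pod.metadata.annotations or {},
    })
```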

Or so we thought. When we took a step back to evaluate the end-to-end pipeline, we realized that the attribution metadata existed only at the cloud account level, which meant we would be routing security issues to the Ethos team (the platform provider) and not to the product or service team responsible for remediating them.

Introducing the Argus Project
We had to get creative to resolve this problem. The result is the “Argus” project. The goal of Argus was to attribute each of the tens of thousands of containers running on Ethos to a product team and a Jira project where we could log security tickets. We decided to focus on the information that we did have: Git repositories, some of which were private. In a discussion with the Git team, we found that it was possible to leverage the GitHub APIs to retrieve Git organization administrators for both public and private repositories. Armed with this information, we had the first piece of data in place: points of contact.
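
One plausible way to pull organization administrators is the GitHub REST API's list-organization-members endpoint with the admin role filter. In this sketch, the organization name and token variable are placeholders:

```python
import os
import requests

# Sketch: retrieve organization administrators via the GitHub REST API
# (GET /orgs/{org}/members?role=admin). "example-org" and the token
# environment variable are placeholders.
ORG = "example-org"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

admins = []
url = f"https://api.github.com/orgs/{ORG}/members?role=admin&per_page=100"
while url:
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    admins.extend(member["login"] for member in resp.json())
    url = resp.links.get("next", {}).get("url")  # follow pagination, if any

print(admins)  # candidate points of contact for the org's repositories
```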

Another metadata opportunity was the workflow by which teams VPC-peered Ethos cloud accounts with their own accounts in order to give their containers running on the shared platform access to databases and other privately owned resources. The peered account identified the product team that owned a given set of namespaces and, via the account-level metadata collected at provisioning time, led us to their Jira projects.
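
A sketch of how peering metadata might be mined for ownership follows, using boto3's describe_vpc_peering_connections. The assumption that the platform account is the requester and the product team's account is the accepter is ours, not confirmed by the post:

```python
import boto3

# Sketch: infer namespace ownership from VPC peering. We assume the shared
# platform account initiated (requested) the peering and the product team's
# own account accepted it; the direction could differ in practice.
ec2 = boto3.client("ec2")
peerings = ec2.describe_vpc_peering_connections()["VpcPeeringConnections"]

for pcx in peerings:
    platform_vpc = pcx["RequesterVpcInfo"]["VpcId"]
    owner_account = pcx["AccepterVpcInfo"]["OwnerId"]
    # Look up owner_account in the metadata collected at provisioning time
    # (see the earlier sketch) to recover the team and its Jira project.
    print(platform_vpc, "->", owner_account)
```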

Another data source came from applying string similarity to “App name,” a free-form text field that teams filled out while onboarding to Ethos: we matched these values against the standard product and service names in the catalog and kept only high-confidence matches.
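
Here is a minimal sketch of that matching step, using Python's standard-library difflib with an illustrative catalog and confidence threshold:

```python
from difflib import SequenceMatcher

# Illustrative catalog entries, standing in for the real service registry.
CATALOG = ["Example Service", "Photo Editor", "Document Cloud Sync"]

def best_match(app_name: str, threshold: float = 0.85):
    """Return the catalog entry most similar to a free-form app name,
    or None if no candidate clears the confidence threshold."""
    scored = [
        (SequenceMatcher(None, app_name.lower(), name.lower()).ratio(), name)
        for name in CATALOG
    ]
    score, name = max(scored)
    return name if score >= threshold else None

print(best_match("example-service"))  # -> "Example Service" (high confidence)
print(best_match("totally unrelated"))  # -> None (discarded as low quality)
```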

Stitching together all these data sources gave us attribution coverage of 30 percent when we initially started. We built a UI and periodically emailed identified owners to review and fill in the remaining 70 percent of the metadata. A year later, with the help of program managers who tenaciously drove adoption, we are very close to achieving the 100 percent attribution target.

Collaboration and accountability are keys to effective security
Finally, the Adobe security team collaborated with the Ethos and service registry teams to help ensure we were collecting the right metadata and adding validation checks as part of the platform onboarding process, preventing this process from becoming a continually moving target. And for cases where teams need to run multiple microservices within a single k8s namespace, annotations applied to individual pods can override namespace-level defaults and provide more accurate attribution.
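
As a sketch of how such an override could be resolved when attributing a finding, the snippet below prefers a pod-level annotation over the namespace default. The annotation key is hypothetical:

```python
from kubernetes import client, config

# Hypothetical annotation key; the actual key names are not public.
OWNER_KEY = "security.example.com/jira-project"

config.load_kube_config()
v1 = client.CoreV1Api()

def jira_project_for(pod) -> str | None:
    """Resolve the Jira project for a pod: pod-level override wins,
    otherwise fall back to the namespace-level default."""
    pod_ann = pod.metadata.annotations or {}
    if OWNER_KEY in pod_ann:
        return pod_ann[OWNER_KEY]
    ns = v1.read_namespace(pod.metadata.namespace)
    return (ns.metadata.annotations or {}).get(OWNER_KEY)
```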

In summary, while a lot of attention is paid to enhancing visibility into systems, it is equally important to help ensure the right engineers and teams are aware and accountable for effective security. The lack of good attribution metadata results in a broken pipeline in which security tools surface findings that never reach the product or service team responsible for fixing them. Collecting attribution metadata in a standardized manner (e.g., fewer text fields and more dropdown menus) helps bridge that gap, thereby improving overall security posture.

About the author: Gurneet is a member of Adobe's Operational Security team, which develops and manages tooling to help improve Adobe's security posture across all of our product operations. She is based in Noida, India. Prior to Adobe, Gurneet was an intern at the Indian Space Research Organisation (ISRO).