Service accounts on Google Cloud are used when a workload needs to access resources or perform actions without end-user involvement. There are multiple ways to authenticate with service accounts, including attaching a service account to a Google Compute Engine instance, impersonating a service account, or using a service account key file, an option that should be carefully considered.
A common objective is to achieve keyless service account architectures on Google Cloud, but this can be difficult across an entire organization. There are a number of reasons why teams may opt to generate service account keys, ranging from developer validation to third-party software integration requirements.
In this post, we will look at ways you can reduce risk when the use of service account keys can’t be avoided. We’ll focus on understanding how a service account is used within Google Cloud, which can reduce the risk of unintended application failures when rotating service account keys. Let’s get started!
Guidance from the CIS Benchmarks
The Center for Internet Security (CIS) Benchmarks provide a set of recommended security hardening guidelines. For Google Cloud Platform, CIS has most recently published the CIS Google Cloud Platform Foundation Benchmark version 1.2.0, which offers a series of controls, descriptions, rationales, impacts, audit steps, and remediations to improve your overall security posture across foundational GCP services.
Let’s take a look at a direct example from Section 1.7 of the CIS Google Cloud Platform Foundation Benchmark version 1.2.0. The benchmark reads:
“Ensure user-managed/external keys for service accounts are rotated every 90 days or less (Automated)”
This section is accompanied by a rationale, an impact statement, CLI commands, and more, explaining what the control requires, why it matters, and how you can meet it. CIS indicates that rotating service account keys “reduces the window of opportunity” for access associated with a potentially compromised account. The remediation is to rotate your service account keys:
- Audit and identify keys older than 90 days
- Delete keys in scope
- Create a new key (if needed)
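The first step, auditing key age, can be scripted. The sketch below is a minimal, hypothetical helper: it assumes you have saved the JSON printed by gcloud iam service-accounts keys list --iam-account=SA_EMAIL --format=json, and that each entry carries the name, keyType, and validAfterTime fields of the IAM ServiceAccountKey resource, with validAfterTime treated as the key's creation time. It only reports stale user-managed keys; deleting them stays a separate, deliberate step.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2021-01-15T10:00:00Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stale_keys(keys, now, max_age_days=90):
    """Return the names of user-managed keys older than max_age_days.

    `keys` is a list of dicts shaped like the (assumed) gcloud JSON output;
    `now` must be a timezone-aware datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for key in keys:
        # System-managed keys are rotated by Google and are out of scope.
        if key.get("keyType") != "USER_MANAGED":
            continue
        # Assumption: validAfterTime marks when the key was created.
        if parse_timestamp(key["validAfterTime"]) < cutoff:
            stale.append(key["name"])
    return stale


# Hypothetical sample data in the assumed output shape.
sample = json.loads("""[
  {"name": "projects/p/serviceAccounts/sa@p.iam.gserviceaccount.com/keys/old",
   "keyType": "USER_MANAGED", "validAfterTime": "2021-01-01T00:00:00Z"},
  {"name": "projects/p/serviceAccounts/sa@p.iam.gserviceaccount.com/keys/new",
   "keyType": "USER_MANAGED", "validAfterTime": "2021-06-01T00:00:00Z"},
  {"name": "projects/p/serviceAccounts/sa@p.iam.gserviceaccount.com/keys/sys",
   "keyType": "SYSTEM_MANAGED", "validAfterTime": "2020-01-01T00:00:00Z"}
]""")
now = datetime(2021, 6, 15, tzinfo=timezone.utc)
print(find_stale_keys(sample, now))
```

Passing a smaller max_age_days lets you tighten the threshold beyond the CIS baseline without changing the rest of the logic.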
On paper, these three steps seem straightforward. However, there are nuanced use cases that you should be aware of.
When you delete a service account key, the associated private key can no longer be used to authenticate, now or in the future. This can have unintended consequences, such as loss of access for applications, pipelines, or third-party integrations that depended on the underlying key file. So how can we guard against this?
Investigating the access rights and usage of a Service Account
One method is to investigate the access rights and usage of the GCP service account and its keys. Let’s bring in three GCP services: Policy Analyzer, Policy Intelligence, and Cloud Logging. Together, these tools can help us gauge the impact of deleting the key we intend to remove.
For our investigation, we should consider the following questions:
1. What can this service account do? (Policy Analyzer)
IAM roles in GCP define what a service account can do. Role grants are inherited down the resource hierarchy, from the organization node to folders to projects. Roles can also be granted on many individual resources within a GCP project, including GCS buckets, KMS key rings, service accounts, and more.
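Once a policy is in hand, picking out one service account's grants is simple. The sketch below assumes each policy has already been fetched (for example with gcloud projects get-iam-policy PROJECT_ID --format=json) into a dict with a bindings list of role/members entries, which is the shape of an IAM policy in JSON. The resource names, roles, and principals are hypothetical.

```python
def roles_for_member(policies, member):
    """Map each resource to the roles granted to `member` in its IAM policy.

    `policies` maps a resource name (organization, folder, project, or an
    individual resource) to its IAM policy dict. `member` uses IAM member
    syntax, e.g. 'serviceAccount:sa@my-project.iam.gserviceaccount.com'.
    Group membership and IAM Conditions are NOT expanded here.
    """
    grants = {}
    for resource, policy in policies.items():
        roles = sorted(
            binding["role"]
            for binding in policy.get("bindings", [])
            if member in binding.get("members", [])
        )
        if roles:
            grants[resource] = roles
    return grants


SA = "serviceAccount:sa@my-project.iam.gserviceaccount.com"
policies = {  # hypothetical, pre-fetched policies
    "organizations/123": {"bindings": [
        {"role": "roles/viewer", "members": [SA]}]},
    "folders/456": {"bindings": [
        {"role": "roles/editor", "members": ["user:alice@example.com"]}]},
    "projects/my-project": {"bindings": [
        {"role": "roles/storage.admin", "members": [SA]},
        {"role": "roles/logging.viewer", "members": [SA, "group:ops@example.com"]}]},
}
print(roles_for_member(policies, SA))
```

Because grants are inherited, the roles/viewer grant on organizations/123 in this example applies to every folder and project beneath it, even though it appears only once. The hard part this sketch skips is fetching every relevant policy in the first place.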
Writing a script to determine all IAM bindings that a service account has can be quite tedious. The pseudocode could look like:
for EACH folder in the organization (including nested folders):
    for EACH project in the folder:
        for EACH resource in the project:
            get and review the resource's IAM policy
        get and review the project IAM policy
    get and review the folder IAM policy
for EACH project directly under the organization:
    repeat the project and resource steps above
get and review the organization IAM policy
If your organization has more than just a few resources, this quickly becomes far too tiresome to process meaningfully. So what can we do? That’s where Policy Analyzer comes in to save the day! Policy Analyzer provides access visibility for audit-related tasks and supports queries across the entire organization. The following screenshot shows the key components of a Policy Analyzer query: the query scope, the principal (here, the service account), and a set of advanced options. The query returns every role that the service account has been granted, on every resource, across the entire organization.
For ease and reuse, here is a templated link: https://console.cloud.google.com/iam-admin/analyzer/query;identity=serviceAccount:[YOUR_SA]@[YOUR_PROJECT_ID].iam.gserviceaccount.com;expand=groups;scopeResource=[YOUR_ORG_ID];scopeResourceType=2/report?organizationId=[YOUR_ORG_ID]&supportedpurview=project
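If you need this link for many service accounts, filling in the template can be scripted. The hypothetical helper below only substitutes values into the templated URL above; it assumes the service account email follows the default NAME@PROJECT_ID.iam.gserviceaccount.com pattern and does not validate any of the IDs.

```python
def policy_analyzer_url(sa_name, project_id, org_id):
    """Fill in the templated Policy Analyzer link for one service account."""
    # Assumes the default service account email format.
    sa_email = f"{sa_name}@{project_id}.iam.gserviceaccount.com"
    return (
        "https://console.cloud.google.com/iam-admin/analyzer/query;"
        f"identity=serviceAccount:{sa_email};expand=groups;"
        f"scopeResource={org_id};scopeResourceType=2/report"
        f"?organizationId={org_id}&supportedpurview=project"
    )


# Hypothetical account names, project ID, and organization ID.
for sa_name in ["ci-deployer", "reporting-job"]:
    print(policy_analyzer_url(sa_name, "my-project", "123456789"))
```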
This query goes a long way toward determining the range of access this service account has. The service account could have access in a single GCP project, at the organization level, or across arbitrary resources. Policy Analyzer shows us everywhere the service account has access, and therefore everywhere it could be in use.
2. When was this service account last used? (Policy Intelligence)
Understanding when our service account was last used can also be valuable. For example, if our service account has not been used in the past year, it is likely safe to assume that it can be deleted. The Policy Intelligence service allows us to query the activity of a service account: the gcloud policy-intelligence query-activity command supports two activity types for a given GCP project, service account last authentication and service account key last authentication. Here’s an example:
gcloud policy-intelligence query-activity --activity-type=serviceAccountLastAuthentication --project [YOUR_PROJECT_ID] --query-filter='activities.full_resource_name="//iam.googleapis.com/projects/[YOUR_PROJECT_ID]/serviceAccounts/[SERVICE_ACCOUNT_EMAIL]"'
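When you need to run this query for several accounts or keys, it can help to build the command programmatically. The sketch below mirrors the command above, reusing one project ID for both the --project flag and the resource name as that command does. The key-level resource name is assumed to follow the pattern //iam.googleapis.com/projects/PROJECT_ID/serviceAccounts/SA_EMAIL/keys/KEY_ID; the project, account, and key ID are hypothetical.

```python
import shlex

ACTIVITY_TYPES = {
    "account": "serviceAccountLastAuthentication",
    "key": "serviceAccountKeyLastAuthentication",
}


def query_activity_args(project_id, sa_email, key_id=None):
    """Build the gcloud argv for a last-authentication activity query.

    With key_id, the query targets a single key instead of the account
    (the key resource-name format is an assumption).
    """
    resource = (f"//iam.googleapis.com/projects/{project_id}"
                f"/serviceAccounts/{sa_email}")
    activity = ACTIVITY_TYPES["account"]
    if key_id is not None:
        resource += f"/keys/{key_id}"
        activity = ACTIVITY_TYPES["key"]
    return [
        "gcloud", "policy-intelligence", "query-activity",
        f"--activity-type={activity}",
        f"--project={project_id}",
        f'--query-filter=activities.full_resource_name="{resource}"',
    ]


args = query_activity_args(
    "my-project", "sa@my-project.iam.gserviceaccount.com", key_id="abc123")
# Print a shell-quoted version for review; to execute it instead,
# pass `args` to subprocess.run(args, check=True).
print(shlex.join(args))
```

Returning an argument list rather than a single string avoids quoting mistakes around the embedded double quotes in the filter.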
If the service account authenticated during the observation period, that is a strong indicator that it is still in use. If a service account has recently been used (I’ll leave the definition of recent up to you and your organization), you may want to exercise more caution before deleting the service account key.
Additionally, querying the last authentication for each individual key provides even more granularity if a service account happens to have more than one key (which is not a recommended practice). On the other hand, if we see no recent activity for the queried service account, we can be more confident that deleting its key is unlikely to have unintended consequences.
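The “how recent is recent” decision can be made explicit once you have last-authentication times per key. The sketch below deliberately does not parse the query-activity output; it assumes you have already turned it into a mapping from a hypothetical key ID to a timezone-aware datetime, or None when no activity was reported. The 30-day window is a placeholder policy, not a recommendation, and None only means nothing was reported during the observation period.

```python
from datetime import datetime, timedelta, timezone


def keys_without_recent_use(last_auth_by_key, now, recent_days=30):
    """Return key IDs with no authentication inside the recency window.

    `last_auth_by_key` maps a key ID to its last observed authentication
    time (timezone-aware), or None if no activity was reported.
    `recent_days` is a placeholder; choose your organization's definition.
    """
    cutoff = now - timedelta(days=recent_days)
    return sorted(
        key_id
        for key_id, last_auth in last_auth_by_key.items()
        if last_auth is None or last_auth < cutoff
    )


now = datetime(2021, 6, 15, tzinfo=timezone.utc)
observed = {  # hypothetical key IDs and timestamps
    "key-a": datetime(2021, 6, 14, tzinfo=timezone.utc),  # used yesterday
    "key-b": datetime(2021, 1, 2, tzinfo=timezone.utc),   # last used in January
    "key-c": None,                                         # no activity reported
}
print(keys_without_recent_use(observed, now))  # ['key-b', 'key-c']
```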
Policy Intelligence is project-scoped, so if our service account has roles across multiple projects, we will need to run this query within each project. Also, Policy Intelligence returns a result from a given observation period. The observation period may not include the most recent activity (such as an activity from a few minutes ago), so we can also refer to Cloud Logging…
3. What has this service account done recently? (Cloud Logging)
In order to determine everything our service account has done recently, we will need to leverage Cloud Logging. We will query for a list of activities over a specific timeframe. We can use the following query:
For a given service account key, we can go one step further and run the following logging query.
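Queries of this kind can also be assembled programmatically. The sketch below is one hedged way to do it, not necessarily the exact queries intended here: it assumes the Cloud Audit Logs fields protoPayload.authenticationInfo.principalEmail and protoPayload.authenticationInfo.serviceAccountKeyName, and a key resource name of the form //iam.googleapis.com/projects/PROJECT_ID/serviceAccounts/SA_EMAIL/keys/KEY_ID. It only matches audit-log entries, so activity recorded in logs that are not enabled will not appear. The account, project, and key ID are hypothetical.

```python
from datetime import datetime, timezone


def audit_log_filter(sa_email, since, key_name=None):
    """Build a Cloud Logging query for activity by a service account.

    `since` must be timezone-aware. If `key_name` (the key's full resource
    name) is given, only entries authenticated with that key match.
    """
    since_utc = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [
        f'protoPayload.authenticationInfo.principalEmail="{sa_email}"',
        f'timestamp>="{since_utc}"',
    ]
    if key_name is not None:
        clauses.append(
            f'protoPayload.authenticationInfo.serviceAccountKeyName="{key_name}"'
        )
    # Join clauses with an explicit AND so the intent is unambiguous.
    return "\nAND ".join(clauses)


sa = "sa@my-project.iam.gserviceaccount.com"
key = f"//iam.googleapis.com/projects/my-project/serviceAccounts/{sa}/keys/abc123"
print(audit_log_filter(sa, datetime(2021, 6, 1, tzinfo=timezone.utc), key_name=key))
# The result can be pasted into the Logs Explorer or passed to
# gcloud logging read FILTER --project=my-project --format=json
```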
When conducting this Cloud Logging-based investigation, keep a few caveats in mind. First, not all log types in GCP (for example, Data Access audit logs or VPC Flow Logs) are enabled by default. To see activity at that level of granularity, the corresponding logs must be enabled on the relevant resources, and enabling them now will not capture activity that has already happened. Second, querying across multiple projects can be tedious, so we may wish to make a risk-based decision about the key rotation or, alternatively, export our logs to a centralized location through a logging sink and analyze them holistically in BigQuery or a similar tool.
You should now have a good sense of some of the risks around service account key use and strategies to mitigate them. First, try to avoid creating service account keys whenever possible. GCP-managed options such as Workload Identity, service accounts attached to virtual machines, or service account impersonation can limit the number of use cases that require service account keys.
Second, if you do have user-managed service account keys, make sure you rotate them. Ninety days is a reasonable baseline, but ultimately it is better to automate key rotation on a cycle shorter than 90 days.
Finally, if you have a project with service account keys whose intended usage you do not know (as can happen in a single project shared by many users), use the three GCP tools discussed here, Policy Analyzer, Policy Intelligence, and Cloud Logging queries, to reduce the chance of unintended application failures when you delete keys.
While these recommendations are not a foolproof way to identify every use of your service accounts, they should improve your confidence in safely rotating your service account keys and reduce risk in the process.
By: Garrett Wong (Strategic Cloud Engineer)
Source: Google Cloud Blog