Skip to content

Add support for X.509 Pod Certificates in GCS Fuse CSI driver#1317

Draft
yangspirit wants to merge 2 commits intoGoogleCloudPlatform:mainfrom
yangspirit:pod_certificate
Draft

Add support for X.509 Pod Certificates in GCS Fuse CSI driver#1317
yangspirit wants to merge 2 commits intoGoogleCloudPlatform:mainfrom
yangspirit:pod_certificate

Conversation

@yangspirit
Copy link
Copy Markdown
Member

What type of PR is this? /kind feature

What this PR does / why we need it: This PR adds support for X.509 Pod Certificates for authentication in the GCS Fuse CSI driver.

Native support for Pod Certificates in OSS Kubernetes (via CSI TokenRequest) is a long-term effort requiring KEPs and substantial development time (6-12 months). This PR implements a workaround to unblock an important customer by leveraging the mutating webhook and refactoring the driver's token exchange logic to use the official STS library.

Which issue(s) this PR fixes:

Fixes b/495760030

Special notes for your reviewer: This PR contains changes in two main areas:

Webhook (pkg/webhook): Added support for annotations gke-gcsfuse/use-pod-certificate and gke-gcsfuse/workload-identity-credential-configmap to inject the necessary environment variables and volume mounts. It also automatically injects volume attributes needed by the driver.
CSI Driver (pkg/cloud_provider/auth): Refactored fetchTokenViaCertificate to use the official Google STS library and resolved issues with 404 (base URL) and 400 (missing certificate chain in payload) errors.
All unit tests for the modified pkg/webhook package passed successfully.

Does this PR introduce a user-facing change?:

release-note
Added support for X.509 Pod Certificates in GCS Fuse CSI driver via new pod annotations gke-gcsfuse/use-pod-certificate and gke-gcsfuse/workload-identity-credential-configmap.

@google-oss-prow
Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@google-oss-prow
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yangspirit

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for using pod certificates to authenticate with GCS via the Security Token Service (STS). Key changes include the implementation of a certificate-based token source, updates to the CSI driver to handle certificate secrets, and modifications to the mutating webhook to inject the required volumes and environment variables based on pod annotations. The review feedback highlights several improvement opportunities, such as handling errors during annotation parsing, implementing nil checks for Kubernetes Secret data to prevent panics, removing unused function parameters, and lowering the verbosity of diagnostic logs to reduce noise.

// Inject GCP workload identity credential config configmap and token/certificate.
usePodCertificate := false
if usePodCertStr, ok := pod.Annotations[GcsFuseUsePodCertificateAnnotation]; ok {
usePodCertificate, _ = ParseBool(usePodCertStr)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error from ParseBool is ignored here. It's better to handle it and return an admission error if the annotation value is invalid, ensuring consistency with how other boolean annotations are processed in this file. Additionally, ensure the error message uses a constant format string to satisfy CI requirements.

Suggested change
usePodCertificate, _ = ParseBool(usePodCertStr)
usePodCertificate, err := ParseBool(usePodCertStr)
if err != nil {
return admission.Errored(http.StatusBadRequest, fmt.Errorf("the acceptable values for %q are 'True', 'true', 'false' or 'False'", GcsFuseUsePodCertificateAnnotation))
}
References
  1. When using fmt.Errorf, ensure the format string is constant to avoid non-constant format string in call to fmt.Errorf errors in CI.


if usePodCertificate {
signerName := "kubernetes.io/kube-apiserver-client" // Default signer
filename, credentialConfig, err = appendPodCertificateVolumes(si.K8SClient, pod, configMapName, signerName)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The signerName variable is passed to appendPodCertificateVolumes but is not used within that function. It should be removed from the call site and the function signature to keep the code clean.

Suggested change
filename, credentialConfig, err = appendPodCertificateVolumes(si.K8SClient, pod, configMapName, signerName)
filename, credentialConfig, err = appendPodCertificateVolumes(si.K8SClient, pod, configMapName)

},
}
if usePodCertificate {
sidecarCredentialConfig.GaccEnv = &corev1.EnvVar{Name: "GOOGLE_API_CERTIFICATE_CONFIG", Value: fmt.Sprintf("%s/%s", SidecarContainerWICredentialConfigMapVolumeMountPath, "certificate_config.json")}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The filename certificate_config.json is hardcoded here, while GacEnv uses the dynamic filename determined from the ConfigMap. If the ConfigMap contains a single entry with a different name, or if certificate_config.json is missing, this environment variable will point to a non-existent file.

return filename, credConfig, nil
}

func appendPodCertificateVolumes(client kubernetes.Interface, pod *corev1.Pod, configMapName string, signerName string) (string, *CredentialConfig, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The signerName parameter is unused in the function body. Consider removing it if it's not required for the implementation.

Suggested change
func appendPodCertificateVolumes(client kubernetes.Interface, pod *corev1.Pod, configMapName string, signerName string) (string, *CredentialConfig, error) {
func appendPodCertificateVolumes(client kubernetes.Interface, pod *corev1.Pod, configMapName string) (string, *CredentialConfig, error) {

Comment thread pkg/csi_driver/node.go
// prepareStorageService prepares the GCS Storage Service using the Kubernetes Service Account from VolumeContext.
func (s *nodeServer) prepareStorageService(ctx context.Context, vc map[string]string) (storage.Service, error) {
ts := s.driver.config.TokenManager.GetTokenSourceFromK8sServiceAccount(vc[VolumeContextKeyPodNamespace], vc[VolumeContextKeyServiceAccountName], vc[VolumeContextKeyServiceAccountToken], "" /*audience*/, false)
klog.Infof("prepareStorageService called, VolumeContext: %+v", vc)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Logging the entire VolumeContext at Infof level can be quite noisy and might inadvertently log sensitive configuration data. It's recommended to use klog.V(4) for this type of diagnostic information.

Suggested change
klog.Infof("prepareStorageService called, VolumeContext: %+v", vc)
klog.V(4).Infof("prepareStorageService called, VolumeContext: %+v", vc)

Comment thread pkg/csi_driver/node.go
Comment on lines +503 to +504
certData := secret.Data["tls.crt"]
keyData := secret.Data["tls.key"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's safer to verify that the Secret's Data map is not nil before accessing its keys to prevent a panic, and then verify that the keys tls.crt and tls.key exist. If they are missing, tls.X509KeyPair will return a generic error later, making it harder to diagnose the configuration issue.

Suggested change
certData := secret.Data["tls.crt"]
keyData := secret.Data["tls.key"]
if secret.Data == nil {
return nil, fmt.Errorf("secret %s Data is nil", secretName)
}
certData, certOk := secret.Data["tls.crt"]
keyData, keyOk := secret.Data["tls.key"]
if !certOk || !keyOk {
return nil, fmt.Errorf("secret %s is missing tls.crt or tls.key", secretName)
}
References
  1. When using the Kubernetes Go client to read a resource like a ConfigMap or Secret, the returned Data map can be nil. Always perform a nil check before accessing its keys to prevent a panic.
  2. When using fmt.Errorf, ensure the format string is constant to avoid non-constant format string in call to fmt.Errorf errors in CI.

}

func (ts *GCPTokenSource) fetchTokenViaCertificate(ctx context.Context) (*oauth2.Token, error) {
klog.Infof("fetchTokenViaCertificate called with audience: %s", ts.audience)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This log message (and others at lines 119 and 171) is emitted during routine token operations. Using klog.V(4) is preferred for informational messages that aren't critical for standard operation logs.

Suggested change
klog.Infof("fetchTokenViaCertificate called with audience: %s", ts.audience)
klog.V(4).Infof("fetchTokenViaCertificate called with audience: %s", ts.audience)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant