Arjan Singh Bal [Fri, 26 Dec 2025 06:24:09 +0000 (11:54 +0530)]
alts: Fix buffer alignment with 16KB records (#8791)
gRPC Go receives ALTS records of max 16KB under high load (see
https://github.com/grpc/grpc-go/pull/8512#issuecomment-3193280949 for
details). When the ALTS conn has a partial encrypted frame in its
buffer, it attempts to copy the frame to the beginning of the buffer to
read the remainder.
When using a buffer of exactly 32KiB, almost the entire second frame of
16KiB is stored, but not the full frame. As a result, a large copy of
~16KiB is performed.
## Solution
This PR increases the read buffer length by 512 bytes to ensure two
entire 16KiB frames can be stored. This ensures that usually, only ~512
bytes need to be moved to the front.
## Benchmark
In a GCS directpath benchmark downloading files in a loop, the time
spent on memory copies in the ALTS code is eliminated, saving ~3.7% of
CPU time.
### Before
<img width="1532" height="480" alt="image"
src="https://github.com/user-attachments/assets/c2051b7f-e828-44d6-a82a-0d444cace2df"
/>
### After
<img width="1550" height="690" alt="image"
src="https://github.com/user-attachments/assets/5715c5b8-b163-474c-81a6-5bcc6808d2ff"
/>
eshitachandwani [Fri, 19 Dec 2025 21:13:21 +0000 (02:43 +0530)]
xds: Add cluster endpoint watchers to dependency manager (#8744)
This is part of the A74 implementation, which adds CDS/EDS/DNS watchers
to the dependency manager. It also adds a temporary flag that is disabled
by default so that the watchers are not used in the current RPC paths,
but is enabled in the dependency manager tests.
Madhav Bissa [Thu, 18 Dec 2025 06:38:39 +0000 (12:08 +0530)]
stats/otel: a79 scaffolding to register an async gauge metric and api to record it- part 2 (#8755)
A79: This change introduces the API surface required to support
asynchronous metrics (e.g., OpenTelemetry Observable Gauges) in gRPC-Go.
This change updates the internal MetricsRecorder interface to support
registering asynchronous metric reporters. This is the second of three
PRs. It establishes the contracts and wiring without adding the
OpenTelemetry implementation logic. This functionality is required to
support OpenTelemetry Observable Gauges, which allow components like RLS
and xDS to report stateful metrics (e.g., current active requests) via
callbacks.
RELEASE NOTES:
* stats/otel: MetricsRecorder interface updated to include a new method
RegisterAsyncReporter that registers an AsyncMetricReporter.
* stats/otel: AsyncMetricReporter is added which is an interface for
types that record metrics asynchronously.
Pranjali-2501 [Tue, 16 Dec 2025 02:47:44 +0000 (08:17 +0530)]
xds: refactor xdsresource.Endpoint to add resolver.Endpoint (gRFC A81) (#8750)
This PR updates the internal `xdsresource.Endpoint` struct to contain a
`resolver.Endpoint` instead of a `[]string` to store the list of
addresses associated with the endpoint [gRFC
A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md).
This change standardizes how backend information is stored and ensures
that attributes (such as Hostname) are correctly associated with the
endpoint hierarchy.
### Key Changes:
**Struct Update:**
* `xdsresource.Endpoint` now includes a `ResolverEndpoint` field (of
type `resolver.Endpoint`) to store addresses and attributes. It replaces
the existing `Address` field (of type `[]string`), which is removed.
**Attribute Handling:**
* Added `SetHostname` and `GetHostname` helpers to manage hostname
metadata within `resolver.Endpoint.Attributes`.
**Parsing Logic:**
* Updated `parseEndpoints` in `unmarshal_eds.go` to correctly populate
the `resolver.Endpoint` object.
eshitachandwani [Tue, 16 Dec 2025 02:40:13 +0000 (08:10 +0530)]
xds: move e2e tests in clusterresolver package to cdsbalancer package (#8768)
This PR changes the tests in the e2e_test package in the
`clusterresolver` folder to be e2e tests instead of configuring the
`clusterresolver` LB policy as the top-level policy. It also moves the
tests to the `cdsbalancer` package, as the `clusterresolver` package will
be removed in a later PR.
credentials/tls: Strip port before validating authority override (#8726)
Fixes: https://github.com/grpc/grpc-go/issues/8719
Splits the host and port in the HTTP/2 `:authority` header in
`ValidateAuthority` before calling `VerifyHostname`. Adds test cases
covering the different kinds of values possible for `:authority`.
RELEASE NOTES:
* credentials/tls: strip port before validating authority override.
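A minimal sketch of the port-stripping step, using only the standard library. The `hostOnly` helper is hypothetical and is not the code in `ValidateAuthority`; it only illustrates how `net.SplitHostPort` can separate the host from an optional port before hostname verification.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostOnly returns the host part of an :authority value, dropping the port
// if one is present. Hypothetical helper, for illustration only.
func hostOnly(authority string) string {
	host, _, err := net.SplitHostPort(authority)
	if err != nil {
		// No port (or not in host:port form): keep the value, but strip
		// the brackets of a bare IPv6 literal such as "[::1]".
		return strings.TrimSuffix(strings.TrimPrefix(authority, "["), "]")
	}
	return host
}

func main() {
	for _, a := range []string{"example.com:443", "example.com", "[::1]:443", "[::1]"} {
		fmt.Printf("%q -> %q\n", a, hostOnly(a))
	}
}
```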
Joy Bestourous [Mon, 15 Dec 2025 06:25:03 +0000 (01:25 -0500)]
transport: add status details even when aborting early (#8754)
Modifies `earlyAbortStreamHandler` to include status details if present.
Most use cases of `earlyAbortStreamHandler` are for circumstances where
there are certainly no error details (bad HTTP methods, bad
content-type, internal error, etc). However, tap handlers also typically
go through the `earlyAbortStreamHandler`. In `http2_server.go`:
```
if t.inTapHandle != nil {
	var err error
	if s.ctx, err = t.inTapHandle(s.ctx, &tap.Info{FullMethodName: s.method, Header: mdata}); err != nil {
		t.mu.Unlock()
		if t.logger.V(logLevel) {
			t.logger.Infof("Aborting the stream early due to InTapHandle failure: %v", err)
		}
		stat, ok := status.FromError(err)
		if !ok {
			stat = status.New(codes.PermissionDenied, err.Error())
		}
		t.controlBuf.put(&earlyAbortStream{
			// ...
			status: stat, // <-- CAN have details!
		})
```
Yet the handler does **not** include error details by default, limiting
how tap handlers can be used and breaking some user assumptions
surrounding which information is propagated.
This PR fixes this by checking for status details and including the
header for them if present.
RELEASE NOTES:
* transport: propagate status details from tap handlers.
Note that this PR only implements the LB policy and does not implement
the xDS integration specified here:
https://github.com/grpc/proposal/blob/master/A68-random-subsetting.md#xds-integration
RELEASE NOTES:
- balancer/randomsubsetting: Implementation of the `random_subsetting`
LB policy
Arjan Singh Bal [Fri, 12 Dec 2025 06:24:48 +0000 (11:54 +0530)]
github: Verify PR description before labels (#8767)
Release notes in the PR description must be added by authors, whereas
labels can only be added by maintainers. Moving the description check to
the top ensures authors can see and resolve errors before labels are
applied.
Tom Wieczorek [Thu, 11 Dec 2025 06:06:34 +0000 (07:06 +0100)]
gracefulswitch: Wait for all goroutines on close (#8746)
Goroutines spawned during balancer swaps could outlive the call to
Balancer.Close(). Monitor these via a wait group and wait for them to
finish before returning from Close(). This prevents any noticeable side
effects that could otherwise occur after Close() returns.
Pranjali-2501 [Thu, 11 Dec 2025 05:53:21 +0000 (11:23 +0530)]
xds/resolver: pass route's auto_host_rewrite to LB picker (gRFC A81) (#8740)
This PR implements the ConfigSelector changes required for [gRFC
A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md).
It ensures that the `auto_host_rewrite` field from the xDS Route
Configuration is correctly propagated through the resolver and made
available to the Load Balancer picker via the RPC context.
### Key Changes:
* Pass the `AutoHostRewrite` field value from `Route` struct via RPC
context.
* Add helper functions for `AutoHostRewrite` in
`internal/xds/balancer/clusterimpl/picker.go`.
* Update `ConfigSelector.SelectConfig` to pass the `AutoHostRewrite`
boolean in RPC context.
- Add `grpc.WithAcceptedCompressionNames` so a client can explicitly cap
the `grpc-accept-encoding` header to a vetted subset of registered
compressors
- Ensure the new value is propagated to the proper code while still
appending per-call legacy compressors when needed
Updates #2786.
RELEASE NOTES:
* client: Add `experimental.AcceptCompressors` so callers can restrict
the `grpc-accept-encoding` header advertised for a call.
---------
Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Pranjali-2501 [Thu, 11 Dec 2025 05:40:20 +0000 (11:10 +0530)]
xds/clusterimpl: update TestChildPolicyChangeOnConfigUpdate to use custom lb policy. (#8730)
Fixes #8703
### Description
This PR fixes the data race in `TestChildPolicyChangeOnConfigUpdate`.
### Changes
* **Isolated Policy:** Instead of overwriting the global `pick_first`
policy (and triggering a race condition via `Unregister`), the test now
registers a unique custom stub policy named `test_pick_first`.
* **TypedStruct Configuration:** Updated the management server resource
update to use `v3xdsxdstypepb.TypedStruct` and specify the custom
`type.googleapis.com/test_pick_first` policy via TypeURL.
client: Change connectivity state to CONNECTING when creating the name resolver (#8710)
Fixes https://github.com/grpc/grpc-go/issues/7686
#### Current Behavior
- When client exits IDLE and creates the name resolver, it stays in IDLE
until the connectivity state is set by the LB policy.
- When exiting IDLE mode (because of `Connect` being called or because
of an RPC), if name resolver creation fails, we stay in IDLE.
#### New Behavior
- When the client exits IDLE and creates the name resolver, it moves to
CONNECTING. Moving forward, the connectivity state will be set by the LB
policy.
- When exiting IDLE mode (because of `Connect` being called or because
of an RPC), we have already moved to CONNECTING (because of the previous
bullet point). If name resolver creation fails, we will move to
TRANSIENT_FAILURE, start the idle timer, and move back to IDLE when the
timer fires.
#### Implementation details:
- The client channel now treats resolver build errors encountered during
exiting IDLE identically to resolver errors received prior to valid
updates.
- `Build` uses a new unsafe API on the idleness manager to mark the
channel as exited IDLE.
- The idleness Manager invokes the channel's `ExitIdleMode` (which now
does not return an error) and updates internal state to reflect that it
is no longer in IDLE.
- `OnFinish` call options are now invoked even if stream creation fails
during an RPC. This fulfills the guarantee for these options and ensures
the idleness Manager’s `activeCallsCount` remains accurate.
RELEASE NOTES:
- client: Change connectivity state to CONNECTING when creating the name
resolver (as part of exiting IDLE).
- client: Change connectivity state to TRANSIENT_FAILURE if name
resolver creation fails (as part of exiting IDLE).
- client: Change connectivity state to IDLE after idle timeout expires
even when current state is TRANSIENT_FAILURE.
- client: Fix a bug that resulted in `OnFinish` call option not being
invoked for RPCs where stream creation failed.
This PR makes available the `backend_service` (cluster name) label which
is decided in clusterimpl (since we are pre-A75). It is added to WRR
per-call metrics.
RELEASE NOTES:
* stats/otel: add backend service label to wrr metrics as part of A89
deps: update golang.org/x/net to v0.47.0 (tagged version) (#8732)
- follow-up to https://github.com/grpc/grpc-go/pull/8657
commit 363018c3d687d40153225f283301d3e491e7c5a4 updated the
golang.org/x/net dependency to include some changes that had not yet
been released. Now that v0.47.0 has been released, we can switch back to
released versions.
full diff: https://github.com/golang/net/compare/63d1a5100f82...v0.47.0
RELEASE NOTES: N/A
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
eshitachandwani [Wed, 3 Dec 2025 07:33:35 +0000 (13:03 +0530)]
internal/xds: change xds_resolver to use dependency manager (#8711)
This change is part of
[A74](https://github.com/grpc/proposal/blob/master/A74-xds-config-tears.md)
implementation.
This PR removes the listener and route watchers from resolver and
changes it so that we get the resources from xds dependency manager.
RELEASE NOTES:
* xds/resolver:
* Changes the behavior such that getting no matching virtual host in a
route resource will now drop any previous resource and report the error.
* Changes the behavior so that receiving a configuration error (LDS/RDS
ambient error) after a successful update will now only be logged, and
the system will continue using the previous resource to avoid transient
channel failures
Unknown is defined as follows in
[grpc/grpc@master/doc/statuscodes.md](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md)
```
Unknown error. For example, this error may be returned when a Status value received
from another address space belongs to an error space that is not known in this
address space. Also errors raised by APIs that do not return enough error information
may be converted to this error.
```
The table in the above doc also mentions returning Unknown for parsing
errors.
We are currently returning Internal for status parsing errors as well,
which is contrary to the above spec. This PR changes it to return
Unknown.
RELEASE NOTES:
* transport/client: Return status code `Unknown` on malformed
grpc-status.
Madhav Bissa [Tue, 2 Dec 2025 11:54:45 +0000 (17:24 +0530)]
stats/otel: a79 scaffolding to register an async gauge metric and api to record it- part 1 (#8731)
Addresses
https://github.com/grpc/proposal/blob/master/A79-non-per-call-metrics-architecture.md
This PR creates scaffolding to register an async gauge metric. It adds
an AsyncMetricsRecorder interface that defines the api for recording an
int64 async gauge metric.
RELEASE NOTES:
* stats/otel: Add scaffolding to register an async gauge metric. Add an
AsyncMetricsRecorder interface that defines the api for recording an
int64 async gauge metric.
Pranjali-2501 [Mon, 1 Dec 2025 16:13:52 +0000 (21:43 +0530)]
xdsclient/xdsresource: add AutoHostRewrite and Endpoint Hostname support (#8728)
This PR implements the validation logic for, and the extraction of,
per-endpoint Hostname attributes from xDS resources for [gRFC
A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md).
### Key Changes:
1. **RDS Resource Validation:**
* The boolean value of `RouteAction.auto_host_rewrite` is extracted from
the RDS resource and stored in the route struct.
* This field is only set to `true` in the parsed route struct if the
`trusted_xds_server` option is present in the `ServerConfig` and the
global environment variable for authority overriding is enabled.
2. **EDS Resource Validation:**
* The `Endpoint.hostname` field is extracted from the EDS resource and
stored as a `hostname` string in the parsed endpoint struct. It will be
changed to a per-endpoint resolver attribute in a follow-up PR.
xds: make it possible to create a StringMatcher from arguments outside of test code (#8723)
Changes in this PR:
- Add a new constructor for StringMatcher that can be shared between test and non-test code. This will be used as part of an internal feature to support ext_authz.
- Create new pointers to match strings instead of using the ones from the proto. This ensures that the xDS proto structs (which are usually huge) can be garbage collected earlier than they currently are.
- Fixes a bug involving the regex matcher, which should not be considering the ignore_case field, but was.
RELEASE NOTES:
* xds: Fix a bug in StringMatcher where regexes would match incorrectly when ignore_case is set to true.
Pranjali-2501 [Tue, 25 Nov 2025 19:54:49 +0000 (01:24 +0530)]
xds/bootstrap: add `trusted_xds_server` server feature (#8692)
This PR implements the Bootstrap config changes for [gRFC
A81](https://github.com/grpc/proposal/blob/master/A81-xds-authority-rewriting.md).
Authority rewriting is a security-sensitive feature that should only be
enabled when the xDS server is explicitly trusted to provide such
configuration. gRFC A81 specifies that this trust is indicated by adding
`trusted_xds_server` to the server_features list for a given server in
the bootstrap file.
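As an illustration of where the feature appears, the sketch below parses a sample bootstrap document and checks a server's `server_features` list. The struct types, the `trusted` helper, and the sample server URI are assumptions for this example and are not the types in the bootstrap package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverConfig and bootstrap model only the fields needed here; they are
// hypothetical and not the grpc-go bootstrap types.
type serverConfig struct {
	ServerURI      string   `json:"server_uri"`
	ServerFeatures []string `json:"server_features"`
}

type bootstrap struct {
	XDSServers []serverConfig `json:"xds_servers"`
}

// sample is an illustrative bootstrap document with a placeholder URI.
const sample = `{
  "xds_servers": [
    {
      "server_uri": "xds.example.com:443",
      "channel_creds": [{"type": "insecure"}],
      "server_features": ["xds_v3", "trusted_xds_server"]
    }
  ]
}`

// parse decodes a bootstrap document, ignoring fields it does not model.
func parse(data string) (bootstrap, error) {
	var b bootstrap
	err := json.Unmarshal([]byte(data), &b)
	return b, err
}

// trusted reports whether the server lists the trusted_xds_server feature.
func trusted(s serverConfig) bool {
	for _, f := range s.ServerFeatures {
		if f == "trusted_xds_server" {
			return true
		}
	}
	return false
}

func main() {
	b, err := parse(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range b.XDSServers {
		fmt.Println(s.ServerURI, trusted(s))
	}
}
```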
eshitachandwani [Tue, 25 Nov 2025 11:36:47 +0000 (17:06 +0530)]
xdsclient: call resourceError methods from serializer (#8725)
Fixes an issue where ResourceError methods were incorrectly called
outside of the serializer's scope.
Updates the documentation for the ResourceWatcher interface to
explicitly state the guarantee that all its defined methods will always
be called from the serializer.
Damien Neil [Fri, 21 Nov 2025 22:35:25 +0000 (14:35 -0800)]
dns: drop test depending on invalid URL "dns://::1/foo.bar.com" (#8716)
Go 1.26's url.Parse will reject invalid URLs containing unbracketed
colons in the hostname.
For example, Go 1.25 and earlier are willing to parse the URLs
"https://localhost:80:443" (hostname:"localhost:80", port:443)
and "https://::1" (hostname:":", port:1).
The test TestCustomAuthority contains a case which depends on
url.Parse("dns://::1/foo.bar.com") succeeding. In Go 1.26, this
case will fail.
Drop the test as not exercising a useful path: This URL is invalid
and earlier Go versions being willing to parse it was a bug.
The correct URL is "dns://[::1]/foo.bar.com" (which is also
exercised by TestCustomAuthority).
RELEASE NOTES:
* client: Reject target URLs containing unbracketed colons in the
hostname in Go version 1.26+.
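For reference, the bracketed form parses the same way across Go versions. The `parseTarget` helper below is a hypothetical sketch built on `net/url`, not the grpc-go target parser.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseTarget splits a target URL into its authority and endpoint parts.
// Hypothetical helper, for illustration only.
func parseTarget(target string) (authority, endpoint string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", err
	}
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	// The IPv6 literal is bracketed, so the authority is unambiguous.
	authority, endpoint, err := parseTarget("dns://[::1]/foo.bar.com")
	fmt.Println(authority, endpoint, err)
}
```

The unbracketed form `dns://::1/foo.bar.com` is deliberately not exercised here, since whether `url.Parse` accepts it depends on the Go version, as described above.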
Chris Carlon [Mon, 17 Nov 2025 22:37:38 +0000 (17:37 -0500)]
mem: Allocate at 4KiB boundaries in the fallback buffer pool. (#8705)
By rounding up to the nearest page, we avoid repeatedly allocating
similar sizes if requests happen to arrive in roughly increasing order.
The GCS client sends messages with 2MiB of data repeatedly when writing
a large object. Therefore it has to repeatedly allocate just over 2MiB.
This ultimately results in many, many allocations in the fallback buffer
pool. In practice rounding up yields at least a 10x reduction in RAM
when running 100 concurrent large writes. This is probably not unique to
GCS: anyone who sends large messages may be affected.
This change in simpleBufferPool seems worthwhile vs. adding a tier. We
use simpleBufferPool for any size greater than 1MiB, so this effectively
lets us discover a reasonably tight tier around any large message size
that comes in frequently. It increases infrequent allocation sizes by no
more than 0.4%.
RELEASE NOTES:
* mem: round up to nearest 4KiB for pool allocations larger than 1MiB
eshitachandwani [Sun, 16 Nov 2025 09:24:58 +0000 (14:54 +0530)]
internal/xds: move the LDS and RDS watchers to dependency manager (#8651)
This PR moves the LDS and RDS watchers to dependency manager without
changing the current functionality or behaviour. This is a part of
implementation of gRFC
[A74](https://github.com/grpc/proposal/blob/master/A74-xds-config-tears.md).
test: use the connectivity state watcher API (#8708)
The current tests are using an LB policy to record the state transitions
of subchannels. But these tests are meant to test the connectivity state
transitions of the channel.
Also, in a follow-up PR I will be making the change to transition the
channel to CONNECTING as soon as it exits IDLE
(https://github.com/grpc/grpc-go/issues/7686). With that change, the
current tests won't work anymore, so we would have to change them
anyway.
So, I took the opportunity to clean them up and use the connectivity
state watcher API to record the state transitions of the grpc channel
and compare them against the expected states.
priority: add a test helper to override the init timeout in tests (#8704)
This is an initial cleanup before the actual work required for
https://github.com/grpc/grpc-go/issues/8516
The way the init timeout is currently overridden in existing tests is
unnecessarily complicated and hard to read. This simplifies things and
ensures that the previous pattern is not followed when new tests are
written.
xdsclient: stop batching writes on the ADS stream (#8627)
Fixes https://github.com/grpc/grpc-go/issues/8125
#### The original race in the xDS client:
- Resource watch is cancelled by the user of the xdsClient (e.g.
xdsResolver)
- xdsClient removes the resource from its cache and queues an
unsubscribe request to the ADS stream.
- A watch for the same resource is registered immediately, and the
xdsClient instructs the ADS stream to subscribe (as it's not in cache).
- The ADS stream sends a redundant request (same resources, version,
nonce) which the management server ignores.
- The new resource watch sees a "resource-not-found" error once the
watch timer fires.
#### The original fix:
Delay the resource's removal from the cache until the unsubscribe
request was transmitted over the wire, a change implemented in
https://github.com/grpc/grpc-go/pull/8369. However, this solution
introduced new complications:
- The resource's removal from the xdsClient's cache became an
asynchronous operation, occurring while the unsubscribe request was
being sent.
- This asynchronous behavior meant the state maintained within the ADS
stream could still diverge from the cache's state.
- A critical section was absent between the ADS stream's message
transmission logic and the xdsClient's cache access, which is performed
during subscription/unsubscription by its users.
#### The root cause of the previously seen races can be put down to two things:
- Batching of writes for subscribe and unsubscribe calls
  - After batching, it may appear that nothing has changed in the list of
    subscribed resources, even though a resource was removed and added
    again, and therefore the management server would not send any response.
    It is important that the management server see the exact sequence of
    subscribe and unsubscribe calls.
- State maintained in the ADS stream going out of sync with the state
  maintained in the resource cache
#### How does this PR address the above issue?
This PR simplifies the implementation of the ADS stream by removing two
pieces of functionality
- Stop batching of writes on the ADS stream
  - If the user registers multiple watches, e.g. resource `A`, `B`, and
    `C`, the stream would now send three requests: `[A]`, `[A B]`, `[A B
    C]`.
- Queue the exact request to be sent out based on the current state
  - As part of handling a subscribe/unsubscribe request, the ADS stream
    implementation will queue the exact request to be sent out. When
    asynchronously sending the request out, it will not use the current
    state, but instead just write the queued request on the wire.
- Don't buffer writes when waiting for flow control
  - Flow control is already blocking reads from the stream. Blocking
    writes as well during this period might provide some additional flow
    control, but not much, and removing this logic simplifies the stream
    implementation quite a bit.
RELEASE NOTES:
- xdsclient: fix a race in the xdsClient that could lead to
resource-not-found errors
xds/resolver: Optimize Interceptor Chain Construction (#8641)
#### Existing behavior:
- At routing time, when an RPC matches a route and a cluster is
selected, the interceptor chain for that specific RPC is built.
- This chain is built on a per-RPC basis.
- A subsequent RPC that matches the exact same route and cluster will
trigger the entire chain reconstruction again, even if no configuration
has changed.
#### New behavior:
- The interceptor chain is now pre-built for every route and every
pickable cluster associated with that route.
- The chains are constructed once when the config selector is built.
#### Other changes:
- Existing unit tests have been converted to be more e2e style tests.
- This lays the necessary groundwork for upcoming changes to the filter
API, specifically to support filter state retention
Pranjali-2501 [Mon, 10 Nov 2025 10:31:28 +0000 (16:01 +0530)]
credentials/xds: fix goroutine leak in testServer (#8699)
Fixes #8694
This PR fixes a goroutine leak in `credentials/xds/xds_client_test.go`.
Previously, the `testServer` used standard `Send()` calls. If a test
timed out or failed before reading the expected value, the `testServer`
goroutine would block indefinitely on the channel, causing a leak.
This PR replaces the blocking `Send` calls with `SendContext` in
`handleConn`. This ensures that if the test ends (canceling the context),
the `testServer` stops trying to send and exits its goroutine gracefully.
Pranjali-2501 [Mon, 10 Nov 2025 10:30:18 +0000 (16:00 +0530)]
test: Fix goroutine leak in TestParsedTarget_WithCustomDialer (#8698)
Fixes #8695
Fixes a goroutine leak in clientconn_parsed_target_test.go where
TestParsedTarget_WithCustomDialer() could leave dialer goroutines
blocked on sending to addrCh if the test finished early or stopped
reading.
This change replaces the blocking channel send with a select statement
using a timeout/context to ensure goroutines can always exit.
Arjan Singh Bal [Tue, 4 Nov 2025 06:11:41 +0000 (11:41 +0530)]
transport: Set buffer pool in tests (#8688)
This PR correctly sets the buffer pool for test clients and servers not
created through the public gRPC API. This allows non-test code to assume
the buffer pool is always present.
Dimitar Pavlov [Mon, 3 Nov 2025 11:15:23 +0000 (11:15 +0000)]
xds bootstrap: enable using JWT Call Credentials (part 2 for A97) (#8536)
Part two for https://github.com/grpc/proposal/pull/492 (A97), following
#8431.
What this PR does is:
- update `internal/xds/bootstrap` with support for loading multiple
PerRPCCallCredentials specified in a new `call_creds` field in the
bootstrap file as per A97
- adjust `xds/internal/xdsclient/clientimpl.go` to use the call
credentials when constructing the client
- update `xds/bootstrap` to register the `jwtcreds` call credentials and
make them available if `GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS` is
enabled
Relates to https://github.com/istio/istio/issues/53532
RELEASE NOTES:
- xds: add support for loading a JWT from file and use it as Call
Credentials (A97). To enable this feature, set the environment variable
`GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS` to `true` (case
insensitive).
Arjan Singh Bal [Mon, 3 Nov 2025 09:03:17 +0000 (14:33 +0530)]
transport: Remove buffer copies while writing HTTP/2 Data frames (#8667)
This PR removes 2 buffer copies while writing data frames to the
underlying net.Conn: one [within
gRPC](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/controlbuf.go#L1009-L1022)
and the other [in the
framer](https://cs.opensource.google/go/x/net/+/master:http2/frame.go;l=743;drc=6e243da531559f8c99439dabc7647dec07191f9b).
Care is taken to avoid any extra heap allocations which can affect
performance for smaller payloads.
A [CL](https://go-review.git.corp.google.com/c/net/+/711620) is out for
review which allows using the framer to write frame headers. This PR
duplicates the header writing code as a temporary workaround. This PR
will be merged only after the CL is merged.
## Results
### Small payloads
Performance for small payloads increases slightly due to the removal of
a `defer` statement.
```
$ go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=120 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"
```
### Large payloads
Local benchmarks show a ~5-10% regression with 1 MB payloads on my dev
machine. The profiles show increased time spent in the copy operation
[inside the buffered
writer](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/http_util.go#L334).
Counterintuitively, copying the grpc header and message data into a
larger buffer increased the performance by 4% (compared to master).
To validate this behaviour (extra copy increasing performance) I ran
[the k8s benchmark for 1MB
payloads](https://github.com/grpc/grpc/blob/65c9be86830b0e423dd970c066c69a06a9240298/tools/run_tests/performance/scenario_config.py#L291-L305)
and 100 concurrent streams which showed ~5% increase in QPS without the
copies across multiple runs. Adding a copy reduced the performance.
Load test config file:
[loadtest.yaml](https://github.com/user-attachments/files/23055312/loadtest.yaml)
```
# 30 core client and server
Before
QPS: 498.284 (16.6095/server core)
Latencies (50/90/95/99/99.9%-ile): 233256/275972/281250/291803/298533 us
Server system time: 93.0164
Server user time: 142.533
Client system time: 97.2688
Client user time: 144.542
After
QPS: 526.776 (17.5592/server core)
Latencies (50/90/95/99/99.9%-ile): 211010/263189/270969/280656/288828 us
Server system time: 96.5959
Server user time: 147.668
Client system time: 101.973
Client user time: 150.234
# 8 core client and server
Before
QPS: 291.049 (36.3811/server core)
Latencies (50/90/95/99/99.9%-ile): 294552/685822/903554/1.48399e+06/1.50757e+06 us
Server system time: 49.0355
Server user time: 87.1783
Client system time: 60.1945
Client user time: 103.633
After
QPS: 334.119 (41.7649/server core)
Latencies (50/90/95/99/99.9%-ile): 279395/518849/706327/1.09273e+06/1.11629e+06 us
Server system time: 69.3136
Server user time: 102.549
Client system time: 80.9804
Client user time: 107.103
```
RELEASE NOTES:
* transport: Avoid two buffer copies when writing Data frames.
Arjan Singh Bal [Fri, 31 Oct 2025 06:27:49 +0000 (11:57 +0530)]
transport: Avoid buffer copies when reading Data frames (#8657)
This change incorporates changes from
https://github.com/golang/go/issues/73560 to split reading HTTP/2 frame
headers and payloads. If the frame is not a Data frame, it's read
through the standard library framer as before. For Data frames, the
payload is read directly into a buffer from the buffer pool to avoid
copying it from the framer's buffer.
## Testing
For 1 MB payloads, this results in ~4% improvement in throughput.
```sh
# test command
go run benchmark/benchmain/main.go -benchtime=60s -workloads=streaming \
-compression=off -maxConcurrentCalls=120 -trace=off \
-reqSizeBytes=1000000 -respSizeBytes=1000000 -networkMode=Local -resultFile="${RUN_NAME}"
```
For smaller payloads, the difference is minor.
```sh
go run benchmark/benchmain/main.go -benchtime=60s -workloads=streaming \
-compression=off -maxConcurrentCalls=120 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"
```
Mike Kruskal [Thu, 30 Oct 2025 18:41:22 +0000 (11:41 -0700)]
protoc-gen-go-grpc: Update supported edition to 2024 (#8685)
Fixes: #8642
grpc-go isn't doing anything that should be affected by edition 2024, so
it should already support it. This simply advertises it so protoc will
send it edition 2024 protos without requiring an
`--experimental_editions` flag.
relnotes for cmd/protoc-gen-go-grpc:
* Add support for protobuf edition 2024.
James O'Gorman [Thu, 30 Oct 2025 16:48:21 +0000 (16:48 +0000)]
advancedtls: Apply defaults before version check (#8684)
Prior to this change, creating an `Options` like
    tlsOpts := &advancedtls.Options{
        IdentityOptions: advancedtls.IdentityCertificateOptions{IdentityProvider: certProvider},
        // Note: Only MinTLSVersion is set, not MaxTLSVersion
        MinTLSVersion: tls.VersionTLS13,
    }
would result in the error:
    the minimum TLS version is larger than the maximum TLS version
The documentation for the Options struct states that the default for
MaxTLSVersion is TLS 1.3 but the default was being applied after
checking whether MinTLSVersion > MaxTLSVersion.
The defaults are now applied first, prior to any checks.
Fixes #8649
Arjan Singh Bal [Tue, 28 Oct 2025 16:59:58 +0000 (22:29 +0530)]
pickfirst: Remove old pickfirst (#8672)
Fixes: #8561
Addresses: #6472
The new pickfirst has been the default since [gRPC Go
v1.71.0](https://github.com/grpc/grpc-go/releases/tag/v1.71.0) and all
reported bugs have been fixed. This PR removes the old pickfirst policy
completely.
The exported symbols in the `pickfirstleaf` package are retained with a
deprecation notice for removal after one release.
RELEASE NOTES:
* pickfirst: Remove the old `pick_first` LB policy. The new `pick_first`
has been the default since v1.71.0.
Arjan Singh Bal [Tue, 28 Oct 2025 06:40:03 +0000 (12:10 +0530)]
mem: Remove Reader interface and export the concrete struct (#8669)
This PR changes the exported slice reader from an interface to a
concrete struct.
This approach follows the precedent set by standard library packages,
such as [`bufio`'s `bufio.Reader`](https://pkg.go.dev/bufio#Reader).
This interface was not intended for users to implement, and gRPC does
not plan to provide alternative implementations. Users who require an
interface for abstraction or testing can define one in their own
packages.
This change provides two main advantages:
* Performance: It avoids a couple of heap allocations per stream that
were previously required to hold the interface value.
* Maintainability: Adding new methods to the concrete struct is a
backward-compatible change, whereas adding methods to an interface is a
breaking change.
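The point about users defining their own interface can be sketched as follows. This is a minimal illustration that uses `bufio.Reader` (the precedent cited above) rather than the `mem` package itself; `byteSource`, `countBytes`, and `fakeSource` are hypothetical names invented for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// byteSource is a hypothetical, caller-defined interface. It lists only the
// methods this package needs from a concrete reader type.
type byteSource interface {
	ReadByte() (byte, error)
}

// countBytes consumes src until the first error (normally io.EOF) and
// returns how many bytes it read.
func countBytes(src byteSource) int {
	n := 0
	for {
		if _, err := src.ReadByte(); err != nil {
			return n
		}
		n++
	}
}

// fakeSource is a test double that satisfies byteSource without any I/O.
type fakeSource struct{ remaining int }

func (f *fakeSource) ReadByte() (byte, error) {
	if f.remaining == 0 {
		return 0, fmt.Errorf("fakeSource: exhausted")
	}
	f.remaining--
	return 'x', nil
}

func main() {
	// *bufio.Reader is a concrete struct, yet it satisfies byteSource.
	fmt.Println(countBytes(bufio.NewReader(strings.NewReader("hello"))))
	// The same function accepts the test double.
	fmt.Println(countBytes(&fakeSource{remaining: 3}))
}
```

Because Go interfaces are satisfied implicitly, the consuming package can declare the abstraction it needs without the library exporting one.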
## Benchmarks
```sh
# test command
$ go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=200 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"
```
Pranjali-2501 [Mon, 27 Oct 2025 08:21:01 +0000 (13:51 +0530)]
xds/googlec2p: support custom bootstrap config per channel. (#8648)
xds/googlec2p: Fix channel-specific xDS bootstrap configurations by
allowing xdsclient creation with per-target config. Removes global
fallback config usage, enabling multiple distinct xDS clients to coexist
in the same process.
### Notable implementation details
* `grpc.lb.backend_service` is not implemented yet (marked as optional
in the gRFC)
* modifies the tests to make sure we can cover all the cases for
`enforced`/`unenforced` without repeating the test setup.
RELEASE NOTES:
* outlierdetection: add metrics for enforced
(grpc.lb.outlier_detection.ejections_enforced) and unenforced
(grpc.lb.outlier_detection.ejections_unenforced) outlier ejections.
Arjan Singh Bal [Tue, 21 Oct 2025 05:14:02 +0000 (10:44 +0530)]
credentials/tls: Remove environment variable for disabling ALPN (#8660)
Related issue: https://github.com/grpc/grpc-go/issues/434
RELEASE NOTES:
* credentials/tls: Remove the `GRPC_ENFORCE_ALPN_ENABLED` environment
variable. ALPN is now enforced by default. Users who must disable ALPN
enforcement can temporarily use the [experimental transport
credentials](https://pkg.go.dev/google.golang.org/grpc@v1.76.0/experimental/credentials).
These experimental credentials will be removed in an upcoming release;
users who depend on them must vendor this version of gRPC or copy the
relevant code into their own codebase.
Arjan Singh Bal [Fri, 17 Oct 2025 06:49:47 +0000 (12:19 +0530)]
stats: Re-use objects while calling multiple Handlers (#8639)
This PR improves performance by eliminating heap allocations when
multiple stats handlers are configured.
Previously, iterating through a list of handlers caused one heap
allocation per handler for each RPC. This change introduces a Handler
that combines multiple Handlers and implements the `Handler` interface.
The combined handler delegates calls to the handlers it contains.
This approach allows gRPC clients and servers to operate as if there
were only a single `Handler` registered, simplifying the internal logic
and removing the per-RPC allocation overhead. To avoid any performance
impact when stats are disabled, the combined `Handler` is only created
when at least one handler is registered.
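The delegation pattern described above might look roughly like the sketch below. The `Handler` interface here is a simplified, hypothetical stand-in with a single method (the real stats handler interface has more), and `combinedHandler`, `newCombinedHandler`, and `recorder` are names invented for illustration.

```go
package main

import "fmt"

// Handler is a hypothetical, simplified stand-in for a stats handler
// interface; it is not the real gRPC interface.
type Handler interface {
	HandleRPC(event string)
}

// combinedHandler fans each call out to the handlers it contains.
type combinedHandler struct {
	handlers []Handler
}

func (c *combinedHandler) HandleRPC(event string) {
	for _, h := range c.handlers {
		h.HandleRPC(event)
	}
}

// newCombinedHandler returns nil when no handlers are registered, so callers
// can skip stats work entirely with a single nil check.
func newCombinedHandler(hs ...Handler) Handler {
	if len(hs) == 0 {
		return nil
	}
	return &combinedHandler{handlers: hs}
}

// recorder is a toy handler that remembers the events it saw.
type recorder struct{ events []string }

func (r *recorder) HandleRPC(event string) { r.events = append(r.events, event) }

func main() {
	a, b := &recorder{}, &recorder{}
	h := newCombinedHandler(a, b)
	h.HandleRPC("begin")
	h.HandleRPC("end")
	fmt.Println(len(a.events), len(b.events))
}
```

The caller holds one `Handler` value for the lifetime of the connection instead of iterating over a slice on every RPC.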
# Tested
Since existing benchmarks don't register stats handlers, I modified the
benchmark to add 2 stats handlers each on the server and client
(https://github.com/grpc/grpc-go/pull/8639/commits/36ba616d7c40deb1cb79d5c8d1636057f94fc88a).
```sh
# test command
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=200 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"
```
There is a new version of the envoy protos (v1.35.0) that was released
recently that contains proto changes required for the ext_authz support
that I'm currently working on.
Ran the following command twice (as mentioned in our release docs):
```
for x in $(find . -name 'go.mod' | xargs dirname | sort); do
pushd "${x}"
go get -u ./...
go mod tidy -compat=1.24
popd
done
```
The cancel method is not called in
[`deleteStream`](https://github.com/grpc/grpc-go/blob/2d922719c02bb46f34482d592c35e72dc4a9ad92/internal/transport/http2_server.go#L1302).
This change invokes `deleteStream` through `closeStream` in the flaking
test to ensure the stream is always cancelled to avoid leaking timers.
This PR *only* changes the implementation of the listener resource type
to adhere to the external xdsclient API. The other resource type
implementations will be handled in subsequent PRs, and once all resource
type implementations have switched to the external xdsclient API, we can
get rid of some of the existing APIs.
eshitachandwani [Tue, 14 Oct 2025 09:01:27 +0000 (14:31 +0530)]
delegatingresolver: add default port to addresses (#8613)
Fixes: https://github.com/grpc/grpc-go/issues/8607
RELEASE NOTES:
- Fixes a bug where the default port 443 was not added to port-less
addresses sent to the proxy.
- Adds a new environment variable
`GRPC_EXPERIMENTAL_ENABLE_DEFAULT_PORT_FOR_PROXY_TARGET`, enabled by
default, that controls adding a default port to addresses sent to the proxy.
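As a rough illustration of adding a default port to a port-less address, the sketch below uses only the standard `net` package. `withDefaultPort` is a hypothetical helper written for this example and is not the function used in grpc-go; malformed inputs other than a missing port are not handled.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// withDefaultPort is a hypothetical helper: it returns addr unchanged when it
// already has a port, and appends defaultPort otherwise.
func withDefaultPort(addr, defaultPort string) string {
	if _, _, err := net.SplitHostPort(addr); err == nil {
		return addr // Already in host:port form.
	}
	// Strip brackets from a bare IPv6 literal such as "[::1]" so that
	// JoinHostPort does not add a second pair.
	host := strings.TrimSuffix(strings.TrimPrefix(addr, "["), "]")
	return net.JoinHostPort(host, defaultPort)
}

func main() {
	for _, addr := range []string{"example.com", "example.com:8080", "[::1]", "::1"} {
		fmt.Println(withDefaultPort(addr, "443"))
	}
}
```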
xdsclient: fix the flaky ADS stream restart test (#8631)
The ADS stream restart test can be flaky for the following reason:
- It requests a CDS resource and unrequests it before the stream breaks.
- And then once the stream restarts, it verifies that this resource is
not requested again.
- But the ACK for this resource may or may not be received at the
management server before the stream breaks. This can falsely cause the
test to conclude that the resource was re-requested after the restart.
This PR changes the test in the following ways:
- Use a single resource
- Verify ACK before the stream is restarted
Arjan Singh Bal [Thu, 9 Oct 2025 04:19:03 +0000 (09:49 +0530)]
transport: Replace closures with interfaces to avoid heap allocations (#8630)
In Go, creating a closure results in a heap allocation if the compiler
determines the closure might outlive the function in which it was
created. This change removes two such closures, replacing them with
interfaces that are implemented by the `ClientStream` and `ServerStream`
structs.
While this pattern may slightly reduce readability, the performance
benefit is worthwhile, as this transport code is executed for every new
stream. This reduces allocs/unary RPC by 2.5%.
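The closure-to-interface substitution can be sketched as below. All type and function names (`closureReader`, `readNotifier`, `ifaceReader`, `stream`) are hypothetical and much simpler than the real transport types; the sketch only shows the shape of the change, not its measured allocation behavior.

```go
package main

import "fmt"

// closureReader is the "before" shape: it stores a callback. A closure that
// captures a stream and is stored in a long-lived struct generally escapes
// to the heap.
type closureReader struct {
	onRead func(n int)
}

// readNotifier is the "after" shape: a small interface that the stream
// itself implements.
type readNotifier interface {
	onRead(n int)
}

type ifaceReader struct {
	notifier readNotifier
}

// stream stands in for a client or server stream.
type stream struct {
	bytesRead int
}

// onRead makes *stream satisfy readNotifier.
func (s *stream) onRead(n int) { s.bytesRead += n }

func newWithClosure(s *stream) *closureReader {
	// A new closure object capturing s is created here.
	return &closureReader{onRead: func(n int) { s.bytesRead += n }}
}

func newWithInterface(s *stream) *ifaceReader {
	// Storing an existing pointer in an interface value needs no new object.
	return &ifaceReader{notifier: s}
}

func main() {
	s := &stream{}
	newWithClosure(s).onRead(7)
	newWithInterface(s).notifier.onRead(5)
	fmt.Println(s.bytesRead)
}
```

The trade-off mentioned above is visible here: the interface version spreads the callback logic across a method on `stream`, which is slightly less local than the inline closure.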
## Testing
```sh
# test command
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=500 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}" -recvBufferPool=simple
```
Arjan Singh Bal [Thu, 9 Oct 2025 04:03:51 +0000 (09:33 +0530)]
benchmark/benchmain: Enable buffer pooling by default (#8638)
This PR enables buffer pooling in the benchmark test to align with the
library's current default configuration.
The benchmark originally disabled buffer pooling because the feature was
opt-in when introduced (#5862). Since buffer pooling is now enabled by
default (#7356), this change ensures the benchmark accurately measures
the performance of gRPC's default behavior.
Madhav Bissa [Wed, 8 Oct 2025 20:28:35 +0000 (01:58 +0530)]
client: ignore http status header for gRPC streams (#8548)
Fixes https://github.com/grpc/grpc-go/issues/8486
When a gRPC response is received with content type application/grpc, the
HTTP status is not expected to carry any information; the status must be
conveyed by the gRPC status alone.
If the gRPC status is missing, an Internal error is returned instead
of Unknown, in accordance with https://grpc.io/docs/guides/status-codes/
Changes:
- Ignore the HTTP status when the content type is application/grpc
- Change the default rawStatusCode to return Internal for a missing gRPC
status
RELEASE NOTES:
* client: Ignore the HTTP status header for gRPC streams and return an
Internal error for a missing gRPC status.
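The decision described in this change can be sketched as a small function. This is an illustration under assumptions, not the transport's actual code: `isGRPCContentType` and `resolveStatus` are hypothetical names, the content-type check is simplified, and handling of non-gRPC responses is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// codeInternal is the numeric value of the gRPC Internal status code.
// (Unknown, the code returned before this change, is 2.)
const codeInternal = 13

// isGRPCContentType reports whether ct looks like a gRPC content type.
// This prefix check is a simplification for illustration.
func isGRPCContentType(ct string) bool {
	return ct == "application/grpc" ||
		strings.HasPrefix(ct, "application/grpc+") ||
		strings.HasPrefix(ct, "application/grpc;")
}

// resolveStatus returns the status code for a gRPC response. grpcStatus is
// nil when the grpc-status header is absent. The HTTP status is not a
// parameter at all, since it is ignored for gRPC responses. ok is false for
// non-gRPC responses, which are outside the scope of this sketch.
func resolveStatus(contentType string, grpcStatus *int) (code int, ok bool) {
	if !isGRPCContentType(contentType) {
		return 0, false
	}
	if grpcStatus == nil {
		return codeInternal, true
	}
	return *grpcStatus, true
}

func main() {
	code, ok := resolveStatus("application/grpc", nil)
	fmt.Println(code, ok)
}
```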
xds: Store WeightedClusters as a slice instead of a map inside the Route (#8632)
Reasons for this change:
- The `WeightedClusters` field is never used as a map.
- Weighted clusters are stored as a list in the original envoy proto as
well.
- In tests that require deterministic WRR behavior, we use the
`testutils.NewTestWRR` to get rid of the randomness. But the output
still depends on the order in which items are added to the WRR. Map
iteration order in Go is non-deterministic.
Elric [Tue, 7 Oct 2025 22:25:07 +0000 (07:25 +0900)]
transport: Increment metrics only when the stream is active (#8573)
Fixes: https://github.com/grpc/grpc-go/issues/8529
This PR increments metrics only when the stream is active, that is, when
it is found in the activeStreams map.
#### as-is
- deleteStream incremented channelz metrics every time it was called,
even when the stream had already been removed from activeStreams or
never existed in it.
#### to-be
- Added a check to ensure metrics are incremented only once, when a stream
is actually removed from activeStreams.
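A minimal sketch of the guarded increment is shown below. The `server` type and its counters are hypothetical simplifications of the real transport and channelz types; only the "count once, when the stream is actually removed" logic is taken from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical, reduced model of a transport that tracks
// active streams and per-outcome counters.
type server struct {
	mu            sync.Mutex
	activeStreams map[uint32]bool
	succeeded     int
	failed        int
}

// deleteStream removes the stream and updates the counters only if the
// stream was still present in activeStreams. Repeated calls are no-ops.
func (s *server) deleteStream(id uint32, success bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, active := s.activeStreams[id]; !active {
		return // Already removed or never existed: do not count again.
	}
	delete(s.activeStreams, id)
	if success {
		s.succeeded++
	} else {
		s.failed++
	}
}

func main() {
	s := &server{activeStreams: map[uint32]bool{1: true}}
	s.deleteStream(1, true)
	s.deleteStream(1, true) // Second call must not be counted.
	s.deleteStream(9, false) // Unknown stream must not be counted.
	fmt.Println(s.succeeded, s.failed)
}
```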
RELEASE NOTES:
* server: Fix a bug that caused overcounting of channelz metrics for
successful and failed streams.
Arjan Singh Bal [Tue, 7 Oct 2025 09:41:48 +0000 (15:11 +0530)]
transport: Reduce pointer usage in Stream structs (#8624)
The pprof profiles for unary RPC benchmarks indicate significant time
spent in `runtime.mallocgc` and `runtime.gcBgMarkWorker`. This indicates
gRPC is spending significant CPU cycles allocating or garbage
collecting.
This change reduces the number of pointer fields in the structs that
represent client and server streams. This reduces the number of memory
allocations and the pressure on the garbage collector, since the GC does
not need to scan non-pointer fields. For structs which were stored as
pointers to ensure values are not copied, a `noCopy` struct is embedded
that will cause `go vet` to fail if copies are performed. Non-pointer
fields are also moved to the end of the struct to improve allocation speed.
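The `noCopy` idiom mentioned above can be sketched as follows. It mirrors the pattern used inside the standard library's `sync` package; the `buffer` and `stream` types and their fields are hypothetical and only illustrate storing a struct by value while still having `go vet` flag accidental copies.

```go
package main

import "fmt"

// noCopy is a zero-size marker type. Because it has Lock and Unlock methods
// on its pointer receiver, the copylocks check in `go vet` reports any copy
// of a struct that contains it.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// buffer was hypothetically held as *buffer before; holding it by value
// removes one allocation and one pointer for the GC to follow.
type buffer struct {
	_    noCopy
	size int
}

// stream keeps pointer-containing fields first and pointer-free fields last.
type stream struct {
	method string // Contains a pointer to the string data.
	buf    buffer // Stored by value; guarded against copies by noCopy.
	id     uint32
}

// grow mutates the embedded buffer in place through the stream pointer.
func (s *stream) grow(n int) { s.buf.size += n }

func main() {
	s := &stream{method: "/svc/Method", id: 1}
	s.grow(16)
	fmt.Println(s.buf.size)
	// Writing `c := *s` here would compile, but `go vet` would flag it.
}
```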
## Results
There are improvements in QPS, latency and allocs/op for unary RPCs.
```sh
# test command
go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=500 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}" -recvBufferPool=simple
```
Gregory Cooke [Fri, 3 Oct 2025 19:32:56 +0000 (15:32 -0400)]
testing: SPIFFE Bundle Maps - Swap to a real unsupported key type (#8626)
EC Keys are actually supported. The test using this file previously
failed because we had `EC` in the `kty` field, but not the associated
`crv`, `x`, and `y` values. Change this test to use an actual
unsupported key type.
When using the health producer for health checks, and the health package
is not imported by the application, a no op health producer is used
without logging any errors. This PR adds an error log similar to the one
for the old health checks started by the subchannel.
Arjan Singh Bal [Fri, 3 Oct 2025 04:05:19 +0000 (09:35 +0530)]
pickfirstleaf: fix bug in address de-duplication (#8611)
Due to a bug in the new pickfirst balancer, it wasn't de-duplicating
addresses in the resolver update. The only user visible impact of this
seems to be less frequent picker updates after the first pass in happy
eyeballs and incorrect interleaving of IPv4/IPv6 addresses during the
first happy eyeballs pass.
RELEASE NOTES:
* balancer/pickfirst: Fix a bug where duplicate addresses were not being ignored as intended.
examples/health: fix markdown formatting and improve content (#8625)
This PR fixes some markdown formatting issues flagged by an internal
tool (when attempting to import recent changes into google3). It also
makes minor improvements to the content.
xdsclient: fix race in ADS stream flow control causing indefinite blocking (#8605)
Fixes https://github.com/grpc/grpc-go/issues/8594
The above issue clearly describes the condition under which the race
manifests. The changes in this PR are as follows:
- Remove the `readyCh` field in the flow control that was previously
used to block when waiting for flow control. Instead use a condition
variable.
- Have two bits of state inside the flow control:
  - One to indicate whether there is a pending update awaiting
  consumption by all watchers
  - One to indicate that the stream is closed
- The flow control object no longer needs to be recreated every time a
new stream is created
- The flow control object is stopped when the `adsStreamImpl` is stopped
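The condition-variable design listed above could be sketched like this. It is a minimal illustration, not the xdsclient implementation: `flowControl` and its method names are hypothetical, and the two booleans correspond to the two bits of state described in the list.

```go
package main

import (
	"fmt"
	"sync"
)

// flowControl is a hypothetical sketch of flow control built on sync.Cond.
type flowControl struct {
	mu      sync.Mutex
	cond    *sync.Cond
	pending bool // An update is waiting to be consumed by all watchers.
	stopped bool // The stream has been closed.
}

func newFlowControl() *flowControl {
	fc := &flowControl{}
	fc.cond = sync.NewCond(&fc.mu)
	return fc
}

// setPending marks that an update has been handed to the watchers.
func (fc *flowControl) setPending() {
	fc.mu.Lock()
	fc.pending = true
	fc.mu.Unlock()
}

// onConsumed marks the pending update as consumed and wakes any waiter.
func (fc *flowControl) onConsumed() {
	fc.mu.Lock()
	fc.pending = false
	fc.cond.Broadcast()
	fc.mu.Unlock()
}

// stop marks the flow control as stopped and wakes any waiter.
func (fc *flowControl) stop() {
	fc.mu.Lock()
	fc.stopped = true
	fc.cond.Broadcast()
	fc.mu.Unlock()
}

// wait blocks while an update is pending and the flow control is not
// stopped. It returns false if it was unblocked by stop.
func (fc *flowControl) wait() bool {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	for fc.pending && !fc.stopped {
		fc.cond.Wait()
	}
	return !fc.stopped
}

func main() {
	fc := newFlowControl()
	fc.setPending()
	go fc.onConsumed()
	fmt.Println(fc.wait())
	fc.setPending()
	go fc.stop()
	fmt.Println(fc.wait())
}
```

Because `wait` re-checks both flags under the lock in a loop, a wake-up cannot be missed regardless of whether `onConsumed` or `stop` runs before or after the waiter starts waiting.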
This PR also makes other minor changes:
- Fix a flaky test by ensuring that the test stream implementation
unblocks from a `Recv` call when the underlying stream context is
cancelled
- Couple of logging improvements
RELEASE NOTES:
- xdsclient: fix a race in the ADS stream implementation that could
result in resource-not-found errors, causing the gRPC client channel to
move to `TransientFailure`
examples: improve interceptor example with better markdown formatting (#8612)
This PR changes the existing example to use fenced code blocks instead
of backticks to show the interceptor signatures. This greatly improves
how the document is rendered. The PR also changes the word `overload` to
`override` because it is the latter that we are showcasing here, and Go
does not support overloading anyway.
eshitachandwani [Wed, 1 Oct 2025 19:24:25 +0000 (00:54 +0530)]
xds/cdsbalancer: change tests to use xds resolver (#8579)
Change the tests in the xds cds balancer to use the xds resolver instead
of the manual resolver.
This change is being done as part of gRFC [A74 : xDS Config
tears](https://github.com/grpc/proposal/blob/master/A74-xds-config-tears.md).
This is to make sure the tests pass after the change too.
xds/resolver: minor cleanup in the config selector implementation (#8609)
This PR injects the dependencies of `configSelector` at creation time,
instead of passing a reference to the `xdsResolver` and having the
former directly access fields from the latter.
On connection breakage, the pickfirst leaf balancer enters idle and
returns an `Idle picker` that calls the balancer's `ExitIdle` method
only the first time `Pick` is called. The following sequence of events
will cause the balancer to get stuck in `Idle` state:
1. Existing connection breaks, SubConn [requests re-resolution and reports IDLE](https://github.com/grpc/grpc-go/blob/bb71072094cf533965450c44890f8f51c671c393/clientconn.go#L1388-L1393).
In turn PF updates the ClientConn state to IDLE with an `Idle picker`.
1. An RPC is made, triggering `balancer.ExitIdle` through the idle
picker. The balancer attempts to re-connect the failed SubConn.
1. The resolver produces a new endpoint list, removing the endpoint used
by the existing SubConn. PF removes the existing SubConn. Since the
balancer didn't update the ClientConn state to CONNECTING yet, pickfirst
thinks that it's still in IDLE and doesn't start connecting to the new
endpoints.
1. New RPC requests trigger the idle picker, but it's a no-op since it
only [triggers the balancer's ExitIdle method
once](https://github.com/grpc/grpc-go/blob/bb71072094cf533965450c44890f8f51c671c393/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go#L663).
## Fix
This change moves the ClientConn into Connecting immediately when the
`ExitIdle` method is called. This ensures that the balancer continues to
re-connect when a new endpoint list is produced by the resolver.
RELEASE NOTES:
* balancer/pickfirst: Fix bug that can cause balancer to get stuck in
`IDLE` state on connection failure.
transport: Invoke `net.Conn.SetWriteDeadline` in `http2_client.Close` (#8534)
Fixes: #8425
This PR adds a call to `net.Conn.SetWriteDeadline`, as discussed in
https://github.com/grpc/grpc-go/issues/8425#issuecomment-3057938248.
Additionally, it updates the previous call to `SetReadDeadline` to log
any non-nil error value (this doesn't affect behavior but proved helpful
in some earlier debugging).
RELEASE NOTES:
* client: Set a write deadline when closing a transport to prevent it
from blocking indefinitely on a broken connection.
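The effect of a write deadline on a stalled connection can be demonstrated with the standard library alone. The sketch below is an assumption-laden illustration, not the code in `http2_client.Close`: `finalWrite` and `stalledWriteTimesOut` are hypothetical helpers, and `net.Pipe` stands in for a broken connection whose peer never reads.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// finalWrite is a hypothetical helper: it bounds a last write on conn with a
// deadline so that a peer that has stopped reading cannot block the caller
// forever, then closes the connection.
func finalWrite(conn net.Conn, p []byte, timeout time.Duration) error {
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		// Not fatal for this sketch: report it and still attempt the write.
		fmt.Println("SetWriteDeadline failed:", err)
	}
	_, werr := conn.Write(p)
	cerr := conn.Close()
	if werr != nil {
		return werr
	}
	return cerr
}

// stalledWriteTimesOut writes to an in-memory pipe whose other end never
// reads, and reports whether the write failed with a deadline error.
func stalledWriteTimesOut() bool {
	client, server := net.Pipe()
	defer server.Close()
	err := finalWrite(client, []byte("bye"), 50*time.Millisecond)
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println(stalledWriteTimesOut())
}
```

Without the deadline, the `Write` call in this example would block until the other end of the pipe is read from or closed.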
pickfirstleaf: Fix shuffling of addresses in resolver updates without endpoints (#8610)
The new `pick_first`, which is the default, doesn't shuffle the
addresses at all for resolver updates that are missing the `Endpoints`
field. This change fixes that. Since [gRPC automatically sets the
missing
`Endpoints`](https://github.com/grpc/grpc-go/blob/1059e84f885bf7ed65b3b1a4fbe914360d8ab5b1/resolver_wrapper.go#L136-L138),
occurrence of this bug should be uncommon in practice.
RELEASE NOTES:
* balancer/pick_first: When configured, shuffle addresses in resolver
updates that lack endpoints. Since gRPC automatically adds endpoints to
resolver updates, this bug should only affect implementers of custom LB
policies that use pick_first for delegation but don't forward the
endpoints.
Evan Jones [Thu, 25 Sep 2025 17:53:20 +0000 (13:53 -0400)]
examples/features/health: Clarify docs for health import (#8597)
The google.golang.org/grpc/health package must be imported for client
health checking to work. I somehow missed this, even though it is in the
README, the client example, and the health package docs. Attempt to make
it clearer with a few extra mentions, since it is quite hard to debug
this misconfiguration.
* Remove deprecated grpc.WithBlock function
* Make service config const since it isn't modified
xdsclient: improve fallback test involving three servers (#8604)
The existing fallback test that involves three servers is flaky. The
flake occurs because some of the resources have the same name
in different servers. The listener resource is expected to have the same
name across the different management servers, but we generally expect
the other resources to have different names.
See the following from the gRFC:
- In
https://github.com/grpc/proposal/blob/master/A71-xds-fallback.md#reservations-about-using-the-fallback-server-data,
we have the following:
```
We have no guarantee that a combination of resources from different xDS servers form a valid cohesive
configuration, so we cannot make this determination on a per-resource basis. We need any given gRPC
channel or server listener to only use the resources from a single server.
```
- In
https://github.com/grpc/proposal/blob/master/A71-xds-fallback.md#config-tears,
we have the following:
```
Config tears happen when the client winds up using some combination of resources from the primary and
fallback servers at the same time, even though that combination of resources was never validated to work
together. In theory, this can cause correctness issues where we might send traffic to the wrong location or
the wrong way, or it can cause RPCs to fail. Note that this can happen only when the primary and fallback
server use the same resource names.
```
This PR ensures that all the different management servers have different
resource names for all resources except the listener. Also, ran the test
on forge 100K times with no failures.
This PR also improves a couple of logs that I found useful when
debugging the failures.
opentelemetry: Remove chatty log in client (#8606)
Removing this debug log to reduce noise. This log fires on every RPC
call but provides no useful debugging value. The action it logs (adding
callInfo to the context) is part of the normal flow, and the message
contains no helpful variables.
benchmark: Hold read+write lock while updating server state (#8601)
The `lastResetTime` and `rusageLastReset` fields in the
`benchmarkServer` are written while holding only a read lock. This can
result in concurrent modifications. This change replaces the `RWMutex` with a
regular `Mutex` to avoid such problems. This lock is acquired a couple
of times during the entire test run, so contention is not a major
concern.