git.feebdaed.xyz Git - 0xmirror/grpc.git/log
0xmirror/grpc.git
7 days ago[Build Fix] grpc/core/master/linux/grpc_flaky_network (#41272)
Vignesh Babu [Fri, 19 Dec 2025 21:58:47 +0000 (13:58 -0800)]
[Build Fix] grpc/core/master/linux/grpc_flaky_network (#41272)

Should fix build failure: https://btx.cloud.google.com/invocations/0a1f31fb-d340-4ff5-af2c-25543f19068e/log

Closes #41272

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41272 from Vignesh2208:cbf 28a57fb83eebedce3353f59a8a0c528b58c8c24e
PiperOrigin-RevId: 846857224

8 days ago[channelz] Ensure ExecCtx is available for call combiner call (#41266)
Craig Tiller [Fri, 19 Dec 2025 21:51:17 +0000 (13:51 -0800)]
[channelz] Ensure ExecCtx is available for call combiner call (#41266)

Closes #41266

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41266 from ctiller:exec_ctx_91237 c71a347c5045cf0372cb4a5795de6251b7fc461c
PiperOrigin-RevId: 846854179

8 days ago[PH2][ChannelArgs] Support AckPing
Akshit Patel [Fri, 19 Dec 2025 08:37:24 +0000 (00:37 -0800)]
[PH2][ChannelArgs] Support AckPing

PiperOrigin-RevId: 846606868

8 days ago[PH2][Nits][Trivial] Fixing comments from a previous review
Tanvi Jagtap [Fri, 19 Dec 2025 08:27:28 +0000 (00:27 -0800)]
[PH2][Nits][Trivial] Fixing comments from a previous review
https://github.com/grpc/grpc/pull/41225

PiperOrigin-RevId: 846603041

8 days ago Adding layering_check and parse_headers in end2end test build file (#41240)
Rishesh Agarwal [Fri, 19 Dec 2025 04:49:21 +0000 (20:49 -0800)]
 Adding layering_check and parse_headers in end2end test build file (#41240)

Closes #41240

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41240 from rishesh007:layering_test_2 d9daaf9bb00fb3d6b09a12b8acca79b6e45c1873
PiperOrigin-RevId: 846532341

8 days agoAutomated Code Change
Jie Luo [Thu, 18 Dec 2025 23:30:57 +0000 (15:30 -0800)]
Automated Code Change

PiperOrigin-RevId: 846438320

8 days ago[python] aio: fix race condition causing `asyncio.run()` to hang forever during the...
Kai-Hsun Chen [Thu, 18 Dec 2025 22:54:59 +0000 (14:54 -0800)]
[python] aio: fix race condition causing `asyncio.run()` to hang forever during the shutdown process (#40989)

# Root cause
* gRPC AIO creates a Unix domain socket pair, and the current thread passes the read socket to the event loop for reading, while the write socket is passed to a thread for polling events and writing a byte into the socket.
* However, during the shutdown process, the event loop stops reading the read socket without closing it before the polling thread receives the final event to exit the thread.
* The shutdown process hangs if (1) the event loop stops reading the read socket before the polling thread receives the final event that tells it to exit, and (2) the polling thread is stuck in the `write` syscall.
  * The `write` syscall may get stuck at [sock_alloc_send_pskb](https://elixir.bootlin.com/linux/v5.15/source/net/core/sock.c#L2463) when there is not enough socket buffer space for the write socket. Hence, the polling thread hangs at write and cannot continue to the next iteration to retrieve the final event. As a result, the event loop no longer reads the read socket, so the allocable buffer size for the write socket does not increase any longer. Therefore, the current thread hangs when waiting for the polling thread to `join()`.
* `asyncio` shuts down the default executor (`ThreadPoolExecutor`) when `asyncio.run(...)` finishes. Hence, it hangs because some threads can't be joined.
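
The buffer-exhaustion mechanism can be illustrated without gRPC. The following is a minimal sketch, assuming a POSIX system where `socket.socketpair()` is available. The writer is put into non-blocking mode so that the point at which a blocking `write` would hang shows up as `BlockingIOError` instead. The function name and chunk size are illustrative and do not come from the gRPC code.

```python
import socket


def fill_until_blocked(chunk_size=4096, max_chunks=100_000):
    """Write to one end of a socket pair while the other end is never read.

    Returns (bytes_written, would_block). would_block becomes True once the
    kernel refuses more data, which is where a blocking write() would hang.
    """
    reader, writer = socket.socketpair()
    writer.setblocking(False)  # surface "would block" as an exception
    payload = b"1" * chunk_size
    written = 0
    would_block = False
    try:
        for _ in range(max_chunks):
            try:
                written += writer.send(payload)
            except BlockingIOError:
                # No buffer space left and nobody is reading the other end.
                would_block = True
                break
    finally:
        reader.close()
        writer.close()
    return written, would_block
```

With a blocking writer, the `send` that raises here would instead wait indefinitely, which matches the polling thread never reaching its next iteration.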

# Reproduction

* Step 0: Reduce the socket buffer size to increase the probability of reproducing the issue.
   ```sh
   sysctl -w net.core.rmem_default=8192
   sysctl -w net.core.rmem_default=8192
   ```
* Step 1: Manually update `unistd.write(fd, b'1', 1)` to `unistd.write(fd, b'1' * 4096, 4096)`. The goal is to make writes (4096 bytes per write) outpace reads (1 byte per read), so that the write buffer becomes nearly full.
https://github.com/grpc/grpc/blob/8e67cb088d3709ae74c1ff31d1655bea6c2b86c0/src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi#L31

* Step 2: Create an `aio.insecure_channel` and use it to send 100 requests with at most 10 in-flight requests. After all requests finish, the shutdown process will be triggered, and it's highly likely to hang if you follow Steps 0 and 1 correctly. In my case, my reproduction script reproduces the issue 10 out of 10 times.

* Step 3: If it hangs, check the following information:
  * `ss -xpnm state connected | grep $PID` => You will find two sockets that belong to the same socket pair; one has non-zero bytes in the read buffer while the other has non-zero bytes in the write buffer. In addition, the write buffer should be close to `net.core.rmem_default`.
  * Check the stack of the `_poller_thread` by running `cat /proc/$PID/task/$TID/stack`. The thread is stuck at `sock_alloc_send_pskb` because there is not enough buffer space to finish the `write` syscall.
  * Use GDB to find the `_poller_thread` and make sure it's stuck at `write()`, then print its `$rdi` to confirm that the FD is the one with a non-zero write buffer in the socket.
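
Step 2 can be driven by a small script along these lines. This is a sketch, not the author's reproduction script: `fake_request` is a hypothetical stand-in for a real call made through a stub on an `aio.insecure_channel`, and the semaphore is just one common way to cap the number of in-flight requests at 10.

```python
import asyncio


async def run_requests(send_request, total=100, max_in_flight=10):
    """Issue `total` requests with at most `max_in_flight` running at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def one(i):
        async with limit:
            return await send_request(i)

    # gather() preserves the order of the inputs in its result list.
    return await asyncio.gather(*(one(i) for i in range(total)))


async def fake_request(i):
    # Hypothetical placeholder; replace with a stub call on the aio channel.
    await asyncio.sleep(0)
    return i


if __name__ == "__main__":
    # With a real channel, the hang described above would appear during the
    # shutdown that happens at the end of asyncio.run().
    results = asyncio.run(run_requests(fake_request))
    print(len(results))
```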

# Test

Follow Steps 0, 1, and 2 in the 'Reproduction' section with this PR. It doesn't hang in 10 out of 10 cases.

Closes #40989

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/40989 from kevin85421:asyncio-hang ff74508a2c29e7c71dfe88365d1178f901d69787
PiperOrigin-RevId: 846425459

8 days ago[CI][Python] Upgrade deps assuming python 3.9+ (#40323)
Sergii Tkachenko [Thu, 18 Dec 2025 22:48:35 +0000 (14:48 -0800)]
[CI][Python] Upgrade deps assuming python 3.9+ (#40323)

- Regen requirements.bazel.lock with Python 3.9
- bump isort to 6.0.1 (except in pylint, which needs to be updated separately)
- fix python version specifiers for black, isort and pylint, typeguard
- fix default ignore patterns for isort and pylint
- consistent debug info: python version, pip list
- consistent virtualenv naming: `.venv-ci-*`
- bazel: bump typeguard to 4.4.2
- bazel: bumped gevent to `25.9.1`, greenlet to `3.2.4` to support Python 3.13, closes #40685
- bazel: bump pyyaml for python 3.14 support
- bazel: take care of temporary pins to support 3.8-based CIs

Bazel RBE CIs upgraded in the following changelists, and currently run Python 3.10:
- cl/845778848
- cl/845816768

Relevant testing was done in #41239.

Closes #40323

PiperOrigin-RevId: 846423001

9 days ago[Fix][CI] Fix master Bazel RBE jobs running on vanilla Ubuntu 22 (#41251)
Sergii Tkachenko [Thu, 18 Dec 2025 02:16:21 +0000 (18:16 -0800)]
[Fix][CI] Fix master Bazel RBE jobs running on vanilla Ubuntu 22 (#41251)

Fixes

```pytb
+ python3 ./tools/run_tests/python_utils/upload_rbe_results.py --invocation_id=c9453d05-8c0a-43bc-abb8-1b5d34a163b8
Traceback (most recent call last):
  File "/tmpfs/altsrc/github/grpc/./tools/run_tests/python_utils/upload_rbe_results.py", line 31, in <module>
    import big_query_utils
  File "/tmpfs/altsrc/github/grpc/tools/gcp/utils/big_query_utils.py", line 21, in <module>
    from apiclient import discovery
ModuleNotFoundError: No module named 'apiclient'
```

Closes #41251

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41251 from sergiitk:fix/ci/rbe-vanilla-ubuntu22 4e7095021b2065fde224659c41d70b51db2021d6
PiperOrigin-RevId: 845995209

11 days ago[Security - Test] Fix OpenSSL 1.0.2 tests that incorrectly assume TLS 1.3 is negotiat...
Gregory Cooke [Tue, 16 Dec 2025 20:49:53 +0000 (12:49 -0800)]
[Security - Test] Fix OpenSSL 1.0.2 tests that incorrectly assume TLS 1.3 is negotiated. (#41241)

OpenSSL 1.0.2 doesn't support TLS 1.3, so the assumption in this test was wrong; it falls back to TLS 1.2 behavior.

Closes #41241

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41241 from gtcooke94:portability_fix cee64677d3d016b49ae978ca0ed0691f8a7dfaf9
PiperOrigin-RevId: 845396113

11 days ago[client-fuzzer] Fix crash (#41238)
Craig Tiller [Tue, 16 Dec 2025 19:56:47 +0000 (11:56 -0800)]
[client-fuzzer] Fix crash (#41238)

Closes #41238

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41238 from ctiller:crashypoo2 aed422fff37c91fd06edf4a36e2336f50f01c0b2
PiperOrigin-RevId: 845376005

12 days ago[PH2][Metadata] Fixing bug in HeaderAssembler
Tanvi Jagtap [Mon, 15 Dec 2025 11:02:08 +0000 (03:02 -0800)]
[PH2][Metadata] Fixing bug in HeaderAssembler
We need to finish parsing the entire slice buffer in case of Stream Error.

PiperOrigin-RevId: 844685426

12 days ago[PH2][E2E][CallV3]
Akshit Patel [Mon, 15 Dec 2025 09:50:54 +0000 (01:50 -0800)]
[PH2][E2E][CallV3]

1. Set `send_initial_metadata_op` flags in the `initial_metadata` being sent.
2. Fix the `is_client` parameter passed from `ClientCall` to `Call`.

PiperOrigin-RevId: 844662056

12 days agoAdding layering_check and parse_headers in test files (#41226)
Rishesh Agarwal [Mon, 15 Dec 2025 07:01:40 +0000 (23:01 -0800)]
Adding layering_check and parse_headers in test files (#41226)

Closes #41226

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41226 from rishesh007:layering_test 02e735767f22e934850d5e150c5e3d577bdfbecb
PiperOrigin-RevId: 844606259

12 days ago[PH2][FlowControl][Bug] Adding missing flow control plumbing
Tanvi Jagtap [Mon, 15 Dec 2025 06:09:41 +0000 (22:09 -0800)]
[PH2][FlowControl][Bug] Adding missing flow control plumbing
This is a hack. The actual fix needs some work for Call V3 stack which is scheduled for later.

PiperOrigin-RevId: 844589014

12 days ago[PH2][E2E] Enable the following tests
Akshit Patel [Mon, 15 Dec 2025 04:53:34 +0000 (20:53 -0800)]
[PH2][E2E] Enable the following tests
1. RetryTests
2. RetryHttp2Tests.Connectivty
3. RetryHttp2Tests.MaxConnectionIdle

PiperOrigin-RevId: 844564435

2 weeks ago[Fix][CI] grpc_bazel_rbe_nonbazel job: align kokoro and bazel timeouts (#41231)
Sergii Tkachenko [Sat, 13 Dec 2025 08:50:58 +0000 (00:50 -0800)]
[Fix][CI] grpc_bazel_rbe_nonbazel job: align kokoro and bazel timeouts (#41231)

Target `//tools/bazelify_tests/test:cpp_distribtest_cmake_aarch64_cross_linux` seems to go over Bazel's `--test_timeout` limit from time to time.

Bazel `--test_timeout` flag was initially introduced in #38123 and set to 30 minutes below Kokoro's job `timeout_mins`. Since then, we've increased Kokoro's timeout several times without making corresponding changes to bazel's `test_timeout`.

This PR updates Bazel's test timeout to align with Kokoro's job timeout, and adds a reminder to keep the two in sync. In addition, I've broken `BAZEL_FLAGS` into multiple lines for better readability; the proto text format spec [allows it](https://protobuf.dev/reference/protobuf/textformat-spec/#string).

Closes #41231

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41231 from sergiitk:fix/ci/rbe-nonbazel-bazel-test-timeout 22970a18ce2768693292f566f31dd765916fbca0
PiperOrigin-RevId: 843999341

2 weeks ago[PH2][Trivial] Disabling flaky PH2 tests.
Tanvi Jagtap [Sat, 13 Dec 2025 03:28:49 +0000 (19:28 -0800)]
[PH2][Trivial] Disabling flaky PH2 tests.
Will enable them after bug is triaged and fixed.

PiperOrigin-RevId: 843919616

2 weeks agoImplement StreamingFromClient and StreamingFromServer for callback API in gRPC benchm...
Alisha Nanda [Fri, 12 Dec 2025 17:41:13 +0000 (09:41 -0800)]
Implement StreamingFromClient and StreamingFromServer for callback API in gRPC benchmarks.

PiperOrigin-RevId: 843726088

2 weeks ago[PH2][Metadata]
Tanvi Jagtap [Fri, 12 Dec 2025 16:48:49 +0000 (08:48 -0800)]
[PH2][Metadata]
We were treating all HPack errors as Connection Errors. Some of them are actually stream errors.

PiperOrigin-RevId: 843708403

2 weeks ago[Tcp Server] Fix missing ExecCtx during server shutdown
Vignesh Babu [Fri, 12 Dec 2025 15:35:00 +0000 (07:35 -0800)]
[Tcp Server] Fix missing ExecCtx during server shutdown

PiperOrigin-RevId: 843683281

2 weeks ago[PH2][BUILD] Build change (#41230)
Tanvi Jagtap - Google LLC [Fri, 12 Dec 2025 11:08:19 +0000 (03:08 -0800)]
[PH2][BUILD] Build change (#41230)

[PH2][BUILD] Build change

Closes #41230

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41230 from tanvi-jagtap:build_change 82e290f3c90a605eeec0282461a2bb946b8cf394
PiperOrigin-RevId: 843607980

2 weeks ago[PH2][StreamQ] Return `became_writable` from `ReceivedFlowControlWindowUpdate`
Akshit Patel [Fri, 12 Dec 2025 06:35:27 +0000 (22:35 -0800)]
[PH2][StreamQ] Return `became_writable` from `ReceivedFlowControlWindowUpdate`

PiperOrigin-RevId: 843526509

2 weeks agoAdding layering_check and parse_headers in each bazel src/proto build file
Rishesh Agarwal [Fri, 12 Dec 2025 06:17:21 +0000 (22:17 -0800)]
Adding layering_check and parse_headers in each bazel src/proto build file

PiperOrigin-RevId: 843521259

2 weeks agoEnable layering_check and parse_headers in //src/compiler.
Rishesh Agarwal [Fri, 12 Dec 2025 04:40:32 +0000 (20:40 -0800)]
Enable layering_check and parse_headers in //src/compiler.

PiperOrigin-RevId: 843490724

2 weeks ago[Test] Don't need any versioning block for SPIFFE tests (#41218)
Gregory Cooke [Thu, 11 Dec 2025 18:25:03 +0000 (10:25 -0800)]
[Test] Don't need any versioning block for SPIFFE tests (#41218)

This should further fix the flakiness in the portability tests.
Specifically, we shouldn't have left this compiler directive in the code in the previous PR https://github.com/grpc/grpc/pull/41205

Closes #41218

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41218 from gtcooke94:update_spiffe_ossl_102 73159122c0693fd0abe716b3c0b64b51e129958f
PiperOrigin-RevId: 843278192

2 weeks ago[PH2][Settings][BinaryMetadata] Plumbing BinaryMetadata setting
Tanvi Jagtap [Thu, 11 Dec 2025 10:24:03 +0000 (02:24 -0800)]
[PH2][Settings][BinaryMetadata] Plumbing BinaryMetadata setting

PiperOrigin-RevId: 843125464

2 weeks ago[PH2][Settings][BinaryMetadata]
Tanvi Jagtap [Thu, 11 Dec 2025 07:15:20 +0000 (23:15 -0800)]
[PH2][Settings][BinaryMetadata]
Honouring Peer BinaryMetadata setting
Pending : Plumbing our BinaryMetadata setting

PiperOrigin-RevId: 843066196

2 weeks ago[PH2][Flake] Fix settings flake, add LOGs
Tanvi Jagtap [Thu, 11 Dec 2025 06:46:36 +0000 (22:46 -0800)]
[PH2][Flake] Fix settings flake, add LOGs
1. Fix settings flake by allowing more buffer for promise scheduling delays.
2. Increase timeouts to reasonable values. If I keep the time buffer at 0.5, it fails once in 20000 runs. Reducing it to 0.4 makes it pass all 100000 runs, which is good enough.
3. Add LOGs to help debug another flow-control-related bug.

PiperOrigin-RevId: 843057963

2 weeks ago[PH2][E2E][Flake] Fix a case where transport is being cleaned up without `ExecCtx`
Akshit Patel [Thu, 11 Dec 2025 05:31:17 +0000 (21:31 -0800)]
[PH2][E2E][Flake] Fix a case where transport is being cleaned up without `ExecCtx`

PiperOrigin-RevId: 843036060

2 weeks ago[channelz] Add missing SourceDestructing calls (#41197)
Craig Tiller [Wed, 10 Dec 2025 22:10:04 +0000 (14:10 -0800)]
[channelz] Add missing SourceDestructing calls (#41197)

These were missed with the initial implementation of call tracing in channelz, but luckily our fuzzers found them. Add the calls, and a regression test.

Closes #41197

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41197 from ctiller:clifuzz 9f6bc6dfaec21b60accd7f1ec4a83b5e648ecae9
PiperOrigin-RevId: 842874369

2 weeks ago[PH2][E2E] Fix a tsan flake
Akshit Patel [Wed, 10 Dec 2025 08:42:04 +0000 (00:42 -0800)]
[PH2][E2E] Fix a tsan flake

PiperOrigin-RevId: 842588919

2 weeks agoAdd a hook to collect TCP Telemetry via the Instruments API
Aananth V [Wed, 10 Dec 2025 05:29:56 +0000 (21:29 -0800)]
Add a hook to collect TCP Telemetry via the Instruments API

PiperOrigin-RevId: 842528207

2 weeks ago[Testing] Fix spiffe portability (#41205)
Gregory Cooke [Wed, 10 Dec 2025 05:23:02 +0000 (21:23 -0800)]
[Testing] Fix spiffe portability (#41205)

Fix a few issues when building with different OpenSSL versions:

OpenSSL 1.0.2 - Some copied CRL-related test code made assumptions that are not valid for these tests.
OpenSSL 1.1.1 - The regex is too sensitive; only do the regex check for BoringSSL.
OpenSSL 3 - We thought the invalid UTF8-SAN behavior should cause handshake failures for OpenSSL 3 here and included different behavior, but that is still what is breaking. Let's revert that change.

Closes #41205

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41205 from gtcooke94:fix_spiffe_portability 8818df50053944444c1093bdf500944b690422d3
PiperOrigin-RevId: 842526173

2 weeks ago[Fix][CI] Skip tests build in gcc-8 portability tests (#41204)
Sergii Tkachenko [Tue, 9 Dec 2025 18:34:07 +0000 (10:34 -0800)]
[Fix][CI] Skip tests build in gcc-8 portability tests (#41204)

Fixes "Bazel RBE Non-Bazel Tests" job timeouts. This affects:

- [`grpc/core/master/linux/bazel_rbe/grpc_bazel_rbe_nonbazel`](https://btx.cloud.google.com/invocations;p=830293263384?q=JOB_NAME:grpc%2Fcore%2Fmaster%2Flinux%2Fbazel_rbe%2Fgrpc_bazel_rbe_nonbazel)
- [`grpc/core/pull_request/linux/bazel_rbe/grpc_bazel_rbe_nonbazel`](https://btx.cloud.google.com/invocations;p=830293263384?q=JOB_NAME:grpc%2Fcore%2Fpull_request%2Flinux%2Fbazel_rbe%2Fgrpc_bazel_rbe_nonbazel)

The issue is with the
`//tools/bazelify_tests/test:runtests_cpp_linux_dbg_gcc_8_build_only`
target, which is a part of the portability suite
(`//tools/bazelify_tests/test:portability_tests_linux`). With gcc-8,
building `buildtests_cxx` make target either times out, or fails with
`collect2: fatal error: ld terminated with signal 9`.

I've investigated this as an OOM issue (a common cause of `collect2:
fatal error: ld terminated`), but increasing memory limits does not
help. I've updated RBE stack from `n1-standard-16` (60 GB RAM) to
`e2-standard-32` (128 GB RAM) with no effect. Increasing various job
timeouts (kokoro, bazel, target, etc) didn't help either. See PR #41028
for more details and other attempts at root-causing.

The most important part of portability tests is to verify that gRPC can
be built with all supported compilers. Since we are having a problem
with building the tests with gcc-8, we've decided to stop covering the
tests for that compiler.

Specifically, this PR changes `runtests_c*_linux_dbg_gcc_8_build_only`
bazel target to skip building test make targets (via
`--cmake_configure_extra_args=-DgRPC_BUILD_TESTS=OFF`), and only build
`grpc++` make target. See `build_cxx.sh`:
https://github.com/grpc/grpc/blob/cb2db8fc21b31ac322d463dff5b7eff9fbbab97d/tools/run_tests/helper_scripts/build_cxx.sh#L49-L55

Notes and observations:
- Only gcc-8 and only the cpp version is affected:
  - Portability tests for other gcc versions have no problems building
    `buildtests_cxx` of their corresponding
    `runtests_c*_linux_dbg_gcc_*_build_only`.
  - The C version of the gcc-8 portability test
    (`runtests_c_linux_dbg_gcc_8_build_only`) has no issues building tests
    ([sample run with full target log](https://btx.cloud.google.com/invocations/0b3d41e7-3cf2-4ff8-b6d5-2bc0d52179cd/targets/%2F%2Ftools%2Fbazelify_tests%2Ftest:runtests_c_linux_dbg_gcc_8_build_only;config=815e4ca9071c7e1d8ca72b9c87c1347399a51eb1246eb9c49dd54d9a24ef5cba/tests)).
  - However, unfortunately, this change skips the test targets for
    `runtests_c_linux_dbg_gcc_8_build_only` too.
- We already had the logic to skip tests for gcc-7, but for a different
  reason: #37257

2 weeks ago[LB] remove SubchannelCallTrackerInterface::Start() method (#41099)
Mark D. Roth [Tue, 9 Dec 2025 16:59:10 +0000 (08:59 -0800)]
[LB] remove SubchannelCallTrackerInterface::Start() method (#41099)

This is needed for gRFC A105 (https://github.com/grpc/proposal/pull/516).  Specifically, see the "Interaction with xDS Circuit Breaking" section.

It's possible for an LB pick to be happening at the same time as the subchannel sees its underlying connection fail.  In this case, the picker can return a subchannel, but when the channel tries to start a call on the subchannel, the call creation fails, because there is no underlying connection.  In that case, the channel will queue the pick, on the assumption that the LB policy will soon notice that the subchannel has been disconnected and return a new picker, at which point the queued pick will be re-attempted with that new picker.

When the picker returns a complete pick, it can optionally return a `SubchannelCallTracker` object that allows it to see when the subchannel call starts and ends.  In the current API, when the channel successfully creates a call on the subchannel, it will immediately call `Start()`, and then when the subchannel call later ends, it will call `Finish()`.  However, when the race condition described above occurs, the `SubchannelCallTracker` object will be destroyed without `Start()` or `Finish()` ever having been called.  This API allows us to handle call counter incrementing and decrementing for things like xDS circuit breaking: we check the counter in the picker to see that it's currently below the limit, we increment the counter in `Start()`, and decrement it in `Finish()`.  If the subchannel call never starts, then the counter never gets incremented.

With the introduction of connection scaling functionality in the subchannel, this approach will no longer work, because the call may be queued inside of the subchannel rather than being immediately started on a connection, and the channel can't tell if that is going to happen.  In other words, there's no longer any benefit to the `Start()` method, because it will no longer actually indicate that the call is actually being started on a connection.  As a result, I am removing that method from the API.

For xDS circuit breaking in the xds_cluster_impl LB policy, we are now incrementing the call counter in the picker, and the `SubchannelCallTracker` object will decrement it when either `Finish()` is called or when the object is destroyed, whichever comes first.
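
The counter handling described above can be sketched as follows. This is an illustrative Python model, not the C++ code in the xds_cluster_impl policy: the names `CircuitBreakerCounter`, `CallTracker`, `pick`, and `destroy` are hypothetical, `destroy()` stands in for the C++ destructor, and the model is single-threaded.

```python
class CircuitBreakerCounter:
    """Concurrent-call counter with a limit (single-threaded model)."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.in_flight = 0


class CallTracker:
    """Releases one counter slot exactly once, on finish() or destroy()."""

    def __init__(self, counter):
        self._counter = counter
        self._released = False

    def _release(self):
        # Guard so that finish() followed by destroy() decrements only once.
        if not self._released:
            self._released = True
            self._counter.in_flight -= 1

    def finish(self):
        self._release()

    def destroy(self):
        # Models the destructor: covers calls that never start or finish.
        self._release()


def pick(counter):
    """Check and increment in the picker; return None at the limit."""
    if counter.in_flight >= counter.max_concurrent:
        return None  # limit reached: this model simply rejects the pick
    counter.in_flight += 1
    return CallTracker(counter)
```

Because the increment now happens in the picker, a tracker that is destroyed without `finish()` ever being called still returns its slot, which is the case the race condition above produces.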

For grpclb, the `Start()` method was used in an ugly hack to handle ownership of the client stats object between the grpclb policy and the client load reporting filter.  The LB policy passes a pointer to this object down to the filter via client initial metadata, which contains a raw pointer and does not hold a ref.  To handle ownership, the LB policy returns a `SubchannelCallTracker` that holds a ref to the client stats object, but when `Start()` is called, it releases that ref, on the assumption that the client load reporting filter will subsequently take ownership.  I've replaced this with a slightly cleaner approach whereby the call tracker always holds a ref to the client stats object, thus guaranteeing that the client stats object exists when the client load reporting filter sees it, and the client load reporting filter takes its own ref when it runs.  (An even cleaner approach would be to instead pass the client stats object to the filter via a call attribute, similar to how we pass the xDS cluster name from the ConfigSelector to the LB policy tree, but it doesn't seem worth putting that much effort into grpclb at this point.)

Closes #41099

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41099 from markdroth:xds_circuit_breaking_counter_change eaa06bbdf1688c31c0d1e3b3cabe6a7d015fc075
PiperOrigin-RevId: 842261731

2 weeks ago[PH2][E2E] Enable the following tests:
Akshit Patel [Tue, 9 Dec 2025 06:27:39 +0000 (22:27 -0800)]
[PH2][E2E] Enable the following tests:

1. CoreDeadlineTests.CancelAfterRoundTrip
2. CoreDeadlineSingleHopTests
3. ClientChannelTests.CancelAfterRoundTrip
4. ClientChannelTests.CancelAfterAccept
5. Http2Tests.HighInitialSeqno

PiperOrigin-RevId: 842067519

2 weeks agoadding checks in bazel/build file (#41181)
Rishesh Agarwal [Tue, 9 Dec 2025 04:35:05 +0000 (20:35 -0800)]
adding checks in bazel/build file (#41181)

Closes #41181

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41181 from rishesh007:layering_check_1 d0cd289560ba168cd1f3a9ecd60786d5595389f5
PiperOrigin-RevId: 842034184

2 weeks agoSkip sleuth.so and sleuth tests on Windows
Yousuk Seung [Mon, 8 Dec 2025 23:03:07 +0000 (15:03 -0800)]
Skip sleuth.so and sleuth tests on Windows

PiperOrigin-RevId: 841929574

2 weeks agoSanity fix (#41202)
Craig Tiller [Mon, 8 Dec 2025 22:42:10 +0000 (14:42 -0800)]
Sanity fix (#41202)

Closes #41202

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41202 from ctiller:san aab326f7d099518fe2d08dd953b36bbe71d218dc
PiperOrigin-RevId: 841921474

2 weeks agoAdd promise serialization for call op promises
Craig Tiller [Mon, 8 Dec 2025 18:04:37 +0000 (10:04 -0800)]
Add promise serialization for call op promises

Allows promise display for calls to dig into the promises being executed and display more of what's going on

PiperOrigin-RevId: 841811313

2 weeks ago[chaotic-good] Deadline fixes (#41190)
Craig Tiller [Mon, 8 Dec 2025 17:41:15 +0000 (09:41 -0800)]
[chaotic-good] Deadline fixes (#41190)

* Increase test connection deadline to account for CI slowness
* Add experiment to use handshaker deadline instead of hard coded deadline (since this is likely a bug)

Closes #41190

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41190 from ctiller:flake-cg e7852678fe0982f42f386ee1ffb421d334721f5b
PiperOrigin-RevId: 841802046

2 weeks ago[ValidationErrors] de-dup error messages (#41198)
Mark D. Roth [Mon, 8 Dec 2025 17:15:16 +0000 (09:15 -0800)]
[ValidationErrors] de-dup error messages (#41198)

Also mark the test as non-polling, so we don't run it multiple times.

Closes #41198

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41198 from markdroth:validation_errors_no_dups 2ebe0f9846e9199a9ec1a3bfceeab58b7d3b65f9
PiperOrigin-RevId: 841792148

2 weeks ago[PH2][E2E] Enable the following E2E tests
Akshit Patel [Mon, 8 Dec 2025 17:04:58 +0000 (09:04 -0800)]
[PH2][E2E] Enable the following E2E tests

PiperOrigin-RevId: 841788638

2 weeks ago[PH2][Settings] Enforcing first SETTINGS frame
Tanvi Jagtap [Mon, 8 Dec 2025 15:43:56 +0000 (07:43 -0800)]
[PH2][Settings] Enforcing first SETTINGS frame

PiperOrigin-RevId: 841760285

2 weeks ago[PH2][E2E] Enable the following tests:
Akshit Patel [Mon, 8 Dec 2025 14:08:23 +0000 (06:08 -0800)]
[PH2][E2E] Enable the following tests:

1. Http2SingleHoptests.KeepaliveTimeout

PiperOrigin-RevId: 841731595

2 weeks ago[PH2][E2E] Populate `grpc_message` flags
Akshit Patel [Mon, 8 Dec 2025 13:39:50 +0000 (05:39 -0800)]
[PH2][E2E] Populate `grpc_message` flags

PiperOrigin-RevId: 841723848

3 weeks ago[pick_first] go CONNECTING when selected subchannel goes CONNECTING or TF (#41029)
Mark D. Roth [Fri, 5 Dec 2025 20:24:49 +0000 (12:24 -0800)]
[pick_first] go CONNECTING when selected subchannel goes CONNECTING or TF (#41029)

Needed as part of gRFC A105 (https://github.com/grpc/proposal/pull/516).

Currently, when the selected subchannel leaves READY state, the only possible state it can move to is IDLE, and pick_first handles that by itself going IDLE.  However, as part of A105, we are going to introduce the possibility of the subchannel going from READY to either CONNECTING or TRANSIENT_FAILURE, and in those two cases we want pick_first to go back into CONNECTING and start a new happy eyeballs pass.  This PR introduces an experiment that adds that behavior.

While I was at it, I noticed an existing misfeature.  There are two cases where pick_first will go IDLE, which is done by calling [`GoIdle()`](https://github.com/grpc/grpc/blob/24b25a0baa72a658cc37d1db28f77513a9670ea2/src/core/load_balancing/pick_first/pick_first.cc#L610):
1. The case mentioned above, where the selected subchannel goes from READY to IDLE (`GoIdle()` is called from [`SubchannelState::OnConnectivityStateChange()`](https://github.com/grpc/grpc/blob/24b25a0baa72a658cc37d1db28f77513a9670ea2/src/core/load_balancing/pick_first/pick_first.cc#L784)).
2. The case where pick_first already has a selected subchannel and receives a new address list, but none of the subchannels in the new list report READY.  In this case, pick_first knows that the currently selected subchannel is for an address that is not present in the new address list, so it unrefs the selected subchannel and goes IDLE (`GoIdle()` is called from [`SubchannelData::OnConnectivityStateChange()`](https://github.com/grpc/grpc/blob/24b25a0baa72a658cc37d1db28f77513a9670ea2/src/core/load_balancing/pick_first/pick_first.cc#L859)).

The code in `GoIdle()` currently requests a re-resolution, which is the right behavior for case 1.  However, it doesn't really make sense to do this for case 2, since we have just received a fresh resolver update in that case.  Therefore, as part of this experiment, I am moving the code that triggers the re-resolution out of `GoIdle()` and directly into `SubchannelState::OnConnectivityStateChange()`, where it will occur only for case 1.
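
The resulting behavior can be summarized in a small decision function. This is a sketch of the description above, not the code in `pick_first.cc`; the function names, the string state names, and the returned action labels are assumptions made for illustration.

```python
def on_selected_subchannel_left_ready(new_state):
    """Map the selected subchannel's new state to pick_first's reaction.

    Returns (policy_state, actions) with the experiment enabled.
    """
    if new_state == "IDLE":
        # Case 1: go IDLE and ask the resolver for fresh addresses.
        return "IDLE", ["request_reresolution"]
    if new_state in ("CONNECTING", "TRANSIENT_FAILURE"):
        # New with A105: go back to CONNECTING and start a new happy
        # eyeballs pass. Whether re-resolution is also requested in this
        # branch is not stated in the description, so it is omitted here.
        return "CONNECTING", ["start_happy_eyeballs_pass"]
    raise ValueError(f"unexpected state after READY: {new_state}")


def on_new_address_list_without_ready_subchannel():
    """Case 2: the resolver just sent an update, so none is requested."""
    return "IDLE", []
```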

Closes #41029

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41029 from markdroth:pick_first_ready_to_connecting fdb6ef68e3a73e0035520149b72a1d21775354c3
PiperOrigin-RevId: 840830927

3 weeks ago[PH2][Refactor]
Tanvi Jagtap [Fri, 5 Dec 2025 18:04:10 +0000 (10:04 -0800)]
[PH2][Refactor]
The Pausing and Restarting of the ReadLoop happens in a separate class.
We could generalize and re-use this mechanism elsewhere, but that is a task for later.

PiperOrigin-RevId: 840773537

3 weeks ago[PH2] Build changes (#41194)
ac-patel [Fri, 5 Dec 2025 10:53:05 +0000 (02:53 -0800)]
[PH2] Build changes (#41194)

Closes #41194

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41194 from ac-patel:test10 b5c81d5c0a7c3497c898a86717ca197613d39459
PiperOrigin-RevId: 840644590

3 weeks ago[PH2][ChannelArg] Adding support for GRPC_ARG_HTTP2_INITIAL_SEQUENCE_NUMBER. This...
Akshit Patel [Fri, 5 Dec 2025 09:13:23 +0000 (01:13 -0800)]
[PH2][ChannelArg] Adding support for GRPC_ARG_HTTP2_INITIAL_SEQUENCE_NUMBER. This CL also modifies the error message returned when the last stream is closed and the transport cannot create any new streams.

PiperOrigin-RevId: 840601223

3 weeks ago[PH2][CallV3] Fix off by one error in client call
Akshit Patel [Fri, 5 Dec 2025 07:25:19 +0000 (23:25 -0800)]
[PH2][CallV3] Fix off by one error in client call

The current `while` condition always skips the first pending batch, which can cause some ops to never be polled.

PiperOrigin-RevId: 840571167

3 weeks agoAdding layering_check and parse_headers in each bazel src/python build file
Rishesh Agarwal [Fri, 5 Dec 2025 04:04:57 +0000 (20:04 -0800)]
Adding layering_check and parse_headers in each bazel src/python build file

PiperOrigin-RevId: 840512315

3 weeks agoExtend pipelined_read_secure_endpoint experiment.
Alisha Nanda [Thu, 4 Dec 2025 19:54:51 +0000 (11:54 -0800)]
Extend pipelined_read_secure_endpoint experiment.

PiperOrigin-RevId: 840339485

3 weeks agoAdd a mark to collect actual start timestamp and fix flow end JSON formatting.
Alisha Nanda [Thu, 4 Dec 2025 19:53:52 +0000 (11:53 -0800)]
Add a mark to collect actual start timestamp and fix flow end JSON formatting.

PiperOrigin-RevId: 840339101

3 weeks ago[PH2][Settings][Refactor]
Tanvi Jagtap [Thu, 4 Dec 2025 15:26:54 +0000 (07:26 -0800)]
[PH2][Settings][Refactor]
1. Moved on_receive_settings callback logic into SettingsPromiseManager.
2. Stall reads until the first peer settings are processed.
3. Encapsulated security frame settings logic within SettingsPromiseManager.

PiperOrigin-RevId: 840235504

3 weeks ago[PH2][Bug] Fix call to `BeginCloseStream` from `HandleError`.
Akshit Patel [Thu, 4 Dec 2025 08:40:59 +0000 (00:40 -0800)]
[PH2][Bug] Fix call to `BeginCloseStream` from `HandleError`.

`HandleError` is called from a transport promise when a stream or connection error is encountered. Hence, when a stream's trailing metadata is passed to the call stack, it MUST be passed with a cancelled status.

PiperOrigin-RevId: 840113911

3 weeks ago[PH2][E2E] Enable the following E2E tests:
Akshit Patel [Thu, 4 Dec 2025 06:22:39 +0000 (22:22 -0800)]
[PH2][E2E] Enable the following E2E tests:
1. CoreLargeSendTests.Payload

PiperOrigin-RevId: 840073255

3 weeks agoRemove the max_age_filter_float_to_top experiment since it has been rolled out for...
Vignesh Babu [Wed, 3 Dec 2025 19:45:26 +0000 (11:45 -0800)]
Remove the max_age_filter_float_to_top experiment since it has been rolled out for a while

PiperOrigin-RevId: 839848318

3 weeks ago[PH2] Misc items
Tanvi Jagtap [Wed, 3 Dec 2025 15:21:53 +0000 (07:21 -0800)]
[PH2] Misc items
1. Move `SourceConstructed` to after the party is instantiated.
2. Update TODOs and comments.
3. Add debug info where Mark (@roth) had left a TODO.
4. Rename GetActiveStreamCount to GetActiveStreamCountLocked

PiperOrigin-RevId: 839746883

3 weeks ago[PH2][E2E] Remove the max limit of a single gRPC message accepted by the transport.
Akshit Patel [Wed, 3 Dec 2025 10:23:57 +0000 (02:23 -0800)]
[PH2][E2E] Remove the max limit of a single gRPC message accepted by the transport.

PiperOrigin-RevId: 839662093

3 weeks ago[PH2][ChannelArgs] Refactor reading channel args
Akshit Patel [Wed, 3 Dec 2025 08:39:14 +0000 (00:39 -0800)]
[PH2][ChannelArgs] Refactor reading channel args

PiperOrigin-RevId: 839628262

3 weeks agoChaotic Good: Verify Peer in Chaotic Good Handshake during Data Endpoint creation
Aananth V [Wed, 3 Dec 2025 05:39:00 +0000 (21:39 -0800)]
Chaotic Good: Verify Peer in Chaotic Good Handshake during Data Endpoint creation

Since Chaotic Good enables using a group of TCP connections as a composite channel, we need to ensure that all TCP connections are established with the same peer. In this change, we store a Ref to the `grpc_auth_context` of the Connection that created the Control Endpoint and compare it to the `grpc_auth_context` of the Connection requesting each Data Endpoint using the [Injectable Peer Comparison API](https://github.com/grpc/grpc/pull/39610). If no peer comparison API is installed, the identity verification will not be performed.

The updated Chaotic Good handshake is as follows (changed steps are in **bold**):

First the control channel is established:
   1. ALTS/TLS/LOAS/PSP: Each new TCP connection goes through the “normal” security handshakes for gRPC, checking certificates, establishing identity
   2. A Chaotic Good Settings frame is sent from the client, with data_channel == 0
   3. The server processes the received Settings frame, creates N pending data connections, and responds with a Settings frame with a randomly generated set of connection ids: 1 per requested data connection. **The created PendingDataConnections hold a reference to the Control Channel’s grpc_auth_context.**
   4. The client processes the received Settings frame and creates one data connection per received connection_id.

For each data channel requested:
   1. The TCP connection proceeds as usual (same as 1 above)
   2. The Settings frame sent will relay the connection_id for this data channel, with data_channel == 1
   3. The server responds with a Settings frame with data_channel == 1.
   4. **Finally, the server looks up the association for this connection_id and verifies the equivalence of the current connection’s grpc_auth_context and the stored grpc_auth_context of the control channel.**
      - **If lookup is successful and peer is equivalent, we bind the connection with that chaotic good channel.**
      - **Else, we abort the connection.**
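The server-side bookkeeping in the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Chaotic Good implementation: `AuthContext`, `PeerComparator`, and `PendingDataConnections::Register`/`VerifyAndTake` are hypothetical names, and the choice to consume a connection id on its first lookup is an assumption of the sketch.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for grpc_auth_context; only a peer identity is kept.
struct AuthContext {
  std::string peer_identity;
};

// Hypothetical injectable comparison hook. An empty function means that no
// peer comparison API is installed.
using PeerComparator =
    std::function<bool(const AuthContext&, const AuthContext&)>;

class PendingDataConnections {
 public:
  explicit PendingDataConnections(PeerComparator cmp) : cmp_(std::move(cmp)) {}

  // Control channel, step 3: remember which control-channel peer requested
  // this connection id.
  void Register(const std::string& connection_id,
                std::shared_ptr<const AuthContext> control_ctx) {
    pending_[connection_id] = std::move(control_ctx);
  }

  // Data channel, step 4: returns true if the connection may be bound to the
  // channel and false if it must be aborted.
  // Assumption of this sketch: an id is consumed by its first lookup.
  bool VerifyAndTake(const std::string& connection_id,
                     const AuthContext& data_ctx) {
    auto it = pending_.find(connection_id);
    if (it == pending_.end()) return false;  // unknown id: abort
    std::shared_ptr<const AuthContext> control_ctx = std::move(it->second);
    pending_.erase(it);
    if (!cmp_) return true;  // no comparator installed: check is skipped
    return cmp_(*control_ctx, data_ctx);
  }

 private:
  PeerComparator cmp_;
  std::unordered_map<std::string, std::shared_ptr<const AuthContext>> pending_;
};
```

Holding the control channel's context through a `shared_ptr` mirrors the idea of each pending data connection keeping a reference to the control channel's `grpc_auth_context`.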

PiperOrigin-RevId: 839573243

3 weeks ago[PH2][Trivial] Enable Cancel and Deadline suite
Tanvi Jagtap [Wed, 3 Dec 2025 05:11:10 +0000 (21:11 -0800)]
[PH2][Trivial] Enable Cancel and Deadline suite
The flake has been fixed.

PiperOrigin-RevId: 839564751

3 weeks agoTrack allocations in tsi_zero_copy_grpc_protector towards ResourceQuota.
Siddharth Nohria [Wed, 3 Dec 2025 03:00:30 +0000 (19:00 -0800)]
Track allocations in tsi_zero_copy_grpc_protector towards ResourceQuota.

This change introduces a `set_allocator` method to the `tsi_zero_copy_grpc_protector` vtable and API. The ALTS zero-copy frame protector implementation is updated to use a provided allocator callback (`tsi_zero_copy_grpc_protector_allocator_cb`) for allocating protected and unprotected slices, falling back to `GRPC_SLICE_MALLOC` if no custom allocator is set.
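The fallback behaviour described above might look roughly like this. It is a sketch only: the `AllocatorCb` shape, `FrameProtectorSketch`, and its methods are hypothetical, the real `tsi_zero_copy_grpc_protector_allocator_cb` signature may differ, and `std::malloc` merely stands in for `GRPC_SLICE_MALLOC`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>

// Hypothetical callback shape; the real callback type may differ.
using AllocatorCb = std::function<void*(std::size_t)>;

class FrameProtectorSketch {
 public:
  // Mirrors the idea of set_allocator: install a quota-aware allocator.
  void SetAllocator(AllocatorCb cb) { allocator_ = std::move(cb); }

  // Uses the installed allocator if there is one; otherwise falls back to
  // plain malloc (standing in for the GRPC_SLICE_MALLOC fallback).
  void* Allocate(std::size_t bytes) {
    if (allocator_) return allocator_(bytes);
    return std::malloc(bytes);
  }

 private:
  AllocatorCb allocator_;
};
```

A caller that wants allocations charged to a quota would install a callback that records the requested size before allocating; callers that never call `SetAllocator` see the previous behaviour unchanged.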

PiperOrigin-RevId: 839519073

3 weeks ago[PH2][E2E] Fix a race condition in stream_data_queue
Akshit Patel [Wed, 3 Dec 2025 02:54:40 +0000 (18:54 -0800)]
[PH2][E2E] Fix a race condition in stream_data_queue

`stream_id_` is currently accessed in both the Enqueue and Dequeue operations, which results in the race. In the Enqueue flow, `stream_id_` is only used for logging, which is redundant, so that use is being removed.

PiperOrigin-RevId: 839517493

3 weeks agoAutomated Code Change
Chris Kennelly [Tue, 2 Dec 2025 15:53:14 +0000 (07:53 -0800)]
Automated Code Change

PiperOrigin-RevId: 839268035

3 weeks ago[PH2][Settings][Refactor] Step 3.3
Tanvi Jagtap [Tue, 2 Dec 2025 14:49:23 +0000 (06:49 -0800)]
[PH2][Settings][Refactor] Step 3.3

1. Removes unused includes of http2_settings_manager.h
2. Moves settings ACK handling into SettingsPromiseManager from Http2SettingsManager
3. Deletes `MaybeSendAck` related tests from http2_settings_test.cc
4. Moved tests as-is into settings_timeout_manager_test.cc from http2_transport_test.cc

PiperOrigin-RevId: 839247137

3 weeks agoInclude GlobalCollectionScope in StatsPluginGroup::GetCollectionScope.
Aananth V [Tue, 2 Dec 2025 13:32:29 +0000 (05:32 -0800)]
Include GlobalCollectionScope in StatsPluginGroup::GetCollectionScope.

Also adds a requirement that the Collection Scope returned by StatsPlugin::GetCollectionScope is a Root Scope (i.e. has no parents). This is to avoid Diamond structures in the DAG (doesn't fix the problem entirely but is a good failsafe for now).

PiperOrigin-RevId: 839223367

3 weeks ago[PH2][Experiment] Enable `sleep_use_non_owning_waker` (#41165)
ac-patel [Tue, 2 Dec 2025 13:18:21 +0000 (05:18 -0800)]
[PH2][Experiment] Enable `sleep_use_non_owning_waker` (#41165)

 Enable `sleep_use_non_owning_waker`

Closes #41165

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41165 from ac-patel:experiment1 d646a1d0c002fd2c85891adf6b83c0c2a2e554a9
PiperOrigin-RevId: 839219556

3 weeks ago[PH2][Settings][Refactor] Step 3.2 : Consolidating related functions
Tanvi Jagtap [Tue, 2 Dec 2025 11:47:05 +0000 (03:47 -0800)]
[PH2][Settings][Refactor] Step 3.2 : Consolidating related functions

| Merged functions | Final Function |
|---|---|
| `OnSettingsReceived` + `BufferPeerSettings` | `BufferPeerSettings` |
| `AckLastSend` + `OnSettingsAckReceived` | `OnSettingsAckReceived` |
| `ApplyIncomingSettings` + `TakeBufferedPeerSettings` | `ApplyBufferedPeerSettings` |

PiperOrigin-RevId: 839192044

3 weeks ago[PH2][E2E] Enable the following tests:
Akshit Patel [Tue, 2 Dec 2025 09:53:28 +0000 (01:53 -0800)]
[PH2][E2E] Enable the following tests:

1. CoreEnd2End.MaxMessageLength
2. Http2Tests.MaxMessageLength

PiperOrigin-RevId: 839155004

3 weeks ago[PH2][Settings][Refactor] Step 3.1
Tanvi Jagtap [Tue, 2 Dec 2025 08:37:13 +0000 (00:37 -0800)]
[PH2][Settings][Refactor] Step 3.1
This CL refactors HTTP/2 settings ACK handling by moving the did_previous_settings_promise_resolve_ flag from Http2SettingsManager to Http2SettingsPromiseManager. did_previous_settings_promise_resolve_ is now fully managed by Http2SettingsPromiseManager so other classes don't need to check it or set it.

PiperOrigin-RevId: 839129676

3 weeks ago[PH2][Settings][Refactor] Step 2.2 Consolidate settings management
Tanvi Jagtap [Tue, 2 Dec 2025 05:47:28 +0000 (21:47 -0800)]
[PH2][Settings][Refactor] Step 2.2 Consolidate settings management

Step 2.2
Move the Http2SettingsManager object into SettingsPromiseManager; Http2ClientTransport will then use Http2SettingsManager via SettingsPromiseManager.

PiperOrigin-RevId: 839076679

3 weeks ago[PH2][Bug][Stream]
Tanvi Jagtap [Tue, 2 Dec 2025 05:28:07 +0000 (21:28 -0800)]
[PH2][Bug][Stream]
1. Fixes a bug by preventing DATA frame processing on streams that have not yet received initial metadata.
2. Minor refactoring of existing code.
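The guard in item 1 can be pictured with a small sketch. The `StreamState` struct, `FrameResult` enum, and both functions are hypothetical; what the real transport does with a rejected frame is not stated here, so the sketch only reports the rejection.

```cpp
// Hypothetical per-stream state; the real transport tracks far more.
struct StreamState {
  bool initial_metadata_received = false;
};

enum class FrameResult { kProcessed, kRejected };

// Records that the stream's initial metadata has arrived.
inline void OnInitialMetadata(StreamState& stream) {
  stream.initial_metadata_received = true;
}

// DATA is only processed once initial metadata has been seen on the stream.
inline FrameResult OnDataFrame(const StreamState& stream) {
  if (!stream.initial_metadata_received) return FrameResult::kRejected;
  return FrameResult::kProcessed;
}
```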

PiperOrigin-RevId: 839070320

3 weeks agoDefine TCP Metrics Domain
Aananth V [Tue, 2 Dec 2025 04:00:07 +0000 (20:00 -0800)]
Define TCP Metrics Domain

PiperOrigin-RevId: 839039889

3 weeks agoUpdate Sleuth version.
Alisha Nanda [Mon, 1 Dec 2025 18:49:53 +0000 (10:49 -0800)]
Update Sleuth version.

PiperOrigin-RevId: 838844425

3 weeks ago[Cleanup] Remove workaround Apple CFStream bug from e2e tests (#41121)
Pawan Bhardwaj [Mon, 1 Dec 2025 10:02:30 +0000 (02:02 -0800)]
[Cleanup] Remove workaround Apple CFStream bug from e2e tests (#41121)

Closes #41121

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41121 from pawbhard:temp_check 185a3d8cc4f617d6df66d4f1adc738ef9a4b13f6
PiperOrigin-RevId: 838668301

3 weeks ago[PH2][Trivial][TODO]
Tanvi Jagtap [Mon, 1 Dec 2025 07:07:02 +0000 (23:07 -0800)]
[PH2][Trivial][TODO]

PiperOrigin-RevId: 838616732

3 weeks agoAdding layering_check and parse_headers in android bazel build file
Rishesh Agarwal [Mon, 1 Dec 2025 04:53:04 +0000 (20:53 -0800)]
Adding layering_check and parse_headers in android bazel build file

PiperOrigin-RevId: 838579798

4 weeks ago[PH2][Settings][Refactor] Move MaybeGetSettingsAndSettingsAckFrames
Tanvi Jagtap [Sat, 29 Nov 2025 19:11:02 +0000 (11:11 -0800)]
[PH2][Settings][Refactor] Move MaybeGetSettingsAndSettingsAckFrames
Make MaybeGetSettingsAndSettingsAckFrames a data member of class SettingsPromiseManager.

PiperOrigin-RevId: 838146703

4 weeks agoOptionalize linking postmortem library entirely
Craig Tiller [Fri, 28 Nov 2025 17:53:23 +0000 (09:53 -0800)]
Optionalize linking postmortem library entirely

PiperOrigin-RevId: 837871164

4 weeks ago[PH2][Settings][Refactor] Step 4 : Rename
Tanvi Jagtap [Fri, 28 Nov 2025 11:23:28 +0000 (03:23 -0800)]
[PH2][Settings][Refactor] Step 4 : Rename
Step 1 : https://github.com/grpc/grpc/pull/41103
Step 2, 3 : WIP
Step 4 : (This PR)
Rename variables and functions to remove the common confusion between SENT and RECEIVED settings. The current structure and naming make the two hard to differentiate. We really have wasted a LOT of time here.

PiperOrigin-RevId: 837785968

4 weeks ago[build] Test fix (#41146)
Craig Tiller [Fri, 28 Nov 2025 10:04:07 +0000 (02:04 -0800)]
[build] Test fix (#41146)

Closes #41146

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41146 from ctiller:sn ef53a47393f05823090781d9bcaa9185d465e68b
PiperOrigin-RevId: 837767346

4 weeks ago[PH2][Trivial] Disable some tests
Tanvi Jagtap [Fri, 28 Nov 2025 05:16:39 +0000 (21:16 -0800)]
[PH2][Trivial] Disable some tests

PiperOrigin-RevId: 837698567

4 weeks agoAdd call inspection to channelz
Craig Tiller [Thu, 27 Nov 2025 22:25:13 +0000 (14:25 -0800)]
Add call inspection to channelz

Add a new config to enable active call inspection with channelz, disabled by default. Plumb through promise_based_filter, call-v3.

PiperOrigin-RevId: 837614415

4 weeks ago[Python] Disable layering check in grpc_tools:protoc_lib (#41142)
Sreenithi Sridharan [Thu, 27 Nov 2025 13:18:39 +0000 (05:18 -0800)]
[Python] Disable layering check in grpc_tools:protoc_lib (#41142)

Python Bazel tests have been failing since yesterday after layering check was enabled in grpcio_tools build in commit: https://github.com/grpc/grpc/commit/756389e9e75ba93d7316ef9eae2ca83126ad9f94

Temporarily disabling it after discussing IRL with @rishesh007

Closes #41142

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41142 from sreenithi:temp_fix_python_bazel_test 751c420bf3a27066d6cdd912e0e08e9c0acaebb8
PiperOrigin-RevId: 837494537

4 weeks ago[PH2][Trivial] Disabled cancel suite and remove logging
Tanvi Jagtap [Thu, 27 Nov 2025 10:14:06 +0000 (02:14 -0800)]
[PH2][Trivial] Disabled cancel suite and remove logging

PiperOrigin-RevId: 837447558

4 weeks ago[PH2][Settings][Refactor]
Tanvi Jagtap [Thu, 27 Nov 2025 04:54:29 +0000 (20:54 -0800)]
[PH2][Settings][Refactor]

Initial Design:
```
class Http2ClientTransport {
 private:
  PendingIncomingSettings object1;
  SettingsTimeoutManager object2;
  Http2SettingsManager object3;

 public:
  void TypicalTransportFunction() {
    // ... other non-settings work ...
    object1.DetailedWork1();
    object2.DetailedWork2();
    object3.DetailedWork3();
    // ... other non-settings work ...
  }
};
```

New Design

```
class Http2ClientTransport {
  SettingsPromiseManager settings_manager_;

  void TypicalTransportFunction() {
    // ... other non-settings work ...
    settings_manager_.SomeWork();
    // ... other non-settings work ...
  }
};

class SettingsPromiseManager {
 public:
  void SomeWork() {
    DetailedWork1();
    DetailedWork2();
    settings_.DetailedWork3();
  }

 private:
  void DetailedWork1();
  void DetailedWork2();

  Http2SettingsManager settings_;
};
```

Refactor Step 1
1. Merge class `SettingsTimeoutManager` and `PendingIncomingSettings` into a new class named `SettingsPromiseManager`
2. Replace usage of `PendingIncomingSettings` and `SettingsTimeoutManager` with usage of `SettingsPromiseManager`
3. Replace `pending_incoming_settings_` with `transport_settings_`

Future Steps
1. Step 2 : Move the `Http2SettingsManager` object into `SettingsPromiseManager`; `Http2ClientTransport` will then use `Http2SettingsManager` via `SettingsPromiseManager`.
2. Step 3 : Earlier, the `Http2ClientTransport` class handled the interactions between `Http2SettingsManager`, `SettingsTimeoutManager`, and `PendingIncomingSettings` in the transport. Move these into our new `SettingsPromiseManager` class. This will make the transport lean. This PR will need careful review of the business logic. It will also make multiple permutations of settings easy to test and debug.
3. Step 4 : Rename variables and functions to remove the common confusion between SENT and RECEIVED settings. The current structure and naming make the two hard to differentiate. We really have wasted a LOT of time here.
4. Step 5 : Write unit tests for the `SettingsPromiseManager` class, modelling scenarios similar to how the transport will use the settings. Also add missing tests to `Http2SettingsManager` if needed.

PiperOrigin-RevId: 837359318

4 weeks ago[PH2][E2E] E2E . Multiple Changes
Tanvi Jagtap [Thu, 27 Nov 2025 04:26:24 +0000 (20:26 -0800)]
[PH2][E2E] E2E . Multiple Changes
1. Enable logging for 2 flaking HPack tests
2. Writing a new function which will enable logging for PH2 for flaking tests
3. Splitting the CANCEL and DEADLINE test suites so that these can be switched on and off separately.

PiperOrigin-RevId: 837349917

4 weeks ago[Ph2][E2E] Logs to debug a flake
Akshit Patel [Thu, 27 Nov 2025 04:14:26 +0000 (20:14 -0800)]
[Ph2][E2E] Logs to debug a flake

PiperOrigin-RevId: 837347558

4 weeks agoremove default_applicable_licenses from tools/codegen BUILD files.
Rishesh Agarwal [Wed, 26 Nov 2025 12:31:59 +0000 (04:31 -0800)]
remove default_applicable_licenses from tools/codegen BUILD files.

PiperOrigin-RevId: 837063317

4 weeks agoAdding layering_check and parse_headers in each bazel codegen build file
Rishesh Agarwal [Wed, 26 Nov 2025 09:07:12 +0000 (01:07 -0800)]
Adding layering_check and parse_headers in each bazel codegen build file

PiperOrigin-RevId: 837002393

4 weeks ago[PH2][E2E] Add logs to debug a flake
Akshit Patel [Wed, 26 Nov 2025 06:21:31 +0000 (22:21 -0800)]
[PH2][E2E] Add logs to debug a flake

PiperOrigin-RevId: 836947917

4 weeks agoAdding layering_check and parse_headers in each bazel distrib python build file
Rishesh Agarwal [Wed, 26 Nov 2025 05:32:25 +0000 (21:32 -0800)]
Adding layering_check and parse_headers in each bazel distrib python build file

PiperOrigin-RevId: 836934818

4 weeks ago[PH2][Trivial] Enabling cancel test suite
Tanvi Jagtap [Wed, 26 Nov 2025 03:11:46 +0000 (19:11 -0800)]
[PH2][Trivial] Enabling cancel test suite

PiperOrigin-RevId: 836893540

4 weeks ago[PH2][E2E] Fix channelZ AddData race with transport deletion.
Akshit Patel [Wed, 26 Nov 2025 02:42:51 +0000 (18:42 -0800)]
[PH2][E2E] Fix channelZ AddData race with transport deletion.

This CL moves `SourceDestructing` from the destructor to `Orphan`. It is possible for an `AddData` call to try to take a ref on the transport while the transport is being destructed (before `SourceDestructing` is invoked). Calling `SourceDestructing` from `Orphan` ensures that `AddData` is not called after the external transport ref is dropped.
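The ordering this change relies on can be shown with a single-threaded sketch. It does not reproduce the race itself, and `RegistrySketch` and `TransportSketch` are hypothetical stand-ins rather than the channelz or transport classes: the point is only that a source which deregisters in its `Orphan` step is no longer visible to the collector afterwards.

```cpp
#include <unordered_set>

class TransportSketch;

// Hypothetical stand-in for the side that calls AddData on each source.
class RegistrySketch {
 public:
  void Register(TransportSketch* t) { sources_.insert(t); }
  void Unregister(TransportSketch* t) { sources_.erase(t); }
  // Calls AddData on every registered source; returns how many were visited.
  int CollectAll();

 private:
  std::unordered_set<TransportSketch*> sources_;
};

class TransportSketch {
 public:
  explicit TransportSketch(RegistrySketch& registry) : registry_(registry) {
    registry_.Register(this);
  }

  // Analogue of Orphan: stop being a data source before the owner lets go,
  // rather than waiting for the destructor to do it.
  void Orphan() { registry_.Unregister(this); }

  void AddData() { ++add_data_calls_; }
  int add_data_calls() const { return add_data_calls_; }

 private:
  RegistrySketch& registry_;
  int add_data_calls_ = 0;
};

inline int RegistrySketch::CollectAll() {
  int visited = 0;
  for (TransportSketch* t : sources_) {
    t->AddData();
    ++visited;
  }
  return visited;
}
```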

PiperOrigin-RevId: 836886988

4 weeks ago[PH2][Settings] Multiple changes
Tanvi Jagtap [Wed, 26 Nov 2025 02:02:43 +0000 (18:02 -0800)]
[PH2][Settings] Multiple changes
1. Complete the ProcessHttp2SettingsFrame function
2. Applying the incoming settings in the MultiplexerLoop and sending an ACK for incoming settings
3. Managing initial window size settings for acked settings (this was missed in previous PR).
4. Decoupling ApplyIncomingSettings from OnSettingsReceived

PiperOrigin-RevId: 836876453

4 weeks ago[PH2][Bug] Move transport loop spawning out of the constructor
Tanvi Jagtap [Tue, 25 Nov 2025 15:37:06 +0000 (07:37 -0800)]
[PH2][Bug] Move transport loop spawning out of the constructor

Spawning transport loops from the Http2ClientTransport constructor creates a race condition. An initialization error can trigger a shutdown, causing the transport to be destroyed from within its own constructor.

This CL moves the loop-spawning logic to a new public method, SpawnTransportLoops(). The Chttp2Connector now calls this method after the transport is fully constructed. This ensures a clean separation between object construction and the start of asynchronous operations, preventing premature closure and potential bugs.
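The construct-then-start split can be sketched as below. Only `SpawnTransportLoops` is a name taken from the description above; `ClientTransportSketch` and `CreateAndStart` are hypothetical, and the "loops" are reduced to a flag.

```cpp
#include <memory>

// Hypothetical transport; only the construction/start split is modelled.
class ClientTransportSketch {
 public:
  // The constructor only initialises state. It starts no asynchronous work,
  // so nothing can tear the object down while it is still being built.
  ClientTransportSketch() = default;

  // Called by the owner once construction has finished.
  void SpawnTransportLoops() { loops_running_ = true; }

  bool loops_running() const { return loops_running_; }

 private:
  bool loops_running_ = false;
};

// Hypothetical connector-side helper: construct first, then start the loops.
inline std::unique_ptr<ClientTransportSketch> CreateAndStart() {
  auto transport = std::make_unique<ClientTransportSketch>();
  transport->SpawnTransportLoops();
  return transport;
}
```

The trade-off of this two-phase pattern is that callers must remember the second call, which is why the sketch wraps both steps in one helper.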

PiperOrigin-RevId: 836663966

4 weeks agoAdd support to export Instrument -> OpenTelemetry UpDownCounters.
Aananth V [Tue, 25 Nov 2025 13:47:36 +0000 (05:47 -0800)]
Add support to export Instrument -> OpenTelemetry UpDownCounters.

PiperOrigin-RevId: 836631446