git.feebdaed.xyz Git - 0xmirror/grpc.git/log

[Testing] Fix spiffe portability (#41205)

Fix a few issues when building with different OpenSSL versions

OpenSSL 1.0.2 - Some CRL-related test code had been copied over, and its assumptions were not valid for these tests.
OpenSSL 1.1.1 - The regex is too sensitive; only do the regex check for BoringSSL.
OpenSSL 3 - We thought the invalid UTF8-SAN behavior should cause handshake failures for OpenSSL 3 here and included different behavior, but that is still what is breaking. Let's revert that change.

Closes #41205

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41205 from gtcooke94:fix_spiffe_portability 8818df50053944444c1093bdf500944b690422d3
PiperOrigin-RevId: 842526173

[Fix][CI] Skip tests build in gcc-8 portability tests (#41204)

Fixes "Bazel RBE Non-Bazel Tests" job timeouts. This affects:

- [`grpc/core/master/linux/bazel_rbe/grpc_bazel_rbe_nonbazel`](https://btx.cloud.google.com/invocations;p=830293263384?q=JOB_NAME:grpc%2Fcore%2Fmaster%2Flinux%2Fbazel_rbe%2Fgrpc_bazel_rbe_nonbazel)
- [`grpc/core/pull_request/linux/bazel_rbe/grpc_bazel_rbe_nonbazel`](https://btx.cloud.google.com/invocations;p=830293263384?q=JOB_NAME:grpc%2Fcore%2Fpull_request%2Flinux%2Fbazel_rbe%2Fgrpc_bazel_rbe_nonbazel)

The issue is with the
`//tools/bazelify_tests/test:runtests_cpp_linux_dbg_gcc_8_build_only`
target, which is a part of the portability suite
(`//tools/bazelify_tests/test:portability_tests_linux`). With gcc-8,
building the `buildtests_cxx` make target either times out or fails with
`collect2: fatal error: ld terminated with signal 9`.

I've investigated this as an OOM issue (a common cause of `collect2:
fatal error: ld terminated`), but increasing memory limits does not
help. I've updated the RBE stack from `n1-standard-16` (60 GB RAM) to
`e2-standard-32` (128 GB RAM) with no effect. Increasing various job
timeouts (kokoro, bazel, target, etc) didn't help either. See PR #41028
for more details and other attempts at root-causing.

The most important part of portability tests is to verify that gRPC can
be built with all supported compilers. Since we are having a problem
with building the tests with gcc-8, we've decided to stop covering the
tests for that compiler.

Specifically, this PR changes the `runtests_c*_linux_dbg_gcc_8_build_only`
bazel targets to skip building the test make targets (via
`--cmake_configure_extra_args=-DgRPC_BUILD_TESTS=OFF`) and to build only the
`grpc++` make target. See `build_cxx.sh`:
https://github.com/grpc/grpc/blob/cb2db8fc21b31ac322d463dff5b7eff9fbbab97d/tools/run_tests/helper_scripts/build_cxx.sh#L49-L55

Notes and observations:
- Only gcc-8, and only the cpp version, is affected:
  - Portability tests for other gcc versions have no problems building
    `buildtests_cxx` of their corresponding
    `runtests_c*_linux_dbg_gcc_*_build_only`.
  - The C version of the gcc-8 portability test
    (`runtests_c_linux_dbg_gcc_8_build_only`) has no issues building tests
    ([sample run with full target
    log](https://btx.cloud.google.com/invocations/0b3d41e7-3cf2-4ff8-b6d5-2bc0d52179cd/targets/%2F%2Ftools%2Fbazelify_tests%2Ftest:runtests_c_linux_dbg_gcc_8_build_only;config=815e4ca9071c7e1d8ca72b9c87c1347399a51eb1246eb9c49dd54d9a24ef5cba/tests)).
    - However, unfortunately, this change skips the test targets for
      `runtests_c_linux_dbg_gcc_8_build_only` too.
- We already had the logic to skip tests for gcc-7, but for a different
  reason: #37257

[LB] remove SubchannelCallTrackerInterface::Start() method (#41099)

This is needed for gRFC A105 (https://github.com/grpc/proposal/pull/516).  Specifically, see the "Interaction with xDS Circuit Breaking" section.

It's possible for an LB pick to be happening at the same time as the subchannel sees its underlying connection fail.  In this case, the picker can return a subchannel, but when the channel tries to start a call on the subchannel, the call creation fails, because there is no underlying connection.  In that case, the channel will queue the pick, on the assumption that the LB policy will soon notice that the subchannel has been disconnected and return a new picker, at which point the queued pick will be re-attempted with that new picker.

When the picker returns a complete pick, it can optionally return a `SubchannelCallTracker` object that allows it to see when the subchannel call starts and ends.  In the current API, when the channel successfully creates a call on the subchannel, it will immediately call `Start()`, and then when the subchannel call later ends, it will call `Finish()`.  However, when the race condition described above occurs, the `SubchannelCallTracker` object will be destroyed without `Start()` or `Finish()` ever having been called.  This API allows us to handle call counter incrementing and decrementing for things like xDS circuit breaking: we check the counter in the picker to see that it's currently below the limit, we increment the counter in `Start()`, and decrement it in `Finish()`.  If the subchannel call never starts, then the counter never gets incremented.

With the introduction of connection scaling functionality in the subchannel, this approach will no longer work, because the call may be queued inside of the subchannel rather than being immediately started on a connection, and the channel can't tell if that is going to happen.  In other words, there's no longer any benefit to the `Start()` method, because it will no longer indicate that the call is actually being started on a connection.  As a result, I am removing that method from the API.

For xDS circuit breaking in the xds_cluster_impl LB policy, we are now incrementing the call counter in the picker, and the `SubchannelCallTracker` object will decrement it when either `Finish()` is called or when the object is destroyed, whichever comes first.
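
The counter handling described above can be sketched roughly as follows. This is a minimal illustration and not the actual xds_cluster_impl code: `CallCounter`, `CircuitBreakerCallTracker`, and `TryPick` are hypothetical names, and the compare-and-swap loop is just one way to combine the limit check with the increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical shared count of in-flight calls for one cluster.
struct CallCounter {
  std::atomic<uint32_t> in_flight{0};
};

// Hypothetical tracker returned with a complete pick. The picker has already
// incremented the counter, so the tracker decrements it exactly once.
class CircuitBreakerCallTracker {
 public:
  explicit CircuitBreakerCallTracker(std::shared_ptr<CallCounter> counter)
      : counter_(std::move(counter)) {}

  // Covers the race where the call never starts and Finish() is never called.
  ~CircuitBreakerCallTracker() { Release(); }

  void Finish() { Release(); }

 private:
  void Release() {
    if (counter_ == nullptr) return;  // Already released.
    counter_->in_flight.fetch_sub(1);
    counter_.reset();
  }

  std::shared_ptr<CallCounter> counter_;
};

// Hypothetical pick step: check the limit and increment in the picker.
// Returns nullptr when the limit has been reached.
std::unique_ptr<CircuitBreakerCallTracker> TryPick(
    const std::shared_ptr<CallCounter>& counter, uint32_t max_requests) {
  assert(counter != nullptr);
  uint32_t current = counter->in_flight.load();
  do {
    if (current >= max_requests) return nullptr;
  } while (!counter->in_flight.compare_exchange_weak(current, current + 1));
  return std::make_unique<CircuitBreakerCallTracker>(counter);
}
```

Because the decrement happens in whichever of `Finish()` or the destructor runs first, a pick that is queued and retried never leaks a count.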

For grpclb, the `Start()` method was used in an ugly hack to handle ownership of the client stats object between the grpclb policy and the client load reporting filter.  The LB policy passes a pointer to this object down to the filter via client initial metadata, which contains a raw pointer and does not hold a ref.  To handle ownership, the LB policy returns a `SubchannelCallTracker` that holds a ref to the client stats object, but when `Start()` is called, it releases that ref, on the assumption that the client load reporting filter will subsequently take ownership.  I've replaced this with a slightly cleaner approach whereby the call tracker always holds a ref to the client stats object, thus guaranteeing that the client stats object exists when the client load reporting filter sees it, and the client load reporting filter takes its own ref when it runs.  (An even cleaner approach would be to instead pass the client stats object to the filter via a call attribute, similar to how we pass the xDS cluster name from the ConfigSelector to the LB policy tree, but it doesn't seem worth putting that much effort into grpclb at this point.)

Closes #41099

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41099 from markdroth:xds_circuit_breaking_counter_change eaa06bbdf1688c31c0d1e3b3cabe6a7d015fc075
PiperOrigin-RevId: 842261731

[PH2][E2E] Enable the following tests:

1. CoreDeadlineTests.CancelAfterRoundTrip
2. CoreDeadlineSingleHopTests
3. ClientChannelTests.CancelAfterRoundTrip
4. ClientChannelTests.CancelAfterAccept
5. Http2Tests.HighInitialSeqno

PiperOrigin-RevId: 842067519

adding checks in bazel/build file (#41181)

Closes #41181

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41181 from rishesh007:layering_check_1 d0cd289560ba168cd1f3a9ecd60786d5595389f5
PiperOrigin-RevId: 842034184

Skip sleuth.so and sleuth tests on Windows

PiperOrigin-RevId: 841929574

Sanity fix (#41202)



Closes #41202

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41202 from ctiller:san aab326f7d099518fe2d08dd953b36bbe71d218dc
PiperOrigin-RevId: 841921474

Add promise serialization for call op promises

Allows promise display for calls to dig into the promises being executed and display more of what's going on

PiperOrigin-RevId: 841811313

[chaotic-good] Deadline fixes (#41190)

* Increase test connection deadline to account for CI slowness
* Add experiment to use handshaker deadline instead of hard coded deadline (since this is likely a bug)

Closes #41190

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41190 from ctiller:flake-cg e7852678fe0982f42f386ee1ffb421d334721f5b
PiperOrigin-RevId: 841802046

[ValidationErrors] de-dup error messages (#41198)

Also mark the test as non-polling, so we don't run it multiple times.

Closes #41198

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41198 from markdroth:validation_errors_no_dups 2ebe0f9846e9199a9ec1a3bfceeab58b7d3b65f9
PiperOrigin-RevId: 841792148

[PH2][E2E] Enable the following E2E tests

PiperOrigin-RevId: 841788638

[PH2][Settings] Enforcing first SETTINGS frame

PiperOrigin-RevId: 841760285

[PH2][E2E] Enable the following tests:

1. Http2SingleHoptests.KeepaliveTimeout

PiperOrigin-RevId: 841731595

[PH2][E2E] Populate `grpc_message` flags

PiperOrigin-RevId: 841723848

[pick_first] go CONNECTING when selected subchannel goes CONNECTING or TF (#41029)

Needed as part of gRFC A105 (https://github.com/grpc/proposal/pull/516).

Currently, when the selected subchannel leaves READY state, the only possible state it can move to is IDLE, and pick_first handles that by itself going IDLE.  However, as part of A105, we are going to introduce the possibility of the subchannel going from READY to either CONNECTING or TRANSIENT_FAILURE, and in those two cases we want pick_first to go back into CONNECTING and start a new happy eyeballs pass.  This PR introduces an experiment that adds that behavior.

While I was at it, I noticed an existing misfeature.  There are two cases where pick_first will go IDLE, which is done by calling [`GoIdle()`](https://github.com/grpc/grpc/blob/24b25a0baa72a658cc37d1db28f77513a9670ea2/src/core/load_balancing/pick_first/pick_first.cc#L610):
1. The case mentioned above, where the selected subchannel goes from READY to IDLE (`GoIdle()` is called from [`SubchannelState::OnConnectivityStateChange()`](https://github.com/grpc/grpc/blob/24b25a0baa72a658cc37d1db28f77513a9670ea2/src/core/load_balancing/pick_first/pick_first.cc#L784)).
2. The case where pick_first already has a selected subchannel and receives a new address list, but none of the subchannels in the new list report READY.  In this case, pick_first knows that the currently selected subchannel is for an address that is not present in the new address list, so it unrefs the selected subchannel and goes IDLE (`GoIdle()` is called from [`SubchannelData::OnConnectivityStateChange()`](https://github.com/grpc/grpc/blob/24b25a0baa72a658cc37d1db28f77513a9670ea2/src/core/load_balancing/pick_first/pick_first.cc#L859)).

The code in `GoIdle()` currently requests a re-resolution, which is the right behavior for case 1.  However, it doesn't really make sense to do this for case 2, since we have just received a fresh resolver update in that case.  Therefore, as part of this experiment, I am moving the code that triggers the re-resolution out of `GoIdle()` and directly into `SubchannelState::OnConnectivityStateChange()`, where it will occur only for case 1.
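
As a rough sketch of the behavior with the experiment enabled, the two decisions can be modeled as below. The enum and function names are hypothetical and are not taken from `pick_first.cc`; whether re-resolution is also requested on the new CONNECTING path is not modeled here.

```cpp
#include <cassert>

enum class ConnectivityState { kIdle, kConnecting, kReady, kTransientFailure };

// State pick_first reports after its selected subchannel leaves READY and
// enters `subchannel_state`.
ConnectivityState PickFirstStateAfterSelectedLeavesReady(
    ConnectivityState subchannel_state) {
  // Precondition: the subchannel has actually left READY.
  assert(subchannel_state != ConnectivityState::kReady);
  switch (subchannel_state) {
    case ConnectivityState::kConnecting:
    case ConnectivityState::kTransientFailure:
      // New for A105: go back to CONNECTING and start a new happy eyeballs
      // pass.
      return ConnectivityState::kConnecting;
    default:
      // Existing behavior: READY -> IDLE makes pick_first go IDLE.
      return ConnectivityState::kIdle;
  }
}

// The two situations in which pick_first calls GoIdle().
enum class GoIdleCause {
  kSelectedSubchannelWentIdle,         // Case 1.
  kNewAddressListHasNoReadySubchannel  // Case 2.
};

// Re-resolution only makes sense for case 1; in case 2 a fresh resolver
// update has just been received.
bool ShouldRequestReresolution(GoIdleCause cause) {
  return cause == GoIdleCause::kSelectedSubchannelWentIdle;
}
```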

Closes #41029

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41029 from markdroth:pick_first_ready_to_connecting fdb6ef68e3a73e0035520149b72a1d21775354c3
PiperOrigin-RevId: 840830927

[PH2][Refactor]
The Pausing and Restarting of the ReadLoop happens in a separate class.
We could generalize and re-use this mechanism elsewhere, but that is a task for later.

PiperOrigin-RevId: 840773537

[PH2] Build changes (#41194)

Closes #41194

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41194 from ac-patel:test10 b5c81d5c0a7c3497c898a86717ca197613d39459
PiperOrigin-RevId: 840644590

[PH2][ChannelArg] Adding support for GRPC_ARG_HTTP2_INITIAL_SEQUENCE_NUMBER. This CL also modifies the error message returned when the last stream is closed and the transport cannot create any new streams.

PiperOrigin-RevId: 840601223

[PH2][CallV3] Fix off by one error in client call

The current while condition always skips the first pending batch, which can cause some ops to never poll.

PiperOrigin-RevId: 840571167

Adding layering_check and parse_headers in each bazel src/python build file

PiperOrigin-RevId: 840512315

Extend pipelined_read_secure_endpoint experiment.

PiperOrigin-RevId: 840339485

Add a mark to collect actual start timestamp and fix flow end JSON formatting.

PiperOrigin-RevId: 840339101

[PH2][Settings][Refactor]
1. Moved on_receive_settings callback logic into SettingsPromiseManager.
2. Stall reads until the first peer settings are processed.
3. Encapsulated security frame settings logic within SettingsPromiseManager.

PiperOrigin-RevId: 840235504

[PH2][Bug] Fix call to `BeginCloseStream` from `HandleError`.

`HandleError` is called from a transport promise when some stream/connection error is encountered. Hence, when a stream's trailing metadata is passed to the call stack, it MUST be passed with a cancelled status.

PiperOrigin-RevId: 840113911

[PH2][E2E] Enable the following E2E tests:
1. CoreLargeSendTests.Payload

PiperOrigin-RevId: 840073255

Remove the max_age_filter_float_to_top experiment since it has been rolled out for a while

PiperOrigin-RevId: 839848318

[PH2] Misc items
1. Move `SourceConstructed` to after the party is instantiated.
2. Update TODOs and comments.
3. Add debug info where Mark (@roth) had left a TODO.
4. Rename GetActiveStreamCount to GetActiveStreamCountLocked

PiperOrigin-RevId: 839746883

[PH2][E2E] Remove the max limit of a single gRPC message accepted by the transport.

PiperOrigin-RevId: 839662093

[PH2][ChannelArgs] Refactor reading channel args

PiperOrigin-RevId: 839628262

Chaotic Good: Verify Peer in Chaotic Good Handshake during Data Endpoint creation

Since Chaotic Good enables using a group of TCP connections as a composite channel we need to ensure that all TCP connections are established with the same peer. In this change, we store a Ref to the `grpc_auth_context` of the Connection that created the Control Endpoint and compare it to the `grpc_auth_context` of the Connection requesting each Data Endpoint using the [Injectable Peer Comparison API](https://github.com/grpc/grpc/pull/39610). If no peer comparison API is installed, the identity verification will not be performed.

The updated Chaotic Good handshake is as follows (changed steps are in **bold**):

First the control channel is established:
   1. ALTS/TLS/LOAS/PSP: Each new TCP connection goes through the “normal” security handshakes for gRPC, checking certificates, establishing identity
   2. A Chaotic Good Settings frame is sent from the client, with data_channel == 0
   3. The server processes the received Settings frame, creates N pending data connections, and responds with a Settings frame with a randomly generated set of connection ids: 1 per requested data connection. **The created PendingDataConnections hold a reference to the Control Channel’s grpc_auth_context.**
   4. The client processes the received Settings frame and creates one data connection per received connection_id.

For each data channel requested:
   1. The TCP connection proceeds as usual (same as 1 above)
   2. The Settings frame sent will relay the connection_id for this data channel, with data_channel == 1
   3. The server responds with a Settings frame with data_channel == 1.
   4. **Finally, the server looks up the association for this connection_id and verifies the equivalence of the current connection’s grpc_auth_context and the stored grpc_auth_context of the control channel.**
      - **If lookup is successful and peer is equivalent, we bind the connection with that chaotic good channel.**
      - **Else, we abort the connection.**
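
The server-side lookup and comparison in the final step can be sketched as follows. This is an illustration under stated assumptions, not the real implementation: `AuthContext` stands in for `grpc_auth_context`, `PeerComparator` stands in for the injectable comparison hook, and `PendingDataConnections` here is a hypothetical simplification. Consuming the pending entry even when verification fails is also an assumption.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for grpc_auth_context; only a peer identity string is modeled.
struct AuthContext {
  std::string peer_identity;
};

// Stand-in for the injectable peer comparison hook. May be empty.
using PeerComparator =
    std::function<bool(const AuthContext&, const AuthContext&)>;

class PendingDataConnections {
 public:
  explicit PendingDataConnections(PeerComparator comparator)
      : comparator_(std::move(comparator)) {}

  // Control channel setup: remember the control channel's auth context
  // for each connection id handed out to the client.
  void Add(const std::string& connection_id,
           std::shared_ptr<const AuthContext> control_ctx) {
    assert(control_ctx != nullptr);
    pending_[connection_id] = std::move(control_ctx);
  }

  // Data channel setup: returns true if the connection may be bound to the
  // channel, false if it must be aborted.
  bool VerifyAndTake(const std::string& connection_id,
                     const AuthContext& data_ctx) {
    auto it = pending_.find(connection_id);
    if (it == pending_.end()) return false;  // Unknown connection id.
    std::shared_ptr<const AuthContext> control_ctx = std::move(it->second);
    pending_.erase(it);  // Assumption: an entry is consumed on first use.
    if (!comparator_) return true;  // No comparison API installed: skip check.
    return comparator_(*control_ctx, data_ctx);
  }

 private:
  PeerComparator comparator_;
  std::map<std::string, std::shared_ptr<const AuthContext>> pending_;
};
```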

PiperOrigin-RevId: 839573243

[PH2][Trivial] Enable Cancel and Deadline suite
The flake has been fixed.

PiperOrigin-RevId: 839564751

Track allocations in tsi_zero_copy_grpc_protector towards ResourceQuota.

This change introduces a `set_allocator` method to the `tsi_zero_copy_grpc_protector` vtable and API. The ALTS zero-copy frame protector implementation is updated to use a provided allocator callback (`tsi_zero_copy_grpc_protector_allocator_cb`) for allocating protected and unprotected slices, falling back to `GRPC_SLICE_MALLOC` if no custom allocator is set.
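
The fallback pattern can be illustrated with a small sketch. Everything here is a hypothetical stand-in: `Buffer` replaces gRPC slices, `AllocatorCallback` replaces `tsi_zero_copy_grpc_protector_allocator_cb` (whose real signature is not reproduced here), and plain `std::vector` construction replaces `GRPC_SLICE_MALLOC`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;
// Hypothetical allocator hook: returns a buffer of exactly `length` bytes.
using AllocatorCallback = std::function<Buffer(std::size_t)>;

class FrameProtectorSketch {
 public:
  // Installs a custom allocator, e.g. one that charges a resource quota.
  void SetAllocator(AllocatorCallback allocator) {
    allocator_ = std::move(allocator);
  }

  // Uses the custom allocator when one is set, otherwise falls back to a
  // plain heap allocation.
  Buffer Allocate(std::size_t length) const {
    if (!allocator_) return Buffer(length);
    Buffer buffer = allocator_(length);
    assert(buffer.size() == length);
    return buffer;
  }

 private:
  AllocatorCallback allocator_;
};
```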

PiperOrigin-RevId: 839519073

[PH2][E2E] Fix a race condition in stream_data_queue

The `stream_id_` is currently accessed in both the Enqueue and Dequeue operations, which results in the race. In the Enqueue flow, `stream_id_` is only used for logging, which is redundant, so that use is being removed.

PiperOrigin-RevId: 839517493

Automated Code Change

PiperOrigin-RevId: 839268035

[PH2][Settings][Refactor] Step 3.3

1. Removes unused includes of http2_settings_manager.h
2. Moves settings ACK handling into SettingsPromiseManager from Http2SettingsManager
3. Deletes `MaybeSendAck` related tests from http2_settings_test.cc
4. Moved tests as-is into settings_timeout_manager_test.cc from http2_transport_test.cc

PiperOrigin-RevId: 839247137

Include GlobalCollectionScope in StatsPluginGroup::GetCollectionScope.

Also adds a requirement that the Collection Scope returned by StatsPlugin::GetCollectionScope is a Root Scope (i.e. has no parents). This is to avoid Diamond structures in the DAG (doesn't fix the problem entirely but is a good failsafe for now).

PiperOrigin-RevId: 839223367

[PH2][Experiment] Enable `sleep_use_non_owning_waker` (#41165)

Enable `sleep_use_non_owning_waker`

Closes #41165

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41165 from ac-patel:experiment1 d646a1d0c002fd2c85891adf6b83c0c2a2e554a9
PiperOrigin-RevId: 839219556

[PH2][Settings][Refactor] Step 3.2 : Consolidating related functions

| Merged functions | Final Function |
|---|---|
| `OnSettingsReceived` + `BufferPeerSettings` | `BufferPeerSettings` |
| `AckLastSend` + `OnSettingsAckReceived` | `OnSettingsAckReceived` |
| `ApplyIncomingSettings` + `TakeBufferedPeerSettings` | `ApplyBufferedPeerSettings` |

PiperOrigin-RevId: 839192044

[PH2][E2E] Enable the following tests:

1. CoreEnd2End.MaxMessageLength
2. Http2Tests.MaxMessageLength

PiperOrigin-RevId: 839155004

[PH2][Settings][Refactor] Step 3.1
This CL refactors HTTP/2 settings ACK handling by moving the did_previous_settings_promise_resolve_ flag from Http2SettingsManager to Http2SettingsPromiseManager. did_previous_settings_promise_resolve_ is now fully managed by Http2SettingsPromiseManager so other classes don't need to check it or set it.

PiperOrigin-RevId: 839129676

[PH2][Settings][Refactor] Step 2.2 Consolidate settings management

Step 2.2
Move object of Http2SettingsManager class into SettingsPromiseManager and the Http2ClientTransport will use Http2SettingsManager via SettingsPromiseManager

PiperOrigin-RevId: 839076679

[PH2][Bug][Stream]
1. Fixes a bug by preventing DATA frame processing on streams that have not yet received initial metadata.
2. Minor refactoring of existing code.

PiperOrigin-RevId: 839070320

Define TCP Metrics Domain

PiperOrigin-RevId: 839039889

Update Sleuth version.

PiperOrigin-RevId: 838844425

[Cleanup] Remove workaround Apple CFStream bug from e2e tests (#41121)



Closes #41121

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41121 from pawbhard:temp_check 185a3d8cc4f617d6df66d4f1adc738ef9a4b13f6
PiperOrigin-RevId: 838668301

[PH2][Trivial][TODO]

PiperOrigin-RevId: 838616732

Adding layering_check and parse_headers in android bazel build file

PiperOrigin-RevId: 838579798

[PH2][Settings][Refactor] Move MaybeGetSettingsAndSettingsAckFrames
Make MaybeGetSettingsAndSettingsAckFrames a member function of class SettingsPromiseManager.

PiperOrigin-RevId: 838146703

Optionalize linking postmortem library entirely

PiperOrigin-RevId: 837871164

[PH2][Settings][Refactor] Step 4 : Rename
Step 1 : https://github.com/grpc/grpc/pull/41103
Step 2, 3 : WIP
Step 4 : (This PR)
Rename variables and functions to ensure that the common confusion between SENT and RECEIVED settings is not there. The current structure and naming makes it hard to differentiate. We really have wasted a LOT of time here.

PiperOrigin-RevId: 837785968

[build] Test fix (#41146)



Closes #41146

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41146 from ctiller:sn ef53a47393f05823090781d9bcaa9185d465e68b
PiperOrigin-RevId: 837767346

[PH2][Trivial] Disable some tests

PiperOrigin-RevId: 837698567

Add call inspection to channelz

Add a new config to enable active call inspection with channelz, disabled by default. Plumb through promise_based_filter, call-v3.

PiperOrigin-RevId: 837614415

[Python] Disable layering check in grpc_tools:protoc_lib (#41142)

Python Bazel tests have been failing since yesterday after layering check was enabled in grpcio_tools build in commit: https://github.com/grpc/grpc/commit/756389e9e75ba93d7316ef9eae2ca83126ad9f94

Temporarily disabling it after discussing IRL with @rishesh007

Closes #41142

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41142 from sreenithi:temp_fix_python_bazel_test 751c420bf3a27066d6cdd912e0e08e9c0acaebb8
PiperOrigin-RevId: 837494537

[PH2][Trivial] Disabled cancel suite and remove logging

PiperOrigin-RevId: 837447558

[PH2][Settings][Refactor]

Initial Design :
```
class Http2ClientTransport {
 public:
  void TypicalTransportFunction() {
    // ... other non-settings work ...
    object1.DetailedWork1();
    object2.DetailedWork2();
    object3.DetailedWork3();
    // ... other non-settings work ...
  }

 private:
  PendingIncomingSettings object1;
  SettingsTimeoutManager object2;
  Http2SettingsManager object3;
};
```

New Design

```
class Http2ClientTransport {
  SettingsPromiseManager settings_manager_;

  void TypicalTransportFunction() {
    // ... other non-settings work ...
    settings_manager_.SomeWork();
    // ... other non-settings work ...
  }
};

class SettingsPromiseManager {
 public:
  void SomeWork() {
    DetailedWork1();
    DetailedWork2();
    settings_.DetailedWork3();
  }

 private:
  void DetailedWork1();
  void DetailedWork2();

  Http2SettingsManager settings_;
};
```

Refactor Step 1
1. Merge class `SettingsTimeoutManager` and `PendingIncomingSettings` into a new class named `SettingsPromiseManager`
2. Replace usage of `PendingIncomingSettings` and `SettingsTimeoutManager` with usage of `SettingsPromiseManager`
3. Replace `pending_incoming_settings_` with `transport_settings_`

Future Steps
1. Step 2 : Move object of `Http2SettingsManager` class into `SettingsPromiseManager` and the `Http2ClientTransport` will use `Http2SettingsManager` via `SettingsPromiseManager`
2. Step 3 : Earlier the `Http2ClientTransport` class had interactions between `Http2SettingsManager`, `SettingsTimeoutManager`, and `PendingIncomingSettings` in the transport. Move this into our new `SettingsPromiseManager` class. This will make the transport lean. This PR will need careful review of the business logic. This will also make multiple permutations of settings very easily testable and debuggable.
3. Step 4 : Rename variables and functions to ensure that the common confusion between SENT and RECEIVED settings is not there. The current structure and naming makes it hard to differentiate. We really have wasted a LOT of time here.
4. Step 5 : Write unit tests for `SettingsPromiseManager` class, modelling scenarios similar to how the transport will be using the settings. Also add missing tests to `Http2SettingsManager` if needed.

PiperOrigin-RevId: 837359318

[PH2][E2E] E2E: Multiple changes
1. Enable logging for 2 flaking HPack tests
2. Writing a new function which will enable logging for PH2 for flaking tests
3. Splitting the CANCEL and DEADLINE test suites so that these can be switched on and off separately.

PiperOrigin-RevId: 837349917

[Ph2][E2E] Logs to debug a flake

PiperOrigin-RevId: 837347558

remove default_applicable_licenses to tools/codegen BUILD files.

PiperOrigin-RevId: 837063317

Adding layering_check and parse_headers in each bazel codegen build file

PiperOrigin-RevId: 837002393

[PH2][E2E] Add logs to debug a flake

PiperOrigin-RevId: 836947917

Adding layering_check and parse_headers in each bazel distrib python build file

PiperOrigin-RevId: 836934818

[PH2][Trivial] Enabling cancel test suite

PiperOrigin-RevId: 836893540

[PH2][E2E] Fix channelZ AddData race with transport deletion.

This CL moves `SourceDestructing` from the destructor to `Orphan`. It is possible for an `AddData` call to try to take a ref on the transport while the transport is being destructed (before `SourceDestructing` is invoked). Calling `SourceDestructing` from `Orphan` ensures that `AddData` is not called after dropping the external transport ref.

PiperOrigin-RevId: 836886988

[PH2][Settings] Multiple changes
1. Complete the ProcessHttp2SettingsFrame function
2. Applying the incoming settings in the MultiplexerLoop and sending an ACK for incoming settings
3. Managing initial window size settings for acked settings (this was missed in previous PR).
4. Decoupling ApplyIncomingSettings from OnSettingsReceived

PiperOrigin-RevId: 836876453

[PH2][Bug] Move transport loop spawning out of the constructor

Spawning transport loops from the Http2ClientTransport constructor creates a race condition. An initialization error can trigger a shutdown, causing the transport to be destroyed from within its own constructor.

This CL moves the loop-spawning logic to a new public method, SpawnTransportLoops(). The Chttp2Connector now calls this method after the transport is fully constructed. This ensures a clean separation between object construction and the start of asynchronous operations, preventing premature closure and potential bugs.
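
The two-phase pattern can be sketched in a few lines. `SketchTransport` and `CreateTransport` are hypothetical names used only for illustration; the real transport spawns promise-based loops, which are simulated here by simple flags.

```cpp
#include <cassert>
#include <memory>

class SketchTransport {
 public:
  // The constructor only initializes state. `init_error` simulates an
  // initialization error that surfaces once the loops start running.
  explicit SketchTransport(bool init_error) : init_error_(init_error) {}

  // Starts the (simulated) loops. Must be called once, after construction.
  void SpawnTransportLoops() {
    assert(!spawned_);
    spawned_ = true;
    // An early error may trigger shutdown. That is safe here because the
    // object is fully constructed and owned by the caller.
    if (init_error_) Close();
  }

  void Close() { closed_ = true; }
  bool spawned() const { return spawned_; }
  bool closed() const { return closed_; }

 private:
  const bool init_error_;
  bool spawned_ = false;
  bool closed_ = false;
};

// Hypothetical connector step: construct first, then start async work.
std::shared_ptr<SketchTransport> CreateTransport(bool init_error) {
  auto transport = std::make_shared<SketchTransport>(init_error);
  transport->SpawnTransportLoops();
  return transport;
}
```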

PiperOrigin-RevId: 836663966

Add support to export Instrument -> OpenTelemetry UpDownCounters.

PiperOrigin-RevId: 836631446

[PH2][Settings] Making the Settings class Ref Counted

PiperOrigin-RevId: 836568012

Adding layering_check and parse_headers in each bazel build file

PiperOrigin-RevId: 836547244

[PH2][Trivial][Log] Adding logs to connector code

PiperOrigin-RevId: 836516794

[Resource Quota] Initiate experiment to track allocations under zero copy in Resource … (#41109)

Closes #41109

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41109 from siddharthnohria:exp 7878b3cf84635b3923d7df5fc525d4d92410ea37
PiperOrigin-RevId: 836492146

Add missing dependency to stream_quota_test.

PiperOrigin-RevId: 836458457

[PH2][Trivial] Disable flakes

PiperOrigin-RevId: 836157874

[PH2][E2E] Fix race condition in Http2ClientTransport when pushing server trailing metadata.

PiperOrigin-RevId: 836040628

[PH2][Bug] Was tracing in the wrong place.

PiperOrigin-RevId: 835499454

[PH2][Settings] MaybeSpawnWaitForSettingsTimeout

This PR takes care of
1. Sending a SETTINGS frame to the peer.
2. Starting a timer to wait for the ACK
3. Processing the SETTINGS ACK received from the peer.

This does NOT include sending a SETTINGS ACK or processing a received SETTINGS frame.

Changes :
1. Renamed MarkPeerSettingsResolved to MarkPeerSettingsPromiseResolved, and SpawnWaitForSettingsTimeout to MaybeSpawnWaitForSettingsTimeout.
2. Moved all functions to the cc file.
3. Added an if check to MaybeSpawnWaitForSettingsTimeout to prevent incorrect spawning when no settings have been sent.
4. Some plumbing.

PiperOrigin-RevId: 835120786

[PH2][E2E] Enable logs to debug a flake

PiperOrigin-RevId: 834980764

[PH2] Handle unknown stream IDs. This CL addresses the following:
1. Receiving a HEADERS/CONTINUATION/DATA/WINDOW_UPDATE frame with an unexpected stream ID is now treated as a connection error, based on the RFC.

PiperOrigin-RevId: 834636478

[PH2][Refactor] Moving repeated code into a function in HTTP2Status file

PiperOrigin-RevId: 834627981

Update dart interop Dockerfile (#41090)



Closes #41090

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41090 from athomas:patch-1 b691ea897456c182a4d0385b8b46fb85c24c0928
PiperOrigin-RevId: 834474588

[PH2][Common][Refactor] IncomingMetadataTracker
1. Moving out incoming header state and management into class IncomingMetadataTracker
2. Fixing a bug in CloseStream. The state should not be altered in this case.
3. Two parameters to function ParseAndDiscardHeaders were actually data members, so I removed them. ParseAndDiscardHeaders will access the data members directly.
4. Fixing clang issues.
5. Moving helpers from header_assembler_test into the common test class.

PiperOrigin-RevId: 834216422

Set security protocol type in AuthContext.

The Injectable Peer Comparison API added in https://github.com/grpc/grpc/pull/39610 uses the `protocol_` field of the `grpc_auth_context` to 1) look up the registered comparators, and 2) perform an initial comparison to ensure that the two compared auth contexts have the same protocol. However, this field is currently unset for all types of credentials.

This change populates the `protocol` field in `grpc_auth_context` with the name of the security connector type after the peer check in the security handshaker. E2E Tests are updated to verify that the `AuthContext` contains the correct protocol type.

PiperOrigin-RevId: 834155283

Keep sleuth binary target name

PiperOrigin-RevId: 833982212

Introduce UpDownCounter instrument type.

This change adds a new instrument type, `UpDownCounter`, to the gRPC telemetry system. Unlike a standard `Counter`, an `UpDownCounter` can be incremented and decremented. It can be thought of as a UInt Gauge that is stored rather than being queried when the MetricsQuery is run.
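
A minimal sketch of the idea, assuming a plain atomic as storage. `UpDownCounterSketch` is a hypothetical name and does not reflect the real instrument registration or storage API. Treating a decrement below zero as a caller error is also an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stored unsigned value that can move in both directions, unlike a
// monotonically increasing counter.
class UpDownCounterSketch {
 public:
  void Increment(uint64_t amount = 1) {
    value_.fetch_add(amount, std::memory_order_relaxed);
  }

  void Decrement(uint64_t amount = 1) {
    // Assumption: callers never decrement below zero.
    assert(amount <= value_.load(std::memory_order_relaxed));
    value_.fetch_sub(amount, std::memory_order_relaxed);
  }

  // The value is read from storage rather than computed when queried.
  uint64_t value() const { return value_.load(std::memory_order_relaxed); }

 private:
  std::atomic<uint64_t> value_{0};
};
```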

PiperOrigin-RevId: 833686033

[benchmark]

PiperOrigin-RevId: 833652476

[subchannel connector] pass initial MAX_CONCURRENT_STREAMS value from connector (#41064)

This is needed for A105 (https://github.com/grpc/proposal/pull/516).

The subchannel will wind up getting the transport's MAX_CONCURRENT_STREAMS value via the new StateWatcher API that I added in #40952. However, because the subchannel does not start that watch until after it has a connection and reports READY to the LB policy, this means that RPCs can start on the subchannel before the subchannel knows the transport's MAX_CONCURRENT_STREAMS value. This can cause us to incorrectly scale up the number of connections when we shouldn't.

To avoid that race, this PR changes the notify_on_receive_settings hook to pass the transport's initial MAX_CONCURRENT_STREAMS value back to the connector, which in turn passes it back to the subchannel. This will allow the subchannel to use that initial value when dispatching RPCs until it receives the first notification from the StateWatcher.
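
The effect on the subchannel can be sketched as follows. `SubchannelStreamLimitSketch` and its methods are hypothetical; the sketch only shows the limit being seeded from the connector and later overwritten by watcher notifications.

```cpp
#include <cassert>
#include <cstdint>

class SubchannelStreamLimitSketch {
 public:
  // Seeded with the initial MAX_CONCURRENT_STREAMS value passed back by the
  // connector, so the limit is known before any RPC is dispatched.
  explicit SubchannelStreamLimitSketch(uint32_t initial_max_concurrent_streams)
      : max_concurrent_streams_(initial_max_concurrent_streams) {}

  // Called on each later notification from the state watcher.
  void OnStateWatcherUpdate(uint32_t max_concurrent_streams) {
    max_concurrent_streams_ = max_concurrent_streams;
  }

  bool CanStartRpc() const { return active_rpcs_ < max_concurrent_streams_; }

  void StartRpc() {
    assert(CanStartRpc());
    ++active_rpcs_;
  }

  void FinishRpc() {
    assert(active_rpcs_ > 0);
    --active_rpcs_;
  }

 private:
  uint32_t max_concurrent_streams_;
  uint32_t active_rpcs_ = 0;
};
```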

I would ideally like to completely remove this bespoke notify_on_receive_settings hook and instead have the connector use the new StateWatcher API, but that would require a bit more refactoring work than I want to do right now.

Closes #41064

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41064 from markdroth:transport_initial_max_concurrent_streams 41a93a7629e97e2828df3f3c087ebb7cba43e043
PiperOrigin-RevId: 833631154

fix: Add missing includes in native_windows_dns_resolver (#41068)

This adds a couple missing includes and build dependencies

Closes #41068

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41068 from laramiel:laramiel-patch-2 ed2dbb17a0a394189603435257ba913701c720bf
PiperOrigin-RevId: 833511045

[zviz] Expand pagination api to include the first next id

PiperOrigin-RevId: 833375193

Server Wide Max Outstanding Streams: Add Build changes (#41076)

Allow servers to set max outstanding streams limit per server. This pull request only adds the BUILD changes required for this. The core logic will follow in a later PR.

Closes #41076

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41076 from siddharthnohria:max_outstanding_streams 392d962fc78be66c075952977bc3a28f2298b7ce
PiperOrigin-RevId: 833196338

Obtain a DomainStorage Ref while running MetricsQuery

In the current implementation, the DomainStorage can be deleted while the query is running. This leads to NPEs.
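The general pattern can be sketched as follows, assuming a registry that owns the storage objects. `std::shared_ptr` stands in here for whatever ref-counting type the real code uses, and `Registry`, `GetRef`, and `metric_value` are hypothetical names chosen for illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the real DomainStorage type.
struct DomainStorage {
  int metric_value = 0;
};

// Hypothetical owner of all DomainStorage objects.
class Registry {
 public:
  void Add(const std::string& name, int value) {
    auto storage = std::make_shared<DomainStorage>();
    storage->metric_value = value;
    domains_[name] = std::move(storage);
  }

  // Drops the registry's reference; the object is destroyed only when
  // no query still holds a reference to it.
  void Remove(const std::string& name) { domains_.erase(name); }

  // Returns a new strong reference, or nullptr if the domain is unknown.
  std::shared_ptr<DomainStorage> GetRef(const std::string& name) const {
    auto it = domains_.find(name);
    return it == domains_.end() ? nullptr : it->second;
  }

 private:
  std::map<std::string, std::shared_ptr<DomainStorage>> domains_;
};
```

A query that calls `GetRef` before it starts can keep reading from the returned pointer even if `Remove` runs in the meantime, since its own reference keeps the object alive.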

PiperOrigin-RevId: 833177932

[Call] Skip clearing peer string during call cancellation (#41062)

Potential fix for b/428185413

Closes #41062

PiperOrigin-RevId: 832462330

[Bazel] Fixes the issue of copt cache discarded between test and build (#41038)

This solves the issue of the copt cache being dropped between `bazel build` and `bazel test`:

```
WARNING: Build option --copt has changed, discarding analysis cache.
```

This issue was introduced in #39945, which added `--copt=-DGRPC_POSTMORTEM_CHECKS` unconditionally to all `bazel test` invocations, but not to `bazel build`:

https://github.com/grpc/grpc/blob/1d6841f7d83881a7cfe63b25219d81de2abf8653/tools/bazel.rc#L155-L156

This PR moves the macro to a separate bazel profile config called `postmortem`, which is not enabled by default.
Instead, this config will be enabled in all remote CIs via tools/remote_build/include/test_config_common.bazelrc: https://github.com/grpc/grpc/blob/ba4984e8a0d21270a6cfc0481efd2de1595601d9/tools/remote_build/include/test_config_common.bazelrc#L26-L27

For the list of affected CI jobs, see my comment on this PR.

Closes #41038

PiperOrigin-RevId: 832339240

[ruby] Fix version comparison for the ruby_abi_version symbol for ruby 4 compatibility (#41061)

The next version of Ruby will be 4.0.0. Previously, development versions didn't load properly due to grpc.so not exporting the ruby_abi_version symbol. Correct the version comparison logic so we export the symbol on version 4.0.

Closes #41061

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41061 from zarinn3pal:fix/ruby-abi-version d01b2cf13be16a83f81aaa36319d04899df457eb
PiperOrigin-RevId: 832265298

[PH2][BUILD] Adding dependencies (#41070)

[PH2][BUILD] Adding dependencies

Closes #41070

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41070 from tanvi-jagtap:new_file_010 8a0630e5c7ab89583440f4c87f8b91aac8c24ec9
PiperOrigin-RevId: 832181048

[PH2][BUILD] Adding new file for IncomingMetadataTracker (#41058)

[PH2][BUILD] Adding new file for IncomingMetadataTracker

Closes #41058

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/41058 from tanvi-jagtap:new_files 6c0da18b7c8980d6ab133cb8e543e6ad13e5d69d
PiperOrigin-RevId: 832130354

[transport] add new watcher API to be used by subchannel (#40952)

This adds a new transport state watcher API.  The normal connectivity state watcher API is not what we really want in the transport, since we don't expect to see any state-change event except for disconnection, and when that happens, we want to see a lot more info about the disconnection than is available via a connectivity state watch (see [gRFC A94](https://github.com/grpc/proposal/blob/master/A94-subchannel-otel-metrics.md)).  In addition, we also need to get reports of the peer's MAX_CONCURRENT_STREAMS setting as part of implementing connection scaling (see WIP [gRFC A105](https://github.com/grpc/proposal/pull/516)).
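To make the shape of such a watcher concrete, here is a minimal sketch under stated assumptions. The interface `TransportStateWatcher`, the `FakeTransport` class, and every method name below are hypothetical and are not claimed to match the API added in this PR. The disconnection details are reduced to a plain string for brevity.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical watcher interface: one callback for disconnection (with
// details) and one for the peer's MAX_CONCURRENT_STREAMS setting.
class TransportStateWatcher {
 public:
  virtual ~TransportStateWatcher() = default;
  virtual void OnDisconnect(const std::string& reason) = 0;
  virtual void OnPeerMaxConcurrentStreamsUpdate(uint32_t value) = 0;
};

// Hypothetical transport that notifies a single watcher directly,
// without going through a filter stack.
class FakeTransport {
 public:
  void StartWatch(std::shared_ptr<TransportStateWatcher> watcher) {
    watcher_ = std::move(watcher);
  }

  void ReceiveSettings(uint32_t max_concurrent_streams) {
    if (watcher_) {
      watcher_->OnPeerMaxConcurrentStreamsUpdate(max_concurrent_streams);
    }
  }

  // Disconnection is the only state change; the watch ends afterwards.
  void Disconnect(const std::string& reason) {
    if (!watcher_) return;
    auto watcher = std::move(watcher_);
    watcher_.reset();
    watcher->OnDisconnect(reason);
  }

 private:
  std::shared_ptr<TransportStateWatcher> watcher_;
};

// Example watcher that records what it was told.
class RecordingWatcher : public TransportStateWatcher {
 public:
  void OnDisconnect(const std::string& reason) override {
    disconnected = true;
    disconnect_reason = reason;
  }
  void OnPeerMaxConcurrentStreamsUpdate(uint32_t value) override {
    max_concurrent_streams = value;
  }

  bool disconnected = false;
  std::string disconnect_reason;
  uint32_t max_concurrent_streams = 0;
};
```

The point of the sketch is the contrast with a connectivity state watch: the watcher sees no intermediate states, only setting updates and a single, detailed disconnection event.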

This new API goes directly from the subchannel to the transport, bypassing the filter stack.  This is consistent with our desire to remove the transport op API in the filter stack as part of the promise migration.

Eventually, this API should be used on the server side too, but that's a project for another day.

As part of this, we also change the way that keepalive data is sent from the subchannel to the channel.  This will also be needed as part of A105, where we need to propagate keepalive info to the channel even when the subchannel's connectivity state does not change.

Closes #40952

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/40952 from markdroth:transport_state_watcher 2dca54560718ffe82e542e58f4de18164087f891
PiperOrigin-RevId: 832072178

Refactor `proto_reflection_descriptor_database` in util to use `absl::flat_hash_map` and `absl::flat_hash_set` instead of `std::unordered_map` and `std::unordered_set` for potential performance improvements. This also involves including the necessary absl headers and updating the `BUILD` file.

Replace `.insert()` with `.emplace()` for `missing_symbols_` in
`proto_reflection_descriptor_database.cc`.

PiperOrigin-RevId: 831644658

[PH2] Addressing comments from PR:#41012.

PiperOrigin-RevId: 831643529

[github] fix path to generated file (#41054)

I missed this in #38825.

[CI] Add a github workflow for publishing to bcr. (#41027)

Based on https://github.com/bazel-contrib/publish-to-bcr.

---------

Co-authored-by: Sergii Tkachenko <sergiitk@google.com>