This patch adds the following intrinsics (all __arm_streaming only) along with
asm tests for them under the +sme2+fp8 flags:
- svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svbfloat16x2_t svcvt1_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svbfloat16x2_t svcvt2_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svbfloat16x2_t svcvtl1_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svbfloat16x2_t svcvtl2_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
gcc/
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtl1, svcvtl2): Added
new FUNCTIONs.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtl1, svcvtl2): Added new DEF_SVE_FUNCTION_GS_FPM.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtl1, svcvtl2): Added
new function_base.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::resolve_unary): Use group_suffix_id when resolving
C overloads.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve2_fp8_cvt_<fp8_cvt_uns_op><mode>): Added new define_insn.
* config/aarch64/aarch64.h (TARGET_SSME2_FP8): Added new define.
* config/aarch64/iterators.md
(UNSPEC_F1CVTL, UNSPEC_F2CVTL): Added new unspecs.
(FP8CVT_UNS): Extended int_iterator.
(fp8_cvt_uns_op): Likewise.
gcc/testsuite/
* g++.target/aarch64/sme2/aarch64-sme2-acle-asm.exp: Use tuning flag
to reduce churn in testsuites.
* gcc.target/aarch64/sme2/aarch64-sme2-acle-asm.exp: Likewise.
* gcc.target/aarch64/sme2/acle-asm/cvt_mf8_x2.c: Added test file.
* gcc.target/aarch64/sme2/acle-asm/cvtl_mf8_x2.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_X2_WIDE): Added
fpm0 argument for intrinsics.
In a GCC configuration with both AMD and NVIDIA GPU code offloading supported,
and the selected AMD GPU code generation not supporting USM, but a USM-capable
NVIDIA GPU available, I see all test cases that require effective-target
'omp_usm' turn UNSUPPORTED, because:
Executing on host: gcc usm_available_2778376.c [...]
[...]
In function 'main._omp_fn.0':
lto1: warning: Unified Shared Memory is required, but XNACK is disabled
lto1: note: Try -foffload-options=-mxnack=any
gcn mkoffload: warning: conflicting settings; XNACK is forced off but Unified Shared Memory is required
UNSUPPORTED: [...]
That warning is, however, not relevant in the scenario described above: we're
not going to exercise AMD GPU code offloading at run time.
With the effective-target 'omp_usm' check robustified like this, the affected
test cases are then no longer UNSUPPORTED, but of course, there's then the
corollary issue that compilation of the test case itself now emits the very
same warning, which results in the "test for excess errors" FAILing, despite
the execution test PASSing, for example:
FAIL: libgomp.c++/target-std__valarray-concurrent-usm.C (test for excess errors)
PASS: libgomp.c++/target-std__valarray-concurrent-usm.C execution test
That's clearly not ideal either (but is representative of what real-world usage
would run into), but is certainly better than the whole test case turning
UNSUPPORTED. To be continued, I guess...
Andrew Pinski [Tue, 23 Dec 2025 21:30:00 +0000 (13:30 -0800)]
ifcvt: Move noce_try_cond_zero_arith last
I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
would produce worse code than noce_try_cmove_arith.
So we should do noce_try_cond_zero_arith last instead
of before noce_try_cmove_arith.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
Also checked to make sure riscv testcases still work.
gcc/ChangeLog:
* ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
last.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 23 Dec 2025 21:04:28 +0000 (13:04 -0800)]
ifcvt: Only allow scalar integral modes for noce_try_cond_zero_arith [PR123276]
This is the simple fix for PR 123276 where this code can only handle scalar
integral modes. We could in theory handle scalar floating point modes here
too but it is not worth the trouble.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/123276
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Reject non-scalar integral modes.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Nathaniel Shead [Sun, 7 Dec 2025 12:17:15 +0000 (23:17 +1100)]
c++: Non-inline temploid friends should still be COMDAT [PR122819]
Modules allow temploid friends to no longer be implicitly inline, as
functions defined in a class body will not be implicitly inline if
attached to a named module.
This requires us to clean up linkage handling a little bit, mostly by
replacing usages of 'DECL_TEMPLATE_INSTANTIATION' with
'DECL_TEMPLOID_INSTANTIATION' when determining if an entity has vague
linkage.
This caused the friend88.C testcase to miscompile however, as 'foo' was
incorrectly getting 'DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION' set
because it was keeping its tinfo.
This is because 'non_templated_friend_p' was returning 'false', since
the function didn't have a primary template. But that's expected I
think here, so fixed by also returning true for friend declarations
pushed into namespace scope, which still allows dependent nested friends
to be considered templated.
PR c++/122819
gcc/cp/ChangeLog:
* decl.cc (start_preparsed_function): Use
DECL_TEMPLOID_INSTANTIATION instead of
DECL_TEMPLATE_INSTANTIATION to check vague linkage.
* decl2.cc (vague_linkage_p): Likewise.
(c_parse_final_cleanups): Simplify condition.
* pt.cc (non_templated_friend_p): Namespace-scope friend
function declarations without a primary template are still
non-templated.
* semantics.cc (expand_or_defer_fn_1): Also check for temploid
friend functions.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-22.C: New test.
* g++.dg/template/friend88.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
* a68.h (a68_file_size): Changed to use file descriptor.
(a68_file_read): Likewise.
* a68-parser-scanner.cc (a68_file_size): Likewise.
(a68_file_read): Likewise.
(read_source_file): Adapt `a68_file_{size,read}'.
(include_files): Likewise.
* a68-lang.cc (a68_handle_option): Likewise.
* a68-imports.cc (a68_find_export_data): Implement
reading from module's .m68 file if available.
gcc/testsuite/ChangeLog
* algol68/compile/modules/compile.exp (dg-data): New procedure
for writing binary test data to disk.
* algol68/compile/modules/program-m68-lp64.a68: New test which
embeds binary module data.
* algol68/compile/modules/program-m68-llp64.a68: Likewise.
* algol68/compile/modules/program-m68-ilp32.a68: Likewise.
* algol68/compile/modules/program-m68-lp64-be.a68: Likewise.
* algol68/compile/modules/program-m68-llp64-be.a68: Likewise.
Jeff Law [Tue, 23 Dec 2025 20:25:47 +0000 (13:25 -0700)]
[committed][RISC-V][PR target/123274] Add missing condition in usmul<mode>3 pattern
As Andrew P. noted in the BZ, the expander is missing elements in its condition
leading to generation of an insn that can't be matched.
This adds the necessary condition to the usmul<mode>3 expander which in turn
fixes the ICE. I just checked and that expander wasn't in gcc-15, so this is
just a gcc-16 issue.
Tested on riscv32-elf and riscv64-elf. I have a bootstrap in flight on the
Pioneer, but I'm not expecting any surprises. Much like the patch earlier
today, I'm going to push this now rather than wait for pre-commit CI.
Jeff Law [Tue, 23 Dec 2025 19:34:44 +0000 (12:34 -0700)]
[RISC-V][PR target/123278] Handle BF/HF modes in Andes 45 series pipeline description
So a standard run-of-the-mill case where we're testing modes to determine what
reservation to use in a pipeline model and modes were missing (BF/HF in this
case).
This adds the BF/HF cases to the fp_alu_s, fpu_mul_s and fpu_mac_s units for
the Andes 45 series. It may ultimately be the case that even lower latencies
are available for these ops, but that's something folks with a better
understanding of the Andes 45 series uarch would need to tackle.
Tested on riscv32-elf and riscv64-elf. Given the nature of the change and the
fact that I expect to be out of the office most of the next few days, I'm going
to go ahead and push without waiting for pre-commit CI. There's minimal risk.
Milan Tripkovic [Tue, 23 Dec 2025 16:39:41 +0000 (09:39 -0700)]
[RISC-V][PATCH] Adjust clmul latency in Spacemit X60 scheduler model
This patch adjusts the instruction scheduling and cost model for the Zbc
(CLMUL) extension on the Spacemit X60 core.
The tuning was evaluated using three configurations (CLMUL2, CLMUL3,
and the baseline CLMUL5) across a variety of hashing and encryption kernels.
Yuao Ma [Tue, 23 Dec 2025 14:54:34 +0000 (22:54 +0800)]
c++: clarify the comment regarding where the default dialect is set
Since r6-7026-g268be88cbeaba7, the default dialect has been set in
c_common_init_options rather than c_common_post_options. This patch updates the
corresponding comment to reflect that change.
Egas Ribeiro [Fri, 19 Dec 2025 21:34:55 +0000 (21:34 +0000)]
c++: Fix member-like friend detection for non-template classes [PR122550]
member_like_constrained_friend_p was incorrectly returning true for
constrained friend function templates declared in non-template classes,
causing them to be treated as distinct from their forward declarations.
This led to ambiguity errors at call sites.
Per [temp.friend]/9, a constrained friend is only "member-like" (and thus
declares a different function) in two cases:
1. Non-template friends with constraints (must be in a templated class)
2. Template friends whose constraints depend on outer template parameters
In both cases, the enclosing class scope must be templated. The fix adds
a check for CLASSTYPE_IMPLICIT_INSTANTIATION to ensure the friend's
context is actually a class template, not a plain class or explicit
specialization.
PR c++/122550
gcc/cp/ChangeLog:
* decl.cc (member_like_constrained_friend_p): Check that the
friend's enclosing class is an implicit instantiation.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-friend18.C: New test.
* g++.dg/cpp2a/concepts-friend18a.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
Egas Ribeiro [Mon, 22 Dec 2025 22:30:12 +0000 (22:30 +0000)]
c++: Fix ICE on partial specialization redeclaration with mismatched parameters [PR122958]
When a partial specialization was redeclared with different template
parameters, maybe_new_partial_specialization was incorrectly treating it
as the same specialization by only comparing template argument lists
without comparing template-heads. This caused an ICE when the
redeclaration had different template parameters.
Per [temp.spec.partial.general]/2, two partial specializations declare
the same entity only if they have equivalent template-heads and
template argument lists.
Fix by comparing template parameter lists (template-heads) in addition
to template argument lists when checking for existing specializations,
and removing flag_concepts to provide diagnostics before c++20 for the
testcase.
PR c++/122958
gcc/cp/ChangeLog:
* pt.cc (maybe_new_partial_specialization): Compare template
parameter lists when checking for existing specializations and
remove flag_concepts check.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/partial-spec-redecl.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Fix cfg_attr expansion and feature gate attribute handling
Fixes Rust-GCC#4245
gcc/rust/ChangeLog:
* checks/errors/feature/rust-feature-gate.cc (FeatureGate::visit): Added
handling for META_ITEM type attributes to properly process feature gates.
* expand/rust-cfg-strip.cc (expand_cfg_attrs): Fixed a bug where
newly inserted cfg_attr attributes weren't being reprocessed,
and cleaned up the loop increment logic.
Lucas Ly Ba [Fri, 14 Nov 2025 21:07:00 +0000 (21:07 +0000)]
gccrs: refactor unused var lint
gcc/rust/ChangeLog:
* checks/lints/unused-var/rust-unused-var-checker.cc (UnusedVarChecker::visit):
Change unused name warning to unused variable warning.
* checks/lints/unused-var/rust-unused-var-collector.cc (UnusedVarCollector::visit):
Remove useless methods.
* checks/lints/unused-var/rust-unused-var-collector.h: Same here.
* checks/lints/unused-var/rust-unused-var-context.cc (UnusedVarContext::add_variable):
Add used variables to set.
(UnusedVarContext::mark_used): Remove method.
(UnusedVarContext::is_variable_used):
Check if the set contains the hir id linked to a variable.
(UnusedVarContext::as_string): Refactor method for new set.
* checks/lints/unused-var/rust-unused-var-context.h: Refactor methods.
* lang.opt: Change description for unused check flag.
Ryutaro Okada [Sun, 10 Aug 2025 02:24:56 +0000 (19:24 -0700)]
gccrs: implement unused variable checker on HIR.
This change moves the unused variable checker from the type resolver
to HIR. We can now use the HIR Default Visitor, and it will be much
easier to implement other unused lints with this change.
Harishankar [Mon, 24 Nov 2025 20:41:33 +0000 (02:11 +0530)]
gccrs: Fix ICE with continue/break/return in while condition
Fixes Rust-GCC/gccrs#3977
The predicate expression must be evaluated before type checking
to ensure side effects occur even when the predicate has never type.
This prevents skipping function calls, panics, or other side effects
in diverging predicates.
* backend/rust-compile-expr.cc (CompileExpr::visit): Always
evaluate predicate expression before checking for never type
to preserve side effects in while loop conditions.
* typecheck/rust-hir-type-check-expr.cc: Update handling of break/continue.
Egas Ribeiro [Fri, 19 Dec 2025 16:58:58 +0000 (16:58 +0000)]
c++: Fix ICE with lambdas combining explicit and implicit template params [PR117518]
When a lambda with explicit template parameters like []<int> also has
implicit template parameters from auto, and is used as a default
template argument, processing_template_parmlist remained set
from the outer template context. This caused
function_being_declared_is_template_p to incorrectly return false,
leading synthesize_implicit_template_parm to create a new template
scope instead of extending the existing one, resulting in a binding
level mismatch and an ICE in poplevel_class.
Fix by clearing processing_template_parmlist in
cp_parser_lambda_expression alongside the other parser state
save/restore operations.
PR c++/117518
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lambda_expression): Clear
processing_template_parmlist when parsing lambda body.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-targ19.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Xi Ruoyao [Thu, 18 Dec 2025 03:39:38 +0000 (11:39 +0800)]
LoongArch: relax the check for --with-tune
Someone (via a WeChat group) reported that --with-arch=la464
--with-tune=la664 had stopped working after committing the LA32 support.
While this can be treated as a simple logic error (i.e. we may simply
change "loongarch64" in the case statement to an asterisk), IMO we
should just relax the check: at runtime the "unreasonable" combinations
like "-march=la64v1.0 -mtune=loongarch32" or "-march=la664 -mtune=la464"
are allowed (and the second case has been allowed for a long time), and a
combination of --with-arch=A --with-tune=T should be allowed if -march=A
-mtune=T is allowed at runtime.
Also if we consider the fact that --with-tune= and -mtune= only select a
set of heuristic parameters, such combinations may not be so
unreasonable.
gcc/
* config.gcc: Relax the check for LoongArch with_tune.
Andrew Pinski [Tue, 23 Dec 2025 01:58:35 +0000 (17:58 -0800)]
ifcvt: Fix noce_try_cond_zero_arith after get_base_reg change [PR123267]
A few fixes are needed after the change to get_base_reg of r16-6333-gac64ceb33bf05b. First we need to use the correct target mode
of the operand, this means if we are doing a subreg of QI mode, using
QImode for the conditional move.
Second, we still need to use the original operands instead of the ones
with the subreg stripped off.
Pushed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/123267
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Pass the original operands
of a instead of the stripped off values. Use the mode of the operand
which is being used.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr123267-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
AutoFDO: Implement summary information in auto-profile
This patch aims to implement summary support in auto-profile, similar to
LLVM. The summary information stores various information about the
profile being read such as the number of functions, the maximum sample
count, the total number of samples and so on.
It also adds a section called the "detailed summary" which contains a
histogram-based calculation of the minimum execution count for a sample
needed to belong to a specific percentile of samples. This is used to
decide the hot count threshold (which can be controlled with a command
line parameter). The default is any sample belonging to the 99th percentile
being marked as hot.
This patch requires the changes from https://github.com/google/autofdo/pull/251
to work correctly.
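As a rough illustration of the "detailed summary" idea above, the following sketch computes the minimum count a sample needs in order to fall inside a given percentile of all samples. It is not the auto-profile.cc code: the function name min_count_for_percentile, the plain vector of counts, and the sort-and-accumulate approach (instead of a real histogram) are assumptions made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper, not the auto-profile.cc implementation: return the
// smallest sample count that is still needed to cover `percent` of the total
// number of samples when counts are accumulated from hottest to coldest.
uint64_t min_count_for_percentile(std::vector<uint64_t> counts, double percent)
{
  assert(percent > 0.0 && percent <= 100.0);

  // Visit the hottest counts first.
  std::sort(counts.begin(), counts.end(), std::greater<uint64_t>());

  uint64_t total = 0;
  for (uint64_t c : counts)
    total += c;

  // Number of samples that must be covered to reach the percentile.
  const double needed = total * (percent / 100.0);

  uint64_t accumulated = 0;
  for (uint64_t c : counts)
    {
      accumulated += c;
      if (accumulated >= needed)
        return c;   // Every count >= c would be treated as hot.
    }
  return 0;   // Empty profile: there is no threshold to report.
}
```

With counts {900, 80, 15, 5} and a 99th percentile cutoff, the three largest counts are needed to cover 990 of the 1000 samples, so the sketch reports 15 as the hot count threshold.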
* auto-profile.cc (string_table::~string_table): Update to free
original_names_map_.
(string_table::original_names_map_): New member.
(string_table::clashing_names_map_): Likewise.
(string_table::get_original_name): New function.
(string_table::read): Figure out clashes while reading.
(autofdo_source_profile::offline_external_functions): Call
get_original_name.
Nathaniel Shead [Thu, 4 Dec 2025 13:03:46 +0000 (00:03 +1100)]
c++/modules: Ignore exposures in lambdas in initializers [PR122994]
As the PR rightly points out, a lambda is not really a declaration in
and of itself by the standard, and so a lambda only used in a context
where exposures are ignored should not itself cause an error.
This patch implements this by way of a new flag set on deps that are
first found in an ignored context. This flag gets cleared if we ever
see the dep in a context where exposures are not ignored. Then, while
walking a declaration with this flag set, we re-establish an ignored
context. This is done for all decls (not just lambdas) to handle
block-scope classes as well.
Additionally, we prevent walking of attached declarations for a
DECL_MODULE_KEYED_DECLS_P entity during dependency gathering, so that we
don't think we've seen the decl at this point. This means we may not
have an appropriate entity to stream for this walk; to prevent any
potential issues with merging we stream a NULL_TREE 'hole' in the vector
and handle this carefully on import.
This requires a small amount of testsuite adjustment because we no
longer diagnose errors we used to. Because our ABI for inline variables
with dynamic initialization is to just do the initialization in the
module's initializer function (and importers only perform the static
initialization) we don't bother to walk the definition of inline
variables containing lambdas and so don't see the exposures, despite
us considering TU-local entities in static initializers of inline
variables being exposures (see PR c++/119551). This is legal by the
current wording of the standard, which does not consider the definition
of any variable to be an exposure (even an inline one).
PR c++/122994
gcc/cp/ChangeLog:
* module.cc (depset::disc_bits): New flag
DB_IGNORED_EXPOSURE_BIT.
(depset::is_ignored_exposure_context): New getter.
(depset::hash::ignore_tu_local): Rename to...
(depset::hash::ignore_exposure): ...this, and make private.
(depset::hash::hash): Rename ignore_tu_local.
(depset::hash::ignore_exposure_if): New function.
(trees_out::decl_value): Don't build deps for keyed entities.
(trees_in::decl_value): Handle missing keys.
(trees_out::write_function_def): Use ignore_exposure_if.
(trees_out::write_var_def): Likewise.
(trees_out::write_class_def): Likewise.
(depset::hash::make_dependency): Set DB_IGNORED_EXPOSURE_BIT if
appropriate, or clear it otherwise.
(depset::hash::add_dependency): Rename ignore_tu_local.
(depset::hash::find_dependencies): Set ignore_exposure if in
such a context.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-17_b.C: Use functions and internal
types rather than lambdas.
* g++.dg/modules/internal-4_b.C: Correct expected result.
* g++.dg/modules/internal-20_a.C: New test.
* g++.dg/modules/internal-20_b.C: New test.
* g++.dg/modules/internal-20_c.C: New test.
* g++.dg/modules/internal-21_a.C: New test.
* g++.dg/modules/internal-21_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Steve Kargl [Mon, 22 Dec 2025 02:32:46 +0000 (18:32 -0800)]
fortran [PR122957] DTIO incompatibility with -fdefault-integer-8
The -fdefault-integer-8 option is optional to assist with legacy
fortran codes. It is not a Standard requirement and is not
compatible with the newer user defined derived type I/O.
PR fortran/122957
gcc/fortran/ChangeLog:
* interface.cc (gfc_match_generic_spec): Issue an error
so that users do not use -fdefault-integer-8 with DTIO.
Harald Anlauf [Mon, 22 Dec 2025 20:05:29 +0000 (21:05 +0100)]
Fortran: fix variable definition context checks for SELECT TYPE [PR123253]
Commit r16-6300 introduced a regression when checking the variable
definition context of SELECT TYPE variables where the selector was not a
dummy argument as the scan for the association target was too shallow.
Scan through association lists for the ultimate selector.
PR fortran/123253
gcc/fortran/ChangeLog:
* expr.cc (gfc_check_vardef_context): Replace simple check by a
scan through the association targets for a dummy argument.
gcc/testsuite/ChangeLog:
* gfortran.dg/associate_76.f90: Extended testcase.
* gfortran.dg/associate_77.f90: New test.
Tomasz Kamiński [Mon, 22 Dec 2025 10:53:45 +0000 (11:53 +0100)]
libstdc++/doc: Document generate_canonical and variant compat macros.
The _GLIBCXX_USE_OLD_GENERATE_CANONICAL was introduced by r16-6177-g866bc8a9214b1d that implemented P0952R2 [1] resolution
for LWG2524 as DR against C++20.
The _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI was introduced by r16-6301-gb3c167b61fd75f that resolved PR112591.
Eric Botcazou [Mon, 22 Dec 2025 17:50:59 +0000 (18:50 +0100)]
Ada: Fix bogus component visibility error for class-wide type in generic
The problem is that Analyze_Overloaded_Selected_Component does:
-- If the prefix is a class-wide type, the visible components
-- are those of the base type.
if Is_Class_Wide_Type (T) then
T := Etype (T);
end if;
and Resolve_Selected_Component does:
-- The visible components of a class-wide type are those of
-- the root type.
if Is_Class_Wide_Type (T) then
T := Etype (T);
end if;
while Analyze_Selected_Component does:
-- For class-wide types, use the entity list of the root type
if Is_Class_Wide_Type (Prefix_Type) then
Type_To_Use := Root_Type (Prefix_Type);
end if;
when faced with a selected component. So the 3rd goes to the root type, the
1st to the base type, and the 2nd wants to do like the 3rd but ends up doing
like the 1st! This does not change anything for the class-wide type itself,
but does for its class-wide subtypes. The correct processing is the 3rd.
gcc/ada/
PR ada/123185
* sem_ch4.adb (Analyze_Overloaded_Selected_Component): Go to the
root when the prefix has a class-wide type.
* sem_res.adb (Resolve_Selected_Component): Likewise.
gcc/testsuite/
* gnat.dg/specs/class_wide1.ads: New test.
Jeff Law [Mon, 22 Dec 2025 17:54:05 +0000 (10:54 -0700)]
[RISC-V][V2] Improve spill code for RVV slightly to fix regressions after recent changes
Surya's recent patch for hard register propagation has caused regressions on
the RISC-V port for the various spill-* testcases. After reviewing the newer
generated code it was clear the new code was worse.
The core problem is we have a copy insn that is not frame related (and should
not be frame related) and a use of the destination of the copy in an insn that
is frame related. Prior to Surya's change we could propagate away the copy,
but not anymore.
Ideally we'd just avoid generating the copy entirely, but the structure of the
code to legitimize a poly_int isn't well suited for that. So instead we have
the code signal that it created a trivial copy and we try to optimize the code
after creation, but well before regcprop would have run. That fixes the code
quality aspect of the regression. In fact, it looks like the code can at times
be slightly better, but I didn't track down the precise reason why we were able
to re-use the read of VLEN so much better than before.
The optimization step is pretty simple. When it's been signaled that a copy was
generated, look back one insn and change it from writing the scratch register
to write the final destination instead.
That triggers the need to generalize the testcases so that they don't use
specific registers. We can also see the csr reads of the VLEN register getting
CSE'd more often in those testcases, so they're adjusted for that change as
well. There's some hope this will improve spill code more generally -- I
haven't really evaluated that, but I do know that when we spill vector
registers, the resulting code seems to have a lot of redundant VLEN reads.
Anyway, bootstrapped and regression tested on riscv (BPI and Pioneer). It's
also been through rv32 and rv64 regression testing. It doesn't fix all the
regressions for RISC-V on the trunk because (of course) something new got
introduced this week ;(
[ This is the spill-7 part of my last commit. After reviewing the logs from
the pre-commit system, it's good. ]
Vineet Gupta [Mon, 22 Dec 2025 16:54:10 +0000 (08:54 -0800)]
ifcvt: cond zero arith: handle subreg for shift count
Some backends, RISC-V included, wrap shift counts in subreg which
current cond zero arith wasn't handling.
This came up when looking at the original submission of cond zero
arith, which did handle subregs, but that handling was omitted for initial
simplicity and then got lost along the way.
Vineet Gupta [Mon, 22 Dec 2025 16:54:06 +0000 (08:54 -0800)]
ifcvt: cond zero arith: elide short forward branch for signed GE 0 comparison [PR122769]
        Before          |        After
------------------------+----------------------
  bge a0,zero,.L2       | slti a0,a0,0
                        | czero.eqz a0,a0,a0
  xor a1,a1,a3          | xor a0,a0,a0
.L2                     |
  mv a0,a1              |
  ret                   | ret
This is what all the prev NFC patches have been preparing to get to.
Currently the cond arith code only handles EQ/NE zero conditions, missing
ifcvt optimization for cases such as GE zero, as shown in the example above.
This is due to the limitation of noce_emit_czero () so switch to
noce_emit_cmove () which can handle conditions other than EQ/NE and
if needed generate additional supporting insns such as SLT.
This also allows us to remove the constraint at the entry to limit to EQ/NE
conditions, improving ifcvt outcomes in general.
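For readers less familiar with the transformation, here is a source-level sketch of what eliding the short forward branch amounts to. It is not the pr122769.c testcase and not what ifcvt emits literally; the function names are made up for illustration, and the comments only map loosely onto the slti/czero.eqz sequence described above.

```cpp
#include <cstdint>

// Branchy form: the xor is skipped when a >= 0, which is the shape a
// short forward branch such as "bge a0,zero,.L2" implements.
int64_t xor_if_negative_branchy(int64_t a, int64_t b, int64_t c)
{
  if (a < 0)
    b ^= c;
  return b;
}

// Branchless form: zero the second operand when the condition is false
// and perform the xor unconditionally, relying on x ^ 0 == x.
int64_t xor_if_negative_branchless(int64_t a, int64_t b, int64_t c)
{
  const int64_t is_negative = a < 0;           // role of the SLT-style compare
  const int64_t masked = is_negative ? c : 0;  // role of the conditional zero
  return b ^ masked;
}
```

Both functions return the same value for every input; the second merely expresses the condition as data (a possibly zeroed operand) rather than as control flow, which is the property the cond zero arith conversion relies on.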
PR target/122769
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Use noce_emit_cmove.
Delete noce_emit_czero () no longer used.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr122769.c: New test.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Vineet Gupta [Mon, 22 Dec 2025 16:52:07 +0000 (08:52 -0800)]
ifcvt: cond zero arith: opencode helper noce_bbs_ok_for_cond_zero_arith [NFC]
This makes the code more readable by eliminating a bunch of pointer
intermediaries which obfuscate if_info items needed later in
noce_try_cond_zero_arith (). And while here add some top level comments
about what cond zero arith actually does.
gcc/ChangeLog:
* ifcvt.cc (noce_bbs_ok_for_cond_zero_arith): Move logic out.
(noce_try_cond_zero_arith): Into here.
Jeff Law [Mon, 22 Dec 2025 16:47:26 +0000 (09:47 -0700)]
[RISC-V][V2] Improve spill code for RVV slightly to fix regressions after recent changes
Surya's recent patch for hard register propagation has caused regressions on
the RISC-V port for the various spill-* testcases. After reviewing the newer
generated code it was clear the new code was worse.
The core problem is we have a copy insn that is not frame related (and should
not be frame related) and a use of the destination of the copy in an insn that
is frame related. Prior to Surya's change we could propagate away the copy,
but not anymore.
Ideally we'd just avoid generating the copy entirely, but the structure of the
code to legitimize a poly_int isn't well suited for that. So instead we have
the code signal that it created a trivial copy and we try to optimize the code
after creation, but well before regcprop would have run. That fixes the code
quality aspect of the regression. In fact, it looks like the code can at times
be slightly better, but I didn't track down the precise reason why we were able
to re-use the read of VLEN so much better than before.
The optimization step is pretty simple. When it's been signaled that a copy was
generated, look back one insn and change it from writing the scratch register
to write the final destination instead.
That triggers the need to generalize the testcases so that they don't use
specific registers. We can also see the csr reads of the VLEN register getting
CSE'd more often in those testcases, so they're adjusted for that change as
well. There's some hope this will improve spill code more generally -- I
haven't really evaluated that, but I do know that when we spill vector
registers, the resulting code seems to have a lot of redundant VLEN reads.
Anyway, bootstrapped and regression tested on riscv (BPI and Pioneer). It's
also been through rv32 and rv64 regression testing. It doesn't fix all the
regressions for RISC-V on the trunk because (of course) something new got
introduced this week ;(
I didn't include the spill-7.c change from either version of the patch. It
didn't fix the regression in pre-commit CI, so I'll chase that down
independently.
gcc/
* config/riscv/riscv.cc (riscv_expand_mult_with_const_int): Signal
when this creates a simple copy that may be optimized.
(riscv_legitimate_poly_move): Try to optimize away any copy created
by riscv_expand_mult_with_const_int.
* a68-parser-scanner.cc (a68_file_size): Fix comment to mention
it accepts `FILE *' and not file descriptor.
Fix invocation of `lseek' to correctly revert position of file
offset to previous one.
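The lseek detail in the entry above is easy to get wrong, so here is a minimal POSIX sketch of measuring a file's size while putting the file offset back where it was. This is not the a68_file_size source: the name file_size_keep_offset and the choice to take a plain file descriptor are assumptions for this example only.

```cpp
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not the a68_file_size source: return the size of
// the file behind `fd`, or -1 on error, leaving the file offset unchanged.
off_t file_size_keep_offset(int fd)
{
  // Remember the current absolute offset.
  const off_t saved = lseek(fd, 0, SEEK_CUR);
  if (saved == (off_t) -1)
    return -1;

  // Seeking to the end returns the end offset, i.e. the file size.
  const off_t size = lseek(fd, 0, SEEK_END);
  if (size == (off_t) -1)
    return -1;

  // Restore with SEEK_SET and the saved absolute offset; a relative seek
  // from the end here would land in the wrong place.
  if (lseek(fd, saved, SEEK_SET) == (off_t) -1)
    return -1;

  return size;
}
```

The key point is that the restore step uses the absolute offset captured before the measurement, so a caller that was part-way through reading continues from the same position.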
Harald Anlauf [Sun, 21 Dec 2025 22:03:28 +0000 (23:03 +0100)]
fortran: fix testsuite regression for gfortran.dg/value_9.f90 [PR123201]
Commit r16-3499 introduced a regression on targets where truncation of a
string argument passed to a CHARACTER(len=1),VALUE dummy argument missed
the special treatment needed for passing single characters.
PR fortran/123201
gcc/fortran/ChangeLog:
* trans-expr.cc (conv_dummy_value): Convert string of length 1 to a
single character for passing as actual argument.
Jerry DeLisle [Sun, 21 Dec 2025 21:33:15 +0000 (13:33 -0800)]
fortran: [PR121472] Fix ICE with constructor for finalized zero-size type.
When a derived type has a final subroutine and a constructor interface,
but is effectively zero-sized, the gimplifier fails on the finalization
code. The existing check for empty types (!derived->components) only
catches completely empty types, not types with empty components.
Replace with a tree-level TYPE_SIZE_UNIT check that catches all
zero-size cases.
PR fortran/121472
gcc/fortran/ChangeLog:
* trans.cc (gfc_finalize_tree_expr): Replace !derived->components
check with TYPE_SIZE_UNIT check for zero-size types.
Tamar Christina [Sun, 21 Dec 2025 08:27:13 +0000 (08:27 +0000)]
vect: use wider precision type for generating early break scalar IV [PR123089]
In the PR we see that the new scalar IV tricks other passes into thinking there
is an overflow, due to the use of a signed counter:
The loop is known to iterate 8191 times and we have a VF of 8 and it starts
at 2.
The codegen out of the vectorizer is the same as before, except we now have a
scalar variable counting the scalar iteration count vs a vector one.
i.e. we have
_45 = _39 + 8;
vs
_46 = _45 + { 16, 16, 16, 16, ... }
We pick a lower VF now since costing allows it, but that's not important.
When we get to cunroll since the value is now scalar, it sees that 8 * 8191
would overflow a signed short and so it changes the loop bounds to the largest
possible signed value and then uses this to elide the ivtmp_50 < 8191 as always
true and so you get an infinite loop:
Analyzing # of iterations of loop 1
exit condition [1, + , 1](no_overflow) < 8191
bounds on difference of bases: 8190 ... 8190
result:
# of iterations 8190, bounded by 8190
Statement (exit)if (ivtmp_50 < 8191)
is executed at most 8190 (bounded by 8190) + 1 times in loop 1.
Induction variable (signed short) 8 + 8 * iteration does not wrap in statement
_45 = _39 + 8;
in loop 1.
Statement _45 = _39 + 8;
is executed at most 4094 (bounded by 4094) + 1 times in loop 1.
The signed type was originally chosen because of the negative offset we use when
adjusting for peeling for alignments with masks. However this then introduces
issues as we see here with signed overflow. This patch instead determines the
smallest possible unsigned type for use by the scalar IV where the overflow
won't happen when we include the extra bit for the sign. i.e. if the scalar IV
is an unsigned 8-bit value we pick a signed 16-bit type. But if it is a signed 8-bit
value we pick an unsigned 8-bit type.
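As a rough illustration of the arithmetic involved (this is not the code added to
tree-vect-stmts.cc; the helper name min_iv_bits and the rounding to power-of-two
widths are assumptions made for the sketch), the numbers from the PR work out as:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power-of-two width (8..64 bits) able to hold
// max_value, optionally reserving one extra bit for a sign.
constexpr unsigned min_iv_bits(std::uint64_t max_value, bool need_sign_bit) {
    unsigned bits = 0;
    while (max_value != 0) {      // count the bits needed for the magnitude
        ++bits;
        max_value >>= 1;
    }
    if (need_sign_bit)
        ++bits;
    unsigned width = 8;
    while (width < bits)
        width *= 2;
    return width;
}

// The PR's numbers: a step of 8 over 8191 iterations reaches 65528.
constexpr std::uint64_t last_value = 8u * 8191u;
static_assert(last_value > INT16_MAX, "does not fit in a signed short");
static_assert(min_iv_bits(last_value, false) == 16, "fits an unsigned 16-bit IV");
static_assert(min_iv_bits(last_value, true) == 32, "32 bits once a sign bit is reserved");
```

The sketch only shows why a signed short counter is too narrow here; the actual
choice of type also depends on whether the SMAX is generated.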
We use the initial niters value to determine the smallest size possible, to
prevent cases such as a 64-bit IV in the code needing a TImode
counter. I also only require the additional bit when I know we'll be generating
the SMAX. I've now moved this to vectorizable_early_exit such that if we do
end up needing something like TImode that we don't vectorize if the target
doesn't support it.
I've also added some testcases for masking around the boundary values. I've
only added them for char to reduce the runtime of the tests.
gcc/ChangeLog:
PR tree-optimization/123089
* tree-vect-loop.cc (vect_update_ivs_after_vectorizer_for_early_breaks):
Add conversion if required.  Note that if we did truncate, the original
scalar loop had an overflow here anyway.
(vect_get_max_nscalars_per_iter): Expose.
* tree-vect-stmts.cc (vect_compute_type_for_early_break_scalar_iv): New.
(vectorizable_early_exit): Find smallest type where we won't have UB in
the signed IV and store it.
* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_IV_TYPE): New.
(class _loop_vec_info): Add early_break_iv_type.
(vect_min_prec_for_max_niters): New.
* tree-vect-loop-manip.cc (vect_do_peeling): Use it.
gcc/testsuite/ChangeLog:
PR tree-optimization/123089
* gcc.dg/vect/vect-early-break_141-pr123089.c: New test.
* gcc.target/aarch64/sve/peel_ind_14.c: New test.
* gcc.target/aarch64/sve/peel_ind_14_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_15.c: New test.
* gcc.target/aarch64/sve/peel_ind_15_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_16.c: New test.
* gcc.target/aarch64/sve/peel_ind_16_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_17.c: New test.
* gcc.target/aarch64/sve/peel_ind_17_run.c: New test.
Andrew Pinski [Sat, 20 Dec 2025 20:00:36 +0000 (12:00 -0800)]
extension: Fix documentation for __builtin_*_overflow_p [PR123222]
This fixes the copy-and-pasto for these builtins.
Basically the documentation currently says "addition" as that was copied from
__builtin_add_overflow documentation but really it should say corresponding operation
instead.
Pushed as obvious.
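For reference, a small GCC-specific sketch of the three builtins; each one reports
overflow for its own operation, and the third argument only supplies the result
type (the wrapper names are made up for the example):

```cpp
#include <cassert>
#include <climits>

// The third argument is not used for its value; only its type matters.
inline bool add_overflows_int(int a, int b) {
    return __builtin_add_overflow_p(a, b, (int) 0);
}
inline bool sub_overflows_int(int a, int b) {
    return __builtin_sub_overflow_p(a, b, (int) 0);
}
inline bool mul_overflows_int(int a, int b) {
    return __builtin_mul_overflow_p(a, b, (int) 0);
}
```

For example, add_overflows_int (INT_MAX, 1) is true while
mul_overflows_int (INT_MAX, 1) is false.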
PR middle-end/123222
gcc/ChangeLog:
* doc/extend.texi: Fix copy-and-pasto for __builtin_*_overflow_p.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jose E. Marchesi [Sat, 20 Dec 2025 14:59:50 +0000 (15:59 +0100)]
a68: fix layout of incomplete types
Apparently there is some case where the c_union of a union may be
incomplete and the containing union complete. At this point I don't
fully understand how that is possible and the laying out of modes
should probably be rethought, but for now fix this corner case.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-low-moids.cc (a68_lower_moids): Fix for layout of
incomplete types.
Nathaniel Shead [Fri, 14 Nov 2025 23:34:36 +0000 (10:34 +1100)]
c++: Implement dependent ADL for use with modules
[module.global.frag] p3.3 says "A declaration D is decl-reachable from a
declaration S in the same translation unit if ... S contains a dependent
call E ([temp.dep]) and D is found by any name lookup performed for an
expression synthesized from E by replacing each type-dependent argument
or operand with a value of a placeholder type with no associated
namespaces or entities".
This requires doing partial ADL on dependent calls, in case there are
non-dependent arguments that would cause new functions to become
decl-reachable. This patch implements this with an additional lookup
during modules streaming to find any such entities.
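A minimal non-modules sketch of the kind of call this is about (the names lib, Tag,
handle and call are hypothetical): the call is dependent because of `t', but the
non-dependent lib::Tag argument is what makes ADL find lib::handle.

```cpp
#include <cassert>

namespace lib {
struct Tag {};

// Found only by ADL, through the non-dependent lib::Tag argument.
template <typename T>
int handle(T, Tag) { return 1; }
}

template <typename T>
int call(T t) {
    // Dependent call: `t` is type-dependent, `lib::Tag{}` is not.
    return handle(t, lib::Tag{});
}
```

In the modules case, lib::handle would sit in a global module fragment and has to
be treated as decl-reachable from call for an importer's call (42) to work.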
This causes us to do ADL in more circumstances; this means also that we
might instantiate templates in cases we didn't use to. This could cause
issues given we have already started our modules walk at this point, or
break any otherwise valid existing code. To fix this, the patch adds a flag
to do a "tentative" ADL pass which doesn't attempt to complete any types
(and hence cause instantiations to occur); this means that we might miss
some associated entities however. During a tentative walk we can also
skip entities that we know won't contribute to the missing
decl-reachable set, as an optimisation.
One implementation limitation is that both modules tree walking and
name lookup marks tree nodes as TREE_VISITED for different purposes; to
avoid conflicts this patch caches calls that will require lookup in a
separate worklist to be processed after the walk is done.
PR c++/122712
gcc/cp/ChangeLog:
* module.cc (depset::hash::dep_adl_info): New type.
(depset::hash::dep_adl_entity_list): New work list.
(depset::hash::hash): Create it.
(depset::hash::~hash): Release it.
(trees_out::tree_value): Cache possibly dependent
calls during tree walk.
(depset::hash::add_dependent_adl_entities): New function.
(depset::hash::find_dependencies): Process cached entities.
* name-lookup.cc (name_lookup::tentative): New member.
(name_lookup::name_lookup): Initialize it.
(name_lookup::preserve_state): Propagate tentative from previous
lookup.
(name_lookup::adl_namespace_fns): Don't search imported bindings
during tentative lookup.
(name_lookup::adl_class): Don't attempt to complete class types
during tentative lookup.
(name_lookup::search_adl): Skip type-dependent args and avoid
unnecessary work during tentative lookup.
(lookup_arg_dependent): Add tentative parameter.
* name-lookup.h (lookup_arg_dependent): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/adl-12_a.C: New test.
* g++.dg/modules/adl-12_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Jakub Jelinek [Sat, 20 Dec 2025 11:04:36 +0000 (12:04 +0100)]
c++: Ignore access in is_implicit_lifetime trait decisions [PR122690]
I've implemented the non-aggregate part of is_implicit_lifetime
paper according to the paper's comment how it can be implemented, i.e.
the std::conjunction from
template<typename T>
struct is_implicit_lifetime : std::disjunction<
std::is_scalar<T>,
std::is_array<T>,
std::is_aggregate<T>,
std::conjunction<
std::is_trivially_destructible<T>,
std::disjunction<
std::is_trivially_default_constructible<T>,
std::is_trivially_copy_constructible<T>,
std::is_trivially_move_constructible<T>>>> {};
in the paper. But as reported in PR122690, the actual wording in the
paper is different from that, the
https://eel.is/c++draft/class.prop#16.2 part of it:
"it has at least one trivial eligible constructor and a trivial,
non-deleted destructor" doesn't say anything about accessibility
of those ctors or dtors, only triviality, not being deleted and eligibility.
My understanding is that GCC handles the last 2 bullets of
https://eel.is/c++draft/special#6 by not adding ctors ineligible because
of those into the overload at all, and for testing deleted cdtors
I need to lazily declare them in case such synthesis makes them
deleted.
So, this patch first checks for the easy cases (where the flags on the
type say the dtor is non-trivial or all the 3 special member ctors are
non-trivial) and if not, lazily declares them if needed and checks if they
are trivial and non-deleted.
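To illustrate why the library-trait formulation differs from the wording, here is a
small sketch (the class Hidden and the helper approx_trivial_ctor_and_dtor are made
up for the example; the helper mirrors only the std::conjunction part quoted above):

```cpp
#include <cassert>
#include <type_traits>

// All three special constructors are trivial and not deleted, but private.
class Hidden {
    Hidden() = default;
    Hidden(const Hidden&) = default;
    Hidden(Hidden&&) = default;
};

// The library traits are access-sensitive, so this formulation rejects Hidden
// even though accessibility is not mentioned in the quoted wording.
template <typename T>
struct approx_trivial_ctor_and_dtor
    : std::integral_constant<bool,
          std::is_trivially_destructible<T>::value &&
          (std::is_trivially_default_constructible<T>::value ||
           std::is_trivially_copy_constructible<T>::value ||
           std::is_trivially_move_constructible<T>::value)> {};

static_assert(std::is_trivially_destructible<Hidden>::value, "dtor is public and trivial");
static_assert(!approx_trivial_ctor_and_dtor<Hidden>::value, "private ctors are not usable");
```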
2025-12-20 Jakub Jelinek <jakub@redhat.com>
PR c++/122690
* tree.cc (implicit_lifetime_type_p): Don't test is_trivially_xible,
instead try to lazily declare dtor and default, copy and move ctors
if needed and check for their triviality and whether they are
deleted.
Jakub Jelinek [Sat, 20 Dec 2025 10:59:19 +0000 (11:59 +0100)]
i386: Fix up handling of some -mno-avx512* options [PR123216]
This PR is about -mavx10.2 -mno-avx512vl ICE on some builtin.
Though, because -mavx10.2 implies -mavx512vl (among many others), the
pattern is right and doesn't need to care about such weird cases.
What is wrong is the handling of -mno-avx512vl and various other options,
that should unset -mavx10.1 and that should unset -mavx10.2, but it doesn't.
I went through various ISAs which 10.1 or 10.2 implies, looking for the
ISA{,2}_*_SET and corresponding ISA{,2}_*_UNSET macros and their use or lack
thereof.
Here is what I found.
OPTION_MASK_ISA_AVX512FP16_UNSET has been incorrectly defined (avx512fp16
implies avx512bw, not the other way around), but fortunately wasn't used.
And then various ISAs implied by -mavx10.1 (except for -mavx512f which was
correct) missed clearing -mavx10.{1,2} on -mno-* handling.
As mentioned in the PR, it would be really nice to add some verification of
the set and unset macros to verify consistency.
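A simplified model of the set/unset relationship (the bit names and the single
flag word are assumptions; the real macros are split across ix86_isa_flags and
ix86_isa_flags2): a SET mask pulls in what a feature implies, and an UNSET mask
must clear everything that implies the feature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits, not the real OPTION_MASK_ISA* values.
enum : std::uint32_t {
    F_AVX512F  = 1u << 0,
    F_AVX512VL = 1u << 1,
    F_AVX10_1  = 1u << 2,
    F_AVX10_2  = 1u << 3,
};

// SET masks include everything the feature implies.
constexpr std::uint32_t SET_AVX512VL = F_AVX512VL | F_AVX512F;
constexpr std::uint32_t SET_AVX10_1  = F_AVX10_1 | SET_AVX512VL;
constexpr std::uint32_t SET_AVX10_2  = F_AVX10_2 | SET_AVX10_1;

// UNSET masks include everything that implies the feature.
constexpr std::uint32_t UNSET_AVX10_2  = F_AVX10_2;
constexpr std::uint32_t UNSET_AVX10_1  = F_AVX10_1 | UNSET_AVX10_2;
constexpr std::uint32_t UNSET_AVX512VL = F_AVX512VL | UNSET_AVX10_1;

constexpr std::uint32_t apply_option(std::uint32_t flags, std::uint32_t set,
                                     std::uint32_t unset) {
    return (flags | set) & ~unset;
}

// -mavx10.2 -mno-avx512vl: neither AVX10.x level may survive without AVX512VL.
static_assert(apply_option(apply_option(0, SET_AVX10_2, 0), 0, UNSET_AVX512VL)
                  == F_AVX512F,
              "only AVX512F is left enabled");
```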
2025-12-20 Jakub Jelinek <jakub@redhat.com>
PR target/123216
* common/config/i386/i386-common.cc (OPTION_MASK_ISA_AVX512FP16_UNSET):
Remove unused macro.
(OPTION_MASK_ISA2_AVX512FP16_UNSET, OPTION_MASK_ISA2_AVX512BF16_UNSET,
OPTION_MASK_ISA2_AVX512BW_UNSET): Or in OPTION_MASK_ISA2_AVX10_1_UNSET.
(OPTION_MASK_ISA2_AVX512CD_UNSET, OPTION_MASK_ISA2_AVX512DQ_UNSET,
OPTION_MASK_ISA2_AVX512VL_UNSET, OPTION_MASK_ISA2_AVX512IFMA_UNSET,
OPTION_MASK_ISA2_AVX512VNNI_UNSET,
OPTION_MASK_ISA2_AVX512VPOPCNTDQ_UNSET,
OPTION_MASK_ISA2_AVX512VBMI_UNSET, OPTION_MASK_ISA2_AVX512VBMI2_UNSET,
OPTION_MASK_ISA2_AVX512BITALG_UNSET): Define.
(ix86_handle_option): For
-mno-avx512{cd,dq,vl,ifma,vnni,vpopcntdq,vbmi,vbmi2,bitalg} also remove
corresponding OPTION_MASK_ISA2_AVX512*_UNSET from ix86_isa_flags2
and add it to ix86_isa_flags2_explicit.
Jakub Jelinek [Sat, 20 Dec 2025 10:58:25 +0000 (11:58 +0100)]
i386: Fix up expansion of 2 keylocker and one user_msr builtin [PR123217]
target can be especially at -O0 a MEM, not just a REG, and most of the
ix86_expand_builtin spots which use target and can't support MEM
destinations deal with it properly, except these 3 spots don't.
Fixed thusly, when we change target to a new pseudo, the caller will
take care of storing that pseudo into the MEM, and this is the
solution other spots with similar requirements use in the function.
2025-12-20 Jakub Jelinek <jakub@redhat.com>
PR target/123217
* config/i386/i386-expand.cc (ix86_expand_builtin)
<case IX86_BUILTIN_ENCODEKEY128U32, case IX86_BUILTIN_ENCODEKEY256U32,
case IX86_BUILTIN_URDMSR>: Set target to a new pseudo even if it is
non-NULL but doesn't satisfy register_operand predicate.
* gcc.target/i386/keylocker-pr123217.c: New test.
* gcc.target/i386/user_msr-pr123217.c: New test.
The `recompute_dominator' function used in the code fragment within
this patch assumes correctness in the rest of the CFG. Consequently,
it is wrong to rely upon it before the subsequent updates are made in
the "Update dominators for multiple exits" loop in the function.
Furthermore, if `loop_exit' == `scalar_exit', the "Update dominators for
multiple exits" logic will already take care of updating the
dominator for `scalar_exit->dest', such that the moved statement is
unnecessary.
gcc/ChangeLog:
PR tree-optimization/123152
* tree-vect-loop-manip.cc
(slpeel_tree_duplicate_loop_to_edge_cfg): Correct order of
dominator update.
Jakub Jelinek [Fri, 19 Dec 2025 22:10:36 +0000 (23:10 +0100)]
fortran, openmp: Add default: clause in order to avoid -Wmaybe-uninitialized warning
While the enum has only 4 enumerators and all of them are listed, in theory values
with the enum type could contain other values and so without default:
-Wmaybe-uninitialized warning for the s variable can happen.
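The pattern looks roughly like this (a standalone sketch with a hypothetical enum;
std::abort () stands in for gcc_unreachable ()):

```cpp
#include <cassert>
#include <cstdlib>

enum class Kind { A, B, C, D };   // hypothetical four-value enum

char kind_letter(Kind k) {
    char s;
    switch (k) {
    case Kind::A: s = 'a'; break;
    case Kind::B: s = 'b'; break;
    case Kind::C: s = 'c'; break;
    case Kind::D: s = 'd'; break;
    default:
        // Stand-in for gcc_unreachable (): without this arm, a value outside
        // the enumerators would reach the return with `s` unset.
        std::abort();
    }
    return s;
}
```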
2025-12-19 Jakub Jelinek <jakub@redhat.com>
* dump-parse-tree.cc (show_omp_clauses): Add default: with
gcc_unreachable () to avoid spurious -Wmaybe-uninitialized warnings.
Jakub Jelinek [Fri, 19 Dec 2025 22:09:06 +0000 (23:09 +0100)]
Some further comment typos
This patch attempts to fix various comment typos (inspired by Gemini AI
producing a list of typos for the dwarf2out.cc, gimplify.cc and combine.cc
files, then a manual grep for all the occurrences and changing them case
by case; e.g. there was one correct use of "recourse" elsewhere, I believe).
Tomasz Kamiński [Thu, 11 Dec 2025 09:43:44 +0000 (10:43 +0100)]
libstdc++: Use union to store non-trivially destructible types in C++17 mode [PR112591]
This patch disables use of specialization _Uninitialized<_Type, false> for
non-trivially destructible types by default in C++17, and falls back to
the primary template, which stores the type in a union directly. This makes the
ABI consistent between C++17 and C++20 (or later). This partial specialization
is no longer required after the changes introduced in r16-5961-g09bece00d0ec98.
This fixes non-conformance in C++17 mode where global variables of a variant
specialization type were not statically initialized for non-trivially
destructible types, even if initialization of the selected alternative could
be performed at compile time. For illustration, the following global variable
will be statically initialized after this change:
std::variant<std::unique_ptr<T>, std::unique_ptr<U>> ptr;
This constitutes an ABI break, and changes the layout of types that use
the same non-trivially copyable type both as a base class and as an alternative
of a variant object that is the first member:
struct EmptyNonTrivial { ~EmptyNonTrivial(); };
struct Affected : EmptyNonTrivial {
std::variant<EmptyNonTrivial, char> mem; // mem was at offset zero,
// will use non-zero offset now
};
After this change, the layout of such types is consistent with the one used for
empty types with a trivial destructor, or the one used for any empty type in
C++20 or later.
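The underlying layout rule can be shown without std::variant at all. In this sketch
(the member `first' is a stand-in for the alternative now stored directly in the
variant's union) the member cannot overlap the empty base of the same type:

```cpp
#include <cassert>

struct EmptyNonTrivial { ~EmptyNonTrivial() {} };

// Stand-in for the variant: the first member has the same type as the base.
struct Affected : EmptyNonTrivial {
    EmptyNonTrivial first;
    char tail;
};

// Two distinct subobjects of the same type may not share an address, so
// `first` cannot be placed on top of the empty base subobject.
inline bool base_and_member_distinct() {
    Affected a;
    EmptyNonTrivial *base = &a;
    return base != &a.first;
}
```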
For programs affected by this change, it can be reverted in C++17 mode, by
defining _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI. However, presence of this macro
has no effect in C++20 or later modes.
PR libstdc++/112591
libstdc++-v3/ChangeLog:
* include/std/variant (_Uninitialized::_M_get, __get_n)
(_Uninitialized<_Type, false>): Add _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI
check to preprocessor guard.
* testsuite/20_util/variant/112591.cc: Updated tests.
* testsuite/20_util/variant/112591_compat.cc: New test.
* testsuite/20_util/variant/constinit.cc: New test.
* testsuite/20_util/variant/constinit_compat.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Harald Anlauf [Fri, 19 Dec 2025 20:15:44 +0000 (21:15 +0100)]
Fortran: INTENT(IN) polymorphic argument with pointer components [PR71565]
PR fortran/71565
gcc/fortran/ChangeLog:
* expr.cc (gfc_check_vardef_context): Fix treatment of INTENT(IN)
checks for ASSOCIATE variables. Correct checking of PROTECTED
objects, as subobjects inherit the PROTECTED attribute.
gcc/testsuite/ChangeLog:
* gfortran.dg/protected_8.f90: Adjust patterns.
* gfortran.dg/associate_76.f90: New test.
Robin Dapp [Thu, 18 Dec 2025 10:19:57 +0000 (11:19 +0100)]
RISC-V: Fix overflow check in interleave pattern [PR122970].
In the pattern where we interpret and code-gen two interleaving series as if
they were represented in a larger type we check for overflow.
The overflow check is basically
if (base + (nelems - 1) * step >> inner_bits != 0)
overflow = true;
In the PR, base is negative and we interpret it as a negative uint64
value. Thus, e.g. base + (nelems - 1) * step = -32 + 7 * 8 = 24.
24 fits uint8 and we wrongly assume that no overflow happens.
This patch reinterprets base as type of inner bit size which makes the
overflow check work.
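The numbers above can be replayed with a small model of the check (the function
names are hypothetical and this is not the code in riscv-v.cc; it assumes
inner_bits < 64):

```cpp
#include <cassert>
#include <cstdint>

// The check as described, with base taken as a full 64-bit value.
constexpr bool overflows_wide(std::int64_t base, std::uint64_t nelems,
                              std::uint64_t step, unsigned inner_bits) {
    return ((static_cast<std::uint64_t>(base) + (nelems - 1) * step)
            >> inner_bits) != 0;
}

// The same check after reinterpreting base in the inner element width.
constexpr bool overflows_inner(std::int64_t base, std::uint64_t nelems,
                               std::uint64_t step, unsigned inner_bits) {
    const std::uint64_t mask = (std::uint64_t{1} << inner_bits) - 1;
    return (((static_cast<std::uint64_t>(base) & mask) + (nelems - 1) * step)
            >> inner_bits) != 0;
}

// base = -32, nelems = 8, step = 8, inner type of 8 bits.
static_assert(!overflows_wide(-32, 8, 8, 8), "-32 + 7 * 8 = 24 hides the overflow");
static_assert(overflows_inner(-32, 8, 8, 8), "224 + 56 = 280 does not fit in 8 bits");
```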
PR target/122970
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector_interleaved_stepped_npatterns):
Reinterpret base as smaller type.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add rvv_zvl128b_ok.
* gcc.target/riscv/rvv/autovec/pr122970.c: New test.
Robin Dapp [Thu, 13 Nov 2025 08:23:40 +0000 (09:23 +0100)]
RISC-V: Generic vec_extract via subreg.
We are missing several vec_extract chances because the current autovec
patterns are not comprehensive. In particular we don't extract from
pseudo-VLA modes that are actually VLS modes (just VLA modes in name).
Rather than add even more mode combinations to vec_extract, this patch
uses a dynamic approach in legitimize_move. At that point we can just check
if the mode sizes make sense and then emit the same code as before.
This is not the ideal solution as the middle-end and the vectorizer in
particular query the vec_extract optab for support and won't emit
certain code sequences if it's not present (e.g. in VMAT_STRIDED_SLP
or when trying intermediate-sized vectors in a chain).
For simple BIT_FIELD_REFs it works, though.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vector_subreg_extract): New
function that checks for and performs "vector extracts".
(legitimize_move): Call new function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/subreg-extract.c: New test.
Robin Dapp [Thu, 6 Nov 2025 12:16:40 +0000 (13:16 +0100)]
RISC-V: Add VLS modes to autovec iterators.
In order to allow more VLS vectorization, add more VLS modes to the
autovec expanders, as well as some missing VLS modes that I encountered while
testing.
Robin Dapp [Thu, 6 Nov 2025 16:43:58 +0000 (17:43 +0100)]
RISC-V: Change gather/scatter iterators.
This patch changes the gather/scatter mode iterators from a ratio
scheme to a more direct one where the index mode size is
1/2, 1/4, 1/8, 2, 4, 8 times the data mode size. It also adds VLS modes
to the iterators and removes the now unnecessary
gather_scatter_valid_offset_p.
Robin Dapp [Mon, 15 Dec 2025 10:20:54 +0000 (11:20 +0100)]
vect: Fix scale-only pass in vect_gather_scatter_fn_p [PR123118].
In the process of refactoring the gather/scatter rework this likely got
lost. In the "third pass" we look for a configuration with a smaller
scale and a larger offset type with the same signedness. We want to be
able to multiply the offset by the new scale but not change the offset
sign. What we actually checked is whether a converted offset type was
supported without setting *supported_offset_vectype.
This patch removes the check for the offset type change and replaces it
with a TYPE_SIGN match.
PR tree-optimization/123118
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Check that
the type sign is equal.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/autovec/pr123118.C: New test.
Robin Dapp [Mon, 15 Dec 2025 12:01:40 +0000 (13:01 +0100)]
forwprop: Check type conversion in pack/unpack [PR123117].
When using pack or unpack in the simplification of a vector constructor
we must make sure that the original BIT_FIELD_REF was not a sign-changing
nop conversion. If it was, we cannot safely pack/unpack as that would
skip sign or zero extensions. This patch adds useless_type_conversion_p
to both paths.
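A scalar sketch of what goes wrong for one lane (the helper names are made up; the
real transform works on whole vectors): the same 8-bit pattern widens differently
depending on whether it is treated as signed or unsigned, so an unpack that looks
through a sign-changing nop conversion applies the wrong extension.

```cpp
#include <cassert>
#include <cstdint>

// Widen one 8-bit lane to 16 bits as an unpack of a signed vector would.
constexpr std::uint16_t unpack_signed(std::uint8_t lane) {
    return static_cast<std::uint16_t>(
        static_cast<std::int16_t>(static_cast<std::int8_t>(lane)));
}

// Widen one 8-bit lane to 16 bits as an unpack of an unsigned vector would.
constexpr std::uint16_t unpack_unsigned(std::uint8_t lane) {
    return lane;
}

static_assert(unpack_signed(0xFF) == 0xFFFF, "sign extension");
static_assert(unpack_unsigned(0xFF) == 0x00FF, "zero extension");
```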
PR tree-optimization/123117
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Check if we had a nop conversion and don't use pack/unpack in
that case.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lsx/pr123117.c: New test.
Robin Dapp [Mon, 8 Dec 2025 11:18:55 +0000 (12:18 +0100)]
RISC-V: Implement cbranch_all/any.
This implements the (cond_len_)cbranch_all/_any optabs for riscv and
adds a few tests. The patch requires a small vectorizer fix before
optabs take any effect.
* gcc.target/riscv/rvv/autovec/early-break-3.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-4.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-5.c: New test.
Robin Dapp [Fri, 12 Dec 2025 08:52:16 +0000 (09:52 +0100)]
vect: Use type precision in reduction epilogue [PR123097].
In the PR we extract non-existent bits/elements from a vector. This is
because we use TYPE_SIZE (vectype) for a boolean vector which returns 8
instead of 4 for RVV's vector (4) <signed-boolean:1>.
The patch uses TYPE_VECTOR_SUBPARTS instead and multiplies its result
with vector_element_bits to get the proper number of elements and size.
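In numbers, for vector (4) <signed-boolean:1> (a sketch; modelling TYPE_SIZE as
"rounded up to whole bytes" is an assumption made for illustration):

```cpp
#include <cassert>

// Size in bits when the storage is rounded up to whole bytes.
constexpr unsigned storage_bits(unsigned nelems, unsigned elem_bits) {
    return ((nelems * elem_bits + 7) / 8) * 8;
}

// Size in bits as number of elements times bits per element.
constexpr unsigned precise_bits(unsigned nelems, unsigned elem_bits) {
    return nelems * elem_bits;
}

static_assert(storage_bits(4, 1) == 8, "includes 4 bits that are not elements");
static_assert(precise_bits(4, 1) == 4, "only the bits that really exist");
```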
PR tree-optimization/123097
gcc/ChangeLog:
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Calculate vector size by number of elements * bit size per
element.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr123097-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr123097.c: New test.
Robin Dapp [Fri, 19 Dec 2025 18:36:35 +0000 (11:36 -0700)]
[PATCH] testsuite: Check for effective-target ctz [PR123192].
Rainer reported that ctz-ch.c fails on several platforms.
This patch adds /* { dg-require-effective-target ctz } */
to the test which looks like it does the right thing.
[PR123223, LRA]: Fix ICE of GCC built with checking rtl
The latest PR55212 patch improving dealing with scratch pseudos does not
check reload rtx on reg when recognizing scratch pseudos. This
results in failure of GCC built with checking rtl.
gcc/ChangeLog:
PR rtl-optimization/123223
* lra-constraints.cc (match_reload, curr_insn_transform): Check
rtx on REG when testing scratch pseudos.
Jeff Law [Fri, 19 Dec 2025 17:13:24 +0000 (10:13 -0700)]
[committed] Improve shift loops on the H8
Inspired by Georg-Johann's work on the AVR to convert the shift loops to a
sentinel approach and a rough work week, I revisited the shift patterns on the
H8 to see if we could improve things on that port as well. It also serves as a
good verification that things are working in my environment.
The basic idea of Georg-Johann's patch is to clear the bits that are going to
be shifted away, then turn on a sentinel bit (the last shifted away bit). This
is done outside the loop. The loop then iterates until the sentinel bit shows
up in C. This eliminates decrementing the loop counter and gives better performance.
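The sentinel idea can be modelled in portable code as follows (a sketch for a
16-bit logical right shift with 1 <= n <= 16; the function name is made up and
`carry' stands in for the C flag, this is not the output of output_shift_loop):

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of a 16-bit value by n (1 <= n <= 16) without a counter.
inline std::uint16_t shift_right_sentinel(std::uint16_t x, unsigned n) {
    const std::uint32_t low_mask = (std::uint32_t{1} << n) - 1;
    std::uint32_t v = x & ~low_mask;        // clear the bits that will be shifted out
    v |= std::uint32_t{1} << (n - 1);       // sentinel: the last bit to be shifted out
    bool carry;
    do {
        carry = (v & 1) != 0;               // models the carry flag of a 1-bit shift
        v >>= 1;
    } while (!carry);                       // stop once the sentinel reaches carry
    return static_cast<std::uint16_t>(v);
}
```

For instance, shift_right_sentinel (0xABCD, 4) goes 0xABC8, 0x55E4, 0x2AF2,
0x1579, 0x0ABC and stops after the fourth shift, when the sentinel drops into
carry.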
It turns out to be fairly easy to implement on the H8. The first
implementation did the clearing and setting in the most simplistic way
possible, but to avoid significant code size regressions the clearing and
setting really needed to be handled by output_logical_op which has several
short cuts. So a bit of adjustment was necessary to make output_logical_op
callable from other contexts.
Second, the H8/S and newer parts have shift-by-2 instructions. These aren't
normally used in shift loops unless we're optimizing for size. This requires
slight adjustment of the sentinel location for odd shift counts. The residual
single bit shift for that case is handled outside the loop.
Otherwise it's an uneventful patch. My hope was that it would save a minuscule
amount of testing time as the H8 continues to be the slowest cross target for
testing. Hard to judge that right now -- while the latest run on the H8 was
about 30 minutes faster than any run in the last month, the machine was
unloaded for that run while it was fully loaded for the standard nightly runs.
If this even approaches 1% I'll jump for joy.
Anyway, tested on the H8 with no regressions. Given the H8 is a dead ISA with
very few users, I'm going to go ahead and commit even though we're in stage3.
gcc/
* config/h8300/h8300.cc (output_logical_op): Adjust last argument to
be a pattern, not an insn. Corresponding implementation changes.
(output_shift_loop): Extracted from output_a_shift and improved
to use a sentinel to indicate when to stop the loop.
(output_a_shift): Use output_shift_loop.
(compute_a_shift_length): Handle adjusted shift loop code.
* config/h8300/logical.md (logicals): Pass pattern to output_logical_op
rather than the full insn.
* config/h8300/h8300-protos.h (output_logical_op): Update prototype.
Jakub Jelinek [Fri, 19 Dec 2025 15:44:16 +0000 (16:44 +0100)]
c++: Suppress -Wreturn-type warnings for functions with failed assertions [PR91388]
This is something Jonathan has asked for recently. E.g. in the recent
libstdc++ r16-6177 random.tcc changes, there was
if constexpr (__d <= 32)
return __generate_canonical_any<_RealT, uint64_t, __d>(__urng);
else
{
#if defined(__SIZEOF_INT128__)
static_assert(__d <= 64,
"irregular RNG with float precision >64 is not supported");
return __generate_canonical_any<
_RealT, unsigned __int128, __d>(__urng);
#else
static_assert(false, "irregular RNG with float precision"
" >32 requires __int128 support");
#endif
}
and when we hit the static_assert there, we don't get just an error about
that, but also a -Wreturn-type warning in the same function because that
path falls through to the end of function without returning a value.
But a function with a failed static_assert is erroneous and will never
fall through to the end. We could treat failed static_assert in functions
as __builtin_unreachable (), but I think it doesn't matter where exactly
in a function static_assert(false); appears, so this patch just suppresses
-Wreturn-type warning in that function instead.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
PR c++/91388
* semantics.cc (finish_static_assert): Suppress -Wreturn-type warnings
in functions with failed assertions.