Jose E. Marchesi [Sat, 27 Dec 2025 10:09:04 +0000 (11:09 +0100)]
a68: allow joined list of revelations in access clauses
This commit adds support for having a joined list of revelations in
access clauses, like in:
access Module18a,
Module18b,
Module18c
begin assert (foo = 10);
assert (bar = 20);
assert (baz = 30)
end
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-parser-bottom-up.cc (reduce_enclosed_clauses): Reduce joined
list of revelations.
* a68-low-clauses.cc (a68_lower_revelation_ludes): New function.
(a68_lower_access_clause): Use a68_lower_revelation_ludes.
Jose E. Marchesi [Tue, 23 Dec 2025 14:53:36 +0000 (15:53 +0100)]
a68: fix support for nested access clauses
This commit fixes the support for having an access clause as the
controlled clause of another access clause.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-parser-top-down.cc (top_down_access): An access clause may
be nested in another access clause.
* a68-parser-extract.cc (a68_extract_indicants): Coalesce 'pub'
symbols.
(a68_extract_indicants): Nested access clauses are not allowed in
module texts.
* a68-parser-bottom-up.cc (expected_module_text): New function.
(reduce_prelude_packet): Use expected_module_text.
(a68_bottom_up_error_check): Add comment.
gcc/testsuite/ChangeLog
* algol68/compile/error-module-nested-access-1.a68: New test.
* algol68/execute/modules/program-21.a68: Likewise.
Jose E. Marchesi [Mon, 22 Dec 2025 23:52:52 +0000 (00:52 +0100)]
a68: fetch module exports from packet by name
A packet (compilation unit) may emit more than one module interface in
its exports section. This is because a module may publicize the
exports of another module. This commit makes the import infrastructure
read multiple module interfaces from exports sections and then look
for the accessed module in the data.
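As a rough illustration of the lookup described above, the sketch below
walks a chain of decoded module interfaces and returns the one whose name
matches the accessed module. The `Moif` struct, its `name` and `next`
fields, and `find_moif` are hypothetical stand-ins chosen for this example;
they are not the actual MOIF_T layout nor the code in a68_open_packet.

```cpp
#include <cstring>

// Hypothetical, simplified stand-in for a decoded module interface.
struct Moif {
  const char *name;  // name of the module this interface describes
  Moif *next;        // next interface decoded from the same exports section
};

// Return the interface named MODULE in the chain starting at HEAD,
// or nullptr if the packet does not export such a module.
Moif *find_moif(Moif *head, const char *module) {
  for (Moif *m = head; m != nullptr; m = m->next)
    if (std::strcmp(m->name, module) == 0)
      return m;
  return nullptr;
}
```

A caller would presumably report an error when the result is null, since
the packet then does not provide the accessed module.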
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-types.h (struct MOIF_T): Add chain_next to GTY info.
* a68-imports.cc (a68_decode_modes): Mode offsets are relative to
the start of the moif, not the start of the exports.
(a68_decode_moifs): Renamed from a68_decode_moif and changed to
decode multiple moifs from the exports.
(a68_open_packet): Call a68_decode_moifs and look for the right
moif.
* a68-exports.cc (a68_moif_new): Initialize NEXT (moif).
Jakub Jelinek [Sat, 27 Dec 2025 10:45:18 +0000 (11:45 +0100)]
simplify-rtx: Fix up (ne (ior (ne x 0) y) 0) simplification [PR123114]
The following testcase ICEs on x86_64-linux since the PR52345
(ne (ior (ne x 0) y) 0) simplification was (slightly) fixed.
It wants to optimize
(set (reg/i:DI 10 a0)
(ne:DI (ior:DI (ne:DI (reg:DI 151 [ a ])
(const_int 0 [0]))
(reg:DI 152 [ b ]))
(const_int 0 [0])))
but doesn't check an important property of that, in particular
that the mode of the inner NE operand is the same as the
mode of the inner NE.
The following testcase has
(set (reg:CCZ 17 flags)
(compare:CCZ (ior:QI (ne:QI (reg/v:SI 104 [ c ])
(const_int 0 [0]))
(reg:QI 98 [ _5 ]))
(const_int 0 [0])))
where cmp_mode is QImode, but the mode of the inner NE operand
is SImode instead, and it attempts to create
(ne:CCZ (ior:QI (reg/v:SI 104 [ c ]) (reg:QI 98 [ _5 ])) (const_int 0))
which obviously crashes later on.
The following patch fixes it by checking the mode of the inner NE operand
and also by using CONST0_RTX (cmp_mode) instead of CONST0_RTX (mode)
because that is the mode of the other operand, not mode which is the
mode of the outer comparison (though, guess for most modes it will still
be const0_rtx).
I guess for mode mismatches we could arbitrarily choose some extension (zero
or sign) and extend the narrower mode to the wider mode, but I doubt that it
would ever match on any target. But even then we'd need to limit it, we
wouldn't want to deal with another mode class (say floating point
comparisons), and dunno about vector modes etc.
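To see why the mode check matters, here is an analogy in plain C++
integers rather than RTL. In the real bug the mismatched modes produce
malformed RTL and an ICE, not a wrong value; the sketch only shows that
dropping the inner NE is not value-preserving once x is wider than the
IOR. The function names are made up for this example.

```cpp
#include <cstdint>

// Original form: (ne (ior (ne x 0) y) 0), with x wider than the IOR.
bool original_form(std::uint32_t x, std::uint8_t y) {
  std::uint8_t inner = (x != 0) ? 1 : 0;  // models (ne:QI x 0)
  return static_cast<std::uint8_t>(inner | y) != 0;
}

// Naive rewrite to (ne (ior x y) 0) carried out in the narrow mode,
// which silently drops the high bits of x.
bool naive_narrow_rewrite(std::uint32_t x, std::uint8_t y) {
  std::uint8_t narrowed = static_cast<std::uint8_t>(x);
  return static_cast<std::uint8_t>(narrowed | y) != 0;
}
```

With x = 0x100 and y = 0 the two functions disagree, while they agree
whenever x already fits in the narrow type.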
2025-12-27 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/123114
* simplify-rtx.cc (simplify_context::simplify_relational_operation):
Verify XEXP (XEXP (op0, 0), 0) mode and use CONST0_RTX (cmp_mode)
instead of CONST0_RTX (mode).
Eric Botcazou [Sat, 27 Dec 2025 09:24:52 +0000 (10:24 +0100)]
Ada: Fix assertion failure for unfrozen mutably tagged type as actual
...in instance. An instantiation is a freezing point for the actuals,
so the mutably tagged type will be frozen by the instantiation, but this
happens too late in the current implementation of mutably tagged types,
because the declaration of their CW-equivalent type is not analyzed until
after the type is frozen.
gcc/ada/
PR ada/123306
* sem_ch12.adb (Analyze_One_Association): Immediately freeze the
root type of mutably tagged types used as actual type parameters.
gcc/testsuite/
* gnat.dg/specs/mutably_tagged1.ads: New test.
Jeff Law [Fri, 26 Dec 2025 22:24:56 +0000 (15:24 -0700)]
[RISC-V][PR target/123283] Wrap naked REG operands with a USE.
I was in the process of testing this patch when Andreas filed PR123283.
What's going on is we have patterns in sync.md which have naked operands:
(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
[(set (match_operand:SI 0 "register_operand" "=&r") ;; old value at mem
(match_operand:SI 1 "memory_operand" "+A")) ;; mem location
(set (match_dup 1)
(unspec_volatile:SI
[(any_atomic:SI (match_dup 1)
(match_operand:SI 2 "arith_operand" "rI")) ;; value for op
(match_operand:SI 3 "const_int_operand")] ;; model
UNSPEC_SYNC_OLD_OP_SUBWORD))
(match_operand:SI 4 "arith_operand" "rI") ;; mask
(match_operand:SI 5 "arith_operand" "rI") ;; not_mask
(clobber (match_scratch:SI 6 "=&r")) ;; tmp_1
(clobber (match_scratch:SI 7 "=&r"))] ;; tmp_2
Note carefully operands #4 and #5 and the fact they are a toplevel construct as
opposed to being an operand of another RTX. That's a no-no. They need to be
wrapped with a USE.
I spot-checked sync.md and found a few more instances. Fixing the set I found
fixed the testsuite regressions I was seeing and also fixes the mis-compilation
of libgo. Bootstrapped and regression tested on my BPI and Pioneer. It's also
clean on the riscv64-elf and riscv32-elf targets in my tester.
PR target/123283
gcc/
* config/riscv/sync.md (subword_atomic_fetch_strong_nand): Add
USEs for naked operands that might be pseudos.
(subword_atomic_fetch_strong_<atomic_optab>): Likewise.
(subword_atomic_exchange_strong): Likewise.
(subword_atomic_cas_strong): Likewise.
Eric Botcazou [Fri, 26 Dec 2025 13:52:32 +0000 (14:52 +0100)]
Ada: Fix bogus error on aggregate in call with qualified type in instance
This happens with a container aggregate in the testcase, although this can
very likely happen with a record aggregate as well. The trick used in the
Save_Global_References procedure for aggregates loses the qualification of
the type of the formal for which the aggregate is the actual.
gcc/ada/
PR ada/123302
* sem_ch12.adb (Save_Global_Reference.Save_References_In_Aggregate):
Recurse on the scope of the type to find one that is visible, in the
case of an actual in a subprogram call with a local type.
gcc/testsuite/
* gnat.dg/aggr34.adb: New test.
* gnat.dg/aggr34_pkg1.ads, gnat.dg/aggr34_pkg1.adb: New helper.
* gnat.dg/aggr34_pkg2.ads, gnat.dg/aggr34_pkg2.adb: Likewise.
* gnat.dg/aggr34_pkg3.ads: Likewise.
Egas Ribeiro [Mon, 22 Dec 2025 21:41:00 +0000 (21:41 +0000)]
c-family: Fix ICE with -MD and -fdeps-format sharing output [PR121864]
When -MD, -fdeps-format=p1689r5 and -save-temps are used without
explicit output files, they default to the same stream, which is
invalid. The error message attempted to print fdeps_file, but this is
NULL in this case, causing an ICE.
Use out_fname as a fallback when fdeps_file is NULL to avoid the ICE
and provide a meaningful error message.
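The fallback can be pictured with the minimal sketch below. It is not
the code in c_common_finish: the helper name and the "<unknown>"
placeholder are assumptions for illustration, and only the
fdeps_file/out_fname preference order comes from the description above.

```cpp
#include <string>

// Pick the file name to mention in the diagnostic: prefer the explicit
// dependency file, fall back to the main output name when it is null.
// Both parameters model nullable C strings.
std::string name_for_diagnostic(const char *fdeps_file, const char *out_fname) {
  const char *name = fdeps_file ? fdeps_file : out_fname;
  // Hypothetical last resort so the sketch never dereferences null.
  return name ? std::string(name) : std::string("<unknown>");
}
```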
Fix suggested by Andrew Pinski.
PR c++/121864
gcc/c-family/ChangeLog:
* c-opts.cc (c_common_finish): Use out_fname as fallback when
fdeps_file is NULL in error message.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Eric Botcazou [Fri, 26 Dec 2025 09:44:57 +0000 (10:44 +0100)]
Ada: Fix illegal Aggregate aspect not rejected
The Ada 2022 RM is adamant that the names specified in the Aggregate aspect
must denote "exactly one" subprogram, in other words that it is illegal to
use names that denote more than one subprogram in the Aggregate aspect.
gcc/ada/
PR ada/123289
* sem_ch13.adb (Resolve_Aspect_Aggregate.Resolve_Operation): Give
an error if the operation's name denotes more than one subprogram.
gcc/testsuite/
* gnat.dg/specs/aggr9.ads: New test.
Sandra Loosemore [Sun, 21 Dec 2025 02:23:43 +0000 (02:23 +0000)]
doc, riscv: Clean up RISC-V extensions documentation
This patch fixes a number of problems I observed in the RISC-V
extensions documentation, which is autogenerated from .def files:
- The formatting of the table looked terrible in the PDF output, with
overlapping text. I made the first two columns wider to fix this.
- Also the extension names in the table should have @samp{} markup.
- Many extensions were missing a full name/description. (Documenting
something as "xyzzy extension" adds nothing useful to readers when we
are already listing the extension name "xyzzy" in the table.)
- Irregular spelling and capitalization in the full names.
Sandra Loosemore [Sun, 14 Dec 2025 00:38:48 +0000 (00:38 +0000)]
doc, riscv: Clean up documentation of RISC-V options [PR122243]
gcc/ChangeLog
PR other/122243
* config/riscv/riscv.opt (mplt): Mark deprecated option Undocumented.
(msmall-data-limit=): Mark RejectNegative.
* doc/invoke.texi (Option Summary) <RISC-V Options>: Remove -mplt
documentation. Only list one form of each option. Add missing
options -mcpu, -mscalar-strict-align, -mno-vector-strict-align,
-momit-leaf-frame-pointer, -mstringop-strategy, -mrvv-vector-bits,
-mrvv-max-lmul, -madjust-lmul-cost, -mmax-vectorization, and
-mno-autovec-segment.
(RISC-V Options): Remove -mplt documentation. Add documentation for
missing options listed above. Add missing index entries for negative
forms. Correct the default for the -minline-str* options, which
has changed. Copy-edit for markup, spelling, and usage. Trivial
whitespace fixes.
Then, sincos attempts to find the type of the IFN_SIN/IFN_COS via
mathfn_built_in_type. This fails, so the compiler crashes.
For these IFNs, their input type is the same as their output type, so
we can fall back to that.
Note that, currently, GCC can't seem to handle vector sincos/cexpi
operations, so any attempt to CSE these will fail quickly after. This
patch does not fix that, only the ICE that happens in the attempt.
gcc/ChangeLog:
* tree-ssa-math-opts.cc (execute_cse_sincos_1): If
mathfn_built_in_type fails to determine a type for our
operation, presume that it is the same as the input type.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/sincos-ice-on-ifn_sin-call.c: New test.
* gcc.target/gcn/sincos-ice-on-ifn_sin-call-1.c: New test.
Pan Li [Sun, 21 Dec 2025 12:07:43 +0000 (20:07 +0800)]
RISC-V: Combine vec_duplicate + vmsleu.vv to vmsleu.vx on GR2VR cost
This patch combines vec_duplicate + vmsleu.vv into vmsleu.vx, as in
the example below. The related pattern depends on the cost of
vec_duplicate from GR2VR. Late-combine will then perform the
combination if the GR2VR cost is zero, and reject it if the GR2VR
cost is greater than zero.
Assume we have asm code like below, GR2VR cost is 0.
After this patch:
11 beq a3,zero,.L8
...
14 .L3:
15 vsetvli a5,a3,e32,m1,ta,ma
...
20 vmsleu.vx v1,a2,v3
...
23 bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/predicates.md: Add geu to the swappable
cmp operator iterator.
* config/riscv/riscv-v.cc (get_swapped_cmp_rtx_code): Take
care of the swapped rtx code correspondingly.
aarch64: Add the ability to have three types in an sve/sme intrinsic name
The majority of sve/sme intrinsics have names which are defined by one type
(like svuint8_t svextq[_u8]) or two types (like svsub_za32[_f32]_vg1x2).
Some intrinsics now have three types (like svtmopa_lane_za32[_s8_u8]).
This change extends the number of type_suffix_indexes from two to three
to cover this case.
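A simplified model of what a third type suffix means for name
construction is sketched below. The real builtins code stores suffix
indexes rather than strings, so `suffix_triple` and `full_name` here are
hypothetical; only the example intrinsic names are taken from the text
above, and group suffixes such as _vg1x2 are deliberately left out.

```cpp
#include <array>
#include <string>

// Hypothetical model: up to three type suffixes per intrinsic; an empty
// string means "no suffix in this slot".
using suffix_triple = std::array<std::string, 3>;

// Append every non-empty suffix to BASE, separated by underscores.
std::string full_name(const std::string &base, const suffix_triple &types) {
  std::string name = base;
  for (const std::string &suffix : types)
    if (!suffix.empty())
      name += "_" + suffix;
  return name;
}
```

With two slots, the final _u8 of svtmopa_lane_za32_s8_u8 would have
nowhere to live, which is the gap this change closes.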
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc: (svmul_impl::fold):
Replace use of type_suffix_pair with type_suffix_triple.
* config/aarch64/aarch64-sve-builtins-shapes.cc: (parse_element_type):
Handle third type suffix.
(parse_type): Handle c2 in function signature. Add the u signature with
the ability to pass a tuple with twice as many vectors as the base type.
Calculate number of vectors against the type with the maximum number of
bits rather than "the other one".
(load_contiguous_base::resolve): Add argument to resolve_to call.
(compare_scalar_def::resolve): Likewise.
(ternary_mfloat8_def::resolve): Likewise.
(ternary_mfloat8_lane_def::resolve): Likewise.
(ternary_mfloat8_opt_n_def::resolve): Likewise.
* config/aarch64/aarch64-sve-builtins.cc: (TYPES_all_pred,
TYPES_all_count, TYPES_all_pred_count, TYPES_all_float,
TYPES_all_signed, TYPES_all_float_and_signed, TYPES_all_unsigned,
TYPES_all_integer, TYPES_all_arith, TYPES_all_data, TYPES_b, TYPES_c,
TYPES_b_unsigned, TYPES_b_integer, TYPES_b_data, TYPES_bh_integer,
TYPES_bs_unsigned, TYPES_bhs_signed, TYPES_bhs_unsigned,
TYPES_bhs_integer, TYPES_bh_data, TYPES_bhs_data, TYPES_bhs_widen,
TYPES_h_bfloat, TYPES_h_float, TYPES_h_integer, TYPES_h_data,
TYPES_hs_signed, TYPES_hs_integer, TYPES_hs_float, TYPES_hs_data,
TYPES_hd_unsigned, TYPES_hsd_signed, TYPES_hsd_integer, TYPES_hsd_data,
TYPES_h_float_mf8, TYPES_s_float, TYPES_s_float_mf8,
TYPES_s_float_hsd_integer, TYPES_s_float_sd_integer, TYPES_s_signed,
TYPES_s_unsigned, TYPES_s_integer, TYPES_s_data, TYPES_sd_signed,
TYPES_sd_unsigned, TYPES_sd_integer, TYPES_sd_data,
TYPES_all_float_and_sd_integer, TYPES_d_float, TYPES_d_unsigned,
TYPES_d_integer, TYPES_d_data, TYPES_cvt, TYPES_cvt_bfloat,
TYPES_cvt_h_s_float, TYPES_cvt_f32_f16, TYPES_cvt_long,
TYPES_cvt_narrow_s, TYPES_cvt_narrow, TYPES_cvt_s_s, TYPES_cvt_mf8,
TYPES_cvtn_mf8, TYPES_cvtnx_mf8, TYPES_inc_dec_n, TYPES_qcvt_x2,
TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4,
TYPES_qrshru_x4, TYPES_reinterpret, TYPES_reinterpret_b, TYPES_while,
TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu, TYPES_all_za,
TYPES_d_za, TYPES_za_bhsd_data, TYPES_za_all_data, TYPES_za_h_mf8,
TYPES_za_hs_mf8, TYPES_za_h_bfloat, TYPES_za_h_float,
TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer,
TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned,
TYPES_za_s_integer, TYPES_za_s_mf8, TYPES_za_s_float, TYPES_za_s_data,
TYPES_za_d_h_integer, TYPES_za_d_float, TYPES_za_d_integer,
TYPES_mop_base, TYPES_mop_base_signed, TYPES_mop_base_unsigned,
TYPES_mop_i16i64, TYPES_mop_i16i64_signed, TYPES_mop_i16i64_unsigned,
TYPES_za): Extend defines to three arguments.
(DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE): Likewise.
(DEF_TRIPLE_TYPE): Add new define.
(DEF_SVE_TYPES_ARRAY): Redefine all types_ arrays into arrays of
type_suffix_triple.
(types_none): Likewise.
(function_instance::hash): Add third type to hash calculation.
(function_builder::get_name): Add third type to function name.
(function_builder::add_overloaded_functions): Handle third type.
(function_resolver::lookup_form): Likewise.
(function_resolver::resolve_to): Likewise.
(function_resolver::resolve_unary): Likewise.
* config/aarch64/aarch64-sve-builtins.h: (type_suffix_triple): Replace
type_suffix_pair.
(function_group_info::types): Likewise.
(function_instance::ctor): Likewise.
(function_instance::type_suffix_ids): Likewise.
(function_resolver::lookup_form): Add third type argument.
(function_resolver::resolve_to): Likewise.
(function_instance::operator==): Add third type to equality calculation.
Karl Meakin [Wed, 24 Dec 2025 11:41:27 +0000 (11:41 +0000)]
aarch64: add 8-bit floating point dot product
This patch adds support for the following intrinsics when sme-f8f16 is enabled:
* svdot_za16[_mf8]_vg1x2_fpm
* svdot_za16[_mf8]_vg1x4_fpm
* svdot[_single]_za16[_mf8]_vg1x2_fpm
* svdot[_single]_za16[_mf8]_vg1x4_fpm
* svdot_lane_za16[_mf8]_vg1x2_fpm
* svdot_lane_za16[_mf8]_vg1x4_fpm
This patch adds support for the following intrinsics when sme-f8f32 is enabled:
* svdot_za32[_mf8]_vg1x2_fpm
* svdot_za32[_mf8]_vg1x4_fpm
* svdot[_single]_za32[_mf8]_vg1x2_fpm
* svdot[_single]_za32[_mf8]_vg1x4_fpm
* svdot_lane_za32[_mf8]_vg1x2_fpm
* svdot_lane_za32[_mf8]_vg1x4_fpm
* svvdot_lane_za32[_mf8]_vg1x2_fpm
* svvdotb_lane_za32[_mf8]_vg1x4_fpm
* svvdott_lane_za32[_mf8]_vg1x4_fpm
gcc:
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x24:mode>): New insn.
(@aarch64_fvdot_half<optab>): Likewise.
(@aarch64_fvdot_half<optab>_plus): Likewise.
* config/aarch64/aarch64-sve-builtins-functions.h
(class svvdot_half_impl): New function impl.
* config/aarch64/aarch64-sve-builtins-sme.cc (FUNCTION): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc (struct dot_half_za_slice_lane_def):
New function shape.
* config/aarch64/aarch64-sve-builtins-shapes.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def (svdot): New function.
(svdot_lane): Likewise.
(svvdot_lane): Likewise.
(svvdotb_lane): Likewise.
(svvdott_lane): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.h (svvdotb_lane_za): New function.
(svvdott_lane_za): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_za_s_mf8): New types array.
(TYPES_za_hs_mf8): Likewise.
(za_hs_mf8): Likewise.
* config/aarch64/iterators.md (SME_ZA_F8F16): New mode iterator.
(SME_ZA_F8F32): Likewise.
(SME_ZA_FP8_x1): Likewise.
(SME_ZA_FP8_x2): Likewise.
(SME_ZA_FP8_x4): Likewise.
(UNSPEC_SME_FDOT_FP8): New unspec.
(UNSPEC_SME_FVDOT_FP8): Likewise.
(UNSPEC_SME_FVDOTT_FP8): Likewise.
(UNSPEC_SME_FVDOTB_FP8): Likewise.
(SME_FP8_DOTPROD): New int iterator.
(SME_FP8_FVDOT): Likewise.
(SME_FP8_FVDOT_HALF): Likewise.
gcc/testsuite:
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dot_half_za_slice_lane_fpm.c: New test.
aarch64: add 8-bit floating-point sum of outer products and accumulate
This patch adds support for FMOPA (widening, 2-way, FP8 to FP16) when
sme-f8f16 is enabled using svmopa_za16[_mf8]_m_fpm and for FMOPA (widening,
4-way) when sme-f8f32 is enabled using svmopa_za32[_mf8]_m_fpm.
Asm tests for the new intrinsics are added, similar to those for existing
mopa_z16 intrinsics. Tests for the binary_za_m shape are added.
gcc:
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><VNx16QI_ONLY:mode>): Add
new define_insn.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct binary_za_m_base): Support fpm argument.
* config/aarch64/aarch64-sve-builtins-sme.cc (svmopa_za): Extend for
fp8.
* config/aarch64/aarch64-sve-builtins-sme.def (svmopa): Add new
DEF_SME_ZA_FUNCTION_GS_FPM entries.
aarch64: add Multi-vector 8-bit floating-point multiply-add long
This patch adds support for the following intrinsics when sme-f8f16 is enabled:
* svmla_lane_za16[_mf8]_vg2x1_fpm
* svmla_lane_za16[_mf8]_vg2x2_fpm
* svmla_lane_za16[_mf8]_vg2x4_fpm
* svmla_za16[_mf8]_vg2x1_fpm
* svmla[_single]_za16[_mf8]_vg2x2_fpm
* svmla[_single]_za16[_mf8]_vg2x4_fpm
* svmla_za16[_mf8]_vg2x2_fpm
* svmla_za16[_mf8]_vg2x4_fpm
This patch adds support for the following intrinsics when sme-f8f32 is enabled:
* svmla_lane_za32[_mf8]_vg4x1_fpm
* svmla_lane_za32[_mf8]_vg4x2_fpm
* svmla_lane_za32[_mf8]_vg4x4_fpm
* svmla_za32[_mf8]_vg4x1_fpm
* svmla[_single]_za32[_mf8]_vg4x2_fpm
* svmla[_single]_za32[_mf8]_vg4x4_fpm
* svmla_za32[_mf8]_vg4x2_fpm
* svmla_za32[_mf8]_vg4x4_fpm
Asm tests for the 32-bit versions follow the blueprint set in
mla_lane_za32_u8_vg4x1.c, mla_za32_u8_vg4x1.c and similar.
The 16-bit versions follow similar patterns modulo differences in allowed offsets.
gcc:
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x24:mode>): Add
new define_insn.
(*aarch64_sme_<optab><VNx8HI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><VNx16QI_ONLY:mode>,
*aarch64_sme_<optab><VNx8HI_ONLY:mode><VNx16QI_ONLY:mode>_plus,
*aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>_plus,
@aarch64_sme_single_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x24:mode>,
*aarch64_sme_single_<optab><VNx8HI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
@aarch64_sme_lane_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x124:mode>,
*aarch64_sme_lane_<optab><VNx8HI_ONLY:mode><SME_ZA_FP8_x124:mode>,
*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_FP8_x124:mode>):
Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct binary_za_slice_lane_base): Support fpm argument.
(struct binary_za_slice_opt_single_base): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc (svmla_za): Extend for fp8.
(svmla_lane_za): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def (svmla_lane): Add new
DEF_SME_ZA_FUNCTION_GS_FPM entries.
(svmla): Likewise.
* config/aarch64/iterators.md (SME_ZA_F8F16_32): Add new mode iterator.
(SME_ZA_FP8_x24, SME_ZA_FP8_x124): Likewise.
(UNSPEC_SME_FMLAL): Add new unspec.
(za16_offset_range): Add new mode_attr.
(za16_32_long): Likewise.
(za16_32_last_offset): Likewise.
(SME_FP8_TERNARY_SLICE): Add new iterator.
(optab): Add entry for UNSPEC_SME_FMLAL.
gcc/testsuite:
* gcc.target/aarch64/sme2/acle-asm/test_sme2_acle.h: (TEST_ZA_X1,
TEST_ZA_XN, TEST_ZA_SINGLE, TEST_ZA_SINGLE_Z15, TEST_ZA_LANE,
TEST_ZA_LANE_Z15): Add fpm0 parameter.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c: Add
tests for variants accepting fpm.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x4.c: New test.
aarch64: add basic support for sme-f8f16 and sme-f8f32
This patch adds support for the SME_F8F16 and SME_F8F32 features as architecture
options, along with related definitions. This support is required for subsequent
intrinsics to work.
gcc/
* config/aarch64/aarch64.h:
(TARGET_STREAMING_SME_F8F16, TARGET_STREAMING_SME_F8F32): Add defines.
* config/aarch64/aarch64-c.cc:
(__ARM_FEATURE_SME_F8F16, __ARM_FEATURE_SME_F8F32): Add defines.
* config/aarch64/aarch64-option-extensions.def:
(sme-f8f16, sme-f8f32): Add arch options in command line.
* config/aarch64/aarch64-sve-builtins-functions.h:
(sme_2mode_function_t): Pass unspec_for_mfp8 parameter through ctor.
* config/aarch64/aarch64-sve-builtins-sme.def:
(DEF_SME_FUNCTION_GS, DEF_SME_FUNCTION): Redefine based on
DEF_SME_FUNCTION_GS_FPM.
(DEF_SME_ZA_FUNCTION_GS, DEF_SME_ZA_FUNCTION): Redefine based on
DEF_SME_ZA_FUNCTION_GS_FPM.
(AARCH64_FL_SME_F8F16, AARCH64_FL_SME_F8F32): Add new
REQUIRED_EXTENSIONS sections.
* config/aarch64/aarch64-sve-builtins.cc:
(TYPES_za_h_mf8): Add new types.
(TYPES_za_s_mf8): Likewise.
(sme_function_groups): Define using DEF_SME_FUNCTION_GS_FPM instead of
DEF_SME_FUNCTION_GS.
* doc/invoke.texi: (sme-f8f16, sme-f8f32): Add documentation of option.
gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Add tests checking that
sme-f8f16 and sme-f8f32 predefs are off by default, and checks for
feature dependencies.
* lib/target-supports.exp: Add check_effective_target support for
sme-f8f16 and sme-f8f32.
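User code can test for the new features through the predefined macros
named above. The helper below is a small sketch that compiles on any
target; the function name and the returned strings are made up for this
example, and how the extensions get enabled on the command line is left
to the invoke.texi documentation.

```cpp
#include <string>

// Report which FP8 SME feature macros the current compilation defines.
std::string sme_fp8_features() {
  std::string out;
#ifdef __ARM_FEATURE_SME_F8F16
  out += "sme-f8f16 ";
#endif
#ifdef __ARM_FEATURE_SME_F8F32
  out += "sme-f8f32 ";
#endif
  // Neither macro is predefined unless the extension is enabled.
  return out.empty() ? std::string("none") : out;
}
```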
Test structure is based on the urshl ones that have a similar structure in how
they treat arguments.
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svscale_impl): Added new
class for dealing with all svscale functions (including sve).
(svscale): Updated FUNCTION macro call to make use of new class.
* config/aarch64/aarch64-sve-builtins-sve2.def: (svscale):
Added new DEF_SVE_FUNCTION_GS call to enable recognition of new variant.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fscale<mode>): Added
new define_insn. (@aarch64_sve_single_fscale<mode>): Likewise.
* config/aarch64/iterators.md: (SVE_Fx24_NOBF): Added new iterator,
similar to SVE_Fx24 but without brainfloat.
(SVE_Fx24): Updated to make use of SVE_Fx24_NOBF.
(SVSCALE_SINGLE_INTARG): Added new mode_attr.
(SVSCALE_INTARG): Likewise.
This patch adds the following intrinsics (all __arm_streaming only) along with
asm tests for them under the +sme2+fp8 flags:
- svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt1_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt2_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl1_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl2_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
gcc/
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtl1, svcvtl2): Added
new FUNCTIONs.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtl1, svcvtl2): Added new DEF_SVE_FUNCTION_GS_FPM.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtl1, svcvtl2): Added
new function_base.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::resolve_unary): Use group_suffix_id when resolving
C overloads.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve2_fp8_cvt_<fp8_cvt_uns_op><mode>): Added new define_insn.
* config/aarch64/aarch64.h (TARGET_SSME2_FP8): Added new define.
* config/aarch64/iterators.md
(UNSPEC_F1CVTL, UNSPEC_F2CVTL): Added new unspecs.
(FP8CVT_UNS): Extended int_iterator.
(fp8_cvt_uns_op): Likewise.
gcc/testsuite/
* g++.target/aarch64/sme2/aarch64-sme2-acle-asm.exp: Use tuning flag
to reduce churn in testsuites.
* gcc.target/aarch64/sme2/aarch64-sme2-acle-asm.exp: Likewise.
* gcc.target/aarch64/sme2/acle-asm/cvt_mf8_x2.c: Added test file.
* gcc.target/aarch64/sme2/acle-asm/cvtl_mf8_x2.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_X2_WIDE): Added
fpm0 argument for intrinsics.
In a GCC configuration with both AMD and NVIDIA GPU code offloading supported,
and the selected AMD GPU code generation not supporting USM, but a USM-capable
NVIDIA GPU available, I see all test cases that require effective-target
'omp_usm' turn UNSUPPORTED, because:
Executing on host: gcc usm_available_2778376.c [...]
[...]
In function 'main._omp_fn.0':
lto1: warning: Unified Shared Memory is required, but XNACK is disabled
lto1: note: Try -foffload-options=-mxnack=any
gcn mkoffload: warning: conflicting settings; XNACK is forced off but Unified Shared Memory is required
UNSUPPORTED: [...]
That warning is, however, not relevant in the scenario described above: we're
not going to exercise AMD GPU code offloading at run time.
With the effective-target 'omp_usm' check robustified like this, the affected
test cases are then no longer UNSUPPORTED, but of course, there's then the
corollary issue that compilation of the test case itself now emits the very
same warning, which results in the "test for excess errors" FAILing, despite
the execution test PASSing, for example:
FAIL: libgomp.c++/target-std__valarray-concurrent-usm.C (test for excess errors)
PASS: libgomp.c++/target-std__valarray-concurrent-usm.C execution test
That's clearly not ideal either (but is representative of what real-world usage
would run into), but is certainly better than the whole test case turning
UNSUPPORTED. To be continued, I guess...
Andrew Pinski [Tue, 23 Dec 2025 21:30:00 +0000 (13:30 -0800)]
ifcvt: Move noce_try_cond_zero_arith last
I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
would produce worse code than noce_try_cmove_arith.
So we should do noce_try_cond_zero_arith last instead
of before noce_try_cmove_arith.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
Also checked to make sure riscv testcases still work.
gcc/ChangeLog:
* ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
last.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 23 Dec 2025 21:04:28 +0000 (13:04 -0800)]
ifcvt: Only allow scalar integral modes for noce_try_cond_zero_arith [PR123276]
This is the simple fix for PR 123276 where this code can only handle scalar
integral modes. We could in theory handle scalar floating point modes here
too but it is not worth the trouble.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/123276
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Reject non-scalar integral modes.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Nathaniel Shead [Sun, 7 Dec 2025 12:17:15 +0000 (23:17 +1100)]
c++: Non-inline temploid friends should still be COMDAT [PR122819]
Modules allow temploid friends to no longer be implicitly inline, as
functions defined in a class body will not be implicitly inline if
attached to a named module.
This requires us to clean up linkage handling a little bit, mostly by
replacing usages of 'DECL_TEMPLATE_INSTANTIATION' with
'DECL_TEMPLOID_INSTANTIATION' when determining if an entity has vague
linkage.
This caused the friend88.C testcase to miscompile however, as 'foo' was
incorrectly having 'DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION' getting
set because it was keeping its tinfo.
This is because 'non_templated_friend_p' was returning 'false', since
the function didn't have a primary template. But that's expected I
think here, so fixed by also returning true for friend declarations
pushed into namespace scope, which still allows dependent nested friends
to be considered templated.
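For readers unfamiliar with the term, the sketch below shows a temploid
friend: a friend function that is not itself a template but is defined
inside a class template, so every instantiation of the class yields its
own function. `Box`, `unwrap` and the two wrappers are illustrative
names, not taken from the testcases. In a header like this the friend is
implicitly inline; per the description above, the same definition
attached to a named module is not, yet it still needs vague linkage
because several translation units may instantiate it.

```cpp
// A temploid friend: not a template, but defined in a class template,
// so Box<int> and Box<double> each get their own unwrap().
template <typename T>
struct Box {
  T value;
  friend T unwrap(const Box &b) { return b.value; }
};

// Two instantiations, two distinct friend functions found via ADL.
inline int unwrap_int() { return unwrap(Box<int>{42}); }
inline double unwrap_double() { return unwrap(Box<double>{1.5}); }
```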
PR c++/122819
gcc/cp/ChangeLog:
* decl.cc (start_preparsed_function): Use
DECL_TEMPLOID_INSTANTIATION instead of
DECL_TEMPLATE_INSTANTIATION to check vague linkage.
* decl2.cc (vague_linkage_p): Likewise.
(c_parse_final_cleanups): Simplify condition.
* pt.cc (non_templated_friend_p): Namespace-scope friend
function declarations without a primary template are still
non-templated.
* semantics.cc (expand_or_defer_fn_1): Also check for temploid
friend functions.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-22.C: New test.
* g++.dg/template/friend88.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
* a68.h (a68_file_size): Changed to use file descriptor.
(a68_file_read): Likewise.
* a68-parser-scanner.cc (a68_file_size): Likewise.
(a68_file_read): Likewise.
(read_source_file): Adapt `a68_file_{size,read}'.
(include_files): Likewise.
* a68-lang.cc (a68_handle_option): Likewise.
* a68-imports.cc (a68_find_export_data): Implement
reading from module's .m68 file if available.
gcc/testsuite/ChangeLog
* algol68/compile/modules/compile.exp (dg-data): New procedure
for writing binary test data to disk.
* algol68/compile/modules/program-m68-lp64.a68: New test which
embeds binary module data.
* algol68/compile/modules/program-m68-llp64.a68: Likewise.
* algol68/compile/modules/program-m68-ilp32.a68: Likewise.
* algol68/compile/modules/program-m68-lp64-be.a68: Likewise.
* algol68/compile/modules/program-m68-llp64-be.a68: Likewise.
Jeff Law [Tue, 23 Dec 2025 20:25:47 +0000 (13:25 -0700)]
[committed][RISC-V][PR target/123274] Add missing condition in usmul<mode>3 pattern
As Andrew P. noted in the BZ, the expander is missing elements in its condition
leading to generation of an insn that can't be matched.
This adds the necessary condition to the usmul<mode>3 expander which in turn
fixes the ICE. I just checked and that expander wasn't in gcc-15, so this is
just a gcc-16 issue.
Tested on riscv32-elf and riscv64-elf. I have a bootstrap in flight on the
Pioneer, but I'm not expecting any surprises. Much like the patch earlier
today, I'm going to push this now rather than wait for pre-commit CI.
Jeff Law [Tue, 23 Dec 2025 19:34:44 +0000 (12:34 -0700)]
[RISC-V][PR target/123278] Handle BF/HF modes in Andes 45 series pipeline description
So a standard run-of-the-mill case where we're testing modes to determine what
reservation to use in a pipeline model and modes were missing (BF/HF in this
case).
This adds the BF/HF cases to the fp_alu_s, fpu_mul_s and fpu_mac_s units for
the Andes 45 series. It may ultimately be the case that even lower latencies
are available for these ops, but that's something folks with a better
understanding of the Andes 45 series uarch would need to tackle.
Tested on riscv32-elf and riscv64-elf. Given the nature of the change and the
fact that I expect to be out of the office most of the next few days, I'm going
to go ahead and push without waiting for pre-commit CI. There's minimal risk.
Milan Tripkovic [Tue, 23 Dec 2025 16:39:41 +0000 (09:39 -0700)]
[RISC-V][PATCH] Adjust clmul latency in Spacemit X60 scheduler model
This patch adjusts the instruction scheduling and cost model for the Zbc
(CLMUL) extension on the Spacemit X60 core.
The tuning was evaluated using three configurations (CLMUL2, CLMUL3,
and the baseline CLMUL5) across a variety of hashing and encryption kernels.
Yuao Ma [Tue, 23 Dec 2025 14:54:34 +0000 (22:54 +0800)]
c++: clarify the comment regarding where the default dialect is set
Since r6-7026-g268be88cbeaba7, the default dialect has been set in
c_common_init_options rather than c_common_post_options. This patch updates the
corresponding comment to reflect that change.
Egas Ribeiro [Fri, 19 Dec 2025 21:34:55 +0000 (21:34 +0000)]
c++: Fix member-like friend detection for non-template classes [PR122550]
member_like_constrained_friend_p was incorrectly returning true for
constrained friend function templates declared in non-template classes,
causing them to be treated as distinct from their forward declarations.
This led to ambiguity errors at call sites.
Per [temp.friend]/9, a constrained friend is only "member-like" (and thus
declares a different function) in two cases:
1. Non-template friends with constraints (must be in a templated class)
2. Template friends whose constraints depend on outer template parameters
In both cases, the enclosing class scope must be templated. The fix adds
a check for CLASSTYPE_IMPLICIT_INSTANTIATION to ensure the friend's
context is actually a class template, not a plain class or explicit
specialization.
PR c++/122550
gcc/cp/ChangeLog:
* decl.cc (member_like_constrained_friend_p): Check that the
friend's enclosing class is an implicit instantiation.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-friend18.C: New test.
* g++.dg/cpp2a/concepts-friend18a.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Egas Ribeiro [Mon, 22 Dec 2025 22:30:12 +0000 (22:30 +0000)]
c++: Fix ICE on partial specialization redeclaration with mismatched parameters [PR122958]
When a partial specialization was redeclared with different template
parameters, maybe_new_partial_specialization was incorrectly treating it
as the same specialization by only comparing template argument lists
without comparing template-heads. This caused an ICE when the
redeclaration had different template parameters.
Per [temp.spec.partial.general]/2, two partial specializations declare
the same entity only if they have equivalent template-heads and
template argument lists.
Fix by comparing template parameter lists (template-heads) in addition
to template argument lists when checking for existing specializations,
and removing flag_concepts to provide diagnostics before c++20 for the
testcase.
PR c++/122958
gcc/cp/ChangeLog:
* pt.cc (maybe_new_partial_specialization): Compare template
parameter lists when checking for existing specializations and
remove flag_concepts check.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/partial-spec-redecl.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Fix cfg_attr expansion and feature gate attribute handling
Fixes Rust-GCC#4245
gcc/rust/ChangeLog:
* checks/errors/feature/rust-feature-gate.cc (FeatureGate::visit): Added
handling for META_ITEM type attributes to properly process feature gates.
* expand/rust-cfg-strip.cc (expand_cfg_attrs): Fixed a bug where
newly inserted cfg_attr attributes weren't being reprocessed,
and cleaned up the loop increment logic.
Lucas Ly Ba [Fri, 14 Nov 2025 21:07:00 +0000 (21:07 +0000)]
gccrs: refactor unused var lint
gcc/rust/ChangeLog:
* checks/lints/unused-var/rust-unused-var-checker.cc (UnusedVarChecker::visit):
Change unused name warning to unused variable warning.
* checks/lints/unused-var/rust-unused-var-collector.cc (UnusedVarCollector::visit):
Remove useless methods.
* checks/lints/unused-var/rust-unused-var-collector.h: Same here.
* checks/lints/unused-var/rust-unused-var-context.cc (UnusedVarContext::add_variable):
Add used variables to set.
(UnusedVarContext::mark_used): Remove method.
(UnusedVarContext::is_variable_used):
Check if the set contains the hir id linked to a variable.
(UnusedVarContext::as_string): Refactor method for new set.
* checks/lints/unused-var/rust-unused-var-context.h: Refactor methods.
* lang.opt: Change description for unused check flag.
Ryutaro Okada [Sun, 10 Aug 2025 02:24:56 +0000 (19:24 -0700)]
gccrs: implement unused variable checker on HIR.
This change moves the unused variable checker from the type resolver
to HIR. We can now use the HIR Default Visitor, and it will be much
easier to implement other unused lints with this change.
Harishankar [Mon, 24 Nov 2025 20:41:33 +0000 (02:11 +0530)]
gccrs: Fix ICE with continue/break/return in while condition
Fixes Rust-GCC/gccrs#3977
The predicate expression must be evaluated before type checking
to ensure side effects occur even when the predicate has never type.
This prevents skipping function calls, panics, or other side effects
in diverging predicates.
* backend/rust-compile-expr.cc (CompileExpr::visit): Always
evaluate predicate expression before checking for never type
to preserve side effects in while loop conditions.
* typecheck/rust-hir-type-check-expr.cc: Update handling of break/continue.
Egas Ribeiro [Fri, 19 Dec 2025 16:58:58 +0000 (16:58 +0000)]
c++: Fix ICE with lambdas combining explicit and implicit template params [PR117518]
When a lambda with explicit template parameters like []<int> also has
implicit template parameters from auto, and is used as a default
template argument, processing_template_parmlist remained set
from the outer template context. This caused
function_being_declared_is_template_p to incorrectly return false,
leading synthesize_implicit_template_parm to create a new template
scope instead of extending the existing one, resulting in a binding
level mismatch and an ICE in poplevel_class.
Fix by clearing processing_template_parmlist in
cp_parser_lambda_expression alongside the other parser state
save/restore operations.
PR c++/117518
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lambda_expression): Clear
processing_template_parmlist when parsing lambda body.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-targ19.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Xi Ruoyao [Thu, 18 Dec 2025 03:39:38 +0000 (11:39 +0800)]
LoongArch: relax the check for --with-tune
Someone (via a WeChat group) reported that --with-arch=la464
--with-tune=la664 had stopped working after the LA32 support was committed.
While this can be treated as a simple logic error (i.e. we may simply
change "loongarch64" in the case statement to an asterisk), IMO we
should just relax the check: at runtime the "unreasonable" combinations
like "-march=la64v1.0 -mtune=loongarch32" or "-march=la664 -mtune=la464"
are allowed (and the second case has been allowed for a long time), and a
combination of --with-arch=A --with-tune=T should be allowed if -march=A
-mtune=T is allowed at runtime.
Also, if we consider the fact that --with-tune= and -mtune= only select a
set of heuristic parameters, such combinations may not be so
unreasonable.
gcc/
* config.gcc: Relax the check for LoongArch with_tune.
Andrew Pinski [Tue, 23 Dec 2025 01:58:35 +0000 (17:58 -0800)]
ifcvt: Fix noce_try_cond_zero_arith after get_base_reg change [PR123267]
A few fixes are needed after the change to get_base_reg of r16-6333-gac64ceb33bf05b. First, we need to use the correct target mode
of the operand: if we are taking a subreg in QImode, the conditional move
must use QImode.
Second, we still need to use the original operands instead of the ones
with the subreg removed.
Pushed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/123267
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Pass the original operands
of a instead of the stripped off values. The mode of the operand
which is being used.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr123267-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
AutoFDO: Implement summary information in auto-profile
This patch aims to implement summary support in auto-profile, similar to
LLVM. The summary information stores various information about the
profile being read such as the number of functions, the maximum sample
count, the total number of samples and so on.
It also adds a section called the "detailed summary" which contains a
histogram-based calculation of the minimum execution count for a sample
needed to belong to a specific percentile of samples. This is used to
decide the hot count threshold (which can be controlled with a command
line parameter). The default is any sample belonging to the 99th percentile
being marked as hot.
This patch requires the changes from https://github.com/google/autofdo/pull/251
to work correctly.
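As a rough illustration of the percentile cutoff described above, the sketch below sorts sample counts from hottest to coldest and accumulates them until the running total covers the requested share of all samples; the count reached at that point is the minimum count belonging to that percentile. This is an assumption-laden sketch, not the code in auto-profile.cc: the function name is hypothetical, it works on a plain list of counts rather than a histogram, and it assumes `total * percentile` does not overflow 64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper, not the auto-profile.cc implementation.
// Returns the smallest sample count that still falls within the hottest
// `percentile` percent of all samples (e.g. percentile = 99).
// Returns 0 when there are no samples.
std::uint64_t min_count_for_percentile(std::vector<std::uint64_t> counts,
                                       unsigned percentile) {
  assert(percentile <= 100);

  std::uint64_t total = 0;
  for (std::uint64_t c : counts) total += c;
  if (total == 0) return 0;

  // Visit the hottest samples first.
  std::sort(counts.begin(), counts.end(), std::greater<std::uint64_t>());

  // Number of samples the percentile must cover, rounded up.
  const std::uint64_t target = (total * percentile + 99) / 100;

  std::uint64_t running = 0;
  for (std::uint64_t c : counts) {
    running += c;
    if (running >= target) return c;  // c is the cutoff count
  }
  return counts.back();
}
```

With such a cutoff, a sample would be treated as hot when its count is at least the returned value.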
* auto-profile.cc (string_table::~string_table): Update to free
original_names_map_.
(string_table::original_names_map_): New member.
(string_table::clashing_names_map_): Likewise.
(string_table::get_original_name): New function.
(string_table::read): Figure out clashes while reading.
(autofdo_source_profile::offline_external_functions): Call
get_original_name.
Nathaniel Shead [Thu, 4 Dec 2025 13:03:46 +0000 (00:03 +1100)]
c++/modules: Ignore exposures in lambdas in initializers [PR122994]
As the PR rightly points out, a lambda is not really a declaration in
and of itself by the standard, and so a lambda only used in a context
where exposures are ignored should not itself cause an error.
This patch implements this by way of a new flag set on deps that are
first found in an ignored context. This flag gets cleared if we ever
see the dep in a context where exposures are not ignored. Then, while
walking a declaration with this flag set, we re-establish an ignored
context. This is done for all decls (not just lambdas) to handle
block-scope classes as well.
Additionally, we prevent walking of attached declarations for a
DECL_MODULE_KEYED_DECLS_P entity during dependency gathering, so that we
don't think we've seen the decl at this point. This means we may not
have an appropriate entity to stream for this walk; to prevent any
potential issues with merging we stream a NULL_TREE 'hole' in the vector
and handle this carefully on import.
This requires a small amount of testsuite adjustment because we no
longer diagnose errors we used to. Because our ABI for inline variables
with dynamic initialization is to just do the initialization in the
module's initializer function (and importers only perform the static
initialization) we don't bother to walk the definition of inline
variables containing lambdas and so don't see the exposures, despite
us considering TU-local entities in static initializers of inline
variables being exposures (see PR c++/119551). This is legal by the
current wording of the standard, which does not consider the definition
of any variable to be an exposure (even an inline one).
PR c++/122994
gcc/cp/ChangeLog:
* module.cc (depset::disc_bits): New flag
DB_IGNORED_EXPOSURE_BIT.
(depset::is_ignored_exposure_context): New getter.
(depset::hash::ignore_tu_local): Rename to...
(depset::hash::ignore_exposure): ...this, and make private.
(depset::hash::hash): Rename ignore_tu_local.
(depset::hash::ignore_exposure_if): New function.
(trees_out::decl_value): Don't build deps for keyed entities.
(trees_in::decl_value): Handle missing keys.
(trees_out::write_function_def): Use ignore_exposure_if.
(trees_out::write_var_def): Likewise.
(trees_out::write_class_def): Likewise.
(depset::hash::make_dependency): Set DB_IGNORED_EXPOSURE_BIT if
appropriate, or clear it otherwise.
(depset::hash::add_dependency): Rename ignore_tu_local.
(depset::hash::find_dependencies): Set ignore_exposure if in
such a context.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-17_b.C: Use functions and internal
types rather than lambdas.
* g++.dg/modules/internal-4_b.C: Correct expected result.
* g++.dg/modules/internal-20_a.C: New test.
* g++.dg/modules/internal-20_b.C: New test.
* g++.dg/modules/internal-20_c.C: New test.
* g++.dg/modules/internal-21_a.C: New test.
* g++.dg/modules/internal-21_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Steve Kargl [Mon, 22 Dec 2025 02:32:46 +0000 (18:32 -0800)]
fortran [PR122957] DTIO incompatibility with -fdefault-integer-8
The -fdefault-integer-8 option is optional to assist with legacy
fortran codes. It is not a Standard requirement and is not
compatible with the newer user defined derived type I/O.
PR fortran/122957
gcc/fortran/ChangeLog:
* interface.cc (gfc_match_generic_spec): Issue an error
so that users do not use -fdefault-integer-8 with DTIO.
Harald Anlauf [Mon, 22 Dec 2025 20:05:29 +0000 (21:05 +0100)]
Fortran: fix variable definition context checks for SELECT TYPE [PR123253]
Commit r16-6300 introduced a regression when checking the variable
definition context of SELECT TYPE variables where the selector was not a
dummy argument as the scan for the association target was too shallow.
Scan through association lists for the ultimate selector.
PR fortran/123253
gcc/fortran/ChangeLog:
* expr.cc (gfc_check_vardef_context): Replace simple check by a
scan through the association targets for a dummy argument.
gcc/testsuite/ChangeLog:
* gfortran.dg/associate_76.f90: Extended testcase.
* gfortran.dg/associate_77.f90: New test.
Tomasz Kamiński [Mon, 22 Dec 2025 10:53:45 +0000 (11:53 +0100)]
libstdc++/doc: Document generate_canonical and variant compat macros.
The _GLIBCXX_USE_OLD_GENERATE_CANONICAL was introduced by r16-6177-g866bc8a9214b1d that implemented P0952R2 [1] resolution
for LWG2524 as DR against C++20.
The _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI was introduced by r16-6301-gb3c167b61fd75f that resolved PR112591.
Eric Botcazou [Mon, 22 Dec 2025 17:50:59 +0000 (18:50 +0100)]
Ada: Fix bogus component visibility error for class-wide type in generic
The problem is that Analyze_Overloaded_Selected_Component does:
   -- If the prefix is a class-wide type, the visible components
   -- are those of the base type.
   if Is_Class_Wide_Type (T) then
      T := Etype (T);
   end if;
and Resolve_Selected_Component does:
   -- The visible components of a class-wide type are those of
   -- the root type.
   if Is_Class_Wide_Type (T) then
      T := Etype (T);
   end if;
while Analyze_Selected_Component does:
   -- For class-wide types, use the entity list of the root type
   if Is_Class_Wide_Type (Prefix_Type) then
      Type_To_Use := Root_Type (Prefix_Type);
   end if;
when faced with a selected component. So the 3rd goes to the root type, the
1st to the base type, and the 2nd wants to do like the 3rd but ends up doing
like the 1st! This does not change anything for the class-wide type itself,
but does for its class-wide subtypes. The correct processing is the 3rd.
gcc/ada/
PR ada/123185
* sem_ch4.adb (Analyze_Overloaded_Selected_Component): Go to the
root when the prefix has a class-wide type.
* sem_res.adb (Resolve_Selected_Component): Likewise.
gcc/testsuite/
* gnat.dg/specs/class_wide1.ads: New test.
Jeff Law [Mon, 22 Dec 2025 17:54:05 +0000 (10:54 -0700)]
[RISC-V][V2] Improve spill code for RVV slightly to fix regressions after recent changes
Surya's recent patch for hard register propagation has caused regressions on
the RISC-V port for the various spill-* testcases. After reviewing the newer
generated code it was clear the new code was worse.
The core problem is we have a copy insn that is not frame related (and should
not be frame related) and a use of the destination of the copy in an insn that
is frame related. Prior to Surya's change we could propagate away the copy,
but not anymore.
Ideally we'd just avoid generating the copy entirely, but the structure of the
code to legitimize a poly_int isn't well suited for that. So instead we have
the code signal that it created a trivial copy and we try to optimize the code
after creation, but well before regcprop would have run. That fixes the code
quality aspect of the regression. In fact, it looks like the code can at times
be slightly better, but I didn't track down the precise reason why we were able
to re-use the read of VLEN so much better than before.
The optimization step is pretty simple. When it's been signaled that a copy was
generated, look back one insn and change it from writing the scratch register
to write the final destination instead.
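The look-back step can be pictured with a toy model. The sketch below is not RTL and not the code in riscv.cc: `Insn`, its fields and `retarget_trivial_copy` are hypothetical names, each instruction is reduced to one destination and one source, and the caller is assumed to know that the scratch register is dead after the copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction "dest = op(src)". Purely illustrative; real insns carry
// modes, frame-related flags and much more.
struct Insn {
  std::string op;    // e.g. "csrr", "slli", "mv"
  std::string dest;  // register written
  std::string src;   // register or operand read (simplified to one)
};

// If the last insn is a trivial copy "mv dst, scratch" and the insn just
// before it is the one that wrote `scratch`, make that insn write `dst`
// directly and drop the copy. Returns true when the sequence was changed.
// Assumes `scratch` is not used after the copy.
bool retarget_trivial_copy(std::vector<Insn>& seq) {
  if (seq.size() < 2) return false;

  const Insn& copy = seq.back();
  Insn& prev = seq[seq.size() - 2];
  if (copy.op != "mv" || prev.dest != copy.src) return false;

  const std::string final_dest = copy.dest;  // save before removing the copy
  prev.dest = final_dest;
  seq.pop_back();
  return true;
}

// Usage: "slli t1,t0 ; mv a5,t1" becomes "slli a5,t0".
void demo_retarget() {
  std::vector<Insn> seq = {
      {"csrr", "t0", "vlenb"}, {"slli", "t1", "t0"}, {"mv", "a5", "t1"}};
  assert(retarget_trivial_copy(seq));
  assert(seq.size() == 2 && seq[1].dest == "a5");
}
```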
That triggers the need to generalize the testcases so that they don't use
specific registers. We can also see the csr reads of the VLEN register getting
CSE'd more often in those testcases, so they're adjusted for that change as
well. There's some hope this will improve spill code more generally -- I
haven't really evaluated that, but I do know that when we spill vector
registers, the resulting code seems to have a lot of redundant VLEN reads.
Anyway, bootstrapped and regression tested on riscv (BPI and Pioneer). It's
also been through rv32 and rv64 regression testing. It doesn't fix all the
regressions for RISC-V on the trunk because (of course) something new got
introduced this week ;(
[ This is the spill-7 part of my last commit. After reviewing the logs from
the pre-commit system, it's good. ]
Vineet Gupta [Mon, 22 Dec 2025 16:54:10 +0000 (08:54 -0800)]
ifcvt: cond zero arith: handle subreg for shift count
Some backends, RISC-V included, wrap shift counts in a subreg, which the
current cond zero arith code wasn't handling.
This came up when looking at the original submission of cond zero
arith, which did handle subregs; that handling was then omitted for initial
simplicity and got lost along the way.
Vineet Gupta [Mon, 22 Dec 2025 16:54:06 +0000 (08:54 -0800)]
ifcvt: cond zero arith: elide short forward branch for signed GE 0 comparison [PR122769]
       Before        |        After
---------------------+----------------------
  bge a0,zero,.L2    | slti a0,a0,0
                     | czero.eqz a0,a0,a0
  xor a1,a1,a3       | xor a0,a0,a0
.L2                  |
  mv a0,a1           |
  ret                | ret
This is what all the prev NFC patches have been preparing to get to.
Currently the cond arith code only handles EQ/NE zero conditions, missing
ifcvt optimization for cases such as GE zero, as shown in the example above.
This is due to the limitation of noce_emit_czero () so switch to
noce_emit_cmove () which can handle conditions other than EQ/NE and
if needed generate additional supporting insns such as SLT.
This also allows us to remove the constraint at the entry to limit to EQ/NE
conditions, improving ifcvt outcomes in general.
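At the source level, the transformation corresponds roughly to the following pair of functions. This is only a sketch of the idea: the function names are hypothetical, and the mapping of the three steps onto SLT, a conditional-zero instruction and the final operation is a loose analogy, since the instructions actually emitted depend on the target.

```cpp
#include <cassert>
#include <cstdint>

// Branchy form: the "a >= 0" test branches around the XOR, so the XOR only
// happens when a is negative.
std::int64_t xor_if_negative_branchy(std::int64_t a, std::int64_t b,
                                     std::int64_t c) {
  if (a >= 0) return b;
  return b ^ c;
}

// Branchless form: materialize the condition (like SLT), zero the operand
// when the condition is false (like a conditional zero), then apply the
// operation unconditionally. XOR with 0 leaves b unchanged.
std::int64_t xor_if_negative_branchless(std::int64_t a, std::int64_t b,
                                        std::int64_t c) {
  const bool is_negative = a < 0;
  const std::int64_t operand = is_negative ? c : 0;
  return b ^ operand;
}

// Usage: both forms agree on either side of zero.
void demo_cond_zero_arith() {
  for (std::int64_t a = -3; a <= 3; ++a)
    assert(xor_if_negative_branchy(a, 5, 3) ==
           xor_if_negative_branchless(a, 5, 3));
}
```

The same shape works for any operation that has an identity element (0 for XOR, OR, addition and shifts), which is why zeroing the operand is enough.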
PR target/122769
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Use noce_emit_cmove.
Delete noce_emit_czero () no longer used.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr122769.c: New test.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Vineet Gupta [Mon, 22 Dec 2025 16:52:07 +0000 (08:52 -0800)]
ifcvt: cond zero arith: opencode helper noce_bbs_ok_for_cond_zero_arith [NFC]
This makes the code more readable by eliminating a bunch of pointer
intermediaries which obfuscate if_info items needed later in
noce_try_cond_zero_arith (). And while here add some top level comments
about what cond zero arith actually does.
gcc/ChangeLog:
* ifcvt.cc (noce_bbs_ok_for_cond_zero_arith): Move logic out.
(noce_try_cond_zero_arith): Into here.
Jeff Law [Mon, 22 Dec 2025 16:47:26 +0000 (09:47 -0700)]
[RISC-V][V2] Improve spill code for RVV slightly to fix regressions after recent changes
Surya's recent patch for hard register propagation has caused regressions on
the RISC-V port for the various spill-* testcases. After reviewing the newer
generated code it was clear the new code was worse.
The core problem is we have a copy insn that is not frame related (and should
not be frame related) and a use of the destination of the copy in an insn that
is frame related. Prior to Surya's change we could propagate away the copy,
but not anymore.
Ideally we'd just avoid generating the copy entirely, but the structure of the
code to legitimize a poly_int isn't well suited for that. So instead we have
the code signal that it created a trivial copy and we try to optimize the code
after creation, but well before regcprop would have run. That fixes the code
quality aspect of the regression. In fact, it looks like the code can at times
be slightly better, but I didn't track down the precise reason why we were able
to re-use the read of VLEN so much better than before.
The optimization step is pretty simple. When it's been signaled that a copy was
generated, look back one insn and change it from writing the scratch register
to write the final destination instead.
That triggers the need to generalize the testcases so that they don't use
specific registers. We can also see the csr reads of the VLEN register getting
CSE'd more often in those testcases, so they're adjusted for that change as
well. There's some hope this will improve spill code more generally -- I
haven't really evaluated that, but I do know that when we spill vector
registers, the resulting code seems to have a lot of redundant VLEN reads.
Anyway, bootstrapped and regression tested on riscv (BPI and Pioneer). It's
also been through rv32 and rv64 regression testing. It doesn't fix all the
regressions for RISC-V on the trunk because (of course) something new got
introduced this week ;(
I didn't include the spill-7.c change from either version of the patch. It
didn't fix the regression in pre-commit CI, so I'll chase that down
independently.
gcc/
* config/riscv/riscv.cc (riscv_expand_mult_with_const_int): Signal
when this creates a simple copy that may be optimized.
(riscv_legitimate_poly_move): Try to optimize away any copy created
by riscv_expand_mult_with_const_int.
* a68-parser-scanner.cc (a68_file_size): Fix comment to mention
it accepts `FILE *' and not file descriptor.
Fix invocation of `lseek' to correctly revert position of file
offset to previous one.
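The save-and-restore pattern that the lseek fix refers to can be sketched as below. This is a generic POSIX illustration under stated assumptions, not the actual a68_file_size: the function name is hypothetical and it takes a plain file descriptor.

```cpp
#include <cassert>
#include <cstdio>  // tmpfile, fileno (used by callers), SEEK_* constants
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the actual a68_file_size): returns the size in
// bytes of the file open on `fd`, or -1 on error, leaving the file offset
// where it was on entry.
off_t file_size_preserving_offset(int fd) {
  const off_t saved = lseek(fd, 0, SEEK_CUR);  // remember current offset
  if (saved == (off_t)-1) return -1;

  const off_t size = lseek(fd, 0, SEEK_END);   // offset of the end == size
  if (size == (off_t)-1) return -1;

  // Go back to the offset that was current before the call, not to 0.
  if (lseek(fd, saved, SEEK_SET) == (off_t)-1) return -1;
  return size;
}
```

Restoring to the saved offset rather than rewinding to the start matters when the caller has already consumed part of the file.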
Harald Anlauf [Sun, 21 Dec 2025 22:03:28 +0000 (23:03 +0100)]
fortran: fix testsuite regression for gfortran.dg/value_9.f90 [PR123201]
Commit r16-3499 introduced a regression on targets where truncation of a
string argument passed to a CHARACTER(len=1),VALUE dummy argument missed
the special treatment needed for passing single characters.
PR fortran/123201
gcc/fortran/ChangeLog:
* trans-expr.cc (conv_dummy_value): Convert string of length 1 to a
single character for passing as actual argument.
Jerry DeLisle [Sun, 21 Dec 2025 21:33:15 +0000 (13:33 -0800)]
fortran: [PR121472] Fix ICE with constructor for finalized zero-size type.
When a derived type has a final subroutine and a constructor interface,
but is effectively zero-sized, the gimplifier fails on the finalization
code. The existing check for empty types (!derived->components) only
catches completely empty types, not types with empty components.
Replace with a tree-level TYPE_SIZE_UNIT check that catches all
zero-size cases.
PR fortran/121472
gcc/fortran/ChangeLog:
* trans.cc (gfc_finalize_tree_expr): Replace !derived->components
check with TYPE_SIZE_UNIT check for zero-size types.
Tamar Christina [Sun, 21 Dec 2025 08:27:13 +0000 (08:27 +0000)]
vect: use wider precision type for generating early break scalar IV [PR123089]
In the PR we see that the new scalar IV tricks other passes into thinking there's an
overflow due to the use of a signed counter:
The loop is known to iterate 8191 times and we have a VF of 8 and it starts
at 2.
The codegen out of the vectorizer is the same as before, except we now have a
scalar variable counting the scalar iteration count vs a vector one.
i.e. we have
_45 = _39 + 8;
vs
_46 = _45 + { 16, 16, 16, 16, ... }
We pick a lower VF now since costing allows it, but that's not important.
When we get to cunroll, since the value is now scalar, it sees that 8 * 8191
would overflow a signed short, so it changes the loop bounds to the largest
possible signed value and then uses this to elide the ivtmp_50 < 8191 check as always
true, and so you get an infinite loop:
Analyzing # of iterations of loop 1
exit condition [1, + , 1](no_overflow) < 8191
bounds on difference of bases: 8190 ... 8190
result:
# of iterations 8190, bounded by 8190
Statement (exit)if (ivtmp_50 < 8191)
is executed at most 8190 (bounded by 8190) + 1 times in loop 1.
Induction variable (signed short) 8 + 8 * iteration does not wrap in statement
_45 = _39 + 8;
in loop 1.
Statement _45 = _39 + 8;
is executed at most 4094 (bounded by 4094) + 1 times in loop 1.
The signed type was originally chosen because of the negative offset we use when
adjusting for peeling for alignment with masks. However, this then introduces
issues with signed overflow, as we see here. This patch instead determines the
smallest possible unsigned type for use by the scalar IV where the overflow
won't happen when we include the extra bit for the sign, i.e. if the scalar IV
is an unsigned 8-bit value we pick a signed 16-bit type, but if it is a signed 8-bit
value we pick an unsigned 8-bit type.
We use the initial niters value to determine the smallest size possible, to
avoid cases such as a 64-bit IV in the source code requiring a TImode
counter. I also only require the additional bit when I know we'll be generating
the SMAX. I've now moved this to vectorizable_early_exit, so that if we do
end up needing something like TImode, we don't vectorize when the target
doesn't support it.
I've also added some testcases for masking around the boundary values. I've
only added them for char to reduce the runtime of the tests.
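The arithmetic behind the problem and the width selection can be sketched as follows. In the PR, 8 * 8191 = 65528, which exceeds the signed 16-bit maximum of 32767 but fits in 16 unsigned bits. The helpers below are hypothetical and are not the vectorizer's code; they only show how a minimum precision, with an optional extra bit for the sign, maps onto an integer width.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent `value` as an unsigned quantity
// (at least 1).
unsigned bits_needed(std::uint64_t value) {
  unsigned bits = 1;
  while (value >>= 1) ++bits;
  return bits;
}

// Hypothetical helper, not the vectorizer's code: smallest power-of-two
// integer width (8 to 64) that can hold `max_value`, optionally reserving
// one extra bit for a sign. Returns 0 if 64 bits are not enough.
unsigned min_iv_width(std::uint64_t max_value, bool need_sign_bit) {
  const unsigned needed = bits_needed(max_value) + (need_sign_bit ? 1u : 0u);
  for (unsigned width = 8; width <= 64; width *= 2)
    if (needed <= width) return width;
  return 0;
}

// Usage with the numbers from the PR: VF 8, 8191 iterations.
void demo_iv_width() {
  const std::uint64_t max_iv = 8u * 8191u;  // 65528
  assert(max_iv > static_cast<std::uint64_t>(INT16_MAX));
  assert(min_iv_width(max_iv, false) == 16);  // fits an unsigned 16-bit IV
  assert(min_iv_width(max_iv, true) == 32);   // needs 32 bits once signed
}
```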
gcc/ChangeLog:
PR tree-optimization/123089
* tree-vect-loop.cc (vect_update_ivs_after_vectorizer_for_early_breaks):
Add conversion if required, Note that if we did truncate the original
scalar loop had an overflow here anyway.
(vect_get_max_nscalars_per_iter): Expose.
* tree-vect-stmts.cc (vect_compute_type_for_early_break_scalar_iv): New.
(vectorizable_early_exit): Find smallest type where we won't have UB in
the signed IV and store it.
* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_IV_TYPE): New.
(class _loop_vec_info): Add early_break_iv_type.
(vect_min_prec_for_max_niters): New.
* tree-vect-loop-manip.cc (vect_do_peeling): Use it.
gcc/testsuite/ChangeLog:
PR tree-optimization/123089
* gcc.dg/vect/vect-early-break_141-pr123089.c: New test.
* gcc.target/aarch64/sve/peel_ind_14.c: New test.
* gcc.target/aarch64/sve/peel_ind_14_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_15.c: New test.
* gcc.target/aarch64/sve/peel_ind_15_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_16.c: New test.
* gcc.target/aarch64/sve/peel_ind_16_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_17.c: New test.
* gcc.target/aarch64/sve/peel_ind_17_run.c: New test.
Andrew Pinski [Sat, 20 Dec 2025 20:00:36 +0000 (12:00 -0800)]
extension: Fix documentation for __builtin_*_overflow_p [PR123222]
This fixes the copy-and-pasto for these builtins.
Basically the documentation currently says "addition" as that was copied from
__builtin_add_overflow documentation but really it should say corresponding operation
instead.
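As a short illustration of why "corresponding operation" is the right wording, each `__builtin_*_overflow_p` builtin checks its own operation. The wrappers below are hypothetical names around the real GCC builtins; the third argument is used only for its type, and nothing is stored. These builtins are a GCC extension, so the sketch assumes it is compiled with GCC.

```cpp
#include <cassert>
#include <climits>

// Each wrapper asks whether the infinite-precision result of the
// corresponding operation would fail to fit in `int`. The third argument's
// value is ignored; only its type matters.
bool add_overflows_int(int a, int b) {
  return __builtin_add_overflow_p(a, b, (int)0);
}

bool sub_overflows_int(int a, int b) {
  return __builtin_sub_overflow_p(a, b, (int)0);
}

bool mul_overflows_int(int a, int b) {
  return __builtin_mul_overflow_p(a, b, (int)0);
}

// Usage: the same operands give different answers per operation.
void demo_overflow_p() {
  assert(add_overflows_int(INT_MAX, 1));
  assert(!sub_overflows_int(INT_MAX, 1));
  assert(mul_overflows_int(INT_MAX, 2));
  assert(!mul_overflows_int(3, 4));
}
```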
Pushed as obvious.
PR middle-end/123222
gcc/ChangeLog:
* doc/extend.texi: Fix copy-and-pasto for __builtin_*_overflow_p.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jose E. Marchesi [Sat, 20 Dec 2025 14:59:50 +0000 (15:59 +0100)]
a68: fix layout of incomplete types
Apparently there is some case where the c_union of a union may be
incomplete and the containing union complete. At this point I don't
fully understand how that is possible, and the laying out of modes
should probably be rethought, but for now fix this corner case.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-low-moids.cc (a68_lower_moids): Fix for layout of
incomplete types.