git.feebdaed.xyz Git - 0xmirror/ovs.git/log

classifier: Remove leftover bits of classifier partitioning.

Partitioning was replaced with a more generic staged lookup, but the
comment and the hash map still remain in the code. Let's remove them.

Fixes: a14502a7c529 ("classifier: Retire partitions.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb: raft: Discard pre-vote replies during the actual election.

I am not sure how we could receive a pre-vote reply for a term of the
actual vote, but if somehow it does happen, we must not accept it, as
the same server can vote differently in the pre-vote and the real vote
and so we may end up with more than one elected leader.

Ignore the pre-vote reply during the actual elections and warn the user
if this ever happens, so we could investigate further.

Found while investigating a report with a cluster with two elected
leaders. It may not be the cause of the issue and, as stated above,
I'm not even sure if receiving a pre-vote for the actual election term
is possible. But it's better to cover this case explicitly, as the
flag in the reply is not used today.

Fixes: 85634fd58004 ("ovsdb: raft: Support pre-vote mechanism to deal with disruptive server.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-afxdp: Require kernel v5.4+ and need-wakeup.

Kernels with AF_XDP support below v5.4 are not supported and are not
used in any major distributions.  Using kernels that old with AF_XDP
that was rapidly changing at the time is also not a good idea in
general.  Let's drop support for them.  This allows to have the
XDP_USE_NEED_WAKEUP flag to be always on, so we can simplify the code
and stop guessing how many packets kernel can send in one go.

The next milestone may be to require libxdp 1.2+ and kernel 5.18+, so
we can simplify acinclude.m4 and avoid checking for bpf_xdp_detach and
bpf_xdp_query_id API.  But it feels a little bit too early for that,
as it excludes Ubuntu 22.04, even if it's probably not a great idea
to run AF_XDP there either.  The gains from moving the pole to 5.18
are just a couple of checks in the code, so not doing that for now.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ofproto-dpif-xlate: Log clarification for dp_hash method.

From Open vSwitch 2.10 onwards, dp_hash is the default selection method
for select groups. When this method is used, packet recirculation occurs
to compute the hash at the datapath level. Previously, this case was
logged as "no live bucket", which was misleading.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-December/052860.html
Fixes: 53cc166ae5fe ("xlate: Use dp_hash for select groups.")
Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpdk: Update to use v25.11.

This commit adds support for DPDK v25.11.

It updates the CI script and documentation.

The following commit was previously added and is required for
compatibility with DPDK 25.11.
cd2ff530d740 ("dpdk: Convert dpdk-lcore-mask to DPDK lcore args.")

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>

NEWS: Deprecate running user-space with OOT Kernel module.

It is some time now since the OOT kernel module was removed. It is yet
longer since it was deprecated. And it is no longer present in any
supported versions.

However, running user-space with the OOT kernel module, from unsupported
releases, is still supported.

This notice deprecates that support as of the v3.7 release.
With the intention to remove it in the v3.8 release.

As discussed at OVS+OVN Conference 2025.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <horms@ovn.org>

dpif-netlink-rtnl: Tighten probe for OOT Kernel tunnels.

It is common to run make check as an unprivileged user.
And in that case calling dpif_netlink_rtnl_create() to add a tunnel
fails due to insufficient privileges. Which is assumed
to mean that OOT Kernel tunnels are used.

Reverse this assumption and assume they are not being used.
An assumption in making this change is that the distinction
is unimportant in terms of run-time tunnel usage as as
make check doesn't use kernel tunnels.

However, a follow-up patch will add a deprecation notice, which prints
in the case that OOT Kernel tunnels are used. It does not seem useful
to print in the case of make check. And doing so breaks existing tests
run by that target.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Simon Horman <horms@ovn.org>

NEWS: Deprecate Windows support.

Deprecate Windows support which has become increasingly difficult to
maintain.

As discussed at OVS+OVN Conference 2025.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Simon Horman <horms@ovn.org>

ovsdb-idl: Fix returning non-existent rows from uuid lookup.

IDL may contain deleted or orphan rows that should never be visible
to the user.  However, ovsdb_idl_get_row_for_uuid() function simply
looks up the row by UUID and returns it without checking if the row
actually exists in the IDL.  This is causing a crash on assertion
failure in ovn-controller when it looks up and finds port binding
records that were already deleted:

  5 vlog_abort                        at lib/vlog.c:1325
  6 ovs_assert_failure                at lib/util.c:90
  7 ovsdb_idl_txn_write__.constprop.0 at lib/ovsdb-idl.c:3650
  8 ovsdb_idl_txn_write               at lib/ovsdb-idl.c:3742
  9 sbrec_port_binding_set_up         at lib/ovn-sb-idl.c:39665
10 port_binding_set_down             at controller/binding.c:3700
11 if_status_mgr_update              at controller/if-status.c:645
12 main                              at controller/ovn-controller.c:7544

  7 ovsdb_idl_txn_write__.constprop.0 at lib/ovsdb-idl.c:3650
    3650        ovs_assert(row->new_datum != NULL);

Can be easily reproduced with ovs-vsctl:

$ ovs-vsctl add-br br-int
$ ovs-vsctl del-br br-int \
     -- set bridge $(ovs-vsctl get bridge br-int _uuid) other-config:a=b
   ovs-vsctl: lib/ovsdb-idl.c:2673:
     assertion row->new_datum != NULL failed in ovsdb_idl_read()
   Aborted (core dumped)

Fix that by adding an extra check for row existence like in IDL
iterators, so deleted or orphan rows can no longer be found.

Fixes: 979821c0a6b0 ("ovsdb-idl: Allow clients to modify records without using structs.")
Reported-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb: transaction: Fix logging order of duplicate index rows.

The message about conflicting rows is trying to order the two rows
in a consistent manner, so the log messages do not change in tests.
But it fails to do so, because the order of columns in the column
set depends on the order of columns inside the hash map, which
depends on the hash function and the internal implementation details
of the hash map. This results in random test failures, when two
rows end up in the opposite order.

Uncovered while testing a different hash map implementation, but the
failure is technically possible even without any changes in the code,
e.g., by running on a different CPU architecture or with different
compiler flags.

Fix that by introducing a new function that constructs the column
set with columns in a predictable order and without UUID columns that
have random values in most cases and so not actually comparable.

Fixes: 6910a6e6f25f ("ovsdb: Implement table uniqueness constraints ("indexes").")
Reported-by: Rosemarie O'Riorden <rosemarie@redhat.com>
Acked-by: Eelco Chaudron echaudro@redhat.com
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dns-resolve: Do not treat never accessed responses as expired.

If a daemon has multiple remotes to connect to and it already reached
pretty high backoff interval on those connections, it is possible that
it will never be able to connect, if the DNS TTL value is too low.

For example, if ovn-controller has 3 remotes in the ovn-remote
configuration, where each is presented as a host name, the following
is happening when it reaches default max backoff of 8 seconds:

1. Tries to connect to the first remote - sends async DNS request.
2. Since jsonrpc/reconnect modules are not really aware of this, they
   just treat connection as temporarily failed - 8 second backoff.
3. After the backoff - switching to a new remote - sending DNS request.
4. Temporarily failing - 8 second backoff.
5. Switching to the third remote - sending DNS request.
6. Temporarily failing - 8 second backoff.
7. Finally trying the first remote again - checking DNS.
8. There is a processed response, but it is already 24 seconds old.
9. If DNS TTL is lower than 24 seconds - consider expired - send
   a new DNS request.
10. Go to step 2.

With that, if DNS TTL is lower than 3x of the backoff interval, the
process will never be able to connect without some external help to
break the loop.

A proper solution for this should include:

1. Making jsonrpc and reconnect and all the users of these modules
   aware of the DNS request being made.  This means introduction of
   a new RECONNECT state for DNS request and not switching to a new
   target while we're still in this state.

2. Making the poll loop state machine properly react to DNS responses
   by waiting on the file descriptor provided by the unbound library.

However, such solution will be very invasive to the code structure
and all the involved libraries, so it may not be something that we
would want to backport as a bug fix to stable branches.

Instead, making a much simpler change to allow use of never previously
accessed DNS replies for a short period of time, so the loop can be
broken.  It's not caching if we just requested the value, but didn't
use it yet, it's a "transaction in progress" situation in which we can
use the response even if TTL is zero.  Without a proper solution though
we can't be sure that the process will ever look at the result of
asynchronous request, so we need to have an upper limit for such
"transactions in progress".  Limiting them to a fairly arbitrary, but
big enough, value of 5 minutes.  In the worst case where the address
actually goes stale in between our request and the first access, we'll
try to use the stale value once and then re-request right away on
failure to connect.

This solution seems reasonable and simple enough to backport to stable
branches while working on the proper solution on main.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-June/053738.html
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

AUTHORS: Add MJ Ponsonby.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif: Maintain upcall_pid, mtu and hash.

The commit in the fixes tag extended struct dpif_execute with an
upcall_pid member.

This member was left uninitialized in dpif_execute_helper_cb(),
leading to Open vSwitch providing the kernel with a PID for which
it does not have a listener, we fix this by extending the
execute_helper_aux struct to include the previous dpif_execute struct
which we use to carry over various values such as the mtu, hash and
upcall_pid.

This condition is caught by one of the existing system-traffic
tests which is extended to explicitly check for presence of lost
upcalls in the datapath, however only with linux kernel >= 6.17.0-5.5
and gcc < 15.

Fixes: 0d9dc8e9ca4a ("dpif-netlink: Provide original upcall pid in 'execute' commands.")
Fixes: 27130224cd5c ("dpif-netlink: Allow MRU packet attribute.")
Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.")
Fixes: f9d30390392c ("dpif: Fix use of uninitialized execute hash.")
Signed-off-by: MJ Ponsonby <mj.ponsonby@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netlink: Fix probing for broken meters on Linux v5.10+.

If someone creates a lot of meters (>1017) in the kernel datapath
and then re-starts ovs-vswitchd, OVS fails to probe meter support
and refuses to install any meters afterwards:

  dpif_netlink|INFO|The kernel module has a broken meter implementation.

The reason is that probing for broken meters relies on creating
two meters with high meter IDs.  Inside the kernel, however, the
meter table is not a real hash table since v5.10 and commit:

  c7c4c44c9a95 ("net: openvswitch: expand the meters supported number")

Instead, it's just an array with meter IDs mapped to indexes with a
simple modulo operation.  This array can expand, but only when it is
full.  There is no handling of collisions, so if the meter at
ID % size is not the right one, then the lookup just fails.  This is
fine as long as userspace creates meters with densely packed IDs,
which is the case for ovs-vswitchd...  except for probing.

While probing, we attempt to create meters 54545401 and 54545402.
Without expanding the table size, these map onto 1017 and 1018.
So, creation of these meters fails if there are already meters with
IDs 1017 or 1018.  At this point OVS declares meter implementation
broken.

Ideally, we should make the "hash" table in the kernel handle
collisions and otherwise be a more or less proper hash table.  But
we can also improve probing in userspace and avoid this issue by
choosing lower numbered meter IDs and trying to get them from the
kernel first before trying to create.  Choosing high values at the
top of the 0-1023 range, so they are guaranteed to fit into the
minimal size table in the kernel (1024).  If one of these already
exists and has a proper ID, then meters are likely working fine
and we don't need to install new ones for probing.  If these meters
are not in the kernel, then we can try to create and check.

This logic should work fine for older or future kernels with a
proper hash table as well.

There is no Fixes tag here as the check was correct at the moment
it was introduced.  It's the kernel change that broke it.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/337
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netlink: Fix potential leak of a malformed meters netlink reply.

If netlink transaction returns malformed or otherwise not parsable
reply, it should be freed, as callers will not do that on failure.

Found while reading the code.

Fixes: 80738e5f93a7 ("dpif-netlink: Add meter support.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

windows: Fix absolute-header.m4 for newer msys2 appearing as cygwin.

MSYS2 is trying to move closer to cygwin and as part of that effort
they changed reporting of the system triplets from msys to cygwin:
https://www.msys2.org/news/#2025-06-20-replacing-x86_64-pc-msys-with-x86_64-pc-cygwin

This broke our builds in AppVeyor, since detection of absolute paths
for system headers is now using the wrong path separator list:

checking absolute name of <stdio.h>... ""
checking absolute name of <string.h>... ""

Fix by adding cygwin to the list for Windows-style paths. With that,
we can properly detect the headers again:

checking absolute name of <stdio.h>...
"C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.22000.0\\ucrt\\stdio.h"
checking absolute name of <string.h>...
"C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.22000.0\\ucrt\\string.h"

We used the 'Previous' variant of the image for AppVeyor before to
avoid this issue, but now even the previous image has the new MSYS2
and CI is fully broken without the fix.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev: Remove legacy duplex API.

This API was announcing duplex capability based on a fixed list of known
link speeds.
It is now unneeded with the new duplex API.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev: Fix full duplex capability for 25G ports.

As OVS does not know of the 25G speed, matching on a list of known speed
for deducing full duplex capability is broken.

Add a new netdev op to retrieve the duplex status for the existing netdev
implementations.

Reported-at: https://issues.redhat.com/browse/FDP-883
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-bsd: Cache features and speed.

Cache netdev features and speed in a similar manner than netdev-linux.c.

Protect concurrent access when calling netdev_get_features()
or netdev_get_speed() for BSD netdevs.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-linux: Fix compatibility with older kernels.

SPEED_UNKNOWN was introduced in Linux kernel v3.2.

While at it, add some defines for 40G and 100G that were added in more
recent kernels.

Fixes: 6240c0b4c80e ("netdev: Add netdev_get_speed() to netdev API.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev: Fix reported rate for Linux and BSD ports.

As the link speeds have evolved over the years, the list of supported
rates is obsolete in OVS.
Yet a catch all value (NETDEV_F_OTHER) is available, so use it.

Fixes: 6240c0b4c80e ("netdev: Add netdev_get_speed() to netdev API.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ipf: Work around thread safety warnings with clang 21.

In a few places the list sizes are checked without holding the lock.
While this probably speeds up the processing a bit, it's technically
incorrect as the access is not atomic.  Caught by thread-safety
analyzer of clang 21, as it now properly supports GUARDED annotations
on structure fields:

  lib/ipf.c:1117:33:
    error: reading the value pointed to by 'frag_exp_list' requires
    holding any mutex [-Werror,-Wthread-safety-analysis]
   1117 |     if (ovs_list_is_empty(&ipf->frag_exp_list)) {
        |                                 ^
  lib/ipf.c:1154:33:
    error: reading the value pointed to by 'reassembled_pkt_list'
    requires holding any mutex [-Werror,-Wthread-safety-analysis]
   1154 |     if (ovs_list_is_empty(&ipf->reassembled_pkt_list)) {
        |                                 ^

Not trying to fix the actual non-atomic access to the list as getting
the wrong result in these particular calls will not meaningfully
impact the correctness of the code, but taking the lock before checking
will likely reduce performance.

For now, just working around the problem by turning off thread safety
analysis for these particular calls.  The new _unsafe() access function
is also a bit more obvious in a way that it's more clear that these
calls should not be relied upon for logic that requires precision.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ipf: Fix potential deadlock with the clean thread on exit.

There is a potential deadlock between the thread destroying the ipf
and the clean thread as they are using the latch and the ipf lock in
the opposite order. The latch itself is thread-safe, so we should
not be holding the lock while setting it and waiting for the clean
thread that may be waiting for the lock.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netlink: Fix memory leak when re-add vport channel.

When vport_add_channel() is called duplicate, the resources for
previously specified sock was not freed. This issue only occurs
when per-cpu dispatch is disabled. This patch fixes this issue.

Reported by Address Sanitizer.

Direct leak of 60 byte(s) in 3 object(s) allocated from:
    0 0xffffb3658080 in malloc (/usr/lib64/libasan.so.6+0xa9080)
    1 0x922630 in xmalloc__ lib/util.c:141
    2 0x922718 in xmalloc lib/util.c:176
    3 0x9c67e4 in nl_sock_create lib/netlink-socket.c:147
    4 0x94cb6c in create_nl_sock lib/dpif-netlink.c:283
    5 0x950bec in dpif_netlink_port_add__ lib/dpif-netlink.c:978
    6 0x951a20 in dpif_netlink_port_add_compat lib/dpif-netlink.c:1101
    7 0x951cd0 in dpif_netlink_port_add lib/dpif-netlink.c:1147
    8 0x616354 in dpif_port_add lib/dpif.c:602
    9 0x49f424 in port_add ofproto/ofproto-dpif.c:4144
    10 0x44d51c in ofproto_port_add ofproto/ofproto.c:2204
    11 0x416914 in iface_do_create vswitchd/bridge.c:2203
    12 0x416dbc in iface_create vswitchd/bridge.c:2246
    13 0x40e1d0 in bridge_add_ports__ vswitchd/bridge.c:1225
    14 0x40e290 in bridge_add_ports vswitchd/bridge.c:1241
    15 0x40cc6c in bridge_reconfigure vswitchd/bridge.c:952
    16 0x420884 in bridge_run vswitchd/bridge.c:3440
    17 0x42f3d0 in main vswitchd/ovs-vswitchd.c:137

Reproduce steps:
    ovs-vsctl add-br br-ovs
    ovs-vsctl add-port br-ovs test -- set interface test type=internal
    ip netns add ns_test
    ip link set test netns ns_test
    ip netns del ns_test
    ifconfig br-ovs up
    sleep 1
    ifconfig br-ovs down
    sleep 1
    ovs-vsctl del-br br-ovs

Fixes: 1579cf677fcb ("dpif-linux: Implement the API functions to allow multiple handler threads read upcall.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

configure: Disable clang 21 warning for uninitialized const fields.

'default-const-init-field-unsafe' warning complains about allocating
structures on stack if they have any const fields.  We use this for
classifier rules that are embedded into many other structures:

  ofproto/ofproto.c:4980:26:
    error: default initialization of an object of type
    'struct rule_criteria' with const member leaves the
    object uninitialized [-Werror,-Wdefault-const-init-field-unsafe]
   4980 |     struct rule_criteria criteria;
        |                          ^
  ./lib/classifier.h:356:15:
    note: member 'priority' declared 'const' here
    356 |     const int priority;
        |               ^

Priority is marked as const as it is not supposed to be changed after
the rule creation, as that would break the classifier logic.  Any code
that changes the value is clearly seen, as it requires the explicit
CONST_CAST.  Initialization functions are the only places where this is
happening.  And these functions are always called for all the instances
allocated on stack, but clang fails to recognize this.

I believe, described code pattern is useful, so it's better to turn
off the warning instead of making the field non-const.  Alternative
would be to use an explicit initializer for every stack allocation
to silence the warning, but it feels like a bit too much.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

thread: Convert init/destroy lock functions to use non-const arguments.

'-Wuninitialized-const-pointer' complains about passing uninitialized
data into a function via a const pointer.  That's a valid concern,
since normally the function accepting a const argument wouldn't write
to it, but will likely use the data.

However, we do have some cases of such a coding pattern in OVS today
and they generate a ton of warnings with clang 21.

The main offenders of this are all the types of locks in OVS: mutex,
rwlock, spinlock.  All the locking functions accept const pointers
and then do a CONST_CAST inside.  This is useful for the cases where
we have a lock protecting a structure and want to have some read-only
function that accepts this structure, e.g.:

  struct my_type {
      struct ovs_mutex mutex;
      int value;
  };

  int get_value(const struct my_type *p)
  {
      int res;

      ovs_mutex_lock(&p->mutex);
      res = p->value;
      ovs_mutex_unlock(&p->mutex);

      return res;
  }

If the ovs_mutex_lock() required non-const argument, then we'd have
to use CONST_CAST for every lock/unlock, or we would have to make the
original pointer 'p' non-const, cascading the change through all the
callers of this "read-only" functions.

However, it's totally reasonable for the mutex init and destroy to
not have a constant argument, since we're explicitly trying to change
it and the initialization and destruction likely not happening for
the constant parent structure.

Let's convert both init and destroy API for all the locks to accept
non-const arguments.  This makes clang happy as we're no longer
passing in the uninitialized locks as const.

This technically changes the API, but we're relaxing the restriction,
so it shouldn't be a concern.  All the code that used these functions
before should still work fine.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

treewide: Fix clang 21 thread-safety warnings for init/create/destroy.

Clang 21 now properly implements GUARDED annotations for structure
fields, and so it complains about those fields being accessed in
init/create/destroy type of functions that normally do not take any
locks, as the data is just allocated and not available to other
threads.  Add some locks where it is simple to do, so we don't have
to turn off the analysis entirely, in other places just turn off the
analysis.

While at it, adding a couple of missed annotations that clang started
to complain about due to actual support for the GUARDED annotation on
structure fields.  Added some unused arguments to a couple of functions
to be able to specify which lock is required.  It seemed better than
turning off the analysis.

Acked-by: Eelco Chaudron echaudro@redhat.com
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

treewide: Remove OVS_GUARDED from RCU-based structure fields.

This annotation never worked on structure fields, but it does now with
clang 21.  The problem is that structures like cmap or rculist are
designed for lockless access for readers and so we're not holding any
locks while reading.  That makes new clang generate thread-safety
warnings:

  lib/dpif-netdev.c:3632:40:
    warning: reading the value pointed to by 'flow_table' requires
    holding any mutex [-Wthread-safety-analysis]
   3632 |                                  &pmd->flow_table) {
        |                                        ^

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ssl: Support for SNI extension in clients for ssl/tls.

This commit introduces Server Name Indication (SNI) support for all OVS
SSL/TLS clients, enabling clients to specify which hostname they are
attempting to reach during the SSL handshake. This is essential for
connecting through proxies, load balancers, and service meshes where
the connection endpoint differs from the intended server name.

An example use case is trying to connect to a ovsdb-server through
a istio proxy gateway in kubernetes environments.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

stream-ssl: Fix missing OPT_SSL_CIPHERSUITES in STREAM_SSL_CASES.

The OPT_SSL_CIPHERSUITES option was defined in SSL_OPTION_ENUMS and handled
in STREAM_SSL_OPTION_HANDLERS, but was missing from STREAM_SSL_CASES.
This could cause tools using STREAM_SSL_CASES to not recognize
--ssl-ciphersuites as a valid SSL option.

Fixes: 4d09d6b48ef5 ("stream-ssl: Add explicit support for configuring TLSv1.3.")
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-execute: Fix null pointer dereference on null nlattr.

This patch fixes a potential null pointer dereference reported
by Coverity if an null actions list is passed to nl_attr_get()
in odp_execute_sample().

Fixes: 26c6b6cd2b2e ("dpif-netdev: Implement OVS_ACTION_ATTR_SAMPLE action.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

netdev-vport: Free error string from str_to_u8() in tunnel config.

Free the error string returned by str_to_u8() in set_tunnel_config().
While at it, also add a proper error message for this error case.

Fixes: ebe0e518b048 ("tunnel: Bareudp Tunnel Support.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

github: Add a job to build on ubuntu-14.04.

While adding new code we frequently miss that certain compiler features
may be relatively new, or more often that some system headers are not
available or do not have certain definitions in them.  This results in
builds failing on older systems.

Adding a new CI job that runs inside Ubuntu 14.04 container, which is
the oldest Ubuntu that is in the "legacy support" mode:
  https://ubuntu.com/about/release-cycle

This image has GCC 4.8 that is missing a lot of modern features and
it's based on Linux v3.13 kernel that also has a lot of definitions
missing in uAPI headers, compared to modern ones.  This makes this
image a good candidate for a baseline "old distribution" testing.

This job can't cover everything and there will be different
configurations and distributions that may still fail, especially if
they have custom backports or some packages much newer than others.
But it should cover the vast majority of potential issues.

Since we're running inside a very old container, we can't use any of
the pre-defined GitHub workflows like 'checkout' or 'cache', as they
are based on Node.js that is built for much newer version of Ubuntu
and so requires much newer glibc to run.  Hence doing everything
manually.

Need to disable SSL, as we require OpenSSL 1.1.1+, which can probably
be built, but it seems like a bit of a waste of time to re-build so
many large things from sources.  Need to build a newer python though,
as python >= 3.7 is required in order to build OVS.

Building python 3.12 because it's the same as in other tests.  We could
also find and choose the latest 3.12.z release automatically, but it's
much less code to just manually stick to the current latest 3.12.11.
There should be no reason to update it frequently.

Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

test-psample: Fix missing field initializer warnings on older GCC.

GCC 4.8 complains for some reason:

  tests/test-psample.c: In function 'run':
  tests/test-psample.c:217:12:
    error: missing initializer for field 'packet' of 'struct sample'
    [-Werror=missing-field-initializers]
     struct sample sample = {};
            ^

While it is a little strange to complain, the {} initializer is a C++
thing and a GNU extension, so we should not be using it.  It's only
available in C23 standard.

Also, the initialization is not even necessary here, all the fields
will be initialized later with sample_clear(), as long as it covers
all the fields (rate was missing).

Fixes: 742de01a4a2e ("tests: Add test-psample testing utility.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

test-netlink-policy: Fix missing field initializer warnings on older GCC.

For some reason GCC 4.8 doesn't like this style of initialization,
complaining that nla_type is missing in the initializer:

  tests/test-netlink-policy.c: In function 'test_nl_policy_parse_ll_addr':
  tests/test-netlink-policy.c:60:9:
    error: missing initializer for field 'nla_type' of 'struct nlattr'
    [-Werror=missing-field-initializers]
         .nlattr.nla_type = TEST_POLICY_ATTR,
         ^
Let's use a normal designated initializer for the nested structure
to avoid this issue.

Fixes: 2f2ae5b6bdef ("tests: Fix endianness in netlink policy test fixtures.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

test-lib-route-table: Add missing RTPROT_BGP definition.

RTPROT_BGP was added in Linux v4.18. We should define it if it's
not available, otherwise build fails on older kernels.

Fixes: 91fc51106cfe ("route-table: Support parsing multipath routes.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netlink: Fix build on kernels < 4.1 due to missing RTA_VIA.

Current code assumes that if the rtnetlink.h exists, then it must
have definitions for RTA_VIA and 'struct rtvia'.  This is causing
build failures on older systems:

  lib/netlink.c: In function 'min_attr_len':
  lib/netlink.c:832:38:
    error: invalid application of 'sizeof' to incomplete type 'struct rtvia'
     case NL_A_RTA_VIA: return sizeof(struct rtvia) + sizeof(struct in_addr);
                                      ^

The code structure overall is not great, we should not include Linux
specific headers in the common netlink.c.  The only reason for the
inclusion though is the structure size.  We can just put a number
instead, as it is already done for all other types.  It should never
change, as long as kernel uAPI is stable.

Structure and the RTA_VIA should be defined in the Linux-specific
route-table.c, in case rtnetlink.h doesn't have a definition for them.

Fixes: 9d9a99d157f2 ("route-table: Support parsing RTA_VIA attribute.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-September/053800.html
Reported-by: Brendan Doyle <brendan.doyle@oracle.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

m4: Don't add -mno-avx512f if compiler doesn't support it.

When building using an older compiler that doesn't support avx512,
we are adding '-mno-avx512f' into the command line, which results
with a build failure:

gcc: error: unrecognized command line option '-mno-avx512f'

This is a case, for example, while trying to build OVS with GCC 4.8.

Fix that by avoiding binutils check when compiler doesn't understand
-mavx512f.

Later in the call chain there is also an explicit check for -mavx512f
support, but it will just use the cached result, so it's not a problem.

Fixes: 930f135f5ddc ("configure: explicitly disable avx512 if bintuils check fails")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Add a way to assert IDL txn as read only.

Add a way to assert IDL txn as read only, that is useful in cases when
we want to ensure that the program doesn't write anything into the
txn. It is done via assert because this is considered a bug which
should be fixed in the IDL caller.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

offload-dpdk: Don't use 24bit value as 32bit.

Currently while handling the 24 bit VNI field in VXLAN code,
netdev-offload-dpdk will actually read and write 32 bits. The byte that
is overwritten is reserved and supposed to be set to zero anyways, so
this is mostly harmless.

However, Openscanhub correctly identified this as a buffer overrun. Now
use ovs_16aligned_be32 to access the field as a 32bit integer.

Reported-at: https://issues.redhat.com/browse/FDP-1122
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

compiler: Avoid 'returns_nonnull' attribute warnings on older GCC.

The attribute was added in GCC 4.9.  Older versions complain:

  lib/util.h:164:1:
    warning: 'returns_nonnull' attribute directive ignored [-Wattributes]
     OVS_RETURNS_NONNULL void *xmalloc(size_t) MALLOC_LIKE;
     ^

Fixes: ed738eca39ef ("util: Annotate function that will never return NULL.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-September/053800.html
Reported-by: Brendan Doyle <brendan.doyle@oracle.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

treewide: Fix unused result warnings in NDEBUG builds.

With NDEBUG, all assertions are just ignoring the result of the
condition and this is causing build warnings when the condition
is a result of a function marked with OVS_WARN_UNUSED_RESULT:

  lib/dp-packet.c: In function 'dp_packet_ol_send_prepare':
  lib/dp-packet.c:580:28:
    error: ignoring return value of 'dp_packet_l4_proto_sctp' declared
    with attribute 'warn_unused_result' [-Werror=unused-result]
    580 |                 ovs_assert(dp_packet_l4_proto_sctp(p));
  ./include/openvswitch/util.h:58:40:
    note: in definition of macro 'ovs_assert'
     58 | #define ovs_assert(CONDITION) ((void) (CONDITION))
        |                                        ^~~~~~~~~
  lib/dp-packet.c:621:24:
    error: ignoring return value of 'dp_packet_inner_l4_proto_sctp'
    declared with attribute 'warn_unused_result' [-Werror=unused-result]
    621 |             ovs_assert(dp_packet_inner_l4_proto_sctp(p));
  ./include/openvswitch/util.h:58:40:
    note: in definition of macro 'ovs_assert'
     58 | #define ovs_assert(CONDITION) ((void) (CONDITION))
        |                                        ^~~~~~~~~
  lib/dp-packet.c:634:20:
    error: ignoring return value of 'dp_packet_l4_proto_udp' declared
    with attribute 'warn_unused_result' [-Werror=unused-result]
    634 |         ovs_assert(dp_packet_l4_proto_udp(p));
  ./include/openvswitch/util.h:58:40:
    note: in definition of macro 'ovs_assert'
     58 | #define ovs_assert(CONDITION) ((void) (CONDITION))
        |                                        ^~~~~~~~~
  cc1: all warnings being treated as errors

Let's export the ignore() function as ovs_ignore() and use it for
the ovs_assert in case of NDEBUG build.  This will silence the
warnings.

Most files already import "util.h", so there is no need to add
the public "openvswitch/util.h" as well.

Fixes: 3daf04a4c56d ("dp-packet: Rework IP checksum offloads.")
Fixes: 67abd51540d0 ("dp-packet: Rework tunnel offloads.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ofproto-dpif.at: Wait for logs in select group tests.

These tests are executing group updates and then check the logs, but
it is possible that logs are not yet flushed to the disk.  That is
causing sporadic failures like this one:

  +awk: stddev.awk:10: fatal: division by zero attempted

This happens when we run the awk script to check the bucket load
before the logs actually appear in the file.

Fix that by always waiting for the logs.

FreeBSD also has a long standing issue with async IO where some logs
can be re-ordered and jammed together.  This mostly happens in the
'equal weights' test due to high amount of logging.  Changing the
pattern from 'ofproto_dpif|DBG|.*Bucket' to a shorter '|DBG|.*Bucket'
should solve most of the issues of this kind in this particular test.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netdev: Optimize rwlock in port query API.

The query port only needs to use a rdlock, not a rwlock.
This patch can optimize it.

Fixes: 7daa50346830 ("dpif-netdev: Replace port mutex by rwlock.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

AUTHORS: Add Ivan Burnin.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

checkpatch: Set explicit encoding in do_authors_exist.

The UTF-8 encoding system is used by default on most operating systems, but
not all of them. This can cause the ’24:checkpatch - AUTHORS.rst existence’
test to fail with a UnicodeDecodeError. All Python versions < 3.15 are
affected, resolved in PEP 686 (https://peps.python.org/pep-0686/)

Fixes: a6ccd111552d ("checkpatch: Add new check-authors-file option to checkpatch.py.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ivan Burnin <iburnin@k2.cloud>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

AUTHORS: Add Ihar Hrachyshka.

Ihar contributed many patches before, but somehow wasn't added
to the AUTHORS file. Fixing that mistake.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Remove unnecessary revalidator/wait's.

These tests don't send any packets through a bridge, so it's not needed
to wait for revalidator.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Revalidate on flow change in tunnel-push-pop.

The `tunnel_push_pop - action` test case may fail due to a race
condition between the revalidator modifying flows and dummy `receive`
action. Other test cases are also potentially affected.

To fix the race for this test case and other test cases in the same
file, this patch will call revalidator/wait every time a flow is
modified before sending a packet through a bridge using `receive`.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Ignore no tunnel port error on vswitch stop.

The test failure occurs quite consistently (~50% of the runs) when using
a musl static build. It always happens if I add `sync` or `sleep` before
vswitchd shutdown. The warning message can be seen in the
retained vswitchd log (-d) even when the test case passes.

I believe a race condition between vswitchd flushing the log to disc and
the test code reading from the log file explains why the test case
doesn't fail consistently.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Don't fail when test path contains 'p2' string.

The `grep` pattern was so broad that it allowed to match any string that
contained `p2`. This is problematic because the path to service log and
sockets is part of log message. It made it so that the test case failed
whenever executed from a directory that has `p2` in any of its absolute
path chain.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Gracefully handle EADDRINUSE string from musl.

Different libc implementations may have different string representations
for the same errno. Service shutdown routines verify that no
*unexpected* warnings are seen in service logs. This makes the test case
fail because a slightly different warning is observed.

This patch makes the test case gracefully handle the musl libc
representation.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Handle duration= with no fractional part.

When nsec == 0, ofp_print_duration will not add a fractional part. This
may happen either when you are very lucky, or when you use a libc
without nanosecond precision.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpdk: Use DPDK 24.11.3 release.

Update the CI and docs to use DPDK 24.11.3.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>

AUTHORS: Add Aleksandr Smirnov.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

id_pool: Prevent requested ID range overflows uint32.

The upper range of IDs (base + n_ids) should not exceed max uint32.
In that case ovs_assert is called.

Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Documentation: Add section about AI assistants.

Add a section to the submitting-patches documentation describing the
use of an "Assisted-by" tag for patches created with AI assistance.

Co-authored-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mark Michelson <mmichels@redhat.com>

appveyor: More robust OpenSSL installation wrt path changes.

AppVeyor had an issue where C:\OpenSSL-Win64 folder didn't exist for
some reason and the installation of OpenSSL was placing it into
C:\Program Files\OpenSSL-Win64 instead as well.  After that we couldn't
find the libraries and the build failed:
  https://help.appveyor.com/discussions/problems/38517-cannot-find-path-copenssl-win64-because-it-does-not-exist

It's better if we check the paths and install into specific location to
be more resilient to this kind of environment issues, i.e. make less
assumptions.

While at it, fixing the slash type for the path.  Remove-Item somehow
accepts the "wrong" one, but a backslash is more native in paths on
Windows.  We use forward slash while in msys2 shell, but should not use
it in PowerShell environment.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

appveyor: Temporarily move to previous image to unblock CI.

The current image has OpenSSL issues along with some other PATH-related
issues that break our builds.

Move to the "Previous" variant for now while the AppVeyor team is
figuring this out.

Link: https://help.appveyor.com/discussions/problems/38517-cannot-find-path-copenssl-win64-because-it-does-not-exist
Acked-by: Alin Serdean <aserdean@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpdk: Convert dpdk-lcore-mask to DPDK lcore args.

OVS currently uses other_config:dpdk-lcore-mask <coremask> directly
in DPDK rte_eal_init() with '-c <coremask>' argument.

'-c' argument is now deprecated from DPDK and will be removed in
DPDK 25.11, so OVS will no longer be able to use the '-c <coremask>'
argument.

Convert dpdk-lcore-mask core mask to a core list that can be used with
'--lcores' and add some tests.

The core list is validated to prevent invalid cores being passed to
DPDK rte_eal_init().

Using the '--lcores' argument also adds compatibility for using a core
in the core mask that is greater than the max lcore, similar to commit
fe53b478f86e ("dpdk: Fix main lcore on systems with many cores.")

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

dpdk: Clean up dpdk option map.

Default values are not used here, so remove the option for them.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

vswitch.xml: Update docs for dpdk-lcore-mask.

Add some more details about dpdk-lcore-mask and remove
the recommendation to use it.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

netdev-dpdk: Remove workarounds for fixed checksum bugs.

net/tap fixes were merged as part of
v21.11.6/v22.11.4/v23.11 DPDK LTS versions.

A net/txgbe fix was merged as part of
v21.11.9/v22.11.7/v23.11.3/v24.11 DPDK LTS versions.

We can remove the capability filtering workarounds now that
OVS main branch requires at least v24.11.

Link: https://git.dpdk.org/dpdk/commit/drivers/net/tap?id=0bbafe48ae0a
Link: https://git.dpdk.org/dpdk/commit/drivers/net/txgbe?id=25fe1c780d39
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-execute: Fix packet length check for TSO packets.

TSO packets were incorrectly treated as too big by the check_pkt_len
action with the userspace datapath.
Adjust the check by looking at the requested segment size.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-at: https://issues.redhat.com/browse/FDP-1631
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-dpdk: Fix IP checksum with net/virtio.

OVS checks a netdev checksum offload capabilities before calling its
transmit helper.
Yet the OVS transmit preparation helper for DPDK drivers requests IP
checksum (setting RTE_MBUF_F_TX_IP_CSUM) regardless of the DPDK driver
support if some L4 checksum is requested.

This is problematic since net/virtio does support L4 checksum offload,
but does not support nor announce IP checksum offload and this DPDK
driver tx prepare helper (calling helper shared with other DPDK drivers)
clears the ip checksum in packet data if RTE_MBUF_F_TX_IP_CSUM is set.

The DPDK mbuf API mandates IP csum when TCP segmentation is requested.
However, IP csum is not required when TCP or other L4 checksum is
requested.

While most DPDK drivers implement both IP and L4 checksums (and this is
likely why the bug was never caught), let's avoid requesting an
unsupported offload.

Reported-at: https://issues.redhat.com/browse/FDP-1666
Link: https://git.dpdk.org/dpdk-stable/tree/drivers/net/virtio/virtio_rxtx.c?h=v24.11.2#n1742
Link: https://git.dpdk.org/dpdk-stable/tree/lib/net/rte_net.h?h=v24.11.2#n179
Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-native-tnl: Fix DF bit not being extracted into tunnel flags.

OVS adds matches on the DF/CSUM/etc bits of the tunnel info flags, but
the DF bit is never actually extracted from the outer IP header during
the tunnel decapsulation.  This is not a huge problem, as we'll only
match on what was parsed out, since matching on DF flag is not exposed
through OpenFlow to the users.  And since the bit is never extracted,
we'll just always match on -df, which sort of "works", because the bit
is never extracted and so it is never set.  However, this causes
misleading -df matches in the datapath flow dumps even if the packets
actually have the DF bit set, which it is by default.

Fix that by actually extracting the bit from the outer header while
decapsulating tunneled traffic.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

MAINTAINERS: Move Russell to emeritus status.

Moving Russell to emeritus status at his request.

Thanks, Russell, for all your contributions and support!

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

bond: Do not flag a revalidation when adjusting time.

When adjusting bond parameters, any adjustment is considered sufficient
for triggering a rebalance. This is a very simplistic config update
scheme that triggers a complete rebalance even if the time adjustment
would move the next expiration out beyond the last calculated expiration.

For the interval parameter only, we can simply recalculate the expiry
deadline and let the next bond_run() event do the rebalance if needed.
Even if the recalculation would cause the deadline to have occurred in
the past, it should execute on the next bond_run() anyway. This is
still okay, as the rebalance interval timeout may not result in a
full rebalance anyway.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-April/053588.html
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>

cirrus: Only update the main FreeBSD repository.

We're not installing anything from FreeBSD-kmods or any other non-main
repository, but it seems to have mismatching packages with the latest
Google Cloud image that appears to not be updated yet:

  # pkg update -f
  Updating FreeBSD repository catalogue...
  Fetching meta.conf: . done
  Fetching data.pkg: .......... done
  Processing entries: .......... done
  FreeBSD repository update completed. 36502 packages processed.
  Updating FreeBSD-kmods repository catalogue...
  Fetching meta.conf: . done
  Fetching data.pkg: .. done
  Processing entries:
  Newer FreeBSD version for package parallels-tools:
  To ignore this error set IGNORE_OSVERSION=yes
  - package: 1403506
  - running userland: 1403505
  Ignore the mismatch and continue? [y/N]:
    pkg: repository FreeBSD-kmods contains packages for wrong OS version:
    FreeBSD:14:amd64
  Processing entries...
  Unable to update repository FreeBSD-kmods
  Error updating repositories!

Not updating FreeBSD-kmods should avoid this issue.  In the worst case
the next 'pkg install' will fail, but it shouldn't really ever happen
as we do not install any kernel modules or packages that depend on
any kernel modules.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb: transaction: Silence dereference after null check warning.

Coverity reported a possible null dereference of datum->values
when handling value-type weak references in find_and_add_weak_refs().

In practice, datum->values must be non-NULL for value-type weak references.
Adding ovs_assert(datum->values) documents this invariant and ensures that
we catch any unexpected misuse.

No change to runtime behavior in production builds; this only improves
safety and satisfies static analysis.

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

ovsdb: raft: Silence dereference after null check warning.

Coverity flagged a potential null dereference of txnid
when calling raft_next_entry() in ovsdb_storage_read().

To prevent this, raft_next_entry() now checks whether txnid is non-NULL
before attempting to write to it. This preserves existing behavior
when txnid is valid, while safely handling the case where it is NULL.

No change to normal operation; this only prevents potential crashes
if a NULL txnid is passed.

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

ovsdb: relay: Silence dereference before null check warning.

Coverity reported a possible null dereference of ctx->db
inside the event processing loop in ovsdb_relay_run().

In practice, ctx->db is guaranteed to be non-NULL for valid relay_ctx
structures. Adding ovs_assert(ctx->db) documents this invariant and
ensures debug builds catch any misuse.

No change in runtime behavior; this only improves code safety and
satisfies static analysis.

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

ovsdb: trigger: Silence dereference after null check warning.

Coverity reported a possible null dereference of t->reply
when handling errors in the "committing" state for "transact" requests.

In practice, t->reply is always non-NULL at this point, but static analysis
cannot infer this guarantee. Adding ovs_assert(t->reply) documents the
invariant for developers and ensures debug builds will catch any misuse.

No change in runtime behavior; this only improves code safety and
satisfies static analysis.

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

ovsdb: monitor: Silence dereference after null check warning.

This patch resolves a Coverity-reported issue where a dereference
could occur after a null check in ovsdb_monitor_row_update_type().
An additional condition has been added to ensure a safe early exit
in ovsdb_monitor_compose_row_update().

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

ofproto: Fix missing lock when reading oftable->vacancy_up/down.

Coverity reported an issue where the oftable->vacancy_up/down
members were read without holding the ofproto_mutex.

This patch ensures the mutex is acquired before accessing these
members and adds OVS_REQUIRES(ofproto_mutex) to the affected
function to enforce proper locking.

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Fixes: bab86012066c ("Implement Vacancy Events for OFPMP_TABLE_DESC.")

netdev-offload-dpdk: Fix memory leak for matching fragments.

The error paths should release any allocated resources that are not
added to the patterns.

Found by Coverity.

Fixes: c1a5d0e2b573 ("netdev-offload-dpdk: Support IPv4 fragmentation types.")
Fixes: 1318929f904c ("netdev-offload-dpdk: Support IPv6 fragmentation types.")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

AUTHORS: Add Linda Wang.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

netdev-offload-dpdk: Fix vport hw-offload stat.

For vport devices hardware offload counters were being incremented
on the physical device but decremented on the vport, causing
incorrect statistics that didn't return to zero when flows are
removed.

Fix by using physdev instead of netdev for counter decrements to match
the increment behavior.

Fixes: 0e6366c2399d ("netdev-offload-dpdk: Implement hw-offload statistics read.")
Signed-off-by: Linda Wang <linda.wang@jaguarmicro.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

tests: system-traffic: Fix flaky floating IP test.

The last ACK from the 3-WAY handshake in this test goes to userspace
(miss upcall) after going through conntrack because +est traffic is
handled differently from +new in this set of OpenFlow rules.

The sender though proceeds with sending the data that may end up
re-ordered with the aforementioned ACK.  Since connection is very
short, it is possible that this one ACK will be delivered even after
the connection is already closed, i.e., after all the data is sent
and the connection termination sequence (FIN-ACK-FIN-ACK) is done.
Delivery in this case triggers an RST reply.  And RST transitions
TIME_WAIT into CLOSE (CLOSING in OVS terms), causing the test failure.

It's not really possible to fully avoid the packet re-ordering as it
is part of the upcall mechanism.  But there is a couple ways to avoid
it in this particular case, e.g., if we predict how the +est packet
will look like and install the datapath flow for +est while processing
the original +new, or by storing the upcall PID in the kernel for the
whole time of action execution and not only for one packet we're
executing actions for (TCP handshake replies are happening in the
context of the initiator, which is OVS handler thread in our case).
But these are large changes that will not help with test failures
on older branches / older kernels.  For now, fixing the test instead.

The intention in the test was to check that the state is terminal at
the end of the connection, i.e., that our actions do not leave
established or incomplete conntrack entries.  So it should be fine to
check for both TIME_WAIT and CLOSING, as both of these are terminal
states.

Fixes: 8d48d5f39436 ("system-traffic: Add conntrack floating IP test")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

conntrack: Add support for IPv4 EPRT and EPSV.

Recent versions of Curl combine the EPSV and EPRT commands with IPv4
connections instead of the PASV and PORT commands. EPSV and EPRT were
added to FTP for IPv6 support but these commands also support IPv4. Most
software still uses the old PASV and PORT commands in IPv4 connections
but recent versions of curl will default to EPSV and EPRT for all
connections. This patch adds support for these extended commands, and
added tests for both with and without them.

Reported-at: https://issues.redhat.com/browse/FDP-907
Reported-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>

AUTHORS: Add Eli Oliver.

Signed-off-by: Aaron Conole <aconole@redhat.com>

checkpatch: Add simple python comment spell checking.

Checkpatch is used to do spell checking for C files, currently. However,
python code is also a routine part of the Open vSwitch suite, and it should
also receive spell checking where possible.

This patch adds an initial implementation of spell checking to checkpatch
that will catch all of the comments starting with '#', and some of the
doc-string lines. Future work would implement a more robust python
parser that will handle doc-string entries.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eli Oliver <eoliver@redhat.com>
Co-Authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>

Set release date for 3.6.0.

While at it, adding a few missed NEWS entries.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ipf: Complete l4 checksum before frag out.

Previously during NAT actions we could calculate a checksum on a packet,
if this checksum was valid we would mark it as such for offloading. Then
when modifying the packet we would check if we set the checksum as
partial. This worked fine for most packets, but fragmented packets can
not complete their checksum because we process these packets one at a
time instead of all at once. This breaks NAT of fragmented packets in
userspace conntrack.

This patch resolves any outstanding checksums on the reassembled
fragment before adding the fragments to a batch.

Reported-at: https://issues.redhat.com/browse/FDP-1571
Reported-by: Hekai Wang <hewang@redhat.com>
Fixes: e36793e11fe8 ("dp-packet: Resolve unknown checksums.")
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Documentation: Use @development-tools on dnf5.

dnf5 requires the group ID (e.g., development-tools) rather than the group
name (e.g., Development Tools) used in previous versions of dnf and in yum.

Updated documentation to reflect this change.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

ofproto: Move group-modify to mod_start instead of mod_finish.

Upon modifying a group, the following steps occur:
1. ofproto_group_mod_start()->modify_group_start():
Find an old group object, create a new one.
2. ofproto_bump_tables_version()
3. ofproto_group_mod_finish():
Modify the new group object with buckets etc.

At step #3, the new group object is already in use by revalidators,
that may read incorrect data while being modified.

Instead, move the group modification of the new object to step #1.

Fixes: 0a8f6beb54ab ("ofproto-dpif: Fix dp_hash mapping after select group modification.")
Acked-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Prepare for post-3.6.0 (3.6.90).

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Prepare for 3.6.0.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

checkpatch: Separate out the built-in spelling words.

Checkpatch contains a large list of spelling words, currently 326.  The
in-built list is quite cumbersome, and includes duplicate entries
('netdev' and 'revalidation').  Managing this list will continue to grow
difficult over time, and really shouldn't be so difficult.  To make the
maintenance of OVS specific spelling words easier, create a new file
that will contain the spelling words and move the list into that file
(sorted).  This patch adds 'FreeBSD' and 'gcloud' to the list as it
was tested against
80d723736b64 ("cirrus: Update to FreeBSD 14.3 and 13.5.").  It also
adds "apis", "enablement", "ifdef", and "tnl" after the patch proposed
by Eelco at the referenced link.

Link: https://patchwork.ozlabs.org/project/openvswitch/patch/d6fecfc9491893b852ff9de3ef8b7ef068535d6d.1750328980.git.echaudro@redhat.com/
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>

AUTHORS: Add Alexandra Rukomoinikova.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

db-ctl-base: Added filter option in show command.

The --filter option allows filtering output in two ways:
1. Basic filtering (comma-separated list, e.g., 'filter1,filter2').
2. Recursive filtering by table (e.g., 'interface(geneve)').

Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovs-vsctl: Exit with error if postdb checks report errors.

Today the exit code refers to the execution of the change
in the database. However, when not using parameter --no-wait
(default), the ovs-vsctl also checks if OVSDB transactions
are successfully recorded and reload by ovs-vswitchd. In this
case, an error message is printed if there is a problem during
the reload, like for example the one below:

# ovs-vsctl add-port br0 gre0 -- \
    set interface gre0 type=ip6gre options:key=100 \
    options:remote_ip=2001::2
ovs-vsctl: Error detected while setting up 'gre0': could not \
add network device gre0 to ofproto (Address family not supported\
by protocol).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
# echo $?
0

This patch changes to exit with specific error code 160
(ERROR_BAD_ARGUMENTS) on Windows and 65 (EX_DATAERR) on
Linux or BSD if errors were reported during the reload.

This change may break existing scripts because ovs-vsctl will
start to fail when before it was succeeding. However, if an
error is printed, then it is likely that the change was not
functional anyway.

Reported-at: https://issues.redhat.com/browse/FDP-793
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb: Use table indexes if available for ovsdb_query().

Currently all OVSDB database queries except for UUID lookups all result
in linear lookups over the entire table, even if an index is present.

This patch modifies ovsdb_query() to attempt an index lookup first, if
possible. If no matching indexes are present then a linear index is
still conducted.

To test this, I set up an ovsdb database with a variable number of rows
and timed the average of how long ovsdb-client took to query a single
row. The first two tests involved a linear scan that didn't match any
rows, so there was no overhead associated with sending or encoding
output. The post-patch linear scan was a worst case scenario where the
table did have an appropriate index but the conditions made its usage
impossible. The indexed lookup test was for a matching row, which did
also include overhead associated with a match. The results are included
in the table below.

Rows                   | 100k | 200k | 300k | 400k | 500k
-----------------------+------+------+------+------+-----
Pre-patch linear scan  |  9ms | 24ms | 37ms | 49ms | 61ms
Post-patch linear scan |  9ms | 24ms | 38ms | 49ms | 61ms
Indexed lookup         |  3ms |  3ms |  3ms |  3ms |  3ms

I also tested the performance of ovsdb_query() by wrapping it in a loop
and measuring the time it took to perform 1000 linear scans on 1, 10,
100k, and 200k rows. This test showed that the new index checking code
did not slow down worst case lookups to a statistically detectable
degree.

Reported-at: https://issues.redhat.com/browse/FDP-590
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif: Fix infinite netlink loop in dpif_execute_helper_cb.

When a meter action is encountered and stored in the auxiliary
structure, and subsequently, a non-meter action is processed
within a nested list during callback execution, an infinite
loop is triggered.

This patch maintains the current behavior but stores all
required meter actions in an ofpbuf for deferred execution.

Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20250506022337.3242-1-danieldin186@gmail.com/
Fixes: 076caa2fb077 ("ofproto: Meter translation.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

utilities:gdb: Add GDB function to dump Netlink attributes.

This commit adds the ovs_dump_nla GDB command, which allows
dumping Netlink attributes. Here are some examples:

(gdb) ovs_dump_nla 0x7f10e35d8858 172 ovs_check_pkt_len_attr
(struct nlattr *) 0x7f10e35d8858:[OVS_CHECK_PKT_LEN_ATTR_PKT_...
(struct nlattr *) 0x7f10e35d8860:[OVS_CHECK_PKT_LEN_ATTR_...
(struct nlattr *) 0x7f10e35d88b0:[OVS_CHECK_PKT_LEN_ATTR_...

(gdb) ovs_dump_nla 0x7f10e35d8858 172
(struct nlattr *) 0x7f10e35d8858: {nla_len = 6, nla_type ...
(struct nlattr *) 0x7f10e35d8860: {nla_len = 80, nla_type ...
...len = 80, nla_type = 3}, nl_attr_get() = 0x7f10e35d88b4

(gdb) ovs_dump_nla 0x7f10e35d88b4 80 ovs_action_attr dump
... nla_type = 19}, nl_attr_get() = 0x7f10e35d88b8: 3f 01 00 00
...

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

docs: Specify retis dependency on USDT probes.

Retis' "--ovs-track" option relies on USDT probes being available.

Fixes: 22732c0e6770 ("tests: Add support for running system tests under retis.")
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netlink: Provide original upcall pid in 'execute' commands.

When a packet enters kernel datapath and there is no flow to handle it,
packet goes to userspace through a MISS upcall.  With per-CPU upcall
dispatch mechanism, we're using the current CPU id to select the
Netlink PID on which to send this packet.  This allows us to send
packets from the same traffic flow through the same handler.

The handler will process the packet, install required flow into the
kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.

While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
recirculation action that will pass the (likely modified) packet
through the flow lookup again.  And if the flow is not found, the
packet will be sent to userspace again through another MISS upcall.

However, the handler thread in userspace is likely running on a
different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
in the syscall context of that thread.  So, when the time comes to
send the packet through another upcall, the per-CPU dispatch will
choose a different Netlink PID, and this packet will end up processed
by a different handler thread on a different CPU.

The process continues as long as there are new recirculations, each
time the packet goes to a different handler thread before it is sent
out of the OVS datapath to the destination port.  In real setups the
number of recirculations can go up to 4 or 5, sometimes more.

There is always a chance to re-order packets while processing upcalls,
because userspace will first install the flow and then re-inject the
original packet.  So, there is a race window when the flow is already
installed and the second packet can match it inside the kernel and be
forwarded to the destination before the first packet is re-injected.
But the fact that packets are going through multiple upcalls handled
by different userspace threads makes the reordering noticeably more
likely, because we not only have a race between the kernel and a
userspace handler (which is hard to avoid), but also between multiple
userspace handlers.

For example, let's assume that 10 packets got enqueued through a MISS
upcall for handler-1, it will start processing them, will install the
flow into the kernel and start re-injecting packets back, from where
they will go through another MISS to handler-2.  Handler-2 will install
the flow into the kernel and start re-injecting the packets, while
handler-1 continues to re-inject the last of the 10 packets, they will
hit the flow installed by handler-2 and be forwarded without going to
the handler-2, while handler-2 still re-injects the first of these 10
packets.  Given multiple recirculations and misses, these 10 packets
may end up completely mixed up on the output from the datapath.

Let's provide the original upcall PID via the new netlink attribute
OVS_PACKET_ATTR_UPCALL_PID.  This way the upcall triggered during the
execution will go to the same handler.  Packets will be enqueued to
the same socket and re-injected in the same order.  This doesn't
eliminate re-ordering as stated above, since we still have a race
between the kernel and the handler thread, but it allows to eliminate
races between multiple handlers.

The openvswitch kernel module ignores unknown attributes for the
OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older
kernels.

Reported-at: https://issues.redhat.com/browse/FDP-1479
Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maximets@ovn.org/
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

mcast-snooping: Properly check MLD packet length.

If an MLD packet is not large enough to contain the
message-specific data, it may lead to a NULL pointer access.
This patch fixes the issue by adding appropriate length checks.

Fixes: 06994f879c9d ("mcast-snooping: Add Multicast Listener Discovery support")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

tests: Add support for running system tests under retis.

Retis is very useful for debugging our system tests or debugging
kernel issues through our system tests.  This change adds a convenient
way to run any kernel system test with the retis capture on the
background.  E.g.:

  make check-kernel OVS_TEST_WITH_RETIS=yes TESTSUITEFLAGS='167 -d'

Retis 1.5 is required, since we're using ifdump profile, and it also
will mount debugfs for us in case of running in a different namespace.
It should be available in $PATH.

In addition to just capturing the retis.data, we're also running the
capture with --print to print all the events as they appear, and
producing the sorted output in the end.  This makes it easier to work
across systems with different versions of retis and saves time for
running the sort manually.  The raw data is still available for
advanced processing, if needed.

Not specifying any particular collector, capturing everything that's
enabled by default.  OVS tracking is turned on by default.

Since OVS tracking is used, it's required to start retis after the
kernel datapath is created, otherwise it will fail to obtain the map
of upcall PIDs.  That's why we need to start it after the bridge is
created.

Only adding support for kernel-related test suites for now.  For
userspace test suites it may also be useful at some point, but
currently that requires running without --ovs-track and isn't too
important.

Startup of the retis capture adds significant amount of time to each
test, so not running it by default.

Link: https://github.com/retis-org/retis
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

seq: Fix deadlock with the time_init.

There is an ABBA deadlock between time_init() and seq_wait():

    Thread 1:
        poll_block()
        time_poll()
        time_init()
        pthread_once() <-- lock A
        do_time_init()
        seq_create()
        pthread_mutex_lock(seq_mutex) <-- lock B

    Thread 2:
        seq_wait(different seqno)
        pthread_mutex_lock(seq_mutex) <-- lock B
        poll_immediate_wake()
        poll_timer_wait()
        time_msec()
        time_init()
        pthread_once() <-- lock A

This is likely the same deadlock Intel CI saw last year before the lab
was shut down.

The issue should not happen with normal applications as those would
normally have the time module initialized early in the process before
waiting on any sequence numbers, but it happens in the test-barrier
application from time to time causing the test suite to hang.

Fix that by making sure we're not calling poll_immediate_wake() under
the seq_mutex.  The time and seq modules are independent and it's hard
to ensure the dependency without exporting some of their internals.
Instead re-defining the prototype of the poll_immediate_wake_at(),
adding the thread safety annotation, so we have some basic protection
from this deadlock if the code ever changes.  Compiler will warn on
the prototype mismatch as well if it ever happens, so it's not a big
problem.  Having this prototype also gives us a spot in the code where
we can place a comment explaining the locking order.

Reportde-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415436.html
Reported-at: https://issues.redhat.com/browse/FDP-1493
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>